summaryrefslogtreecommitdiff
path: root/drivers/scsi/lpfc/lpfc_nportdisc.c
AgeCommit message (Collapse)Author
2024-03-10scsi: lpfc: Define types in a union for generic void *context3 ptrJustin Tee
In LPFC_MBOXQ_t, the void *context3 ptr is used for various paths. It is treated as a generic pointer, and is type casted during its usage. The issue with this is that it can sometimes get confusing when reading code as to what the context3 ptr is being used for and mistakenly be reused in a different context. Rename context3 to ctx_u, and declare it as a union of defined ptr types. From now on, the ctx_u ptr may be used only if users define the use case type. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-11-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-03-10scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptrJustin Tee
In LPFC_MBOXQ_t, the ctx_buf ptr shouldn't be defined as a generic void *ptr. It is named ctx_buf and it should only be used as an lpfc_dmabuf *ptr. Due to the void* declaration, there have been abuses of ctx_buf for things not related to lpfc_dmabuf. So, set the ptr type for *ctx_buf as lpfc_dmabuf. Remove all type casts on ctx_buf because it is no longer a void *ptr. Convert the abuse of ctx_buf for something not related to lpfc_dmabuf to use the void *context3 ptr. A particular abuse of the ctx_buf warranted a new void *ext_buf ptr. However, the usage of this new void *ext_buf is not generic. It is intended to only hold virtual addresses for extended mailbox commands. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-10-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-03-10scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptrJustin Tee
In LPFC_MBOXQ_t data structure, the ctx_ndlp ptr shouldn't be defined as a generic void *ptr. It is named ctx_ndlp and it should only be used as an lpfc_nodelist *ptr. Due to the void* declaration, there have been abuses of ctx_ndlp for things not related to ndlp. So, set the ptr type for *ctx_ndlp as lpfc_nodelist. Remove all type casts on ctx_ndlp because it is no longer a void *ptr. Convert the abuse of ctx_ndlp for things not related to ndlps to use the void *context3 ptr. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05scsi: lpfc: Copyright updates for 14.4.0.0 patchesJustin Tee
Update copyrights to 2024 for files modified in the 14.4.0.0 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-18-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05scsi: lpfc: Change lpfc_vport load_flag member into a bitmaskJustin Tee
In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, change load_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock for synchronization. Also, correct the test for FC_UNLOADING in lpfc_ct_handle_mibreq, which incorrectly tests vport->fc_flag rather than vport->load_flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-16-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05scsi: lpfc: Change lpfc_vport fc_flag member into a bitmaskJustin Tee
In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, change fc_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock for synchronization. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-15-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05scsi: lpfc: Remove D_ID swap log message from trace event loggerJustin Tee
D_ID swaps are common during cable swaps in a SAN. Thus, there's no reason to log the event at a KERN_ERR level with the trace event logger. Change the log level to KERN_INFO and the normal LOG_ELS flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-5-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc()Justin Tee
The call to lpfc_sli4_resume_rpi() in lpfc_rcv_padisc() may return an unsuccessful status. In such cases, the elsiocb is not issued, the completion is not called, and thus the elsiocb resource is leaked. Check return value after calling lpfc_sli4_resume_rpi() and conditionally release the elsiocb resource. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-3-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13scsi: lpfc: Reject received PRLIs with only initiator fcn role for NPIV portsJustin Tee
Currently, NPIV ports send PRLI_ACC to all received unsolicited PRLI requests. For an NPIV port, there is no point to PRLI_ACC if the received PRLI request has the initiator function bit set and the target function bit unset. Modify the lpfc_rcv_prli_support_check() routine to send a PRLI_RJT in such cases. NPIV ports are expected to send PRLI_ACC only if the Target function bit is set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231009161812.97232-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23scsi: lpfc: Copyright updates for 14.2.0.14 patchesJustin Tee
Update copyrights to 2023 for files modified in the 14.2.0.14 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-13-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23scsi: lpfc: Make fabric zone discovery more robust when handling unsolicited ↵Justin Tee
LOGO This patch provides better target rport recovery when a target rport is running in initiator mode to discover the fabric. Such a target will issue a LOGO before switching back to strict target mode and changes are made to recover the login. Log messages are also updated accordingly. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-8-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23scsi: lpfc: Set Establish Image Pair service parameter only for Target FunctionsJustin Tee
Previously, Establish Image Pair was set in all PRLI_ACC responses regardless if the received PRLI was from an initiator or target function. Specific target vendors that can operate in both initiator and target mode, may view the PRLI_ACC with Establish Image Pair set as an invalid service parameter when operating in initiator only mode. This causes discovery issues later when the target switches on its target mode function. Revise logic that determines an rport's role as an initiator or target and set the Establish Image Pair service parameter bit only if the Target Function bit is set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07scsi: lpfc: Fix port stuck in bypassed state after LIP in PT2PT topologyJames Smart
After issuing a LIP, a specific target vendor does not ACC the FLOGI that lpfc sends. However, it does send its own FLOGI that lpfc ACCs. The target then establishes the port IDs by sending a PLOGI. lpfc PLOGI_ACCs and starts the RPI registration for DID 0x000001. The target then sends a LOGO to the fabric DID. lpfc is currently treating the LOGO from the fabric DID as a link down and cleans up all the ndlps. The ndlp for DID 0x000001 is put back into NPR and discovery stops, leaving the port in stuck in bypassed mode. Change lpfc behavior such that if a LOGO is received for the fabric DID in PT2PT topology skip the lpfc_linkdown_port() routine and just move the fabric DID back to NPR. Link: https://lore.kernel.org/r/20220603174329.63777-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-10scsi: lpfc: Fill in missing ndlp kref puts in error pathsJames Smart
Code review, following every lpfc_nlp_get() call vs calls during error handling, discovered cases of missing put calls. Correct by adding ndlp kref puts in the respective error paths. Also added comments to several of the error paths to record relationships to reference counts. Link: https://lore.kernel.org/r/20220506035519.50908-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18scsi: lpfc: Refactor cleanup of mailbox commandsJames Smart
The intention of this patch is to refactor mailbox memory allocation and cleanup steps in one routine respectively to prevent memory leaks or memory errors related to mailbox commands. There are trivial localized fixes as well. Provide lpfc_mbox_rsrc_prep() - this routine allocates the dmabuf and the mbuf associated with it. It also catches allocation errors and returns status. Provide lpfc_mbox_rsrc_cleanup() - this routine verifies a dmabuf exists and if so releases the associated mbuf and the dmabuf memory. It then sets the ctx_buf to NULL and releases the mailbox memory to the mailbox pool. Link: https://lore.kernel.org/r/20220412222008.126521-22-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18scsi: lpfc: Fix field overload in lpfc_iocbq data structureJames Smart
The lpfc_iocbq data structure has void * pointers that are overloaded to be as many as 8 different data types and the driver translates the void * by casting. This patch removes the void * pointers by declaring the specific types needed by the driver. It also expands the context_un to include more seldom used pointer types to save structure bytes. It also groups the u8 types together to pack the 8 bytes needed. This work allows the lpfc_iocbq data structure to be more strongly typed and keeps it from being allocated from the 512 byte slab. [mkp: rolled in zeroday fix] Link: https://lore.kernel.org/r/20220412222008.126521-21-jsmart2021@gmail.com Reported-by: kernel test robot <lkp@intel.com> Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-18scsi: lpfc: Protect memory leak for NPIV ports sending PLOGI_RJTJames Smart
There is a potential memory leak in lpfc_ignore_els_cmpl() and lpfc_els_rsp_reject() that was allocated from NPIV PLOGI_RJT (lpfc_rcv_plogi()'s login_mbox). Check if cmdiocb->context_un.mbox was allocated in lpfc_ignore_els_cmpl(), and then free it back to phba->mbox_mem_pool along with mbox->ctx_buf for service parameters. For lpfc_els_rsp_reject() failure, free both the ctx_buf for service parameters and the login_mbox. Link: https://lore.kernel.org/r/20220412222008.126521-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-24Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (qla2xxx, pm8001, libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates and bug fixes. The high blast radius core update is the removal of write same, which affects block and several non-SCSI devices. The other big change, which is more local, is the removal of the SCSI pointer" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits) scsi: scsi_ioctl: Drop needless assignment in sg_io() scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn() scsi: lpfc: Copyright updates for 14.2.0.0 patches scsi: lpfc: Update lpfc version to 14.2.0.0 scsi: lpfc: SLI path split: Refactor BSG paths scsi: lpfc: SLI path split: Refactor Abort paths scsi: lpfc: SLI path split: Refactor SCSI paths scsi: lpfc: SLI path split: Refactor CT paths scsi: lpfc: SLI path split: Refactor misc ELS paths scsi: lpfc: SLI path split: Refactor VMID paths scsi: lpfc: SLI path split: Refactor FDISC paths scsi: lpfc: SLI path split: Refactor LS_RJT paths scsi: lpfc: SLI path split: Refactor LS_ACC paths scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4 scsi: lpfc: SLI path split: Refactor lpfc_iocbq scsi: lpfc: Use kcalloc() ...
2022-03-15scsi: lpfc: Copyright updates for 14.2.0.0 patchesJames Smart
Update copyrights to 2022 for files modified in the 14.2.0.0 patch set. Link: https://lore.kernel.org/r/20220225022308.16486-18-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-15scsi: lpfc: SLI path split: Refactor misc ELS pathsJames Smart
This patch refactors the remaining ELS paths to use SLI-4 as the primary interface. Paths include RRQ, RSCN, unsolicited ELS RQST and RSP paths, ELS timeouts, etc.: - Remove unused routines lpfc_sli4_bpl2sgl and lpfc_sli4_iocb2wqe - Conversion away from using SLI-3 iocb structures to set/access fields in common routines. Use the new generic get/set routines that were added. This move changes code from indirect structure references to using local variables with the generic routines. - Refactor routines when setting non-generic fields, to have both SLI3 and SLI4 specific sections. This replaces the set-as-SLI3 then translate to SLI4 behavior of the past. Link: https://lore.kernel.org/r/20220225022308.16486-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-15scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO pathsJames Smart
This patch refactors the PLOGI/PRLI/ADISC/LOGO paths to use SLI-4 as the primary interface: - Conversion away from using SLI-3 iocb structures to set/access fields in common routines. Use the new generic get/set routines that were added. This move changes code from indirect structure references to using local variables with the generic routines. - Refactor routines when setting non-generic fields, to have both SLI3 and SLI4 specific sections. This replaces the set-as-SLI3 then translate to SLI4 behavior of the past. Link: https://lore.kernel.org/r/20220225022308.16486-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-15scsi: lpfc: SLI path split: Refactor lpfc_iocbqJames Smart
Currently, SLI3 and SLI4 data paths use the same lpfc_iocbq structure. This is a "common" structure but many of the components refer to sli-rev specific entities which can lead the developer astray as to what they actually mean, should be set to, or when they should be used. This first patch prepares the lpfc_iocbq structure so that elements common to both SLI3 and SLI4 data paths are more appropriately named, making it clear they apply generically. Fieldnames based on 'iocb' (sli3) or 'wqe' (sli4) which are actually generic to the paths are renamed to 'cmd': - iocb_flag is renamed to cmd_flag - lpfc_vmid_iocb_tag is renamed to lpfc_vmid_tag - fabric_iocb_cmpl is renamed to fabric_cmd_cmpl - wait_iocb_cmpl is renamed to wait_cmd_cmpl - iocb_cmpl and wqe_cmpl are combined and renamed to cmd_cmpl - rsvd2 member is renamed to num_bdes due to pre-existing usage The structure name itself will retain the iocb reference as changing to a more relevant "job" or "cmd" title induces many hundreds of line changes for only a name change. lpfc_post_buffer is also renamed to lpfc_sli3_post_buffer to indicate use in the SLI3 path only. Link: https://lore.kernel.org/r/20220225022308.16486-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-14scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loopJames Smart
When connected point to point, the driver does not know the FC4's supported by the other end. In Fabrics, it can query the nameserver. Thus the driver must send PRLIs for the FC4s it supports and enable support based on the acc(ept) or rej(ect) of the respective FC4 PRLI. Currently the driver supports SCSI and NVMe PRLIs. Unfortunately, although the behavior is per standard, many devices have come to expect only SCSI PRLIs. In this particular example, the NVMe PRLI is properly RJT'd but the target decided that it must LOGO after seeing the unexpected NVMe PRLI. The LOGO causes the sequence to restart and login is now in an infinite failure loop. Fix the problem by having the driver, on a pt2pt link, remember NVMe PRLI accept or reject status across logout as long as the link stays "up". When retrying login, if the prior NVMe PRLI was rejected, it will not be sent on the next login. Link: https://lore.kernel.org/r/20220212163120.15385-1-jsmart2021@gmail.com Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-06scsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIVJames Smart
During rmmod testing, messages appeared indicating lpfc_mbuf_pool entries were still busy. This situation was only seen doing rmmod after at least 1 vport (NPIV) instance was created and destroyed. The number of messages scaled with the number of vports created. When a vport is created, it can receive a PLOGI from another initiator Nport. When this happens, the driver prepares to ack the PLOGI and prepares an RPI for registration (via mbx cmd) which includes an mbuf allocation. During the unsolicited PLOGI processing and after the RPI preparation, the driver recognizes it is one of the vport instances and decides to reject the PLOGI. During the LS_RJT preparation for the PLOGI, the mailbox struct allocated for RPI registration is freed, but the mbuf that was also allocated is not released. Fix by freeing the mbuf with the mailbox struct in the LS_RJT path. As part of the code review to figure the issue out a couple of other areas where found that also would not have released the mbuf. Those are cleaned up as well. Link: https://lore.kernel.org/r/20211204002644.116455-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18scsi: lpfc: Skip issuing ADISC when node is in NPR stateJames Smart
When a node moves to NPR state due to a device recovery event, the nlp_fc4_types in the node are cleared. An ADISC received for a node in the NPR state triggers an ADISC. Without fc4 types being known, the calls to register with the transport are no-op'd, thus no additional references are placed on the node by transport re-registrations. A subsequent RSCN could trigger another unregister request, which will decrement the reference counts, leading to the ref count hitting zero and the node being freed while futher discovery on the node is being attempted by the RSCN event handling. Fix by skipping the trigger of an ADISC when in NPR state. The normal ADISC process will kick off in the regular discovery path after receiving a response from name server. Link: https://lore.kernel.org/r/20210707184351.67872-19-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completesJames Smart
On an RSCN event, the nodes specified in RSCN payload and in MAPPED state are moved to NPR state in order to revalidate the login. This triggers an immediate unregister from SCSI/NVMe backend. The assumption is that the node may be missing. The re-registration with the backend happens after either relogin (PLOGI/PRLI; if ADISC is disabled or login truly lost) or when ADISC completes successfully (rediscover with ADISC enabled). However, the NVMe-FC standard provides for an RSCN to be triggered when the remote port supports a discovery controller and there was a change of discovery log content. As the remote port typically also supports storage subsystems, this unregister causes all storage controller connections to fail and require reconnect. Correct by reworking the code to ensure that the unregistration only occurs when a login state is truly terminated, thereby leaving the NVMe storage controllers in place. The changes made are: - Retain node state in ADISC_ISSUE when scheduling ADISC ELS retry. - Do not clear wwpn/wwnn values upon ADISC failure. - Move MAPPED nodes to NPR during RSCN processing, but do not unregister with transport. On GIDFT completion, identify missing nodes (not marked NLP_NPR_2B_DISC) and unregister them. - Perform unregistration for nodes that will go through ADISC processing if ADISC completion fails. - Successful ADISC completion will move node back to MAPPED state. Link: https://lore.kernel.org/r/20210707184351.67872-16-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-21scsi: lpfc: Fix node handling for Fabric Controller and Domain ControllerJames Smart
During link bounce testing, RPI counts were seen to differ from the number of nodes. For fabric and domain controllers, a temporary RPI is assigned, but the code isn't registering it. If the nodes do go away, such as on link down, the temporary RPI isn't being released. Change the way these two fabric services are managed, make them behave like any other remote port. Register the RPI and register with the transport. Never leave the nodes in a NPR or UNUSED state where their RPI is in limbo. This allows them to follow normal dev_loss_tmo handling, RPI refcounting, and normal removal rules. It also allows fabric I/Os to use the RPI for traffic requests. Note: There is some logic that still has a couple of exceptions when the Domain controller (0xfffcXX). There are cases where the fabric won't have a valid login but will send RDP. Other times, it will it send a LOGO then an RDP. It makes for ad-hoc behavior to manage the node. Exceptions are documented in the code. Link: https://lore.kernel.org/r/20210514195559.119853-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-21scsi: lpfc: Add ndlp kref accounting for resume RPI pathJames Smart
The driver is crashing due to a bad pointer during driver load due in an adisc acc receive routine. The driver is missing node get/put in the mbx_resume_rpi paths. Fix by adding the proper gets and puts into the resume_rpi path. Link: https://lore.kernel.org/r/20210514195559.119853-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-21scsi: lpfc: Fix unreleased RPIs when NPIV ports are createdJames Smart
While testing NPIV and watching logins and used RPI levels, it was seen the used RPI count was much higher than the number of remote ports discovered. Code inspection showed that remote port removals on any NPIV instance are releasing the RPI, but not performing an UNREG_RPI with the adapter thus the reference counting never fully drops and the RPI is never fully released. This was happening on NPIV nodes due to a log of fabric ELS's to fabric addresses. This lack of UNREG_RPI was introduced by a prior node rework patch that performed the UNREG_RPI as part of node cleanup. To resolve the issue, do the following: - Restore the RPI release code, but move the location to so that it is in line with the new node cleanup design. - NPIV ports now release the RPI and drop the node when the caller sets the NLP_RELEASE_RPI flag. - Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a release of RPI to free pool. - Ensure there's an UNREG_RPI at LOGO completion so that RPI release is completed. - Stop offline_prep from skipping nodes that are UNUSED. The RPI may not have been released. - Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4. - Fixed up debugfs RPI displays for better debugging. Fixes: a70e63eee1c1 ("scsi: lpfc: Fix NPIV Fabric Node reference counting") Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13scsi: lpfc: Fix various trivial errors in comments and log messagesJames Smart
Clean up minor issues spotted by tools and code review: - Spelling Errors - Spurious characters and errors in function headers - nvme_info wqerr and err fields source data reversed - Extraneous new line in log message 0466 - Spacing error in log message 0109 - Messages 0140 and 0141 have portname and nodename reversed - Incorrect function labelling in comment Link: https://lore.kernel.org/r/20210412013127.2387-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13scsi: lpfc: Fix crash when a REG_RPI mailbox fails triggering a LOGO responseJames Smart
Fix a crash caused by a double put on the node when the driver completed an ACC for an unsolicted abort on the same node. The second put was executed by lpfc_nlp_not_used() and is wrong because the completion routine executes the nlp_put when the iocbq was released. Additionally, the driver is issuing a LOGO then immediately calls lpfc_nlp_set_state to put the node into NPR. This call does nothing. Remove the lpfc_nlp_not_used call and additional set_state in the completion routine. Remove the lpfc_nlp_set_state post issue_logo. Isn't necessary. Link: https://lore.kernel.org/r/20210412013127.2387-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13scsi: lpfc: Fix rmmod crash due to bad ring pointers to abort_iotagJames Smart
Rmmod on SLI-4 adapters is sometimes hitting a bad ptr dereference in lpfc_els_free_iocb(). A prior patch refactored the lpfc_sli_abort_iocb() routine. One of the changes was to convert from building/sending an abort within the routine to using a common routine. The reworked routine passes, without modification, the pring ptr to the new common routine. The older routine had logic to check SLI-3 vs SLI-4 and adapt the pring ptr if necessary as callers were passing SLI-3 pointers even when not on an SLI-4 adapter. The new routine is missing this check and adapt, so the SLI-3 ring pointers are being used in SLI-4 paths. Fix by cleaning up the calling routines. In review, there is no need to pass the ring ptr argument to abort_iocb at all. The routine can look at the adapter type itself and reference the proper ring. Link: https://lore.kernel.org/r/20210412013127.2387-2-jsmart2021@gmail.com Fixes: db7531d2b377 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers") Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: lpfc: Update copyrights for 12.8.0.7 and 12.8.0.8 changesJames Smart
For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets, update the copyright for 2021. Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: lpfc: Fix pt2pt state transition causing rmmod hangJames Smart
On a setup with a dual port HBA and both ports direct connected, an rmmod hangs momentarily when we log an Illegal State Transition. Once it resumes, a nodelist not empty logic is hit, which forces rmmod to cleanup and exit. We're missing a state transition case in the discovery engine. Fix by adding a case for a DEVICE_RM event while in the unmapped state to avoid illegal state transition log message. Link: https://lore.kernel.org/r/20210301171821.3427-17-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: lpfc: Fix PLOGI ACC to be transmit after REG_LOGINJames Smart
The driver is seeing a scenario where PLOGI response was issued and traffic is arriving while the adapter is still setting up the login context. This is resulting in errors handling the traffic. Change the driver so that PLOGI response is sent after the login context has been setup to avoid the situation. Link: https://lore.kernel.org/r/20210301171821.3427-14-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: lpfc: Fix dropped FLOGI during pt2pt discovery recoveryJames Smart
When connected in pt2pt mode, there is a scenario where the remote port significantly delays sending a response to our FLOGI, but acts on the FLOGI it sent us and proceeds to PLOGI/PRLI. The FLOGI ends up timing out and kicks off recovery logic. End result is a lot of unnecessary state changes and lots of discovery messages being logged. Fix by terminating the FLOGI and noop'ing its completion if we have already accepted the remote ports FLOGI and are now processing PLOGI. Link: https://lore.kernel.org/r/20210301171821.3427-13-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: lpfc: Fix pt2pt connection does not recover after LOGOJames Smart
On a pt2pt setup, between 2 initiators, if one side issues a a LOGO, there is no relogin attempt. The FC specs are grey in this area on which port (higher wwn or not) is to re-login. As there is no spec guidance, unconditionally re-PLOGI after the logout to ensure a login is re-established. Link: https://lore.kernel.org/r/20210301171821.3427-8-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Implement health checking when aborting I/OJames Smart
Several errors have occurred where the adapter stops or fails but does not raise the register values for the driver to detect failure. Thus driver is unaware of the failure. The failure typically results in I/O timeouts, the I/O timeout handler failing (after several seconds), and the error handler escalating recovery policy and resulting in more errors. Eventually, the driver is in a position where things have spiraled and it can't do recovery because other recovery ops are still outstanding and it becomes unusable. Resolve the situation by having the I/O timeout handler (actually a els, SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O, perform a mailbox command and look for a response from the hardware. If the mailbox command fails, it will mark the adapter offline and then invoke the adapter reset handler to clean up. The new I/O timeout test will be limited to a test every 5s. If there are multiple I/O timeouts concurrently, only the 1st I/O timeout will generate the mailbox command. Further testing will only occur once a timeout occurs after a 5s delay from the last mailbox command has expired. Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix crash when a fabric node is released prematurelyJames Smart
The driver's management of the fabric controller (aka pseudo-scsi initiator) node in SLI3 mode is causing this crash. The crash occurs because of a node reference imbalance that frees the fabric controller node while devloss is outstanding from the SCSI transport. This is triggered by an odd behavior where the switch reacts to a rejected RDP request with a PLOGI and nothing else, not even a LOGO. The driver ACKS the PLOGI and after successfully registering the RPI, incorrectly registers the fabric controller node because it has the NLP_FC4_FCP flag still set from the fabric controller PRLI. If a LIP is issued, the driver attempts to cleanup on Link Up and ends up executing too many puts. Fix by detecting the fabric node type and clearing out the nodes internal flags that triggered a SCSI transport registration and subsequence dev_loss event. The driver cannot count on any persistence from fabric controller nodes. Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue stateJames Smart
Testing with target ports coming and going, the driver eventually reached a state where it no longer discovered the target. When the driver has issued a PRLI and receives a PRLI from the target, it is not properly updating the node's initiator/target role flags. Thus, when a subsequent RSCN is received for a target loss, the driver mis-identifies the target as an initiator and does not initiate LUN scanning. Fix by always refreshing the ndlp with the latest PRLI state information whenever a PRLI is processed. Also clear the ndlp flags when processing a PLOGI so that there is no carry over through a re-login. Link: https://lore.kernel.org/r/20210104180240.46824-4-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Update changed file copyrights for 2020James Smart
Update Copyright in files changed by the 12.8.0.6 patch set to 2020 Link: https://lore.kernel.org/r/20201115192646.12977-18-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlersJames Smart
This patch reworks the abort interfaces such that SLI-3 retains the iocb-based formatting and completions and SLI-4 now uses native WQEs and completion routines. The following changes are made: - The code is refactored from a confusing 2 routine sequence of xx_abort_iotag_issue(), which creates/formats and abort cmd, and xx_issue_abort_tag(), which then issues and handles the completion of the abort cmd - into a single interface of xx_issue_abort_iotag(). The new interface will determine whether SLI-3 or SLI-4 and then call the appropriate handler. A completion handler can now be specified to address the differences in completion handling. Note: original code is all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path, and NVMe natively using wqes. - The SLI-3 side is refactored: The older iocb-base lpfc_sli_issue_abort_iotag() routine is combined with the logic of lpfc_sli_abort_iotag_issue() as well as the iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to create the new single SLI-3 abort routine that formats and issues the iocb. - The SLI-4 side is refactored and added to: The native WQE abort code in NVMe is moved to the new SLI-4 issue_abort_iotag() routine. Items in SCSI that set fields not set by NVMe is migrated into the new routine. Thus the routine supports NVMe and SCSI initiators. The nvmet block (target) formats the abort slightly different (like the old NVMe initiator) thus it has its own prep routine stolen from NVMe initiator and it retains the current code it has for issuing the WQE (does not use the commonized routine the initiators do). SLI-4 completion handlers were also added. - lpfc_abort_handler now becomes a wrapper that determines whether SLI-3 or SLI-4 and calls the proper abort handler. Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Rework remote port lock handlingJames Smart
Currently the discovery layers within the driver use the SCSI midlayer host_lock to access node-specific structures. This can contend with the I/O path and is too coarse of a lock. Rework the driver so that it uses a lock specific to the remote port node structure when accessing the structure contents. A few of the changes brought out spots were some slightly reorganized routines worked better. Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Rework locations of ndlp reference takingJames Smart
Now that the driver has gone to a normal ref interface (with no odd logic) the discovery logic needs to be updated to reworked so that it properly takes references when it should and give them up when it should. Rework the driver for the following get/put model: - Move gets to just before an I/O is issued. Add gets for places where an I/O was issued without one. - Ensure that failures from lpfc_nlp_get() are handled by the driver. - Check and fix the placement of lpfc_nlp_puts relative to io completions. Note: some of these paths may not release the reference on the exact io completion as the reference is held as the code takes another step in the discovery thread and which may cause another io to be issued. - Rearrange some code for error processing and calling lpfc_nlp_put. - Fix some places of incorrect reference freeing that was causing the premature releasing of the structure. - Nvmet plogi handling performs unreg_rpi's. The reference counts were unbalanced resulting in premature node removal. In some cases this caused loss of node discovery. Corrected the reftaking around nvmet plogis. Nodes that experience devloss now get released from the node list now that there is a proper reference taking. Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Rework remote port ref counting and node freeingJames Smart
When a remote port is disconnected and disappears, its node structure (ndlp) stays allocated and on a vport node list. While on the list it can be matched, thus requires validation checks on state to be added in numerous code paths. If the node comes back, its possible for there to be multiple node structures for the same device on the vport node list. There is no reason to keep the node structure around after it is no longer in existence, and the current implementation creates problems for itself (multiple nodes) and lots of unnecessary code for state validation. Additionally, the reference taking on the node structure didn't follow the normal model used by the kernel kref api. It included lots of odd logic to match state with reference count. The combination of this odd logic plus the way it was implicitly used in the discovery engine made its reference taking implementation suspect and extremely hard to follow. Change the driver such that the reference taking routines are now normal ref increments/decrements and callout on refcount=0. With this in place, the rework can be done such that the node structure is fully removed and deallocated when the remote port no longer exists and all references are removed. This removal logic, and the basic ref counting are intrically tied, thus in a single patch. Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-04scsi: lpfc: Fix LUN loss after cable pullDick Kennedy
On devices that support FCP sequence error recovery, which attempts to preserve the devices login across link bounce, adisc is used for device validation. Turns out the device fc4 type is cleared as part of the link bounce, but the ADISC handling doesn't restore the FC4 support as it normally would with a PRLI. This caused situations where the device wasn't reregistered with the transport thus scan logic and LUN discovery never kicked in. In the ADISC completion handling, reset the fc4 type so that transport port reregistration occurs with the remote port. Link: https://lore.kernel.org/r/20200803210229.23063-8-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-24scsi: lpfc: Add description for lpfc_release_rpi()'s 'ndlpl paramLee Jones
Fixes the following W=1 kernel build warning(s): drivers/scsi/lpfc/lpfc_nportdisc.c:1079: warning: Function parameter or member 'ndlp' not described in 'lpfc_release_rpi' Link: https://lore.kernel.org/r/20200723122446.1329773-20-lee.jones@linaro.org Cc: James Smart <james.smart@broadcom.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-02scsi: lpfc: Add an internal trace log bufferDick Kennedy
The current logging methods typically end up requesting a reproduction with a different logging level set to figure out what happened. This was mainly by design to not clutter the kernel log messages with things that were typically not interesting and the messages themselves could cause other issues. When looking to make a better system, it was seen that in many cases when more data was wanted was when another message, usually at KERN_ERR level, was logged. And in most cases, what the additional logging that was then enabled was typically. Most of these areas fell into the discovery machine. Based on this summary, the following design has been put in place: The driver will maintain an internal log (256 elements of 256 bytes). The "additional logging" messages that are usually enabled in a reproduction will be changed to now log all the time to the internal log. A new logging level is defined - LOG_TRACE_EVENT. When this level is set (it is not by default) and a message marked as KERN_ERR is logged, all the messages in the internal log will be dumped to the kernel log before the KERN_ERR message is logged. There is a timestamp on each message added to the internal log. However, this timestamp is not converted to wall time when logged. The value of the timestamp is solely to give a crude time reference for the messages. Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-02scsi: lpfc: Fix NVMe rport deregister and registration during ADISCDick Kennedy
During driver unload/reload testing, the NVMe initiator would not re-establish connectivity to NVMe controllers on reload. The failing NVMe array supports concurrent FCP and NVMe operation via different nport_id's. The array was repeatedly sending an ADISC every 2 seconds after PLOGI completed and while NVMe subsystems were executing discovery. The target would continue this state for roughly 45 seconds. The driver's current behavior on ADISC receipt is to validate a the ADISC vs the device and issue a RESUME_RPI to restore transmission. The receipt of the ADISC effectively caused a driver to take actions similar to a logout and login for the remote port, causing the deregistration of the nvme rport and a subsequent re-registration. This caused a constant reset and re-connect of the NVMe controller while this 45s window occurred. There was no need for the state changes as ADISC does not change login state. This patch corrects this behavior by validating if the remoteport is already logged in (MAPPED) and when true, avoids the call to set the ndlp state to MAPPED, which triggers the unreg/re-reg. Thus ADISC does not change the login state of the node. Link: https://lore.kernel.org/r/20200630215001.70793-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>