summaryrefslogtreecommitdiff
path: root/drivers/scsi/lpfc/lpfc_nvmet.c
AgeCommit message (Collapse)Author
2018-02-12scsi: lpfc: Fix RQ empty firmware trapJames Smart
When nvme target deferred receive logic waits for exchange resources, the corresponding receive buffer is not replenished with the hardware. This can result in a lack of asynchronous receive buffer resources in the hardware, resulting in a "2885 Port Status Event: ... error 1=0x52004a01 ..." message. Correct by replenishing the buffer whenenver the deferred logic kicks in. Update corresponding debug messages and statistics as well. [mkp: applied by hand] Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-12scsi: lpfc: Add WQ Full Logic for NVME TargetJames Smart
I/O conditions on the nvme target may have the driver submitting to a full hardware wq. The hardware wq is a shared resource among all nvme controllers. When the driver hit a full wq, it failed the io posting back to the nvme-fc transport, which then escalated it into errors. Correct by maintaining a sideband queue within the driver that is added to when the WQ full condition is hit, and drained from as soon as new WQ space opens up. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-12scsi: lpfc: correct debug counters for abortJames Smart
Existing code was using the wrong field for the completion status when comparing whether to increment abort statistics Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-20scsi: lpfc: Beef up stat counters for debugJames Smart
If log verbose in not turned on, its hard to tell when certain error paths get hit. Add stats counters and corresponding logic to debugfs/sysfs to aid understanding what paths were traversed. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-20scsi: lpfc: Fix -EOVERFLOW behavior for NVMET and defer_rcvJames Smart
The driver is all set to handle the defer_rcv api for the nvmet_fc transport, yet didn't properly recognize the return status when the defer_rcv occurred. The driver treated it simply as an error and aborted the io. Several residual issues occurred at that point. Finish the defer_rcv support: recognize the return status when the io request is being handled in a deferred style. This stops the rogue aborts; Replenish the async cmd rcv buffer in the deferred receive if needed. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-04scsi: lpfc: small sg cnt cleanupJames Smart
The logic for sg_seg_cnt is a bit convoluted. This patch tries to clean up a couple of areas, especially around the +2 and +1 logic. This patch: - Cleans up the lpfc_sg_seg_cnt attribute to specify a real minimum rather than making the minimum be whatever the default is. - Removes the hardcoding of +2 (for the number of elements we use in a sgl for cmd iu and rsp iu) and +1 (an additional entry to compensate for nvme's reduction of io size based on a possible partial page) logic in sg list initialization. In the case where the +1 logic is referenced in host and target io checks, use the values set in the transport template as that value was properly set. There can certainly be more done in this area and it will be addressed in combined host/target driver effort. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-04scsi: lpfc: Fix crash during driver unload with running nvme trafficJames Smart
When the driver is unloading, the nvme transport could be in the process of submitting new requests, will send abort requests to terminate associations, or may make LS-related requests. The driver's abort and request entry points currently is ignorant of the unloading state and is starting the requests even though the infrastructure to complete them continues to teardown. Change the entry points for new requests to check whether unloading and if so, reject the requests. Abort routines check unloading, and if so, noop the request. An abort is noop'd as the teardown paths are already aborting/terminating the io outstanding at the time the teardown initiated. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-02scsi: lpfc: Fix oops of nvme host during driver unload.Dick Kennedy
When running NVME io as a NVME host, if the driver is unloaded there would be oops in lpfc_sli4_issue_wqe. When unloading, controllers are torn down and the transport initiates set_property commands to reset the controller and issues aborts to terminate existing io. The drivers nvme abort and fcp io submit routines needed to recognize the driver is unloading and fail the new requests. It didn't, resulting in the oops. Revise the ls and fcp io submit routines to detect the unloading state and properly handle their cleanup. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-02scsi: lpfc: Fix oops if nvmet_fc_register_targetport failsDick Kennedy
if nvmet targetport registration fails, the driver encounters a NULL pointer oops in lpfc_hb_timeout_handler. To fix: if registration fails, ensure nvmet_support is cleared on the port structure. Also enhanced the log message on failure. Cc: <stable@vger.kernel.org> # 4.12+ Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-02scsi: lpfc: Set missing abort contextJames Smart
Always set ctxp->state to LPFC_NVMET_STE_ABORT if ABORT op gets called Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-02scsi: lpfc: Reduce log spew on controller reconnectsJames Smart
There are several log messages that report abnormal terminations that by default are marked warn. These are typically the result of failures due to invalid controller state or abort completions. They are all natural when a controller resets. Unfortunately, as they are logged by default, it makes the admin very concerned. Convert the messages to Info. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-02scsi: lpfc: Move CQ processing to a soft IRQDick Kennedy
Under heavy target nvme load duration, the lpfc irq handler is encountering cpu lockup warnings. Convert the driver to a shortened ISR handler which identifies the interrupting condition then schedules a workq thread to process the completion queue the interrupt was for. This moves all the real work into the workq element. As nvmet_fc upcalls are no longer in ISR context, don't set the feature flags Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-02scsi: lpfc: Make ktime sampling more accurateDick Kennedy
Need to make ktime samples more accurate If ktime is turned on in the middle of an IO, the max calculation could be misleading. Base sampling on the start time of the IO as opposed to ktime_on. Make ISR ktime timestamps be from when CQE is read instead of EQE. Added additional sanity checks when deciding whether to accept an IO sample or not. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-02scsi: lpfc: Fix warning messages when NVME_TARGET_FC not definedDick Kennedy
Warning messages when NVME_TARGET_FC not defined on ppc builds The lpfc_nvmet_replenish_context() function is only meaningful when NVME target mode enabled. Surround the function body with ifdefs for target mode enablement. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-09-07Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, zfcp and a host of minor updates. The major driver change here is the elimination of the block based cciss driver in favour of the SCSI based hpsa driver (which now drives all the legacy cases cciss used to be required for). Plus a reset handler clean up and the redo of the SAS SMP handler to use bsg lib" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: scsi-mq: Always unprepare before requeuing a request scsi: Show .retries and .jiffies_at_alloc in debugfs scsi: Improve requeuing behavior scsi: Call scsi_initialize_rq() for filesystem requests scsi: qla2xxx: Reset the logo flag, after target re-login. scsi: qla2xxx: Fix slow mem alloc behind lock scsi: qla2xxx: Clear fc4f_nvme flag scsi: qla2xxx: add missing includes for qla_isr scsi: qla2xxx: Fix an integer overflow in sysfs code scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2() scsi: aacraid: get rid of one level of indentation scsi: aacraid: fix indentation errors scsi: storvsc: fix memory leak on ring buffer busy scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough scsi: smartpqi: remove the smp_handler stub scsi: hpsa: remove the smp_handler stub scsi: bsg-lib: pass the release callback through bsg_setup_queue scsi: Rework handling of scsi_device.vpd_pg8[03] scsi: Rework the code for caching Vital Product Data (VPD) scsi: rcu: Introduce rcu_swap_protected() ...
2017-08-25scsi: lpfc: avoid an unused function warningArnd Bergmann
The only reference to lpfc_nvmet_replenish_context() is inside of an disabled: drivers/scsi/lpfc/lpfc_nvmet.c:1457:1: error: 'lpfc_nvmet_replenish_context' defined but not used [-Werror=unused-function] This replaces the preprocessor conditional with a C condition, so the compiler can see that the function is intentionally unused. Fixes: 9a38e4f1c82f ("scsi: lpfc: Fix MRQ > 1 context list handling") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24scsi: lpfc: Fix relative offset error on large nvmet target iosDick Kennedy
If the nvmet_fc transport breaks an io into multiple sequences, the driver will improperly set the relative offset on the 2nd through N sequences. Correct by properly formatting the hw cmd so the relative offset is picked up from the hw cmd. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24scsi: lpfc: Fix MRQ > 1 context list handlingDick Kennedy
Various oops including cpu LOCKUPs were seen. For asynchronously received ius where the driver must assign exchange resources, the resources were on a single get (free) list and put list (finished, waiting to be put on get list). As all cpus are sharing the lists, an interrupt for a receive frame may have to wait for all the other cpus to place their done work onto the put list before it can acquire the lock to pull from the list. Fix by breaking the resource lists into per-cpu lists or at least more than 1 list with cpu's sharing the lists). A cpu would allocate from the free list for its own cpu, and put its done work on the its own put list - avoiding the contention. As cpu load may vary, when empty, a cpu may grab from another cpu, thereby changing resource distribution. But searching for a resource only occurs on 1 or a few cpus until a single resource can be allocated. if the condition reoccurs, it starts looking at a different cpu. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10lpfc: support nvmet_fc defer_rcv callbackJames Smart
Currently, calls to nvmet_fc_rcv_fcp_req() always copied the FC-NVME cmd iu to a temporary buffer before returning, allowing the driver to immediately repost the buffer to the hardware. To address timing conditions on queue element structures vs async command reception, the nvmet_fc transport occasionally may need to hold on to the command iu buffer for a short period. In these cases, the nvmet_fc_rcv_fcp_req() will return a special return code (-EOVERFLOW). In these cases, the LLDD must delay until the new defer_rcv lldd callback is called before recycling the buffer back to the hw. This patch adds support for the new nvmet_fc transport defer_rcv callback and recognition of the new error code when passing commands to the transport. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-08-07scsi: lpfc: Replace PCI pool old APIRomain Perier
The PCI pool API is deprecated. This commit replaces the PCI pool old API by the appropriate function with the DMA pool API. It also updates some comments, accordingly. Signed-off-by: Romain Perier <romain.perier@collabora.com> Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-01scsi: lpfc: don't double count abort errorsDan Carpenter
If lpfc_nvmet_unsol_fcp_issue_abort() fails then we accidentally increment "tgtp->xmt_abort_rsp_error" and then two lines later we increment it a second time. Fixes: 547077a44b3b ("scsi: lpfc: Adding additional stats counters for nvme.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-01scsi: lpfc: spin_lock_irq() is not nestableDan Carpenter
We're calling spin_lock_irq() multiple times, the problem is that on the first spin_unlock_irq() then we will re-enable IRQs and we don't want that. Fixes: 966bb5b71196 ("scsi: lpfc: Break up IO ctx list into a separate get and put list") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-19scsi: lpfc: Fix crash in lpfc_sli_ringtxcmpl_put when nvmet gets an abort ↵James Smart
request. When running nvme detach-ns /dev/nvme0n1 -n 1 command, the nvmet lpfc driver crashes with this stack dump: kernel BUG at /root/NVME/lpfc_8.4/lpfc_sli.c:1393! invalid opcode: 0000 [#1] SMP Workqueue: nvmet-fc-cpu0 nvmet_fc_do_work_on_cpu [nvmet_fc] lpfc_sli4_issue_wqe+0x357/0x440 [lpfc] lpfc_nvmet_xmt_fcp_abort+0x36b/0x5c0 [lpfc] nvmet_fc_abort_op+0x30/0x50 [nvmet_fc] nvmet_fc_do_work_on_cpu+0xd9/0x130 [nvmet_fc] process_one_work+0x14e/0x410 worker_thread+0x116/0x490 kthread+0xc7/0xe0 ret_from_fork+0x3f/0x70 Crash is due to an uninitialized iocbq->vport pointer. Explicitly set the iocbq->vport field to phba->pport in lpfc_nvmet_sol_fcp_issue_abort as it does all abort iocbq initialization in the routine. Using phba->pport is ok because target does not support NPIV instances. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-19scsi: lpfc: Break up IO ctx list into a separate get and put listJames Smart
Since unsol rcv ISR and command cmpl ISR both access/lock this list, separate get/put lists will reduce contention. Replaced struct list_head lpfc_nvmet_ctx_list; with struct list_head lpfc_nvmet_ctx_get_list; struct list_head lpfc_nvmet_ctx_put_list; and all correpsonding locks and counters. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-19scsi: lpfc: Reduce time spent in IRQ for received NVME commandsJames Smart
Removed unnecessary bzero of context area. Due to size of sg list, added a substantial delay and played havoc on cpu caches. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: lpfc: Fix defects reported by Coverity ScanJames Smart
Addressed the following reported defects: ** CID 1411552: Control flow issues (MISSING_BREAK) /drivers/scsi/lpfc/lpfc_sli.c: 13259 in lpfc_sli4_nvmet_handle_rcqe() ** CID 1411553: Memory - illegal accesses (OVERRUN) /drivers/scsi/lpfc/lpfc_sli.c: 16218 in lpfc_fc_frame_check() ** CID 1411553: Memory - illegal accesses (OVERRUN) Overrunning array "lpfc_rctl_names" of 202 8-byte elements at element index 244 (byte offset 1952) using index "fc_hdr->fh_r_ctl" (which evaluates to 244). ** CID 1411554: Null pointer dereferences (REVERSE_INULL) /drivers/scsi/lpfc/lpfc_nvmet.c: 2131 in lpfc_nvmet_unsol_fcp_abort_cmp() ** CID 1411555: Memory - illegal accesses (UNINIT) /drivers/scsi/lpfc/lpfc_nvmet.c: 180 in lpfc_nvmet_ctxbuf_post() Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: lpfc: Add changes to assist in NVMET debuggingJames Smart
Inconsistent error messages and context state checks Context state sanity checks were not accurate or inconsistent in the code paths. Separated LS context states from FCP. Added and modified context state sanity checks. Use context state to determine if a sol or unsol ABORT is needed. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: lpfc: make a couple of functions staticColin Ian King
functions lpfc_nvmet_cleanup_io_context and lpfc_nvmet_setup_io_context can be made static as they do not need to be in global scope. Cleans up sparse warnings: "warning: symbol 'lpfc_nvmet_cleanup_io_context' was not declared. Should it be static?" "warning: symbol 'lpfc_nvmet_setup_io_context' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-11Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of user visible fixes (excepting one format string change). Four of the qla2xxx fixes only affect the firmware dump path, but it's still important to the enterprise. The rest are various NULL pointer crash conditions or outright driver hangs" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: cxgb4i: libcxgbi: in error case RST tcp conn scsi: scsi_debug: Avoid PI being disabled when TPGS is enabled scsi: qla2xxx: Fix extraneous ref on sp's after adapter break scsi: lpfc: prevent potential null pointer dereference scsi: lpfc: Avoid NULL pointer dereference in lpfc_els_abort() scsi: lpfc: nvmet_fc: fix format string scsi: qla2xxx: Fix crash due to NULL pointer dereference of ctx scsi: qla2xxx: Fix mailbox pointer error in fwdump capture scsi: qla2xxx: Set bit 15 for DIAG_ECHO_TEST MBC scsi: qla2xxx: Modify T262 FW dump template to specify same start/end to debug customer issues scsi: qla2xxx: Fix crash due to mismatch mumber of Q-pair creation for Multi queue scsi: qla2xxx: Fix NULL pointer access due to redundant fc_host_port_name call scsi: qla2xxx: Fix recursive loop during target mode configuration for ISP25XX leaving system unresponsive scsi: bnx2fc: fix race condition in bnx2fc_get_host_stats() scsi: qla2xxx: don't disable a not previously enabled PCI device
2017-05-31scsi: lpfc: nvmet_fc: fix format stringArnd Bergmann
The lpfc_nvmeio_data() tracing helper always takes a format string and three additional arguments. The latest caller has a format string with only two integer arguments, causing this harmless warning: drivers/scsi/lpfc/lpfc_nvmet.c: In function 'lpfc_nvmet_xmt_fcp_release': drivers/scsi/lpfc/lpfc_nvmet.c:802:25: error: too many arguments for format [-Werror=format-extra-args] lpfc_nvmeio_data(phba, "NVMET FCP FREE: xri x%x ste %d\n", ctxp->oxid, We could add a dummy argument here, but it seems reasonable to print the 'abort' flag as the third argument. Fixes: 19b58d9473e8 ("nvmet_fc: add req_release to lldd api") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-24Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is quite a big update because it includes a rework of the lpfc driver to separate the NVMe part from the FC part. The reason for doing this is because two separate trees (the nvme and scsi trees respectively) want to update the individual components and this separation will prevent a really nasty cross tree entanglement by the time we reach the next merge window. The rest of the fixes are the usual minor sort with no significant security implications" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (25 commits) scsi: zero per-cmd private driver data for each MQ I/O scsi: csiostor: fix use after free in csio_hw_use_fwconfig() scsi: ufs: Clean up some rpm/spm level SysFS nodes upon remove scsi: lpfc: fix build issue if NVME_FC_TARGET is not defined scsi: lpfc: Fix NULL pointer dereference during PCI error recovery scsi: lpfc: update version to 11.2.0.14 scsi: lpfc: Add MDS Diagnostic support. scsi: lpfc: Fix NVMEI's handling of NVMET's PRLI response attributes scsi: lpfc: Cleanup entry_repost settings on SLI4 queues scsi: lpfc: Fix debugfs root inode "lpfc" not getting deleted on driver unload. scsi: lpfc: Fix NVME I+T not registering NVME as a supported FC4 type scsi: lpfc: Added recovery logic for running out of NVMET IO context resources scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context scsi: lpfc: Separate NVMET data buffer pool fir ELS/CT. scsi: lpfc: Fix NMI watchdog assertions when running nvmet IOPS tests scsi: lpfc: Fix NVMEI driver not decrementing counter causing bad rport state. scsi: lpfc: Fix nvmet RQ resource needs for large block writes. scsi: lpfc: Adding additional stats counters for nvme. scsi: lpfc: Fix system crash when port is reset. scsi: lpfc: Fix used-RPI accounting problem. ...
2017-05-20nvmet-fc: remove target cpu scheduling flagJames Smart
Remove NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED. It's unnecessary. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-17scsi: lpfc: fix build issue if NVME_FC_TARGET is not definedJames Smart
fix build issue if NVME_FC_TARGET is not defined. noop the code. The code will never be invoked if target mode is not enabled. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Added recovery logic for running out of NVMET IO context resourcesJames Smart
Previous logic would just drop the IO. Added logic to queue the IO to wait for an IO context resource from an IO thats already in progress. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/contextJames Smart
Currently IO resources are mapped 1 to 1 with RQ buffers posted Added logic to separate RQE buffers from IO op resources (sgl/iocbq/context). During initialization, the driver will determine how many SGLs it will allocate for NVMET (based on what the firmware reports) and associate a NVMET IOCBq and NVMET context structure with each one. Now that hdr/data buffers are immediately reposted back to the RQ, 512 RQEs for each MRQ is sufficient. Also, since NVMET data buffers are now 128 bytes, lpfc_nvmet_mrq_post is not necessary anymore as we will always post the max (512) buffers per NVMET MRQ. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Fix nvmet RQ resource needs for large block writes.James Smart
Large block writes to the nvme target were failing because the default number of RQs posted was insufficient. Expand the NVMET RQs to 2048 RQEs and ensure a minimum of 512 RQEs are posted, no matter how many MRQs are configured. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Adding additional stats counters for nvme.James Smart
More debug messages added for nvme statistics. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-27Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into ↵Jens Axboe
for-4.12/post-merge Christoph writes: "A couple more updates for 4.12. The biggest pile is fc and lpfc updates from James, but there are various small fixes and cleanups as well." Fixes up a few merge issues, and also a warning in lpfc_nvmet_rcv_unsol_abort() if CONFIG_NVME_TARGET_FC isn't enabled. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-24Merge branch 'master' into for-4.12/post-mergeJens Axboe
2017-04-24Update ABORT processing for NVMET.James Smart
The driver with nvme had this routine stubbed. Right now XRI_ABORTED_CQE is not handled and the FC NVMET Transport has a new API for the driver. Missing code path, new NVME abort API Update ABORT processing for NVMET There are 3 new FC NVMET Transport API/ template routines for NVMET: lpfc_nvmet_xmt_fcp_release This NVMET template callback routine called to release context associated with an IO This routine is ALWAYS called last, even if the IO was aborted or completed in error. lpfc_nvmet_xmt_fcp_abort This NVMET template callback routine called to abort an exchange that has an IO in progress nvmet_fc_rcv_fcp_req When the lpfc driver receives an ABTS, this NVME FC transport layer callback routine is called. For this case there are 2 paths thru the driver: the driver either has an outstanding exchange / context for the XRI to be aborted or not. If not, a BA_RJT is issued otherwise a BA_ACC NVMET Driver abort paths: There are 2 paths for aborting an IO. The first one is we receive an IO and decide not to process it because of lack of resources. An unsolicated ABTS is immediately sent back to the initiator as a response. lpfc_nvmet_unsol_fcp_buffer lpfc_nvmet_unsol_issue_abort (XMIT_SEQUENCE_WQE) The second one is we sent the IO up to the NVMET transport layer to process, and for some reason the NVME Transport layer decided to abort the IO before it completes all its phases. For this case there are 2 paths thru the driver: the driver either has an outstanding TSEND/TRECEIVE/TRSP WQE or no outstanding WQEs are present for the exchange / context. lpfc_nvmet_xmt_fcp_abort if (LPFC_NVMET_IO_INP) lpfc_nvmet_sol_fcp_issue_abort (ABORT_WQE) lpfc_nvmet_sol_fcp_abort_cmp else lpfc_nvmet_unsol_fcp_issue_abort lpfc_nvmet_unsol_issue_abort (XMIT_SEQUENCE_WQE) lpfc_nvmet_unsol_fcp_abort_cmp Context flags: LPFC_NVMET_IOP - his flag signifies an IO is in progress on the exchange. LPFC_NVMET_XBUSY - this flag indicates the IO completed but the firmware is still busy with the corresponding exchange. The exchange should not be reused until after a XRI_ABORTED_CQE is received for that exchange. LPFC_NVMET_ABORT_OP - this flag signifies an ABORT_WQE was issued on the exchange. LPFC_NVMET_CTX_RLS - this flag signifies a context free was requested, but we are deferring it due to an XBUSY or ABORT in progress. A ctxlock is added to the context structure that is used whenever these flags are set/read within the context of an IO. The LPFC_NVMET_CTX_RLS flag is only set in the defer_relase routine when the transport has resolved all IO associated with the buffer. The flag is cleared when the CTX is associated with a new IO. An exchange can has both an LPFC_NVMET_XBUSY and a LPFC_NVMET_ABORT_OP condition active simultaneously. Both conditions must complete before the exchange is freed. When the abort callback (lpfc_nvmet_xmt_fcp_abort) is envoked: If there is an outstanding IO, the driver will issue an ABORT_WQE. This should result in 3 completions for the exchange: 1) IO cmpl with XB bit set 2) Abort WQE cmpl 3) XRI_ABORTED_CQE cmpl For this scenerio, after completion #1, the NVMET Transport IO rsp callback is called. After completion #2, no action is taken with respect to the exchange / context. After completion #3, the exchange context is free for re-use on another IO. If there is no outstanding activity on the exchange, the driver will send a ABTS to the Initiator. Upon completion of this WQE, the exchange / context is freed for re-use on another IO. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-04-24Fix max_sgl_segments settings for NVME / NVMETJames Smart
Cannot set NVME segment counts to a large number The existing module parameter lpfc_sg_seg_cnt is used for both SCSI and NVME. Limit the module parameter lpfc_sg_seg_cnt to 128 with the default being 64 for both NVME and NVMET, assuming NVME is enabled in the driver for that port. The driver will set max_sgl_segments in the NVME/NVMET template to lpfc_sg_seg_cnt + 1. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-04-24Fix driver load issues when MRQ=8James Smart
The symptom is that the driver will fail to login to the fabric. The reason is because it is out of iocb resources. There is a one to one relationship between MRQs (receive buffers for NVMET-FC) and iocbs and the default number of IOCBs was not accounting for the number of MRQs that were being created. This fix aligns the number of MRQ resources with the total resources so that it can handle fabric events when needed. Also the initialization of ctxlock to be on FCP commands, NOT LS commands. And modified log messages so that the log output can be correlated with the analyzer trace. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-04-24Remove hba lock from NVMET issue WQE.James Smart
Unnecessary lock is taken. ring lock should be sufficient to protect the work queue submission. This was noticed when doing performance testing. The hbalock is not needed to issue io to the nvme work queue. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-04-24Standardize nvme SGL segment countJames Smart
Standardize default SGL segment count for nvme target and initiator The driver needs to make them the same for clarity. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-04-21nvmet_fc: Rework target side abort handlingJames Smart
target transport: ---------------------- There are cases when there is a need to abort in-progress target operations (writedata) so that controller termination or errors can clean up. That can't happen currently as the abort is another target op type, so it can't be used till the running one finishes (and it may not). Solve by removing the abort op type and creating a separate downcall from the transport to the lldd to request an io to be aborted. The transport will abort ios on queue teardown or io errors. In general the transport tries to call the lldd abort only when the io state is idle. Meaning: ops that transmit data (readdata or rsp) will always finish their transmit (or the lldd will see a state on the link or initiator port that fails the transmit) and the done call for the operation will occur. The transport will wait for the op done upcall before calling the abort function, and as the io is idle, the io can be cleaned up immediately after the abort call; Similarly, ios that are not waiting for data or transmitting data must be in the nvmet layer being processed. The transport will wait for the nvmet layer completion before calling the abort function, and as the io is idle, the io can be cleaned up immediately after the abort call; As for ops that are waiting for data (writedata), they may be outstanding indefinitely if the lldd doesn't see a condition where the initiatior port or link is bad. In those cases, the transport will call the abort function and wait for the lldd's op done upcall for the operation, where it will then clean up the io. Additionally, if a lldd receives an ABTS and matches it to an outstanding request in the transport, A new new transport upcall was created to abort the outstanding request in the transport. The transport expects any outstanding op call (readdata or writedata) will completed by the lldd and the operation upcall made. The transport doesn't act on the reported abort (e.g. clean up the io) until an op done upcall occurs, a new op is attempted, or the nvmet layer completes the io processing. fcloop: ---------------------- Updated to support the new target apis. On fcp io aborts from the initiator, the loopback context is updated to NULL out the half that has completed. The initiator side is immediately called after the abort request with an io completion (abort status). On fcp io aborts from the target, the io is stopped and the initiator side sees it as an aborted io. Target side ops, perhaps in progress while the initiator side is done, continue but noop the data movement as there's no structure on the initiator side to reference. patch also contains: ---------------------- Revised lpfc to support the new abort api commonized rsp buffer syncing and nulling of private data based on calling paths. errors in op done calls don't take action on the fod. They're bad operations which implies the fod may be bad. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21nvmet_fc: add req_release to lldd apiJames Smart
With the advent of the opdone calls changing context, the lldd can no longer assume that once the op->done call returns for RSP operations that the request struct is no longer being accessed. As such, revise the lldd api for a req_release callback that the transport will call when the job is complete. This will also be used with abort cases. Fixed text in api header for change in io complete semantics. Revised lpfc to support the new req_release api. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21nvmet_fc: add target feature flags for upcall isr contextsJames Smart
Two new feature flags were added to control whether upcalls to the transport result in context switches or stay in the calling context. NVMET_FCTGTFEAT_CMD_IN_ISR: By default, if the flag is not set, the transport assumes the lldd is in a non-isr context and in the cpu context it should be for the io queue. As such, the cmd handler is called directly in the calling context. If the flag is set, indicating the upcall is an isr context, the transport mandates a transition to a workqueue. The workqueue assigned to the queue is used for the context. NVMET_FCTGTFEAT_OPDONE_IN_ISR By default, if the flag is not set, the transport assumes the lldd is in a non-isr context and in the cpu context it should be for the io queue. As such, the fcp operation done callback is called directly in the calling context. If the flag is set, indicating the upcall is an isr context, the transport mandates a transition to a workqueue. The workqueue assigned to the queue is used for the context. Updated lpfc for flags Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-03-23scsi: lpfc: fix building without debugfs supportArnd Bergmann
On a randconfig build without CONFIG_SCSI_LPFC_DEBUG_FS, I ran into multiple compile failures: drivers/scsi/lpfc/lpfc_debugfs.h: In function 'lpfc_debug_dump_wq': drivers/scsi/lpfc/lpfc_debugfs.h:405:15: error: 'DUMP_FCP' undeclared (first use in this function); did you mean 'DUMP_VAR'? drivers/scsi/lpfc/lpfc_debugfs.h:405:15: note: each undeclared identifier is reported only once for each function it appears in drivers/scsi/lpfc/lpfc_debugfs.h:408:22: error: 'DUMP_NVME' undeclared (first use in this function); did you mean 'DUMP_NONE'? drivers/scsi/lpfc/lpfc_nvmet.c: In function 'lpfc_nvmet_xmt_ls_rsp_cmp': drivers/scsi/lpfc/lpfc_nvmet.c:109:2: error: implicit declaration of function 'lpfc_nvmeio_data'; did you mean 'lpfc_mem_free'? [-Werror=implicit-function-declaration] drivers/scsi/lpfc/lpfc_nvmet.c: In function 'lpfc_nvmet_xmt_fcp_op': drivers/scsi/lpfc/lpfc_nvmet.c:523:10: error: unused variable 'id' [-Werror=unused-variable] They are all trivial to fix, so I'm doing it in a combined patch here. Fixes: 1d9d5a9879ad ("scsi: lpfc: refactor debugfs queue dump routines") Fixes: bd2cdd5e400f ("scsi: lpfc: NVME Initiator: Add debugfs support") Fixes: 2b65e18202fd ("scsi: lpfc: NVME Target: Add debugfs support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-15scsi: lpfc: Finalize Kconfig options for nvmeJames Smart
Reviewing the result of what was just added for Kconfig, we made a poor choice. It worked well for full kernel builds, but not so much for how it would be deployed on a distro. Here's the final result: - lpfc will compile in NVME initiator and/or NVME target support based on whether the kernel has the corresponding subsystem support. Kconfig is not used to drive this specifically for lpfc. - There is a module parameter, lpfc_enable_fc4_type, that indicates whether the ports will do FCP-only or FCP & NVME (NVME-only not yet possible due to dependency on fc transport). As FCP & NVME divvys up exchange resources, and given NVME will not be often initially, the default is changed to FCP only. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-06scsi: lpfc: code cleanups in NVME initiator discoveryJames Smart
This patch addresses the smatch issues identified by Dan Carpenter in http://www.spinics.net/lists/linux-scsi/msg105665.html The issues are: drivers/scsi/lpfc/lpfc_ct.c:943 lpfc_cmpl_ct_cmd_gft_id() error: we previously assumed 'ndlp' could be null (see line 928) Action: moved under if check drivers/scsi/lpfc/lpfc_nvmet.c:1694 lpfc_nvmet_unsol_issue_abort() error: we previously assumed 'ndlp' could be null (see line 1690) Action: conditionalized arg in printf stmt drivers/scsi/lpfc/lpfc_nvmet.c:1792 lpfc_nvmet_sol_fcp_issue_abort() error: we previously assumed 'ndlp' could be null (see line 1788) Action: conditionalized arg in printf stmt Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>