summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)Author
2019-09-30scsi: mpt3sas: Bump mpt3sas driver version to 32.100.00.00Sreekanth Reddy
Bump mpt3sas driver version to 32.100.00.00 Link: https://lore.kernel.org/r/1568379890-18347-14-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Fix module parameter max_msix_vectorsSreekanth Reddy
Load driver with module parameter "max_msix_vectors". Value provided in module parameter is not used by mpt3sas driver. Driver loads with max controller supported MSI-X value. In _base_alloc_irq_vectors use reply_queue_count which is determined using user provided msix value insted of ioc->msix_vector_count which tells max supported msix value of the controller. Link: https://lore.kernel.org/r/1568379890-18347-13-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Reject NVMe Encap cmnds to unsupported HBASreekanth Reddy
If any faulty application issues an NVMe Encapsulated commands to HBA which doesn't support NVMe protocol then driver should return the command as invalid with the following message. "HBA doesn't support NVMe. Rejecting NVMe Encapsulated request." Otherwise below page fault kernel panic will be observed while building the PRPs as there is no PRP pools allocated for the HBA which doesn't support NVMe drives. RIP: 0010:_base_build_nvme_prp+0x3b/0xf0 [mpt3sas] Call Trace: _ctl_do_mpt_command+0x931/0x1120 [mpt3sas] _ctl_ioctl_main.isra.11+0xa28/0x11e0 [mpt3sas] ? prepare_to_wait+0xb0/0xb0 ? tty_ldisc_deref+0x16/0x20 _ctl_ioctl+0x1a/0x20 [mpt3sas] do_vfs_ioctl+0xaa/0x620 ? vfs_read+0x117/0x140 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x60/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 [mkp: tweaked error string] Link: https://lore.kernel.org/r/1568379890-18347-12-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Use Component img header to get Package verSreekanth Reddy
The firmware image layout has been changed for Aero controllers. All compatible HBAs have to get Firmware Package version from Component Image Header layout. The Signature field in FW header is set to 0xEB000042 for products compatible with Component Image Header. For compatible controllers, driver fetches firmware package version from ApplicationSpecific field of Component Image Header. Link: https://lore.kernel.org/r/1568379890-18347-11-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Fail release cmnd if diag buffer is releasedSreekanth Reddy
Return the diag buffer release command with -EINVAL status if the buffer is already released. Link: https://lore.kernel.org/r/1568379890-18347-10-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Add app owned flag support for diag bufferSreekanth Reddy
Added a new status flag named MPT3_DIAG_BUFFER_IS_APP_OWNED and it will set whenever application registers the diag buffer & it will be cleared when application unregisters the buffer. When this flag is enabled, and if application issues diag buffer register command without releasing the buffer, then register command will be failed with -EINVAL status by saying that this buffer is already registered by the application. When user issues a trace buffer register command through sysfs parameter, and if trace buffer is in released stated but not yet unregistered by the application which was owning it, then driver will unregister the buffer by itself and freshly register the 1MB sized trace buffer with the HBA firmware. Link: https://lore.kernel.org/r/1568379890-18347-9-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Reuse diag buffer allocated at load timeSreekanth Reddy
The diag buffer which is allocated during driver load time or through sysfs parameter is marked as driver allocated diag buffer. MPT3_DIAG_BUFFER_IS_DRIVER_ALLOCATED bit will be set for this buffer. This buffer won't be de-allocated even when application issues unregister command, driver just clears the registered status bit. Same buffer will be reused while re-registering the same diag buffer type by any application. While re-registering the same diag buffer type application has to register with the same size that the buffer was allocated during driver load time. This buffer size can be read by the application by issuing diag 'query' command. This always makes sure that the memory is available for applications for collecting the firmware logs. Only thing is that this won't allow the application to re-register the diag buffer with different size, but the buffer size which is allocated during driver load time will be enough for most of the cases for collecting the firmware logs. Link: https://lore.kernel.org/r/1568379890-18347-8-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: clear release bit when buffer reregisteredSreekanth Reddy
Clear MPT3_DIAG_BUFFER_IS_RELEASED bit once diag buffer is re-registered after reading the buffer, else driver won't release the buffer and return the 'diag release' command with -EINVAL status saying that buffer is already released. Link: https://lore.kernel.org/r/1568379890-18347-7-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Maintain owner of buffer through UniqueIDSreekanth Reddy
Application A has registered a diag buffer and looking for particular event to happen to release & read the trace buffer. Meanwhile application B has unregistered the diag buffer and now Application A can't get the required diag buffer. So proper diag buffer ownership is missing. Each application has to maintain its own Unique ID. Now driver has to save the Application's UniqueID for each diag buffer type when diag buffer is registered. And driver has to allow 'release', 'read' & 'unregister' diag commands only if application's UniqueID matches with saved UniqueID for the corresponding diag buffer type. When diag buffer is registered by the driver, then the UniqueID saved by the driver is "BRCM" (i.e. 0x4252434D) for SAS3 and above generations HBA devices. For SAS2 HBAs, driver keeps the legacy UniqueID 0x07075900 for maintaining compatibility with the legacy SAS2 application and this improvement won't be applicable for SAS2 HBA devices. Any application can own the buffer registered by the driver by sending diag register request to driver with same buffer type and size (Application can get the buffer size by sending 'query' command). Then driver changes the ownership of the buffer by saving application's UniqueID for that corresponding buffer type. Also, application can re-register the diag buffer with same size without un-registering it, but diag buffer should be released before re-registering it. By allowing this, driver no need to deallocate and allocate a new buffer for re-register command, same buffer can be re-used. Link: https://lore.kernel.org/r/1568379890-18347-6-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Free diag buffer without any status checkSreekanth Reddy
Memory leak can happen when diag buffer is released but not unregistered (where buffer is deallocated) by the user. During module unload time driver is not deallocating the buffer if the buffer is in released state. Deallocate the diag buffer during module unload time without any diag buffer status checks. Link: https://lore.kernel.org/r/1568379890-18347-5-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Fix clear pending bit in ioctl statusSreekanth Reddy
When user issues diag register command from application with required size, and if driver unable to allocate the memory, then it will fail the register command. While failing the register command, driver is not currently clearing MPT3_CMD_PENDING bit in ctl_cmds.status variable which was set before trying to allocate the memory. As this bit is set, subsequent register command will be failed with BUSY status even when user wants to register the trace buffer will less memory. Clear MPT3_CMD_PENDING bit in ctl_cmds.status before returning the diag register command with no memory status. Link: https://lore.kernel.org/r/1568379890-18347-4-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Display message before releasing diag bufferSreekanth Reddy
Display message before releasing the diag buffer so that user knows which event caused the release of diag buffer. Releasing of diag buffer means HBA firmware stops posting the firmware logs on the registered diag buffer. Link: https://lore.kernel.org/r/1568379890-18347-3-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: mpt3sas: Register trace buffer based on NVDATA settingsSreekanth Reddy
Currently if user wishes to enable the host trace buffer during driver load time, then user has to load the driver with module parameter 'diag_buffer_enable' set to one. Alternatively now the user can enable host trace buffer by enabling the following fields in manufacturing page11 in NVDATA (nvdata xml is used while building HBA firmware image): * HostTraceBufferMaxSizeKB - Maximum trace buffer size in KB that host can allocate, * HostTraceBufferMinSizeKB - Minimum trace buffer size in KB atleast host should allocate, * HostTraceBufferDecrementSizeKB - size by which host can reduce from buffer size and retry the buffer allocation when buffer allocation failed with previous calculated buffer size. The driver will register the trace buffer automatically without any module parameter during boot time when above fields are enabled in manufacturing page11 in HBA firmware. Driver follows the following algorithm for enabling the host trace buffer during driver load time: * If user has loaded the driver with module parameter 'diag_buffer_enable' set to one, then driver allocates 2MB buffer and registers this buffer with HBA firmware for capturing the firmware trace logs. * Else driver reads manufacture page11 data and checks whether HostTraceBufferMaxSizeKB filed is zero or not? - If HostTraceBufferMaxSizeKB is non-zero then driver tries to allocate HostTraceBufferMaxSizeKB size of memory. If the buffer allocation is successful, then it will register this buffer with HBA firmware, else in a loop the driver will try again by reducing the current buffer size with HostTraceBufferDecrementSizeKB size until memory allocation is successful or buffer size falls below HostTraceBufferMinSizeKB. If the memory allocation is successful, then the buffer will be registered with the firmware. Else, if the buffer size falls below the HostTraceBufferMinSizeKB, then driver won't register trace buffer with HBA firmware. - If HostTraceBufferMaxSizeKB is zero, then driver won't register trace buffer with HBA firmware. Link: https://lore.kernel.org/r/1568379890-18347-2-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Update lpfc version to 12.4.0.1James Smart
Update lpfc version to 12.4.0.1 Link: https://lore.kernel.org/r/20190922035906.10977-21-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: cleanup: remove unused fcp_txcmlpq_cntJames Smart
Local variable fcp_txcmplq_cnt is initialized to 0 and then displayed in lpfc driver message 0387. Presumed residual (or unused) code from previous commit. Removed fcp_txcmplq_cnt. Link: https://lore.kernel.org/r/20190922035906.10977-20-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Complete removal of FCoE T10 PI support on SLI-4 adaptersJames Smart
T10 PI support on SLI-4-based FCoE adapters is not supported. A prior commit in the 12.4.0.0 stream added device recognition that would prevent T10 PI enablement. However, it didn't contain a complete device list. Thus some SLI-4 FCoE adapters still had T10 PI enabled. Fix by expanding the device list that identifies FCoE devices. Link: https://lore.kernel.org/r/20190922035906.10977-19-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Update async event loggingJames Smart
This patch updates ACQE handling for: - an EEPROM failure error reported by the adapter. - ensures that all data for any ACQE, recognized or not, is logged. - Given that all data is now logged unconditionally, the default case (unrecognized) data can be reduced. Link: https://lore.kernel.org/r/20190922035906.10977-18-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix list corruption detected in lpfc_put_sgl_per_hdwqJames Smart
In lpfc_release_io_buf, an lpfc_io_buf is returned to the 'available' pool before any associated sgl or cmd and rsp buffers are returned via their respective 'put' routines. If xri rebalancing occurs and an lpfc_io_buf structure is reused quickly, there may be a race condition between release of old and association of new resources. Re-ordered lpfc_release_io_buf to release sgl and cmd/rsp buffer lists before releasing the lpfc_io_buf structure for re-use. Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-17-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix hdwq sgl locks and irq handlingJames Smart
Many of the sgl-per-hdwq paths are locking with spin_lock_irq() and spin_unlock_irq() and may unwittingly raising irq when it shouldn't. Hard deadlocks were seen around lpfc_scsi_prep_cmnd(). Fix by converting the locks to irqsave/irqrestore. Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-16-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix spinlock_irq issues in lpfc_els_flush_cmd()James Smart
While reviewing the CT behavior, issues with spinlock_irq were seen. The driver should be using spinlock_irqsave/irqrestore in the els flush routine. Changed to spinlock_irqsave/irqrestore. Link: https://lore.kernel.org/r/20190922035906.10977-15-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix list corruption in lpfc_sli_get_iocbqJames Smart
After study, it was determined there was a double free of a CT iocb during execution of lpfc_offline_prep and lpfc_offline. The prep routine issued an abort for some CT iocbs, but the aborts did not complete fast enough for a subsequent routine that waits for completion. Thus the driver proceeded to lpfc_offline, which releases any pending iocbs. Unfortunately, the completions for the aborts were then received which re-released the ct iocbs. Turns out the issue for why the aborts didn't complete fast enough was not their time on the wire/in the adapter. It was the lpfc_work_done routine, which requires the adapter state to be UP before it calls lpfc_sli_handle_slow_ring_event() to process the completions. The issue is the prep routine takes the link down as part of it's processing. To fix, the following was performed: - Prevent the offline routine from releasing iocbs that have had aborts issued on them. Defer to the abort completions. Also means the driver fully waits for the completions. Given this change, the recognition of "driver-generated" status which then releases the iocb is no longer valid. As such, the change made in the commit 296012285c90 is reverted. As recognition of "driver-generated" status is no longer valid, this patch reverts the changes made in commit 296012285c90 ("scsi: lpfc: Fix leak of ELS completions on adapter reset") - Modify lpfc_work_done to allow slow path completions so that the abort completions aren't ignored. - Updated the fdmi path to recognize a CT request that fails due to the port being unusable. This stops FDMI retries. FDMI will be restarted on next link up. Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix host hang at boot or slow bootJames Smart
Scenarios were seen where a host hung when the system booted or the host was very slow in booting. The link would not come up and no luns were visible to the host. After investigation, this was found to be due to the introduction of a new ACQE that adapter may generate to report a adapter hw warning. The ACQE was delivered to the driver very early in adapter initialization, when the driver did not expect command completion. As part of handling this unexpected interrupt the an EQEs are consumed and discarded and the EQ rearmed. The issue is the CQ that cause the EQE and thus the interrupt was not processed and the CQ was left unarmed. Meaning it would no longer generate a new interrupt condition. Subsequent mailbox commands used to initialize the adapter use the same CQ, and as there was no completion interrupt generated, the driver never saw the mailbox commands complete and it would wait long command timeouts. Fix by having the early flush routine also process the related CQ and rearm the CQ. Link: https://lore.kernel.org/r/20190922035906.10977-13-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix coverity errors on NULL pointer checksJames Smart
Coverity flagged several scenarios where checking of null pointer values wasn't consistent. Fix the code to that be consistent on checking. Link: https://lore.kernel.org/r/20190922035906.10977-12-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix NVMe ABTS in response to receiving an ABTSJames Smart
When the port, running as a nvme target, receives an ABTS, it submits commands to the adapter to Abort i/o outstanding in the adapter. The Abort command formatting routine left a command field set to zero, which instructs the adapter to generate an ABTS on the wire as part of cleaning up the I/O. This is common operation for an initiator, but not for a target. Fix the driver to check whether an ABTS had been received for the I/O, and if so, change the Abort command formatting so that the ABTS generation is disabled (IA=1). No need to ABTS it when the other side already has. Also refactored the code such that there is a single routine being used for nvme or nvmet ABORT requests, and IA is an argument. Link: https://lore.kernel.org/r/20190922035906.10977-11-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix discovery failures when target device connectivity bouncesJames Smart
An issue was seen discovering all SCSI Luns when a target device undergoes link bounce. The driver currently does not qualify the FC4 support on the target. Therefore it will send a SCSI PRLI and an NVMe PRLI. The expectation is that the target will reject the PRLI if it is not supported. If a PRLI times out, the driver will retry. The driver will not proceed with the device until both SCSI and NVMe PRLIs are resolved. In the failure case, the device is FCP only and does not respond to the NVMe PRLI, thus initiating the wait/retry loop in the driver. During that time, a RSCN is received (device bounced) causing the driver to issue a GID_FT. The GID_FT response comes back before the PRLI mess is resolved and it prematurely cancels the PRLI retry logic and leaves the device in a STE_PRLI_ISSUE state. Discovery with the target never completes or resets. Fix by resetting the node state back to STE_NPR_NODE when GID_FT completes, thereby restarting the discovery process for the node. Link: https://lore.kernel.org/r/20190922035906.10977-10-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix GPF on scsi command completionJames Smart
Faults are seen with RIP of lpfc_scsi_cmd_iocb_cmpl(). The failure is when lpfc_update_status is being called as part of the completion. After debugging, it was seen the issue was the shost pointer that the driver derived from the scsi cmd. The crash showed the cmd->device pointer being bogus, which is likely as the scsi devices were offlined prior. The bogus device pointer caused subsequent pointers derived from the location, specifically the vport, to be bogus. Fix by adjusting the calling sequence to pass in the vport rather than having to derive it from the cmd structure. Link: https://lore.kernel.org/r/20190922035906.10977-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix locking on mailbox command completionJames Smart
Symptoms were seen of the driver not having valid data for mailbox commands. After debugging, the following sequence was found: The driver maintains a port-wide pointer of the mailbox command that is currently in execution. Once finished, the port-wide pointer is cleared (done in lpfc_sli4_mq_release()). The next mailbox command issued will set the next pointer and so on. The mailbox response data is only copied if there is a valid port-wide pointer. In the failing case, it was seen that a new mailbox command was being attempted in parallel with the completion. The parallel path was seeing the mailbox no long in use (flag check under lock) and thus set the port pointer. The completion path had cleared the active flag under lock, but had not touched the port pointer. The port pointer is cleared after the lock is released. In this case, the completion path cleared the just-set value by the parallel path. Fix by making the calls that clear mbox state/port pointer while under lock. Also slightly cleaned up the error path. Link: https://lore.kernel.org/r/20190922035906.10977-8-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix device recovery errors after PLOGI failuresJames Smart
When target-side fault injections are made, the driver isn't reconnecting to the remote port. The driver is logging "2753" error messages which state: "PLOGI failure DID:1B2400 Status:x3/xf0240008" The failures status is indicating a Illegal field error, which points to the Temporary RPI field being used for the ELS. This error typically means the driver used an RPI that was already registered (shouldn't be registered if using it in this context). Study has found that if the driver were in discovery attempts and encountered an error, it wouldn't flag the temporary rpi in error. Yet the rpi was released for reallocation in these error paths and another ELS could allocate the rpi. In the failure situation a retry was done on an ELS that had encountered an error, and as the rpi wasn't marked in error, the ELS reused the rpi it originally allocated. But that rpi had been allocated by a different ELS issued after the original error and before the retry attempt. The different ELS had succeeded and the RPI was registered. Fix by marking the rpi state for the node to be in error, aka as needing reallocation, upon an error in the els processing. Error state marking is always done prior to release back to the internal rpi free list, which the driver wasn't doing in cases prior. Also enhanced some of the logging to help in the next case of problem troubleshooting. Link: https://lore.kernel.org/r/20190922035906.10977-7-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix rpi release when deleting vportJames Smart
A prior use-after-free mailbox fix solved it's problem by null'ing a ndlp pointer. However, further testing has shown that this change causes a later state change to occasionally be skipped, which results in a reference count never being decremented thus the rpi is never released, which causes a vport delete to never succeed. Revise the fix in the prior patch to no longer null the ndlp. Instead the RELEASE_RPI flag is set which will drive the release of the rpi. Given the new code was added at a deep indentation level, refactor the code block using a new routine that avoids the indentation issues. Fixes: 9b1640686470 ("scsi: lpfc: Fix use-after-free mailbox cmd completion") Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix NVME io abort failures causing hangsJames Smart
The nvme-fc transport may call to abort an io on controller reset. If the driver is out of resources to issue an abort command, it just gives up and does nothing. The transport expects the lldd to always be able to terminate an io it has issued. At that point, the controller hangs waiting for aborted ios to be returned. Note: flaged by "6136" and "6176" error messages. Root issue was the adapter mis-allocated the number resources it allocated for command entries for the adapter. Convert the driver to allocate command resources based on the number of xris supported by the FC port - 1 resource for the original command and 1 resource for the abort request. Link: https://lore.kernel.org/r/20190922035906.10977-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix miss of register read failure checkJames Smart
Coverity flagged missing status check on register read that flags a poisoned data return value. Add checking of register read status. Link: https://lore.kernel.org/r/20190922035906.10977-4-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix premature re-enabling of interrupts in lpfc_sli_host_downJames Smart
Use of spin_lock_irq may re-enable interrupts prematurely. Convert to spin_lock. Note: code is under the phba->hba_lock which has been locked with irqsave. Link: https://lore.kernel.org/r/20190922035906.10977-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix pt2pt discovery on SLI3 HBAsJames Smart
After exchanging PLOGI on an SLI-3 adapter, the PRLI exchange failed. Link trace showed the port was assigned a non-zero n_port_id, but didn't use the address on the PRLI. The assigned address is set on the port by the CONFIG_LINK mailbox command. The driver responded to the PRLI before the mailbox command completed. Thus the PRLI response used the old n_port_id. Defer the PRLI response until CONFIG_LINK completes. Link: https://lore.kernel.org/r/20190922035906.10977-2-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-24Merge tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block updates from Jens Axboe: "Some later additions that weren't quite done for the first pull request, and also a few fixes that have arrived since. This contains: - Kill silly pktcdvd warning on attempting to register a non-scsi passthrough device (me) - Use symbolic constants for the block t10 protection types, and switch to handling it in core rather than in the drivers (Max) - libahci platform missing node put fix (Nishka) - Small series of fixes for BFQ (Paolo) - Fix possible nbd crash (Xiubo)" * tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block: block: drop device references in bsg_queue_rq() block: t10-pi: fix -Wswitch warning pktcdvd: remove warning on attempting to register non-passthrough dev ata: libahci_platform: Add of_node_put() before loop exit nbd: fix possible page fault for nbd disk nbd: rename the runtime flags as NBD_RT_ prefixed block, bfq: push up injection only after setting service time block, bfq: increase update frequency of inject limit block, bfq: reduce upper bound for inject limit to max_rq_in_driver+1 block, bfq: update inject limit only after injection occurred block: centralize PI remapping logic to the block layer block: use symbolic constants for t10_pi type
2019-09-23scsi: qla2xxx: Fix Nport ID display valueQuinn Tran
For N2N, the NPort ID is assigned by driver in the PLOGI ELS. According to FW Spec the byte order for SID is not the same as DID. Link: https://lore.kernel.org/r/20190912180918.6436-8-hmadhani@marvell.com Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Tested-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: qla2xxx: Fix N2N link up failQuinn Tran
During link up/bounce, qla driver would do command flush as part of cleanup. In this case, the flush can intefere with FW state. This patch allows FW to be in control of link up. Link: https://lore.kernel.org/r/20190912180918.6436-7-hmadhani@marvell.com Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: qla2xxx: Fix N2N link resetQuinn Tran
Fix stalled link recovery for N2N with FC-NVMe connection. Link: https://lore.kernel.org/r/20190912180918.6436-6-hmadhani@marvell.com Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: qla2xxx: Optimize NPIV tear down processQuinn Tran
In the case of NPIV port is being torn down, this patch will set a flag to indicate VPORT_DELETE. This would prevent relogin to be triggered. Link: https://lore.kernel.org/r/20190912180918.6436-5-hmadhani@marvell.com Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: qla2xxx: Fix stale mem access on driver unloadQuinn Tran
On driver unload, 'remove_one' thread was allowed to advance, while session cleanup still lag behind. This patch ensures session deletion will finish before remove_one can advance. Link: https://lore.kernel.org/r/20190912180918.6436-4-hmadhani@marvell.com Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: qla2xxx: Fix unbound sleep in fcport delete path.Quinn Tran
There are instances, though rare, where a LOGO request cannot be sent out and the thread in free session done can wait indefinitely. Fix this by putting an upper bound to sleep. Link: https://lore.kernel.org/r/20190912180918.6436-3-hmadhani@marvell.com Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: qla2xxx: Silence fwdump template messageHimanshu Madhani
Print if fwdt template is present or not, only when ql2xextended_error_logging is enabled. Link: https://lore.kernel.org/r/20190912180918.6436-2-hmadhani@marvell.com Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: hisi_sas: Make three functions staticYueHaibing
Fix sparse warnings: drivers/scsi/hisi_sas/hisi_sas_main.c:3686:6: warning: symbol 'hisi_sas_debugfs_release' was not declared. Should it be static? drivers/scsi/hisi_sas/hisi_sas_main.c:3708:5: warning: symbol 'hisi_sas_debugfs_alloc' was not declared. Should it be static? drivers/scsi/hisi_sas/hisi_sas_main.c:3799:6: warning: symbol 'hisi_sas_debugfs_bist_init' was not declared. Should it be static? Link: https://lore.kernel.org/r/20190923054035.19036-1-yuehaibing@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: megaraid: disable device when probe failed after enabled deviceXiang Chen
For pci device, need to disable device when probe failed after enabled device. Link: https://lore.kernel.org/r/1567818450-173315-1-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queueLong Li
storvsc doesn't use a dedicated hardware queue for a given CPU queue. When issuing I/O, it selects returning CPU (hardware queue) dynamically based on vmbus channel usage across all channels. This patch advertises num_present_cpus() as number of hardware queues. This will have upper layer setup 1:1 mapping between hardware queue and CPU queue and avoid unnecessary locking when issuing I/O. Link: https://lore.kernel.org/r/1567790660-48142-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: qedf: Remove always false 'tmp_prio < 0' statementAustin Kim
Since tmp_prio is declared as u8, the following statement is always false. tmp_prio < 0 So remove 'always false' statement. Link: https://lore.kernel.org/r/20190919075548.GA112801@LGEARND20B15 Signed-off-by: Austin Kim <austindh.kim@gmail.com> Acked-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: ufs: skip shutdown if hba is not poweredStanley Chu
In some cases, hba may go through shutdown flow without successful initialization and then make system hang. For example, if ufshcd_change_power_mode() gets error and leads to ufshcd_hba_exit() to release resources of the host, future shutdown flow may hang the system since the host register will be accessed in unpowered state. To solve this issue, simply add checking to skip shutdown for above kind of situation. Link: https://lore.kernel.org/r/1568780438-28753-1-git-send-email-stanley.chu@mediatek.com Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23Merge tag 'pci-v5.4-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it (Krzysztof Wilczynski) - Fix incorrect PCIe device types and remove dev->has_secondary_link to simplify code that deals with upstream/downstream ports (Mika Westerberg) - After suspend, restore Resizable BAR size bits correctly for 1MB BARs (Sumit Saxena) - Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra) Virtualization: - Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna Labs (Ali Saidi) - Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg) - Remove group write permissions from sysfs sriov_numvfs, sriov_drivers_autoprobe (Kelsey Skunberg) Hotplug: - Simplify pciehp indicator control (Denis Efremov) Peer-to-peer DMA: - Allow P2P DMA between root ports for whitelisted bridges (Logan Gunthorpe) - Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe) - DMA map P2P DMA requests that traverse host bridge (Logan Gunthorpe) Amazon Annapurna Labs host bridge driver: - Add DT binding and controller driver (Jonathan Chocron) Hyper-V host bridge driver: - Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui) - Fix PCI domain number collisions (Haiyang Zhang) - Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang) - Fix build errors on non-SYSFS config (Randy Dunlap) i.MX6 host bridge driver: - Limit DBI register length (Stefan Agner) Intel VMD host bridge driver: - Fix config addressing issues (Jon Derrick) Layerscape host bridge driver: - Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao) - Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately (Xiaowei Bao) Mediatek host bridge driver: - Add MT7629 controller support (Jianjun Wang) Mobiveil host bridge driver: - Fix CPU base address setup (Hou Zhiqiang) - Make "num-lanes" property optional (Hou Zhiqiang) Tegra host bridge driver: - Fix OF node reference leak (Nishka Dasgupta) - Disable MSI for root ports to work around design problem (Vidya Sagar) - Add Tegra194 DT binding and controller support (Vidya Sagar) - Add support for sideband pins and slot regulators (Vidya Sagar) - Add PIPE2UPHY support (Vidya Sagar) Misc: - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg) - Unexport pci_bus_get(), etc (Kelsey Skunberg) - Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in the PCI core (Kelsey Skunberg) - Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg) - Mark expected switch fall-through (Gustavo A. R. Silva) - Propagate errors for optional regulators and PHYs (Thierry Reding) - Fix kernel command line resource_alignment parameter issues (Logan Gunthorpe)" * tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits) PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI arm64: tegra: Add PCIe slot supply information in p2972-0000 platform arm64: tegra: Add configuration for PCIe C5 sideband signals PCI: tegra: Add support to enable slot regulators PCI: tegra: Add support to configure sideband pins PCI: vmd: Fix shadow offsets to reflect spec changes PCI: vmd: Fix config addressing when using bus offsets PCI: dwc: Add validation that PCIe core is set to correct mode PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port PCI: Add ACS quirk for Amazon Annapurna Labs root ports PCI: Add Amazon's Annapurna Labs vendor ID MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries dt-bindings: PCI: tegra: Add sideband pins configuration entries PCI: tegra: Add Tegra194 PCIe support PCI: Get rid of dev->has_secondary_link flag ...
2019-09-23scsi: bnx2fc: Handle scope bits when array returns BUSY or TSFLaurence Oberman
The qla2xxx driver had this issue as well when the newer array firmware returned the retry_delay_timer in the fcp_rsp. The bnx2fc is not handling the masking of the scope bits either so the retry_delay_timestamp value lands up being a large value added to the timer timestamp delaying I/O for up to 27 Minutes. This patch adds similar code to handle this to the bnx2fc driver to avoid the huge delay. Link: https://lore.kernel.org/r/1568210202-12794-1-git-send-email-loberman@redhat.com Signed-off-by: Laurence Oberman <loberman@redhat.com> Reported-by: David Jeffery <djeffery@redhat.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: core: fix dh and multipathing for SCSI hosts without request batchingSteffen Maier
This was missing from scsi_device_from_queue() due to the introduction of another new scsi_mq_ops_no_commit of linux-next commit 8930a6c20791 ("scsi: core: add support for request batching") from Martin's scsi/5.4/scsi-queue or James' scsi/misc. Only devicehandler code seems to call scsi_device_from_queue(): *** drivers/scsi/scsi_dh.c: scsi_dh_activate[255] sdev = scsi_device_from_queue(q); scsi_dh_set_params[302] sdev = scsi_device_from_queue(q); scsi_dh_attach[325] sdev = scsi_device_from_queue(q); scsi_dh_attached_handler_name[363] sdev = scsi_device_from_queue(q); Fixes multipath tools follow-on errors: $ multipath -v6 ... libdevmapper: ioctl/libdm-iface.c(1887): device-mapper: reload ioctl on mpatha failed: No such device ... mpatha: failed to load map, error 19 ... showing also as kernel messages: device-mapper: table: 252:0: multipath: error attaching hardware handler device-mapper: ioctl: error adding target to table Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 8930a6c20791 ("scsi: core: add support for request batching") Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23scsi: core: fix missing .cleanup_rq for SCSI hosts without request batchingSteffen Maier
This was missing from scsi_mq_ops_no_commit of linux-next commit 8930a6c20791 ("scsi: core: add support for request batching") from Martin's scsi/5.4/scsi-queue or James' scsi/misc. See also linux-next commit b7e9e1fb7a92 ("scsi: implement .cleanup_rq callback") from block/for-next. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 8930a6c20791 ("scsi: core: add support for request batching") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>