summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-01-07scsi: lpfc: Implement health checking when aborting I/OJames Smart
Several errors have occurred where the adapter stops or fails but does not raise the register values for the driver to detect failure. Thus driver is unaware of the failure. The failure typically results in I/O timeouts, the I/O timeout handler failing (after several seconds), and the error handler escalating recovery policy and resulting in more errors. Eventually, the driver is in a position where things have spiraled and it can't do recovery because other recovery ops are still outstanding and it becomes unusable. Resolve the situation by having the I/O timeout handler (actually a els, SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O, perform a mailbox command and look for a response from the hardware. If the mailbox command fails, it will mark the adapter offline and then invoke the adapter reset handler to clean up. The new I/O timeout test will be limited to a test every 5s. If there are multiple I/O timeouts concurrently, only the 1st I/O timeout will generate the mailbox command. Further testing will only occur once a timeout occurs after a 5s delay from the last mailbox command has expired. Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix crash when nvmet transport calls host_releaseJames Smart
When lpfc is running in NVMET mode and supports the NVME-1 addendum changes, a LIP on a bound NVME Initiator or lipping the lpfc NVMET's link resulted in an Oops in lpfc_nvmet_host_release. The fix requires lpfc NVMET to maintain an additional reference on any node structure that acts as the hosthandle for the NVMET transport. This reference get is a one-time addition, is taken prior to the upcall of an unsolicited LS_REQ, and is released when the NVMET transport releases the hosthandle during the host_release downcall. Link: https://lore.kernel.org/r/20210104180240.46824-13-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix vport create loggingJames Smart
When with testing with large numbers of npiv vports and link bounces, the driver is flooding the messages file, even with log_verbose = 0. The new LOG_TRACE_EVENT messages are still generating events to the messages files. Fix by converting the vport create msg from LOG_TRACE_EVENT to LOG_VPORT. Link: https://lore.kernel.org/r/20210104180240.46824-12-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix NVMe recovery after mailbox timeoutJames Smart
If a mailbox command times out, the SLI port is deemed in error and the port is reset. The HBA cleanup is not returning I/Os to the NVMe layer before the port is unregistered. This is due to the HBA being marked offline (!SLI_ACTIVE) and cleanup being done by the mailbox timeout handler rather than an general adapter reset routine. The mailbox timeout handler mailbox handler only cleaned up SCSI I/Os. Fix by reworking the mailbox handler to: - After handling the mailbox error, detect the board is already in failure (may be due to another error), and leave cleanup to the other handler. - If the mailbox command timeout is initial detector of the port error, continue with the board cleanup and marking the adapter offline (!SLI_ACTIVE). Remove the SCSI-only I/O cleanup routine. The generic reset adapter routine that is subsequently invoked, will clean up the I/Os. - Have the reset adapter routine flush all NVMe and SCSI I/Os if the adapter has been marked failed (!SLI_ACTIVE). - Rework the NVMe I/O terminate routine to take a status code to fail the I/O with and update so that cleaned up I/O calls the wqe completion routine. Currently it is bypassing the wqe cleanup and calling the NVMe I/O completion directly. The wqe completion routine will take care of data structure and node cleanup then call the NVMe I/O completion handler. Link: https://lore.kernel.org/r/20210104180240.46824-11-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix target reset failingJames Smart
Target reset is failed by the target as an invalid command. The Target Reset TMF has been obsoleted in T10 for a while, but continues to be used. On (newer) devices, the TMF is rejected causing the reset handler to escalate to adapter resets. Fix by having Target Reset TMF rejections be translated into a LOGO and re-PLOGI with the target device. This provides the same semantic action (although, if the device also supports nvme traffic, it will terminate nvme traffic as well - but it's still recoverable). Link: https://lore.kernel.org/r/20210104180240.46824-10-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix error log messages being logged following SCSI task mgntJames Smart
A successful task mgmt command is logging errors, making it look like problems were encountered. This is due to log messages for the device/target and bus reset handlers having the LOG_TRACE_EVENT flag set. Fix by adjusting the event flag such that the call to the logging routine only receives a LOG_TRACE_EVENT if a prior call actually failed. Link: https://lore.kernel.org/r/20210104180240.46824-9-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Prevent duplicate requests to unregister with cpuhp frameworkJames Smart
In the lpfc offline routine, called for various reasons such as sysfs attribute, driver unload, or port error, the driver is calling __lpfc_cpuhp_remove() to destroy the hot plug data. If the offline routine is called while the driver is in the process of being unloaded, a request using lpfc_cpuhp_remove() is also made from lpfc_sli4_hba_unset(). The cpuhp elements are no longer valid when the second removal request is made. Fix by only calling the cpuhp removal once when the adapter is in the process of unloading. Link: https://lore.kernel.org/r/20210104180240.46824-8-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix FW reset action if I/Os are outstandingJames Smart
If the port is configured for NVME and has any outstanding IOs when a FW reset is requesteed, outstanding I/Os are not properly cleaned up. This causes the fw download request to fail. Fix by clearing the LPFC_SLI_ACTIVE flag to signify the I/O must be manually flushed by the driver on port reset. Link: https://lore.kernel.org/r/20210104180240.46824-7-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Use the nvme-fc transport supplied timeout for LS requestsJames Smart
When lpfc generates a GEN_REQUEST wqe for the nvme LS (such as Create Association), the timeout is set to R_A_TOV without regard to the timeout value supplied by the nvme-fc transport. The driver should be setting the timeout to the value passed into the routine. Additionally the caller should be setting the timeout value to the value in the ls request set by the nvme transport. Instead, it unconditionally is setting it to a driver defined value. So the driver actually overrode the value twice. Fix by using the timeout provided to the routine, and for the caller, set the timeout to the ls request timeout value. Link: https://lore.kernel.org/r/20210104180240.46824-6-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix crash when a fabric node is released prematurelyJames Smart
The driver's management of the fabric controller (aka pseudo-scsi initiator) node in SLI3 mode is causing this crash. The crash occurs because of a node reference imbalance that frees the fabric controller node while devloss is outstanding from the SCSI transport. This is triggered by an odd behavior where the switch reacts to a rejected RDP request with a PLOGI and nothing else, not even a LOGO. The driver ACKS the PLOGI and after successfully registering the RPI, incorrectly registers the fabric controller node because it has the NLP_FC4_FCP flag still set from the fabric controller PRLI. If a LIP is issued, the driver attempts to cleanup on Link Up and ends up executing too many puts. Fix by detecting the fabric node type and clearing out the nodes internal flags that triggered a SCSI transport registration and subsequence dev_loss event. The driver cannot count on any persistence from fabric controller nodes. Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue stateJames Smart
Testing with target ports coming and going, the driver eventually reached a state where it no longer discovered the target. When the driver has issued a PRLI and receives a PRLI from the target, it is not properly updating the node's initiator/target role flags. Thus, when a subsequent RSCN is received for a target loss, the driver mis-identifies the target as an initiator and does not initiate LUN scanning. Fix by always refreshing the ndlp with the latest PRLI state information whenever a PRLI is processed. Also clear the ndlp flags when processing a PLOGI so that there is no carry over through a re-login. Link: https://lore.kernel.org/r/20210104180240.46824-4-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3James Smart
A very long time ago, there was a feature: auto sli mode. It gave the user the ability to auto select the SLI mode (SLI2 or SLI3) to run the port in, or even force SLI2 mode if configured. Because of the convoluted logic, the CONFIG_PORT mbox command ends up being called 2 or 3 times. It should have been called only once. Additionally, the driver no longer supports SLI-2, so only SLI-3 mode should be allowed. The following changes were made: - Force module parameter to SLI3 only. - Rip out redundant CONFIG_PORT mbox commands. - Force CONFIG_PORT mbox command to be in beginning of enable ISR routine. - Added changes for offline to online behavior Link: https://lore.kernel.org/r/20210104180240.46824-3-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: lpfc: Fix PLOGI S_ID of 0 on pt2pt configJames Smart
Under some pt2pt situations, the other end of the link may issue a LOGO after successfully completing PLOGI and assigning addresses to the port. Thus the driver may attempt a new PLOGI to re-create the login, but the LOGO handling cleared the address back to 0. Once this happens, the other end, which may be address 0, gets all confused and this cannot be resolved without an administrative action to bounce the link. Fix by assuming that address assignment only occurs on the 1st PLOGI after link up, and regardless of login state, the address assignment sticks. The FC standards aren't particularly clear in this situation (it only describes initial PLOGI), but there is nothing that contradicts this and behaviors on the devices tested appears to conform to the understanding. Thus, don't reset the port address to 0 as part of LOGO handling. Port addresses will only reset on link down. Link: https://lore.kernel.org/r/20210104180240.46824-2-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: hisi_sas: Remove auto_affine_msi_experimental module_paramJohn Garry
Now that the driver always uses managed interrupts, delete auto_affine_msi_experimental module param. Link: https://lore.kernel.org/r/1609763622-34119-2-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Fix tm request when non-fatal error happensJaegeuk Kim
When non-fatal error like line-reset happens, ufshcd_err_handler() starts to abort tasks by ufshcd_try_to_abort_task(). When it tries to issue a task management request, we hit two warnings: WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70 WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c After fixing the above warnings we hit another tm_cmd timeout which may be caused by unstable controller state: __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out Then, ufshcd_err_handler() enters full reset, and kernel gets stuck. It turned out ufshcd_print_trs() printed too many messages on console which requires CPU locks. Likewise hba->silence_err_logs, we need to avoid too verbose messages. This is actually not an error case. Link: https://lore.kernel.org/r/20210107185316.788815-3-jaegeuk@kernel.org Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs") Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Fix livelock of ufshcd_clear_ua_wluns()Jaegeuk Kim
When gate_work/ungate_work experience an error during hibern8_enter or exit we can livelock: ufshcd_err_handler() ufshcd_scsi_block_requests() ufshcd_reset_and_restore() ufshcd_clear_ua_wluns() -> stuck ufshcd_scsi_unblock_requests() In order to avoid this, ufshcd_clear_ua_wluns() can be called per recovery flows such as suspend/resume, link_recovery, and error_handler. Link: https://lore.kernel.org/r/20210107185316.788815-2-jaegeuk@kernel.org Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets") Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Replace sprintf and snprintf with sysfs_emitBean Huo
sprintf and snprintf may cause output defect in sysfs content, it is better to use new added sysfs_emit function which knows the size of the temporary buffer. Link: https://lore.kernel.org/r/20210106211541.23039-1-huobean@gmail.com Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Fix all Kconfig help text indentationRandy Dunlap
Use consistent and expected indentation for all Kconfig text. Link: https://lore.kernel.org/r/20210106205554.18082-1-rdunlap@infradead.org Cc: Alim Akhtar <alim.akhtar@samsung.com> Cc: Avri Altman <avri.altman@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ibmvfc: Relax locking around ibmvfc_queuecommand()Tyrel Datwyler
The driver's queuecommand routine is still wrapped to hold the host lock for the duration of the call. This will become problematic when moving to multiple queues due to the lock contention preventing asynchronous submissions to mulitple queues. There is no real legitimate reason to hold the host lock, and previous patches have insured proper protection of moving ibmvfc_event objects between free and sent lists. Link: https://lore.kernel.org/r/20210106201835.1053593-6-tyreld@linux.ibm.com Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ibmvfc: Complete commands outside the host/queue lockTyrel Datwyler
Drain the command queue and place all commands on a completion list. Perform command completion on that list outside the host/queue locks. Further, move purged command compeletions outside the host_lock as well. Link: https://lore.kernel.org/r/20210106201835.1053593-5-tyreld@linux.ibm.com Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ibmvfc: Define per-queue state/list locksTyrel Datwyler
Define per-queue locks for protecting queue state and event pool sent/free lists. The evt list lock is initially redundant but it allows the driver to be modified in the follow-up patches to relax the queue locking around submissions and completions. Link: https://lore.kernel.org/r/20210106201835.1053593-4-tyreld@linux.ibm.com Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ibmvfc: Make command event pool queue specificTyrel Datwyler
There is currently a single command event pool per host. In anticipation of providing multiple queues add a per-queue event pool definition and reimplement the existing CRQ to use its queue defined event pool for command submission and completion. Link: https://lore.kernel.org/r/20210106201835.1053593-3-tyreld@linux.ibm.com Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ibmvfc: Define generic queue structure for CRQsTyrel Datwyler
The primary and async CRQs are nearly identical outside of the format and length of each message entry in the dma mapped page that represents the queue data. These queues can be represented with a generic queue structure that uses a union to differentiate between message format of the mapped page. This structure will further be leveraged in a followup patcheset that introduces Sub-CRQs. Link: https://lore.kernel.org/r/20210106201835.1053593-2-tyreld@linux.ibm.com Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ibmvfc: Fix missing cast of ibmvfc_event pointer to u64 handleTyrel Datwyler
Commit 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag commands") sets the vfcFrame correlation token to the pointer handle of the associated ibmvfc_event. However, that commit failed to cast the pointer to an appropriate type which in this case is a u64. As such sparse warnings are generated for both correlation token assignments. ibmvfc.c:2375:36: sparse: incorrect type in argument 1 (different base types) ibmvfc.c:2375:36: sparse: expected unsigned long long [usertype] val ibmvfc.c:2375:36: sparse: got struct ibmvfc_event *[assigned] evt Add the appropriate u64 casts when assigning an ibmvfc_event as a correlation token. Link: https://lore.kernel.org/r/20210106203721.1054693-1-tyreld@linux.ibm.com Fixes: 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag commands") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: ufshcd-pltfrm depends on HAS_IOMEMRandy Dunlap
Building ufshcd-pltfrm.c on arch/s390/ has a linker error since S390 does not support IOMEM, so add a dependency on HAS_IOMEM. s390-linux-ld: drivers/scsi/ufs/ufshcd-pltfrm.o: in function `ufshcd_pltfrm_init': ufshcd-pltfrm.c:(.text+0x38e): undefined reference to `devm_platform_ioremap_resource' where that devm_ function is inside an #ifdef CONFIG_HAS_IOMEM/#endif block. Link: lore.kernel.org/r/202101031125.ZEFCUiKi-lkp@intel.com Link: https://lore.kernel.org/r/20210106040822.933-1-rdunlap@infradead.org Fixes: 03b1781aa978 ("[SCSI] ufs: Add Platform glue driver for ufshcd") Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Alim Akhtar <alim.akhtar@samsung.com> Cc: Avri Altman <avri.altman@wdc.com> Cc: linux-scsi@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Make UPIU trace easier differentiate among CDB, OSF, and TMBean Huo
Transaction Specific Fields (TSF) in the UPIU package could be CDB (SCSI/UFS Command Descriptor Block), OSF (Opcode Specific Field), and TM I/O parameter (Task Management Input/Output Parameter). But, currently, we take all of these as CDB in the UPIU trace. Thus makes user confuse among CDB, OSF, and TM message. So fix it with this patch. Link: https://lore.kernel.org/r/20210105113446.16027-7-huobean@gmail.com Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Distinguish between TM request UPIU and response UPIU in TM UPIU ↵Bean Huo
trace Distinguish between TM request UPIU and response UPIU in TM UPIU trace, for the TM response, let TM UPIU trace print its TM response UPIU. Link: https://lore.kernel.org/r/20210105113446.16027-6-huobean@gmail.com Acked-by: Avri Altman <avri.altman@wdc.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Distinguish between query REQ and query RSP in query traceBean Huo
Currently, in the query completion trace print, since we use hba->lrb[tag].ucd_req_ptr and didn't differentiate UPIU between request and response, thus header and transaction-specific field in UPIU printed by query trace are identical. This is not very practical. As below: query_send: HDR:16 00 00 0e 00 81 00 00 00 00 00 00, CDB:06 0e 03 00 00 00 00 00 00 00 00 00 00 00 00 00 query_complete: HDR:16 00 00 0e 00 81 00 00 00 00 00 00, CDB:06 0e 03 00 00 00 00 00 00 00 00 00 00 00 00 00 For the failure analysis, we want to understand the real response reported by the UFS device, however, the current query trace tells us nothing. After this patch, the query trace on the query_send, and the above a pair of query_send and query_complete will be: query_send: HDR:16 00 00 0e 00 81 00 00 00 00 00 00, CDB:06 0e 03 00 00 00 00 00 00 00 00 00 00 00 00 00 ufshcd_upiu: HDR:36 00 00 0e 00 81 00 00 00 00 00 00, CDB:06 0e 03 00 00 00 00 00 00 00 00 01 00 00 00 00 Link: https://lore.kernel.org/r/20210105113446.16027-5-huobean@gmail.com Acked-by: Avri Altman <avri.altman@wdc.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Don't call trace_ufshcd_upiu() in case trace poit is disabledBean Huo
Don't call trace_ufshcd_upiu() in case ufshba_upiu trace poit is not enabled. Link: https://lore.kernel.org/r/20210105113446.16027-4-huobean@gmail.com Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Use __print_symbolic() for UFS trace string printBean Huo
__print_symbolic() is designed for exporting the print formatting table to userspace and allows parsing tool, such as trace-cmd and perf, to analyze trace log according to this print formatting table, meanwhile, by using __print_symbolic()s, save space in the trace ring buffer. original print format: print fmt: "%s: %s: HDR:%s, CDB:%s", __get_str(str), __get_str(dev_name), __print_hex(REC->hdr, sizeof(REC->hdr)), __print_hex(REC->tsf, sizeof(REC->tsf)) after this change: print fmt: "%s: %s: HDR:%s, CDB:%s", print_symbolic(REC->str_t, {0, "send"}, {1, "complete"}, {2, "dev_complete"}, {3, "query_send"}, {4, "query_complete"}, {5, "query_complete_err"}, {6, "tm_send"}, {7, "tm_complete"}, {8, "tm_complete_err"}), __get_str(dev_name), __print_hex(REC->hdr, sizeof(REC->hdr)), __print_hex(REC->tsf, sizeof(REC->tsf)) Note: This patch just converts current __get_str(str) to __print_symbolic(), the original tracing log will not be affected by this change, so it doesn't break what current parsers expect. Link: https://lore.kernel.org/r/20210105113446.16027-3-huobean@gmail.com Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: ufs: Remove stringize operator '#' restrictionBean Huo
Current EM macro definition, we use stringize operator '#', which turns the argument it precedes into a quoted string. Thus requires the symbol of __print_symbolic() should be the string corresponding to the name of the enum. However, we have other cases, the symbol and enum name are not the same, we can redefine EM/EMe, but there will introduce some redundant codes. This patch is to remove this restriction, let others reuse the current EM/EMe definition. Link: https://lore.kernel.org/r/20210105113446.16027-2-huobean@gmail.com Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regressionArnd Bergmann
Phil Oester reported that a fix for a possible buffer overrun that I sent caused a regression that manifests in this output: Event Message: A PCI parity error was detected on a component at bus 0 device 5 function 0. Severity: Critical Message ID: PCI1308 The original code tried to handle the sense data pointer differently when using 32-bit 64-bit DMA addressing, which would lead to a 32-bit dma_addr_t value of 0x11223344 to get stored 32-bit kernel: 44 33 22 11 ?? ?? ?? ?? 64-bit LE kernel: 44 33 22 11 00 00 00 00 64-bit BE kernel: 00 00 00 00 44 33 22 11 or a 64-bit dma_addr_t value of 0x1122334455667788 to get stored as 32-bit kernel: 88 77 66 55 ?? ?? ?? ?? 64-bit kernel: 88 77 66 55 44 33 22 11 In my patch, I tried to ensure that the same value is used on both 32-bit and 64-bit kernels, and picked what seemed to be the most sensible combination, storing 32-bit addresses in the first four bytes (as 32-bit kernels already did), and 64-bit addresses in eight consecutive bytes (as 64-bit kernels already did), but evidently this was incorrect. Always storing the dma_addr_t pointer as 64-bit little-endian, i.e. initializing the second four bytes to zero in case of 32-bit addressing, apparently solved the problem for Phil, and is consistent with what all 64-bit little-endian machines did before. I also checked in the history that in previous versions of the code, the pointer was always in the first four bytes without padding, and that previous attempts to fix 64-bit user space, big-endian architectures and 64-bit DMA were clearly flawed and seem to have introduced made this worse. Link: https://lore.kernel.org/r/20210104234137.438275-1-arnd@kernel.org Fixes: 381d34e376e3 ("scsi: megaraid_sas: Check user-provided offsets") Fixes: 107a60dd71b5 ("scsi: megaraid_sas: Add support for 64bit consistent DMA") Fixes: 94cd65ddf4d7 ("[SCSI] megaraid_sas: addded support for big endian architecture") Fixes: 7b2519afa1ab ("[SCSI] megaraid_sas: fix 64 bit sense pointer truncation") Reported-by: Phil Oester <kernel@linuxace.com> Tested-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07scsi: docs: ABI: sysfs-driver-ufs: Add DeepSleep power modeAdrian Hunter
Update sysfs documentation for addition of DeepSleep power mode. Link: https://lore.kernel.org/r/20210104155026.16417-1-adrian.hunter@intel.com Fixes: fe1d4c2ebcae ("scsi: ufs: Add DeepSleep feature") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: sd: Remove obsolete variable in sd_remove()Lukas Bulwahn
Commit 996e509bbc95 ("sd: use __register_blkdev to avoid a modprobe for an unregistered dev_t") removed blk_register_region(devt, ...) in sd_remove() and since then, devt is unused in sd_remove(). Hence, make W=1 warns: drivers/scsi/sd.c:3516:8: warning: variable 'devt' set but not used [-Wunused-but-set-variable] Simply remove this obsolete variable. [mkp: fixed commit sha] Link: https://lore.kernel.org/r/20201214095424.12479-1-lukas.bulwahn@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: sd: Suppress spurious errors when WRITE SAME is being disabledEwan D. Milne
The block layer code will split a large zeroout request into multiple bios and if WRITE SAME is disabled because the storage device reports that it does not support it (or support the length used), we can get an error message from the block layer despite the setting of RQF_QUIET on the first request. This is because more than one request may have already been submitted. Fix this by setting RQF_QUIET when BLK_STS_TARGET is returned to fail the request early, we don't need to log a message because we did not actually submit the command to the device, and the block layer code will handle the error by submitting individual write bios. Link: https://lore.kernel.org/r/20201207221021.28243-1-emilne@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: scsi_debug: Fix memleak in scsi_debug_init()Dinghao Liu
When sdeb_zbc_model does not match BLK_ZONED_NONE, BLK_ZONED_HA or BLK_ZONED_HM, we should free sdebug_q_arr to prevent memleak. Also there is no need to execute sdebug_erase_store() on failure of sdeb_zbc_model_str(). Link: https://lore.kernel.org/r/20201226061503.20050-1-dinghao.liu@zju.edu.cn Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility"Colin Ian King
There is a spelling mistake in the Kconfig help text. Fix it. Link: https://lore.kernel.org/r/20201217172019.57768-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: qedi: Correct max length of CHAP secretNilesh Javali
The CHAP secret displayed garbage characters causing iSCSI login authentication failure. Correct the CHAP password max length. Link: https://lore.kernel.org/r/20201217105144.8055-1-njavali@marvell.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: ufs: Correct the LUN used in eh_device_reset_handler() callbackCan Guo
Users can initiate resets to specific SCSI device/target/host through IOCTL. When this happens, the SCSI cmd passed to eh_device/target/host _reset_handler() callbacks is initialized with a request whose tag is -1. In this case it is not right for eh_device_reset_handler() callback to count on the LUN get from hba->lrb[-1]. Fix it by getting LUN from the SCSI device associated with the SCSI cmd. Link: https://lore.kernel.org/r/1609157080-26283-1-git-send-email-cang@codeaurora.org Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: ufs: ufs-exynos: Apply vendor-specific values for three timeoutsKiwoong Kim
Set optimized values for the following timeouts: - FC0_PROTECTION_TIMER - TC0_REPLAY_TIMER - AFC0_REQUEST_TIMER Exynos doesn't yet use traffic class #1. Link: https://lore.kernel.org/r/a0ff44f665a4f31d2f945fd71de03571204c576c.1608513782.git.kwmad.kim@samsung.com Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: ufs: Add a quirk to permit overriding UniPro defaultsKiwoong Kim
The UniPro specification states that attribute IDs of the following parameters are vendor-specific so some SoCs could have no regions at the defined addresses: - DME_LocalFC0ProtectionTimeOutVal - DME_LocalTC0ReplayTimeOutVal - DME_LocalAFC0ReqTimeOutVal In addition, the following parameters should be set considering the compatibility between host and device. - PA_PWRMODEUSERDATA0 - PA_PWRMODEUSERDATA1 - PA_PWRMODEUSERDATA2 - PA_PWRMODEUSERDATA3 - PA_PWRMODEUSERDATA4 - PA_PWRMODEUSERDATA5 Introduce a quirk to allow vendor drivers to override the UniPro defaults. Link: https://lore.kernel.org/r/1fedd3dea0ccc980913a5995a10510d86a5b01b9.1608513782.git.kwmad.kim@samsung.com Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: ufs: Relocate flush of exceptional eventKiwoong Kim
The current flush location does not guarantee disabling BKOPS for the case of requesting device power off. 1) The exceptional event handler is queued 2) ufs suspend starts with a request of device power off 3) BKOPS is disabled in ufs suspend 4) The queued work for the handler is done and BKOPS is re-enabled Relocate the flush statement to ensure BKOPS remain disabled. Link: https://lore.kernel.org/r/1608360039-16390-1-git-send-email-kwmad.kim@samsung.com Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRLStanley Chu
Flush during hibern8 is sufficient on MediaTek platforms, thus enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL to skip enabling fWriteBoosterBufferFlush during WriteBooster initialization. Link: https://lore.kernel.org/r/20201222072928.32328-1-stanley.chu@mediatek.com Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRLStanley Chu
UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL is intended to skip enabling fWriteBoosterBufferFlushEn while WriteBooster is initializing. Therefore it is better to apply the checking during WriteBooster initialization only. Link: https://lore.kernel.org/r/20201222072905.32221-3-stanley.chu@mediatek.com Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-05scsi: ufs: Fix possible power drain during system suspendStanley Chu
Currently if device needs to do flush or BKOP operations, the device VCC power is kept during runtime-suspend period. However, if system suspend is happening while device is runtime-suspended, such power may not be disabled successfully. The reasons may be, 1. If current PM level is the same as SPM level, device will keep runtime-suspended by ufshcd_system_suspend(). 2. Flush recheck work may not be scheduled successfully during system suspend period. If it can wake up the system, this is also not the intention of the recheck work. To fix this issue, simply runtime-resume the device if the flush is allowed during runtime suspend period. Flush capability will be disabled while leaving runtime suspend, and also not be allowed in system suspend period. Link: https://lore.kernel.org/r/20201222072905.32221-2-stanley.chu@mediatek.com Fixes: 51dd905bd2f6 ("scsi: ufs: Fix WriteBooster flush during runtime suspend") Reviewed-by: Chaotian Jing <chaotian.jing@mediatek.com> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-04Merge branch '5.11/scsi-postmerge' into 5.11/scsi-fixesMartin K. Petersen
Merge two commits that had dependencies on other 5.11 trees (the block and the irq trees respectively). - We reverted a megaraid_sas change in 5.10 due to missing block layer plumbing. Now that this is in place, reinstate the change. - The hisi_sas driver had a dependency on a driver core irq change that went in through Thomas' tree. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-03Linux 5.11-rc2v5.11-rc2Linus Torvalds
2021-01-02Merge tag 's390-5.11-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 cleanups from Vasily Gorbik: "Update defconfigs and sort config select list" * tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/Kconfig: sort config S390 select list once again s390: update defconfigs
2021-01-02Merge tag 'pm-5.11-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a crash in intel_pstate during resume from suspend-to-RAM that may occur after recent changes and two resource leaks in error paths in the operating performance points (OPP) framework, add a new C-states table to intel_idle and update the cpuidle MAINTAINERS entry to cover the governors too. Specifics: - Fix recently introduced crash in the intel_pstate driver that occurs if scale-invariance is disabled during resume from suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR values made by the platform firmware (Rafael Wysocki). - Fix a memory leak and add a missing clk_put() in error paths in the OPP framework (Quanyang Wang, Viresh Kumar). - Add new C-states table for SnowRidge processors to the intel_idle driver (Artem Bityutskiy). - Update the MAINTAINERS entry for cpuidle to make it clear that the governors are covered by it too (Lukas Bulwahn)" * tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: intel_idle: add SnowRidge C-state table cpufreq: intel_pstate: Fix fast-switch fallback path opp: Call the missing clk_put() on error opp: fix memory leak in _allocate_opp_table MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
2021-01-02Merge branches 'pm-cpufreq' and 'pm-cpuidle'Rafael J. Wysocki
* pm-cpufreq: cpufreq: intel_pstate: Fix fast-switch fallback path * pm-cpuidle: intel_idle: add SnowRidge C-state table MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK