Age | Commit message (Collapse) | Author |
|
Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.
Resolve the situation by having the I/O timeout handler (actually a els,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware. If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.
The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.
Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When lpfc is running in NVMET mode and supports the NVME-1 addendum
changes, a LIP on a bound NVME Initiator or lipping the lpfc NVMET's link
resulted in an Oops in lpfc_nvmet_host_release.
The fix requires lpfc NVMET to maintain an additional reference on any node
structure that acts as the hosthandle for the NVMET transport. This
reference get is a one-time addition, is taken prior to the upcall of an
unsolicited LS_REQ, and is released when the NVMET transport releases the
hosthandle during the host_release downcall.
Link: https://lore.kernel.org/r/20210104180240.46824-13-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When with testing with large numbers of npiv vports and link bounces, the
driver is flooding the messages file, even with log_verbose = 0.
The new LOG_TRACE_EVENT messages are still generating events to the
messages files.
Fix by converting the vport create msg from LOG_TRACE_EVENT to LOG_VPORT.
Link: https://lore.kernel.org/r/20210104180240.46824-12-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If a mailbox command times out, the SLI port is deemed in error and the
port is reset. The HBA cleanup is not returning I/Os to the NVMe layer
before the port is unregistered. This is due to the HBA being marked
offline (!SLI_ACTIVE) and cleanup being done by the mailbox timeout handler
rather than an general adapter reset routine. The mailbox timeout handler
mailbox handler only cleaned up SCSI I/Os.
Fix by reworking the mailbox handler to:
- After handling the mailbox error, detect the board is already in
failure (may be due to another error), and leave cleanup to the
other handler.
- If the mailbox command timeout is initial detector of the port error,
continue with the board cleanup and marking the adapter offline
(!SLI_ACTIVE). Remove the SCSI-only I/O cleanup routine. The generic
reset adapter routine that is subsequently invoked, will clean up the
I/Os.
- Have the reset adapter routine flush all NVMe and SCSI I/Os if the
adapter has been marked failed (!SLI_ACTIVE).
- Rework the NVMe I/O terminate routine to take a status code to fail the
I/O with and update so that cleaned up I/O calls the wqe completion
routine. Currently it is bypassing the wqe cleanup and calling the NVMe
I/O completion directly. The wqe completion routine will take care of
data structure and node cleanup then call the NVMe I/O completion
handler.
Link: https://lore.kernel.org/r/20210104180240.46824-11-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Target reset is failed by the target as an invalid command.
The Target Reset TMF has been obsoleted in T10 for a while, but continues
to be used. On (newer) devices, the TMF is rejected causing the reset
handler to escalate to adapter resets.
Fix by having Target Reset TMF rejections be translated into a LOGO and
re-PLOGI with the target device. This provides the same semantic action
(although, if the device also supports nvme traffic, it will terminate nvme
traffic as well - but it's still recoverable).
Link: https://lore.kernel.org/r/20210104180240.46824-10-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
A successful task mgmt command is logging errors, making it look like
problems were encountered. This is due to log messages for the
device/target and bus reset handlers having the LOG_TRACE_EVENT flag set.
Fix by adjusting the event flag such that the call to the logging routine
only receives a LOG_TRACE_EVENT if a prior call actually failed.
Link: https://lore.kernel.org/r/20210104180240.46824-9-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In the lpfc offline routine, called for various reasons such as sysfs
attribute, driver unload, or port error, the driver is calling
__lpfc_cpuhp_remove() to destroy the hot plug data. If the offline routine
is called while the driver is in the process of being unloaded, a request
using lpfc_cpuhp_remove() is also made from lpfc_sli4_hba_unset(). The
cpuhp elements are no longer valid when the second removal request is made.
Fix by only calling the cpuhp removal once when the adapter is in the
process of unloading.
Link: https://lore.kernel.org/r/20210104180240.46824-8-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If the port is configured for NVME and has any outstanding IOs when a FW
reset is requesteed, outstanding I/Os are not properly cleaned up. This
causes the fw download request to fail.
Fix by clearing the LPFC_SLI_ACTIVE flag to signify the I/O must be
manually flushed by the driver on port reset.
Link: https://lore.kernel.org/r/20210104180240.46824-7-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When lpfc generates a GEN_REQUEST wqe for the nvme LS (such as Create
Association), the timeout is set to R_A_TOV without regard to the timeout
value supplied by the nvme-fc transport. The driver should be setting the
timeout to the value passed into the routine. Additionally the caller
should be setting the timeout value to the value in the ls request set by
the nvme transport. Instead, it unconditionally is setting it to a driver
defined value. So the driver actually overrode the value twice.
Fix by using the timeout provided to the routine, and for the caller, set
the timeout to the ls request timeout value.
Link: https://lore.kernel.org/r/20210104180240.46824-6-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver's management of the fabric controller (aka pseudo-scsi
initiator) node in SLI3 mode is causing this crash. The crash occurs
because of a node reference imbalance that frees the fabric controller node
while devloss is outstanding from the SCSI transport. This is triggered by
an odd behavior where the switch reacts to a rejected RDP request with a
PLOGI and nothing else, not even a LOGO. The driver ACKS the PLOGI and
after successfully registering the RPI, incorrectly registers the fabric
controller node because it has the NLP_FC4_FCP flag still set from the
fabric controller PRLI. If a LIP is issued, the driver attempts to cleanup
on Link Up and ends up executing too many puts.
Fix by detecting the fabric node type and clearing out the nodes internal
flags that triggered a SCSI transport registration and subsequence dev_loss
event. The driver cannot count on any persistence from fabric controller
nodes.
Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Testing with target ports coming and going, the driver eventually reached a
state where it no longer discovered the target. When the driver has issued
a PRLI and receives a PRLI from the target, it is not properly updating the
node's initiator/target role flags. Thus, when a subsequent RSCN is
received for a target loss, the driver mis-identifies the target as an
initiator and does not initiate LUN scanning.
Fix by always refreshing the ndlp with the latest PRLI state information
whenever a PRLI is processed. Also clear the ndlp flags when processing a
PLOGI so that there is no carry over through a re-login.
Link: https://lore.kernel.org/r/20210104180240.46824-4-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
A very long time ago, there was a feature: auto sli mode. It gave the user
the ability to auto select the SLI mode (SLI2 or SLI3) to run the port in,
or even force SLI2 mode if configured. Because of the convoluted logic,
the CONFIG_PORT mbox command ends up being called 2 or 3 times. It should
have been called only once. Additionally, the driver no longer supports
SLI-2, so only SLI-3 mode should be allowed.
The following changes were made:
- Force module parameter to SLI3 only.
- Rip out redundant CONFIG_PORT mbox commands.
- Force CONFIG_PORT mbox command to be in beginning of enable ISR routine.
- Added changes for offline to online behavior
Link: https://lore.kernel.org/r/20210104180240.46824-3-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Under some pt2pt situations, the other end of the link may issue a LOGO
after successfully completing PLOGI and assigning addresses to the port.
Thus the driver may attempt a new PLOGI to re-create the login, but the
LOGO handling cleared the address back to 0. Once this happens, the other
end, which may be address 0, gets all confused and this cannot be resolved
without an administrative action to bounce the link.
Fix by assuming that address assignment only occurs on the 1st PLOGI after
link up, and regardless of login state, the address assignment sticks. The
FC standards aren't particularly clear in this situation (it only describes
initial PLOGI), but there is nothing that contradicts this and behaviors on
the devices tested appears to conform to the understanding.
Thus, don't reset the port address to 0 as part of LOGO handling. Port
addresses will only reset on link down.
Link: https://lore.kernel.org/r/20210104180240.46824-2-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Now that the driver always uses managed interrupts, delete
auto_affine_msi_experimental module param.
Link: https://lore.kernel.org/r/1609763622-34119-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When non-fatal error like line-reset happens, ufshcd_err_handler() starts
to abort tasks by ufshcd_try_to_abort_task(). When it tries to issue a task
management request, we hit two warnings:
WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c
After fixing the above warnings we hit another tm_cmd timeout which may be
caused by unstable controller state:
__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
Then, ufshcd_err_handler() enters full reset, and kernel gets stuck. It
turned out ufshcd_print_trs() printed too many messages on console which
requires CPU locks. Likewise hba->silence_err_logs, we need to avoid too
verbose messages. This is actually not an error case.
Link: https://lore.kernel.org/r/20210107185316.788815-3-jaegeuk@kernel.org
Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When gate_work/ungate_work experience an error during hibern8_enter or exit
we can livelock:
ufshcd_err_handler()
ufshcd_scsi_block_requests()
ufshcd_reset_and_restore()
ufshcd_clear_ua_wluns() -> stuck
ufshcd_scsi_unblock_requests()
In order to avoid this, ufshcd_clear_ua_wluns() can be called per recovery
flows such as suspend/resume, link_recovery, and error_handler.
Link: https://lore.kernel.org/r/20210107185316.788815-2-jaegeuk@kernel.org
Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
sprintf and snprintf may cause output defect in sysfs content, it is better
to use new added sysfs_emit function which knows the size of the temporary
buffer.
Link: https://lore.kernel.org/r/20210106211541.23039-1-huobean@gmail.com
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use consistent and expected indentation for all Kconfig text.
Link: https://lore.kernel.org/r/20210106205554.18082-1-rdunlap@infradead.org
Cc: Alim Akhtar <alim.akhtar@samsung.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver's queuecommand routine is still wrapped to hold the host lock
for the duration of the call. This will become problematic when moving to
multiple queues due to the lock contention preventing asynchronous
submissions to mulitple queues. There is no real legitimate reason to hold
the host lock, and previous patches have insured proper protection of
moving ibmvfc_event objects between free and sent lists.
Link: https://lore.kernel.org/r/20210106201835.1053593-6-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Drain the command queue and place all commands on a completion list.
Perform command completion on that list outside the host/queue locks.
Further, move purged command compeletions outside the host_lock as well.
Link: https://lore.kernel.org/r/20210106201835.1053593-5-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Define per-queue locks for protecting queue state and event pool sent/free
lists. The evt list lock is initially redundant but it allows the driver to
be modified in the follow-up patches to relax the queue locking around
submissions and completions.
Link: https://lore.kernel.org/r/20210106201835.1053593-4-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is currently a single command event pool per host. In anticipation of
providing multiple queues add a per-queue event pool definition and
reimplement the existing CRQ to use its queue defined event pool for
command submission and completion.
Link: https://lore.kernel.org/r/20210106201835.1053593-3-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The primary and async CRQs are nearly identical outside of the format and
length of each message entry in the dma mapped page that represents the
queue data. These queues can be represented with a generic queue structure
that uses a union to differentiate between message format of the mapped
page.
This structure will further be leveraged in a followup patcheset that
introduces Sub-CRQs.
Link: https://lore.kernel.org/r/20210106201835.1053593-2-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag commands")
sets the vfcFrame correlation token to the pointer handle of the associated
ibmvfc_event. However, that commit failed to cast the pointer to an
appropriate type which in this case is a u64. As such sparse warnings are
generated for both correlation token assignments.
ibmvfc.c:2375:36: sparse: incorrect type in argument 1 (different base types)
ibmvfc.c:2375:36: sparse: expected unsigned long long [usertype] val
ibmvfc.c:2375:36: sparse: got struct ibmvfc_event *[assigned] evt
Add the appropriate u64 casts when assigning an ibmvfc_event as a
correlation token.
Link: https://lore.kernel.org/r/20210106203721.1054693-1-tyreld@linux.ibm.com
Fixes: 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag commands")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Building ufshcd-pltfrm.c on arch/s390/ has a linker error since S390 does
not support IOMEM, so add a dependency on HAS_IOMEM.
s390-linux-ld: drivers/scsi/ufs/ufshcd-pltfrm.o: in function `ufshcd_pltfrm_init':
ufshcd-pltfrm.c:(.text+0x38e): undefined reference to `devm_platform_ioremap_resource'
where that devm_ function is inside an #ifdef CONFIG_HAS_IOMEM/#endif
block.
Link: lore.kernel.org/r/202101031125.ZEFCUiKi-lkp@intel.com
Link: https://lore.kernel.org/r/20210106040822.933-1-rdunlap@infradead.org
Fixes: 03b1781aa978 ("[SCSI] ufs: Add Platform glue driver for ufshcd")
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Alim Akhtar <alim.akhtar@samsung.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: linux-scsi@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Transaction Specific Fields (TSF) in the UPIU package could be CDB
(SCSI/UFS Command Descriptor Block), OSF (Opcode Specific Field), and TM
I/O parameter (Task Management Input/Output Parameter). But, currently, we
take all of these as CDB in the UPIU trace. Thus makes user confuse among
CDB, OSF, and TM message. So fix it with this patch.
Link: https://lore.kernel.org/r/20210105113446.16027-7-huobean@gmail.com
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
trace
Distinguish between TM request UPIU and response UPIU in TM UPIU trace, for
the TM response, let TM UPIU trace print its TM response UPIU.
Link: https://lore.kernel.org/r/20210105113446.16027-6-huobean@gmail.com
Acked-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Currently, in the query completion trace print, since we use
hba->lrb[tag].ucd_req_ptr and didn't differentiate UPIU between request and
response, thus header and transaction-specific field in UPIU printed by
query trace are identical. This is not very practical. As below:
query_send: HDR:16 00 00 0e 00 81 00 00 00 00 00 00, CDB:06 0e 03 00 00 00 00 00 00 00 00 00 00 00 00 00
query_complete: HDR:16 00 00 0e 00 81 00 00 00 00 00 00, CDB:06 0e 03 00 00 00 00 00 00 00 00 00 00 00 00 00
For the failure analysis, we want to understand the real response reported
by the UFS device, however, the current query trace tells us nothing. After
this patch, the query trace on the query_send, and the above a pair of
query_send and query_complete will be:
query_send: HDR:16 00 00 0e 00 81 00 00 00 00 00 00, CDB:06 0e 03 00 00 00 00 00 00 00 00 00 00 00 00 00
ufshcd_upiu: HDR:36 00 00 0e 00 81 00 00 00 00 00 00, CDB:06 0e 03 00 00 00 00 00 00 00 00 01 00 00 00 00
Link: https://lore.kernel.org/r/20210105113446.16027-5-huobean@gmail.com
Acked-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Don't call trace_ufshcd_upiu() in case ufshba_upiu trace poit is not
enabled.
Link: https://lore.kernel.org/r/20210105113446.16027-4-huobean@gmail.com
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
__print_symbolic() is designed for exporting the print formatting table to
userspace and allows parsing tool, such as trace-cmd and perf, to analyze
trace log according to this print formatting table, meanwhile, by using
__print_symbolic()s, save space in the trace ring buffer.
original print format:
print fmt: "%s: %s: HDR:%s, CDB:%s", __get_str(str), __get_str(dev_name),
__print_hex(REC->hdr, sizeof(REC->hdr)),
__print_hex(REC->tsf, sizeof(REC->tsf))
after this change:
print fmt: "%s: %s: HDR:%s, CDB:%s",
print_symbolic(REC->str_t, {0, "send"},
{1, "complete"},
{2, "dev_complete"},
{3, "query_send"},
{4, "query_complete"},
{5, "query_complete_err"},
{6, "tm_send"},
{7, "tm_complete"},
{8, "tm_complete_err"}),
__get_str(dev_name), __print_hex(REC->hdr, sizeof(REC->hdr)),
__print_hex(REC->tsf, sizeof(REC->tsf))
Note: This patch just converts current __get_str(str) to __print_symbolic(),
the original tracing log will not be affected by this change, so it
doesn't break what current parsers expect.
Link: https://lore.kernel.org/r/20210105113446.16027-3-huobean@gmail.com
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Current EM macro definition, we use stringize operator '#', which turns the
argument it precedes into a quoted string. Thus requires the symbol of
__print_symbolic() should be the string corresponding to the name of the
enum.
However, we have other cases, the symbol and enum name are not the same, we
can redefine EM/EMe, but there will introduce some redundant codes. This
patch is to remove this restriction, let others reuse the current EM/EMe
definition.
Link: https://lore.kernel.org/r/20210105113446.16027-2-huobean@gmail.com
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Phil Oester reported that a fix for a possible buffer overrun that I sent
caused a regression that manifests in this output:
Event Message: A PCI parity error was detected on a component at bus 0 device 5 function 0.
Severity: Critical
Message ID: PCI1308
The original code tried to handle the sense data pointer differently when
using 32-bit 64-bit DMA addressing, which would lead to a 32-bit dma_addr_t
value of 0x11223344 to get stored
32-bit kernel: 44 33 22 11 ?? ?? ?? ??
64-bit LE kernel: 44 33 22 11 00 00 00 00
64-bit BE kernel: 00 00 00 00 44 33 22 11
or a 64-bit dma_addr_t value of 0x1122334455667788 to get stored as
32-bit kernel: 88 77 66 55 ?? ?? ?? ??
64-bit kernel: 88 77 66 55 44 33 22 11
In my patch, I tried to ensure that the same value is used on both 32-bit
and 64-bit kernels, and picked what seemed to be the most sensible
combination, storing 32-bit addresses in the first four bytes (as 32-bit
kernels already did), and 64-bit addresses in eight consecutive bytes (as
64-bit kernels already did), but evidently this was incorrect.
Always storing the dma_addr_t pointer as 64-bit little-endian,
i.e. initializing the second four bytes to zero in case of 32-bit
addressing, apparently solved the problem for Phil, and is consistent with
what all 64-bit little-endian machines did before.
I also checked in the history that in previous versions of the code, the
pointer was always in the first four bytes without padding, and that
previous attempts to fix 64-bit user space, big-endian architectures and
64-bit DMA were clearly flawed and seem to have introduced made this worse.
Link: https://lore.kernel.org/r/20210104234137.438275-1-arnd@kernel.org
Fixes: 381d34e376e3 ("scsi: megaraid_sas: Check user-provided offsets")
Fixes: 107a60dd71b5 ("scsi: megaraid_sas: Add support for 64bit consistent DMA")
Fixes: 94cd65ddf4d7 ("[SCSI] megaraid_sas: addded support for big endian architecture")
Fixes: 7b2519afa1ab ("[SCSI] megaraid_sas: fix 64 bit sense pointer truncation")
Reported-by: Phil Oester <kernel@linuxace.com>
Tested-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Update sysfs documentation for addition of DeepSleep power mode.
Link: https://lore.kernel.org/r/20210104155026.16417-1-adrian.hunter@intel.com
Fixes: fe1d4c2ebcae ("scsi: ufs: Add DeepSleep feature")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit 996e509bbc95 ("sd: use __register_blkdev to avoid a modprobe for an
unregistered dev_t") removed blk_register_region(devt, ...) in sd_remove()
and since then, devt is unused in sd_remove().
Hence, make W=1 warns:
drivers/scsi/sd.c:3516:8:
warning: variable 'devt' set but not used [-Wunused-but-set-variable]
Simply remove this obsolete variable.
[mkp: fixed commit sha]
Link: https://lore.kernel.org/r/20201214095424.12479-1-lukas.bulwahn@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The block layer code will split a large zeroout request into multiple bios
and if WRITE SAME is disabled because the storage device reports that it
does not support it (or support the length used), we can get an error
message from the block layer despite the setting of RQF_QUIET on the first
request. This is because more than one request may have already been
submitted.
Fix this by setting RQF_QUIET when BLK_STS_TARGET is returned to fail the
request early, we don't need to log a message because we did not actually
submit the command to the device, and the block layer code will handle the
error by submitting individual write bios.
Link: https://lore.kernel.org/r/20201207221021.28243-1-emilne@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When sdeb_zbc_model does not match BLK_ZONED_NONE, BLK_ZONED_HA or
BLK_ZONED_HM, we should free sdebug_q_arr to prevent memleak. Also there is
no need to execute sdebug_erase_store() on failure of sdeb_zbc_model_str().
Link: https://lore.kernel.org/r/20201226061503.20050-1-dinghao.liu@zju.edu.cn
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is a spelling mistake in the Kconfig help text. Fix it.
Link: https://lore.kernel.org/r/20201217172019.57768-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The CHAP secret displayed garbage characters causing iSCSI login
authentication failure. Correct the CHAP password max length.
Link: https://lore.kernel.org/r/20201217105144.8055-1-njavali@marvell.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Users can initiate resets to specific SCSI device/target/host through
IOCTL. When this happens, the SCSI cmd passed to eh_device/target/host
_reset_handler() callbacks is initialized with a request whose tag is -1.
In this case it is not right for eh_device_reset_handler() callback to
count on the LUN get from hba->lrb[-1]. Fix it by getting LUN from the SCSI
device associated with the SCSI cmd.
Link: https://lore.kernel.org/r/1609157080-26283-1-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Set optimized values for the following timeouts:
- FC0_PROTECTION_TIMER
- TC0_REPLAY_TIMER
- AFC0_REQUEST_TIMER
Exynos doesn't yet use traffic class #1.
Link: https://lore.kernel.org/r/a0ff44f665a4f31d2f945fd71de03571204c576c.1608513782.git.kwmad.kim@samsung.com
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The UniPro specification states that attribute IDs of the following
parameters are vendor-specific so some SoCs could have no regions at the
defined addresses:
- DME_LocalFC0ProtectionTimeOutVal
- DME_LocalTC0ReplayTimeOutVal
- DME_LocalAFC0ReqTimeOutVal
In addition, the following parameters should be set considering the
compatibility between host and device.
- PA_PWRMODEUSERDATA0
- PA_PWRMODEUSERDATA1
- PA_PWRMODEUSERDATA2
- PA_PWRMODEUSERDATA3
- PA_PWRMODEUSERDATA4
- PA_PWRMODEUSERDATA5
Introduce a quirk to allow vendor drivers to override the UniPro defaults.
Link: https://lore.kernel.org/r/1fedd3dea0ccc980913a5995a10510d86a5b01b9.1608513782.git.kwmad.kim@samsung.com
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The current flush location does not guarantee disabling BKOPS for the case
of requesting device power off.
1) The exceptional event handler is queued
2) ufs suspend starts with a request of device power off
3) BKOPS is disabled in ufs suspend
4) The queued work for the handler is done and BKOPS is re-enabled
Relocate the flush statement to ensure BKOPS remain disabled.
Link: https://lore.kernel.org/r/1608360039-16390-1-git-send-email-kwmad.kim@samsung.com
Reviewed-by: Can Guo <cang@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Flush during hibern8 is sufficient on MediaTek platforms, thus enable
UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL to skip enabling
fWriteBoosterBufferFlush during WriteBooster initialization.
Link: https://lore.kernel.org/r/20201222072928.32328-1-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL is intended to skip enabling
fWriteBoosterBufferFlushEn while WriteBooster is initializing. Therefore
it is better to apply the checking during WriteBooster initialization only.
Link: https://lore.kernel.org/r/20201222072905.32221-3-stanley.chu@mediatek.com
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Currently if device needs to do flush or BKOP operations, the device VCC
power is kept during runtime-suspend period.
However, if system suspend is happening while device is runtime-suspended,
such power may not be disabled successfully.
The reasons may be,
1. If current PM level is the same as SPM level, device will keep
runtime-suspended by ufshcd_system_suspend().
2. Flush recheck work may not be scheduled successfully during system
suspend period. If it can wake up the system, this is also not the
intention of the recheck work.
To fix this issue, simply runtime-resume the device if the flush is allowed
during runtime suspend period. Flush capability will be disabled while
leaving runtime suspend, and also not be allowed in system suspend period.
Link: https://lore.kernel.org/r/20201222072905.32221-2-stanley.chu@mediatek.com
Fixes: 51dd905bd2f6 ("scsi: ufs: Fix WriteBooster flush during runtime suspend")
Reviewed-by: Chaotian Jing <chaotian.jing@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Merge two commits that had dependencies on other 5.11 trees (the block
and the irq trees respectively).
- We reverted a megaraid_sas change in 5.10 due to missing block
layer plumbing. Now that this is in place, reinstate the change.
- The hisi_sas driver had a dependency on a driver core irq change
that went in through Thomas' tree.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 cleanups from Vasily Gorbik:
"Update defconfigs and sort config select list"
* tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/Kconfig: sort config S390 select list once again
s390: update defconfigs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a crash in intel_pstate during resume from suspend-to-RAM
that may occur after recent changes and two resource leaks in error
paths in the operating performance points (OPP) framework, add a new
C-states table to intel_idle and update the cpuidle MAINTAINERS entry
to cover the governors too.
Specifics:
- Fix recently introduced crash in the intel_pstate driver that
occurs if scale-invariance is disabled during resume from
suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR
values made by the platform firmware (Rafael Wysocki).
- Fix a memory leak and add a missing clk_put() in error paths in the
OPP framework (Quanyang Wang, Viresh Kumar).
- Add new C-states table for SnowRidge processors to the intel_idle
driver (Artem Bityutskiy).
- Update the MAINTAINERS entry for cpuidle to make it clear that the
governors are covered by it too (Lukas Bulwahn)"
* tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: add SnowRidge C-state table
cpufreq: intel_pstate: Fix fast-switch fallback path
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
|
|
* pm-cpufreq:
cpufreq: intel_pstate: Fix fast-switch fallback path
* pm-cpuidle:
intel_idle: add SnowRidge C-state table
MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
|