Age | Commit message (Collapse) | Author |
|
The pointer to device can be obtained from ufs->hba->dev, remove
superfluous function parameter.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://lore.kernel.org/r/20241031150033.3440894-3-peter.griffin@linaro.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Remove empty method. When the method is not set, the call is not made,
saving a few cycles.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://lore.kernel.org/r/20241031150033.3440894-2-peter.griffin@linaro.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The previous patch in this series introduced identical code in both
branches of an if-statement. Move that code outside the if-statement.
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-12-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Whether or not MCQ is used, call scsi_add_host() from
ufshcd_add_scsi_host(). For MCQ this patch swaps the order of the
scsi_add_host() and UFS device initialization. This patch prepares for
combining the two scsi_add_host() calls.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-11-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Previous changes guarantee that hba->scsi_host_added is true before
ufshcd_device_init() is called. Hence, remove the code from
ufshcd_device_init() that depends on hba->scsi_host_added being false.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-10-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Expand the ufshcd_device_init(hba, true) call and remove all code that
depends on init_dev_params == false. This change prepares for combining
the two scsi_add_host() calls.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-9-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
ufshcd_async_scan() is called (asynchronously) only by ufshcd_init().
Move the ufshcd_device_init(hba, true) call from ufshcd_async_scan()
into ufshcd_init(). This patch prepares for moving both scsi_add_host()
calls into ufshcd_add_scsi_host(). Calling ufshcd_device_init() from
ufshcd_init() without holding hba->host_sem is safe. This is safe
because hba->host_sem serializes core code and sysfs callbacks. The
ufshcd_device_init() call is moved before the scsi_add_host() call and
hence happens before any SCSI sysfs attributes are created.
Since ufshcd_device_init() may call scsi_add_host(), only call
scsi_add_host() from ufshcd_add_scsi_host() if the SCSI host has not yet
been added by ufshcd_device_init().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-8-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Move the ufshcd_device_init() and ufshcd_process_hba_result() calls to
the ufshcd_probe_hba() callers. This change refactors the code without
modifying the behavior of the UFSHCI driver. This change prepares for
moving one ufshcd_device_init() call into ufshcd_init().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-7-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The comment /* UFSHCD_QUIRK_REINIT_AFTER_MAX_GEAR_SWITCH is set */ is
only correct if ufshcd_device_init() is only called by
ufshcd_probe_hba(). Convert the comment into an explicit check. This
patch prepares for moving the ufshcd_device_init() calls.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-6-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Prepare for moving a ufshcd_device_init() call from inside
ufshcd_probe_hba() into the ufshcd_probe_hba() callers by introducing
the function ufshcd_process_probe_result(). No functionality has been
changed.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-5-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Call ufshcd_add_scsi_host() after host controller initialization has
completed. This is safe because no code between the old and new
ufshcd_add_scsi_host() call site depends on the scsi_add_host() call.
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-4-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Prepare for inlining one ufshcd_device_init() call by introducing the
new function ufshcd_post_device_init(). No functionality has been
changed.
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Move the code for adding a SCSI host and also the code for managing TMF
tags from ufshcd_init() into a new function called
ufshcd_add_scsi_host(). This patch prepares for combining the two
scsi_add_host() calls into a single call. No functionality has been
changed.
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241016201249.2256266-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is no need to serialize single read/write calls to the host
controller registers. Remove the redundant host_lock calls that protect
access to the request list cLear register: UTRLCLR.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20241024075033.562562-4-avri.altman@wdc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is no need to serialize single read/write calls to the host
controller registers. Remove the redundant host_lock calls that protect
access to the task management request List cLear register: UTMRLCLR.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20241024075033.562562-3-avri.altman@wdc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is no need to serialize single read/write calls to the host
controller registers. Remove the redundant host_lock calls that protect
access to the task management doorbell register: UTMRLDBR.
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20241024075033.562562-2-avri.altman@wdc.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
From the UFSHCI specification: "CleanUp Command Return Code (RTC): host
controller sets this return code to provide more details of the cleanup
process. It is valid only when CUS is 1." Hence, do not read RTC if the
CUS bitfield is zero.
Cc: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Fixes: 8d7290348992 ("scsi: ufs: mcq: Add supporting functions for MCQ abort")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241022193130.2733293-7-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use blk_mq_quiesce_tagset() instead of ufshcd_scsi_block_requests() and
blk_mq_wait_quiesce_done(). Since this patch removes the last callers of
ufshcd_scsi_block_requests() and ufshcd_scsi_unblock_requests(), remove
these functions.
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241022193130.2733293-6-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The ufshcd_scsi_block_requests() and ufshcd_scsi_unblock_requests()
calls were introduced in ufshcd_exception_event_handler() to prevent
that querying the exception event information would time out. Commit
10fe5888a40e ("scsi: ufs: increase the scsi query response timeout")
increased the timeout for querying exception information from 30 ms to
1.5 s and thereby eliminated the risk that a timeout would happen.
Hence, the calls to block and unblock SCSI requests are superfluous.
Remove these calls.
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Tested-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241022193130.2733293-5-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The MCQ code is also valid for legacy mode. Hence, remove the legacy
mode code and retain the MCQ code. Since it is not an error if a command
completes while ufshcd_try_to_abort_task() is in progress, use
dev_info() instead of dev_err() to report this.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241022193130.2733293-4-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The only statement that follows the 'out:' label in
ufshcd_try_to_abort_task() is a return-statement. Simplify this function
by changing 'goto out' statements into return statements.
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241022193130.2733293-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Move the ufshcd_mcq_enable_esi() definition such that it occurs
immediately before the ufshcd_mcq_config_esi() definition.
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241022193130.2733293-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Replace UFSHCD_QUIRK_BROKEN_64BIT_ADDRESS with
ufs_hba_variant_ops::set_dma_mask. Update the Renesas driver
accordingly. This patch enables supporting other configurations than
32-bit or 64-bit DMA addresses, e.g. 36-bit DMA addresses.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241018194753.775074-1-bvanassche@acm.org
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If ufshcd_rtc_work calls ufshcd_rpm_put_sync() and the pm's usage_count
is 0, we will enter the runtime suspend callback. However, the runtime
suspend callback will wait to flush ufshcd_rtc_work, causing a deadlock.
Replace ufshcd_rpm_put_sync() with ufshcd_rpm_put() to avoid the
deadlock.
Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
Cc: stable@vger.kernel.org #6.11.x
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Link: https://lore.kernel.org/r/20241024015453.21684-1-peter.wang@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If the sg_copy_buffer() call returns less than sdebug_sector_size, then
we drop out of the copy loop. However, we still report that we copied
the full expected amount, which is not proper.
Fix by keeping a running total and return that value.
Fixes: 84f3a3c01d70 ("scsi: scsi_debug: Atomic write support")
Reported-by: Colin Ian King <colin.i.king@gmail.com>
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241018101655.4207-1-john.g.garry@oracle.com
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The current so called "inner loop" in ufshcd_hba_execute_hce() is open
coding ufshcd_wait_for_register(). Replace it by
ufshcd_wait_for_register(). This is a code simplification - no
functional change.
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20241016102141.441382-1-avri.altman@wdc.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Performance problems may occur if there is a problem with the asymmetric
connected lane such as h/w failure. Currently, only check connected
lane for rx/tx is checked if it is not 0. But it should also be checked
if it is asymmetrically connected.
Signed-off-by: SEO HOYOUNG <hy50.seo@samsung.com>
Link: https://lore.kernel.org/r/e82b4b65b5f6501a687c624dd06e5c362e160f32.1728544727.git.hy50.seo@samsung.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Yihang Li <liyihang9@huawei.com> says:
This series contains some fixes including:
- Adjust priority of registering and exiting debugfs for security;
- Create trigger_dump at the end of the debugfs initialization;
- Add firmware information check;
- Enable all PHYs that are not disabled by user during controller reset;
- Reset PHY again if phyup timeout;
- Check usage count only when the runtime PM status is RPM_SUSPENDING;
- Add cond_resched() for no forced preemption model;
- Default enable interrupt coalescing;
- Update disk locked timeout to 7 seconds;
- Add time interval between two H2D FIS following soft reset spec;
- Update v3 hw STP_LINK_TIMER setting;
- Create all dump files during debugfs initialization;
- Add latest_dump for the debugfs dump;
Link: https://lore.kernel.org/r/20241008021822.2617339-1-liyihang9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Before that, after the user triggers the dump, the latest dump
information can be viewed in the directory with the maximum number in
the dump directory.
After this series patch, the driver creates all debugfs directories and
files during initialization. Therefore, users cannot know the directory
where the latest dump information is stored. So, add latest_dump file to
notify users where the latest dump information is stored.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-14-liyihang9@huawei.com
Reviewed-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
For the current debugfs of hisi_sas, after user triggers dump, the
driver allocate memory space to save the register information and create
debugfs files to display the saved information. In this process, the
debugfs files created after each dump.
Therefore, when the dump is triggered while the driver is unbind, the
following hang occurs:
[67840.853907] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[67840.862947] Mem abort info:
[67840.865855] ESR = 0x0000000096000004
[67840.869713] EC = 0x25: DABT (current EL), IL = 32 bits
[67840.875125] SET = 0, FnV = 0
[67840.878291] EA = 0, S1PTW = 0
[67840.881545] FSC = 0x04: level 0 translation fault
[67840.886528] Data abort info:
[67840.889524] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[67840.895117] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[67840.900284] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[67840.905709] user pgtable: 4k pages, 48-bit VAs, pgdp=0000002803a1f000
[67840.912263] [00000000000000a0] pgd=0000000000000000, p4d=0000000000000000
[67840.919177] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[67840.996435] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[67841.003628] pc : down_write+0x30/0x98
[67841.007546] lr : start_creating.part.0+0x60/0x198
[67841.012495] sp : ffff8000b979ba20
[67841.016046] x29: ffff8000b979ba20 x28: 0000000000000010 x27: 0000000000024b40
[67841.023412] x26: 0000000000000012 x25: ffff20202b355ae8 x24: ffff20202b35a8c8
[67841.030779] x23: ffffa36877928208 x22: ffffa368b4972240 x21: ffff8000b979bb18
[67841.038147] x20: ffff00281dc1e3c0 x19: fffffffffffffffe x18: 0000000000000020
[67841.045515] x17: 0000000000000000 x16: ffffa368b128a530 x15: ffffffffffffffff
[67841.052888] x14: ffff8000b979bc18 x13: ffffffffffffffff x12: ffff8000b979bb18
[67841.060263] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa368b1289b18
[67841.067640] x8 : 0000000000000012 x7 : 0000000000000000 x6 : 00000000000003a9
[67841.075014] x5 : 0000000000000000 x4 : ffff002818c5cb00 x3 : 0000000000000001
[67841.082388] x2 : 0000000000000000 x1 : ffff002818c5cb00 x0 : 00000000000000a0
[67841.089759] Call trace:
[67841.092456] down_write+0x30/0x98
[67841.096017] start_creating.part.0+0x60/0x198
[67841.100613] debugfs_create_dir+0x48/0x1f8
[67841.104950] debugfs_create_files_v3_hw+0x88/0x348 [hisi_sas_v3_hw]
[67841.111447] debugfs_snapshot_regs_v3_hw+0x708/0x798 [hisi_sas_v3_hw]
[67841.118111] debugfs_trigger_dump_v3_hw_write+0x9c/0x120 [hisi_sas_v3_hw]
[67841.125115] full_proxy_write+0x68/0xc8
[67841.129175] vfs_write+0xd8/0x3f0
[67841.132708] ksys_write+0x70/0x108
[67841.136317] __arm64_sys_write+0x24/0x38
[67841.140440] invoke_syscall+0x50/0x128
[67841.144385] el0_svc_common.constprop.0+0xc8/0xf0
[67841.149273] do_el0_svc+0x24/0x38
[67841.152773] el0_svc+0x38/0xd8
[67841.156009] el0t_64_sync_handler+0xc0/0xc8
[67841.160361] el0t_64_sync+0x1a4/0x1a8
[67841.164189] Code: b9000882 d2800002 d2800023 f9800011 (c85ffc05)
[67841.170443] ---[ end trace 0000000000000000 ]---
To fix this issue, create all directories and files during debugfs
initialization. In this way, the driver only needs to allocate memory
space to save information each time the user triggers dumping.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-13-liyihang9@huawei.com
Reviewed-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
At present, it is found that some SATA HDD disks may continue to return
the HOLD primitive for more than 500ms when they are busy writing data,
which is more likely to trigger an STP link timeout exception. Now
Modify STP link timer from 500ms to the maximum value of 1.048575s.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-12-liyihang9@huawei.com
Reviewed-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Spec says at least 5us between two H2D FIS when do soft reset, but be
generous and sleep for about 1ms.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-11-liyihang9@huawei.com
Reviewed-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The SATA disk will be locked after the disk sends the DMA Setup frame
until all data frame transmission is completed. The
CFG_ICT_TIMER_STEP_TRSH register is used for sata disk to configure the
step size of the timer which records the time when the disk is
locked. The unit is 1us and the default step size is 150ms. If the disk
is locked for more than 7 timer steps, the io to be sent to the disk
will end abnormally.
The current timeout is only about 1 second, it is easy to trigger IO
abnormal end when the SATA hard disk returns data slowly. Adjust the
timeout to 7 seconds based on ERC time of most disks.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-10-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In the current interrupt reporting mode, each CQ entry reports an
interrupt. However, when there are a large number of I/O hardware
completion interrupts, the following issue may occur:
[ 4682.678657][ C129] irq 134: nobody cared (try booting with the "irqpoll" option)
[ 4682.708455][ C129] Call trace:
[ 4682.711589][ C129] dump_backtrace+0x0/0x1e4
[ 4682.715934][ C129] show_stack+0x20/0x2c
[ 4682.719933][ C129] dump_stack+0xd8/0x140
[ 4682.724017][ C129] __report_bad_irq+0x54/0x180
[ 4682.728625][ C129] note_interrupt+0x1ec/0x2f0
[ 4682.733143][ C129] handle_irq_event+0x118/0x1ac
[ 4682.737834][ C129] handle_fasteoi_irq+0xc8/0x200
[ 4682.742613][ C129] __handle_domain_irq+0x84/0xf0
[ 4682.747391][ C129] gic_handle_irq+0x88/0x2c0
[ 4682.751822][ C129] el1_irq+0xbc/0x140
[ 4682.755648][ C129] _find_next_bit.constprop.0+0x20/0x94
[ 4682.761036][ C129] cpumask_next+0x24/0x30
[ 4682.765208][ C129] gic_ipi_send_mask+0x48/0x170
[ 4682.769900][ C129] __ipi_send_mask+0x34/0x110
[ 4682.775720][ C129] smp_cross_call+0x3c/0xcc
[ 4682.780064][ C129] arch_send_call_function_single_ipi+0x38/0x44
[ 4682.786146][ C129] send_call_function_single_ipi+0xd0/0xe0
[ 4682.791794][ C129] generic_exec_single+0xb4/0x170
[ 4682.796659][ C129] smp_call_function_single_async+0x2c/0x40
[ 4682.802395][ C129] blk_mq_complete_request_remote.part.0+0xec/0x100
[ 4682.808822][ C129] blk_mq_complete_request+0x30/0x70
[ 4682.813950][ C129] scsi_mq_done+0x48/0xac
[ 4682.818128][ C129] sas_scsi_task_done+0xb0/0x150 [libsas]
[ 4682.823692][ C129] slot_complete_v3_hw+0x230/0x710 [hisi_sas_v3_hw]
[ 4682.830120][ C129] cq_thread_v3_hw+0xbc/0x190 [hisi_sas_v3_hw]
[ 4682.836114][ C129] irq_thread_fn+0x34/0xa4
[ 4682.840371][ C129] irq_thread+0xc4/0x130
[ 4682.844455][ C129] kthread+0x108/0x13c
[ 4682.848365][ C129] ret_from_fork+0x10/0x18
[ 4682.852621][ C129] handlers:
[ 4682.855577][ C129] [<00000000949e52bf>] cq_interrupt_v3_hw [hisi_sas_v3_hw] threaded [<000000005d8e3b68>] cq_thread_v3_hw [hisi_sas_v3_hw]
[ 4682.868084][ C129] Disabling IRQ #134
When the IRQ management layer processes each hardware interrupt, if the
return value of the interrupt handler is IRQ_WAKE_THREAD, it will wake
up the handler thread for this interrupt action and set IRQTF_RUNTHREAD
flag, wait for the interrupt handling thread to clear the
IRQTF_RUNTHREAD flag after execution. Later in note_interrupt(), use
irq_count to count hardware interrupts and irqs_unhandled to count
interrupts for which no thread handler is responsible. When irq_count
reaches 100000 and irqs_unhandled reaches 99000, irq will be disabled.
In the performance test scenario, I/O completion hardware interrupts are
continuously and quickly generated. As a result, the interrupt
processing thread is cyclically called in irq_thread() and does not
exit, this affects the response of the interrupt thread to the hardware
interrupt and causes irqs_unhandled to grow to 99000. Finally, the irq
is disabled.
Therefore, default enable interrupt coalescing to reduce the generation
of hardware interrupts, this helps interrupt processing threads to stop
calling in irq_thread().
For interrupt coalescing, according to the actual performance test, set
the count of CQ entries to 10 and the interrupt coalescing timeout
period to 10us based on the actual performance test.
Before and after interrupt coalescing is enabled, the 4K read/write
performance is improved by about 3%, and the 256K read/write performance
is basically the same.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-9-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
For no forced preemption model kernel, in the scenario where the
expander is connected to 12 high performance SAS SSDs, the following
call trace may occur:
[ 214.409199][ C240] watchdog: BUG: soft lockup - CPU#240 stuck for 22s! [irq/149-hisi_sa:3211]
[ 214.568533][ C240] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 214.575224][ C240] pc : fput_many+0x8c/0xdc
[ 214.579480][ C240] lr : fput+0x1c/0xf0
[ 214.583302][ C240] sp : ffff80002de2b900
[ 214.587298][ C240] x29: ffff80002de2b900 x28: ffff1082aa412000
[ 214.593291][ C240] x27: ffff3062a0348c08 x26: ffff80003a9f6000
[ 214.599284][ C240] x25: ffff1062bbac5c40 x24: 0000000000001000
[ 214.605277][ C240] x23: 000000000000000a x22: 0000000000000001
[ 214.611270][ C240] x21: 0000000000001000 x20: 0000000000000000
[ 214.617262][ C240] x19: ffff3062a41ae580 x18: 0000000000010000
[ 214.623255][ C240] x17: 0000000000000001 x16: ffffdb3a6efe5fc0
[ 214.629248][ C240] x15: ffffffffffffffff x14: 0000000003ffffff
[ 214.635241][ C240] x13: 000000000000ffff x12: 000000000000029c
[ 214.641234][ C240] x11: 0000000000000006 x10: ffff80003a9f7fd0
[ 214.647226][ C240] x9 : ffffdb3a6f0482fc x8 : 0000000000000001
[ 214.653219][ C240] x7 : 0000000000000002 x6 : 0000000000000080
[ 214.659212][ C240] x5 : ffff55480ee9b000 x4 : fffffde7f94c6554
[ 214.665205][ C240] x3 : 0000000000000002 x2 : 0000000000000020
[ 214.671198][ C240] x1 : 0000000000000021 x0 : ffff3062a41ae5b8
[ 214.677191][ C240] Call trace:
[ 214.680320][ C240] fput_many+0x8c/0xdc
[ 214.684230][ C240] fput+0x1c/0xf0
[ 214.687707][ C240] aio_complete_rw+0xd8/0x1fc
[ 214.692225][ C240] blkdev_bio_end_io+0x98/0x140
[ 214.696917][ C240] bio_endio+0x160/0x1bc
[ 214.701001][ C240] blk_update_request+0x1c8/0x3bc
[ 214.705867][ C240] scsi_end_request+0x3c/0x1f0
[ 214.710471][ C240] scsi_io_completion+0x7c/0x1a0
[ 214.715249][ C240] scsi_finish_command+0x104/0x140
[ 214.720200][ C240] scsi_softirq_done+0x90/0x180
[ 214.724892][ C240] blk_mq_complete_request+0x5c/0x70
[ 214.730016][ C240] scsi_mq_done+0x48/0xac
[ 214.734194][ C240] sas_scsi_task_done+0xbc/0x16c [libsas]
[ 214.739758][ C240] slot_complete_v3_hw+0x260/0x760 [hisi_sas_v3_hw]
[ 214.746185][ C240] cq_thread_v3_hw+0xbc/0x190 [hisi_sas_v3_hw]
[ 214.752179][ C240] irq_thread_fn+0x34/0xa4
[ 214.756435][ C240] irq_thread+0xc4/0x130
[ 214.760520][ C240] kthread+0x108/0x13c
[ 214.764430][ C240] ret_from_fork+0x10/0x18
This is because in the hisi_sas driver, both the hardware interrupt
handler and the interrupt thread are executed on the same CPU. In the
performance test scenario, function irq_wait_for_interrupt() will always
return 0 if lots of interrupts occurs and the CPU will be continuously
consumed. As a result, the CPU cannot run the watchdog thread. When the
watchdog time exceeds the specified time, call trace occurs.
To fix it, add cond_resched() to execute the watchdog thread.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-8-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
RPM_SUSPENDING
Users can suspend the machine with 'echo disk > /sys/power/state', but
the suspend will fail because the SAS controller cannot be suspended:
[root@localhost ~]# echo freeze > /sys/power/state
-bash: echo: write error: Device or resource busy
[15104.142955] PM: suspend entry (s2idle)
...
[15104.283465] hisi_sas_v3_hw 0000:32:04.0: entering suspend state
[15104.283480] hisi_sas_v3_hw 0000:30:04.0: entering suspend state
[15104.283500] hisi_sas_v3_hw 0000:32:04.0: PM suspend: host status cannot be suspended
[15104.283508] hisi_sas_v3_hw 0000:30:04.0: PM suspend: host status cannot be suspended
[15104.283516] hisi_sas_v3_hw 0000:32:04.0: PM: pci_pm_suspend(): suspend_v3_hw+0x0/0x210 [hisi_sas_v3_hw] returns -16
[15104.283527] hisi_sas_v3_hw 0000:32:04.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -16
[15104.283524] hisi_sas_v3_hw 0000:30:04.0: PM: pci_pm_suspend(): suspend_v3_hw+0x0/0x210 [hisi_sas_v3_hw] returns -16
[15104.283533] hisi_sas_v3_hw 0000:32:04.0: PM: failed to suspend async: error -16
[15104.283536] hisi_sas_v3_hw 0000:30:04.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -16
[15104.283542] hisi_sas_v3_hw 0000:30:04.0: PM: failed to suspend async: error -16
The problem is that when the ->runtime_suspend() callback
suspend_v3_hw() is executing, the current runtime PM status is
RPM_ACTIVE and the usage count of the controller is not 0, so return
immediately.
To fix it, Check the device usage count only when the runtime PM status
is RPM_SUSPENDING.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-7-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In commit 89954f024c3a ("scsi: hisi_sas: Ensure all enabled PHYs up
during controller reset"), we enable PHYs in parallel through async
operations and wait for PHYs come up. However, for some directly
attached SATA disks, the PHY not come up after a timeout period and the
hardware is not ready. At this time, we should get the latest PHY
hardware state, if the new PHY state is not ready but the old PHY state
is ready, call work HISI_PHYE_LINK_RESET to give it another chance to
phyup.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-6-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
controller reset
For the controller reset operation(such as FLR or clear nexus ha in SCSI
EH), we will disable all PHYs and then enable PHY based on the
hisi_hba->phy_state obtained in hisi_sas_controller_reset_prepare(). If
the device is removed before controller reset or the PHY is not attached
to any device in directly attached scenario, the corresponding bit of
phy_state is not set. After controller reset done, the PHY is disabled.
The device cannot be identified even if user reconnect the disk.
Therefore, for PHYs that are not disabled by user, hisi_sas_phy_enable()
needs to be executed even if the corresponding bit of phy_state is not
set.
Fixes: 89954f024c3a ("scsi: hisi_sas: Ensure all enabled PHYs up during controller reset")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-5-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
For security purposes, after information is obtained through the FW,
check information to ensure data correctness.
- In v1 and v2 hw, the maximum number of PHYs is 9, while in v3 it is 8.
- In v2 and v3 hw, the maximum number of hardware queues is 16, while
in v1 it is 32.
Also add some debug logs for failure.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-4-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In the current debugfs initialization process, the interface
trigger_dump is created first, and then the dump directory is created to
store the register dump information.
The issue is that after the trigger_dump interface is created, users can
access the interface to trigger dump and call
debugfs_create_files_v3_hw(). In debugfs_create_files_v3_hw(), if
.debugfs_dump_dentry is NULL, the file for storing dump information is
created under /sys/kernel/debug, and the memory and information cannot
be released after the driver is uninstalled.
Therefore, the creation of the trigger_dump interface is placed at the
end of debugfs initialization.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-3-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
To be safe, we should register debugfs at the last stage of driver
initialization and then unregister debugfs at the first stage of driver
uninstallation.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-2-liyihang9@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is a null-ptr-deref issue reported by KASAN:
BUG: KASAN: null-ptr-deref in target_alloc_device+0xbc4/0xbe0 [target_core_mod]
...
kasan_report+0xb9/0xf0
target_alloc_device+0xbc4/0xbe0 [target_core_mod]
core_dev_setup_virtual_lun0+0xef/0x1f0 [target_core_mod]
target_core_init_configfs+0x205/0x420 [target_core_mod]
do_one_initcall+0xdd/0x4e0
...
entry_SYSCALL_64_after_hwframe+0x76/0x7e
In target_alloc_device(), if allocing memory for dev queues fails, then
dev will be freed by dev->transport->free_device(), but dev->transport
is not initialized at that time, which will lead to a null pointer
reference problem.
Fixing this bug by freeing dev with hba->backend->ops->free_device().
Fixes: 1526d9f10c61 ("scsi: target: Make state_list per CPU")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://lore.kernel.org/r/20241011113444.40749-1-wanghai38@huawei.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
A sanity check on phy_mask was added in commit 3668651def2c ("scsi:
mpi3mr: Sanitise num_phys"). This causes warning messages when more than
64 phys are detected and devices connected to phys greater than 64 are
dropped.
The phy_mask bitmap is only needed for controller phys and not required
for expander phys. Controller phys can go up to a maximum of 64 and
therefore u64 is good enough to contain phy_mask bitmap.
To suppress those warnings and allow devices to be discovered as before
the offending commit, restrict the phy_mask setting and lowest phy
setting only to the controller phys.
Fixes: 3668651def2c ("scsi: mpi3mr: Sanitise num_phys")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410051943.Mp9o5DlF-lkp@intel.com/
Reported-by: Alexander Motin <mav@ixsystems.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20241008074353.200379-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is a history of deadlock if reboot is performed at the beginning
of booting. SDEV_QUIESCE was set for all LU's scsi_devices by UFS
shutdown, and at that time the audio driver was waiting on
blk_mq_submit_bio() holding a mutex_lock while reading the fw binary.
After that, a deadlock issue occurred while audio driver shutdown was
waiting for mutex_unlock of blk_mq_submit_bio(). To solve this, set
SDEV_OFFLINE for all LUs except WLUN, so that any I/O that comes down
after a UFS shutdown will return an error.
[ 31.907781]I[0: swapper/0: 0] 1 130705007 1651079834 11289729804 0 D( 2) 3 ffffff882e208000 * init [device_shutdown]
[ 31.907793]I[0: swapper/0: 0] Mutex: 0xffffff8849a2b8b0: owner[0xffffff882e28cb00 kworker/6:0 :49]
[ 31.907806]I[0: swapper/0: 0] Call trace:
[ 31.907810]I[0: swapper/0: 0] __switch_to+0x174/0x338
[ 31.907819]I[0: swapper/0: 0] __schedule+0x5ec/0x9cc
[ 31.907826]I[0: swapper/0: 0] schedule+0x7c/0xe8
[ 31.907834]I[0: swapper/0: 0] schedule_preempt_disabled+0x24/0x40
[ 31.907842]I[0: swapper/0: 0] __mutex_lock+0x408/0xdac
[ 31.907849]I[0: swapper/0: 0] __mutex_lock_slowpath+0x14/0x24
[ 31.907858]I[0: swapper/0: 0] mutex_lock+0x40/0xec
[ 31.907866]I[0: swapper/0: 0] device_shutdown+0x108/0x280
[ 31.907875]I[0: swapper/0: 0] kernel_restart+0x4c/0x11c
[ 31.907883]I[0: swapper/0: 0] __arm64_sys_reboot+0x15c/0x280
[ 31.907890]I[0: swapper/0: 0] invoke_syscall+0x70/0x158
[ 31.907899]I[0: swapper/0: 0] el0_svc_common+0xb4/0xf4
[ 31.907909]I[0: swapper/0: 0] do_el0_svc+0x2c/0xb0
[ 31.907918]I[0: swapper/0: 0] el0_svc+0x34/0xe0
[ 31.907928]I[0: swapper/0: 0] el0t_64_sync_handler+0x68/0xb4
[ 31.907937]I[0: swapper/0: 0] el0t_64_sync+0x1a0/0x1a4
[ 31.908774]I[0: swapper/0: 0] 49 0 11960702 11236868007 0 D( 2) 6 ffffff882e28cb00 * kworker/6:0 [__bio_queue_enter]
[ 31.908783]I[0: swapper/0: 0] Call trace:
[ 31.908788]I[0: swapper/0: 0] __switch_to+0x174/0x338
[ 31.908796]I[0: swapper/0: 0] __schedule+0x5ec/0x9cc
[ 31.908803]I[0: swapper/0: 0] schedule+0x7c/0xe8
[ 31.908811]I[0: swapper/0: 0] __bio_queue_enter+0xb8/0x178
[ 31.908818]I[0: swapper/0: 0] blk_mq_submit_bio+0x194/0x67c
[ 31.908827]I[0: swapper/0: 0] __submit_bio+0xb8/0x19c
Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun")
Cc: stable@vger.kernel.org
Signed-off-by: Seunghwan Baek <sh8267.baek@samsung.com>
Link: https://lore.kernel.org/r/20240829093913.6282-2-sh8267.baek@samsung.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
After the SQ cleanup fix, the CQ will receive a response with the
corresponding tag marked as OCS: ABORTED. To align with the behavior of
Legacy SDB mode, the handling of OCS: ABORTED has been changed to match
that of OCS_INVALID_COMMAND_STATUS (SDB), with both returning a SCSI
result of DID_REQUEUE.
Furthermore, the workaround implemented before the SQ cleanup fix can be
removed.
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Link: https://lore.kernel.org/r/20241001091917.6917-3-peter.wang@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When setting the ICU bit without using read-modify-write, SQRTCy will
restart SQ again and receive an RTC return error code 2 (Failure - SQ
not stopped).
Additionally, the error log has been modified so that this type of error
can be observed.
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Link: https://lore.kernel.org/r/20241001091917.6917-2-peter.wang@mediatek.com
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
linux@treblig.org says:
Hi,
This removes a pile of dead functions in the SCSI bfa driver.
These were spotted by hunting for unused symbols in a unmodular
kernel build, and then double checking by grepping for the function
name.
It's been build tested only, I don't have the hardware, but
it's strictly full function (and the occasional struct) deletion,
so there should be no change in functionality.
Thanks to David Hildenbrand for the suggestion of hunting
for unused symbols.
Dave
Link: https://lore.kernel.org/r/20240915125633.25036-1-linux@treblig.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Some more unused functions that didn't group elsewhere.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20240915125633.25036-6-linux@treblig.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
These functions aren't called anywhere, remove them.
Build tested only.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20240915125633.25036-5-linux@treblig.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
These functions aren't called anywhere, remove them.
Build tested only.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20240915125633.25036-4-linux@treblig.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|