summaryrefslogtreecommitdiff
path: root/drivers/scsi/hisi_sas
AgeCommit message (Collapse)Author
2022-01-31scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internalJohn Garry
The hisi_sas_slot.is_internal member is not set properly for ATA commands which the driver sends directly. A TMF struct pointer is normally used as a test to set this, but it is NULL for those commands. It's not ideal, but pass an empty TMF struct to set that member properly. Link: https://lore.kernel.org/r/1643627607-138785-1-git-send-email-john.garry@huawei.com Fixes: dc313f6b125b ("scsi: hisi_sas: Factor out task prep and delivery code") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24scsi: hisi_sas: Remove useless DMA-32 fallback configurationChristophe JAILLET
As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32-bit case will also fail for the same reason. Simplify code and remove some dead code accordingly. [1]: https://lore.kernel.org/linux-kernel/YL3vSPK5DXTNvgdx@infradead.org/#t Link: https://lore.kernel.org/r/1bf2d3660178b0e6f172e5208bc0bd68d31d9268.1642237482.git.christophe.jaillet@wanadoo.fr Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-07scsi: hisi_sas: Remove unused variable and check in ↵Xiang Chen
hisi_sas_send_ata_reset_each_phy() In commit 29e2bac87421 ("scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list"), we use asd_sas_port->phy_mask instead of accessing asd_sas_port->phy_list, and it is enough to use asd_sas_port->phy_mask to check the state of phy, so remove the unused check and variable. Link: https://lore.kernel.org/r/1641300126-53574-1-git-send-email-chenxiang66@hisilicon.com Fixes: 29e2bac87421 ("scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list") Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Colin King <colin.i.king@gmail.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: hisi_sas: Use autosuspend for the host controllerXiang Chen
The controller may frequently enter and exit suspend for each I/O which we need to deal with. This is inefficient and may cause too much suspend and resume activity for the controller. To avoid this, use a default 5s autosuspend for the controller to stop frequently suspending and resuming. This value may still be modified via sysfs interfaces. Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: hisi_sas: Keep controller active between ISR of phyup and the event ↵Xiang Chen
being processed It is possible that controller may become suspended between processing a phyup interrupt and the event being processed by libsas. As such, we can't ensure the controller is active when processing the phyup event - this may cause the phyup event to be lost or other issues. To avoid any possible issues, add pm_runtime_get_noresume() in phyup interrupt handler and pm_runtime_put_sync() in the work handler exit to ensure that we stay always active. Since we only want to call pm_runtime_get_noresume() for v3 hw, signal this will a new event, HISI_PHYE_PHY_UP_PM. Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: hisi_sas: Add more logs for runtime suspend/resumeXiang Chen
Add some logs at the beginning and end of suspend/resume. Link: https://lore.kernel.org/r/1639999298-244569-9-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_listXiang Chen
Most places that use asd_sas_port->phy_list are protected by spinlock asd_sas_port->phy_list_lock, however there are still some places which miss grabbing the lock. Add it in function hisi_sas_refresh_port_id() when accessing asd_sas_port->phy_list. This carries a risk that list mutates while at the same time dropping the lock in function hisi_sas_send_ata_reset_each_phy(). Read asd_sas_port->phy_mask instead of accessing asd_sas_port->phy_list to avoid this risk. Link: https://lore.kernel.org/r/1639999298-244569-6-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: Revert "scsi: hisi_sas: Filter out new PHY up events during suspend"John Garry
This reverts commit b14a37e011d829404c29a5ae17849d7efb034893. In that commit, we had to filter out phy-up events during suspend, as it work cause a deadlock between processing the phyup event and the resume HA function try to drain the HA event workqueue to complete the resume process. Now that we no longer try to drain the HA event queue during the HA resume processor, the deadlock would not occur, so remove the special handling for it. Link: https://lore.kernel.org/r/1639999298-244569-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22scsi: libsas: Don't always drain event workqueue for HA resumeJohn Garry
For the hisi_sas driver, if a directly attached disk is removed during suspend, a hang will occur in the resume process: The background is that in commit 16fd4a7c5917 ("scsi: hisi_sas: Add device link between SCSI devices and hisi_hba"), it is ensured that the HBA device cannot be runtime suspended when any SCSI device associated is active. Other drivers which use libsas don't worry about this as none support runtime suspend. The mentioned hang occurs when an disk is removed during suspend. In the removal process - from PHYE_RESUME_TIMEOUT event processing - we call into scsi_remove_device(), which is being processed in the HA event workqueue. Here we wait for all suppliers of the SCSI device to resume, which includes the HBA device (from the above commit). However the HBA device cannot resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from calling sas_resume_ha() -> sas_drain_work()). This is the deadlock. There does not appear to be any need for the sas_drain_work() to be called at all in sas_resume_ha() as it is not syncing against anything, so allow LLDDs to avoid this by providing a variant of sas_resume_ha() which does "sync", i.e. doesn't drain the event workqueue. Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Fix phyup timeout on FPGAQi Liu
The OOB interrupt and phyup interrupt handlers may run out-of-order in high CPU usage scenarios. Since the hisi_sas_phy.timer is added in hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order execution will cause hisi_sas_phy.timer timeout to trigger. To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure that the timer won't be added after phyup handler completes. Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Prevent parallel FLR and controller resetQi Liu
If we issue a controller reset command during executing a FLR a hung task may be found: Call trace: __switch_to+0x158/0x1cc __schedule+0x2e8/0x85c schedule+0x7c/0x110 schedule_timeout+0x190/0x1cc __down+0x7c/0xd4 down+0x5c/0x7c hisi_sas_task_exec+0x510/0x680 [hisi_sas_main] hisi_sas_queue_command+0x24/0x30 [hisi_sas_main] smp_execute_task_sg+0xf4/0x23c [libsas] sas_smp_phy_control+0x110/0x1e0 [libsas] transport_sas_phy_reset+0xc8/0x190 [libsas] phy_reset_work+0x2c/0x40 [libsas] process_one_work+0x1dc/0x48c worker_thread+0x15c/0x464 kthread+0x160/0x170 ret_from_fork+0x10/0x18 This is a race condition which occurs when the FLR completes first. Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem . Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Prevent parallel controller reset and control phy commandQi Liu
A user may issue a control phy command from sysfs at any time, even if the controller is resetting. If a phy is disabled by hardreset/linkreset command before calling get_phys_state() in the reset path, the saved phy state may be incorrect. To avoid incorrectly recording the phy state, use hisi_hba.sem to ensure that the controller reset may not run at the same time as when the phy control function is running. Link: https://lore.kernel.org/r/1639579061-179473-6-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Factor out task prep and delivery codeJohn Garry
The task prep code is the same between the normal path (in hisi_sas_task_prep()) and the internal abort path, so factor is out into a common function. Link: https://lore.kernel.org/r/1639579061-179473-5-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Pass abort structure for internal abortJohn Garry
To help factor out code in future, it's useful to know if we're executing an internal abort, so pass a pointer to the structure. The idea is that a NULL pointer means not an internal abort. Link: https://lore.kernel.org/r/1639579061-179473-4-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Make internal abort have no task protoJohn Garry
For an internal abort, the task does not have a protocol, so set to none. This will make it easier to differentiate internal abort tasks in future. Link: https://lore.kernel.org/r/1639579061-179473-3-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16scsi: hisi_sas: Start delivery hisi_sas_task_exec() directlyJohn Garry
Currently we start delivery of commands to the DQ after returning from hisi_sas_task_exec() with success. Let's just start delivery directly in that function without having to check if some local variable is set. Link: https://lore.kernel.org/r/1639579061-179473-2-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-06scsi: hisi_sas: Use non-atomic bitmap functions when possibleChristophe JAILLET
All uses of the 'hisi_hba->slot_index_tags' bitmap are protected with the 'hisi_hba->lock' spinlock. Prefer the non-atomic '__[set|clear]_bit()' functions to save a few cycles. Link: https://lore.kernel.org/r/8ee33e463523db080e6a2c06f332e47abb69359b.1637961191.git.christophe.jaillet@wanadoo.fr Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-06scsi: hisi_sas: Remove some useless code in hisi_sas_alloc()Christophe JAILLET
The 'hisi_hba->slot_index_tags' bitmap is allocated with bitmap_zalloc() so it is already cleared. There is no need to clear it another time, one bit at a time. Remove the corresponding useless code. Link: https://lore.kernel.org/r/41c86e7e3e05a13bd586d8ee1b81296140b7a6eb.1637961191.git.christophe.jaillet@wanadoo.fr Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-06scsi: hisi_sas: Use devm_bitmap_zalloc() when applicableChristophe JAILLET
'hisi_hba->slot_index_tags' is a bitmap. Use 'devm_bitmap_zalloc()' to simplify code, improve the semantic, and avoid some open-coded arithmetic in allocator arguments. Link: https://lore.kernel.org/r/4afa3f71e66c941c660627c7f5b0223b51968ebb.1637961191.git.christophe.jaillet@wanadoo.fr Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-11-29scsi: Remove superfluous #include <linux/async.h> directivesBart Van Assche
Remove this include directive from code that does not use any functionality from kernel/async.c. Link: https://lore.kernel.org/r/20211129194609.3466071-13-bvanassche@acm.org Reviewed-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-11-05Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This consists of the usual driver updates (ufs, smartpqi, lpfc, target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug fixes. Notable core changes are the removal of scsi->tag which caused some churn in obsolete drivers and a sweep through all drivers to call scsi_done() directly instead of scsi->done() which removes a pointer indirection from the hot path and a move to register core sysfs files earlier, which means they're available to KOBJ_ADD processing, which necessitates switching all drivers to using attribute groups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: lpfc: Update lpfc version to 14.0.0.3 scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss scsi: lpfc: Fix link down processing to address NULL pointer dereference scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine scsi: lpfc: Correct sysfs reporting of loop support after SFP status change scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup() scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer scsi: ufs: mediatek: Avoid sched_clock() misuse scsi: mpt3sas: Make mpt3sas_dev_attrs static scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions scsi: target: core: Stop using bdevname() scsi: aha1542: Use memcpy_{from,to}_bvec() scsi: sr: Add error handling support for add_disk() scsi: sd: Add error handling support for add_disk() scsi: target: Perform ALUA group changes in one step scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path scsi: target: Fix alua_tg_pt_gps_count tracking scsi: target: Fix ordered tag handling ...
2021-10-18block: drop unused includes in <linux/blkdev.h>Christoph Hellwig
Drop various include not actually used in blkdev.h itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-16scsi: hisi_sas: Switch to attribute groupsBart Van Assche
struct device supports attribute groups directly but does not support struct device_attribute directly. Hence switch to attribute groups. Link: https://lore.kernel.org/r/20211012233558.4066756-21-bvanassche@acm.org Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-12scsi: hisi_sas: Disable SATA disk phy for severe I_T nexus reset failureLuo Jiaxing
If the softreset fails in the I_T reset, libsas will then continue to issue a controller reset to try to recover. However a faulty disk may cause the softreset to fail, and resetting the controller will not help this scenario. Indeed, we will just continue the cycle of error handle handling to try to recover. So if the softreset fails upon certain conditions, just disable the phy associated with the disk. The user needs to handle this problem. Link: https://lore.kernel.org/r/1634041588-74824-5-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-12scsi: hisi_sas: Wait for phyup in hisi_sas_control_phy()Xiang Chen
When issuing a hardreset/linkreset/phy_set_linkrate from sysfs, the phy will be disabled and re-enabled for the directly attached scenario. It takes some time for the phy to come back up after re-enabling the phy. If the controller becomes suspended while waiting for the phy to come back, the phy up may be lost (along with the disk). To solve this problem, wait for the phy up to occur with a timeout. Indeed this is already done in hisi_sas_debug_I_T_nexus_reset() for local phys, so just relocate the functionality to hisi_sas_control_phy(). Since the HA workqueue is drained when suspending the controller, and the phy control function is called from the same workqueue, we can guarantee that the controller will not be suspended during this period. Link: https://lore.kernel.org/r/1634041588-74824-3-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-12scsi: hisi_sas: Initialise devices in .slave_alloc callbackXiang Chen
Perform driver-specific SCSI device initialization in the designated SCSI midlayer callback instead of relying on the libsas "device found" callback. The SCSI midlayer .slave_alloc interface is called prior to sending any I/O to the device. Link: https://lore.kernel.org/r/1634041588-74824-2-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13scsi: hisi_sas: Increase debugfs_dump_index after dump is completedLuo Jiaxing
The hisi_hba debugfs_dump_index member should increased after a dump insertion completed, and not before it has started, so fix the code to do so. Link: https://lore.kernel.org/r/1629799260-120116-6-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13scsi: hisi_sas: Replace del_timer() calls with del_timer_sync()Xiang Chen
Some usage of del_timer() in the driver is potentially unsafe. When running the sas_task->slow_task timer in hisi_sas_exec_internal_tmf_task(), execution may be blocked in function hisi_sas_task_exec(); so it is possible that the timer is running when the callback to disable the timer is running. This could be dangerous, as we immediately release resources which the timer callback uses after disabling the timer. The same situation may be found at other sites, such as _hisi_sas_internal_task_abort(). Change calls to del_timer() to del_timer_sync() as necessary, to ensure any timer has finished when disabling. Also remove calls to timer_pending() prior to del_timer() as it is not necessary. Link: https://lore.kernel.org/r/1629799260-120116-5-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13scsi: hisi_sas: Rename HISI_SAS_{RESET -> RESETTING}_BITLuo Jiaxing
HISI_SAS_RESET_BIT means that the controller is being reset, and so the name is a bit vague. Rename it to HISI_SAS_RESETTING_BIT. Link: https://lore.kernel.org/r/1629799260-120116-4-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13scsi: hisi_sas: Stop printing queue count in v3 hardware probeJohn Garry
The number of hardware queues is available from sysfs. Remove the print in the v3 hardware probe function. Link: https://lore.kernel.org/r/1629799260-120116-3-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13scsi: hisi_sas: Use managed PCI functionsXiang Chen
Use managed PCI functions such as pcim_enable_device() and pcim_iomap_regions() to simplify exception handling code. Link: https://lore.kernel.org/r/1629799260-120116-2-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-11scsi: hisi_sas: Use scsi_cmd_to_rq() instead of scsi_cmnd.requestBart Van Assche
Prepare for removal of the request pointer by using scsi_cmd_to_rq() instead. This patch does not change any functionality. Link: https://lore.kernel.org/r/20210809230355.8186-22-bvanassche@acm.org Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-11Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "This is a set of minor fixes and clean ups in the core and various drivers. The only core change in behaviour is the I/O retry for spinup notify, but that shouldn't impact anything other than the failing case" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits) scsi: virtio_scsi: Add validation for residual bytes from response scsi: ipr: System crashes when seeing type 20 error scsi: core: Retry I/O for Notify (Enable Spinup) Required error scsi: mpi3mr: Fix warnings reported by smatch scsi: qedf: Add check to synchronize abort and flush scsi: MAINTAINERS: Add mpi3mr driver maintainers scsi: libfc: Fix array index out of bound exception scsi: mvsas: Use DEVICE_ATTR_RO()/RW() macro scsi: megaraid_mbox: Use DEVICE_ATTR_ADMIN_RO() macro scsi: qedf: Use DEVICE_ATTR_RO() macro scsi: qedi: Use DEVICE_ATTR_RO() macro scsi: message: mptfc: Switch from pci_ to dma_ API scsi: be2iscsi: Fix some missing space in some messages scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe() scsi: ufs: Fix build warning without CONFIG_PM scsi: bnx2fc: Remove meaningless bnx2fc_abts_cleanup() return value assignment scsi: qla2xxx: Add heartbeat check scsi: virtio_scsi: Do not overwrite SCSI status scsi: libsas: Add LUN number check in .slave_alloc callback scsi: core: Inline scsi_mq_alloc_queue() ...
2021-07-02Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, ibmvfc, megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with elx and mpi3mr being new drivers. The major core change is a rework to drop the status byte handling macros and the old bit shifted definitions and the rest of the updates are minor fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits) scsi: aha1740: Avoid over-read of sense buffer scsi: arcmsr: Avoid over-read of sense buffer scsi: ips: Avoid over-read of sense buffer scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe() scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame() scsi: elx: libefc: Fix less than zero comparison of a unsigned int scsi: elx: efct: Fix pointer error checking in debugfs init scsi: elx: efct: Fix is_originator return code type scsi: elx: efct: Fix link error for _bad_cmpxchg scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel() scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session() scsi: elx: efct: Fix error handling in efct_hw_init() scsi: elx: efct: Remove redundant initialization of variable lun scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected" scsi: lpfc: Fix build error in lpfc_scsi.c scsi: target: iscsi: Remove redundant continue statement scsi: qla4xxx: Remove redundant continue statement scsi: ppa: Switch to use module_parport_driver() scsi: imm: Switch to use module_parport_driver() scsi: mpt3sas: Fix error return value in _scsih_expander_add() ...
2021-06-22scsi: libsas: Add LUN number check in .slave_alloc callbackYufen Yu
Offlining a SATA device connected to a hisi SAS controller and then scanning the host will result in detecting 255 non-existent devices: # lsscsi [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb [2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc # echo "offline" > /sys/block/sdb/device/state # echo "- - -" > /sys/class/scsi_host/host2/scan # lsscsi [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb [2:0:1:1] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdh ... [2:0:1:255] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdjb After a REPORT LUN command issued to the offline device fails, the SCSI midlayer tries to do a sequential scan of all devices whose LUN number is not 0. However, SATA does not support LUN numbers at all. Introduce a generic sas_slave_alloc() handler which will return -ENXIO for SATA devices if the requested LUN number is larger than 0 and make libsas drivers use this function as their .slave_alloc callback. Link: https://lore.kernel.org/r/20210622034037.1467088-1-yuyufen@huawei.com Reported-by: Wu Bo <wubo40@huawei.com> Suggested-by: John Garry <john.garry@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: hisi_sas: Speed up error handling when internal abort timeout occursLuo Jiaxing
If an internal task abort timeout occurs, the controller has developed a fault, and needs to be reset to be recovered. When this occurs during error handling, the current policy is to allow error handling to continue, and the inevitable nexus ha reset will handle the required reset. However various steps of error handling need to taken before this happens. These also involve some level of HW interaction, which will also fail with various timeouts. Speed up this process by recording a HW fault bit for an internal abort timeout - when this is set, just automatically error any HW interaction, and essentially go straight to clear nexus ha (to reset the controller). Link: https://lore.kernel.org/r/1623058179-80434-6-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: hisi_sas: Reset controller for internal abort timeoutLuo Jiaxing
If an internal task abort timeout occurs, the controller has developed a fault, and needs to be reset to be recovered. However if a timeout occurs during SCSI error handling, issuing a controller reset immediately may conflict with the error handling. To handle internal abort in these two scenarios, only queue the reset when not in an error handling function. In the case of a timeout during error handling, do nothing and rely on the inevitable ha nexus reset to reset the controller. Link: https://lore.kernel.org/r/1623058179-80434-5-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: hisi_sas: Include HZ in timer macrosLuo Jiaxing
Include HZ in timer macros to make the code more concise. Link: https://lore.kernel.org/r/1623058179-80434-4-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: hisi_sas: Run I_T nexus resets in parallel for clear nexus resetLuo Jiaxing
For a clear nexus reset operation, the I_T nexus resets are executed serially for each device. For devices attached through an expander, this may take 2s per device; so, in total, could take a long time. Reduce the total time by running the I_T nexus resets in parallel through async operations. Link: https://lore.kernel.org/r/1623058179-80434-3-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: hisi_sas: Put a limit of link reset retriesLuo Jiaxing
If an OOB event is received but the phy still fails to come up, a link reset will be issued repeatedly at an interval of 20s until the phy comes up. Set a limit for link reset issue retries to avoid printing the timeout message endlessly. Link: https://lore.kernel.org/r/1623058179-80434-2-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: libsas: Introduce more SAM status code aliases in enum exec_statusBart Van Assche
This patch prepares for converting SAM status codes into an enum. Without this patch converting SAM status codes into an enumeration type would trigger complaints about enum type mismatches for the SAS code. Link: https://lore.kernel.org/r/20210524025457.11299-2-bvanassche@acm.org Cc: Hannes Reinecke <hare@suse.com> Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Cc: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-21scsi: hisi_sas: Drop free_irq() of devm_request_irq() allocated irqYang Yingliang
irqs allocated with devm_request_irq() should not be freed using free_irq(). Doing so causes a dangling pointer and a subsequent double free. Link: https://lore.kernel.org/r/20210519130519.2661938-1-yangyingliang@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-21scsi: hisi_sas: Propagate errors in interrupt_init_v1_hw()Sergey Shtylyov
After commit 6c11dc060427 ("scsi: hisi_sas: Fix IRQ checks") we have the error codes returned by platform_get_irq() ready for the propagation upsream in interrupt_init_v1_hw() -- that will fix still broken deferred probing. Let's propagate the error codes from devm_request_irq() as well since I don't see the reason to override them with -ENOENT... Link: https://lore.kernel.org/r/49ba93a3-d427-7542-d85a-b74fe1a33a73@omp.ru Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13scsi: hisi_sas: Fix IRQ checksSergey Shtylyov
Commit df2d8213d9e3 ("hisi_sas: use platform_get_irq()") failed to take into account that irq_of_parse_and_map() and platform_get_irq() have a different way of indicating an error: the former returns 0 and the latter returns a negative error code. Fix up the IRQ checks! Link: https://lore.kernel.org/r/810f26d3-908b-1d6b-dc5c-40019726baca@omprussia.ru Fixes: df2d8213d9e3 ("hisi_sas: use platform_get_irq()") Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12scsi: hisi_sas: Print SATA device SAS address for soft reset failureLuo Jiaxing
Add (pseudo) SAS address for ATA software reset failure log to assist in debugging. Link: https://lore.kernel.org/r/1617709711-195853-7-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12scsi: hisi_sas: Warn in v3 hw channel interrupt handler when status reg clearedLuo Jiaxing
If a channel interrupt occurs without any status bit set, the handler will return directly. However, if such redundant interrupts are received, it's better to check what happen, so add logs for this. Link: https://lore.kernel.org/r/1617709711-195853-6-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: Yihang Li <liyihang6@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12scsi: hisi_sas: Directly snapshot registers when executing a resetJianqin Xie
The debugfs snapshot should be executed before the reset occurs to ensure that the register contents are saved properly. As such, it is incorrect to queue the debugfs dump when running a reset as the reset will occur prior to the snapshot work item is handler. Therefore, directly snapshot registers in the reset work handler. Link: https://lore.kernel.org/r/1617709711-195853-5-git-send-email-john.garry@huawei.com Signed-off-by: Jianqin Xie <xiejianqin@hisilicon.com> Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12scsi: hisi_sas: Call sas_unregister_ha() to roll back if .hw_init() failsXiang Chen
Function sas_unregister_ha() needs to be called to roll back if hisi_hba->hw->hw_init() fails in function hisi_sas_probe() or hisi_sas_v3_probe(). Make that change. Link: https://lore.kernel.org/r/1617709711-195853-4-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12scsi: hisi_sas: Print SAS address for v3 hw erroneous completion printLuo Jiaxing
To help debugging efforts, print the device SAS address for v3 hw erroneous completion log. Here is an example print: hisi_sas_v3_hw 0000:b4:02.0: erroneous completion iptt=2193 task=000000002b0c13f8 dev id=17 addr=570fd45f9d17b001 Link: https://lore.kernel.org/r/1617709711-195853-3-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12scsi: hisi_sas: Delete some unused callbacksLuo Jiaxing
The debugfs code has been relocated to v3 hw driver, so delete unused struct hisi_sas_hw function pointers snapshot_{prepare, restore}. Link: https://lore.kernel.org/r/1617709711-195853-2-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>