linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-02-07	scsi: core: Add scsi_done_direct() for immediate completion	Sebastian Andrzej Siewior
	Add scsi_done_direct() which behaves like scsi_done() except that it invokes blk_mq_complete_request_direct() in order to complete the request. Callers from process context can complete the request directly instead waking ksoftirqd. Link: https://lore.kernel.org/r/Yfw7JaszshmfYa1d@flow Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-07	scsi: core: Make "access_state" sysfs attribute always visible	Martin Wilck
	If a SCSI device handler module is loaded after some SCSI devices have already been probed (e.g. via request_module() by dm-multipath), the "access_state" and "preferred_path" sysfs attributes remain invisible for these devices, although the handler is attached and live. The reason is that the visibility is only checked when the sysfs attribute group is first created. This results in an inconsistent user experience depending on the load order of SCSI low-level drivers vs. device handler modules. This patch changes user space API: attempting to read the "access_state" or "preferred_path" attributes will now result in -EINVAL rather than -ENODEV for devices that have no device handler, and tests for the existence of these attributes will have a different result. Link: https://lore.kernel.org/r/20220127141351.30706-1-mwilck@suse.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: lpfc: Remove redundant flush_workqueue() call	Minghao Chi (CGEL ZTE)
	destroy_workqueue() already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant flush_workqueue() call. Link: https://lore.kernel.org/r/20220127014330.1185114-1-chi.minghao@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: qedi: Remove redundant flush_workqueue() calls	Minghao Chi (CGEL ZTE)
	destroy_workqueue() already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant flush_workqueue() calls. Link: https://lore.kernel.org/r/20220127013934.1184923-1-chi.minghao@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: bfa: Replace snprintf() with sysfs_emit()	Yang Guang
	coccinelle report: ./drivers/scsi/bfa/bfad_attr.c:908:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:860:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:888:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:853:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:808:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:728:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:822:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:927:9-17: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:900:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:874:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:714:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:839:8-16: WARNING: use scnprintf or sprintf Use sysfs_emit() instead of scnprintf() or sprintf(). Link: https://lore.kernel.org/r/def83ff75faec64ba592b867a8499b1367bae303.1643181468.git.yang.guang5@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: David Yang <davidcomponentone@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: mvsas: Replace snprintf() with sysfs_emit()	Yang Guang
	coccinelle report: ./drivers/scsi/mvsas/mv_init.c:699:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/mvsas/mv_init.c:747:8-16: WARNING: use scnprintf or sprintf Use sysfs_emit() instead of scnprintf() or sprintf(). Link: https://lore.kernel.org/r/c1711f7cf251730a8ceb5bdfc313bf85662b3395.1643182948.git.yang.guang5@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: David Yang <davidcomponentone@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: bnx2fc: Make use of the helper macro kthread_run()	Yin Xiujiang
	Repalce kthread_create/wake_up_process() with kthread_run() to simplify the code. Link: https://lore.kernel.org/r/20220126014248.466806-1-yinxiujiang@kylinos.cn Signed-off-by: Yin Xiujiang <yinxiujiang@kylinos.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal	John Garry
	The hisi_sas_slot.is_internal member is not set properly for ATA commands which the driver sends directly. A TMF struct pointer is normally used as a test to set this, but it is NULL for those commands. It's not ideal, but pass an empty TMF struct to set that member properly. Link: https://lore.kernel.org/r/1643627607-138785-1-git-send-email-john.garry@huawei.com Fixes: dc313f6b125b ("scsi: hisi_sas: Factor out task prep and delivery code") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: bnx2fc: Fix typo in comments	Cai Huoqing
	Replace 'Offlaod' with 'Offload'. Link: https://lore.kernel.org/r/20220128063101.6953-1-cai.huoqing@linux.dev Acked-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: ufs: Add checking lifetime attribute for WriteBooster	Jinyoung Choi
	Because WB performs writes in SLC mode, it is not possible to use WriteBooster indefinitely. Vendors can set a lifetime limit in the device. If the lifetime exceeds this limit, the device ican disable the WB feature. The feature is defined in the "bWriteBoosterBufferLifeTimeEst (IDN = 1E)" attribute. With lifetime exceeding the limit value, the current driver continuously performs the following query: - Write Flag: WB_ENABLE / DISABLE - Read attr: Available Buffer Size - Read attr: Current Buffer Size This patch recognizes that WriteBooster is no longer supported by the device, and prevents unnecessary queries. Link: https://lore.kernel.org/r/1891546521.01643252701746.JavaMail.epsvc@epcpadp3 Reviewed-by: Asutosh Das <quic_asutoshd@quicinc.com> Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task	John Garry
	Currently a use-after-free may occur if a sas_task is aborted by the upper layer before we handle the I/O completion in mpi_ssp_completion() or mpi_sata_completion(). In this case, the following are the two steps in handling those I/O completions: - Call complete() to inform the upper layer handler of completion of the I/O. - Release driver resources associated with the sas_task in pm8001_ccb_task_free() call. When complete() is called, the upper layer may free the sas_task. As such, we should not touch the associated sas_task afterwards, but we do so in the pm8001_ccb_task_free() call. Fix by swapping the complete() and pm8001_ccb_task_free() calls ordering. Link: https://lore.kernel.org/r/1643289172-165636-4-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: pm8001: Fix use-after-free for aborted TMF sas_task	John Garry
	Currently a use-after-free may occur if a TMF sas_task is aborted before we handle the IO completion in mpi_ssp_completion(). The abort occurs due to timeout. When the timeout occurs, the SAS_TASK_STATE_ABORTED flag is set and the sas_task is freed in pm8001_exec_internal_tmf_task(). However, if the I/O completion occurs later, the I/O completion still thinks that the sas_task is available. Fix this by clearing the ccb->task if the TMF times out - the I/O completion handler does nothing if this pointer is cleared. Link: https://lore.kernel.org/r/1643289172-165636-3-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: pm8001: Fix warning for undescribed param in process_one_iomb()	John Garry
	make W=1 complains of an undescribed function parameter: drivers/scsi/pm8001/pm80xx_hwi.c:3938: warning: Function parameter or member 'circularQ' not described in 'process_one_iomb' Fix it. Link: https://lore.kernel.org/r/1643289172-165636-2-git-send-email-john.garry@huawei.com Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: core: Reallocate device's budget map on queue depth change	Ming Lei
	We currently use ->cmd_per_lun as initial queue depth for setting up the budget_map. Martin Wilck reported that it is common for the queue_depth to be subsequently updated in slave_configure() based on detected hardware characteristics. As a result, for some drivers, the static host template settings for cmd_per_lun and can_queue won't actually get used in practice. And if the default values are used to allocate the budget_map, memory may be consumed unnecessarily. Fix the issue by reallocating the budget_map after ->slave_configure() returns. At that time the device queue_depth should accurately reflect what the hardware needs. Link: https://lore.kernel.org/r/20220127153733.409132-1-ming.lei@redhat.com Cc: Bart Van Assche <bvanassche@acm.org> Reported-by: Martin Wilck <martin.wilck@suse.com> Suggested-by: Martin Wilck <martin.wilck@suse.com> Tested-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe	John Meneghini
	Running tests with a debug kernel shows that bnx2fc_recv_frame() is modifying the per_cpu lport stats counters in a non-mpsafe way. Just boot a debug kernel and run the bnx2fc driver with the hardware enabled. [ 1391.699147] BUG: using smp_processor_id() in preemptible [00000000] code: bnx2fc_ [ 1391.699160] caller is bnx2fc_recv_frame+0xbf9/0x1760 [bnx2fc] [ 1391.699174] CPU: 2 PID: 4355 Comm: bnx2fc_l2_threa Kdump: loaded Tainted: G B [ 1391.699180] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 [ 1391.699183] Call Trace: [ 1391.699188] dump_stack_lvl+0x57/0x7d [ 1391.699198] check_preemption_disabled+0xc8/0xd0 [ 1391.699205] bnx2fc_recv_frame+0xbf9/0x1760 [bnx2fc] [ 1391.699215] ? do_raw_spin_trylock+0xb5/0x180 [ 1391.699221] ? bnx2fc_npiv_create_vports.isra.0+0x4e0/0x4e0 [bnx2fc] [ 1391.699229] ? bnx2fc_l2_rcv_thread+0xb7/0x3a0 [bnx2fc] [ 1391.699240] bnx2fc_l2_rcv_thread+0x1af/0x3a0 [bnx2fc] [ 1391.699250] ? bnx2fc_ulp_init+0xc0/0xc0 [bnx2fc] [ 1391.699258] kthread+0x364/0x420 [ 1391.699263] ? _raw_spin_unlock_irq+0x24/0x50 [ 1391.699268] ? set_kthread_struct+0x100/0x100 [ 1391.699273] ret_from_fork+0x22/0x30 Restore the old get_cpu/put_cpu code with some modifications to reduce the size of the critical section. Link: https://lore.kernel.org/r/20220124145110.442335-1-jmeneghi@redhat.com Fixes: d576a5e80cd0 ("bnx2fc: Improve stats update mechanism") Tested-by: Guangwu Zhang <guazhang@redhat.com> Acked-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31	scsi: pm80xx: Fix double completion for SATA devices	Ajish Koshy
	Current code handles completions for SATA devices in mpi_sata_completion() and mpi_sata_event(). However, at the time when any SATA event happens, for almost all the event types, the command is still in the target. It is therefore incorrect to complete the task in sata_event(). There are some events for which we get sata_completions, some need recovery procedure and others abort. All the tasks must be completed via sata_completion() path. Removed the task done related code from sata_events(). For tasks where we don't get completions, let top layer call abort() to abort the command post timeout. Link: https://lore.kernel.org/r/20220124082255.86223-1-Ajish.Koshy@microchip.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Co-developed-by: Viswas G <Viswas.G@microchip.com> Signed-off-by: Viswas G <Viswas.G@microchip.com> Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: scsi_debug: Add environmental reporting log subpage	Douglas Gilbert
	Log subpages are starting to appear in real devices (e.g. SSDs) so add support for one. Adopt approach where all "wild" sub-pages are themselves listed as long as there is at least one non-wild page or subpage for a given page number. Link: https://lore.kernel.org/r/20220109012853.301953-10-dgilbert@interlog.com Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: scsi_debug: Add no_rwlock parameter	Douglas Gilbert
	By default, this driver places a read lock around all user data fetches and a write lock around all user data modifying operations (e.g. WRITE commands). These locks have "per store" granularity. Other drivers that have a similar function (e.g. null_blk) do not take this data integrity step and run significantly faster in some tests. In the common case of a (simulated) device to device copy (e.g. what dd and its variants do) there should be no need for locks around data accesses. So add the driver and sysfs parameter no_rwlock which is boolean and when set does what its name suggests. The default is false for backward comaptibility. Link: https://lore.kernel.org/r/20220109012853.301953-7-dgilbert@interlog.com Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: scsi_debug: Divide power on reset UNIT ATTENTION	Douglas Gilbert
	To distinguish between resets sent by the SCSI mid-level error handling and newly introduced devices (LUs), this Unit Attention: power on, reset, or bus reset occurred [0x29,0x0] has been subdivided into that UA for the reset case and this new UA: power on occurred [0x29,0x1] for the new device (LU) case. This makes debug a little easier to follow when it is turned on (e.g. 'echo 0x1 > opts'). Bump driver version number. Link: https://lore.kernel.org/r/20220109012853.301953-6-dgilbert@interlog.com Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: scsi_debug: Refine sdebug_blk_mq_poll()	Douglas Gilbert
	Refine the sdebug_blk_mq_poll() function so it only takes the spinlock on the queue when it can see one or more requests with the in_use bitmap flag set. Link: https://lore.kernel.org/r/20220109012853.301953-5-dgilbert@interlog.com Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: scsi_debug: Use TASK SET FULL more	Douglas Gilbert
	When the internal in_use bit array in this driver is full returning SCSI_MLQUEUE_HOST_BUSY leads to the mid-level reissuing the request which is unhelpful. Previously TASK SET FULL status was only returned if ALL_TSF [0x400] is placed in the opts variable (at load time or via sysfs). Now ignore that setting and always return TASK SET FULL when in_use array is full. Also set DID_ABORT together with TASK SET FULL so the mid-level gives up immediately. Aside: the situations addressed by this patch lead to lockups and timeouts. They have only been detected when blk_poll() is used. That mechanism is relatively new in the SCSI subsystem suggesting the mid-level may need more work in that area. Link: https://lore.kernel.org/r/20220109012853.301953-4-dgilbert@interlog.com Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: scsi_debug: Strengthen defer_t accesses	Douglas Gilbert
	Use READ_ONCE() and WRITE_ONCE() macros when accessing the sdebug_defer::defer_t value. Link: https://lore.kernel.org/r/20220109012853.301953-3-dgilbert@interlog.com Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: scsi_debug: Address races following module load	Douglas Gilbert
	When scsi_debug is loaded as a module with many (simulated) hosts, targets, and devices (LUs), modprobe can take a long time to return. Only a small amount of this time is spent in the scsi_debug_init(); the rest is other parts of the kernel reacting to to the appearance of new storage devices. As soon as scsi_debug_init() has completed the user space may call 'rmmod scsi_debug' and this was found to cause race problems as outlined here: https://bugzilla.kernel.org/show_bug.cgi?id=212337 To reliably generate this race a sysfs parameter called rm_all_hosts was added and the code was strengthened in this area. The main change was to make the count of scsi_debug hosts present an atomic. Then it was found that the handling of the existing add_host parameter needed the same strengthening. Further: 'echo -9999 > /sys/bus/pseudo/drivers/scsi_debug/add_host has the same effect as rm_all_hosts so rm_all_hosts was not needed. To inhibit a race between two invocations of writes to add_host, a mutex was added. Also address a possible race when rmmod is called but LUs are still being added. The logic to remove (all) hosts is rather crude: it works backwards down a linked lists of hosts. Any pending requests are terminated with DID_NO_CONNECT as are any new requests. In the case where not all hosts are being removed, the ones that remain may have lost requests as just outlined. The lowest numbered host (id) hosts will remain. Cc: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220109012853.301953-2-dgilbert@interlog.com Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: myrs: Fix crash in error case	Tong Zhang
	In myrs_detect(), cs->disable_intr is NULL when privdata->hw_init() fails with non-zero. In this case, myrs_cleanup(cs) will call a NULL ptr and crash the kernel. [ 1.105606] myrs 0000:00:03.0: Unknown Initialization Error 5A [ 1.105872] myrs 0000:00:03.0: Failed to initialize Controller [ 1.106082] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 1.110774] Call Trace: [ 1.110950] myrs_cleanup+0xe4/0x150 [myrs] [ 1.111135] myrs_probe.cold+0x91/0x56a [myrs] [ 1.111302] ? DAC960_GEM_intr_handler+0x1f0/0x1f0 [myrs] [ 1.111500] local_pci_probe+0x48/0x90 Link: https://lore.kernel.org/r/20220123225717.1069538-1-ztong0001@gmail.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: 53c700: Remove redundant assignment to pointer SCp	Colin Ian King
	Pointer SCp is being re-assigned the same value that it was initialized to a few lines earlier, the assignment is redundant and can be removed. Link: https://lore.kernel.org/r/20220123175530.110462-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: ufs: Treat link loss as fatal error	Kiwoong Kim
	This event is raised when link is lost as specified in UFSHCI spec and that means communication is not possible. Thus initializing UFS interface needs to be done. Make UFS driver considers Link Lost as fatal in the INT_FATAL_ERRORS mask. This will trigger a host reset whenever a link lost interrupt occurs. Link: https://lore.kernel.org/r/1642743475-54275-1-git-send-email-kwmad.kim@samsung.com Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-25	scsi: ufs: Use generic error code in ufshcd_set_dev_pwr_mode()	Kiwoong Kim
	The return value of ufshcd_set_dev_pwr_mode() is passed to device PM core. However, the function currently returns a SCSI result which the PM core doesn't understand. This might lead to unexpected behaviors in userland; a platform reset was observed in Android. Use a generic error code for SSU failures. Link: https://lore.kernel.org/r/1642743182-54098-1-git-send-email-kwmad.kim@samsung.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Update version to 10.02.07.300-k	Nilesh Javali
	Link: https://lore.kernel.org/r/20220110050218.3958-18-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Check for firmware dump already collected	Joe Carnuccio
	While allocating firmware dump, check if dump is already collected and do not re-allocate the buffer. Link: https://lore.kernel.org/r/20220110050218.3958-17-njavali@marvell.com Cc: stable@vger.kernel.org Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Add devids and conditionals for 28xx	Joe Carnuccio
	This is an update to the original 28xx adapter enablement. Add a bunch of conditionals that are applicable for 28xx. Link: https://lore.kernel.org/r/20220110050218.3958-16-njavali@marvell.com Fixes: ecc89f25e225 ("scsi: qla2xxx: Add Device ID for ISP28XX") Cc: stable@vger.kernel.org Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair()	Saurav Kashyap
	[ 12.323788] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/1020 [ 12.332297] caller is qla2xxx_create_qpair+0x32a/0x5d0 [qla2xxx] [ 12.338417] CPU: 7 PID: 1020 Comm: systemd-udevd Tainted: G I --------- --- 5.14.0-29.el9.x86_64 #1 [ 12.348827] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.6.0 05/22/2018 [ 12.356356] Call Trace: [ 12.358821] dump_stack_lvl+0x34/0x44 [ 12.362514] check_preemption_disabled+0xd9/0xe0 [ 12.367164] qla2xxx_create_qpair+0x32a/0x5d0 [qla2xxx] [ 12.372481] qla2x00_probe_one+0xa3a/0x1b80 [qla2xxx] [ 12.377617] ? _raw_spin_lock_irqsave+0x19/0x40 [ 12.384284] local_pci_probe+0x42/0x80 [ 12.390162] ? pci_match_device+0xd7/0x110 [ 12.396366] pci_device_probe+0xfd/0x1b0 [ 12.402372] really_probe+0x1e7/0x3e0 [ 12.408114] __driver_probe_device+0xfe/0x180 [ 12.414544] driver_probe_device+0x1e/0x90 [ 12.420685] __driver_attach+0xc0/0x1c0 [ 12.426536] ? __device_attach_driver+0xe0/0xe0 [ 12.433061] ? __device_attach_driver+0xe0/0xe0 [ 12.439538] bus_for_each_dev+0x78/0xc0 [ 12.445294] bus_add_driver+0x12b/0x1e0 [ 12.451021] driver_register+0x8f/0xe0 [ 12.456631] ? 0xffffffffc07bc000 [ 12.461773] qla2x00_module_init+0x1be/0x229 [qla2xxx] [ 12.468776] do_one_initcall+0x44/0x200 [ 12.474401] ? load_module+0xad3/0xba0 [ 12.479908] ? kmem_cache_alloc_trace+0x45/0x410 [ 12.486268] do_init_module+0x5c/0x280 [ 12.491730] __do_sys_init_module+0x12e/0x1b0 [ 12.497785] do_syscall_64+0x3b/0x90 [ 12.503029] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 12.509764] RIP: 0033:0x7f554f73ab2e Link: https://lore.kernel.org/r/20220110050218.3958-15-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Fix T10 PI tag escape and IP guard options for 28XX adapters	Joe Carnuccio
	28XX adapters are capable of detecting both T10 PI tag escape values as well as IP guard. This was missed due to the adapter type missed in the corresponding macros. Fix this by adding support for 28xx in those macros. Link: https://lore.kernel.org/r/20220110050218.3958-14-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Tested-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: edif: Fix clang warning	Quinn Tran
	Silence compile warning due to unaligned memory access. qla_edif.c:713:45: warning: taking address of packed member 'u' of class or structure 'auth_complete_cmd' may result in an unaligned pointer value [-Waddress-of-packed-member] fcport = qla2x00_find_fcport_by_pid(vha, &appplogiok.u.d_id); Link: https://lore.kernel.org/r/20220110050218.3958-13-njavali@marvell.com Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Fix warning for missing error code	Nilesh Javali
	Fix smatch-reported warning message: drivers/scsi/qla2xxx/qla_target.c:3324 qlt_xmit_response() warn: missing error code 'res' Link: https://lore.kernel.org/r/20220110050218.3958-12-njavali@marvell.com Fixes: 4a8f71014b4d ("scsi: qla2xxx: Fix unmap of already freed sgl") Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Fix device reconnect in loop topology	Arun Easi
	A device logout in loop topology initiates a device connection teardown which loses the FW device handle. In loop topo, the device handle is not regrabbed leading to device login failures and eventually to loss of the device. Fix this by taking the main login path that does it. Link: https://lore.kernel.org/r/20220110050218.3958-11-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Add ql2xnvme_queues module param to configure number of NVMe ↵	Shreyas Deodhar
	queues Add ql2xnvme_queues module parameter to configure number of NVMe queues Usage: Number of NVMe Queues that can be configured. Final value will be min(ql2xnvme_queues, num_cpus, num_chip_queues), 1 - Minimum number of queues supported 8 - Default value 128 - Maximum number of queues supported Link: https://lore.kernel.org/r/20220110050218.3958-10-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Fix wrong FDMI data for 64G adapter	Bikash Hazarika
	Corrected transmission speed mask values for FC. Supported Speed: 16 32 20 Gb/s ===> Should be 64 instead of 20 Supported Speed: 16G 32G 48G ===> Should be 64G instead of 48G Link: https://lore.kernel.org/r/20220110050218.3958-9-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bikash Hazarika <bhazarika@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Add retry for exec firmware	Quinn Tran
	Per FW request, Exec FW can fail due to temporary error resulting in driver not attaching to the adapter. Add retry of this command up to 4 retries. Link: https://lore.kernel.org/r/20220110050218.3958-8-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Fix scheduling while atomic	Quinn Tran
	The driver makes a call into midlayer (fc_remote_port_delete) which can put the thread to sleep. The thread that originates the call is in interrupt context. The combination of the two trigger a crash. Schedule the call in non-interrupt context where it is more safe. kernel: BUG: scheduling while atomic: swapper/7/0/0x00010000 kernel: Call Trace: kernel: <IRQ> kernel: dump_stack+0x66/0x81 kernel: __schedule_bug.cold.90+0x5/0x1d kernel: __schedule+0x7af/0x960 kernel: schedule+0x28/0x80 kernel: schedule_timeout+0x26d/0x3b0 kernel: wait_for_completion+0xb4/0x140 kernel: ? wake_up_q+0x70/0x70 kernel: __wait_rcu_gp+0x12c/0x160 kernel: ? sdev_evt_alloc+0xc0/0x180 [scsi_mod] kernel: synchronize_sched+0x6c/0x80 kernel: ? call_rcu_bh+0x20/0x20 kernel: ? __bpf_trace_rcu_invoke_callback+0x10/0x10 kernel: sdev_evt_alloc+0xfd/0x180 [scsi_mod] kernel: starget_for_each_device+0x85/0xb0 [scsi_mod] kernel: ? scsi_init_io+0x360/0x3d0 [scsi_mod] kernel: scsi_init_io+0x388/0x3d0 [scsi_mod] kernel: device_for_each_child+0x54/0x90 kernel: fc_remote_port_delete+0x70/0xe0 [scsi_transport_fc] kernel: qla2x00_schedule_rport_del+0x62/0xf0 [qla2xxx] kernel: qla2x00_mark_device_lost+0x9c/0xd0 [qla2xxx] kernel: qla24xx_handle_plogi_done_event+0x55f/0x570 [qla2xxx] kernel: qla2x00_async_login_sp_done+0xd2/0x100 [qla2xxx] kernel: qla24xx_logio_entry+0x13a/0x3c0 [qla2xxx] kernel: qla24xx_process_response_queue+0x306/0x400 [qla2xxx] kernel: qla24xx_msix_rsp_q+0x3f/0xb0 [qla2xxx] kernel: __handle_irq_event_percpu+0x40/0x180 kernel: handle_irq_event_percpu+0x30/0x80 kernel: handle_irq_event+0x36/0x60 Link: https://lore.kernel.org/r/20220110050218.3958-7-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Fix premature hw access after PCI error	Quinn Tran
	After a recoverable PCI error has been detected and recovered, qla driver needs to check to see if the error condition still persist and/or wait for the OS to give the resume signal. Sep 8 22:26:03 localhost kernel: WARNING: CPU: 9 PID: 124606 at qla_tmpl.c:440 qla27xx_fwdt_entry_t266+0x55/0x60 [qla2xxx] Sep 8 22:26:03 localhost kernel: RIP: 0010:qla27xx_fwdt_entry_t266+0x55/0x60 [qla2xxx] Sep 8 22:26:03 localhost kernel: Call Trace: Sep 8 22:26:03 localhost kernel: ? qla27xx_walk_template+0xb1/0x1b0 [qla2xxx] Sep 8 22:26:03 localhost kernel: ? qla27xx_execute_fwdt_template+0x12a/0x160 [qla2xxx] Sep 8 22:26:03 localhost kernel: ? qla27xx_fwdump+0xa0/0x1c0 [qla2xxx] Sep 8 22:26:03 localhost kernel: ? qla2xxx_pci_mmio_enabled+0xfb/0x120 [qla2xxx] Sep 8 22:26:03 localhost kernel: ? report_mmio_enabled+0x44/0x80 Sep 8 22:26:03 localhost kernel: ? report_slot_reset+0x80/0x80 Sep 8 22:26:03 localhost kernel: ? pci_walk_bus+0x70/0x90 Sep 8 22:26:03 localhost kernel: ? aer_dev_correctable_show+0xc0/0xc0 Sep 8 22:26:03 localhost kernel: ? pcie_do_recovery+0x1bb/0x240 Sep 8 22:26:03 localhost kernel: ? aer_recover_work_func+0xaa/0xd0 Sep 8 22:26:03 localhost kernel: ? process_one_work+0x1a7/0x360 .. Sep 8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-8041:22: detected PCI disconnect. Sep 8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-107ff:22: qla27xx_fwdt_entry_t262: dump ram MB failed. Area 5h start 198013h end 198013h Sep 8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-107ff:22: Unable to capture FW dump Sep 8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-1015:22: cmd=0x0, waited 5221 msecs Sep 8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-680d:22: mmio enabled returning. Sep 8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-d04c:22: MBX Command timeout for cmd 0, iocontrol=ffffffff jiffies=10140f2e5 mb[0-3]=[0xffff 0xffff 0xffff 0xffff] Link: https://lore.kernel.org/r/20220110050218.3958-6-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Fix warning message due to adisc being flushed	Quinn Tran
	Fix warning message due to adisc being flushed. Linux kernel triggered a warning message where a different error code type is not matching up with the expected type. Add additional translation of one error code type to another. WARNING: CPU: 2 PID: 1131623 at drivers/scsi/qla2xxx/qla_init.c:498 qla2x00_async_adisc_sp_done+0x294/0x2b0 [qla2xxx] CPU: 2 PID: 1131623 Comm: drmgr Not tainted 5.13.0-rc1-autotest #1 .. GPR28: c000000aaa9c8890 c0080000079ab678 c00000140a104800 c00000002bd19000 NIP [c00800000790857c] qla2x00_async_adisc_sp_done+0x294/0x2b0 [qla2xxx] LR [c008000007908578] qla2x00_async_adisc_sp_done+0x290/0x2b0 [qla2xxx] Call Trace: [c00000001cdc3620] [c008000007908578] qla2x00_async_adisc_sp_done+0x290/0x2b0 [qla2xxx] (unreliable) [c00000001cdc3710] [c0080000078f3080] __qla2x00_abort_all_cmds+0x1b8/0x580 [qla2xxx] [c00000001cdc3840] [c0080000078f589c] qla2x00_abort_all_cmds+0x34/0xd0 [qla2xxx] [c00000001cdc3880] [c0080000079153d8] qla2x00_abort_isp_cleanup+0x3f0/0x570 [qla2xxx] [c00000001cdc3920] [c0080000078fb7e8] qla2x00_remove_one+0x3d0/0x480 [qla2xxx] [c00000001cdc39b0] [c00000000071c274] pci_device_remove+0x64/0x120 [c00000001cdc39f0] [c0000000007fb818] device_release_driver_internal+0x168/0x2a0 [c00000001cdc3a30] [c00000000070e304] pci_stop_bus_device+0xb4/0x100 [c00000001cdc3a70] [c00000000070e4f0] pci_stop_and_remove_bus_device+0x20/0x40 [c00000001cdc3aa0] [c000000000073940] pci_hp_remove_devices+0x90/0x130 [c00000001cdc3b30] [c0080000070704d0] disable_slot+0x38/0x90 [rpaphp] [ c00000001cdc3b60] [c00000000073eb4c] power_write_file+0xcc/0x180 [c00000001cdc3be0] [c0000000007354bc] pci_slot_attr_store+0x3c/0x60 [c00000001cdc3c00] [c00000000055f820] sysfs_kf_write+0x60/0x80 [c00000001cdc3c20] [c00000000055df10] kernfs_fop_write_iter+0x1a0/0x290 [c00000001cdc3c70] [c000000000447c4c] new_sync_write+0x14c/0x1d0 [c00000001cdc3d10] [c00000000044b134] vfs_write+0x224/0x330 [c00000001cdc3d60] [c00000000044b3f4] ksys_write+0x74/0x130 [c00000001cdc3db0] [c00000000002df70] system_call_exception+0x150/0x2d0 [c00000001cdc3e10] [c00000000000d45c] system_call_common+0xec/0x278 Link: https://lore.kernel.org/r/20220110050218.3958-5-njavali@marvell.com Cc: stable@vger.kernel.org Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Fix stuck session in gpdb	Quinn Tran
	Fix stuck sessions in get port database. When a thread is in the process of re-establishing a session, a flag is set to prevent multiple threads / triggers from doing the same task. This flag was left on, where any attempt to relogin was locked out. Clear this flag, if the attempt has failed. Link: https://lore.kernel.org/r/20220110050218.3958-4-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Implement ref count for SRB	Saurav Kashyap
	The timeout handler and the done function are racing. When qla2x00_async_iocb_timeout() starts to run it can be preempted by the normal response path (via the firmware?). qla24xx_async_gpsc_sp_done() releases the SRB unconditionally. When scheduling back to qla2x00_async_iocb_timeout() qla24xx_async_abort_cmd() will access an freed sp->qpair pointer: qla2xxx [0000:83:00.0]-2871:0: Async-gpsc timeout - hdl=63d portid=234500 50:06:0e:80:08:77:b6:21. qla2xxx [0000:83:00.0]-2853:0: Async done-gpsc res 0, WWPN 50:06:0e:80:08:77:b6:21 qla2xxx [0000:83:00.0]-2854:0: Async-gpsc OUT WWPN 20:45:00:27:f8:75:33:00 speeds=2c00 speed=0400. qla2xxx [0000:83:00.0]-28d8:0: qla24xx_handle_gpsc_event 50:06:0e:80:08:77:b6:21 DS 7 LS 6 rc 0 login 1\|1 rscn 1\|0 lid 5 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: qla24xx_async_abort_cmd+0x1b/0x1c0 [qla2xxx] Obvious solution to this is to introduce a reference counter. One reference is taken for the normal code path (the 'good' case) and one for the timeout path. As we always race between the normal good case and the timeout/abort handler we need to serialize it. Also we cannot assume any order between the handlers. Since this is slow path we can use proper synchronization via locks. When we are able to cancel a timer (del_timer returns 1) we know there can't be any error handling in progress because the timeout handler hasn't expired yet, thus we can safely decrement the refcounter by one. If we are not able to cancel the timer, we know an abort handler is running. We have to make sure we call sp->done() in the abort handlers before calling kref_put(). Link: https://lore.kernel.org/r/20220110050218.3958-3-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Co-developed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: qla2xxx: Refactor asynchronous command initialization	Daniel Wagner
	Move common open-coded asynchronous command initializing code such as setting up the timer and the done callback into one function. This is a preparation step and allows us later on to change the low level error flow handling at a central place. Link: https://lore.kernel.org/r/20220110050218.3958-2-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: bfa: Remove useless DMA-32 fallback configuration	Christophe JAILLET
	As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32-bit case will also fail for the same reason. Simplify code and remove some dead code accordingly. [1]: https://lore.kernel.org/linux-kernel/YL3vSPK5DXTNvgdx@infradead.org/#t Link: https://lore.kernel.org/r/5663cef9b54004fa56cca7ce65f51eadfc3ecddb.1642238127.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: hisi_sas: Remove useless DMA-32 fallback configuration	Christophe JAILLET
	As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32-bit case will also fail for the same reason. Simplify code and remove some dead code accordingly. [1]: https://lore.kernel.org/linux-kernel/YL3vSPK5DXTNvgdx@infradead.org/#t Link: https://lore.kernel.org/r/1bf2d3660178b0e6f172e5208bc0bd68d31d9268.1642237482.git.christophe.jaillet@wanadoo.fr Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: 3w-sas: Remove useless DMA-32 fallback configuration	Christophe JAILLET
	As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32-bit case will also fail for the same reason. Simplify code and remove some dead code accordingly. [1]: https://lore.kernel.org/linux-kernel/YL3vSPK5DXTNvgdx@infradead.org/#t Link: https://lore.kernel.org/r/dbbe8671ca760972d80f8d35f3170b4609bee368.1642236763.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()	John Meneghini
	The bnx2fc_destroy() functions are removing the interface before calling destroy_work. This results multiple WARNings from sysfs_remove_group() as the controller rport device attributes are removed too early. Replace the fcoe_port's destroy_work queue. It's not needed. The problem is easily reproducible with the following steps. Example: $ dmesg -w & $ systemctl enable --now fcoe $ fipvlan -s -c ens2f1 $ fcoeadm -d ens2f1.802 [ 583.464488] host2: libfc: Link down on port (7500a1) [ 583.472651] bnx2fc: 7500a1 - rport not created Yet!! [ 583.490468] ------------[ cut here ]------------ [ 583.538725] sysfs group 'power' not found for kobject 'rport-2:0-0' [ 583.568814] WARNING: CPU: 3 PID: 192 at fs/sysfs/group.c:279 sysfs_remove_group+0x6f/0x80 [ 583.607130] Modules linked in: dm_service_time 8021q garp mrp stp llc bnx2fc cnic uio rpcsec_gss_krb5 auth_rpcgss nfsv4 ... [ 583.942994] CPU: 3 PID: 192 Comm: kworker/3:2 Kdump: loaded Not tainted 5.14.0-39.el9.x86_64 #1 [ 583.984105] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 [ 584.016535] Workqueue: fc_wq_2 fc_rport_final_delete [scsi_transport_fc] [ 584.050691] RIP: 0010:sysfs_remove_group+0x6f/0x80 [ 584.074725] Code: ff 5b 48 89 ef 5d 41 5c e9 ee c0 ff ff 48 89 ef e8 f6 b8 ff ff eb d1 49 8b 14 24 48 8b 33 48 c7 c7 ... [ 584.162586] RSP: 0018:ffffb567c15afdc0 EFLAGS: 00010282 [ 584.188225] RAX: 0000000000000000 RBX: ffffffff8eec4220 RCX: 0000000000000000 [ 584.221053] RDX: ffff8c1586ce84c0 RSI: ffff8c1586cd7cc0 RDI: ffff8c1586cd7cc0 [ 584.255089] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffb567c15afc00 [ 584.287954] R10: ffffb567c15afbf8 R11: ffffffff8fbe7f28 R12: ffff8c1486326400 [ 584.322356] R13: ffff8c1486326480 R14: ffff8c1483a4a000 R15: 0000000000000004 [ 584.355379] FS: 0000000000000000(0000) GS:ffff8c1586cc0000(0000) knlGS:0000000000000000 [ 584.394419] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 584.421123] CR2: 00007fe95a6f7840 CR3: 0000000107674002 CR4: 00000000000606e0 [ 584.454888] Call Trace: [ 584.466108] device_del+0xb2/0x3e0 [ 584.481701] device_unregister+0x13/0x60 [ 584.501306] bsg_unregister_queue+0x5b/0x80 [ 584.522029] bsg_remove_queue+0x1c/0x40 [ 584.541884] fc_rport_final_delete+0xf3/0x1d0 [scsi_transport_fc] [ 584.573823] process_one_work+0x1e3/0x3b0 [ 584.592396] worker_thread+0x50/0x3b0 [ 584.609256] ? rescuer_thread+0x370/0x370 [ 584.628877] kthread+0x149/0x170 [ 584.643673] ? set_kthread_struct+0x40/0x40 [ 584.662909] ret_from_fork+0x22/0x30 [ 584.680002] ---[ end trace 53575ecefa942ece ]--- Link: https://lore.kernel.org/r/20220115040044.1013475-1-jmeneghi@redhat.com Fixes: 0cbf32e1681d ("[SCSI] bnx2fc: Avoid calling bnx2fc_if_destroy with unnecessary locks") Tested-by: Guangwu Zhang <guazhang@redhat.com> Co-developed-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices	Steffen Maier
	Suppose we have an environment with a number of non-NPIV FCP devices (virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical FCP channel (HBA port) and its I_T nexus. Plus a number of storage target ports zoned to such shared channel. Now one target port logs out of the fabric causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port recovery depending on the ADISC result. This happens on all such FCP devices (in different Linux images) concurrently as they all receive a copy of this RSCN. In the following we look at one of those FCP devices. Requests other than FSF_QTCB_FCP_CMND can be slow until they get a response. Depending on which requests are affected by slow responses, there are different recovery outcomes. Here we want to fix failed recoveries on port or adapter level by avoiding recovery requests that can be slow. We need the cached N_Port_ID for the remote port "link" test with ADISC. Just before sending the ADISC, we now intentionally forget the old cached N_Port_ID. The idea is that on receiving an RSCN for a port, we have to assume that any cached information about this port is stale. This forces a fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for the same port. Since we typically can still communicate with the nameserver efficiently, we now reach steady state quicker: Either the nameserver still does not know about the port so we stop recovery, or the nameserver already knows the port potentially with a new N_Port_ID and we can successfully and quickly perform open port recovery. For the one case, where ADISC returns successfully, we re-initialize port->d_id because that case does not involve any port recovery. This also solves a problem if the storage WWPN quickly logs into the fabric again but with a different N_Port_ID. Such as on virtual WWPN takeover during target NPIV failover. [https://www.redbooks.ibm.com/abstracts/redp5477.html] In that case the RSCN from the storage FDISC was ignored by zfcp and we could not successfully recover the failover. On some later failback on the storage, we could have been lucky if the virtual WWPN got the same old N_Port_ID from the SAN switch as we still had cached. Then the related RSCN triggered a successful port reopen recovery. However, there is no guarantee to get the same N_Port_ID on NPIV FDISC. Even though NPIV-enabled FCP devices are not affected by this problem, this code change optimizes recovery time for gone remote ports as a side effect. The timely drop of cached N_Port_IDs prevents unnecessary slow open port attempts. While the problem might have been in code before v2.6.32 commit 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp") this fix depends on the gid_pn_work introduced with that commit, so we mark it as culprit to satisfy fix dependencies. Note: Point-to-point remote port is already handled separately and gets its N_Port_ID from the cached peer_d_id. So resetting port->d_id in general does not affect PtP. Link: https://lore.kernel.org/r/20220118165803.3667947-1-maier@linux.ibm.com Fixes: 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp") Cc: <stable@vger.kernel.org> #2.6.32+ Suggested-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24	scsi: pm8001: Fix bogus FW crash for maxcpus=1	John Garry
	According to the comment in check_fw_ready() we should not check the IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers. However we check this very field in process_oq() for processing the highest index interrupt vector. The highest interrupt vector is checked as the FW is programmed to signal fatal errors through this irq. Change that function to not check IOP1_READY for those mentioned controllers, but do check ILA_READY in both cases. The reason I assume that this was not hit earlier was because we always allocated 64 MSI(X), and just did not pass the vector index check in process_oq(), i.e. the handler never ran for vector index 63. Link: https://lore.kernel.org/r/1642508105-95432-1-git-send-email-john.garry@huawei.com Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>