Age | Commit message (Collapse) | Author |
|
This patch refactors the CT paths to use SLI-4 as the primary interface.
- Introduce generic lpfc_sli_prep_gen_req jump table routine
- Introduce generic lpfc_sli_prep_xmit_seq64 jump table routine
- Rename lpfcdiag_loop_post_rxbufs to lpfcdiag_sli3_loop_post_rxbufs to
indicate that it is an SLI3 only path
- Create new prep_wqe routine for unsolicited ELS rsp WQEs.
- Conversion away from using SLI-3 iocb structures to set/access fields in
common routines. Use the new generic get/set routines that were added.
This move changes code from indirect structure references to using local
variables with the generic routines.
- Refactor routines when setting non-generic fields, to have both SLI3 and
SLI4 specific sections. This replaces the set-as-SLI3 then translate to
SLI4 behavior of the past.
Link: https://lore.kernel.org/r/20220225022308.16486-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch refactors the remaining ELS paths to use SLI-4 as the primary
interface. Paths include RRQ, RSCN, unsolicited ELS RQST and RSP paths, ELS
timeouts, etc.:
- Remove unused routines lpfc_sli4_bpl2sgl and lpfc_sli4_iocb2wqe
- Conversion away from using SLI-3 iocb structures to set/access fields in
common routines. Use the new generic get/set routines that were added.
This move changes code from indirect structure references to using local
variables with the generic routines.
- Refactor routines when setting non-generic fields, to have both SLI3 and
SLI4 specific sections. This replaces the set-as-SLI3 then translate to
SLI4 behavior of the past.
Link: https://lore.kernel.org/r/20220225022308.16486-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The patch refactors the general ELS handling paths to migrate to SLI-4
structures or common element abstractions. The fabric login paths are
revised as part of this patch:
- New generic lpfc_sli_prep_els_req_rsp jump table routine
- Introduce ls_rjt_error_be and ulp_bde64_le unions to correct legacy
endianness assignments
- Conversion away from using SLI-3 iocb structures to set/access fields in
common routines. Use the new generic get/set routines that were added.
This move changes code from indirect structure references to using local
variables with the generic routines.
- Refactor routines when setting non-generic fields, to have both SLI3 and
SLI4 specific sections. This replaces the set-as-SLI3 then translate to
SLI4 behavior of the past.
- Clean up poor indentation on some of the ELS paths
Link: https://lore.kernel.org/r/20220225022308.16486-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Introduce lpfc_prep_wqe routine.
The lpfc_prep_wqe() routine is used with lpfc_sli_issue_iocb() and
lpfc_sli_issue_iocb_wait(). The routine performs additional SLI-4 wqe field
setting that the generic routines did not perform as they kept their
actions compatible with both SLI3 and SLI4.
Link: https://lore.kernel.org/r/20220225022308.16486-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Convert the SLI4 fast and slow paths to use native SLI4 wqe constructs
instead of iocb SLI3-isms.
Includes the following:
- Create simple get_xxx and set_xxx routines to wrapper access to common
elements in both SLI3 and SLI4 commands - allowing calling routines to
avoid sli-rev-specific structures to access the elements.
- using the wqe in the job structure as the primary element
- use defines from SLI-4, not SLI-3
- Removal of iocb to wqe conversion from fast and slow path
- Add below routines to handle fast path
lpfc_prep_embed_io - prepares the wqe for fast path
lpfc_wqe_bpl2sgl - manages bpl to sgl conversion
lpfc_sli_wqe2iocb - converts a WQE to IOCB for SLI-3 path
- Add lpfc_sli3_iocb2wcqecmpl in completion path to convert an SLI-3
iocb completion to wcqe completion
- Refactor some of the code that works on both revs for clarity
Link: https://lore.kernel.org/r/20220225022308.16486-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Currently, SLI3 and SLI4 data paths use the same lpfc_iocbq structure.
This is a "common" structure but many of the components refer to sli-rev
specific entities which can lead the developer astray as to what they
actually mean, should be set to, or when they should be used.
This first patch prepares the lpfc_iocbq structure so that elements common
to both SLI3 and SLI4 data paths are more appropriately named, making it
clear they apply generically.
Fieldnames based on 'iocb' (sli3) or 'wqe' (sli4) which are actually
generic to the paths are renamed to 'cmd':
- iocb_flag is renamed to cmd_flag
- lpfc_vmid_iocb_tag is renamed to lpfc_vmid_tag
- fabric_iocb_cmpl is renamed to fabric_cmd_cmpl
- wait_iocb_cmpl is renamed to wait_cmd_cmpl
- iocb_cmpl and wqe_cmpl are combined and renamed to cmd_cmpl
- rsvd2 member is renamed to num_bdes due to pre-existing usage
The structure name itself will retain the iocb reference as changing to a
more relevant "job" or "cmd" title induces many hundreds of line changes
for only a name change.
lpfc_post_buffer is also renamed to lpfc_sli3_post_buffer to indicate use
in the SLI3 path only.
Link: https://lore.kernel.org/r/20220225022308.16486-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Messages around firmware download were incorrectly tagged as being related
to discovery trace events. Thus, firmware download status ended up dumping
the trace log as well as the firmware update message. As there were a
couple of log messages in this state, the trace log was dumped multiple
times.
Resolve this by converting from trace events to SLI events.
Link: https://lore.kernel.org/r/20220207180442.72836-1-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
|
|
find_first{,_zero}_bit is a more effective analogue of 'next' version if
start == 0. This patch replaces 'next' with 'first' where things look
trivial.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
|
When building under -Warray-bounds, a warning is generated when casting a
u32 into MAILBOX_t (which is larger). This warning is conservative, but
it's not an unreasonable change to make to improve future robustness. Use a
tagged struct_group that can refer to either the specific fields or the
first u32 separately, silencing this warning:
drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_reset_barrier':
drivers/scsi/lpfc/lpfc_sli.c:4787:29: error: array subscript 'MAILBOX_t[0]' is partly outside array bounds of 'volatile uint32_t[1]' {aka 'volatile unsigned int[1]'} [-Werror=array-bounds]
4787 | ((MAILBOX_t *)&mbox)->mbxCommand = MBX_KILL_BOARD;
| ^~
drivers/scsi/lpfc/lpfc_sli.c:4752:27: note: while referencing 'mbox'
4752 | volatile uint32_t mbox;
| ^~~~
There is no change to the resulting executable instruction code.
Link: https://lore.kernel.org/r/20211203223351.107323-1-keescook@chromium.org
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Extraneous teardown routines are present in the firmware dump path causing
altered states in firmware captures.
When a firmware dump is requested via sysfs, trigger the dump immediately
without tearing down structures and changing adapter state.
The driver shall rely on pre-existing firmware error state clean up
handlers to restore the adapter.
Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If an FC link down transition while PLOGIs are outstanding to fabric well
known addresses, outstanding ABTS requests may result in a NULL pointer
dereference. Driver unload requests may hang with repeated "2878" log
messages.
The Link down processing results in ABTS requests for outstanding ELS
requests. The Abort WQEs are sent for the ELSs before the driver had set
the link state to down. Thus the driver is sending the Abort with the
expectation that an ABTS will be sent on the wire. The Abort request is
stalled waiting for the link to come up. In some conditions the driver may
auto-complete the ELSs thus if the link does come up, the Abort completions
may reference an invalid structure.
Fix by ensuring that Abort set the flag to avoid link traffic if issued due
to conditions where the link failed.
Link: https://lore.kernel.org/r/20211020211417.88754-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Applications determine loop support in part by querying the 'pls' sysfs
node. Reporting of 'pls' (Private Loop Support) is derived from the
descriptor returned by the COMMON_GET_SLI4_PARAMETERS mailbox command,
which is issued during initialization or after a reset.
The value of this field may change if there is a dynamic SFP change. The
driver currently will not pick up the change as there was no reset
scenario.
Rework to commonize the sending of the COMMON_GET_SLI4_PARAMETERS
command. Add the calling of the routine after receipt of an async event
indicating an SFP change.
Link: https://lore.kernel.org/r/20211020211417.88754-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
driver_resource_setup()
In cases when lpfc_enable_pci_dev() fails, lpfc_printf_log() with
LOG_TRACE_EVENT set will call lpfc_dmp_dbg() which uses the
phba->port_list_lock.
However, phba->port_list_lock does not get initialized until
lpfc_setup_driver_resource_phase1(). Thus, any initialization routine with
LOG_TRACE_EVENT log message prior to lpfc_setup_driver_resource_phase1()
will crash.
Revert LOG_TRACE_EVENT back to LOG_INIT for all log messages in routines
prior to lpfc_setup_driver_resource_phase1().
Link: https://lore.kernel.org/r/20211020211417.88754-2-jsmart2021@gmail.com
CC: Zheyu Ma <zheyuma97@gmail.com>
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Merge the 5.15/scsi-fixes branch into the staging tree to resolve UFS
conflict reported by sfr.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When an FC-GS I/O is aborted by lpfc, the driver requires a node pointer
for a dereference operation. In the abort I/O routine, the driver miscasts
a context pointer to the wrong data type and overwrites a single byte
outside of the allocated space. This miscast is done in the abort I/O
function handler because the handler works on both FC-GS and FC-LS
commands. However, the code neglected to get the correct job location for
the node.
Fix this by acquiring the necessary node pointer from the correct job
structure depending on the I/O type.
Link: https://lore.kernel.org/r/20211004231210.35524-1-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Contention for the mailbox interface may occur during driver initialization
(immediately after a function reset), between mailbox commands initiated
via ioctl (bsg) and those driver requested by the driver.
After setting SLI_ACTIVE flag for a port, there is a window in which the
driver will allow an ioctl to be initiated while the adapter is
initializing and issuing mailbox commands via polling. The polling logic
then gets confused.
Correct by having thread setting SLI_ACTIVE spot an active mailbox command
and allow it complete before proceeding.
Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com
Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Currently congestion management framework results are cleared whenever the
framework settings changed (such as it being turned off then back on). This
unfortunately means prior stats, rolled up to higher time windows lose
meaning.
Change such that stats are not cleared. Thus they pause and resume with
prior values still being considered.
Link: https://lore.kernel.org/r/20210910233159.115896-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Injecting errors on the PCI slot while the driver is handling NVMe I/O will
cause crashes and hangs.
There are several rather difficult scenarios occurring. The main issue is
that the adapter can report a PCI error before or simultaneously to the PCI
subsystem reporting the error. Both paths have different entry points and
currently there is no interlock between them. Thus multiple teardown paths
are competing and all heck breaks loose.
Complicating things is the NVMs path. To a large degree, I/O was able to be
shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a
similar call. At best, it works on a per-controller basis, but even at the
controller level, it's a controller "reset" call. All of which means I/O is
still flowing on different CPUs with reset paths expecting hw access
(mailbox commands) to execute properly.
The following modifications are made:
- A new flag is set in PCI error entrypoints so the driver can track being
called by that path.
- An interlock is added in the SLI hw error path and the PCI error path
such that only one of the paths proceeds with the teardown logic.
- RPI cleanup is patched such that RPIs are marked unregistered w/o mbx
cmds in cases of hw error.
- If entering the SLI port re-init calls, a case where SLI error teardown
was quick and beat the PCI calls now reporting error, check whether the
SLI port is still live on the PCI bus.
- In the PCI reset code to bring the adapter back, recheck the IRQ
settings. Different checks for SLI3 vs SLI4.
- In I/O completions, that may be called as part of the cleanup or
underway just before the hw error, check the state of the adapter. If
in error, shortcut handling that would expect further adapter
completions as the hw error won't be sending them.
- In routines waiting on I/O completions, which may have been in progress
prior to the hw error, detect the device is being torn down and abort
from their waits and just give up. This points to a larger issue in the
driver on ref-counting for data structures, as it doesn't have
ref-counting on q and port structures. We'll do this fix for now as it
would be a major rework to be done differently.
- Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
failed back due to hw error.
- In I/O buf allocation, done at the start of new I/Os, check hw state and
fail if hw error.
Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
A prior patch inadvertently caused lpfc_sli_sum_iocb() to exclude counting
of outstanding aborted I/Os and ABORT IOCBs. Thus,
lpfc_reset_flush_io_context() called from any TMF routine does not properly
wait to flush all outstanding FCP IOCBs leading to a block layer crash on
an invalid scsi_cmnd->request pointer.
kernel BUG at ../block/blk-core.c:1489!
RIP: 0010:blk_requeue_request+0xaf/0xc0
...
Call Trace:
<IRQ>
__scsi_queue_insert+0x90/0xe0 [scsi_mod]
blk_done_softirq+0x7e/0x90
__do_softirq+0xd2/0x280
irq_exit+0xd5/0xe0
do_IRQ+0x4c/0xd0
common_interrupt+0x87/0x87
</IRQ>
Fix by separating out the LPFC_IO_FCP, LPFC_IO_ON_TXCMPLQ,
LPFC_DRIVER_ABORTED, and CMD_ABORT_XRI_CN || CMD_CLOSE_XRI_CN checks into a
new lpfc_sli_validate_fcp_iocb_for_abort() routine when determining to
build an ABORT iocb.
Restore lpfc_reset_flush_io_context() functionality by including counting
of outstanding aborted IOCBs and ABORT IOCBs in lpfc_sli_sum_iocb().
Link: https://lore.kernel.org/r/20210910233159.115896-9-jsmart2021@gmail.com
Fixes: e1364711359f ("scsi: lpfc: Fix illegal memory access on Abort IOCBs")
Cc: <stable@vger.kernel.org> # v5.12+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass
the requests to the adapter. If such an attempt fails, a local "fail_msg"
string is set and a log message output. The job is then added to a
completions list for cancellation.
Processing of any further jobs from the txq list continues, but since
"fail_msg" remains set, jobs are added to the completions list regardless
of whether a wqe was passed to the adapter. If successfully added to
txcmplq, jobs are added to both lists resulting in list corruption.
Fix by clearing the fail_msg string after adding a job to the completions
list. This stops the subsequent jobs from being added to the completions
list unless they had an appropriate failure.
Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The kernel test robot reported the following sparse warning:
".../lpfc_els.c:3984:25: sparse: sparse: cast from restricted __be16"
For the error being flagged, using be32_to_cpu() on a be16 data type, it
was simple enough. But a review of other elements and warnings were also
evaluated.
This patch corrected several items in the original patch:
- Using be32_to_cpu() on a be16 data type
- cpu_to_le32() used on a std uint32_t (CPU) data type.
Note: This is a byte array, but stored in LE layout by hardware at
32-bit boundaries. So it possibly needed conversion.
- Using cpu_to_le32() on a std uint16_t and assigned to a char typeA
- Using le32_to_cpu() on a le16 type
- Missing cpu_to_le16() on an assignment
Link: https://lore.kernel.org/r/20210830231243.6227-1-jsmart2021@gmail.com
Fixes: 9064aeb2df8e ("scsi: lpfc: Add EDC ELS support")
Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver provides overwatch of the cm behavior by maintaining a set of rx
I/O statistics. This information is also used in later updating of the cm
statistics buffer.
Link: https://lore.kernel.org/r/20210816162901.121235-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Complete the enablement of the cm framework feature in the adapter. Perform
the following:
- Detect the presence of the congestion management framework feature.
When the cm framework is present:
- Issue the SET_FEATURE command to enable the feature.
- Register the cm statistics buffer with the adapter.
- Read the cm enablement buffer to determine the cm framework state for cm
management.
When cm management is enabled:
- Monitor all FPIN and congestion signalling events, incrementing
counters.
- Regularly sync with the adapter to communicate congestion events and to
receive an rx request limit.
- Monitor requests for rx data and ensure that no more than the
adapter prescribed limit is issued on the link. If the limit is
exceeded, SCSI and/or NVMe traffic is temporarily suspended.
- Maintain the minute, hourly, daily statistics buffer.
- Monitor for congestion enablement change events, causing a reread of the
enablement buffer and acting on any change in enablement.
And:
- Add teardown logic, including buffer deregistration, on adapter
detachment or reset.
Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When congestion mgmt is enabled, cmf has the driver regularly issue a
command to synchronize reporting of congestion mgmt events such as fpin and
signal delivery.
This patch adds the definition of the CMF_SYNC WQE and its CQE fields as
well as support for issuing the command. The patch also adds the few
remaining cmf-related SLI additions, such as feature definition for
enablement of CMF and notifications to the driver if the cm enablement mode
changes.
Link: https://lore.kernel.org/r/20210816162901.121235-9-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
As part of the cmf framework, the firmware maintains a table with
congestion related state information, specifically whether enabled and if
enabled, whether monitoring or actively managing congestion.
Add definition of the table and add support to read the table from the
adapter and determine if it is enabled. In support of this, the READ_OBJECT
mailbox command definition is added to the driver.
Link: https://lore.kernel.org/r/20210816162901.121235-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The cmf framework requires the driver to maintain a cm statistics table,
accessible inband, of congestion related statistics that are reported per
minute, rolled up to per hour, and rolled up again per day. Several days
worth may be maintained. The table is registered with the adapter when the
MIB feature is enabled.
Add definition of the table and add support to register the table with the
adapter. Includes definition and initialization of event counters that are
later added to the statistics table.
Link: https://lore.kernel.org/r/20210816162901.121235-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When congestion management is enabled, issue EDC ELS to register congestion
signaling capabilities with the fabric. The response handling will process
the fabric parameters and set the reporting parameters.
Similarly, add support for receiving an EDC request from the fabric
generating a corresponding response.
Implement handlers for congestion signals from the fabric and maintain
statistics for them.
Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
MIB support is currently limited to detecting support in the adapter and
ensuring FDMI support is enabled if present. For the new framework MIB
support also requires active enablement of support via the SET_FEATURES
command with the firmware.
Rework the MIB detection and enablement for the following:
- Move detection away from the get_sli4_parameters routine, and into the
hba_setup path. get_sli4_parameters is only called once at attachment
while hba_setup is called as part of any SLI port reset path. This
ensures detection after firmware download.
- Update SET_FEATURES mbx command for the MIB enablement feature and add
support for the feature.
- Create the cmf_setup routine to encapsulate the detection of MIB support
and perform the enablement of the MIB support feature.
Link: https://lore.kernel.org/r/20210816162901.121235-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Implement the SET_HOST_DATA mbox command to set date / time during
initialization. It is used by the firmware for various purposes including
congestion management and firmware dumps.
Link: https://lore.kernel.org/r/20210816162901.121235-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The pointer pcmd is being initialized with a value that is never read, the
assignment is redundant and can be removed.
Link: https://lore.kernel.org/r/20210721095350.41564-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Unused value")
|
|
Mailbox commands sent via ioctl/bsg from user applications may be
interrupted from processing by a concurrently triggered PCI function
reset. The command will not generate a completion due to the reset. This
results in a user application hang waiting for the mailbox command to
complete.
Resolve by changing the function reset handler to detect that there was an
outstanding mailbox command and simulate a mailbox completion. Add some
additional debug when a mailbox command times out.
Link: https://lore.kernel.org/r/20210707184351.67872-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In the routine that generically cleans up an ELS after completion, the NDLP
put is done prior to the freeing of the IOCB. The IOCB may reference the
NDLP.
Move the lpfc_nlp_put() after freeing the IOCB.
Link: https://lore.kernel.org/r/20210707184351.67872-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Define additional status fields in mailbox commands to help provide
additional information when downloading new firmware.
Link: https://lore.kernel.org/r/20210707184351.67872-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, ibmvfc,
megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
elx and mpi3mr being new drivers.
The major core change is a rework to drop the status byte handling
macros and the old bit shifted definitions and the rest of the updates
are minor fixes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits)
scsi: aha1740: Avoid over-read of sense buffer
scsi: arcmsr: Avoid over-read of sense buffer
scsi: ips: Avoid over-read of sense buffer
scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe()
scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame()
scsi: elx: libefc: Fix less than zero comparison of a unsigned int
scsi: elx: efct: Fix pointer error checking in debugfs init
scsi: elx: efct: Fix is_originator return code type
scsi: elx: efct: Fix link error for _bad_cmpxchg
scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel()
scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session()
scsi: elx: efct: Fix error handling in efct_hw_init()
scsi: elx: efct: Remove redundant initialization of variable lun
scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected"
scsi: lpfc: Fix build error in lpfc_scsi.c
scsi: target: iscsi: Remove redundant continue statement
scsi: qla4xxx: Remove redundant continue statement
scsi: ppa: Switch to use module_parport_driver()
scsi: imm: Switch to use module_parport_driver()
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
...
|
|
Using list_move_tail() instead of list_del() + list_add_tail().
Link: https://lore.kernel.org/r/1623113493-49384-1-git-send-email-zou_wei@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add the VMID in wqe before sending out the request. The type of VMID
depends on the configured type and is checked before being appended.
Link: https://lore.kernel.org/r/20210608043556.274139-11-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add supporting datastructures for mailbox command which helps in
determining if the firmware supports appid. Allocate resources for VMID at
initialization time and clean them up on removal.
Link: https://lore.kernel.org/r/20210608043556.274139-7-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The abort_cmd_ia flag in an abort wqe describes whether an ABTS basic link
service should be transmitted on the FC link or not. Code added in
lpfc_sli4_issue_abort_iotag() set the abort_cmd_ia flag incorrectly,
surpressing ABTS transmission.
A previous LPFC change to build an abort wqe inverted prior logic that
determined whether an ABTS was to be issued on the FC link.
Revert this logic to its proper state.
Link: https://lore.kernel.org/r/20210528212240.11387-1-jsmart2021@gmail.com
Fixes: db7531d2b377 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers")
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver is encountering a crash in lpfc_free_iocb_list() while
performing initial attachment.
Code review found this to be an errant failure path that was taken, jumping
to a tag that then referenced structures that were uninitialized.
Fix the failure path.
Link: https://lore.kernel.org/r/20210514195559.119853-9-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
During link bounce testing, RPI counts were seen to differ from the number
of nodes. For fabric and domain controllers, a temporary RPI is assigned,
but the code isn't registering it. If the nodes do go away, such as on link
down, the temporary RPI isn't being released.
Change the way these two fabric services are managed, make them behave like
any other remote port. Register the RPI and register with the transport.
Never leave the nodes in a NPR or UNUSED state where their RPI is in limbo.
This allows them to follow normal dev_loss_tmo handling, RPI refcounting,
and normal removal rules. It also allows fabric I/Os to use the RPI for
traffic requests.
Note: There is some logic that still has a couple of exceptions when the
Domain controller (0xfffcXX). There are cases where the fabric won't have a
valid login but will send RDP. Other times, it will it send a LOGO then an
RDP. It makes for ad-hoc behavior to manage the node. Exceptions are
documented in the code.
Link: https://lore.kernel.org/r/20210514195559.119853-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver is crashing due to a bad pointer during driver load due in an
adisc acc receive routine. The driver is missing node get/put in the
mbx_resume_rpi paths.
Fix by adding the proper gets and puts into the resume_rpi path.
Link: https://lore.kernel.org/r/20210514195559.119853-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.
Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a log of fabric ELS's to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.
To resolve the issue, do the following:
- Restore the RPI release code, but move the location to so that it is in
line with the new node cleanup design.
- NPIV ports now release the RPI and drop the node when the caller sets
the NLP_RELEASE_RPI flag.
- Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
release of RPI to free pool.
- Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
completed.
- Stop offline_prep from skipping nodes that are UNUSED. The RPI may
not have been released.
- Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.
- Fixed up debugfs RPI displays for better debugging.
Fixes: a70e63eee1c1 ("scsi: lpfc: Fix NPIV Fabric Node reference counting")
Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The pointer tmp_hdr is being assigned a value that is never read, the
assignment is redundant and can be removed.
Link: https://lore.kernel.org/r/20210420104123.376420-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The dump command for reading a region passes a requested read length
specified in words (4-byte units). The response overwrites the same field
with the actual number of bytes read.
The mailbox handler for DUMP which reads VPD data (region 23) is treating
the response field as if it were still a word_cnt, thus multiplying it by 4
to set the read's "length". Given the read value was calculated based on
the size of the read buffer, the longer response length runs off the end of
the buffer.
Fix by reworking the code to use the response field as a byte count.
Link: https://lore.kernel.org/r/20210421234511.102206-1-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In devloss timer handler and in backend calls to terminate remote port I/O,
there is logic to walk through all active IOCBs and validate them to
potentially trigger an abort request. This logic is causing illegal memory
accesses which leads to a crash. Abort IOCBs, which may be on the list, do
not have an associated lpfc_io_buf struct. The driver is trying to map an
lpfc_io_buf struct on the IOCB and which results in a bogus address thus
the issue.
Fix by skipping over ABORT IOCBs (CLOSE IOCBs are ABORTS that don't send
ABTS) in the IOCB scan logic.
Link: https://lore.kernel.org/r/20210421234433.102079-1-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Code inspection showed lpfc was using three different pointer formats when
logging discovery object pointers.
Standardize the pointer format to x%px.
Note: %px use is limited to discovery objects in order to aid core
analysis.
Link: https://lore.kernel.org/r/20210412013127.2387-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Clean up minor issues spotted by tools and code review:
- Spelling Errors
- Spurious characters and errors in function headers
- nvme_info wqerr and err fields source data reversed
- Extraneous new line in log message 0466
- Spacing error in log message 0109
- Messages 0140 and 0141 have portname and nodename reversed
- Incorrect function labelling in comment
Link: https://lore.kernel.org/r/20210412013127.2387-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In SLI-4, when performing a mailbox command with MBX_POLL, the driver uses
the BMBX register to send the command rather than the MQ. A flag is set
indicating the BMBX register is active and saves the mailbox job struct
(mboxq) in the mbox_active element of the adapter. The routine then waits
for completion or timeout. The mailbox job struct is not freed by the
routine. In cases of timeout, the adapter will be reset. The
lpfc_sli_mbox_sys_flush() routine will clean up the mbox in preparation for
the reset. It clears the BMBX active flag and marks the job structure as
MBX_NOT_FINISHED. But, it never frees the mboxq job structure. Expectation
in both normal completion and timeout cases is that the issuer of the mbx
command will free the structure. Unfortunately, not all calling paths are
freeing the memory in cases of error.
All calling paths were looked at and updated, if missing, to free the mboxq
memory regardless of completion status.
Link: https://lore.kernel.org/r/20210412013127.2387-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix a crash caused by a double put on the node when the driver completed an
ACC for an unsolicted abort on the same node. The second put was executed
by lpfc_nlp_not_used() and is wrong because the completion routine executes
the nlp_put when the iocbq was released. Additionally, the driver is
issuing a LOGO then immediately calls lpfc_nlp_set_state to put the node
into NPR. This call does nothing.
Remove the lpfc_nlp_not_used call and additional set_state in the
completion routine. Remove the lpfc_nlp_set_state post issue_logo. Isn't
necessary.
Link: https://lore.kernel.org/r/20210412013127.2387-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|