When boxes are run near (or to) OOM, we have a problem with the discard
page allocation in sd. If we fail allocating the special page, we return
busy, and it'll get retried. But since ordering is honored for dispatch
requests, we can keep retrying this same IO and failing. Behind that IO
could be requests that want to free memory, but they never get the
chance. This means you get repeated spews of traces like this:
[1201401.625972] Call Trace:
[1201401.631748] dump_stack+0x4d/0x65
[1201401.639445] warn_alloc+0xec/0x190
[1201401.647335] __alloc_pages_slowpath+0xe84/0xf30
[1201401.657722] ? get_page_from_freelist+0x11b/0xb10
[1201401.668475] ? __alloc_pages_slowpath+0x2e/0xf30
[1201401.679054] __alloc_pages_nodemask+0x1f9/0x210
[1201401.689424] alloc_pages_current+0x8c/0x110
[1201401.699025] sd_setup_write_same16_cmnd+0x51/0x150
[1201401.709987] sd_init_command+0x49c/0xb70
[1201401.719029] scsi_setup_cmnd+0x9c/0x160
[1201401.727877] scsi_queue_rq+0x4d9/0x610
[1201401.736535] blk_mq_dispatch_rq_list+0x19a/0x360
[1201401.747113] blk_mq_sched_dispatch_requests+0xff/0x190
[1201401.758844] __blk_mq_run_hw_queue+0x95/0xa0
[1201401.768653] blk_mq_run_work_fn+0x2c/0x30
[1201401.777886] process_one_work+0x14b/0x400
[1201401.787119] worker_thread+0x4b/0x470
[1201401.795586] kthread+0x110/0x150
[1201401.803089] ? rescuer_thread+0x320/0x320
[1201401.812322] ? kthread_park+0x90/0x90
[1201401.820787] ? do_syscall_64+0x53/0x150
[1201401.829635] ret_from_fork+0x29/0x40
Ensure that the discard page allocation has a mempool backing, so we
know we can make progress.
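A minimal userspace sketch of the idea, loosely modeled on the kernel's mempool, which first tries the normal allocator and then falls back to preallocated elements. The reserve_pool structure, pool_alloc(), pool_free(), the reserve size and the simulate_oom flag are all hypothetical; this is not sd's code or the mempool API. Freed elements refill the reserve first, so once an in-flight discard completes, the next allocation can succeed even under OOM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reserve size; the real pool size is not stated here. */
#define RESERVE_MIN 2

/* Hypothetical mempool stand-in: a few elements are preallocated so an
 * allocation can still succeed when the primary allocator fails. */
struct reserve_pool {
	void  *elems[RESERVE_MIN];
	int    nr_free;      /* elements currently held in the reserve */
	size_t elem_size;
};

static bool pool_init(struct reserve_pool *p, size_t elem_size)
{
	p->elem_size = elem_size;
	p->nr_free = 0;
	for (int i = 0; i < RESERVE_MIN; i++) {
		void *e = malloc(elem_size);

		if (!e)
			return false;
		p->elems[p->nr_free++] = e;
	}
	return true;
}

/* Try the normal allocator first; simulate_oom forces it to fail so the
 * fallback path can be exercised. */
static void *pool_alloc(struct reserve_pool *p, bool simulate_oom)
{
	void *e = simulate_oom ? NULL : malloc(p->elem_size);

	if (e)
		return e;
	if (p->nr_free > 0)
		return p->elems[--p->nr_free];
	/* Reserve empty: the caller retries once an in-flight element is freed. */
	return NULL;
}

/* Refill the reserve before returning memory to the system, so the next
 * allocation under memory pressure can succeed. */
static void pool_free(struct reserve_pool *p, void *e)
{
	assert(e != NULL);
	if (p->nr_free < RESERVE_MIN) {
		p->elems[p->nr_free++] = e;
		return;
	}
	free(e);
}
```

The key property is that the reserve is refilled by completions, so forward progress depends only on in-flight requests finishing, not on reclaim succeeding.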
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Several conflicts, seemingly all over the place.
I used Stephen Rothwell's sample resolutions for many of these, if only
to double-check my own work, so the credit definitely goes largely
to him.
The NFP conflict consisted of a bug fix (moving operations
past the rhashtable operation) combined with a change to the initial
argument of the function call in the moved code.
The net/dsa/master.c conflict had to do with a bug fix for the tagging
attribute location intermixed with making dsa_master_set_mtu()
static.
cls_flower had a conflict because the dup reject fix from Or
overlapped with the addition of port range classification.
__set_phy_supported()'s conflict was relatively easy to resolve
because Andrew fixed it in both trees, so it was just a matter
of taking the net-next copy. Or at least I think it was :-)
Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
intermixed with changes to how the sdif and caller_net are calculated
in these code paths in net-next.
The remaining BPF conflicts were largely about the addition of the
__bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
to the relevant data structure where the MD pointer macros are used.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the
two corruption fixes. We're going to be overhauling the direct dispatch
path, and we need to do that on top of the changes we made for that
in mainline.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Update the driver version to 12.0.0.9
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When DIF and first burst are used in a write command WQE, the driver was not
properly setting fields in the IO command request. This resulted in no DIF
bytes being sent and invalid xfer_rdy's, causing the IO to be aborted
by the hardware.
Correct the WQE initialization when both DIF and first burst are used.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
On driver termination, after the driver stops fw logging by writing a
register on the chip, the driver immediately unmaps and frees the logging
buffer, without confirming in any way that the chip has received the write
and terminated the logging. As termination on the chip is not immediate,
the chip may issue a DMA request to the now-unmapped DMA buffer, resulting
in an IOMMU fault.
Change the driver to receive confirmation that logging has been
terminated. The driver always issues an SLI reset to the device as part
of shutdown and receives confirmation that the reset is complete, so
the driver was modified to perform the write to disable
fw logging prior to the SLI reset and only free the fw log buffer after the
SLI reset is complete. That guarantees use of the fw log buffer is fully
terminated when it is unmapped.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Driver missed classifying the chip type for G7 when reporting supported
topologies. This resulted in loop being shown as supported on FC links
where, per the standard, loop is not supported.
Add the chip classifications to the topology checks in the driver.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Driver is setting bits in word 10 of the SLI4 ABORT WQE (the wqid). The
field was a carryover from a prior SLI revision. The field does not exist
in SLI4, and setting it may overlap with a future definition of
the WQE.
Remove the setting of WQID in the ABORT WQE.
Also cleaned up WQE field settings: initialize the WQE to zero, and don't
bother setting individual fields to zero.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In the current discovery state machine, the driver treated FLOGI oddly. When
point to point, an FLOGI is to be exchanged by the two ports, with the port
with the most significant WWN then proceeding with PLOGI. The
implementation in the driver was keyed too closely to "what have I sent",
not to what has happened between the two endpoints. Thus, it blatantly
would ACC an FLOGI, but reject PLOGIs until it had its own FLOGI ACC'd. The
problem is that the sending of FLOGI may be delayed for some reason, or the
response to FLOGI held off by the other side. In the failing situation, the
other side sent an FLOGI, which was ACC'd, then sent PLOGIs which were then
rejected until the retry count for the PLOGIs was exceeded and the port gave
up. The FLOGI may have been very late in transmit, or the response held off
until the PLOGIs failed. Given the other port had the higher WWN, no PLOGIs
would occur and communication stopped.
Correct the situation by changing the FLOGI handling. Defer any response to
an FLOGI until the driver has sent its FLOGI as well. Then, upon either
completion of the sent FLOGI, or upon sending an ACC to a received FLOGI
(which may be received before or just after FLOGI was sent), the driver
will act on who has the higher WWN. If the other port does, the driver will
noop any handling of an FLOGI response (if outstanding) and wait for PLOGI.
If the local port does, the driver will transition to sending PLOGI and
will noop any action on responding to an FLOGI (if not yet received).
Fortunately, implementing this took only another state flag and
deferring any FLOGI response if the FLOGI has yet to be transmitted. All
subsequent actions were already in place.
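The FLOGI handling described above can be sketched as a small userspace model. The p2p_port structure, the flogi_sent flag, and the action names are hypothetical, not lpfc's; WWPNs are assumed to be held as host-order 64-bit integers so that numeric comparison matches the byte-wise WWN ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical actions for the next step in point-to-point discovery. */
enum p2p_action {
	P2P_NONE,             /* nothing to do yet */
	P2P_DEFER_FLOGI_RSP,  /* hold the ACC until our own FLOGI is sent */
	P2P_SEND_PLOGI,       /* local port has the higher WWN */
	P2P_WAIT_FOR_PLOGI,   /* remote port has the higher WWN */
};

struct p2p_port {
	uint64_t local_wwpn;
	uint64_t remote_wwpn;
	bool flogi_sent;          /* the extra state flag */
	bool flogi_rsp_deferred;  /* a received FLOGI awaits our ACC */
};

/* After the FLOGI exchange, only the WWN comparison decides who PLOGIs. */
static enum p2p_action p2p_decide(const struct p2p_port *p)
{
	assert(p->local_wwpn != p->remote_wwpn);
	return p->local_wwpn > p->remote_wwpn ? P2P_SEND_PLOGI
					      : P2P_WAIT_FOR_PLOGI;
}

/* A FLOGI arrived from the other port. */
static enum p2p_action p2p_rcv_flogi(struct p2p_port *p, uint64_t remote_wwpn)
{
	p->remote_wwpn = remote_wwpn;
	if (!p->flogi_sent) {
		p->flogi_rsp_deferred = true;
		return P2P_DEFER_FLOGI_RSP;
	}
	return p2p_decide(p);  /* ACC it, then act on the comparison */
}

/* Our own FLOGI has been transmitted. */
static enum p2p_action p2p_flogi_sent(struct p2p_port *p)
{
	p->flogi_sent = true;
	if (!p->flogi_rsp_deferred)
		return P2P_NONE;   /* wait for our FLOGI to complete */
	p->flogi_rsp_deferred = false;
	return p2p_decide(p);  /* send the deferred ACC, then decide */
}

/* Our FLOGI was ACC'd by the other port. */
static enum p2p_action p2p_flogi_cmpl(struct p2p_port *p, uint64_t remote_wwpn)
{
	p->remote_wwpn = remote_wwpn;
	return p2p_decide(p);
}
```

Whichever event arrives first, both paths converge on the same WWN comparison, so a late FLOGI no longer causes PLOGIs to be rejected.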
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In some link initialization sequences, the fw generates an erroneous FLOGI
payload to the driver without an intervening link bounce. The driver, when
it sees a 2nd FLOGI without an intervening link bounce, automatically
performs a link bounce. Here, the link bounce causes the situation to
repeat, resulting in a nasty loop of link bounces.
Resolve the issue by validating the FLOGI payload. The erroneous FLOGI will
contain VVL signatures that are not normal. When the driver sees these, it
will simply reject the FLOGI rather than bouncing the link. The reject is
consumed within the firmware.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Two initiator ports were cable swapped, and after the swap both went down. The
driver internally swaps the nlp nodes based on matching node WWNs but a
different nport ID than before. After detecting a change in the node's RPI, the
driver sends an UNREG_RPI command and clears the NLP_RPI_REGISTERED flag,
then swaps the node information with the other node. However, the other node's
NLP_RPI_REGISTERED flag is also cleared without an UNREG_RPI being sent,
which causes the later REG_RPI for that other node to fail, as the hardware
believes it's still registered.
Additionally, if the node swap occurred while the two nodes had PLOGIs in
flight, the fc4_types weren't properly swapped, so when the PLOGIs
completed and PRLIs were then sent, the PRLIs acted on the wrong
protocol types. NVME devices saw SCSI FCP PRLIs and vice versa.
Clean up the node swap so that the NLP_RPI_REGISTERED flag is handled
properly.
Fix the handling of the fc4_types when the nodes are swapped as well.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Depending on the chipset, the number of NPIV vports may vary and be in
excess of what most switches support (256). To avoid confusing
users, limit the reported NPIV vports to 256.
Additionally, correct the 16G adapter, which reports a bogus NPIV vport
number if the link is down.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Driver is hitting null pring pointers in lpfc_do_work().
Pointer assignment occurs based on SLI revision. If recovering after an
error, it's possible the SLI revision for the port was cleared, making
lpfc_phba_elsring() not return a ring pointer, thus the null pointer.
Add SLI revision checking to lpfc_phba_elsring() and status checking to all
callers.
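A simplified sketch of the pattern: the ring lookup returns NULL unless the SLI revision is known, and every caller checks the result. The hba and ring structures and the function names are hypothetical stand-ins for lpfc_phba_elsring() and its callers, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not lpfc's definitions. */
enum sli_rev { SLI_REV_UNSET = 0, SLI_REV3 = 3, SLI_REV4 = 4 };

struct ring {
	int id;
};

struct hba {
	enum sli_rev sli_rev;       /* may be cleared during error recovery */
	struct ring  sli3_els_ring; /* SLI3: ring embedded in the HBA */
	struct ring *sli4_els_ring; /* SLI4: separately allocated, may be NULL */
};

/* Return the ELS ring only for a known SLI revision, otherwise NULL. */
static struct ring *hba_els_ring(struct hba *h)
{
	switch (h->sli_rev) {
	case SLI_REV3:
		return &h->sli3_els_ring;
	case SLI_REV4:
		return h->sli4_els_ring;
	default:
		return NULL;
	}
}

/* Every caller checks the result rather than assuming a ring exists. */
static bool process_els_ring(struct hba *h, int *ring_id)
{
	struct ring *r = hba_els_ring(h);

	assert(ring_id != NULL);
	if (!r)
		return false;   /* skip ELS work on this pass */
	*ring_id = r->id;
	return true;
}
```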
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Renumber one of the 0711 log messages so there isn't a duplication.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver is getting hit with hundreds of RSCNs during remote port address
changes. Each of those RSCNs ends up generating UNREG_RPI and REG_RPI
mailbox commands. The discovery engine within the driver doesn't wait for
the mailbox command completions. Instead, it sets state flags and moves
forward. At some point, there's a massive backlog of mailbox commands which
take time for the adapter to process. Additionally, it appears there were
duplicate events from the switch, so the driver generated duplicate mailbox
commands for the same remote port. During this window, PLOGI and PRLI ELS
failures are seen because the adapter rejects ELSs for remote ports that
still have pending mailbox commands.
Streamline the discovery engine so that the PLOGI logic checks for outstanding
UNREG_RPIs and defers the processing until the commands complete. This
better synchronizes the ELS transmission vs the RPI registrations.
Filter out multiple UNREG_RPIs being queued up for the same remote port.
Beef up log messages in this area.
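A minimal sketch of the two mailbox rules above, assuming a single per-remote-port "UNREG_RPI pending" flag. The rport structure and the helper names are hypothetical, not lpfc's node flags or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-remote-port state; not lpfc's real node flags. */
struct rport {
	bool unreg_rpi_pending;   /* UNREG_RPI queued, not yet completed */
	bool plogi_deferred;      /* PLOGI held until the UNREG_RPI completes */
	int  mbox_queued;         /* mailbox commands issued for this port */
};

/* Queue at most one UNREG_RPI per remote port; false means filtered. */
static bool queue_unreg_rpi(struct rport *rp)
{
	if (rp->unreg_rpi_pending)
		return false;          /* duplicate RSCN-driven request: drop it */
	rp->unreg_rpi_pending = true;
	rp->mbox_queued++;
	return true;
}

/* Returns true if the PLOGI may be transmitted now. */
static bool try_send_plogi(struct rport *rp)
{
	if (rp->unreg_rpi_pending) {
		rp->plogi_deferred = true;
		return false;
	}
	return true;
}

/* Mailbox completion: returns true if a deferred PLOGI should go out now. */
static bool unreg_rpi_complete(struct rport *rp)
{
	assert(rp->unreg_rpi_pending);
	rp->unreg_rpi_pending = false;
	if (rp->plogi_deferred) {
		rp->plogi_deferred = false;
		return true;
	}
	return false;
}
```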
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver data structure for managing a mailbox command contained two
context fields. Unfortunately, the contexts were considered "generic", to be
used at the whim of the command code. Of course, one section of code used
the fields one way while another used them a different way, and eventually
there were mixups.
Refactored the structure so that the generic contexts become a node context
and a buffer context and all code standardizes on their use.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Update manufacturer attribute to reflect Broadcom Inc, not Emulex
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
While trying to get the adapter fw-log for a function whose buffsize was set
to 0, a kernel panic occurred.
When buffsize is 0, the kernel buffer for the log won't be allocated. When
fw log usage was enabled, it failed to check the buffer size, and log usage
was started. Eventually the driver referenced the unallocated log buffer.
Added checks of the buffer size before allowing fw logging to be enabled
and added check for valid buffer if enabling fw log.
Performed a couple other minor cleanups while fixing this:
- clarified log messages
- re-evaluated log message severity
- treat any error as an error, not only a couple codes
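A minimal sketch of the added checks, using a hypothetical fwlog structure. The specific error codes (-EINVAL, -ENOMEM) are illustrative choices, not necessarily what the driver returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fw-log state; names are illustrative. */
struct fwlog {
	size_t bufsize;   /* configured size; 0 means no buffer was allocated */
	void  *buf;
	bool   enabled;
};

/* Refuse to start logging unless a buffer of non-zero size really exists. */
static int fwlog_enable(struct fwlog *log)
{
	assert(log != NULL);
	if (log->bufsize == 0)
		return -EINVAL;
	if (!log->buf)
		return -ENOMEM;
	log->enabled = true;
	return 0;
}
```

A caller following the last bullet would treat any non-zero return as a failure rather than matching specific codes.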
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If "interface" is NULL then we can't release it and trying to will only
lead to an Oops.
Fixes: aea71a024914 ("[SCSI] bnx2fc: Introduce interface structure for each vlan interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This reverts commit db186382af21e926e90df19499475f2552192b77.
This commit introduced a regression in FCP discovery, so revert it to fix
discovery for FCP LUNs.
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
dma_addr_t can be u64 on PAE systems, but isa_virt_to_bus only ever
returns unsigned long (because an ISA physical address can only be 24
bits). Cast to unsigned long to avoid a 64-bit division.
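A hypothetical illustration of why the narrowing cast is safe: because an ISA bus address fits in 24 bits, converting a 64-bit dma_addr_t to unsigned long loses nothing, and the subtraction and division then happen at native word size. The helper name, the element-index computation, and the local dma_addr_t typedef are assumptions for the example, not aha1542's code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;       /* stand-in: 64 bits wide, as on PAE */

#define ISA_ADDR_MASK 0xffffffUL   /* 24-bit ISA address space */

/* Index of the element at bus address 'bus' in an array of 'elem_size'-byte
 * elements starting at bus address 'base'. Narrowing 'base' first keeps the
 * arithmetic in unsigned long instead of a 64-bit division. */
static unsigned long bus_addr_to_index(unsigned long bus, dma_addr_t base,
				       unsigned long elem_size)
{
	assert(base <= ISA_ADDR_MASK);
	return (bus - (unsigned long)base) / elem_size;
}
```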
Fixes: 1794ef2b150d ("scsi: aha1542: convert to DMA mapping API")
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
NULL check before some freeing functions is not needed.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
NULL check before some freeing functions is not needed.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
NULL check before some freeing functions is not needed.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Nesting in __qla2x00_abort_all_cmds() is way too deep. Reduce the nesting
level by introducing a helper function. This patch does not change any
functionality.
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
flush_scheduled_work() is not required as csio_hw_exit_workers() calls
cancel_work_sync() for hw->evtq_work.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1056537 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Khalid Aziz <khalid@gonehiking.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Per the spec, the UFS sense data is 18 bytes long.
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Replaced vmalloc + memset with vzalloc
Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Acked-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Replaced vmalloc + memset with vzalloc
Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Acked-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four obvious bug fixes. The vmw_pscsi is so old that it's amazing
no-one noticed before now"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: storvsc: Fix a race in sub-channel creation that can cause panic
scsi: vmw_pscsi: Rearrange code to avoid multiple calls to free_irq during unload
scsi: libiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset
scsi: lpfc: fix block guard enablement on SLI3 adapters
|
|
Pull in v4.20-rc5, resolving a conflict we'd otherwise get in aio.c and
also picking up the mainline merge fix for a problem that users testing
for-4.21/block and/or for-next are hitting.
* tag 'v4.20-rc5': (664 commits)
Linux 4.20-rc5
PCI: Fix incorrect value returned from pcie_get_speed_cap()
MAINTAINERS: Update linux-mips mailing list address
ocfs2: fix potential use after free
mm/khugepaged: fix the xas_create_range() error path
mm/khugepaged: collapse_shmem() do not crash on Compound
mm/khugepaged: collapse_shmem() without freezing new_page
mm/khugepaged: minor reorderings in collapse_shmem()
mm/khugepaged: collapse_shmem() remember to clear holes
mm/khugepaged: fix crashes due to misaccounted holes
mm/khugepaged: collapse_shmem() stop if punched or truncated
mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
mm/huge_memory: splitting set mapping+index before unfreeze
mm/huge_memory: rename freeze_page() to unmap_page()
initramfs: clean old path before creating a hardlink
kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
psi: make disabling/enabling easier for vendor kernels
proc: fixup map_files test on arm
debugobjects: avoid recursive calls with kmemleak
userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
...
|
|
In preparation for enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Notice that, in this particular case, I replaced "Missed the backend's
Closing state -- fallthrough" with "fall through - Missed the backend's
Closing state", which contains the "fall through" annotation at the
beginning of the code comment, which is what GCC is expecting to find.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In preparation for enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Notice that, in this particular case, I replaced "Fall thru" with a "Fall
through" annotation and added a dash as a token in order to separate the
"Fall through" annotation from the rest of the comment on the same line,
which is what GCC is expecting to find.
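For reference, a minimal example of annotations that GCC's -Wimplicit-fallthrough (at its default level 3) recognizes: a "fall through" comment directly before the next case label, optionally followed by a dash and an explanation on the same line. The state machine itself is hypothetical.

```c
#include <assert.h>

/* Hypothetical states; only the annotation style matters here. */
enum state { ST_INIT, ST_CONNECTED, ST_CLOSING, ST_CLOSED };

/* Returns how many teardown steps run when starting from state 's'. */
static int teardown_steps(enum state s)
{
	int steps = 0;

	switch (s) {
	case ST_CONNECTED:
		steps++;          /* disconnect */
		/* fall through */
	case ST_CLOSING:
		steps++;          /* release resources */
		/* fall through - closing and closed share the final step */
	case ST_CLOSED:
		steps++;          /* mark done */
		break;
	case ST_INIT:
	default:
		break;
	}
	return steps;
}
```

Both comments compile without warnings under -Wimplicit-fallthrough; dropping either one makes GCC warn at the following case label.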
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In preparation for enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In preparation for enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Addresses-Coverity-ID: 1195463 ("Missing break in switch")
Addresses-Coverity-ID: 1195464 ("Missing break in switch")
Addresses-Coverity-ID: 1195465 ("Missing break in switch")
Addresses-Coverity-ID: 1195466 ("Missing break in switch")
Addresses-Coverity-ID: 1357338 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In preparation for enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In preparation for enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Also, properly align a break statement.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch improves code readability but does not change any functionality.
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is a spelling mistake in some description text, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add /* fallthrough */ annotation, to eliminate compilation warning:
warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We can concurrently try to open the same sub-channel from 2 paths:
path #1: vmbus_onoffer() -> vmbus_process_offer() -> handle_sc_creation().
path #2: storvsc_probe() -> storvsc_connect_to_vsp() ->
-> storvsc_channel_init() -> handle_multichannel_storage() ->
-> vmbus_are_subchannels_present() -> handle_sc_creation().
They conflict with each other, but it was not an issue before the recent
commit ae6935ed7d42 ("vmbus: split ring buffer allocation from open"),
because at the beginning of vmbus_open() we checked newchannel->state so
only one path could succeed, and the other would return with -EINVAL.
After ae6935ed7d42, the failing path frees the channel's ringbuffer by
vmbus_free_ring(), and this causes a panic later.
Commit ae6935ed7d42 itself is good, and it just reveals the longstanding
race. We can resolve the issue by removing path #2, i.e. removing the
second vmbus_are_subchannels_present() in handle_multichannel_storage().
BTW, the comment "Check to see if sub-channels have already been created"
in handle_multichannel_storage() is incorrect: when we unload the driver,
we first close the sub-channel(s) and then close the primary channel, next
the host sends rescind-offer message(s) so primary->sc_list will become
empty. This means the first vmbus_are_subchannels_present() in
handle_multichannel_storage() is never useful.
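To illustrate the kind of state check that used to make the second opener fail with -EINVAL, here is a hypothetical userspace sketch using a C11 compare-and-swap on the channel state. It is not the fix applied here (which removes path #2 entirely) and not vmbus's code; it only shows that a losing opener must return before touching resources, such as the ring buffer, that the winner owns.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical channel states; not vmbus's actual enum. */
enum chan_state { CHAN_OFFERED, CHAN_OPENING, CHAN_OPENED };

struct channel {
	_Atomic int state;
};

/* Only the caller that moves the channel out of CHAN_OFFERED may open it;
 * a concurrent second caller returns -EINVAL without freeing anything. */
static int channel_open(struct channel *ch)
{
	int expected = CHAN_OFFERED;

	if (!atomic_compare_exchange_strong(&ch->state, &expected, CHAN_OPENING))
		return -EINVAL;
	/* ... allocate the ring buffer and send the open request here ... */
	atomic_store(&ch->state, CHAN_OPENED);
	return 0;
}
```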
Fixes: ae6935ed7d42 ("vmbus: split ring buffer allocation from open")
Cc: stable@vger.kernel.org
Cc: Long Li <longli@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
1. Removed logic to update HW producer index in interrupt context.
2. Update HW producer index after UIO ring and buffer gets initialized.
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Default packet size is 0x400. For jumbo packets, set it to 0x2400.
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add packet filter to avoid unnecessary packet processing in iscsiuio.
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The following kernel panic was observed after a switch-side perturbation:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8132b5a0>] strcmp+0x20/0x40
PGD 0 Oops: 0000 [#1] SMP
CPU: 8 PID: 647 Comm: kworker/8:1 Tainted: G W OE ------------ 3.10.0-693.el7.x86_64 #1
Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/20/2018
Workqueue: slowpath-13:00. qed_slowpath_task [qed]
task: ffff880429eb8fd0 ti: ffff880429190000 task.ti: ffff880429190000
RIP: 0010:[<ffffffff8132b5a0>] [<ffffffff8132b5a0>] strcmp+0x20/0x40
RSP: 0018:ffff880429193c68 EFLAGS: 00010202
RAX: 000000000000000a RBX: 0000000000000002 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88042bda7a41
RBP: ffff880429193c68 R08: 000000000000ffff R09: 000000000000ffff
R10: 0000000000000007 R11: ffff88042b3af338 R12: ffff880420b007a0
R13: ffff88081aa56af8 R14: 0000000000000001 R15: ffff88081aa50410
FS: 0000000000000000(0000) GS:ffff88042fe00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000019f2000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffff880429193d20 ffffffffc02a0c90 ffffc90004b32000 ffff8803fd3ec600
ffff88042bda7800 ffff88042bda7a00 ffff88042bda7840 ffff88042bda7a40
0000000129193d10 2e3836312e323931 ff000a342e363232 ffffffffc01ad99d
Call Trace:
[<ffffffffc02a0c90>] qedi_get_protocol_tlv_data+0x270/0x470 [qedi]
[<ffffffffc01ad99d>] ? qed_mfw_process_tlv_req+0x24d/0xbf0 [qed]
[<ffffffffc01653ae>] qed_mfw_fill_tlv_data+0x5e/0xd0 [qed]
[<ffffffffc01ad9b9>] qed_mfw_process_tlv_req+0x269/0xbf0 [qed]
Fix the kernel NULL pointer dereference by checking that the session is
online before getting iSCSI TLV data.
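A minimal sketch of the check, with a hypothetical session model: TLV data is filled only from a logged-in session with a valid target name, so no NULL string ever reaches strcmp() or strncpy(). The field, enum and function names are assumptions, not qedi's structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified session model; qedi's real structures differ. */
enum sess_state { SESS_FREE, SESS_LOGGED_IN, SESS_FAILED };

struct session {
	enum sess_state state;
	const char *target_name;   /* may be NULL once the session is torn down */
};

struct tlv_data {
	char target_name[64];
	bool valid;
};

/* Fill TLV data only from an online session; otherwise report nothing. */
static bool fill_iscsi_tlv(const struct session *s, struct tlv_data *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	if (!s || s->state != SESS_LOGGED_IN || !s->target_name)
		return false;
	/* size - 1 plus the memset above keeps the copy NUL-terminated */
	strncpy(out->target_name, s->target_name, sizeof(out->target_name) - 1);
	out->valid = true;
	return true;
}
```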
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Driver load failed on some systems with the following error:
[0004:01:00.5]:[qedi_request_msix_irq:2524]:8: request_irq failed.
Allocate the IRQs based on MSIX count obtained from qed module instead of
number of queues.
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|