Age | Commit message (Collapse) | Author |
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/all/37bd6895bb946f6d785ab5fe32f1a6f4b9e77c26.1738746904.git.namcao@linutronix.de
|
|
Documentation/process/license-rules.rst and checkpatch expect the SPDX
identifier syntax for multiple licenses to use capital "OR". Correct it
to keep consistent format and avoid copy-paste issues.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230823092912.122674-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Pull rdma updates from Jason Gunthorpe:
"Usual wide collection of unrelated items in drivers:
- Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5,
rxe, usnic, usnic, bnxt_re, ocrdma, iser:
- remove unnecessary NULL checks
- kmap obsolescence
- pci_enable_pcie_error_reporting() obsolescence
- unused variables and macros
- trace event related warnings
- casting warnings
- Code cleanups for irdm and erdma
- EFA reporting of 128 byte PCIe TLP support
- mlx5 more agressively uses the out of order HW feature
- Big rework of how state machines and tasks work in rxe
- Fix a syzkaller found crash netdev refcount leak in siw
- bnxt_re revises their HW description header
- Congestion control for bnxt_re
- Use mmu_notifiers more safely in hfi1
- mlx5 gets better support for PCIe relaxed ordering inside VMs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits)
RDMA/efa: Add rdma write capability to device caps
RDMA/mlx5: Use correct device num_ports when modify DC
RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call
RDMA/rxe: Fix spinlock recursion deadlock on requester
RDMA/mlx5: Fix flow counter query via DEVX
RDMA/rxe: Protect QP state with qp->state_lock
RDMA/rxe: Move code to check if drained to subroutine
RDMA/rxe: Remove qp->req.state
RDMA/rxe: Remove qp->comp.state
RDMA/rxe: Remove qp->resp.state
RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
net/mlx5: Update relaxed ordering read HCA capabilities
RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
RDMA: Add ib_virt_dma_to_page()
RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame()
RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c
IB/hfi1: Place struct mmu_rb_handler on cache line start
IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests
...
|
|
Every caller passes in zero, meaning they don't want any partial copy to
zero the remainder of the destination buffer.
Which is just as well, because the implementation of that function
didn't actually even look at that argument, and wasn't even aware it
existed, although some misleading comments did mention it still.
The 'zerorest' thing is a historical artifact of how "copy_from_user()"
worked, in that it would zero the rest of the kernel buffer that it
copied into.
That zeroing still exists, but it's long since been moved to generic
code, and the raw architecture-specific code doesn't do it. See
_copy_from_user() in lib/usercopy.c for this all.
However, while __copy_user_nocache() shares some history and superficial
other similarities with copy_from_user(), it is in many ways also very
different.
In particular, while the code makes it *look* similar to the generic
user copy functions that can copy both to and from user space, and take
faults on both reads and writes as a result, __copy_user_nocache() does
no such thing at all.
__copy_user_nocache() always copies to kernel space, and will never take
a page fault on the destination. What *can* happen, though, is that the
non-temporal stores take a machine check because one of the use cases is
for writing to stable memory, and any memory errors would then take
synchronous faults.
So __copy_user_nocache() does look a lot like copy_from_user(), but has
faulting behavior that is more akin to our old copy_in_user() (which no
longer exists, but copied from user space to user space and could fault
on both source and destination).
And it very much does not have the "zero the end of the destination
buffer", since a problem with the destination buffer is very possibly
the very source of the partial copy.
So this whole thing was just a confusing historical artifact from having
shared some code with a completely different function with completely
different use cases.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is no need to check 'rdi->qp_dev' for NULL. The field 'qp_dev'
is created in rvt_register_device() which will fail if the 'qp_dev'
allocation fails in rvt_driver_qp_init(). Overwise this pointer
doesn't changed and passed to rvt_qp_exit() by the next step.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table")
Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
Link: https://lore.kernel.org/r/20230303124408.16685-1-n.petrova@fintech.ru
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The "cplen" result used by the memcpy() into struct rvt_swqe "wqe" may
be sized to 80 for struct rvt_ud_wr (which is member "ud_wr", not "wr"
which is only 40 bytes in size). Change the destination union member so
the compiler can use the correct bounds check.
struct rvt_swqe {
union {
struct ib_send_wr wr; /* don't use wr.sg_list */
struct rvt_ud_wr ud_wr;
...
};
...
};
Silences false positive memcpy() run-time warning:
memcpy: detected field-spanning write (size 80) of single field "&wqe->wr" at drivers/infiniband/sw/rdmavt/qp.c:2043 (size 40)
Reported-by: Zhang Yi <yizhan@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216561
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230218185701.never.779-kees@kernel.org
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Following patches have dependencies.
Resolve the merge conflict in
drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names
for the fs functions following linux-next:
https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The documentation of the function rvt_error_qp says both r_lock and
s_lock need to be held when calling that function.
It also asserts using lockdep that both of those locks are held.
rvt_error_qp is called form rvt_send_cq, which is called from
rvt_qp_complete_swqe, which is called from rvt_send_complete, which is
called from rvt_ruc_loopback in two places. Both of these places do not
hold r_lock. Fix this by acquiring a spin_lock of r_lock in both of
these places.
The r_lock acquiring cannot be added in rvt_qp_complete_swqe because
some of its other callers already have r_lock acquired.
Link: https://lore.kernel.org/r/20220228195144.71946-1-dossche.niels@gmail.com
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The documentation of the function rvt_error_qp says both r_lock and s_lock
need to be held when calling that function. It also asserts using lockdep
that both of those locks are held. However, the commit I referenced in
Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no
longer covered by r_lock. This results in the lockdep assertion failing
and also possibly in a race condition.
Fixes: d757c60eca9b ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error")
Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.com
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The rdma-core test suite sends an unaligned remote address and expects a
failure.
ERROR: test_atomic_non_aligned_addr (tests.test_atomic.AtomicTest)
The qib/hfi1 rc handling validates properly, but the test has the client
and server on the same system.
The loopback of these operations is a distinct code path.
Fix by syntaxing the proposed remote address in the loopback code path.
Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt")
Link: https://lore.kernel.org/r/1642584489-141005-1-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Return negative -ENOMEM instead of positive ENOMEM. Returning a postive
value will cause an Oops because it becomes an ERR_PTR() in the
create_qp() function.
Fixes: 514aee660df4 ("RDMA: Globally allocate and release QP memory")
Link: https://lore.kernel.org/r/20211013080645.GD6010@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
use SPDX-License-Identifier instead of a verbose license text
Link: https://lore.kernel.org/r/20210823023530.48-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Convert QP object to follow IB/core general allocation scheme. That
change allows us to make sure that restrack properly kref the memory.
Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The rdmavt QP has fields that are both needed for the control and data
path. Such mixed declaration caused to the very specific allocation flow
with kzalloc_node and SGE list embedded into the struct rvt_qp.
This patch separates QP creation to two: regular memory allocation for the
control path and specific code for the SGE list, while the access to the
later is performed through derefenced pointer.
Such pointer and its context are expected to be in the cache, so
performance difference is expected to be negligible, if any exists.
Link: https://lore.kernel.org/r/f66c1e20ccefba0db3c69c58ca9c897f062b4d1c.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Compilation with W=1 produces warnings similar to the below.
drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment
starts with '/**', but isn't a kernel-doc comment. Refer
Documentation/doc-guide/kernel-doc.rst
All such occurrences were found with the following one line
git grep -A 1 "\/\*\*" drivers/infiniband/
Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com
Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Fixes the following W=1 kernel build warning(s):
drivers/infiniband/sw/rdmavt/qp.c:1929: warning: Function parameter or member 'post_parms' not described in 'rvt_qp_valid_operation'
Link: https://lore.kernel.org/r/20210126124732.3320971-8-lee.jones@linaro.org
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Fixes the following W=1 kernel build warning(s):
drivers/infiniband/sw/rdmavt/qp.c:165: warning: Function parameter or member 'rdi' not described in 'rvt_wss_init'
drivers/infiniband/sw/rdmavt/qp.c:329: warning: Function parameter or member 'rdi' not described in 'init_qpn_table'
drivers/infiniband/sw/rdmavt/qp.c:534: warning: Function parameter or member 'type' not described in 'alloc_qpn'
drivers/infiniband/sw/rdmavt/qp.c:664: warning: Function parameter or member 'wqe' not described in 'rvt_swqe_has_lkey'
drivers/infiniband/sw/rdmavt/qp.c:664: warning: Function parameter or member 'lkey' not described in 'rvt_swqe_has_lkey'
drivers/infiniband/sw/rdmavt/qp.c:682: warning: Function parameter or member 'qp' not described in 'rvt_qp_sends_has_lkey'
drivers/infiniband/sw/rdmavt/qp.c:682: warning: Function parameter or member 'lkey' not described in 'rvt_qp_sends_has_lkey'
drivers/infiniband/sw/rdmavt/qp.c:706: warning: Function parameter or member 'qp' not described in 'rvt_qp_acks_has_lkey'
drivers/infiniband/sw/rdmavt/qp.c:706: warning: Function parameter or member 'lkey' not described in 'rvt_qp_acks_has_lkey'
drivers/infiniband/sw/rdmavt/qp.c:866: warning: Function parameter or member 'rdi' not described in 'rvt_init_qp'
drivers/infiniband/sw/rdmavt/qp.c:920: warning: Function parameter or member 'rdi' not described in '_rvt_reset_qp'
drivers/infiniband/sw/rdmavt/qp.c:1736: warning: Function parameter or member 'udata' not described in 'rvt_destroy_qp'
drivers/infiniband/sw/rdmavt/qp.c:1924: warning: Function parameter or member 'qp' not described in 'rvt_qp_valid_operation'
drivers/infiniband/sw/rdmavt/qp.c:1924: warning: Function parameter or member 'post_parms' not described in 'rvt_qp_valid_operation'
drivers/infiniband/sw/rdmavt/qp.c:1924: warning: Function parameter or member 'wr' not described in 'rvt_qp_valid_operation'
drivers/infiniband/sw/rdmavt/qp.c:2020: warning: Function parameter or member 'call_send' not described in 'rvt_post_one_wr'
drivers/infiniband/sw/rdmavt/qp.c:2621: warning: Function parameter or member 'qp' not described in 'rvt_stop_rnr_timer'
Link: https://lore.kernel.org/r/20210121094519.2044049-31-lee.jones@linaro.org
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Some functions have different names between their prototypes and the
kernel-doc markup.
Others need to be fixed, as kernel-doc markups should use this format:
identifier - description
Link: https://lore.kernel.org/r/78b98c41a5a0f4c0106433d305b143028a4168b0.1606823973.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Each driver should check that the QP attrs create_flags is supported.
Unfortuantely when create_flags was added to the QP attrs the drivers were
not updated. uverbs_ex_cmd_mask was used to block it - even though kernel
drivers use these flags too.
Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code
to be EOPNOTSUPP.
Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Each driver should check that it can support the provided attr_mask during
modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block
modify_qp_ex because the driver didn't check RATE_LIMIT.
Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
|
The lookaside count is improperly initialized to the size of the
Receive Queue with the additional +1. In the traces below, the
RQ size is 384, so the count was set to 385.
The lookaside count is then rarely refreshed. Note the high and
incorrect count in the trace below:
rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9008 wr_id 55c7206d75a0 qpn c
qpt 2 pid 3018 num_sge 1 head 1 tail 0, count 385
rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1
The head,tail indicate there is only one RWQE posted although the count
says 385 and we correctly return the element 0.
The next call to rvt_get_rwqe with the decremented count:
rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9058 wr_id 0 qpn c
qpt 2 pid 3018 num_sge 0 head 1 tail 1, count 384
rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1
Note that the RQ is empty (head == tail) yet we return the RWQE at tail 1,
which is not valid because of the bogus high count.
Best case, the RWQE has never been posted and the rc logic sees an RWQE
that is too small (all zeros) and puts the QP into an error state.
In the worst case, a server slow at posting receive buffers might fool
rvt_get_rwqe() into fetching an old RWQE and corrupt memory.
Fix by deleting the faulty initialization code and creating an
inline to fetch the posted count and convert all callers to use
new inline.
Fixes: f592ae3c999f ("IB/rdmavt: Fracture single lock used for posting and processing RWQEs")
Link: https://lore.kernel.org/r/20200728183848.22226.29132.stgit@awfm-01.aw.intel.com
Reported-by: Zhaojuan Guo <zguo@redhat.com>
Cc: <stable@vger.kernel.org> # 5.4.x
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
In case of failure of alloc_ud_wq_attr(), the memory allocated by
rvt_alloc_rq() is not freed. Fix it by calling rvt_free_rq() using the
existing clean-up code.
Fixes: d310c4bf8aea ("IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPs")
Link: https://lore.kernel.org/r/20200614041148.131983-1-pakki001@umn.edu
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Linux 5.7-rc6
Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
resolved by deleting dr_cq_event, matching how netdev resolved it.
Required for dependencies in the following patches.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Adds capability to create a qpn to be recognized as an accelerated
UD QP for ipoib.
This is accomplished by reserving 0x81 in byte[0] of the qpn as the
prefix for these qp types and reserving qpns between 0x810000 and
0x81ffff.
The hfi1 capability mask already contained a flag for the VNIC netdev.
This has been renamed and extended to include both VNIC and ipoib.
The rvt code to allocate qps now recognizes this flag and sets 0x81
into byte[0] of the qpn.
The code to allocate qpns is modified to reset the qpn numbering when it
is detected that a value is located in byte[0] for a UD QP and it is a
qpn being requested for net dev use. If it is a regular UD QP then it is
allowable to have bits set in byte[0] of the qpn and provide the
previously normal behavior.
The code to free the qpn now checks for the AIP prefix value of 0x81 and
removes it from the qpn before being freed so that the lower 16 bit
number can be reused.
This patch requires minor changes in the IB core and ipoib to facilitate
the creation of accelerated UP QPs.
Link: https://lore.kernel.org/r/20200511160607.173205.11757.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The commit below modified rvt_create_mmap_info() to return ERR_PTR's but
didn't update the callers to handle them. Modify rvt_create_mmap_info() to
only return ERR_PTR and fix all error checking after
rvt_create_mmap_info() was called.
Fixes: ff23dfa13457 ("IB: Pass only ib_udata in function prototypes")
Link: https://lore.kernel.org/r/20200424173146.10970-1-sudipm.mukherjee@gmail.com
Cc: stable@vger.kernel.org [5.4+]
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Required due to dependencies in following patches.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The proper return code is "-EOPNOTSUPP" when the requested QP type is
not supported by the provider.
Link: https://lore.kernel.org/r/20200130082049.463-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
When the hfi1 device is shut down during a system reboot, it is possible
that some QPs might have not not freed by ULPs. More requests could be
post sent and a lingering timer could be triggered to schedule more packet
sends, leading to a crash:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0
PGD 0
Oops: 0000 1 SMP
Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en
sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod
CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018
task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000
RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0
RSP: 0018:ffff88105df43d48 EFLAGS: 00010046
RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000
RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f
RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000
R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0
R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000
FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000
ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98
ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde
Call Trace:
IRQ
[ffffffff810a6b85] queue_work_on+0x45/0x50
[ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1]
[ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt]
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffff81097316] call_timer_fn+0x36/0x110
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffff8109982d] run_timer_softirq+0x22d/0x310
[ffffffff81090b3f] __do_softirq+0xef/0x280
[ffffffff816b6a5c] call_softirq+0x1c/0x30
[ffffffff8102d3c5] do_softirq+0x65/0xa0
[ffffffff81090ec5] irq_exit+0x105/0x110
[ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50
[ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80
EOI
[ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0
[ffffffff81527b48] cpuidle_idle_call+0xd8/0x210
[ffffffff81034fee] arch_cpu_idle+0xe/0x30
[ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0
[ffffffff81051af6] start_secondary+0x1b6/0x230
Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00
RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0
RSP ffff88105df43d48
CR2: 0000000000000102
The solution is to reset the QPs before the device resources are freed.
This reset will change the QP state to prevent post sends and delete
timers to prevent callbacks.
Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table")
Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Add kernel-doc for missing function parameters.
Remove excess kernel-doc descriptions.
Fix expected kernel-doc formatting (use ':' instead of '-' after @funcarg).
../drivers/infiniband/sw/rdmavt/ah.c:138: warning: Excess function parameter 'udata' description in 'rvt_destroy_ah'
../drivers/infiniband/sw/rdmavt/vt.c:698: warning: Function parameter or member 'pkey_table' not described in 'rvt_init_port'
../drivers/infiniband/sw/rdmavt/cq.c:561: warning: Excess function parameter 'rdi' description in 'rvt_driver_cq_init'
../drivers/infiniband/sw/rdmavt/cq.c:575: warning: Excess function parameter 'rdi' description in 'rvt_cq_exit'
../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'qp' not described in 'rvt_add_rnr_timer'
../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'aeth' not described in 'rvt_add_rnr_timer'
../drivers/infiniband/sw/rdmavt/qp.c:2591: warning: Function parameter or member 'qp' not described in 'rvt_stop_rc_timers'
../drivers/infiniband/sw/rdmavt/qp.c:2624: warning: Function parameter or member 'qp' not described in 'rvt_del_timers_sync'
../drivers/infiniband/sw/rdmavt/qp.c:2697: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter_init'
../drivers/infiniband/sw/rdmavt/qp.c:2728: warning: Function parameter or member 'iter' not described in 'rvt_qp_iter_next'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'rdi' not described in 'rvt_qp_iter'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'v' not described in 'rvt_qp_iter'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter'
Link: https://lore.kernel.org/r/20191010035240.251184229@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
rvt was using ib_sge as part of it's ABI, which is not allowed. Introduce
a new struct with the same layout and use it instead.
Fixes: dabac6e460ce ("IB/hfi1: Move receive work queue struct into uapi directory")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Add some helper functions to hide struct rvt_swqe details.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Historically rdmavt destroy_ah() has returned an -EBUSY when the AH has a
non-zero reference count. IBTA 11.2.2 notes no such return value or error
case:
Output Modifiers:
- Verb results:
- Operation completed successfully.
- Invalid HCA handle.
- Invalid address handle.
ULPs never test for this error and this will leak memory.
The reference count exists to allow for driver independent progress
mechanisms to process UD SWQEs in parallel with post sends. The SWQE will
hold a reference count until the UD SWQE completes and then drops the
reference.
Fix by removing need to reference count the AH. Add a UD specific
allocation to each SWQE entry to cache the necessary information for
independent progress. Copy the information during the post send
processing.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Currently QP allowed_ops is set after the QP is completely initialized.
This curtails the use of this optimization for any initialization before
allowed_ops is set.
Fix by adding a helper to determine the correct allowed_ops and moving the
setting of the allowed_ops to just after QP allocation.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
When a completion queue is full, the associated queue pairs are not put
into the error state. According to the IBTA specification, this is a
violation.
Quote from IBTA spec:
C9-218: A Requester Class F error occurs when the CQ is inaccessible or
full and an attempt is made to complete a WQE. The Affected QP shall be
moved to the error state and affiliated asynchronous errors generated as
described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The
current WQE and any subsequent WQEs are left in an unknown state.
C11-37: The CI shall generate a CQ Error when a CQ overrun is
detected. This condition will result in an Affiliated Asynchronous Error
for any associated Work Queues when they attempt to use that
CQ. Completions can no longer be added to the CQ. It is not guaranteed
that completions present in the CQ at the time the error occurred can be
retrieved. Possible causes include a CQ overrun or a CQ protection error.
Put the qp in error state when cq is full. Implement a state called full
to continue to put other associated QPs in error state.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Usage of single lock prevents fetching posted and processing receive work
queue entries from progressing simultaneously and impacts overall
performance.
Fracture the single lock used for posting and processing Receive Work
Queue Entries (RWQEs) to allow the circular buffer to be filled and
emptied at the same time. Two new spinlocks - one for the producers and
one for the consumers used for posting and processing RWQEs simultaneously
and the two indices are define on two different cache lines. The threshold
count is used to avoid reading other index in different cache line every
time.
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The rvt_rwqe and rvt_rwq struct elements are shared between rdmavt and the
providers but are not in uapi directory. As per the comment in
https://marc.info/?l=linux-rdma&m=152296522708522&w=2, The hfi1 driver and
the rdma core driver are not using shared structures in the uapi
directory.
Move rvt_rwqe and rvt_rwq struct into rvt-abi.h header in uapi directory.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
For dependencies in next patches.
Resolve conflicts:
- Use uverbs_get_cleared_udata() with new cq allocation flow
- Continue to delete nes despite SPDX conflict
- Resolve list appends in mlx5_command_str()
- Use u16 for vport_rule stuff
- Resolve list appends in struct ib_client
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Convert all completions to use the new completion routine that
fixes a race between post send and completion where fields from
a SWQE can be read after SWQE has been freed.
This patch also addresses issues reported in
https://marc.info/?l=linux-kernel&m=155656897409107&w=2.
The reserved operation path has no need for any barrier.
The barrier for the other path is addressed by the
smp_load_acquire() barrier.
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.
So, replace the following form:
sizeof(struct rvt_sge) * init_attr->cap.max_send_sge + sizeof(struct rvt_swqe)
with:
struct_size(swq, sg_list, init_attr->cap.max_send_sge)
and so on...
Also, notice that variable size is unnecessary, hence it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The qpn allocation logic has a WARN_ON() that intends to detect the use of
an index that will introduce bits in the lower order bits of the QOS bits
in the QPN.
Unfortunately, it has the following bugs:
- it misfires when wrapping QPN allocation for non-QOS
- it doesn't correctly detect low order QOS bits (despite the comment)
The WARN_ON() should not be applied to non-QOS (qos_shift == 1).
Additionally, it SHOULD test the qpn bits per the table below:
2 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^
[ 0, 0, 0, 0, 0, 0, sc0], qp bit 1 always 0*
3-4 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^
[ 0, 0, 0, 0, 0, sc1, sc0], qp bits [21] always 0
5-8 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^
[ 0, 0, 0, 0, sc2, sc1, sc0] qp bits [321] always 0
Fix by qualifying the warning for qos_shift > 1 and producing the correct
mask to insure the above bits are zero without generating a superfluous
warning.
Fixes: 501edc42446e ("IB/rdmavt: Correct warning during QPN allocation")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The reference count adjustments on reference count completion
are open coded throughout.
Add a routine to do all reference count adjustments and use.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
QP creation already records the allowed_ops.
Take advantage of that single field to replace multiple qp_type
specific tests.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Now when ib_udata is passed to all the driver's object create/destroy APIs
the ib_udata will carry the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the functions prototypes.
Make ib_udata the only argument psssed.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers destroy path from the dependency in 'uobject->context' as we
already did for the create path.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The RC/UC code path can go through a software loopback. In this code path
the receive side QP is manipulated.
If two threads are working on the QP receive side (i.e. post_send, and
modify_qp to an error state), QP information can be corrupted.
(post_send via loopback)
set r_sge
loop
update r_sge
(modify_qp)
take r_lock
update r_sge <---- r_sge is now incorrect
(post_send)
update r_sge <---- crash, etc.
...
This can lead to one of the two following crashes:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: hfi1_copy_sge+0xf1/0x2e0 [hfi1]
PGD 8000001fe6a57067 PUD 1fd9e0c067 PMD 0
Call Trace:
ruc_loopback+0x49b/0xbc0 [hfi1]
hfi1_do_send+0x38e/0x3e0 [hfi1]
_hfi1_do_send+0x1e/0x20 [hfi1]
process_one_work+0x17f/0x440
worker_thread+0x126/0x3c0
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21
or:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: rvt_clear_mr_refs+0x45/0x370 [rdmavt]
PGD 80000006ae5eb067 PUD ef15d0067 PMD 0
Call Trace:
rvt_error_qp+0xaa/0x240 [rdmavt]
rvt_modify_qp+0x47f/0xaa0 [rdmavt]
ib_security_modify_qp+0x8f/0x400 [ib_core]
ib_modify_qp_with_udata+0x44/0x70 [ib_core]
modify_qp.isra.23+0x1eb/0x2b0 [ib_uverbs]
ib_uverbs_modify_qp+0xaa/0xf0 [ib_uverbs]
ib_uverbs_write+0x272/0x430 [ib_uverbs]
vfs_write+0xc0/0x1f0
SyS_write+0x7f/0xf0
system_call_fastpath+0x1c/0x21
Fix by using the appropriate locking on the receiving QP.
Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt")
Cc: <stable@vger.kernel.org> #v4.9+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The IBTA spec notes:
o9-5.2.1: For any HCA which supports SEND with Invalidate, upon receiving
an IETH, the Invalidate operation must not take place until after the
normal transport header validation checks have been successfully
completed.
The rdmavt loopback code does the validation after the invalidate.
Fix by relocating the operation specific logic for all SEND variants until
after the validity checks.
Cc: <stable@vger.kernel.org> #v4.20+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Now when we have the udata passed to all the ib_xxx object creation APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|