summaryrefslogtreecommitdiff
path: root/drivers/infiniband/sw/rdmavt
AgeCommit message (Collapse)Author
2020-04-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "The majority of the patches are cleanups, refactorings and clarity improvements. This cycle saw some more activity from Syzkaller, I think we are now clean on all but one of those bugs, including the long standing and obnoxious rdma_cm locking design defect. Continue to see many drivers getting cleanups, with a few new user visible features. Summary: - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1 - Lots of cleanup patches for hns - Convert more places to use refcount - Aggressively lock the RDMA CM code that syzkaller says isn't working - Work to clarify ib_cm - Use the new ib_device lifecycle model in bnxt_re - Fix mlx5's MR cache which seems to be failing more often with the new ODP code - mlx5 'dynamic uar' and 'tx steering' user interfaces" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits) RDMA/bnxt_re: make bnxt_re_ib_init static IB/qib: Delete struct qib_ivdev.qp_rnd RDMA/hns: Fix uninitialized variable bug RDMA/hns: Modify the mask of QP number for CQE of hip08 RDMA/hns: Reduce the maximum number of extend SGE per WQE RDMA/hns: Reduce PFC frames in congestion scenarios RDMA/mlx5: Add support for RDMA TX flow table net/mlx5: Add support for RDMA TX steering IB/hfi1: Call kobject_put() when kobject_init_and_add() fails IB/hfi1: Fix memory leaks in sysfs registration and unregistration IB/mlx5: Move to fully dynamic UAR mode once user space supports it IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib IB/mlx5: Extend QP creation to get uar page index from user space IB/mlx5: Extend CQ creation to get uar page index from user space IB/mlx5: Expose UAR object and its alloc/destroy commands IB/hfi1: Get rid of a warning RDMA/hns: Remove redundant judgment of qp_type RDMA/hns: Remove redundant assignment of wc->smac when polling cq RDMA/hns: Remove redundant qpc setup operations RDMA/hns: Remove meaningless prints ...
2020-03-18IB/rdmavt: Delete unused routineMike Marciniszyn
This routine was obsoleted by the patch below. Delete it. Fixes: a2a074ef396f ("RDMA: Handle ucontext allocations by IB/core") Link: https://lore.kernel.org/r/20200316210454.7753.94689.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13IB/rdmavt: Free kernel completion queue when doneKaike Wan
When a kernel ULP requests the rdmavt to create a completion queue, it allocated the queue and set cq->kqueue to point to it. However, when the completion queue is destroyed, cq->queue is freed instead, leading to a memory leak: https://lore.kernel.org/r/215235485.15264050.1583334487658.JavaMail.zimbra@redhat.com unreferenced object 0xffffc90006639000 (size 12288): comm "kworker/u128:0", pid 8, jiffies 4295777598 (age 589.085s) hex dump (first 32 bytes): 4d 00 00 00 4d 00 00 00 00 c0 08 ac 8b 88 ff ff M...M........... 00 00 00 00 80 00 00 00 00 00 00 00 10 00 00 00 ................ backtrace: [<0000000035a3d625>] __vmalloc_node_range+0x361/0x720 [<000000002942ce4f>] __vmalloc_node.constprop.30+0x63/0xb0 [<00000000f228f784>] rvt_create_cq+0x98a/0xd80 [rdmavt] [<00000000b84aec66>] __ib_alloc_cq_user+0x281/0x1260 [ib_core] [<00000000ef3764be>] nvme_rdma_cm_handler+0xdb7/0x1b80 [nvme_rdma] [<00000000936b401c>] cma_cm_event_handler+0xb7/0x550 [rdma_cm] [<00000000d9c40b7b>] addr_handler+0x195/0x310 [rdma_cm] [<00000000c7398a03>] process_one_req+0xdd/0x600 [ib_core] [<000000004d29675b>] process_one_work+0x920/0x1740 [<00000000efedcdb5>] worker_thread+0x87/0xb40 [<000000005688b340>] kthread+0x327/0x3f0 [<0000000043a168d6>] ret_from_fork+0x3a/0x50 This patch fixes the issue by freeing cq->kqueue instead. Fixes: 239b0e52d8aa ("IB/hfi1: Move rvt_cq_wc struct into uapi directory") Link: https://lore.kernel.org/r/20200313123957.14343.43879.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> # 5.4.x Reported-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04Merge tag 'v5.6-rc4' into rdma.git for-nextJason Gunthorpe
Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04RDMA/providers: Fix return value when QP type isn't supportedKamal Heib
The proper return code is "-EOPNOTSUPP" when the requested QP type is not supported by the provider. Link: https://lore.kernel.org/r/20200130082049.463-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/rdmavt: Reset all QPs when the device is shut downKaike Wan
When the hfi1 device is shut down during a system reboot, it is possible that some QPs might have not not freed by ULPs. More requests could be post sent and a lingering timer could be triggered to schedule more packet sends, leading to a crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000102 IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0 PGD 0 Oops: 0000 1 SMP Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1 Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018 task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000 RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0 RSP: 0018:ffff88105df43d48 EFLAGS: 00010046 RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000 RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000 R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0 R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000 FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000 ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98 ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde Call Trace: IRQ [ffffffff810a6b85] queue_work_on+0x45/0x50 [ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1] [ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffff81097316] call_timer_fn+0x36/0x110 [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffff8109982d] run_timer_softirq+0x22d/0x310 [ffffffff81090b3f] __do_softirq+0xef/0x280 [ffffffff816b6a5c] call_softirq+0x1c/0x30 [ffffffff8102d3c5] do_softirq+0x65/0xa0 [ffffffff81090ec5] irq_exit+0x105/0x110 [ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50 [ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80 EOI [ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0 [ffffffff81527b48] cpuidle_idle_call+0xd8/0x210 [ffffffff81034fee] arch_cpu_idle+0xe/0x30 [ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0 [ffffffff81051af6] start_secondary+0x1b6/0x230 Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00 RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0 RSP ffff88105df43d48 CR2: 0000000000000102 The solution is to reset the QPs before the device resources are freed. This reset will change the QP state to prevent post sends and delete timers to prevent callbacks. Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table") Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21Merge tag 'rds-odp-for-5.5' into rdma.git for-nextJason Gunthorpe
From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== * tag 'rds-odp-for-5.5': net/rds: Use prefetch for On-Demand-Paging MR net/rds: Handle ODP mr registration/unregistration net/rds: Detect need of On-Demand-Paging memory registration RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs RDMA/mlx5: Don't fake udata for kernel path IB/mlx5: Add ODP WQE handlers for kernel QPs IB/core: Add interface to advise_mr for kernel users IB/core: Introduce ib_reg_user_mr IB: Allow calls to ib_umem_get from kernel ULPs Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB: Allow calls to ib_umem_get from kernel ULPsMoni Shoua
So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-03IB/rdmavt: Correct comments in rdmavt_qp.h headerMike Marciniszyn
Comments need to be with the definition of rvt_restart_sge(). Other comments were duplicated in sw/rdmavt/rc.c and were removed. Fixes: 385156c5f2a6 ("IB/hfi: Move RC functions into a header file") Link: https://lore.kernel.org/r/20191219211934.58387.88014.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17IB/umem: remove the dmasync argument to ib_umem_getChristoph Hellwig
The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22infiniband: fix sw/rdmavt/ kernel-doc notationrd.dunlab@gmail.com
Add kernel-doc for missing function parameters. Remove excess kernel-doc descriptions. Fix expected kernel-doc formatting (use ':' instead of '-' after @funcarg). ../drivers/infiniband/sw/rdmavt/ah.c:138: warning: Excess function parameter 'udata' description in 'rvt_destroy_ah' ../drivers/infiniband/sw/rdmavt/vt.c:698: warning: Function parameter or member 'pkey_table' not described in 'rvt_init_port' ../drivers/infiniband/sw/rdmavt/cq.c:561: warning: Excess function parameter 'rdi' description in 'rvt_driver_cq_init' ../drivers/infiniband/sw/rdmavt/cq.c:575: warning: Excess function parameter 'rdi' description in 'rvt_cq_exit' ../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'qp' not described in 'rvt_add_rnr_timer' ../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'aeth' not described in 'rvt_add_rnr_timer' ../drivers/infiniband/sw/rdmavt/qp.c:2591: warning: Function parameter or member 'qp' not described in 'rvt_stop_rc_timers' ../drivers/infiniband/sw/rdmavt/qp.c:2624: warning: Function parameter or member 'qp' not described in 'rvt_del_timers_sync' ../drivers/infiniband/sw/rdmavt/qp.c:2697: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter_init' ../drivers/infiniband/sw/rdmavt/qp.c:2728: warning: Function parameter or member 'iter' not described in 'rvt_qp_iter_next' ../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'rdi' not described in 'rvt_qp_iter' ../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'v' not described in 'rvt_qp_iter' ../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter' Link: https://lore.kernel.org/r/20191010035240.251184229@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-10IB/rdmavt: Fix variable shadowing issue in rvt_create_cqNathan Chancellor
clang warns: drivers/infiniband/sw/rdmavt/cq.c:260:7: warning: variable 'err' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (err) ^~~ drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs here return err; ^~~ drivers/infiniband/sw/rdmavt/cq.c:260:3: note: remove the 'if' if its condition is always false if (err) ^~~~~~~~ drivers/infiniband/sw/rdmavt/cq.c:253:7: warning: variable 'err' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!cq->ip) { ^~~~~~~ drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs here return err; ^~~ drivers/infiniband/sw/rdmavt/cq.c:253:3: note: remove the 'if' if its condition is always false if (!cq->ip) { ^~~~~~~~~~~~~~ drivers/infiniband/sw/rdmavt/cq.c:211:9: note: initialize the variable 'err' to silence this warning int err; ^ = 0 2 warnings generated. The function scoped err variable is uninitialized when the flow jumps into the if statement. The if scoped err variable shadows the function scoped err variable, preventing the err assignments within the if statement to be reflected at the function level, which will cause uninitialized use when the goto statements are taken. Just remove the if scoped err declaration so that there is only one copy of the err variable for this function. Fixes: 239b0e52d8aa ("IB/hfi1: Move rvt_cq_wc struct into uapi directory") Link: https://github.com/ClangBuiltLinux/linux/issues/594 Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-08RDMA/rvt: Do not use a kernel header in the ABIJason Gunthorpe
rvt was using ib_sge as part of it's ABI, which is not allowed. Introduce a new struct with the same layout and use it instead. Fixes: dabac6e460ce ("IB/hfi1: Move receive work queue struct into uapi directory") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Add trace for map_mr_sgMike Marciniszyn
Add trace to debug map_mr_sg handling. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Enhance trace information for FRWR debugMike Marciniszyn
This patch enhances the MR trace information to enable more focused debug of MR issues. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR detailsMichael J. Ruhl
Add some helper functions to hide struct rvt_swqe details. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPsMichael J. Ruhl
Historically rdmavt destroy_ah() has returned an -EBUSY when the AH has a non-zero reference count. IBTA 11.2.2 notes no such return value or error case: Output Modifiers: - Verb results: - Operation completed successfully. - Invalid HCA handle. - Invalid address handle. ULPs never test for this error and this will leak memory. The reference count exists to allow for driver independent progress mechanisms to process UD SWQEs in parallel with post sends. The SWQE will hold a reference count until the UD SWQE completes and then drops the reference. Fix by removing need to reference count the AH. Add a UD specific allocation to each SWQE entry to cache the necessary information for independent progress. Copy the information during the post send processing. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Set QP allowed opcodes after QP allocationMichael J. Ruhl
Currently QP allowed_ops is set after the QP is completely initialized. This curtails the use of this optimization for any initialization before allowed_ops is set. Fix by adding a helper to determine the correct allowed_ops and moving the setting of the allowed_ops to just after QP allocation. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is fullKamenee Arumugam
When a completion queue is full, the associated queue pairs are not put into the error state. According to the IBTA specification, this is a violation. Quote from IBTA spec: C9-218: A Requester Class F error occurs when the CQ is inaccessible or full and an attempt is made to complete a WQE. The Affected QP shall be moved to the error state and affiliated asynchronous errors generated as described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The current WQE and any subsequent WQEs are left in an unknown state. C11-37: The CI shall generate a CQ Error when a CQ overrun is detected. This condition will result in an Affiliated Asynchronous Error for any associated Work Queues when they attempt to use that CQ. Completions can no longer be added to the CQ. It is not guaranteed that completions present in the CQ at the time the error occurred can be retrieved. Possible causes include a CQ overrun or a CQ protection error. Put the qp in error state when cq is full. Implement a state called full to continue to put other associated QPs in error state. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Fracture single lock used for posting and processing RWQEsKamenee Arumugam
Usage of single lock prevents fetching posted and processing receive work queue entries from progressing simultaneously and impacts overall performance. Fracture the single lock used for posting and processing Receive Work Queue Entries (RWQEs) to allow the circular buffer to be filled and emptied at the same time. Two new spinlocks - one for the producers and one for the consumers used for posting and processing RWQEs simultaneously and the two indices are define on two different cache lines. The threshold count is used to avoid reading other index in different cache line every time. Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Move receive work queue struct into uapi directoryKamenee Arumugam
The rvt_rwqe and rvt_rwq struct elements are shared between rdmavt and the providers but are not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2, The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. Move rvt_rwqe and rvt_rwq struct into rvt-abi.h header in uapi directory. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Move rvt_cq_wc struct into uapi directoryKamenee Arumugam
The rvt_cq_wc struct elements are shared between rdmavt and the providers but not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. In that case, move rvt_cq_wc struct into the rvt-abi.h header file and create a rvt_k_cq_w for the kernel completion queue. Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-20IB/{rdmavt, qib, hfi1}: Convert to new completion APIMike Marciniszyn
Convert all completions to use the new completion routine that fixes a race between post send and completion where fields from a SWQE can be read after SWQE has been freed. This patch also addresses issues reported in https://marc.info/?l=linux-kernel&m=155656897409107&w=2. The reserved operation path has no need for any barrier. The barrier for the other path is addressed by the smp_load_acquire() barrier. Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20RDMA: Check umem pointer validity prior to releaseLeon Romanovsky
Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11RDMA: Convert CQ allocations to be under core responsibilityLeon Romanovsky
Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11RDMA: Clean destroy CQ in drivers do not return errorsLeon Romanovsky
Like all other destroy commands, .destroy_cq() call is not supposed to fail. In all flows, the attempt to return earlier caused to memory leaks. This patch converts .destroy_cq() to do not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-10RDMA: Move uverbs_abi_ver into struct ib_device_opsJason Gunthorpe
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10RDMA: Move driver_id into struct ib_device_opsJason Gunthorpe
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Things are looking pretty quiet here in RDMA, not too many bug fixes rolling in right now. The usual driver bug fixes and fixes for a couple of regressions introduced in 5.2: - Fix a race on bootup with RDMA device renaming and srp. SRP also needs to rename its internal sys files - Fix a memory leak in hns - Don't leak resources in efa on certain error unwinds - Don't panic in certain error unwinds in ib_register_device - Various small user visible bug fix patches for the hfi and efa drivers - Fix the 32 bit compilation break" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/efa: Remove MAYEXEC flag check from mmap flow mlx5: avoid 64-bit division IB/hfi1: Validate page aligned for a given virtual address IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value IB/hfi1: Insure freeze_work work_struct is canceled on shutdown IB/rdmavt: Fix alloc_qpn() WARN_ON() RDMA/core: Fix panic when port_data isn't initialized RDMA/uverbs: Pass udata on uverbs error unwind RDMA/core: Clear out the udata before error unwind RDMA/hns: Fix PD memory leak for internal allocation RDMA/srp: Rename SRP sysfs name after IB device rename trigger
2019-05-30IB/rdmavt: Use struct_size() helperGustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, replace the following form: sizeof(struct rvt_sge) * init_attr->cap.max_send_sge + sizeof(struct rvt_swqe) with: struct_size(swq, sg_list, init_attr->cap.max_send_sge) and so on... Also, notice that variable size is unnecessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-29IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr valueMike Marciniszyn
The command 'ibv_devinfo -v' reports 0 for max_mr. Fix by assigning the query values after the mr lkey_table has been built rather than early on in the driver. Fixes: 7b1e2099adc8 ("IB/rdmavt: Move memory registration into rdmavt") Reviewed-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-29IB/rdmavt: Fix alloc_qpn() WARN_ON()Mike Marciniszyn
The qpn allocation logic has a WARN_ON() that intends to detect the use of an index that will introduce bits in the lower order bits of the QOS bits in the QPN. Unfortunately, it has the following bugs: - it misfires when wrapping QPN allocation for non-QOS - it doesn't correctly detect low order QOS bits (despite the comment) The WARN_ON() should not be applied to non-QOS (qos_shift == 1). Additionally, it SHOULD test the qpn bits per the table below: 2 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^ [ 0, 0, 0, 0, 0, 0, sc0], qp bit 1 always 0* 3-4 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^ [ 0, 0, 0, 0, 0, sc1, sc0], qp bits [21] always 0 5-8 data VLs: [qp7, qp6, qp5, qp4, qp3, qp2, qp1] ^ [ 0, 0, 0, 0, sc2, sc1, sc0] qp bits [321] always 0 Fix by qualifying the warning for qos_shift > 1 and producing the correct mask to insure the above bits are zero without generating a superfluous warning. Fixes: 501edc42446e ("IB/rdmavt: Correct warning during QPN allocation") Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-06RDMA/rdmavt: Catch use-after-free access of AH structuresLeon Romanovsky
Prior to commit d345691471b4 ("RDMA: Handle AH allocations by IB/core"), AH destroy path is rdmavt returned -EBUSY warning to application and caused to potential leakage of kernel memory of AH structure. After that commit, the AH structure is always freed but such early return in driver code can potentially cause to use-after-free error. Add warning to catch such situation to help driver developers to fix AH release path. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24Merge branch 'rdma_mmap' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== Upon review it turns out there are some long standing problems in BAR mapping area: * BAR pages intended for read-only can be switched to writable via mprotect. * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page. * Disassociate causes SIGBUS when touching the pages. * CPU pages are being mapped through to the process via remap_pfn_range instead of the more appropriate vm_insert_page, causing weird behaviors during disassociation. This series adds the missing VM_* flag manipulation, adds faulting a zero page for disassociation and revises the CPU page mappings to use vm_insert_page. ==================== For dependencies this branch is based on for-rc from git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git * branch 'rdma_mmap': RDMA: Remove rdma_user_mmap_page RDMA/mlx5: Use get_zeroed_page() for clock_info RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24IB/{rdmavt, qib, hfi1}: Use new routine to release reference countsMike Marciniszyn
The reference count adjustments on reference count completion are open coded throughout. Add a routine to do all reference count adjustments and use. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24IB/rdmavt: Use more efficient allowed_opsMike Marciniszyn
QP creation already records the allowed_ops. Take advantage of that single field to replace multiple qp_type specific tests. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24IB/rdmavt: Fix ab/ba include issuesMike Marciniszyn
The currently include file ordering for rdmavt headers has an ab/ba include issue the precludes using inlines from rdma_vt.h in rdmavt_qp.h. At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h. Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h and move qp related inlines to rdmavt_qp.h. Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared by rdmavt_cq.h and rdmavt_qp.h. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-16IB/rdmavt: Fix frwr memory registrationJosh Collier
Current implementation was not properly handling frwr memory registrations. This was uncovered by commit 27f26cec761das ("xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is used for NFS over RDMA, started failing as it was the first ULP to modify the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR when attempting to perform RDMA Writes to the client. The fix is to properly capture the true iova, offset, and length in the call to ib_map_mr_sg, and then update the iova when processing the IB_WR_REG_MEM on the send queue. Fixes: a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA: Handle SRQ allocations by IB/coreLeon Romanovsky
Convert SRQ allocation from drivers to be in the IB/core Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA: Handle AH allocations by IB/coreLeon Romanovsky
Simplify drivers by ensuring lifetime of ib_ah object. The changes in .create_ah() go hand in hand with relevant update in .destroy_ah(). We will use this opportunity and convert .destroy_ah() to don't fail, as it was suggested a long time ago, because there is nothing to do in case of failure during destroy. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass only ib_udata in function prototypesShamir Rabinovitch
Now when ib_udata is passed to all the driver's object create/destroy APIs the ib_udata will carry the ib_ucontext for every user command. There is no need to also pass the ib_ucontext via the functions prototypes. Make ib_udata the only argument psssed. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass uverbs_attr_bundle down ib_x destroy pathShamir Rabinovitch
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28RDMA/rdmavt: Use correct sizing on buffers holding page DMA addressesShiraz Saleem
The buffer that holds the page DMA addresses is sized off umem->nmap. This can potentially cause out of bound accesses on the PBL array when iterating the umem DMA-mapped SGL. This is because if umem pages are combined, umem->nmap can be much lower than the number of system pages in umem. Use ib_umem_num_pages() to size this buffer. Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-04IB/rdmavt: Fix concurrency panics in QP post_send and modify to errorMichael J. Ruhl
The RC/UC code path can go through a software loopback. In this code path the receive side QP is manipulated. If two threads are working on the QP receive side (i.e. post_send, and modify_qp to an error state), QP information can be corrupted. (post_send via loopback) set r_sge loop update r_sge (modify_qp) take r_lock update r_sge <---- r_sge is now incorrect (post_send) update r_sge <---- crash, etc. ... This can lead to one of the two following crashes: BUG: unable to handle kernel NULL pointer dereference at (null) IP: hfi1_copy_sge+0xf1/0x2e0 [hfi1] PGD 8000001fe6a57067 PUD 1fd9e0c067 PMD 0 Call Trace: ruc_loopback+0x49b/0xbc0 [hfi1] hfi1_do_send+0x38e/0x3e0 [hfi1] _hfi1_do_send+0x1e/0x20 [hfi1] process_one_work+0x17f/0x440 worker_thread+0x126/0x3c0 kthread+0xd1/0xe0 ret_from_fork_nospec_begin+0x21/0x21 or: BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 IP: rvt_clear_mr_refs+0x45/0x370 [rdmavt] PGD 80000006ae5eb067 PUD ef15d0067 PMD 0 Call Trace: rvt_error_qp+0xaa/0x240 [rdmavt] rvt_modify_qp+0x47f/0xaa0 [rdmavt] ib_security_modify_qp+0x8f/0x400 [ib_core] ib_modify_qp_with_udata+0x44/0x70 [ib_core] modify_qp.isra.23+0x1eb/0x2b0 [ib_uverbs] ib_uverbs_modify_qp+0xaa/0xf0 [ib_uverbs] ib_uverbs_write+0x272/0x430 [ib_uverbs] vfs_write+0xc0/0x1f0 SyS_write+0x7f/0xf0 system_call_fastpath+0x1c/0x21 Fix by using the appropriate locking on the receiving QP. Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt") Cc: <stable@vger.kernel.org> #v4.9+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-04IB/rdmavt: Fix loopback send with invalidate orderingMike Marciniszyn
The IBTA spec notes: o9-5.2.1: For any HCA which supports SEND with Invalidate, upon receiving an IETH, the Invalidate operation must not take place until after the normal transport header validation checks have been successfully completed. The rdmavt loopback code does the validation after the invalidate. Fix by relocating the operation specific logic for all SEND variants until after the validity checks. Cc: <stable@vger.kernel.org> #v4.20+ Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22RDMA: Handle ucontext allocations by IB/coreLeon Romanovsky
Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIsShamir Rabinovitch
Now when we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-13RDMA/rdmavt: Adapt to handle non-uniform sizes on umem SGEsShiraz, Saleem
rdmavt expects a uniform size on all umem SGEs which is currently at PAGE_SIZE. Adapt to a umem API change which could return non-uniform sized SGEs due to combining contiguous PAGE_SIZE regions into an SGE. Use for_each_sg_page variant to unfold the larger SGEs into a list of PAGE_SIZE elements. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>