path: root/drivers/infiniband
Age | Commit message | Author
2019-10-21 | RDMA/hns: Fix wrong parameters when initial mtt of srq->idx_que | Weihang Li
The npages parameter used to initialize the mtt of srq->idx_que shouldn't be the same as the srq's, and page_shift should be calculated from idx_buf_pg_sz. This patch fixes both issues and uses the fields named npage and page_shift in hns_roce_buf instead of two temporary variables, so that they can be used anywhere. Fixes: 18df508c7970 ("RDMA/hns: Remove if-else judgment statements for creating srq") Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1567566885-23088-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 | RDMA/hns: remove a redundant le16_to_cpu | Weihang Li
The type of ah->av.vlan is u16, so it is already in CPU byte order and using le16_to_cpu on it is a problem. Fixes: 82e620d9c3a0 ("RDMA/hns: Modify the data structure of hns_roce_av") Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1567566885-23088-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Doug Ledford <dledford@redhat.com>
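Why a stray conversion on a native u16 matters can be sketched in user space. The helper names below are hypothetical and only model what a little-endian-to-CPU conversion amounts to on a big-endian host (a byte swap); they are not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
uint16_t swab16_sketch(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Hypothetical model of le16_to_cpu() as it behaves on a big-endian CPU.
 * On a little-endian CPU the same conversion is a no-op, which is why
 * this kind of bug can go unnoticed there.
 */
uint16_t le16_to_cpu_on_be_sketch(uint16_t wire)
{
    return swab16_sketch(wire);
}

/* A value that is already in CPU order must be used as-is. */
uint16_t vlan_native_sketch(uint16_t vlan_in_cpu_order)
{
    assert((vlan_in_cpu_order & 0xf000) == 0); /* 12-bit VLAN ID in this sketch */
    return vlan_in_cpu_order;
}
```

Under these assumptions, VLAN ID 100 (0x0064) stored in CPU order would be read back as 0x6400 if the conversion were applied on a big-endian host.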
2019-10-18 | IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fields | Parav Pandit
Current code tries to derive the VLAN ID and compares it with the GID attribute for a matching entry. This raw search fails on a macvlan netdevice, as it is not a VLAN device but an upper device of a VLAN netdevice. Due to this limitation, incoming QP1 packets fail to match in the GID table. Such packets are dropped. Hence, to support it, use the existing rdma_read_gid_l2_fields() that takes care of different device types. Fixes: dbf727de7440 ("IB/core: Use GID table in AH creation and dmac resolution") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191002121750.17313-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 | RDMA/qedr: Fix reported firmware version | Kamal Heib
Remove spaces from the reported firmware version string. Actual value: $ cat /sys/class/infiniband/qedr0/fw_ver 8. 37. 7. 0 Expected value: $ cat /sys/class/infiniband/qedr0/fw_ver 8.37.7.0 Fixes: ec72fce401c6 ("qedr: Add support for RoCE HW init") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Link: https://lore.kernel.org/r/20191007210730.7173-1-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
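The difference between the actual and expected strings comes down to the format string. A minimal sketch, assuming (hypothetically) that the version is packed one byte per field into a 32-bit word; the helper name and the packing are illustrative and not taken from the qedr driver:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a firmware version packed as
 * major.minor.rev.eng, one byte each, with no spaces after the dots.
 * A format such as "%u. %u. %u. %u" would reproduce the reported bug.
 */
int format_fw_ver_sketch(char *buf, size_t len, unsigned int fw_ver)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%u.%u.%u.%u",
                    (fw_ver >> 24) & 0xff, (fw_ver >> 16) & 0xff,
                    (fw_ver >> 8) & 0xff, fw_ver & 0xff);
}
```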
2019-10-18 | RDMA/siw: free siw_base_qp in kref release routine | Krishnamraju Eraparaju
As siw_free_qp() is the last routine to access the 'siw_base_qp' structure, freeing this structure early in siw_destroy_qp() could cause a touch-after-free issue. Hence, move kfree(siw_base_qp) from siw_destroy_qp() to siw_free_qp(). Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Link: https://lore.kernel.org/r/20191007104229.29412-1-krishna2@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>
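The ownership rule described above (free in the release routine, not in destroy) can be illustrated with a user-space reference count. All names here are hypothetical stand-ins; this is a sketch of the pattern, not the siw code, and a plain int replaces the kernel's kref.

```c
#include <assert.h>
#include <stdlib.h>

struct base_qp_sketch { int id; };     /* stand-in for siw_base_qp */

struct qp_sketch {
    int refcount;                      /* stand-in for a kref */
    struct base_qp_sketch *base;
    int *released;                     /* lets the caller observe the release */
};

struct qp_sketch *qp_sketch_create(int *released)
{
    struct qp_sketch *qp = malloc(sizeof(*qp));

    if (!qp)
        return NULL;
    qp->base = malloc(sizeof(*qp->base));
    if (!qp->base) {
        free(qp);
        return NULL;
    }
    qp->base->id = 1;
    qp->refcount = 1;
    qp->released = released;
    return qp;
}

void qp_sketch_get(struct qp_sketch *qp)
{
    qp->refcount++;
}

/* Dropping the last reference is the only place that frees qp->base. */
void qp_sketch_put(struct qp_sketch *qp)
{
    int *released;

    assert(qp->refcount > 0);
    if (--qp->refcount > 0)
        return;
    released = qp->released;
    free(qp->base);
    free(qp);
    if (released)
        *released = 1;
}

/* Destroy only drops the caller's reference; it frees nothing itself. */
void qp_sketch_destroy(struct qp_sketch *qp)
{
    qp_sketch_put(qp);
}
```

With this layout, another holder of a reference can still touch `qp->base` after destroy has been called, because the memory is released only when the count reaches zero.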
2019-10-18 | RDMA/iwcm: move iw_rem_ref() calls out of spinlock | Krishnamraju Eraparaju
kref release routines usually perform memory release operations; hence, they should not be called with spinlocks held. One such case is the SIW kref release routine siw_free_qp(), which can sleep via vfree() while freeing queue memory. Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks. Fixes: 922a8e9fb2e0 ("RDMA: iWARP Connection Manager.") Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 | iw_cxgb4: fix ECN check on the passive accept | Potnuri Bharat Teja
pass_accept_req() is using the same skb for handling the accept request and sending the accept reply to HW. Here the req and rpl structures point to the same skb->data, which is overwritten by INIT_TP_WR() and leads to accessing corrupt req fields in accept_cr() while checking for ECN flags. Reordered code in accept_cr() to fetch correct req fields. Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections") Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 | IB/hfi1: Use a common pad buffer for 9B and 16B packets | Mike Marciniszyn
There is no reason for a different pad buffer for the two packet types. Expand the current buffer allocation to allow for both packet types. Fixes: f8195f3b14a0 ("IB/hfi1: Eliminate allocation while atomic") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 | IB/hfi1: Avoid excessive retry for TID RDMA READ request | Kaike Wan
A TID RDMA READ request could be retried under one of the following conditions:
- The RC retry timer expires;
- A later TID RDMA READ RESP packet is received before the next expected one.
For the latter, under normal conditions, the PSN in IB space is used for comparison. More specifically, the IB PSN in the incoming TID RDMA READ RESP packet is compared with the last IB PSN of a given TID RDMA READ request to determine if the request should be retried. This is similar to the retry logic for a normal RDMA READ request.
However, if a TID RDMA READ RESP packet is lost due to congestion, header suppression will be disabled and each incoming packet will raise an interrupt until the hardware flow is reloaded. Under this condition, each packet KDETH PSN will be checked by software against r_next_psn and a retry will be requested if the packet KDETH PSN is later than r_next_psn. Since each TID RDMA READ segment could have up to 64 packets and each TID RDMA READ request could have many segments, we could make far more retries under such conditions, thus leading to RETRY_EXC_ERR status.
This patch fixes the issue by removing the retry when the incoming packet KDETH PSN is later than r_next_psn. Instead, it resorts to the RC timer and normal IB PSN comparison for any request retry.
Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 | RDMA/mlx5: Clear old rate limit when closing QP | Rafi Wiener
Before a QP is closed it changes to the ERROR state. When this happened, the QP was left with the old rate limit, which had already been removed from the table. Fixes: 7d29f349a4b9 ("IB/mlx5: Properly adjust rate limit on QP state transitions") Signed-off-by: Rafi Wiener <rafiw@mellanox.com> Signed-off-by: Oleg Kuporosov <olegk@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-08 | IB/mlx5: Introduce and use mkey context setting helper routine | Parav Pandit
Introduce and use set_mkc_access_pd_addr_fields() which sets mkey context's access rights, PD, address fields. Thereby avoid the code duplication. Link: https://lore.kernel.org/r/20191006155443.31068-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 | RDMA/iser: Use iser_err instead of pr_err for logging | Max Gurtovoy
Make sure all the debug prints in ib_iser module use the common driver logger. Link: https://lore.kernel.org/r/1570366580-24097-1-git-send-email-maxg@mellanox.com Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 | RDMA/bnxt_re: Enable SRIOV VF support on Broadcom's 57500 adapter series | Devesh Sharma
Broadcom's 575xx adapter series has support for SRIOV VFs. Making changes to enable SRIOV VF support. There are two major areas where changes are done:
- Added a new DB location for the control-path and data-path DB ring.
- New devices do not need to issue the sriov-config slow-path command, thus the call to that firmware command is skipped.
For now enabling support for 64 RoCE VFs. Link: https://lore.kernel.org/r/1570081715-14301-1-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 | RDMA/srp: Calculate max_it_iu_size if remote max_it_iu length available | Honggang Li
The default maximum immediate size is too big for old srp clients, which do not support immediate data. According to the SRP and SRP-2 specifications, the IOControllerProfile attributes for SRP target ports contain the maximum initiator to target iu length. The maximum initiator to target iu length can be obtained by sending MAD packets to query the subnet manager port and SRP target ports. We should calculate max_it_iu_size instead of using the default value when the remote maximum initiator to target iu length is available. Link: https://lore.kernel.org/r/20190927174352.7800-2-honli@redhat.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
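The selection described above can be sketched as a small helper. The function name and the convention that 0 means "remote limit unknown" are assumptions made for illustration; the real driver logic may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the initiator-to-target IU length.
 * local_default is the size computed from local module parameters;
 * remote_max is the limit advertised by the target, or 0 if unknown
 * (an assumption of this sketch).
 */
uint32_t pick_max_it_iu_len_sketch(uint32_t local_default, uint32_t remote_max)
{
    assert(local_default > 0);
    if (remote_max != 0 && remote_max < local_default)
        return remote_max;
    return local_default;
}
```

For example, a local default of 8260 bytes against a target limit of 2116 bytes would yield 2116, while an unknown target limit keeps the default.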
2019-10-08 | RDMA/srp: Add parse function for maximum initiator to target IU size | Honggang Li
According to SRP specifications 'srp-r16a' and 'srp2r06', IOControllerProfile attributes for an SRP target port include the maximum initiator to target IU size. SRP connection daemons, such as srp_daemon, can get the value from the subnet manager. The SRP connection daemon can pass this value to the kernel. This patch adds a parse function for it. Upstream commit [1] enables the kernel parameter 'use_imm_data' by default. [1] also uses (8 * 1024) as the default value for the kernel parameter 'max_imm_data'. With those default values, the maximum initiator to target IU size will be 8260. In case the SRPT modules, which include the in-tree 'ib_srpt.ko' module, do not support the SRP-2 'immediate data' feature, the default maximum initiator to target IU size is significantly smaller than 8260. For an 'ib_srpt.ko' module built from source before [2], the default maximum initiator to target IU is 2116. [1] introduces a regression for old srp targets with default kernel parameters, as the connection will be rejected because of a too large maximum initiator to target IU size. [1] commit 882981f4a411 ("RDMA/srp: Add support for immediate data") [2] commit 5dabcd0456d7 ("RDMA/srpt: Add support for immediate data") Link: https://lore.kernel.org/r/20190927174352.7800-1-honli@redhat.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/mlx5: Add missing synchronize_srcu() for MW cases | Jason Gunthorpe
While MR uses live as the SRCU 'update', the MW case uses the xarray directly, xa_erase() causes the MW to become inaccessible to the pagefault thread. Thus whenever a MW is removed from the xarray we must synchronize_srcu() before freeing it. This must be done before freeing the mkey as re-use of the mkey while the pagefault thread is using the stale mkey is undesirable. Add the missing synchronizes to MW and DEVX indirect mkey and delete the bogus protection against double destroy in mlx5_core_destroy_mkey() Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/mlx5: Put live in the correct place for ODP MRs | Jason Gunthorpe
live is used to signal to the pagefault thread that the MR is initialized and ready for use. It should be after the umem is assigned and all other setup is completed. This prevents races (at least) of the form:

    CPU0                                     CPU1
    mlx5_ib_alloc_implicit_mr()
     implicit_mr_alloc()
      live = 1
     imr->umem = umem
                                             num_pending_prefetch_inc()
                                               if (live)
                                                 atomic_inc(num_pending_prefetch)
     atomic_set(num_pending_prefetch,0) // Overwrites other thread's store

Further, live is being used with SRCU as the 'update' in an acquire/release fashion, so it can not be read and written raw. Move all live = 1's to after MR initialization is completed and use smp_store_release/smp_load_acquire() for manipulating it. Add a missing live = 0 when an implicit MR child is deleted, before queuing work to do synchronize_srcu(). The barriers in update_odp_mr() were some broken attempt to create an acquire/release, but were not even applied consistently and missed the point; delete them as well.
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
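The publish-last pattern described above has a close user-space analogue in C11 atomics, where a release store plays the role of smp_store_release() and an acquire load the role of smp_load_acquire(). The structure and function names below are hypothetical; this is a sketch of the ordering idea, not the mlx5 code, and it leaves out SRCU entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct mr_sketch {
    void *umem;                        /* plain field, set during setup */
    atomic_int num_pending_prefetch;
    atomic_int live;                   /* 0 = not ready, 1 = published */
};

/* Put the object into its "allocated but not yet live" state. */
void mr_sketch_alloc(struct mr_sketch *mr)
{
    mr->umem = NULL;
    atomic_init(&mr->num_pending_prefetch, 0);
    atomic_init(&mr->live, 0);
}

/* Finish all setup first, then publish with a release store. */
void mr_sketch_publish(struct mr_sketch *mr, void *umem)
{
    assert(umem != NULL);
    mr->umem = umem;
    atomic_store_explicit(&mr->num_pending_prefetch, 0, memory_order_relaxed);
    atomic_store_explicit(&mr->live, 1, memory_order_release);
}

/* Reader side: returns 1 and counts a prefetch only if the MR is live. */
int mr_sketch_prefetch_inc(struct mr_sketch *mr)
{
    if (!atomic_load_explicit(&mr->live, memory_order_acquire))
        return 0;
    atomic_fetch_add_explicit(&mr->num_pending_prefetch, 1,
                              memory_order_relaxed);
    return 1;
}
```

Because the counter is reset before the release store, a reader that observes live == 1 can no longer have its increment overwritten by the initialization, which is the race shown in the diagram.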
2019-10-04 | RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu | Jason Gunthorpe
During destroy setting live = 0 and then synchronize_srcu() prevents num_pending_prefetch from incrementing, and also, ensures that all work holding that count is queued on the WQ. Testing before causes races of the form:

    CPU0                                     CPU1
    dereg_mr()
                                             mlx5_ib_advise_mr_prefetch()
                                              srcu_read_lock()
                                               num_pending_prefetch_inc()
                                                 if (!live)
     live = 0
     atomic_read() == 0
       // skip flush_workqueue()
                                                 atomic_inc()
                                                 queue_work();
                                              srcu_read_unlock()
     WARN_ON(atomic_read())  // Fails

Swap the order so that the synchronize_srcu() prevents this.
Fixes: a6bc3875f176 ("IB/mlx5: Protect against prefetch of invalid MR") Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages() | Jason Gunthorpe
This fixes a race of the form:

    CPU0                               CPU1
                                       mlx5_ib_invalidate_range()
    mlx5_ib_invalidate_range()
                                       // This one actually makes npages == 0
                                       ib_umem_odp_unmap_dma_pages()
                                       if (npages == 0 && !dying)
    // This one does nothing
    ib_umem_odp_unmap_dma_pages()
    if (npages == 0 && !dying)
       dying = 1;
                                          dying = 1;
                                          schedule_work(&umem_odp->work);
       // Double schedule of the same work
       schedule_work(&umem_odp->work);  // BOOM

npages and dying must be read and written under the umem_mutex lock. Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call mlx5_ib_update_xlt, and both need to be done in the same locking region, hoist the lock out of unmap. This avoids an expensive double critical section in mlx5_ib_invalidate_range().
Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR | Jason Gunthorpe
mlx5_ib_update_xlt() must be protected against parallel free of the MR it is accessing, also it must be called single threaded while updating the HW. Otherwise we can have races of the form:

    CPU0                               CPU1
    mlx5_ib_update_xlt()
     mlx5_odp_populate_klm()
       odp_lookup() == NULL
       pklm = ZAP
                                       implicit_mr_get_data()
                                        implicit_mr_alloc()
                                          <update interval tree>
                                        mlx5_ib_update_xlt
                                          mlx5_odp_populate_klm()
                                            odp_lookup() != NULL
                                            pklm = VALID
                                           mlx5_ib_post_send_wait()
    mlx5_ib_post_send_wait() // Replaces VALID with ZAP

This can be solved by putting both the SRCU and the umem_mutex lock around every call to mlx5_ib_update_xlt(). This ensures that the content of the interval tree relevant to mlx5_odp_populate_klm() (ie mr->parent == mr) will not change while it is running, and thus the posted WRs to update the KLM will always reflect the correct information. The race above will resolve by either having CPU1 wait till CPU0 completes the ZAP or CPU0 will run after the add and instead store VALID. The pagefault path adding children already holds the umem_mutex and SRCU, so the only missed lock is during MR destruction.
Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/mlx5: Do not allow rereg of a ODP MR | Jason Gunthorpe
This code is completely broken, the umem of a ODP MR simply cannot be discarded without a lot more locking, nor can an ODP mkey be blithely destroyed via destroy_mkey(). Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | IB/core: Fix wrong iterating on ports | Mohamad Heib
rdma_for_each_port() already increments the iterator it receives. With the leftover manual increment, the iterator therefore advanced by 2 after the first iteration, which eventually caused wrong queries and possible traces. Fix this by removing the old redundant increment that was used before the rdma_for_each_port() macro was introduced. Cc: <stable@vger.kernel.org> Fixes: ea1075edcbab ("RDMA: Add and use rdma_for_each_port") Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org Signed-off-by: Mohamad Heib <mohamadh@mellanox.com> Reviewed-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
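The effect of a leftover increment inside an iterator macro can be reproduced in a few lines. The macro below is a hypothetical stand-in that numbers ports from 1; it is not the kernel's rdma_for_each_port() definition.

```c
#include <assert.h>

/* Hypothetical stand-in for an iterator macro that advances the counter. */
#define for_each_port_sketch(port, nports) \
    for ((port) = 1; (port) <= (nports); (port)++)

/* Buggy shape: the body still carries the pre-macro manual increment. */
unsigned int visit_ports_buggy_sketch(unsigned int nports)
{
    unsigned int port, visited = 0;

    for_each_port_sketch(port, nports) {
        visited++;
        port++;         /* redundant: skips every other port */
    }
    return visited;
}

/* Fixed shape: the macro alone advances the iterator. */
unsigned int visit_ports_fixed_sketch(unsigned int nports)
{
    unsigned int port, visited = 0;

    for_each_port_sketch(port, nports)
        visited++;
    return visited;
}
```

With four ports, the buggy loop visits only ports 1 and 3, while the fixed loop visits all four.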
2019-10-04 | IB/cm: Use container_of() instead of typecast | Parav Pandit
Use container_of() macro to get to timewait info structure instead of typecasting. Link: https://lore.kernel.org/r/20191002122517.17721-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
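A plain typecast from a member pointer to its enclosing structure only works while the member sits at offset zero; container_of() stays correct wherever the member is placed. A user-space sketch with hypothetical structure names (the macro here is a simplified version built on offsetof, without the kernel's type checking):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space version of the container_of() idea. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct work_sketch { int pending; };

struct timewait_sketch {
    int remote_qpn;
    struct work_sketch work;    /* deliberately not the first member */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
struct timewait_sketch *work_to_timewait_sketch(struct work_sketch *w)
{
    assert(w != NULL);
    return container_of_sketch(w, struct timewait_sketch, work);
}
```

Here a direct cast of `&tw.work` to `struct timewait_sketch *` would point past `remote_qpn` and read the wrong memory, whereas the offset-based lookup returns `&tw`.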
2019-10-04 | IB/mlx5: Remove unnecessary else statement | Erez Alfasi
'else' is not generally useful after a break or return. Remove this unnecessary statement. Link: https://lore.kernel.org/r/20191002122517.17721-4-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | IB/mlx5: Remove unnecessary return statement | Erez Alfasi
There is no reason to call return at the end of function which returns void. Remove this unnecessary statement. Link: https://lore.kernel.org/r/20191002122517.17721-3-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/mlx5: Group boolean parameters to take less space | Leon Romanovsky
Clean the code to store all boolean parameters inside one variable. Link: https://lore.kernel.org/r/20191002122517.17721-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srpt: Postpone HCA removal until after configfs directory removal | Bart Van Assche
A shortcoming of the SCSI target core is that it does not have an API for removing tpg or wwn objects. Wait until these directories have been removed before allowing HCA removal to finish. See also Bart Van Assche, "Re: Why using configfs as the only interface is wrong for a storage target", 2011-02-07 (https://www.spinics.net/lists/linux-scsi/msg50248.html).
This patch fixes the following kernel crash:
==================================================================
BUG: KASAN: use-after-free in __configfs_open_file.isra.4+0x1a8/0x400
Read of size 8 at addr ffff88811880b690 by task restart-lio-srp/1215
CPU: 1 PID: 1215 Comm: restart-lio-srp Not tainted 5.3.0-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xca
 print_address_description+0x74/0x32d
 __kasan_report.cold.6+0x1b/0x36
 kasan_report+0x12/0x17
 __asan_load8+0x54/0x90
 __configfs_open_file.isra.4+0x1a8/0x400
 configfs_open_file+0x13/0x20
 do_dentry_open+0x2b1/0x770
 vfs_open+0x58/0x60
 path_openat+0x5fa/0x14b0
 do_filp_open+0x115/0x180
 do_sys_open+0x1d4/0x2a0
 __x64_sys_openat+0x59/0x70
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f2f2bd3fcce
Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 19 d7 0d 00 8b 00 85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a6 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffd155f7850 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000564609ba88e0 RCX: 00007f2f2bd3fcce
RDX: 0000000000000241 RSI: 0000564609ba8cf0 RDI: 00000000ffffff9c
RBP: 00007ffd155f7950 R08: 0000000000000000 R09: 0000000000000020
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 0000000000000001 R15: 0000564609ba8cf0

Allocated by task 995:
 save_stack+0x21/0x90
 __kasan_kmalloc.constprop.9+0xc7/0xd0
 kasan_kmalloc+0x9/0x10
 __kmalloc+0x153/0x370
 srpt_add_one+0x4f/0x561 [ib_srpt]
 add_client_context+0x251/0x290 [ib_core]
 ib_register_client+0x1da/0x220 [ib_core]
 iblock_get_alignment_offset_lbas+0x6b/0x100 [target_core_iblock]
 do_one_initcall+0xcd/0x43a
 do_init_module+0x103/0x380
 load_module+0x3b77/0x3eb0
 __do_sys_finit_module+0x12d/0x1b0
 __x64_sys_finit_module+0x43/0x50
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1221:
 save_stack+0x21/0x90
 __kasan_slab_free+0x139/0x190
 kasan_slab_free+0xe/0x10
 slab_free_freelist_hook+0x67/0x1e0
 kfree+0xcb/0x2a0
 srpt_remove_one+0x596/0x670 [ib_srpt]
 remove_client_context+0x9a/0xe0 [ib_core]
 disable_device+0x106/0x1b0 [ib_core]
 __ib_unregister_device+0x5f/0xf0 [ib_core]
 ib_unregister_driver+0x11a/0x170 [ib_core]
 0xffffffffa087f666
 __x64_sys_delete_module+0x1f8/0x2c0
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88811880b300
 which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 912 bytes inside of
 4096-byte region [ffff88811880b300, ffff88811880c300)
The buggy address belongs to the page:
page:ffffea0004620200 refcount:1 mapcount:0 mapping:ffff88811ac0de00 index:0x0 compound_mapcount: 0
flags: 0x2fff000000010200(slab|head)
raw: 2fff000000010200 dead000000000100 dead000000000122 ffff88811ac0de00
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88811880b580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88811880b600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88811880b680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
 ffff88811880b700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88811880b780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Link: https://lore.kernel.org/r/20190930231707.48259-16-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srpt: Make the code for handling port identities more systematic | Bart Van Assche
Introduce a new data structure for the information about an RDMA port name. This patch does not change any functionality. Link: https://lore.kernel.org/r/20190930231707.48259-15-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srpt: Rework the code that waits until an RDMA port is no longer in use | Bart Van Assche
The current implementation does not wait until srpt_release_channel() has finished and hence can trigger a use-after-free. Rework srpt_release_sport() such that it waits until srpt_release_channel() has finished.
This patch fixes the following KASAN complaint:
==================================================================
BUG: KASAN: use-after-free in srpt_free_ioctx.part.23+0x42/0x100 [ib_srpt]
Read of size 8 at addr ffff888115c71100 by task kworker/4:3/807
CPU: 4 PID: 807 Comm: kworker/4:3 Not tainted 5.3.0-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events srpt_release_channel_work [ib_srpt]
Call Trace:
 dump_stack+0x86/0xca
 print_address_description+0x74/0x32d
 __kasan_report.cold.6+0x1b/0x36
 kasan_report+0x12/0x17
 __asan_load8+0x54/0x90
 srpt_free_ioctx.part.23+0x42/0x100 [ib_srpt]
 srpt_free_ioctx_ring.part.24+0x50/0x80 [ib_srpt]
 srpt_release_channel_work+0x2ad/0x390 [ib_srpt]
 process_one_work+0x51a/0xa60
 worker_thread+0x67/0x5b0
 kthread+0x1dc/0x200
 ret_from_fork+0x24/0x30

Allocated by task 984:
 save_stack+0x21/0x90
 __kasan_kmalloc.constprop.9+0xc7/0xd0
 kasan_kmalloc+0x9/0x10
 __kmalloc+0x153/0x370
 srpt_add_one+0x4f/0x570 [ib_srpt]
 add_client_context+0x251/0x290 [ib_core]
 ib_register_client+0x1da/0x220 [ib_core]
 iblock_get_alignment_offset_lbas+0x6b/0x100 [target_core_iblock]
 do_one_initcall+0xcd/0x43a
 do_init_module+0x103/0x380
 load_module+0x3b77/0x3eb0
 __do_sys_finit_module+0x12d/0x1b0
 __x64_sys_finit_module+0x43/0x50
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1128:
 save_stack+0x21/0x90
 __kasan_slab_free+0x139/0x190
 kasan_slab_free+0xe/0x10
 slab_free_freelist_hook+0x67/0x1e0
 kfree+0xcb/0x2a0
 srpt_remove_one+0x569/0x5b0 [ib_srpt]
 remove_client_context+0x9a/0xe0 [ib_core]
 disable_device+0x106/0x1b0 [ib_core]
 __ib_unregister_device+0x5f/0xf0 [ib_core]
 ib_unregister_device_and_put+0x48/0x60 [ib_core]
 nldev_dellink+0x120/0x180 [ib_core]
 rdma_nl_rcv+0x287/0x480 [ib_core]
 netlink_unicast+0x2cc/0x370
 netlink_sendmsg+0x3b1/0x630
 __sys_sendto+0x1db/0x290
 __x64_sys_sendto+0x80/0xa0
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888115c71100
 which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 0 bytes inside of
 4096-byte region [ffff888115c71100, ffff888115c72100)
The buggy address belongs to the page:
page:ffffea0004571c00 refcount:1 mapcount:0 mapping:ffff88811ac0de00 index:0xffff888115c70000 compound_mapcount: 0
flags: 0x2fff000000010200(slab|head)
raw: 2fff000000010200 ffffea00045ac408 ffffea0004593208 ffff88811ac0de00
raw: ffff888115c70000 0000000000070002 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888115c71000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888115c71080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888115c71100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff888115c71180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888115c71200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Link: https://lore.kernel.org/r/20190930231707.48259-14-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srpt: Rework the approach for closing an RDMA channel | Bart Van Assche
Instead of relying on a waitqueue, report when the identity of an RDMA channel can be reused through a completion. Link: https://lore.kernel.org/r/20190930231707.48259-13-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srpt: Improve a debug message | Bart Van Assche
The ib_srpt driver uses two different identifiers while registering a session with the LIO core. Report both identifiers if the modified pr_debug() statement is enabled. Link: https://lore.kernel.org/r/20190930231707.48259-12-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srpt: Fix handling of iWARP logins | Bart Van Assche
The path_rec pointer is set for IB and RoCE logins but not for iWARP logins. Hence check the path_rec pointer before dereferencing it. Link: https://lore.kernel.org/r/20190930231707.48259-11-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srpt: Fix handling of SR-IOV and iWARP ports | Bart Van Assche
Management datagrams (MADs) are not supported by SR-IOV VFs nor by iWARP ports. Support SR-IOV VFs and iWARP ports by only logging an error message if MAD handler registration fails. Link: https://lore.kernel.org/r/20190930231707.48259-10-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srp: Make route resolving error messages more informative | Bart Van Assche
The IPv6 scope ID is essential when setting up an iWARP connection between IPv6 link-local addresses. Report the scope ID in error messages. Link: https://lore.kernel.org/r/20190930231707.48259-9-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/srp: Honor the max_send_sge device attribute | Bart Van Assche
Instead of assuming that max_send_sge >= 3, restrict the number of scatter gather elements to what is supported by the RDMA adapter. Link: https://lore.kernel.org/r/20190930231707.48259-8-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
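Restricting a request to what the adapter reports is a simple clamp. The helper below is a hypothetical illustration of that idea; the name and parameters are not from the srp driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: wanted is the number of scatter/gather elements
 * the driver would like to use per send, dev_max is the adapter's
 * reported max_send_sge. The result never exceeds the device limit.
 */
unsigned int clamp_send_sge_sketch(unsigned int wanted, unsigned int dev_max)
{
    assert(dev_max > 0);    /* a usable device reports at least one SGE */
    return wanted < dev_max ? wanted : dev_max;
}
```

An adapter that reports fewer than three elements then gets a request it can satisfy, instead of one sized on the assumption that three are always available.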
2019-10-04 | RDMA/srp: Remove two casts | Bart Van Assche
This patch does not change any functionality. Link: https://lore.kernel.org/r/20190930231707.48259-7-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 | RDMA/siw: Make node GUIDs valid EUI-64 identifiers | Bart Van Assche
From the IBTA: "GUID (Global Unique Identifier): A globally unique EUI-64 compliant identifier." Make sure that siw GUIDs are valid EUI-64 identifiers. Link: https://lore.kernel.org/r/20190930231707.48259-6-bvanassche@acm.org Cc: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
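One common way to derive an EUI-64 from a 48-bit MAC address is to insert FF FE between the two halves and flip the universal/local bit (the "modified EUI-64" convention used for IPv6 interface identifiers). The sketch below shows that mapping; the function name is hypothetical, and whether the siw patch uses exactly this mapping is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build a modified EUI-64 from a 48-bit MAC.
 * Bytes 0-2 and 5-7 come from the MAC, bytes 3-4 are the FF FE filler,
 * and bit 1 of the first byte (universal/local) is inverted.
 */
void mac_to_eui64_sketch(uint8_t eui[8], const uint8_t mac[6])
{
    assert(eui != NULL && mac != NULL);
    memcpy(eui, mac, 3);
    eui[3] = 0xFF;
    eui[4] = 0xFE;
    memcpy(eui + 5, mac + 3, 3);
    eui[0] ^= 0x02;
}
```

Copying the six MAC bytes into an eight-byte GUID without the filler leaves two bytes unspecified, which is the kind of result that is not a valid EUI-64.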
2019-10-04 | RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path | Leon Romanovsky
Properly unwind QP counter rebinding in case of failure. Trying to rebind the counter after unbinding it is not going to work reliably; move the unbind to the end so it doesn't have to be unwound. Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink") Link: https://lore.kernel.org/r/20191002115627.16740-1-leon@kernel.org Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/cxgb4: Do not dma memory off of the stackGreg KH
Nicolas pointed out that the cxgb4 driver is doing dma off of the stack, which is generally considered a very bad thing. On some architectures it could be a security problem, but odds are none of them actually run this driver, so it's just a "normal" bug. Resolve this by allocating the memory for a message off of the heap instead of the stack. kmalloc() always will give us a proper memory location that DMA will work correctly from. Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com Reported-by: Nicolas Waisman <nico@semmle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/iw_cxgb3: Remove the iw_cxgb3 module from kernelPotnuri Bharat Teja
Remove iw_cxgb3 module from kernel as the corresponding HW Chelsio T3 has reached EOL. Link: https://lore.kernel.org/r/20190930074252.20133-1-bharat@chelsio.com Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/cm: Fix memory leak in cm_add/remove_oneJack Morgenstein
In the process of moving the debug counters sysfs entries, the commit mentioned below eliminated the cm_infiniband sysfs directory. This sysfs directory was tied to the cm_port object allocated in procedure cm_add_one(). Before the commit below, this cm_port object was freed via a call to kobject_put(port->kobj) in procedure cm_remove_port_fs(). Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated. This, however, meant that kfree was never called for the cm_port buffers. Fix this by adding explicit kfree(port) calls to functions cm_add_one() and cm_remove_one(). Note: the kfree call in the first chunk below (in the cm_add_one error flow) fixes an old, undetected memory leak. Fixes: c87e65cfb97c ("RDMA/cm: Move debug counters to be under relevant IB device") Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/core: Fix an error handling path in 'res_get_common_doit()'Christophe JAILLET
According to surrounding error paths, it is likely that 'goto err_get;' is expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be missing. Fixes: c5dfe0ea6ffa ("RDMA/nldev: Add resource tracker doit callback") Link: https://lore.kernel.org/r/20190818091044.8845-1-christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
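The pattern at issue, jumping to a label that drops a previously taken reference, can be sketched in userspace C as follows. The types and helpers (`struct res_sketch`, `res_get()`, `res_put()`, `doit_sketch()`) are hypothetical stand-ins and not the restrack API; only the label name `err_get` is borrowed from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reference-counted resource. */
struct res_sketch {
	int refcount;
};

static void res_get(struct res_sketch *res) { res->refcount++; }
static void res_put(struct res_sketch *res) { res->refcount--; }

/*
 * Take a reference, do some work that may fail, and make sure every
 * exit after res_get() passes through the label that calls res_put().
 * Returning directly from the failure branch would leak the reference.
 */
static int doit_sketch(struct res_sketch *res, int simulate_failure)
{
	int ret = 0;

	assert(res != NULL);
	res_get(res);

	if (simulate_failure) {
		ret = -ENOMEM;
		goto err_get;   /* not "return ret;" */
	}

	/* ... work that uses res would go here ... */

err_get:
	res_put(res);
	return ret;
}
```

Whether the call succeeds or fails, the reference count ends where it started.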
2019-10-04RDMA/i40iw: Associate ibdev to netdev before IB device registrationShiraz, Saleem
i40iw IB device registration fails with ENODEV. The call chain is: ib_register_device -> setup_device/setup_port_data -> i40iw_port_immutable -> ib_query_port -> iw_query_port -> ib_device_get_netdev(ENODEV). ib_device_get_netdev() does not have a netdev associated with the ibdev and thus fails. Use ib_device_set_netdev() to associate netdev to ibdev in i40iw before IB device registration. Fixes: 4929116bdf72 ("RDMA/core: Add common iWARP query port") Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Reviewed-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/rxe: Verify modify_device maskKamal Heib
Verify that the passed mask to rxe_modify_device() is supported. Link: https://lore.kernel.org/r/20190923104158.5331-4-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
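A mask check of this kind usually rejects any bit outside the supported set before touching device state. The sketch below is a userspace illustration with hypothetical flag names and a hypothetical `struct dev_sketch`; it is not the rxe code, and the choice of `-EOPNOTSUPP` as the error is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical modify flags; the real driver uses its own constants. */
#define MODIFY_SYS_IMAGE_GUID (1u << 0)
#define MODIFY_NODE_DESC      (1u << 1)
#define MODIFY_SUPPORTED      (MODIFY_SYS_IMAGE_GUID | MODIFY_NODE_DESC)

struct dev_sketch {
	uint64_t sys_image_guid;
};

/* Reject unsupported bits first, then apply the requested changes. */
static int modify_device_sketch(struct dev_sketch *dev, unsigned int mask,
				uint64_t guid)
{
	assert(dev != NULL);

	if (mask & ~MODIFY_SUPPORTED)
		return -EOPNOTSUPP;

	if (mask & MODIFY_SYS_IMAGE_GUID)
		dev->sys_image_guid = guid;

	return 0;
}
```

A request carrying an unknown bit fails without modifying the device, even if it also carries supported bits.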
2019-10-01RDMA/bnxt_re: Remove unsupported modify_device callbackKamal Heib
There is no need to always return zero for a function that is not supported. Link: https://lore.kernel.org/r/20190923104158.5331-3-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/core: Fix return code when modify_device isn't supportedKamal Heib
The proper return code is "-EOPNOTSUPP" when modify_device callback is not supported. Link: https://lore.kernel.org/r/20190923104158.5331-2-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
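The behaviour described here, returning -EOPNOTSUPP when a driver does not provide the callback, can be illustrated with a small ops table. All names below (`struct dev_ops_sketch`, `core_modify_device()`, `accept_all()`) are hypothetical; the sketch does not reproduce the RDMA core code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver operations table with an optional callback. */
struct dev_ops_sketch {
	int (*modify_device)(unsigned int mask);
};

/* Example driver callback that accepts every request. */
static int accept_all(unsigned int mask)
{
	(void)mask;
	return 0;
}

/* Core-side wrapper: a missing callback means "operation not supported". */
static int core_modify_device(const struct dev_ops_sketch *ops,
			      unsigned int mask)
{
	assert(ops != NULL);

	if (!ops->modify_device)
		return -EOPNOTSUPP;

	return ops->modify_device(mask);
}
```

With this arrangement a driver that does not support the operation simply leaves the pointer NULL instead of supplying a stub that returns zero.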
2019-10-01RDMA/siw: Fix port number endianness in a debug messageBart Van Assche
sin_port and sin6_port are big endian member variables. Convert these port numbers into CPU endianness before printing. Link: https://lore.kernel.org/r/20190930231707.48259-5-bvanassche@acm.org Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
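The conversion can be shown with the userspace socket API, where `ntohs()` turns the network-order (big endian) port into host order before it is printed. The helper names `port_for_print()` and `print_port()` are hypothetical; the kernel code uses its own debug macros.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>

/* Convert the big endian sin_port field to CPU endianness. */
static unsigned int port_for_print(const struct sockaddr_in *sin)
{
	assert(sin != NULL);
	return ntohs(sin->sin_port);
}

/* Print the port; printing sin->sin_port directly would be wrong on
 * little endian machines because the bytes would appear swapped. */
static void print_port(const struct sockaddr_in *sin)
{
	printf("port %u\n", port_for_print(sin));
}
```

The same applies to `sin6_port` in `struct sockaddr_in6`.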
2019-10-01RDMA/siw: Simplify several debug messagesBart Van Assche
Do not print the remote address if it is not used. Use %pISp instead of %pI4 %d. Link: https://lore.kernel.org/r/20190930231707.48259-4-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/iwcm: Fix a lock inversion issueBart Van Assche
This patch fixes the lock inversion complaint: ============================================ WARNING: possible recursive locking detected 5.3.0-rc7-dbg+ #1 Not tainted -------------------------------------------- kworker/u16:6/171 is trying to acquire lock: 00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm] but task is already holding lock: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&id_priv->handler_mutex); lock(&id_priv->handler_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/u16:6/171: #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm] stack backtrace: CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: dump_stack+0x8a/0xd6 __lock_acquire.cold+0xe1/0x24d lock_acquire+0x106/0x240 __mutex_lock+0x12e/0xcb0 mutex_lock_nested+0x1f/0x30 rdma_destroy_id+0x78/0x4a0 [rdma_cm] iw_conn_req_handler+0x5c9/0x680 [rdma_cm] cm_work_handler+0xe62/0x1100 [iw_cm] process_one_work+0x56d/0xac0 worker_thread+0x7a/0x5d0 kthread+0x1bc/0x210 ret_from_fork+0x24/0x30 This is not a bug as there are actually two lock classes here. Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/iw_cxgb4: fix SRQ access from dump_qp()Potnuri Bharat Teja
dump_qp() is wrongly trying to dump SRQ structures as QP when SRQ is used by the application. This patch matches the QPID before dumping them. Also removes unwanted SRQ id addition to QP id xarray. Fixes: 2f43129127e6 ("cxgb4: Convert qpidr to XArray") Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>