summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2022-01-05RDMA/core: Don't infoleak GRH fieldsLeon Romanovsky
If dst->is_global field is not set, the GRH fields are not cleared and the following infoleak is reported. ===================================================== BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 instrument_copy_to_user include/linux/instrumented.h:121 [inline] _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:209 [inline] ucma_init_qp_attr+0x8c7/0xb10 drivers/infiniband/core/ucma.c:1242 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732 vfs_write+0x8ce/0x2030 fs/read_write.c:588 ksys_write+0x28b/0x510 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __ia32_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline] __do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180 do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Local variable resp created at: ucma_init_qp_attr+0xa4/0xb10 drivers/infiniband/core/ucma.c:1214 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732 Bytes 40-59 of 144 are uninitialized Memory access of size 144 starts at ffff888167523b00 Data copied to user address 0000000020000100 CPU: 1 PID: 25910 Comm: syz-executor.1 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ===================================================== Fixes: 4ba66093bdc6 ("IB/core: Check for global flag when using ah_attr") Link: https://lore.kernel.org/r/0e9dd51f93410b7b2f4f5562f52befc878b71afa.1641298868.git.leonro@nvidia.com Reported-by: syzbot+6d532fa8f9463da290bc@syzkaller.appspotmail.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05RDMA/uverbs: Check for null return of kmalloc_arrayJiasheng Jiang
Because of the possible failure of the allocation, data might be NULL pointer and will cause the dereference of the NULL pointer later. Therefore, it might be better to check it and return -ENOMEM. Fixes: 6884c6c4bd09 ("RDMA/verbs: Store the write/write_ex uapi entry points in the uverbs_api") Link: https://lore.kernel.org/r/20211231093315.1917667-1-jiasheng@iscas.ac.cn Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow"Maor Gottlieb
This patch is not the full fix and still causes to call traces during mlx5_ib_dereg_mr(). This reverts commit f0ae4afe3d35e67db042c58a52909e06262b740f. Fixes: f0ae4afe3d35 ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow") Link: https://lore.kernel.org/r/20211222101312.1358616-1-maorg@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-04RDMA/rxe: Prevent double freeing rxe_map_set()Li Zhijian
The same rxe_map_set could be freed twice: rxe_reg_user_mr() -> rxe_mr_init_user() -> rxe_mr_free_map_set() # 1st -> rxe_drop_ref() ... -> rxe_mr_cleanup() -> rxe_mr_free_map_set() # 2nd Follow normal convection and put resource cleanup either in the error unwind of the allocator, or the overall free function. Leave the object unchanged with a NULL cur_map_set on failure and remove the unncessary free in rxe_mr_init_user(). Link: https://lore.kernel.org/r/20211228014406.1033444-1-lizhijian@cn.fujitsu.com Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14RDMA/hns: Replace kfree() with kvfree()Jiacheng Shi
Variables allocated by kvmalloc_array() should not be freed by kfree. Because they may be allocated by vmalloc. So we replace kfree() with kvfree() here. Fixes: 6fd610c5733d ("RDMA/hns: Support 0 hop addressing for SRQ buffer") Link: https://lore.kernel.org/r/20211210094234.5829-1-billsjc@sjtu.edu.cn Signed-off-by: Jiacheng Shi <billsjc@sjtu.edu.cn> Acked-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14IB/qib: Fix memory leak in qib_user_sdma_queue_pkts()José Expósito
The wrong goto label was used for the error case and missed cleanup of the pkt allocation. Fixes: d39bf40e55e6 ("IB/qib: Protect from buffer overflow in struct qib_user_sdma_pkt fields") Link: https://lore.kernel.org/r/20211208175238.29983-1-jose.exposito89@gmail.com Addresses-Coverity-ID: 1493352 ("Resource leak") Signed-off-by: José Expósito <jose.exposito89@gmail.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14RDMA/hns: Fix RNR retransmission issue for HIP08Yangyang Li
Due to the discrete nature of the HIP08 timer unit, a requester might finish the timeout period sooner, in elapsed real time, than its responder does, even when both sides share the identical RNR timeout length included in the RNR Nak packet and the responder indeed starts the timing prior to the requester. Furthermore, if a 'providential' resend packet arrived before the responder's timeout period expired, the responder is certainly entitled to drop the packet silently in the light of IB protocol. To address this problem, our team made good use of certain hardware facts: 1) The timing resolution regards the transmission arrangements is 1 microsecond, e.g. if cq_period field is set to 3, it would be interpreted as 3 microsecond by hardware 2) A QPC field shall inform the hardware how many timing unit (ticks) constitutes a full microsecond, which, by default, is 1000 3) It takes 14ns for the processor to handle a packet in the buffer, so the RNR timeout length of 10ns would ensure our processing mechanism is disabled during the entire timeout period and the packet won't be dropped silently To achieve (3), we permanently set the QPC field mentioned in (2) to zero which nominally indicates every time tick is equivalent to a microsecond in wall-clock time; now, a RNR timeout period at face value of 10 would only last 10 ticks, which is 10ns in wall-clock time. It's worth noting that we adapt the driver by magnifying certain configuration parameters(cq_period, eq_period and ack_timeout)by 1000 given the user assumes the configuring timing unit to be microseconds. Also, this particular improvisation is only deployed on HIP08 since other hardware has already solved this issue. Fixes: cfc85f3e4b7f ("RDMA/hns: Add profile support for hip08 driver") Link: https://lore.kernel.org/r/20211209140655.49493-1-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQTatyana Nikolova
Completion events (CEs) are lost if the application is allowed to arm the CQ more than two times when no new CE for this CQ has been generated by the HW. Check if arming has been done for the CQ and if not, arm the CQ for any event otherwise promote to arm the CQ for any event only when the last arm event was solicited. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20211201231509.1930-2-shiraz.saleem@intel.com Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07RDMA/irdma: Report correct WC errorsShiraz Saleem
Return IBV_WC_REM_OP_ERR for responder QP errors instead of IBV_WC_REM_ACCESS_ERR. Return IBV_WC_LOC_QP_OP_ERR for errors detected on the SQ with bad opcodes Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20211201231509.1930-1-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07RDMA/irdma: Fix a potential memory allocation issue in ↵Christophe JAILLET
'irdma_prm_add_pble_mem()' 'pchunk->bitmapbuf' is a bitmap. Its size (in number of bits) is stored in 'pchunk->sizeofbitmap'. When it is allocated, the size (in bytes) is computed by: size_in_bits >> 3 There are 2 issues (numbers bellow assume that longs are 64 bits): - there is no guarantee here that 'pchunk->bitmapmem.size' is modulo BITS_PER_LONG but bitmaps are stored as longs (sizeofbitmap=8 bits will only allocate 1 byte, instead of 8 (1 long)) - the number of bytes is computed with a shift, not a round up, so we may allocate less memory than needed (sizeofbitmap=65 bits will only allocate 8 bytes (i.e. 1 long), when 2 longs are needed = 16 bytes) Fix both issues by using 'bitmap_zalloc()' and remove the useless 'bitmapmem' from 'struct irdma_chunk'. While at it, remove some useless NULL test before calling kfree/bitmap_free. Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions") Link: https://lore.kernel.org/r/5e670b640508e14b1869c3e8e4fb970d78cbe997.1638692171.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07RDMA/irdma: Fix a user-after-free in add_pble_prmShiraz Saleem
When irdma_hmc_sd_one fails, 'chunk' is freed while its still on the PBLE info list. Add the chunk entry to the PBLE info list only after successful setting of the SD in irdma_hmc_sd_one. Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager") Link: https://lore.kernel.org/r/20211207152135.2192-1-shiraz.saleem@intel.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddrMike Marciniszyn
This buffer is currently allocated in hfi1_init(): if (reinit) ret = init_after_reset(dd); else ret = loadtime_init(dd); if (ret) goto done; /* allocate dummy tail memory for all receive contexts */ dd->rcvhdrtail_dummy_kvaddr = dma_alloc_coherent(&dd->pcidev->dev, sizeof(u64), &dd->rcvhdrtail_dummy_dma, GFP_KERNEL); if (!dd->rcvhdrtail_dummy_kvaddr) { dd_dev_err(dd, "cannot allocate dummy tail memory\n"); ret = -ENOMEM; goto done; } The reinit triggered path will overwrite the old allocation and leak it. Fix by moving the allocation to hfi1_alloc_devdata() and the deallocation to hfi1_free_devdata(). Link: https://lore.kernel.org/r/20211129192008.101968.91302.stgit@awfm-01.cornelisnetworks.com Cc: stable@vger.kernel.org Fixes: 46b010d3eeb8 ("staging/rdma/hfi1: Workaround to prevent corruption during packet delivery") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07IB/hfi1: Fix early init panicMike Marciniszyn
The following trace can be observed with an init failure such as firmware load failures: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0010 [#1] SMP PTI CPU: 0 PID: 537 Comm: kworker/0:3 Tainted: G OE --------- - - 4.18.0-240.el8.x86_64 #1 Workqueue: events work_for_cpu_fn RIP: 0010:0x0 Code: Bad RIP value. RSP: 0000:ffffae5f878a3c98 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff95e48e025c00 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff95e48e025c00 RBP: ffff95e4bf3660a4 R08: 0000000000000000 R09: ffffffff86d5e100 R10: ffff95e49e1de600 R11: 0000000000000001 R12: ffff95e4bf366180 R13: ffff95e48e025c00 R14: ffff95e4bf366028 R15: ffff95e4bf366000 FS: 0000000000000000(0000) GS:ffff95e4df200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000f86a0a003 CR4: 00000000001606f0 Call Trace: receive_context_interrupt+0x1f/0x40 [hfi1] __free_irq+0x201/0x300 free_irq+0x2e/0x60 pci_free_irq+0x18/0x30 msix_free_irq.part.2+0x46/0x80 [hfi1] msix_clean_up_interrupts+0x2b/0x70 [hfi1] hfi1_init_dd+0x640/0x1a90 [hfi1] do_init_one.isra.19+0x34d/0x680 [hfi1] local_pci_probe+0x41/0x90 work_for_cpu_fn+0x16/0x20 process_one_work+0x1a7/0x360 worker_thread+0x1cf/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x112/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x35/0x40 The free_irq() results in a callback to the registered interrupt handler, and rcd->do_interrupt is NULL because the receive context data structures are not fully initialized. Fix by ensuring that the do_interrupt is always assigned and adding a guards in the slow path handler to detect and handle a partially initialized receive context and noop the receive. Link: https://lore.kernel.org/r/20211129192003.101968.33612.stgit@awfm-01.cornelisnetworks.com Cc: stable@vger.kernel.org Fixes: b0ba3c18d6bf ("IB/hfi1: Move normal functions from hfi1_devdata to const array") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07IB/hfi1: Insure use of smp_processor_id() is preempt disabledMike Marciniszyn
The following BUG has just surfaced with our 5.16 testing: BUG: using smp_processor_id() in preemptible [00000000] code: mpicheck/1581081 caller is sdma_select_user_engine+0x72/0x210 [hfi1] CPU: 0 PID: 1581081 Comm: mpicheck Tainted: G S 5.16.0-rc1+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 Call Trace: <TASK> dump_stack_lvl+0x33/0x42 check_preemption_disabled+0xbf/0xe0 sdma_select_user_engine+0x72/0x210 [hfi1] ? _raw_spin_unlock_irqrestore+0x1f/0x31 ? hfi1_mmu_rb_insert+0x6b/0x200 [hfi1] hfi1_user_sdma_process_request+0xa02/0x1120 [hfi1] ? hfi1_write_iter+0xb8/0x200 [hfi1] hfi1_write_iter+0xb8/0x200 [hfi1] do_iter_readv_writev+0x163/0x1c0 do_iter_write+0x80/0x1c0 vfs_writev+0x88/0x1a0 ? recalibrate_cpu_khz+0x10/0x10 ? ktime_get+0x3e/0xa0 ? __fget_files+0x66/0xa0 do_writev+0x65/0x100 do_syscall_64+0x3a/0x80 Fix this long standing bug by moving the smp_processor_id() to after the rcu_read_lock(). The rcu_read_lock() implicitly disables preemption. Link: https://lore.kernel.org/r/20211129191958.101968.87329.stgit@awfm-01.cornelisnetworks.com Cc: stable@vger.kernel.org Fixes: 0cb2aa690c7e ("IB/hfi1: Add sysfs interface for affinity setup") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07IB/hfi1: Correct guard on eager buffer deallocationMike Marciniszyn
The code tests the dma address which legitimately can be 0. The code should test the kernel logical address to avoid leaking eager buffer allocations that happen to map to a dma address of 0. Fixes: 60368186fd85 ("IB/hfi1: Fix user-space buffers mapping with IOMMU enabled") Link: https://lore.kernel.org/r/20211129191952.101968.17137.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-29RDMA/rtrs: Call {get,put}_cpu_ptr to silence a debug kernel warningGuoqing Jiang
With preemption enabled (CONFIG_DEBUG_PREEMPT=y), the following appeared when rnbd client tries to map remote block device. BUG: using smp_processor_id() in preemptible [00000000] code: bash/1733 caller is debug_smp_processor_id+0x17/0x20 CPU: 0 PID: 1733 Comm: bash Not tainted 5.16.0-rc1 #5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x5d/0x78 dump_stack+0x10/0x12 check_preemption_disabled+0xe4/0xf0 debug_smp_processor_id+0x17/0x20 rtrs_clt_update_all_stats+0x3b/0x70 [rtrs_client] rtrs_clt_read_req+0xc3/0x380 [rtrs_client] ? rtrs_clt_init_req+0xe3/0x120 [rtrs_client] rtrs_clt_request+0x1a7/0x320 [rtrs_client] ? 0xffffffffc0ab1000 send_usr_msg+0xbf/0x160 [rnbd_client] ? rnbd_clt_put_sess+0x60/0x60 [rnbd_client] ? send_usr_msg+0x160/0x160 [rnbd_client] ? sg_alloc_table+0x27/0xb0 ? sg_zero_buffer+0xd0/0xd0 send_msg_sess_info+0xe9/0x180 [rnbd_client] ? rnbd_clt_put_sess+0x60/0x60 [rnbd_client] ? blk_mq_alloc_tag_set+0x2ef/0x370 rnbd_clt_map_device+0xba8/0xcd0 [rnbd_client] ? send_msg_open+0x200/0x200 [rnbd_client] rnbd_clt_map_device_store+0x3e5/0x620 [rnbd_client To supress the calltrace, let's call get_cpu_ptr/put_cpu_ptr pair in rtrs_clt_update_rdma_stats to disable preemption when accessing per-cpu variable. While at it, let's make the similar change in rtrs_clt_update_wc_stats. And for rtrs_clt_inc_failover_cnt, though it was only called inside rcu section, but it still can be preempted in case CONFIG_PREEMPT_RCU is enabled, so change it to {get,put}_cpu_ptr pair either. Link: https://lore.kernel.org/r/20211128133501.38710-1-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25RDMA/hns: Do not destroy QP resources in the hw resetting phaseYangyang Li
When hns_roce_v2_destroy_qp() is called, the brief calling process of the driver is as follows: ...... hns_roce_v2_destroy_qp hns_roce_v2_qp_modify hns_roce_cmd_mbox hns_roce_qp_destroy If hns_roce_cmd_mbox() detects that the hardware is being reset during the execution of the hns_roce_cmd_mbox(), the driver will not be able to get the return value from the hardware (the firmware cannot respond to the driver's mailbox during the hardware reset phase). The driver needs to wait for the hardware reset to complete before continuing to execute hns_roce_qp_destroy(), otherwise it may happen that the driver releases the resources but the hardware is still accessing. In order to fix this problem, HNS RoCE needs to add a piece of code to wait for the hardware reset to complete. The original interface get_hw_reset_stat() is the instantaneous state of the hardware reset, which cannot accurately reflect whether the hardware reset is completed, so it needs to be replaced with the ae_dev_reset_cnt interface. The sign that the hardware reset is complete is that the return value of the ae_dev_reset_cnt interface is greater than the original value reset_cnt recorded by the driver. Fixes: 6a04aed6afae ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset") Link: https://lore.kernel.org/r/20211123142402.26936-1-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25RDMA/hns: Do not halt commands during reset until laterYangyang Li
is_reset is used to indicate whether the hardware starts to reset. When hns_roce_hw_v2_reset_notify_down() is called, the hardware has not yet started to reset. If is_reset is set at this time, all mailbox operations of resource destroy actions will be intercepted by driver. When the driver cleans up resources, but the hardware is still accessed, the following errors will appear: arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000003f arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50e0800 arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000 arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000043e arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50a0800 arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000 arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000020880000436 arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50a0880 arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000 arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000043a arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50e0840 hns3 0000:35:00.0: INT status: CMDQ(0x0) HW errors(0x0) other(0x0) arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000 hns3 0000:35:00.0: received unknown or unhandled event of vector0 arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 {34}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7 is_reset will be set correctly in check_aedev_reset_status(), so the setting in hns_roce_hw_v2_reset_notify_down() should be deleted. Fixes: 726be12f5ca0 ("RDMA/hns: Set reset flag when hw resetting") Link: https://lore.kernel.org/r/20211123084809.37318-1-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25RDMA/mlx5: Fix releasing unallocated memory in dereg MR flowAlaa Hleihel
For the case of IB_MR_TYPE_DM the mr does doesn't have a umem, even though it is a user MR. This causes function mlx5_free_priv_descs() to think that it is a kernel MR, leading to wrongly accessing mr->descs that will get wrong values in the union which leads to attempt to release resources that were not allocated in the first place. For example: DMA-API: mlx5_core 0000:08:00.1: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=0 bytes] WARNING: CPU: 8 PID: 1021 at kernel/dma/debug.c:961 check_unmap+0x54f/0x8b0 RIP: 0010:check_unmap+0x54f/0x8b0 Call Trace: debug_dma_unmap_page+0x57/0x60 mlx5_free_priv_descs+0x57/0x70 [mlx5_ib] mlx5_ib_dereg_mr+0x1fb/0x3d0 [mlx5_ib] ib_dereg_mr_user+0x60/0x140 [ib_core] uverbs_destroy_uobject+0x59/0x210 [ib_uverbs] uobj_destroy+0x3f/0x80 [ib_uverbs] ib_uverbs_cmd_verbs+0x435/0xd10 [ib_uverbs] ? uverbs_finalize_object+0x50/0x50 [ib_uverbs] ? lock_acquire+0xc4/0x2e0 ? lock_acquired+0x12/0x380 ? lock_acquire+0xc4/0x2e0 ? lock_acquire+0xc4/0x2e0 ? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs] ? lock_release+0x28a/0x400 ib_uverbs_ioctl+0xc0/0x140 [ib_uverbs] ? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs] __x64_sys_ioctl+0x7f/0xb0 do_syscall_64+0x38/0x90 Fix it by reorganizing the dereg flow and mlx5_ib_mr structure: - Move the ib_umem field into the user MRs structure in the union as it's applicable only there. - Function mlx5_ib_dereg_mr() will now call mlx5_free_priv_descs() only in case there isn't udata, which indicates that this isn't a user MR. Fixes: f18ec4223117 ("RDMA/mlx5: Use a union inside mlx5_ib_mr") Link: https://lore.kernel.org/r/66bb1dd253c1fd7ceaa9fc411061eefa457b86fb.1637581144.git.leonro@nvidia.com Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25RDMA: Fix use-after-free in rxe_queue_cleanupPavel Skripkin
On error handling path in rxe_qp_from_init() qp->sq.queue is freed and then rxe_create_qp() will drop last reference to this object. qp clean up function will try to free this queue one time and it causes UAF bug. Fix it by zeroing queue pointer after freeing queue in rxe_qp_from_init(). Fixes: 514aee660df4 ("RDMA: Globally allocate and release QP memory") Link: https://lore.kernel.org/r/20211121202239.3129-1-paskripkin@gmail.com Reported-by: syzbot+aab53008a5adf26abe91@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-17RDMA/nldev: Check stat attribute before accessing itLeon Romanovsky
The access to non-existent netlink attribute causes to the following kernel panic. Fix it by checking existence before trying to read it. general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 6744 Comm: syz-executor.0 Not tainted 5.15.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nla_get_u32 include/net/netlink.h:1554 [inline] RIP: 0010:nldev_stat_set_mode_doit drivers/infiniband/core/nldev.c:1909 [inline] RIP: 0010:nldev_stat_set_doit+0x578/0x10d0 drivers/infiniband/core/nldev.c:2040 Code: fa 4c 8b a4 24 f8 02 00 00 48 b8 00 00 00 00 00 fc ff df c7 84 24 80 00 00 00 00 00 00 00 49 8d 7c 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 02 RSP: 0018:ffffc90004acf2e8 EFLAGS: 00010247 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90002b94000 RDX: 0000000000000000 RSI: ffffffff8684c5ff RDI: 0000000000000004 RBP: ffff88807cda4000 R08: 0000000000000000 R09: ffff888023fb8027 R10: ffffffff8684c5d7 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000001 R14: ffff888041024280 R15: ffff888031ade780 FS: 00007eff9dddd700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2ef24000 CR3: 0000000036902000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 822cf785ac6d ("RDMA/nldev: Split nldev_stat_set_mode_doit out of nldev_stat_set_doit") Link: https://lore.kernel.org/r/b21967c366f076ff1988862f9c8a1aa0244c599f.1637151999.git.leonro@nvidia.com Reported-by: syzbot+9111d2255a9710e87562@syzkaller.appspotmail.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-17RDMA/mlx4: Do not fail the registration on port statsJack Wang
If the FW doesn't support MLX4_DEV_CAP_FLAG2_DIAG_PER_PORT, mlx4 driver will fail the ib_setup_port_attrs, which is called from ib_register_device()/enable_device_and_get(), in the end leads to device not detected[1][2] To fix it, add a new mlx4_ib_hw_stats_ops1, w/o alloc_hw_port_stats if FW does not support MLX4_DEV_CAP_FLAG2_DIAG_PER_PORT. [1] https://bugzilla.redhat.com/show_bug.cgi?id=2014094 [2] https://lore.kernel.org/linux-rdma/CAMGffEn2wvEnmzc0xe=xYiCLqpphiHDBxCxqAELrBofbUAMQxw@mail.gmail.com Fixes: 4b5f4d3fb408 ("RDMA: Split the alloc_hw_stats() ops to port and device variants") Link: https://lore.kernel.org/r/20211115101519.27210-1-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16IB/hfi1: Properly allocate rdma counter desc memoryDennis Dalessandro
When optional counter support was added the allocation of the memory holding the counter descriptors was not cleared properly. This caused WARN_ON()s in the IB/sysfs code to be hit. This is because the uninitialized memory made some of the counters wrongly look like optional counters. Use kzalloc. While here change the sizeof() calls to use the pointer rather than the name of the type. WARNING: CPU: 0 PID: 32644 at drivers/infiniband/core/sysfs.c:1064 ib_setup_port_attrs+0x7e1/0x890 [ib_core] CPU: 0 PID: 32644 Comm: kworker/0:2 Tainted: G S W 5.15.0+ #36 Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016 Workqueue: events work_for_cpu_fn RIP: 0010:ib_setup_port_attrs+0x7e1/0x890 [ib_core] RSP: 0018:ffffc90006ea3c40 EFLAGS: 00010202 RAX: 0000000000000068 RBX: ffff888106ad8000 RCX: 0000000000000138 RDX: ffff888126c84c00 RSI: ffff888103c41000 RDI: 0000000000000124 RBP: ffff88810f63a801 R08: ffff888126c8a000 R09: 0000000000000001 R10: ffffffffa09acf20 R11: 0000000000000065 R12: ffff88810f63a800 R13: ffff88810f63a800 R14: ffff88810f63a8e0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff888667a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005590102cb078 CR3: 000000000240a003 CR4: 00000000001706f0 Call Trace: ib_register_device.cold.44+0x23e/0x2d0 [ib_core] rvt_register_device+0xfa/0x230 [rdmavt] hfi1_register_ib_device+0x623/0x690 [hfi1] init_one.cold.36+0x2d1/0x49b [hfi1] local_pci_probe+0x45/0x80 work_for_cpu_fn+0x16/0x20 process_one_work+0x1b1/0x360 worker_thread+0x1d4/0x3a0 kthread+0x11a/0x140 ret_from_fork+0x22/0x30 Fixes: 5e2ddd1e5982 ("RDMA/counter: Add optional counter support") Link: https://lore.kernel.org/r/20211115200913.124104.47770.stgit@awfm-01.cornelisnetworks.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16RDMA/core: Set send and receive CQ before forwarding to the driverLeon Romanovsky
Preset both receive and send CQ pointers prior to call to the drivers and overwrite it later again till the mlx4 is going to be changed do not overwrite ibqp properties. This change is needed for mlx5, because in case of QP creation failure, it will go to the path of QP destroy which relies on proper CQ pointers. BUG: KASAN: use-after-free in create_qp.cold+0x164/0x16e [mlx5_ib] Write of size 8 at addr ffff8880064c55c0 by task a.out/246 CPU: 0 PID: 246 Comm: a.out Not tainted 5.15.0+ #291 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x83/0xdf create_qp.cold+0x164/0x16e [mlx5_ib] mlx5_ib_create_qp+0x358/0x28a0 [mlx5_ib] create_qp.part.0+0x45b/0x6a0 [ib_core] ib_create_qp_user+0x97/0x150 [ib_core] ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x92c/0x1250 [ib_uverbs] ib_uverbs_cmd_verbs+0x1c38/0x3150 [ib_uverbs] ib_uverbs_ioctl+0x169/0x260 [ib_uverbs] __x64_sys_ioctl+0x866/0x14d0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Allocated by task 246: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0xa4/0xd0 create_qp.part.0+0x92/0x6a0 [ib_core] ib_create_qp_user+0x97/0x150 [ib_core] ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x92c/0x1250 [ib_uverbs] ib_uverbs_cmd_verbs+0x1c38/0x3150 [ib_uverbs] ib_uverbs_ioctl+0x169/0x260 [ib_uverbs] __x64_sys_ioctl+0x866/0x14d0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 246: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0x10c/0x150 slab_free_freelist_hook+0xb4/0x1b0 kfree+0xe7/0x2a0 create_qp.part.0+0x52b/0x6a0 [ib_core] ib_create_qp_user+0x97/0x150 [ib_core] ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x92c/0x1250 [ib_uverbs] ib_uverbs_cmd_verbs+0x1c38/0x3150 [ib_uverbs] ib_uverbs_ioctl+0x169/0x260 [ib_uverbs] __x64_sys_ioctl+0x866/0x14d0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 514aee660df4 ("RDMA: Globally allocate and release QP memory") Link: https://lore.kernel.org/r/2dbb2e2cbb1efb188a500e5634be1d71956424ce.1636631035.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-05Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This consists of the usual driver updates (ufs, smartpqi, lpfc, target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug fixes. Notable core changes are the removal of scsi->tag which caused some churn in obsolete drivers and a sweep through all drivers to call scsi_done() directly instead of scsi->done() which removes a pointer indirection from the hot path and a move to register core sysfs files earlier, which means they're available to KOBJ_ADD processing, which necessitates switching all drivers to using attribute groups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: lpfc: Update lpfc version to 14.0.0.3 scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss scsi: lpfc: Fix link down processing to address NULL pointer dereference scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine scsi: lpfc: Correct sysfs reporting of loop support after SFP status change scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup() scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer scsi: ufs: mediatek: Avoid sched_clock() misuse scsi: mpt3sas: Make mpt3sas_dev_attrs static scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions scsi: target: core: Stop using bdevname() scsi: aha1542: Use memcpy_{from,to}_bvec() scsi: sr: Add error handling support for add_disk() scsi: sd: Add error handling support for add_disk() scsi: target: Perform ALUA group changes in one step scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path scsi: target: Fix alua_tg_pt_gps_count tracking scsi: target: Fix ordered tag handling ...
2021-11-04Merge tag 'char-misc-5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big set of char and misc and other tiny driver subsystem updates for 5.16-rc1. Loads of things in here, all of which have been in linux-next for a while with no reported problems (except for one called out below.) Included are: - habanana labs driver updates, including dma_buf usage, reviewed and acked by the dma_buf maintainers - iio driver update (going through this tree not staging as they really do not belong going through that tree anymore) - counter driver updates - hwmon driver updates that the counter drivers needed, acked by the hwmon maintainer - xillybus driver updates - binder driver updates - extcon driver updates - dma_buf module namespaces added (will cause a build error in arm64 for allmodconfig, but that change is on its way through the drm tree) - lkdtm driver updates - pvpanic driver updates - phy driver updates - virt acrn and nitr_enclaves driver updates - smaller char and misc driver updates" * tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (386 commits) comedi: dt9812: fix DMA buffers on stack comedi: ni_usb6501: fix NULL-deref in command paths arm64: errata: Enable TRBE workaround for write to out-of-range address arm64: errata: Enable workaround for TRBE overwrite in FILL mode coresight: trbe: Work around write to out of range coresight: trbe: Make sure we have enough space coresight: trbe: Add a helper to determine the minimum buffer size coresight: trbe: Workaround TRBE errata overwrite in FILL mode coresight: trbe: Add infrastructure for Errata handling coresight: trbe: Allow driver to choose a different alignment coresight: trbe: Decouple buffer base from the hardware base coresight: trbe: Add a helper to pad a given buffer area coresight: trbe: Add a helper to calculate the trace generated coresight: trbe: Defer the probe on offline CPUs coresight: trbe: Fix incorrect access of the sink specific data coresight: etm4x: Add ETM PID for Kryo-5XX coresight: trbe: Prohibit trace before disabling TRBE coresight: trbe: End the AUX handle on truncation coresight: trbe: Do not truncate buffer on IRQ coresight: trbe: Fix handling of spurious interrupts ...
2021-11-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "A typical collection of patches this cycle, mostly fixing with a few new features: - Fixes from static tools. clang warnings, dead code, unused variable, coccinelle sweeps, etc - Driver bug fixes and minor improvements in rxe, bnxt_re, hfi1, mlx5, irdma, qedr - rtrs ULP bug fixes an improvments - Additional counters for bnxt_re - Support verbs CQ notifications in EFA - Continued reworking and fixing of rxe - netlink control to enable/disable optional device counters - rxe now can use AH objects for its UD path, fixing various bugs in the process - Add DMABUF support to EFA" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (103 commits) RDMA/core: Require the driver to set the IOVA correctly during rereg_mr RDMA/bnxt_re: Remove unsupported bnxt_re_modify_ah callback RDMA/irdma: optimize rx path by removing unnecessary copy RDMA/qed: Use helper function to set GUIDs RDMA/hns: Use the core code to manage the fixed mmap entries IB/opa_vnic: Rebranding of OPA VNIC driver to Cornelis Networks IB/qib: Rebranding of qib driver to Cornelis Networks IB/hfi1: Rebranding of hfi1 driver to Cornelis Networks RDMA/bnxt_re: Use helper function to set GUIDs RDMA/bnxt_re: Fix kernel panic when trying to access bnxt_re_stat_descs RDMA/qedr: Fix NULL deref for query_qp on the GSI QP RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility RDMA/hns: Fix initial arm_st of CQ RDMA/rxe: Make rxe_type_info static const RDMA/rxe: Use 'bitmap_zalloc()' when applicable RDMA/rxe: Save a few bytes from struct rxe_pool RDMA/irdma: Remove the unused variable local_qp RDMA/core: Fix missed initialization of rdma_hw_stats::lock RDMA/efa: Add support for dmabuf memory regions RDMA/umem: Allow pinned dmabuf umem usage ...
2021-11-03RDMA/core: Require the driver to set the IOVA correctly during rereg_mrAharon Landau
If the driver returns a new MR during rereg it has to fill it with the IOVA from the proper source. If IB_MR_REREG_TRANS is set then the IOVA is cmd.hca_va, otherwise the IOVA comes from the old MR. mlx5 for example has two calls inside rereg_mr: return create_real_mr(new_pd, umem, mr->ibmr.iova, new_access_flags); and return create_real_mr(new_pd, new_umem, iova, new_access_flags); Unconditionally overwriting the iova in the newly allocated MR will corrupt the iova if the first path is used. Remove the redundant initializations from ib_uverbs_rereg_mr(). Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr") Link: https://lore.kernel.org/r/4b0a31bbc372842613286a10d7a8cbb0ee6069c7.1635400472.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-03RDMA/bnxt_re: Remove unsupported bnxt_re_modify_ah callbackKamal Heib
There is no need to return always zero for function which is not supported, especially since 0 is the wrong return code. Link: https://lore.kernel.org/r/20211102073054.410838-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01Merge branch 'for-rc' into rdma.git for-nextJason Gunthorpe
Patches held over for a possible rc8. * for-rc: RDMA/qedr: Fix NULL deref for query_qp on the GSI QP RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility RDMA/hns: Fix initial arm_st of CQ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01Merge tag 'v5.15' into rdma.git for-nextJason Gunthorpe
Pull in the accepted for-rc patches as the next merge needs a newer base. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01RDMA/irdma: optimize rx path by removing unnecessary copyZhu Yanjun
In the function irdma_post_recv, the function irdma_copy_sg_list is not needed since the struct irdma_sge and ib_sge have the similar member variables. The struct irdma_sge can be replaced with the struct ib_sge totally. This can increase the rx performance of irdma. Link: https://lore.kernel.org/r/20211030104226.253346-1-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29RDMA/hns: Use the core code to manage the fixed mmap entriesChengchang Tang
Add a new implementation for mmap by using the new mmap entry API. This makes way for further use of the dynamic mmap allocator in this driver. Link: https://lore.kernel.org/r/20211028105640.1056-1-liangwenpeng@huawei.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29IB/opa_vnic: Rebranding of OPA VNIC driver to Cornelis NetworksScott Breyer
Changes instances of Intel to Cornelis in identifying strings Link: https://lore.kernel.org/r/20211028124611.26694.71239.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29IB/qib: Rebranding of qib driver to Cornelis NetworksScott Breyer
Changes instances of Intel to Cornelis in identifying strings Link: https://lore.kernel.org/r/20211028124606.26694.71567.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29IB/hfi1: Rebranding of hfi1 driver to Cornelis NetworksScott Breyer
Changes instances of Intel to Cornelis in identifying strings Link: https://lore.kernel.org/r/20211028124601.26694.35662.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29RDMA/bnxt_re: Use helper function to set GUIDsKamal Heib
Use addrconf_addr_eui48() helper function to set the GUIDs and remove the driver specific version. Link: https://lore.kernel.org/r/20211028094359.160407-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29RDMA/bnxt_re: Fix kernel panic when trying to access bnxt_re_stat_descsKamal Heib
For some reason when introducing the fixed commit the "active_pds" and "active_ahs" descriptors got dropped, which lead to the following panic when trying to access the first entry in the descriptors. bnxt_re: Broadcom NetXtreme-C/E RoCE Driver BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 2 PID: 594 Comm: kworker/u32:1 Not tainted 5.15.0-rc6+ #2 Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.12.1 12/07/2020 Workqueue: bnxt_re bnxt_re_task [bnxt_re] RIP: 0010:strlen+0x0/0x20 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 31 RSP: 0018:ffffb25fc47dfbb0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000008100 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 00000000fffffff4 R09: 0000000000000000 R10: ffff8a05c71fc028 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff8a05c3dee800 FS: 0000000000000000(0000) GS:ffff8a092fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000048d3da001 CR4: 00000000001706e0 Call Trace: kernfs_name_hash+0x12/0x80 kernfs_find_ns+0x35/0xd0 kernfs_remove_by_name_ns+0x32/0x90 remove_files+0x2b/0x60 create_files+0x1d3/0x1f0 internal_create_group+0x17b/0x1f0 internal_create_groups.part.0+0x3d/0xa0 setup_port+0x180/0x3b0 [ib_core] ? __cond_resched+0x16/0x40 ? kmem_cache_alloc_trace+0x278/0x3d0 ib_setup_port_attrs+0x99/0x240 [ib_core] ib_register_device+0xcc/0x160 [ib_core] bnxt_re_task+0xba/0x170 [bnxt_re] process_one_work+0x1eb/0x390 worker_thread+0x53/0x3d0 ? process_one_work+0x390/0x390 kthread+0x10f/0x130 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 Fixes: 13f30b0fa0a9 ("RDMA/counter: Add a descriptor in struct rdma_hw_stats") Link: https://lore.kernel.org/r/20211027205448.127821-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29RDMA/qedr: Fix NULL deref for query_qp on the GSI QPAlok Prasad
This patch fixes a crash caused by querying the QP via netlink, and corrects the state of GSI qp. GSI qp's have a NULL qed_qp. The call trace is generated by: $ rdma res show BUG: kernel NULL pointer dereference, address: 0000000000000034 Hardware name: Dell Inc. PowerEdge R720/0M1GCR, BIOS 1.2.6 05/10/2012 RIP: 0010:qed_rdma_query_qp+0x33/0x1a0 [qed] RSP: 0018:ffffba560a08f580 EFLAGS: 00010206 RAX: 0000000200000000 RBX: ffffba560a08f5b8 RCX: 0000000000000000 RDX: ffffba560a08f5b8 RSI: 0000000000000000 RDI: ffff9807ee458090 RBP: ffffba560a08f5a0 R08: 0000000000000000 R09: ffff9807890e7048 R10: ffffba560a08f658 R11: 0000000000000000 R12: 0000000000000000 R13: ffff9807ee458090 R14: ffff9807f0afb000 R15: ffffba560a08f7ec FS: 00007fbbf8bfe740(0000) GS:ffff980aafa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000034 CR3: 00000001720ba001 CR4: 00000000000606f0 Call Trace: qedr_query_qp+0x82/0x360 [qedr] ib_query_qp+0x34/0x40 [ib_core] ? ib_query_qp+0x34/0x40 [ib_core] fill_res_qp_entry_query.isra.26+0x47/0x1d0 [ib_core] ? __nla_put+0x20/0x30 ? nla_put+0x33/0x40 fill_res_qp_entry+0xe3/0x120 [ib_core] res_get_common_dumpit+0x3f8/0x5d0 [ib_core] ? fill_res_cm_id_entry+0x1f0/0x1f0 [ib_core] nldev_res_get_qp_dumpit+0x1a/0x20 [ib_core] netlink_dump+0x156/0x2f0 __netlink_dump_start+0x1ab/0x260 rdma_nl_rcv+0x1de/0x330 [ib_core] ? nldev_res_get_cm_id_dumpit+0x20/0x20 [ib_core] netlink_unicast+0x1b8/0x270 netlink_sendmsg+0x33e/0x470 sock_sendmsg+0x63/0x70 __sys_sendto+0x13f/0x180 ? setup_sgl.isra.12+0x70/0xc0 __x64_sys_sendto+0x28/0x30 do_syscall_64+0x3a/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: stable@vger.kernel.org Fixes: cecbcddf6461 ("qedr: Add support for QP verbs") Link: https://lore.kernel.org/r/20211027184329.18454-1-palok@marvell.com Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibilityYixing Liu
The upper limit of MAX_LP_MSG_LEN on HIP08 is 64K, and the upper limit on HIP09 is 16K. Regardless of whether it is HIP08 or HIP09, only 16K will be used. In order to ensure compatibility, it is unified to 16K. Setting MAX_LP_MSG_LEN to 16K will not cause performance loss on HIP08. Fixes: fbed9d2be292 ("RDMA/hns: Fix configuration of ack_req_freq in QPC") Link: https://lore.kernel.org/r/20211029100537.27299-1-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29RDMA/hns: Fix initial arm_st of CQHaoyue Xu
We set the init CQ status to ARMED before. As a result, an unexpected CEQE would be reported. Therefore, the init CQ status should be set to no_armed rather than REG_NXT_CEQE. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Link: https://lore.kernel.org/r/20211029095846.26732-1-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/net/sock.h 7b50ecfcc6cd ("net: Rename ->stream_memory_read to ->sock_is_readable") 4c1e34c0dbff ("vsock: Enable y2038 safe timeval for timeout") drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table") e77bcdd1f639 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.") Adjacent code addition in both cases, keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28RDMA/rxe: Make rxe_type_info static constJoe Perches
Make struct rxe_type_info static const and local to the only uses. Moves a bit of data to text. $ size drivers/infiniband/sw/rxe/rxe_pool.o* (defconfig w/ infiniband swe) text data bss dec hex filename 4456 12 0 4468 1174 drivers/infiniband/sw/rxe/rxe_pool.o.new 3817 652 0 4469 1175 drivers/infiniband/sw/rxe/rxe_pool.o.old Link: https://lore.kernel.org/r/166b715d71f98336e8ecab72b0dbdd266eee9193.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-28RDMA/rxe: Use 'bitmap_zalloc()' when applicableChristophe JAILLET
'index.table' is a bitmap. So use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid some open-coded arithmetic in allocator arguments. Using 'bitmap_zalloc()' also allows the removal of a now useless 'bitmap_zero()'. Also change the corresponding 'kfree()' into 'bitmap_free()' to keep consistency. Link: https://lore.kernel.org/r/4a3e11d45865678d570333d1962820eb13168848.1635093628.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-28RDMA/rxe: Save a few bytes from struct rxe_poolChristophe JAILLET
'table_size' is never read, it can be removed. In fact, the only place that uses something that could be 'table_size' is 'alloc_index()'. In this function, it is re-computed from 'min_index' and 'max_index'. Link: https://lore.kernel.org/r/2c42065049bb2b99bededdc423a9babf4a98adee.1635093628.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-28RDMA/irdma: Remove the unused variable local_qpZhu Yanjun
Since the member variable local_qp is not used, remove it. Link: https://lore.kernel.org/r/20211027175457.201822-1-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-28RDMA/core: Fix missed initialization of rdma_hw_stats::lockMark Zhang
alloc_and_bind() creates a new rdma_hw_stats structure but misses initializing the mutex lock. This causes debug kernel failures: DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 4 PID: 64464 at kernel/locking/mutex.c:575 __mutex_lock+0x9c3/0x12b0 Call Trace: fill_res_counter_entry+0x6ee/0x1020 [ib_core] res_get_common_dumpit+0x907/0x10a0 [ib_core] nldev_stat_get_dumpit+0x20a/0x290 [ib_core] netlink_dump+0x451/0x1040 __netlink_dump_start+0x583/0x830 rdma_nl_rcv_msg+0x3f3/0x7c0 [ib_core] rdma_nl_rcv+0x264/0x410 [ib_core] netlink_unicast+0x433/0x700 netlink_sendmsg+0x707/0xbf0 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x193/0x240 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Instead of requiring all users to open code initialization of the lock put it in the general rdma_alloc_hw_stats_struct() function and remove duplicates. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/4a22986c4685058d2c735d91703ee7d865815bb9.1635237668.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-28RDMA/efa: Add support for dmabuf memory regionsGal Pressman
Implement a dmabuf importer for the EFA driver. As ODP is not supported, the pinned dmabuf are used to prevent the move_notify callback from being called. Link: https://lore.kernel.org/r/20211012120903.96933-4-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-28RDMA/umem: Allow pinned dmabuf umem usageGal Pressman
Introduce ib_umem_dmabuf_get_pinned() which allows the driver to get a dmabuf umem which is pinned and does not require move_notify callback implementation. The returned umem is pinned and DMA mapped like standard cpu umems, and is released through ib_umem_release() (incl. unpinning and unmapping). Link: https://lore.kernel.org/r/20211012120903.96933-3-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-27Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>