summaryrefslogtreecommitdiff
path: root/drivers/infiniband/core/device.c
AgeCommit message (Collapse)Author
2025-05-06RDMA/core: Fix "KASAN: slab-use-after-free Read in ib_register_device" problemZhu Yanjun
Call Trace: __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xc3/0x670 mm/kasan/report.c:521 kasan_report+0xe0/0x110 mm/kasan/report.c:634 strlen+0x93/0xa0 lib/string.c:420 __fortify_strlen include/linux/fortify-string.h:268 [inline] get_kobj_path_length lib/kobject.c:118 [inline] kobject_get_path+0x3f/0x2a0 lib/kobject.c:158 kobject_uevent_env+0x289/0x1870 lib/kobject_uevent.c:545 ib_register_device drivers/infiniband/core/device.c:1472 [inline] ib_register_device+0x8cf/0xe00 drivers/infiniband/core/device.c:1393 rxe_register_device+0x275/0x320 drivers/infiniband/sw/rxe/rxe_verbs.c:1552 rxe_net_add+0x8e/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:550 rxe_newlink+0x70/0x190 drivers/infiniband/sw/rxe/rxe.c:225 nldev_newlink+0x3a3/0x680 drivers/infiniband/core/nldev.c:1796 rdma_nl_rcv_msg+0x387/0x6e0 drivers/infiniband/core/netlink.c:195 rdma_nl_rcv_skb.constprop.0.isra.0+0x2e5/0x450 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] netlink_unicast+0x53a/0x7f0 net/netlink/af_netlink.c:1339 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg net/socket.c:727 [inline] ____sys_sendmsg+0xa95/0xc70 net/socket.c:2566 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620 __sys_sendmsg+0x16d/0x220 net/socket.c:2652 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f This problem is similar to the problem that the commit 1d6a9e7449e2 ("RDMA/core: Fix use-after-free when rename device name") fixes. The root cause is: the function ib_device_rename() renames the name with lock. But in the function kobject_uevent(), this name is accessed without lock protection at the same time. The solution is to add the lock protection when this name is accessed in the function kobject_uevent(). Fixes: 779e0bf47632 ("RDMA/core: Do not indicate device ready when device enablement fails") Link: https://patch.msgid.link/r/20250506151008.75701-1-yanjun.zhu@linux.dev Reported-by: syzbot+e2ce9e275ecc70a30b72@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e2ce9e275ecc70a30b72 Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-18RDMA/core: Create and destroy rdma_counter using rdma_zalloc_drv_obj()Patrisious Haddad
Change rdma_counter allocation to use rdma_zalloc_drv_obj() instead of, explicitly allocating at core, in order to be contained inside driver specific structures. Adjust all drivers that use it to have their containing structure, and add driver specific initialization operation. This change is needed to allow upcoming patches to implement optional-counters binding whereas inside each driver specific counter struct his bound optional-counters will be maintained. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/a5a484f421fc2e5595158e61a354fba43272b02d.1741875070.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-13RDMA/core: Fix use-after-free when rename device nameWang Liang
Syzbot reported a slab-use-after-free with the following call trace: ================================================================== BUG: KASAN: slab-use-after-free in nla_put+0xd3/0x150 lib/nlattr.c:1099 Read of size 5 at addr ffff888140ea1c60 by task syz.0.988/10025 CPU: 0 UID: 0 PID: 10025 Comm: syz.0.988 Not tainted 6.14.0-rc4-syzkaller-00859-gf77f12010f67 #0 Hardware name: Google Compute Engine, BIOS Google 02/12/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0x16e/0x5b0 mm/kasan/report.c:521 kasan_report+0x143/0x180 mm/kasan/report.c:634 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105 nla_put+0xd3/0x150 lib/nlattr.c:1099 nla_put_string include/net/netlink.h:1621 [inline] fill_nldev_handle+0x16e/0x200 drivers/infiniband/core/nldev.c:265 rdma_nl_notify_event+0x561/0xef0 drivers/infiniband/core/nldev.c:2857 ib_device_notify_register+0x22/0x230 drivers/infiniband/core/device.c:1344 ib_register_device+0x1292/0x1460 drivers/infiniband/core/device.c:1460 rxe_register_device+0x233/0x350 drivers/infiniband/sw/rxe/rxe_verbs.c:1540 rxe_net_add+0x74/0xf0 drivers/infiniband/sw/rxe/rxe_net.c:550 rxe_newlink+0xde/0x1a0 drivers/infiniband/sw/rxe/rxe.c:212 nldev_newlink+0x5ea/0x680 drivers/infiniband/core/nldev.c:1795 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0x6dd/0x9e0 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1339 netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1883 sock_sendmsg_nosec net/socket.c:709 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:724 ____sys_sendmsg+0x53a/0x860 net/socket.c:2564 ___sys_sendmsg net/socket.c:2618 [inline] __sys_sendmsg+0x269/0x350 net/socket.c:2650 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f42d1b8d169 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 ... RSP: 002b:00007f42d2960038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f42d1da6320 RCX: 00007f42d1b8d169 RDX: 0000000000000000 RSI: 00004000000002c0 RDI: 000000000000000c RBP: 00007f42d1c0e2a0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f42d1da6320 R15: 00007ffe399344a8 </TASK> Allocated by task 10025: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __do_kmalloc_node mm/slub.c:4294 [inline] __kmalloc_node_track_caller_noprof+0x28b/0x4c0 mm/slub.c:4313 __kmemdup_nul mm/util.c:61 [inline] kstrdup+0x42/0x100 mm/util.c:81 kobject_set_name_vargs+0x61/0x120 lib/kobject.c:274 dev_set_name+0xd5/0x120 drivers/base/core.c:3468 assign_name drivers/infiniband/core/device.c:1202 [inline] ib_register_device+0x178/0x1460 drivers/infiniband/core/device.c:1384 rxe_register_device+0x233/0x350 drivers/infiniband/sw/rxe/rxe_verbs.c:1540 rxe_net_add+0x74/0xf0 drivers/infiniband/sw/rxe/rxe_net.c:550 rxe_newlink+0xde/0x1a0 drivers/infiniband/sw/rxe/rxe.c:212 nldev_newlink+0x5ea/0x680 drivers/infiniband/core/nldev.c:1795 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0x6dd/0x9e0 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1339 netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1883 sock_sendmsg_nosec net/socket.c:709 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:724 ____sys_sendmsg+0x53a/0x860 net/socket.c:2564 ___sys_sendmsg net/socket.c:2618 [inline] __sys_sendmsg+0x269/0x350 net/socket.c:2650 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 10035: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:576 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2353 [inline] slab_free mm/slub.c:4609 [inline] kfree+0x196/0x430 mm/slub.c:4757 kobject_rename+0x38f/0x410 lib/kobject.c:524 device_rename+0x16a/0x200 drivers/base/core.c:4525 ib_device_rename+0x270/0x710 drivers/infiniband/core/device.c:402 nldev_set_doit+0x30e/0x4c0 drivers/infiniband/core/nldev.c:1146 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0x6dd/0x9e0 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1339 netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1883 sock_sendmsg_nosec net/socket.c:709 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:724 ____sys_sendmsg+0x53a/0x860 net/socket.c:2564 ___sys_sendmsg net/socket.c:2618 [inline] __sys_sendmsg+0x269/0x350 net/socket.c:2650 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f This is because if rename device happens, the old name is freed in ib_device_rename() with lock, but ib_device_notify_register() may visit the dev name locklessly by event RDMA_REGISTER_EVENT or RDMA_NETDEV_ATTACH_EVENT. Fix this by hold devices_rwsem in ib_device_notify_register(). Reported-by: syzbot+f60349ba1f9f08df349f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=25bc6f0ed2b88b9eb9b8 Fixes: 9cbed5aab5ae ("RDMA/nldev: Add support for RDMA monitoring") Signed-off-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20250313092421.944658-1-wangliang74@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-03RDMA/core: Don't expose hw_counters outside of init net namespaceRoman Gushchin
Commit 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes") accidentally almost exposed hw counters to non-init net namespaces. It didn't expose them fully, as an attempt to read any of those counters leads to a crash like this one: [42021.807566] BUG: kernel NULL pointer dereference, address: 0000000000000028 [42021.814463] #PF: supervisor read access in kernel mode [42021.819549] #PF: error_code(0x0000) - not-present page [42021.824636] PGD 0 P4D 0 [42021.827145] Oops: 0000 [#1] SMP PTI [42021.830598] CPU: 82 PID: 2843922 Comm: switchto-defaul Kdump: loaded Tainted: G S W I XXX [42021.841697] Hardware name: XXX [42021.849619] RIP: 0010:hw_stat_device_show+0x1e/0x40 [ib_core] [42021.855362] Code: 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 49 89 d0 4c 8b 5e 20 48 8b 8f b8 04 00 00 48 81 c7 f0 fa ff ff <48> 8b 41 28 48 29 ce 48 83 c6 d0 48 c1 ee 04 69 d6 ab aa aa aa 48 [42021.873931] RSP: 0018:ffff97fe90f03da0 EFLAGS: 00010287 [42021.879108] RAX: ffff9406988a8c60 RBX: ffff940e1072d438 RCX: 0000000000000000 [42021.886169] RDX: ffff94085f1aa000 RSI: ffff93c6cbbdbcb0 RDI: ffff940c7517aef0 [42021.893230] RBP: ffff97fe90f03e70 R08: ffff94085f1aa000 R09: 0000000000000000 [42021.900294] R10: ffff94085f1aa000 R11: ffffffffc0775680 R12: ffffffff87ca2530 [42021.907355] R13: ffff940651602840 R14: ffff93c6cbbdbcb0 R15: ffff94085f1aa000 [42021.914418] FS: 00007fda1a3b9700(0000) GS:ffff94453fb80000(0000) knlGS:0000000000000000 [42021.922423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [42021.928130] CR2: 0000000000000028 CR3: 00000042dcfb8003 CR4: 00000000003726f0 [42021.935194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [42021.942257] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [42021.949324] Call Trace: [42021.951756] <TASK> [42021.953842] [<ffffffff86c58674>] ? show_regs+0x64/0x70 [42021.959030] [<ffffffff86c58468>] ? __die+0x78/0xc0 [42021.963874] [<ffffffff86c9ef75>] ? page_fault_oops+0x2b5/0x3b0 [42021.969749] [<ffffffff87674b92>] ? exc_page_fault+0x1a2/0x3c0 [42021.975549] [<ffffffff87801326>] ? asm_exc_page_fault+0x26/0x30 [42021.981517] [<ffffffffc0775680>] ? __pfx_show_hw_stats+0x10/0x10 [ib_core] [42021.988482] [<ffffffffc077564e>] ? hw_stat_device_show+0x1e/0x40 [ib_core] [42021.995438] [<ffffffff86ac7f8e>] dev_attr_show+0x1e/0x50 [42022.000803] [<ffffffff86a3eeb1>] sysfs_kf_seq_show+0x81/0xe0 [42022.006508] [<ffffffff86a11134>] seq_read_iter+0xf4/0x410 [42022.011954] [<ffffffff869f4b2e>] vfs_read+0x16e/0x2f0 [42022.017058] [<ffffffff869f50ee>] ksys_read+0x6e/0xe0 [42022.022073] [<ffffffff8766f1ca>] do_syscall_64+0x6a/0xa0 [42022.027441] [<ffffffff8780013b>] entry_SYSCALL_64_after_hwframe+0x78/0xe2 The problem can be reproduced using the following steps: ip netns add foo ip netns exec foo bash cat /sys/class/infiniband/mlx4_0/hw_counters/* The panic occurs because of casting the device pointer into an ib_device pointer using container_of() in hw_stat_device_show() is wrong and leads to a memory corruption. However the real problem is that hw counters should never been exposed outside of the non-init net namespace. Fix this by saving the index of the corresponding attribute group (it might be 1 or 2 depending on the presence of driver-specific attributes) and zeroing the pointer to hw_counters group for compat devices during the initialization. With this fix applied hw_counters are not available in a non-init net namespace: find /sys/class/infiniband/mlx4_0/ -name hw_counters /sys/class/infiniband/mlx4_0/ports/1/hw_counters /sys/class/infiniband/mlx4_0/ports/2/hw_counters /sys/class/infiniband/mlx4_0/hw_counters ip netns add foo ip netns exec foo bash find /sys/class/infiniband/mlx4_0/ -name hw_counters Fixes: 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes") Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Maher Sanalla <msanalla@nvidia.com> Cc: linux-rdma@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://patch.msgid.link/20250227165420.3430301-1-roman.gushchin@linux.dev Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-24RDMA/core: Support link status events dispatchingYuyu Li
Currently the dispatching of link status events is implemented by each RDMA driver independently, and most of them have very similar patterns. Add support for this in ib_core so that we can get rid of duplicate codes in each driver. A new last_port_state is added in ib_port_cache to cache the port state of the last link status events dispatching. The original port_state in ib_port_cache is not used here because it will be updated when ib_dispatch_event() is called, which means it may be changed between two link status events, and may lead to a loss of event dispatching. Some drivers currently have some private stuff in their link status events handler in addition to event dispatching, and cannot be perfectly integrated into the ib_core handling process. For these drivers, add a new ops report_port_event() so that they can keep their current processing. Finally, events of LAG devices are not supported yet in this patch as currently there is no way to obtain ibdev from upper netdev in ib_core. This can be a TODO work after the core have more support for LAG. Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-24RDMA/core: Add ib_query_netdev_port() to query netdev port by IB device.Yuyu Li
Query the port number of a netdev associated with an ibdev. Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-24RDMA/core: Remove unused ibdev_printkDr. David Alan Gilbert
The last use of ibdev_printk() was removed in 2019 by commit b2299e83815c ("RDMA: Delete DEBUG code") Remove it. Note: The __ibdev_printk() is still used in the idev_err etc functions so leave that. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20241221014021.343979-4-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/nldev: Add IB device and net device rename eventsChiara Meiohas
Implement event sending for IB device rename and IB device port associated netdevice rename. In iproute2, rdma monitor displays the IB device name, port and the netdevice name when displaying event info. Since users can modiy these names, we track and notify on renaming events. Note: In order to receive netdevice rename events, drivers must use the ib_device_set_netdev() API when attaching net devices to IB devices. $ rdma monitor $ rmmod mlx5_ib [UNREGISTER] dev 1 rocep8s0f1 [UNREGISTER] dev 0 rocep8s0f0 $ modprobe mlx5_ib [REGISTER] dev 2 mlx5_0 [NETDEV_ATTACH] dev 2 mlx5_0 port 1 netdev 4 eth2 [REGISTER] dev 3 mlx5_1 [NETDEV_ATTACH] dev 3 mlx5_1 port 1 netdev 5 eth3 [RENAME] dev 2 rocep8s0f0 [RENAME] dev 3 rocep8s0f1 $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev [UNREGISTER] dev 2 rocep8s0f0 [REGISTER] dev 4 mlx5_0 [NETDEV_ATTACH] dev 4 mlx5_0 port 30 netdev 4 eth2 [RENAME] dev 4 rdmap8s0f0 $ echo 4 > /sys/class/net/eth2/device/sriov_numvfs [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 2 netdev 7 eth4 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 3 netdev 8 eth5 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 4 netdev 9 eth6 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 5 netdev 10 eth7 [REGISTER] dev 5 mlx5_0 [NETDEV_ATTACH] dev 5 mlx5_0 port 1 netdev 11 eth8 [REGISTER] dev 6 mlx5_1 [NETDEV_ATTACH] dev 6 mlx5_1 port 1 netdev 12 eth9 [RENAME] dev 5 rocep8s0f0v0 [RENAME] dev 6 rocep8s0f0v1 [REGISTER] dev 7 mlx5_0 [NETDEV_ATTACH] dev 7 mlx5_0 port 1 netdev 13 eth10 [RENAME] dev 7 rocep8s0f0v2 [REGISTER] dev 8 mlx5_0 [NETDEV_ATTACH] dev 8 mlx5_0 port 1 netdev 14 eth11 [RENAME] dev 8 rocep8s0f0v3 $ ip link set eth2 name myeth2 [NETDEV_RENAME] netdev 4 myeth2 $ ip link set eth1 name myeth1 ** no events received, because eth1 is not attached to an IB device ** Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/093c978ef2766fd3ab4ff8798eeb68f2f11582f6.1730367038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Add device ufile cleanup operationPatrisious Haddad
Add a driver operation to allow preemptive cleanup of ufile HW resources before the standard ufile cleanup flow begins. Thus, expediting the final cleanup phase which leads to fast teardown overall. This allows the use of driver specific clean up procedures to make the cleanup process more efficient. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/cabe00d75132b5732cb515944e3c500a01fb0b4a.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/nldev: Add support for RDMA monitoringChiara Meiohas
Introduce a new netlink command to allow rdma event monitoring. The rdma events supported now are IB device registration/unregistration and net device attachment/detachment. Example output of rdma monitor and the commands which trigger the events: $ rdma monitor $ rmmod mlx5_ib [UNREGISTER] dev 1 rocep8s0f1 [UNREGISTER] dev 0 rocep8s0f0 $ modprobe mlx5_ib [REGISTER] dev 2 mlx5_0 [NETDEV_ATTACH] dev 2 mlx5_0 port 1 netdev 4 eth2 [REGISTER] dev 3 mlx5_1 [NETDEV_ATTACH] dev 3 mlx5_1 port 1 netdev 5 eth3 $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev [UNREGISTER] dev 2 rocep8s0f0 [REGISTER] dev 4 mlx5_0 [NETDEV_ATTACH] dev 4 mlx5_0 port 30 netdev 4 eth2 $ echo 4 > /sys/class/net/eth2/device/sriov_numvfs [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 2 netdev 7 eth4 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 3 netdev 8 eth5 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 4 netdev 9 eth6 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 5 netdev 10 eth7 [REGISTER] dev 5 mlx5_0 [NETDEV_ATTACH] dev 5 mlx5_0 port 1 netdev 11 eth8 [REGISTER] dev 6 mlx5_0 [NETDEV_ATTACH] dev 6 mlx5_0 port 1 netdev 12 eth9 [REGISTER] dev 7 mlx5_0 [NETDEV_ATTACH] dev 7 mlx5_0 port 1 netdev 13 eth10 [REGISTER] dev 8 mlx5_0 [NETDEV_ATTACH] dev 8 mlx5_0 port 1 netdev 14 eth11 $ echo 0 > /sys/class/net/eth2/device/sriov_numvfs [UNREGISTER] dev 5 rocep8s0f0v0 [UNREGISTER] dev 6 rocep8s0f0v1 [UNREGISTER] dev 7 rocep8s0f0v2 [UNREGISTER] dev 8 rocep8s0f0v3 [NETDEV_DETACH] dev 4 rdmap8s0f0 port 2 [NETDEV_DETACH] dev 4 rdmap8s0f0 port 3 [NETDEV_DETACH] dev 4 rdmap8s0f0 port 4 [NETDEV_DETACH] dev 4 rdmap8s0f0 port 5 Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909173025.30422-7-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/mlx5: Use IB set_netdev and get_netdev functionsChiara Meiohas
The IB layer provides a common interface to store and get net devices associated to an IB device port (ib_device_set_netdev() and ib_device_get_netdev()). Previously, mlx5_ib stored and managed the associated net devices internally. Replace internal net device management in mlx5_ib with ib_device_set_netdev() when attaching/detaching a net device and ib_device_get_netdev() when retrieving the net device. Export ib_device_get_netdev(). For mlx5 representors/PFs/VFs and lag creation we replace the netdev assignments with the IB set/get netdev functions. In active-backup mode lag the active slave net device is stored in the lag itself. To assure the net device stored in a lag bond IB device is the active slave we implement the following: - mlx5_core: when modifying the slave of a bond we send the internal driver event MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE. - mlx5_ib: when catching the event call ib_device_set_netdev() This patch also ensures the correct IB events are sent in switchdev lag. While at it, when in multiport eswitch mode, only a single IB device is created for all ports. The said IB device will receive all netdev events of its VFs once loaded, thus to avoid overwriting the mapping of PF IB device to PF netdev, ignore NETDEV_REGISTER events if the ib device has already been mapped to a netdev. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909173025.30422-6-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/device: Remove optimization in ib_device_get_netdev()Chiara Meiohas
The caller of ib_device_get_netdev() relies on its result to accurately match a given netdev with the ib device associated netdev. ib_device_get_netdev returns NULL when the IB device associated netdev is unregistering, preventing the caller of matching netdevs properly. Thus, remove this optimization and return the netdev even if it is undergoing unregistration, allowing matching by the caller. This change ensures proper netdev matching and reference count handling by the caller of ib_device_get_netdev/ib_device_set_netdev API. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909173025.30422-5-michaelgur@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-07-14RDMA: Fix netdev tracker in ib_device_set_netdevDavid Ahern
If a netdev has already been assigned, ib_device_set_netdev needs to release the reference on the older netdev but it is mistakenly being called for the new netdev. Fix it and in the process use netdev_put to be symmetrical with the netdev_hold. Fixes: 09f530f0c6d6 ("RDMA: Add netdevice_tracker to ib_device_set_netdev()") Signed-off-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240710203310.19317-1-dsahern@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-07-01RDMA/core: Support IB sub device with type "SMI"Mark Zhang
This patch adds 2 APIs, as well as driver operations to support adding and deleting an IB sub device, which provides part of functionalities of it's parent. A sub device has a type; for a sub device with type "SMI", it provides the smi capability through umad for its parent, meaning uverb is not supported. A sub device cannot live without a parent. So when a parent is released, all it's sub devices are released as well. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/44253f7508b21eb2caefea3980c2bc072869116c.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/device: Return error earlier if port in not validLeon Romanovsky
There is no need to allocate port data if port provided is not valid. Fixes: c2261dd76b54 ("RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev") Link: https://lore.kernel.org/r/022047a8b16988fc88d4426da50bf60a4833311b.1719235449.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-05-05RDMA/core: Remove NULL check before dev_{put, hold}Jules Irenge
Coccinelle reports a warning WARNING: NULL check before dev_{put, hold} functions is not needed The reason is the call netdev_{put, hold} of dev_{put,hold} will check NULL There is no need to check before using dev_{put, hold} Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Link: https://lore.kernel.org/r/ZjF1Eedxwhn4JSkz@octinomon.home Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-02-21RDMA/device: Fix a race between mad_client and cm_client initShifeng Li
The mad_client will be initialized in enable_device_and_get(), while the devices_rwsem will be downgraded to a read semaphore. There is a window that leads to the failed initialization for cm_client, since it can not get matched mad port from ib_mad_port_list, and the matched mad port will be added to the list after that. mad_client | cm_client ------------------|-------------------------------------------------------- ib_register_device| enable_device_and_get down_write(&devices_rwsem) xa_set_mark(&devices, DEVICE_REGISTERED) downgrade_write(&devices_rwsem) | |ib_cm_init |ib_register_client(&cm_client) |down_read(&devices_rwsem) |xa_for_each_marked (&devices, DEVICE_REGISTERED) |add_client_context |cm_add_one |ib_register_mad_agent |ib_get_mad_port |__ib_get_mad_port |list_for_each_entry(entry, &ib_mad_port_list, port_list) |return NULL |up_read(&devices_rwsem) | add_client_context| ib_mad_init_device| ib_mad_port_open | list_add_tail(&port_priv->port_list, &ib_mad_port_list) up_read(&devices_rwsem) | Fix it by using down_write(&devices_rwsem) in ib_register_client(). Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration") Link: https://lore.kernel.org/r/20240203035313.98991-1-lishifeng@sangfor.com.cn Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-09-20RDMA/core: Add support to dump SRQ resource in RAW formatwenglianfa
Add support to dump SRQ resource in raw format. It enable drivers to return the entire device specific SRQ context without setting each field separately. Example: $ rdma res show srq -r dev hns3 149000... $ rdma res show srq -j -r [{"ifindex":0,"ifname":"hns3","data":[149,0,0,...]}] Signed-off-by: wenglianfa <wenglianfa@huawei.com> Link: https://lore.kernel.org/r/20230918131110.3987498-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-20RDMA/core: Add dedicated SRQ resource tracker functionwenglianfa
Add a dedicated callback function for SRQ resource tracker. Signed-off-by: wenglianfa <wenglianfa@huawei.com> Link: https://lore.kernel.org/r/20230918131110.3987498-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-19RDMA/core: Use size_{add,sub,mul}() in calls to struct_size()Gustavo A. R. Silva
If, for any reason, the open-coded arithmetic causes a wraparound, the protection that `struct_size()` provides against potential integer overflows is defeated. Fix this by hardening calls to `struct_size()` with `size_add()`, `size_sub()` and `size_mul()`. Fixes: 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes") Fixes: a4676388e2e2 ("RDMA/core: Simplify how the gid_attrs sysfs is created") Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()") Fixes: 324e227ea7c9 ("RDMA/device: Add ib_device_get_by_netdev()") Fixes: 5aad26a7eac5 ("IB/core: Use struct_size() in kzalloc()") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZQdt4NsJFwwOYxUR@work Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-16Merge tag 'driver-core-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.2-rc1. The "big" change in here is the addition of a new macro, container_of_const() that will preserve the "const-ness" of a pointer passed into it. The "problem" of the current container_of() macro is that if you pass in a "const *", out of it can comes a non-const pointer unless you specifically ask for it. For many usages, we want to preserve the "const" attribute by using the same call. For a specific example, this series changes the kobj_to_dev() macro to use it, allowing it to be used no matter what the const value is. This prevents every subsystem from having to declare 2 different individual macros (i.e. kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce the const value at build time, which having 2 macros would not do either. The driver for all of this have been discussions with the Rust kernel developers as to how to properly mark driver core, and kobject, objects as being "non-mutable". The changes to the kobject and driver core in this pull request are the result of that, as there are lots of paths where kobjects and device pointers are not modified at all, so marking them as "const" allows the compiler to enforce this. So, a nice side affect of the Rust development effort has been already to clean up the driver core code to be more obvious about object rules. All of this has been bike-shedded in quite a lot of detail on lkml with different names and implementations resulting in the tiny version we have in here, much better than my original proposal. Lots of subsystem maintainers have acked the changes as well. Other than this change, included in here are smaller stuff like: - kernfs fixes and updates to handle lock contention better - vmlinux.lds.h fixes and updates - sysfs and debugfs documentation updates - device property updates All of these have been in the linux-next tree for quite a while with no problems" * tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits) device property: Fix documentation for fwnode_get_next_parent() firmware_loader: fix up to_fw_sysfs() to preserve const usb.h: take advantage of container_of_const() device.h: move kobj_to_dev() to use container_of_const() container_of: add container_of_const() that preserves const-ness of the pointer driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion. driver core: fix up missed scsi/cxlflash class.devnode() conversion. driver core: fix up some missing class.devnode() conversions. driver core: make struct class.devnode() take a const * driver core: make struct class.dev_uevent() take a const * cacheinfo: Remove of_node_put() for fw_token device property: Add a blank line in Kconfig of tests device property: Rename goto label to be more precise device property: Move PROPERTY_ENTRY_BOOL() a bit down device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*() kernfs: fix all kernel-doc warnings and multiple typos driver core: pass a const * into of_device_uevent() kobject: kset_uevent_ops: make name() callback take a const * kobject: kset_uevent_ops: make filter() callback take a const * kobject: make kobject_namespace take a const * ...
2022-12-11RDMA: Add missed netdev_put() for the netdevice_trackerJason Gunthorpe
The netdev core will detect if any untracked puts are done on tracked pointers and throw refcount warnings: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 1 PID: 33 at lib/refcount.c:31 refcount_warn_saturate+0x1d7/0x1f0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 33 Comm: kworker/u4:2 Not tainted 6.1.0-rc8-next-20221207-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: ib-unreg-wq ib_unregister_work RIP: 0010:refcount_warn_saturate+0x1d7/0x1f0 lib/refcount.c:31 Code: 05 5a 60 51 0a 01 e8 35 0a b5 05 0f 0b e9 d3 fe ff ff e8 6c 9b 75 fd 48 c7 c7 c0 6d a6 8a c6 05 37 60 51 0a 01 e8 16 0a b5 05 <0f> 0b e9 b4 fe +ff ff 48 89 ef e8 5a b5 c3 fd e9 5c fe ff ff 0f 1f RSP: 0018:ffffc90000aa7b30 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8880172f9d40 RSI: ffffffff8166b1dc RDI: fffff52000154f58 RBP: ffff88807906c600 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000080000001 R11: 0000000000000000 R12: 1ffff92000154f6b R13: 0000000000000000 R14: ffff88807906c600 R15: ffff888046894000 FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe350a8ff8 CR3: 000000007a9e7000 CR4: 00000000003526e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] ref_tracker_free+0x539/0x6b0 lib/ref_tracker.c:118 netdev_tracker_free include/linux/netdevice.h:4039 [inline] netdev_put include/linux/netdevice.h:4056 [inline] dev_put include/linux/netdevice.h:4082 [inline] free_netdevs+0x1f8/0x470 drivers/infiniband/core/device.c:2204 __ib_unregister_device+0xa0/0x1a0 drivers/infiniband/core/device.c:1478 ib_unregister_work+0x19/0x30 drivers/infiniband/core/device.c:1586 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 So change the missed dev_put for pdata->netdev to also follow the tracker. Fixes: 09f530f0c6d6 ("RDMA: Add netdevice_tracker to ib_device_set_netdev()") Reported-by: syzbot+3fd8326d9a0812d19218@syzkaller.appspotmail.com Reported-by: syzbot+a1ed8ffe3121380cd5dd@syzkaller.appspotmail.com Reported-by: syzbot+8d0a099c8a6d1e4e601c@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-e99919867b8d+1e2-netdev_tracker2_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-09Merge tag 'v6.1-rc8' into rdma.git for-nextJason Gunthorpe
For dependencies in following patches Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-28RDMA: Add netdevice_tracker to ib_device_set_netdev()Jason Gunthorpe
This will cause an informative backtrace to print if the user of ib_device_set_netdev() isn't careful about tearing down the ibdevice before its the netdevice parent is destroyed. Such as like this: unregister_netdevice: waiting for vlan0 to become free. Usage count = 2 leaked reference. ib_device_set_netdev+0x266/0x730 siw_newlink+0x4e0/0xfd0 nldev_newlink+0x35c/0x5c0 rdma_nl_rcv_msg+0x36d/0x690 rdma_nl_rcv+0x2ee/0x430 netlink_unicast+0x543/0x7f0 netlink_sendmsg+0x918/0xe20 sock_sendmsg+0xcf/0x120 ____sys_sendmsg+0x70d/0x8b0 ___sys_sendmsg+0x11d/0x1b0 __sys_sendmsg+0xfa/0x1d0 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd This will help debug the issues syzkaller is seeing. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-a7c81b3842ce+e5-netdev_tracker_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-24driver core: make struct class.dev_uevent() take a const *Greg Kroah-Hartman
The dev_uevent() in struct class should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Cc: Jens Axboe <axboe@kernel.dk> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Russ Weight <russell.h.weight@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Johan Hovold <johan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Keith Busch <kbusch@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: Raed Salem <raeds@nvidia.com> Cc: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Avihai Horon <avihaih@nvidia.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jakob Koschel <jakobkoschel@gmail.com> Cc: Antoine Tenart <atenart@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Wang Yufen <wangyufen@huawei.com> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-pm@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Sebastian Reichel <sre@kernel.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20221123122523.1332370-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-21Merge 6.1-rc6 into driver-core-nextGreg Kroah-Hartman
We need the kernfs changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-09driver core: class: make namespace and get_ownership take const *Greg Kroah-Hartman
The callbacks in struct class namespace() and get_ownership() do not modify the struct device passed to them, so mark the pointer as constant and fix up all callbacks in the kernel to have the correct function signature. This helps make it more obvious what calls and callbacks do, and do not, modify structures passed to them. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20221001165426.2690912-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-28RDMA/core: Fix null-ptr-deref in ib_core_cleanup()Chen Zhongjin
KASAN reported a null-ptr-deref error: KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] CPU: 1 PID: 379 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:destroy_workqueue+0x2f/0x740 RSP: 0018:ffff888016137df8 EFLAGS: 00000202 ... Call Trace: ib_core_cleanup+0xa/0xa1 [ib_core] __do_sys_delete_module.constprop.0+0x34f/0x5b0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fa1a0d221b7 ... It is because the fail of roce_gid_mgmt_init() is ignored: ib_core_init() roce_gid_mgmt_init() gid_cache_wq = alloc_ordered_workqueue # fail ... ib_core_cleanup() roce_gid_mgmt_cleanup() destroy_workqueue(gid_cache_wq) # destroy an unallocated wq Fix this by catching the fail of roce_gid_mgmt_init() in ib_core_init(). Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Link: https://lore.kernel.org/r/20221025024146.109137-1-chenzhongjin@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-27RDMA/core: Fix order of nldev_exit callLeon Romanovsky
Create symmetrical exit flow by calling to nldev_exit() after call to rdma_nl_unregister(RDMA_NL_LS). Fixes: 6c80b41abe22 ("RDMA/netlink: Add nldev initialization flows") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/64e676774a53a406f4cde265d5a4cfd6b8e97df9.1666683334.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21IB: move from strlcpy with unused retval to strscpyWolfram Sang
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Link: https://lore.kernel.org/r/20220818210018.6841-1-wsa+renesas@sang-engineering.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-04-25RDMA/core: Avoid flush_workqueue(system_unbound_wq) usageTetsuo Handa
Flushing system-wide workqueues is dangerous and will be forbidden. Replace system_unbound_wq with local ib_unreg_wq. Link: https://lore.kernel.org/r/252cefb0-a400-83f6-2032-333d69f52c1b@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-09RDMA/core: Delete IPsec flow action logic from the coreLeon Romanovsky
The removal of mlx5 flow steering logic, left the kernel without any RDMA drivers that implements flow action callbacks supplied by RDMA/core. Any user access to them caused to EOPNOTSUPP error, which can be achieved by simply removing ioctl implementation. Link: https://lore.kernel.org/r/a638e376314a2eb1c66f597c0bbeeab2e5de7faf.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-12-14RDMA/core: Let ib_find_gid() continue search even after empty entryAvihai Horon
Currently, ib_find_gid() will stop searching after encountering the first empty GID table entry. This behavior is wrong since neither IB nor RoCE spec enforce tightly packed GID tables. For example, when a valid GID entry exists at index N, and if a GID entry is empty at index N-1, ib_find_gid() will fail to find the valid entry. Fix it by making ib_find_gid() continue searching even after encountering missing entries. Fixes: 5eb620c81ce3 ("IB/core: Add helpers for uncached GID and P_Key searches") Link: https://lore.kernel.org/r/e55d331b96cecfc2cf19803d16e7109ea966882d.1639055490.git.leonro@nvidia.com Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12RDMA/counter: Add optional counter supportAharon Landau
An optional counter is a driver-specific counter that may be dynamically enabled/disabled. This enhancement allows drivers to expose counters which are, for example, mutually exclusive and cannot be enabled at the same time, counters that might degrades performance, optional debug counters, etc. Optional counters are marked with IB_STAT_FLAG_OPTIONAL flag. They are not exported in sysfs, and must be at the end of all stats, otherwise the attr->show() in sysfs would get wrong indexes for hwcounters that are behind optional counters. Link: https://lore.kernel.org/r/20211008122439.166063-7-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA: Globally allocate and release QP memoryLeon Romanovsky
Convert QP object to follow IB/core general allocation scheme. That change allows us to make sure that restrack properly kref the memory. Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com Reviewed-by: Gal Pressman <galpress@amazon.com> #efa Tested-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16IB/core: Read subnet_prefix in ib_query_port via cache.Anand Khoje
ib_query_port() calls device->ops.query_port() to get the port attributes. The method of querying is device driver specific. The same function calls device->ops.query_gid() to get the GID and extract the subnet_prefix (gid_prefix). The GID and subnet_prefix are stored in a cache. But they do not get read from the cache if the device is an Infiniband device. The following change takes advantage of the cached subnet_prefix. Testing with RDBMS has shown a significant improvement in performance with this change. Link: https://lore.kernel.org/r/20210712122625.1147-4-anand.a.khoje@oracle.com Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com> Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16IB/core: Shifting initialization of device->cache_lockAnand Khoje
The lock cache_lock of struct ib_device is initialized in function ib_cache_setup_one(). This is much later than the device initialization in _ib_alloc_device(). This change shifts initialization of cache_lock in _ib_alloc_device(). Link: https://lore.kernel.org/r/20210712122625.1147-3-anand.a.khoje@oracle.com Suggested-by: Haakon Bugge <haakon.bugge@oracle.com> Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21IB/core: Removed port validity check from ib_get_cached_subnet_prefixAnand Khoje
Removed port validity check from ib_get_cached_subnet_prefix() as this check is not needed because "port_num" is valid. Link: https://lore.kernel.org/r/20210616154509.1047-2-anand.a.khoje@oracle.com Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com> Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA: Remove rdma_set_device_sysfs_group()Jason Gunthorpe
The driver's device group can be specified as part of the ops structure like the device's port group. No need for the complicated API. Link: https://lore.kernel.org/r/8964785a34fd3a29ff5b6693493f575b717e594d.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Allow port_groups to be used with namespacesJason Gunthorpe
Now that the port_groups data is being destroyed and managed by the core code this restriction is no longer needed. All the ib_port_attrs are compatible with the core's sysfs lifecycle. When the main device is destroyed and moved to another namespace the driver's port sysfs can be created/destroyed as well due to it now being a simple attribute list. Link: https://lore.kernel.org/r/afd8b676eace2821692d44489ff71856277c48d1.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA: Change ops->init_port to ops->port_groupsJason Gunthorpe
init_port was only being used to register sysfs attributes against the port kobject. Now that all users are creating static attribute_group's we can simply set the attribute_group list in the ops and the core code can just handle it directly. This makes all the sysfs management quite straightforward and prevents any driver from abusing the naked port kobject in future because no driver code can access it. Link: https://lore.kernel.org/r/114f68f3d921460eafe14cea5a80ca65d81729c3.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Create the device hw_counters through the normal groups mechanismJason Gunthorpe
Instead of calling device_add_groups() add the group to the existing groups array which is managed through device_add(). This requires setting up the hw_counters before device_add(), so it gets split up from the already split port sysfs flow. Move all the memory freeing to the release function. Link: https://lore.kernel.org/r/666250d937b64f6fdf45da9e2dc0b6e5e4f7abd8.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA: Split the alloc_hw_stats() ops to port and device variantsJason Gunthorpe
This is being used to implement both the port and device global stats, which is causing some confusion in the drivers. For instance EFA and i40iw both seem to be misusing the device stats. Split it into two ops so drivers that don't support one or the other can leave the op NULL'd, making the calling code a little simpler to understand. Link: https://lore.kernel.org/r/1955c154197b2a159adc2dc97266ddc74afe420c.1623427137.git.leonro@nvidia.com Tested-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26RDMA: Support more than 255 rdma portsMark Bloch
Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic. This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed. With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue. When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported. The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that relies in verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other ULPs aren't effected by this change and their sysfs/interfaces that are exposes to userspace can remain unchanged. While here cleanup some alignment issues and remove unneeded sanity checks (mainly in rdmavt), Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11RDMA/core: Remove unused req_ncomp_notif device operationGal Pressman
The request_ncomp_notif device operation and function are unused, remove them. Link: https://lore.kernel.org/r/20210311150921.23726-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/core: Fix kernel doc warnings for ib_port_immutable_read()Leon Romanovsky
drivers/infiniband/core/device.c:859: warning: Function parameter or member 'dev' not described in 'ib_port_immutable_read' drivers/infiniband/core/device.c:859: warning: Function parameter or member 'port' not described in 'ib_port_immutable_read' Fixes: 7416790e2245 ("RDMA/core: Introduce and use API to read port immutable data") Link: https://lore.kernel.org/r/20210210151421.1108809-1-leon@kernel.org Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05RDMA/core: Introduce and use API to read port immutable dataParav Pandit
Currently mlx5 driver caches port GID table length for 2 ports. It is also cached by IB core as port immutable data. When mlx5 representor ports are present, which are usually more than 2, invalid access to port_caps array can happen while validating the GID table length which is only for 2 ports. To avoid this, take help of the IB cores port immutable data by exposing an API to read the port immutable fields. Remove mlx5 driver's internal cache, thereby reduce code and data. Link: https://lore.kernel.org/r/20210203130133.4057329-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-20RDMA/core: Add device method for registering dma-buf based memory regionJianxin Xiong
Dma-buf based memory region requires one extra parameter and is processed quite differently. Adding a separate method allows clean separation from regular memory regions. Link: https://lore.kernel.org/r/1608067636-98073-3-git-send-email-jianxin.xiong@intel.com Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Acked-by: Christian Koenig <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19RDMA/core: device: Fix formatting in worthy kernel-doc header and demote anotherLee Jones
Fixes the following W=1 kernel build warning(s): drivers/infiniband/core/device.c:1896: warning: Function parameter or member 'ibdev' not described in 'ib_get_client_nl_info' drivers/infiniband/core/device.c:1896: warning: Function parameter or member 'client_name' not described in 'ib_get_client_nl_info' drivers/infiniband/core/device.c:1896: warning: Function parameter or member 'res' not described in 'ib_get_client_nl_info' drivers/infiniband/core/device.c:2328: warning: Function parameter or member 'nldev_cb' not described in 'ib_enum_all_devs' drivers/infiniband/core/device.c:2328: warning: Function parameter or member 'skb' not described in 'ib_enum_all_devs' Link: https://lore.kernel.org/r/20210118223929.512175-3-lee.jones@linaro.org Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-10RDMA/core: Do not indicate device ready when device enablement failsJack Morgenstein
In procedure ib_register_device, procedure kobject_uevent is called (advertising that the device is ready for userspace usage) even when device_enable_and_get() returned an error. As a result, various RDMA modules attempted to register for the device even while the device driver was preparing to unregister the device. Fix this by advertising the device availability only after enabling the device succeeds. Fixes: e7a5b4aafd82 ("RDMA/device: Don't fire uevent before device is fully initialized") Link: https://lore.kernel.org/r/20201208073545.9723-3-leon@kernel.org Suggested-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>