summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-12RDMA/hns: Fix build error of hns_roce_traceJunxian Huang
Add include path to find hns_roce_trace.h to fix the following build error: In file included from drivers/infiniband/hw/hns/hns_roce_trace.h:213, from drivers/infiniband/hw/hns/hns_roce_hw_v2.c:53: ./include/trace/define_trace.h:110:42: fatal error: ./hns_roce_trace.h: No such file or directory 110 | #include TRACE_INCLUDE(TRACE_INCLUDE_FILE) | ^ compilation terminated. Fixes: 02007e3ddc07 ("RDMA/hns: Add trace for flush CQE") Reported-by: Paul E. McKenney <paulmck@kernel.org> Closes: https://lore.kernel.org/linux-next/b7dd4dda-37d8-47e4-8d78-b6585be21cfd@paulmck-laptop/T/#t Tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250507033903.2879433-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-06RDMA/siw: Remove unused siw_mem_addDr. David Alan Gilbert
siw_mem_add() was added in 2019 by commit 2251334dcac9 ("rdma/siw: application buffer management") but has remained unused. Remove it. Link: https://patch.msgid.link/r/20250505210226.88994-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-05-06IB/hfi1: Remove unused sc_drop and sdma_all_idleDr. David Alan Gilbert
sc_drop() and sdma_all_idle() were both added in 2015's commit 7724105686e7 ("IB/hfi1: add driver files") but have remained unused. Remove them. Link: https://patch.msgid.link/r/20250505205419.88131-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-05-05RDMA/mlx5: Fix error flow upon firmware failure for RQ destructionPatrisious Haddad
Upon RQ destruction if the firmware command fails which is the last resource to be destroyed some SW resources were already cleaned regardless of the failure. Now properly rollback the object to its original state upon such failure. In order to avoid a use-after free in case someone tries to destroy the object again, which results in the following kernel trace: refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 37589 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148 Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) rfkill mlx5_core(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) psample mlxfw(OE) mlx_compat(OE) macsec tls pci_hyperv_intf sunrpc vfat fat virtio_net net_failover failover fuse loop nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_console virtio_gpu virtio_blk virtio_dma_buf virtio_mmio dm_mirror dm_region_hash dm_log dm_mod xpmem(OE) CPU: 0 UID: 0 PID: 37589 Comm: python3 Kdump: loaded Tainted: G OE ------- --- 6.12.0-54.el10.aarch64 #1 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : refcount_warn_saturate+0xf4/0x148 lr : refcount_warn_saturate+0xf4/0x148 sp : ffff80008b81b7e0 x29: ffff80008b81b7e0 x28: ffff000133d51600 x27: 0000000000000001 x26: 0000000000000000 x25: 00000000ffffffea x24: ffff00010ae80f00 x23: ffff00010ae80f80 x22: ffff0000c66e5d08 x21: 0000000000000000 x20: ffff0000c66e0000 x19: ffff00010ae80340 x18: 0000000000000006 x17: 0000000000000000 x16: 0000000000000020 x15: ffff80008b81b37f x14: 0000000000000000 x13: 2e656572662d7265 x12: ffff80008283ef78 x11: ffff80008257efd0 x10: ffff80008283efd0 x9 : ffff80008021ed90 x8 : 0000000000000001 x7 : 00000000000bffe8 x6 : c0000000ffff7fff x5 : ffff0001fb8e3408 x4 : 0000000000000000 x3 : ffff800179993000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000133d51600 Call trace: refcount_warn_saturate+0xf4/0x148 mlx5_core_put_rsc+0x88/0xa0 [mlx5_ib] mlx5_core_destroy_rq_tracked+0x64/0x98 [mlx5_ib] mlx5_ib_destroy_wq+0x34/0x80 [mlx5_ib] ib_destroy_wq_user+0x30/0xc0 [ib_core] uverbs_free_wq+0x28/0x58 [ib_uverbs] destroy_hw_idr_uobject+0x34/0x78 [ib_uverbs] uverbs_destroy_uobject+0x48/0x240 [ib_uverbs] __uverbs_cleanup_ufile+0xd4/0x1a8 [ib_uverbs] uverbs_destroy_ufile_hw+0x48/0x120 [ib_uverbs] ib_uverbs_close+0x2c/0x100 [ib_uverbs] __fput+0xd8/0x2f0 __fput_sync+0x50/0x70 __arm64_sys_close+0x40/0x90 invoke_syscall.constprop.0+0x74/0xd0 do_el0_svc+0x48/0xe8 el0_svc+0x44/0x1d0 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x1a4/0x1a8 Fixes: e2013b212f9f ("net/mlx5_core: Add RQ and SQ event handling") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/3181433ccdd695c63560eeeb3f0c990961732101.1745839855.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-05IB/cm: Drop lockdep assert and WARN when freeing old msgVlad Dumitrescu
The send completion handler can run after cm_id has advanced to another message. The cm_id lock is not needed in this case, but a recent change re-used cm_free_priv_msg(), which asserts that the lock is held and WARNs if the cm_id's currently outstanding msg is different than the one being freed. Fixes: 1e5159219076 ("IB/cm: Do not hold reference on cm_id unless needed") Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Reviewed-by: Sean Hefty <shefty@nvidia.com> Link: https://patch.msgid.link/0c364c29142f72b7875fdeba51f3c9bd6ca863ee.1745839788.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-27IB/hfi1: Adjust fd->entry_to_rb allocation typeKees Cook
In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "struct tid_rb_node **", but the return type will be "struct rb_node **". These are the same allocation size (pointer size), but the types do not match. Adjust the allocation type to match the assignment. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250426061247.work.261-kees@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-27IB/mthca: Adjust buddy->bits allocation typeKees Cook
In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "unsigned long **", but the returned type will be "long **". These are the same allocation size (pointer size), but the types do not match. Adjust the allocation type to match the assignment. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250426061208.work.000-kees@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-21RDMA/hns: Add trace for CMDQ dumpingJunxian Huang
Add trace for CMDQ dumping. Output example: $ cat /sys/kernel/debug/tracing/trace tracer: nop entries-in-buffer/entries-written: 2/2 #P:128 _-----=> irqs-off/BH-disabled / _----=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / _-=> migrate-disable |||| / delay TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | kworker/u512:1-14003 [089] b..1. 50737.238304: hns_cmdq_req: 0000:bd:00.0 cmdq opcode:0x8500, flag:0x1, retval:0x0, data:{0x2,0x0,0x0,0xffff0000,0x32323232,0x0} kworker/u512:1-14003 [089] b..1. 50737.238316: hns_cmdq_resp: 0000:bd:00.0 cmdq opcode:0x8500, flag:0x2, retval:0x0, data:{0x2,0x0,0x0,0xffff0000,0x32323232,0x0} Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250421132750.1363348-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-21RDMA/hns: Include hnae3.h in hns_roce_hw_v2.hJunxian Huang
hns_roce_hw_v2.h has a direct dependency on hnae3.h due to the inline function hns_roce_write64(), but it doesn't include this header currently. This leads to that files including hns_roce_hw_v2.h must also include hnae3.h to avoid compilation errors, even if they themselves don't really rely on hnae3.h. This doesn't make sense, hns_roce_hw_v2.h should include hnae3.h directly. Fixes: d3743fa94ccd ("RDMA/hns: Fix the chip hanging caused by sending doorbell during reset") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250421132750.1363348-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-21RDMA/hns: Add trace for MR/MTR attribute dumpingJunxian Huang
Add trace for MR/MTR attribute dumping. Output example: $ cat /sys/kernel/debug/tracing/trace tracer: nop entries-in-buffer/entries-written: 2/2 #P:128 _-----=> irqs-off/BH-disabled / _----=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / _-=> migrate-disable |||| / delay TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | ib_send_bw-14751 [111] ..... 8763.823038: hns_buf_attr: rg cnt:1, pg_sft:0xc, mtt_only:no, rg 0 (sz:131072, hop:2), rg 1 (sz:0, hop:0), rg 2 (sz:0, hop:0) ib_send_bw-14751 [111] ..... 8763.823118: hns_mr: iova:0xffffb2968000, size:131072, key:512, pd:1, pbl_hop:1, npages:4, type:0, status:0 Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250421132750.1363348-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-21RDMA/hns: Add trace for AEQE dumpingJunxian Huang
Add trace for AEQE dumping. Output example: $ cat /sys/kernel/debug/tracing/trace tracer: nop entries-in-buffer/entries-written: 2/2 #P:128 _-----=> irqs-off/BH-disabled / _----=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / _-=> migrate-disable |||| / delay TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | <idle>-0 [120] d.h1. 7995.835587: hns_ae_info: event 19 aeqe: {0x80006013,0x0,0x0,0x10d2c,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0} Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250421132750.1363348-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-21RDMA/hns: Add trace for WQE dumpingJunxian Huang
Add trace for WQE dumping, including SQ, RQ and SRQ. Output example: $ cat /sys/kernel/debug/tracing/trace tracer: nop entries-in-buffer/entries-written: 2/2 #P:128 _-----=> irqs-off/BH-disabled / _----=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / _-=> migrate-disable |||| / delay TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | roce_test_main-22730 [074] d..1. 16133.898282: hns_sq_wqe: SQ 0xc wqe (0x0/0xffff0820a6076060): {0x180,0x639c,0x0,0x1000000,0x0,0x0,0x0,0x0, 0x639c,0x300,0xf7e38000,0x0,0x0,0x0,0x0,0x0} Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250421132750.1363348-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-21RDMA/hns: Add trace for flush CQEJunxian Huang
Add trace to print the producer index of QP when triggering flush CQE. Output example: $ cat /sys/kernel/debug/tracing/trace tracer: nop entries-in-buffer/entries-written: 2/2 #P:128 _-----=> irqs-off/BH-disabled / _----=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / _-=> migrate-disable |||| / delay TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | ib_send_bw-11474 [075] d..1. 2393.434738: hns_sq_flush_cqe: SQ 0x2 flush head 0xb5c7. ib_send_bw-11474 [075] d..1. 2393.434739: hns_rq_flush_cqe: RQ 0x2 flush head 0. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250421132750.1363348-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-21RDMA/core: Move ODP capability definitions to uapiDaisuke Matsuda
The bits are used from both kernel space and userland, so they should be placed in UAPI. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://patch.msgid.link/20250418051345.1022339-2-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-21RDMA/rxe: Remove 32-bit architecture supportDaisuke Matsuda
Major linux distibutions have phased out support for 32-bit machines. Since rxe is primarily used for development and testing, the benefit of maintaining 32-bit support is minimal. This change simplifies ATOMIC WRITE implementations and improves maintainability of the driver. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://patch.msgid.link/20250421025101.3588139-1-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-20RDMA/rxe: Remove unused rxe_run_taskDr. David Alan Gilbert
rxe_run_task() has been unused since 2024's commit 23bc06af547f ("RDMA/rxe: Don't call direct between tasks") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250419132725.199785-1-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-20RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup" bugZhu Yanjun
Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 assign_lock_key kernel/locking/lockdep.c:986 [inline] register_lock_class+0x4a3/0x4c0 kernel/locking/lockdep.c:1300 __lock_acquire+0x99/0x1ba0 kernel/locking/lockdep.c:5110 lock_acquire kernel/locking/lockdep.c:5866 [inline] lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5823 __timer_delete_sync+0x152/0x1b0 kernel/time/timer.c:1644 rxe_qp_do_cleanup+0x5c3/0x7e0 drivers/infiniband/sw/rxe/rxe_qp.c:815 execute_in_process_context+0x3a/0x160 kernel/workqueue.c:4596 __rxe_cleanup+0x267/0x3c0 drivers/infiniband/sw/rxe/rxe_pool.c:232 rxe_create_qp+0x3f7/0x5f0 drivers/infiniband/sw/rxe/rxe_verbs.c:604 create_qp+0x62d/0xa80 drivers/infiniband/core/verbs.c:1250 ib_create_qp_kernel+0x9f/0x310 drivers/infiniband/core/verbs.c:1361 ib_create_qp include/rdma/ib_verbs.h:3803 [inline] rdma_create_qp+0x10c/0x340 drivers/infiniband/core/cma.c:1144 rds_ib_setup_qp+0xc86/0x19a0 net/rds/ib_cm.c:600 rds_ib_cm_initiate_connect+0x1e8/0x3d0 net/rds/ib_cm.c:944 rds_rdma_cm_event_handler_cmn+0x61f/0x8c0 net/rds/rdma_transport.c:109 cma_cm_event_handler+0x94/0x300 drivers/infiniband/core/cma.c:2184 cma_work_handler+0x15b/0x230 drivers/infiniband/core/cma.c:3042 process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3319 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400 kthread+0x3c2/0x780 kernel/kthread.c:464 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> The root cause is as below: In the function rxe_create_qp, the function rxe_qp_from_init is called to create qp, if this function rxe_qp_from_init fails, rxe_cleanup will be called to handle all the allocated resources, including the timers: retrans_timer and rnr_nak_timer. The function rxe_qp_from_init calls the function rxe_qp_init_req to initialize the timers: retrans_timer and rnr_nak_timer. But these timers are initialized in the end of rxe_qp_init_req. If some errors occur before the initialization of these timers, this problem will occur. The solution is to check whether these timers are initialized or not. If these timers are not initialized, ignore these timers. Fixes: 8700e3e7c485 ("Soft RoCE driver") Reported-by: syzbot+4edb496c3cad6e953a31@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4edb496c3cad6e953a31 Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250419080741.1515231-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-20RDMA/cma: Remove unused rdma_res_to_idDr. David Alan Gilbert
The last use of rdma_res_to_id() was removed in 2020 by commi t211cd9459fda ("RDMA: Add dedicated CM_ID resource tracker function") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250418165848.241305-1-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-20RDMA/mana_ib: Add support of 4M, 1G, and 2G pagesKonstantin Taranov
Check PF capability flag whether the 4M, 1G, and 2G pages are supported. Add these pages sizes to mana_ib, if supported. Define possible page sizes in enum gdma_page_type and remove unused enum atb_page_size. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1744621234-26114-4-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-20RDMA/mana_ib: support of the zero based MRsKonstantin Taranov
Add IB_ZERO_BASED to the valid flags and use the corresponding MR creation request for the zero based memory. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1744621234-26114-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-20RDMA/mana_ib: Access remote atomic for MRsKonstantin Taranov
Add IB_ACCESS_REMOTE_ATOMIC to the valid flags for MRs and use the corresponding flag bit during MR creation in the HW. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1744621234-26114-2-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-11RDMA/hns: initialize db in update_srq_db()Chen Linxuan
On x86_64 with gcc version 13.3.0, I compile drivers/infiniband/hw/hns/hns_roce_hw_v2.c with: make defconfig ./scripts/kconfig/merge_config.sh .config <( echo CONFIG_COMPILE_TEST=y echo CONFIG_HNS3=m echo CONFIG_INFINIBAND=m echo CONFIG_INFINIBAND_HNS_HIP08=m ) make KCFLAGS="-fno-inline-small-functions -fno-inline-functions-called-once" \ drivers/infiniband/hw/hns/hns_roce_hw_v2.o Then I get a compile error: CALL scripts/checksyscalls.sh DESCEND objtool INSTALL libsubcmd_headers CC [M] drivers/infiniband/hw/hns/hns_roce_hw_v2.o In file included from drivers/infiniband/hw/hns/hns_roce_hw_v2.c:47: drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'update_srq_db': drivers/infiniband/hw/hns/hns_roce_common.h:74:17: error: 'db' is used uninitialized [-Werror=uninitialized] 74 | *((__le32 *)_ptr + (field_h) / 32) &= \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_common.h:90:17: note: in expansion of macro '_hr_reg_clear' 90 | _hr_reg_clear(ptr, field_type, field_h, field_l); \ | ^~~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write' 95 | #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val) | ^~~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_hw_v2.c:948:9: note: in expansion of macro 'hr_reg_write' 948 | hr_reg_write(&db, DB_TAG, srq->srqn); | ^~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_hw_v2.c:946:31: note: 'db' declared here 946 | struct hns_roce_v2_db db; | ^~ cc1: all warnings being treated as errors Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com> Co-developed-by: Winston Wen <wentao@uniontech.com> Signed-off-by: Winston Wen <wentao@uniontech.com> Link: https://patch.msgid.link/FF922C77946229B6+20250411105459.90782-5-chenlinxuan@uniontech.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-11RDMA/rxe: Fix mismatched type declarationsDaisuke Matsuda
Some functions return int values while they are defined as enum resp_states variables. This patch resolves the mismatches in rxe. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://patch.msgid.link/20250409102701.1275265-1-matsuda-daisuke@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-09RDMA: Don't use %pK through printkThomas Weißschuh
In the past %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping looks in atomic contexts. Switch to the regular pointer formatting which is safer and easier to reason about. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250407-restricted-pointers-infiniband-v1-1-22b20504b84d@linutronix.de Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-09RDMA/rxe: Enable ODP in ATOMIC WRITE operationDaisuke Matsuda
Add rxe_odp_do_atomic_write() so that ODP specific steps are applied to ATOMIC WRITE requests. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://patch.msgid.link/20250324075649.3313968-3-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-08RDMA/rxe: Enable ODP in RDMA FLUSH operationDaisuke Matsuda
For persistent memories, add rxe_odp_flush_pmem_iova() so that ODP specific steps are executed. Otherwise, no additional consideration is required. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://patch.msgid.link/20250324075649.3313968-2-matsuda-daisuke@fujitsu.com Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-07IB/cm: use rwlock for MAD agent lockJacob Moroni
In workloads where there are many processes establishing connections using RDMA CM in parallel (large scale MPI), there can be heavy contention for mad_agent_lock in cm_alloc_msg. This contention can occur while inside of a spin_lock_irq region, leading to interrupts being disabled for extended durations on many cores. Furthermore, it leads to the serialization of rdma_create_ah calls, which has negative performance impacts for NICs which are capable of processing multiple address handle creations in parallel. The end result is the machine becoming unresponsive, hung task warnings, netdev TX timeouts, etc. Since the lock appears to be only for protection from cm_remove_one, it can be changed to a rwlock to resolve these issues. Reproducer: Server: for i in $(seq 1 512); do ucmatose -c 32 -p $((i + 5000)) & done Client: for i in $(seq 1 512); do ucmatose -c 32 -p $((i + 5000)) -s 10.2.0.52 & done Fixes: 76039ac9095f ("IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock") Link: https://patch.msgid.link/r/20250220175612.2763122-1-jmoroni@google.com Signed-off-by: Jacob Moroni <jmoroni@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-07RDMA/hns: Remove unused parametersChengchang Tang
Remove unused parameters. Link: https://patch.msgid.link/r/20250327114724.3454268-2-huangjunxian6@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-07IB/hfi1: Avoid -Wflex-array-member-not-at-end warningGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Remove unused flex-array member `class_data` from `struct opa_mad_notice_attr`. Fix the following warning: drivers/infiniband/hw/hfi1/mad.c:23:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://patch.msgid.link/r/Z-wiYkll8Vo3ME3P@kspp Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-07RDMA/core: Convert to use ERR_CAST()Li Haoran
As opposed to open-code, using the ERR_CAST macro clearly indicates that this is a pointer to an error value and a type conversion was performed. Link: https://patch.msgid.link/r/20250401211015750qxOfU9XZ8QgKizM1Lcyq2@zte.com.cn Signed-off-by: Li Haoran <li.haoran7@zte.com.cn> Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-07RDMA/uverbs: Convert to use ERR_CAST()Li Haoran
As opposed to open-code, using the ERR_CAST macro clearly indicates that this is a pointer to an error value and a type conversion was performed. Link: https://patch.msgid.link/r/202504012109233981_YPVbd4wQzmAzP3tA5IG@zte.com.cn Signed-off-by: Li Haoran <li.haoran7@zte.com.cn> Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-07RDMA/core: Convert to use ERR_CAST()Li Haoran
As opposed to open-code, using the ERR_CAST macro clearly indicates that this is a pointer to an error value and a type conversion was performed. Link: https://patch.msgid.link/r/20250401210840146_IyrV3zlejzz3eAnDmMSB@zte.com.cn Signed-off-by: Li Haoran <li.haoran7@zte.com.cn> Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-07RDMA: Replace msecs_to_jiffies with secs_to_jiffies for timeoutPeng Jiang
In drivers/infiniband/hw/mlx4/mcg.c and drivers/infiniband/hw/mlx5/mr.c, `msecs_to_jiffies` is used to convert milliseconds to jiffies. For constant milliseconds, using `msecs_to_jiffies` introduces additional computational overhead. For example, it is unnecessary to check if m > jiffies_to_msecs(MAX_JIFFY_OFFSET) or (int)m < 0 for constants, while using `secs_to_jiffies` can avoid these extra calculations. Link: https://patch.msgid.link/r/20250326221955611qu6Ix3Pt5WgKvhL6sTySX@zte.com.cn Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn> Signed-off-by: Ye Xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-07RDMA/mlx5: convert timeouts to secs_to_jiffies()Easwar Hariharan
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication. This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies(E * 1000) +secs_to_jiffies(E) -msecs_to_jiffies(E * MSEC_PER_SEC) +secs_to_jiffies(E) Link: https://patch.msgid.link/r/20250219-rdma-secs-to-jiffies-v1-2-b506746561a9@linux.microsoft.com Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-06Linux 6.15-rc1v6.15-rc1Linus Torvalds
2025-04-06tools/include: make uapi/linux/types.h usable from assemblyThomas Weißschuh
The "real" linux/types.h UAPI header gracefully degrades to a NOOP when included from assembly code. Mirror this behaviour in the tools/ variant. Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the toolchain automatically. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/ Fixes: c9fbaa879508 ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06Merge tag 'turbostat-2025.05.06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - support up to 8192 processors - add cpuidle governor debug telemetry, disabled by default - update default output to exclude cpuidle invocation counts - bug fixes * tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: v2025.05.06 tools/power turbostat: disable "cpuidle" invocation counters, by default tools/power turbostat: re-factor sysfs code tools/power turbostat: Restore GFX sysfs fflush() call tools/power turbostat: Document GNR UncMHz domain convention tools/power turbostat: report CoreThr per measurement interval tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192 tools/power turbostat: Add idle governor statistics reporting tools/power turbostat: Fix names matching tools/power turbostat: Allow Zero return value for some RAPL registers tools/power turbostat: Clustered Uncore MHz counters should honor show/hide options
2025-04-06Merge tag 'soundwire-6.15-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire fix from Vinod Koul: - add missing config symbol CONFIG_SND_HDA_EXT_CORE required for asoc driver CONFIG_SND_SOF_SOF_HDA_SDW_BPT * tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE
2025-04-06tools/power turbostat: v2025.05.06Len Brown
Support up to 8192 processors Add cpuidle governor debug telemetry, disabled by default Update default output to exclude cpuidle invocation counts Bug fixes Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06tools/power turbostat: disable "cpuidle" invocation counters, by defaultLen Brown
Create "pct_idle" counter group, the sofware notion of residency so it can now be singled out, independent of other counter groups. Create "cpuidle" group, the cpuidle invocation counts. Disable "cpuidle", by default. Create "swidle" = "cpuidle" + "pct_idle". Undocument "sysfs", the old name for "swidle", but keep it working for backwards compatibilty. Create "hwidle", all the HW idle counters Modify "idle", enabled by default "idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle") Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06Merge tag 'perf-urgent-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: "Fix a perf events time accounting bug" * tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix child_total_time_enabled accounting bug at task exit
2025-04-06Merge tag 'sched-urgent-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix a nonsensical Kconfig combination - Remove an unnecessary rseq-notification * tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Eliminate useless task_work on execve sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
2025-04-06Disable SLUB_TINY for build testingLinus Torvalds
... and don't error out so hard on missing module descriptions. Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") we used to warn about missing module descriptions, but only when building with extra warnigns (ie 'W=1'). After that commit the warning became an unconditional hard error. And it turns out not all modules have been converted despite the claims to the contrary. As reported by Damian Tometzki, the slub KUnit test didn't have a module description, and apparently nobody ever really noticed. The reason nobody noticed seems to be that the slub KUnit tests get disabled by SLUB_TINY, which also ends up disabling a lot of other code, both in tests and in slub itself. And so anybody doing full build tests didn't actually see this failre. So let's disable SLUB_TINY for build-only tests, since it clearly ends up limiting build coverage. Also turn the missing module descriptions error back into a warning, but let's keep it around for non-'W=1' builds. Reported-by: Damian Tometzki <damian@riscv-rocks.de> Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/ Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06tools/power turbostat: re-factor sysfs codeLen Brown
Probe cpuidle "sysfs" residency and counts separately, since soon we will make one disabled on, and the other disabled off. Clarify that some BIC (build-in-counters) are actually "groups". since we're about to re-name some of those groups. no functional change. Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06tools/power turbostat: Restore GFX sysfs fflush() callZhang Rui
Do fflush() to discard the buffered data, before each read of the graphics sysfs knobs. Fixes: ba99a4fc8c24 ("tools/power turbostat: Remove unnecessary fflush() call") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06tools/power turbostat: Document GNR UncMHz domain conventionLen Brown
Document that on Intel Granite Rapids Systems, Uncore domains 0-2 are CPU domains, and uncore domains 3-4 are IO domains. Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06tools/power turbostat: report CoreThr per measurement intervalLen Brown
The CoreThr column displays total thermal throttling events since boot time. Change it to report events during the measurement interval. This is more useful for showing a user the current conditions. Total events since boot time are still available to the user via /sys/devices/system/cpu/cpu*/thermal_throttle/* Document CoreThr on turbostat.8 Fixes: eae97e053fe30 ("turbostat: Support thermal throttle count print") Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Chen Yu <yu.c.chen@intel.com>
2025-04-06tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192Justin Ernst
On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output: "turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151" A similar error appears with the use of turbostat --cpu when the inputted cpu range contains a cpu number >= 1024: # turbostat -c 1100-1151 "--cpu 1100-1151" malformed ... Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS. It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low. For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS brings support for parsing cpu numbers >= 1024. Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64. Signed-off-by: Justin Ernst <justin.ernst@hpe.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06Merge tag 'timers-cleanups-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A set of final cleanups for the timer subsystem: - Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). - The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are catched. Doing this right before rc1 ensures that new code which is merged post rc1 is not introducing new instances of the original functionality" * tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/timers: Rename the hrtimer_init event to hrtimer_setup hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack() hrtimers: Rename debug_init() to debug_setup() hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns() hrtimers: Make callback function pointer private hrtimers: Merge __hrtimer_init() into __hrtimer_setup() hrtimers: Switch to use __htimer_setup() hrtimers: Delete hrtimer_init() treewide: Convert new and leftover hrtimer_init() users treewide: Switch/rename to timer_delete[_sync]()
2025-04-06Merge tag 'irq-urgent-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more irq updates from Thomas Gleixner: "A set of updates for the interrupt subsystem: - A treewide cleanup for the irq_domain code, which makes the naming consistent and gets rid of the original oddity of naming domains 'host'. This is a trivial mechanical change and is done late to ensure that all instances have been catched and new code merged post rc1 wont reintroduce new instances. - A trivial consistency fix in the migration code The recent introduction of irq_force_complete_move() in the core code, causes a problem for the nostalgia crowd who maintains ia64 out of tree. The code assumes that hierarchical interrupt domains are enabled and dereferences irq_data::parent_data unconditionally. That works in mainline because both architectures which enable that code have hierarchical domains enabled. Though it breaks the ia64 build, which enables the functionality, but does not have hierarchical domains. While it's not really a problem for mainline today, this unconditional dereference is inconsistent and trivially fixable by using the existing helper function irqd_get_parent_data(), which has the appropriate #ifdeffery in place" * tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move() irqdomain: Stop using 'host' for domain irqdomain: Rename irq_get_default_host() to irq_get_default_domain() irqdomain: Rename irq_set_default_host() to irq_set_default_domain()