summaryrefslogtreecommitdiff
path: root/drivers/infiniband/sw/siw
AgeCommit message (Collapse)Author
2023-12-04RDMA/siw: Call orq_get_current if possibleGuoqing Jiang
Use orq_get_current() in siw_orq_empty(). Link: https://lore.kernel.org/r/20231203092655.28102-5-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-12-04RDMA/siw: Set qp_state in siw_query_qpGuoqing Jiang
Run test_query_rc_qp against siw failed since siw didn't set qp_state accordingly. To address it, introduce siw_qp_state_to_ib_qp_state which convert SIW_QP_STATE_IDLE to IB_QPS_INIT which is similar as in cxgb4. rdma-core# ./build/bin/run_tests.py --dev siw0 tests.test_qp.QPTest.test_query_rc_qp -v test_query_rc_qp (tests.test_qp.QPTest) Queries an RC QP after creation. Verifies that its properties are as ... FAIL ====================================================================== FAIL: test_query_rc_qp (tests.test_qp.QPTest) Queries an RC QP after creation. Verifies that its properties are as ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/gjiang/rdma-core/tests/test_qp.py", line 284, in test_query_rc_qp self.query_qp_common_test(e.IBV_QPT_RC) File "/home/gjiang/rdma-core/tests/test_qp.py", line 265, in query_qp_common_test self.verify_qp_attrs(caps, e.IBV_QPS_INIT, qp_init_attr, qp_attr) File "/home/gjiang/rdma-core/tests/test_qp.py", line 239, in verify_qp_attrs self.assertEqual(state, attr.qp_state) AssertionError: <ibv_qp_state.IBV_QPS_INIT: 1> != 0 ---------------------------------------------------------------------- Ran 1 test in 0.057s FAILED (failures=1) Link: https://lore.kernel.org/r/20231203092655.28102-4-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-12-04RDMA/siw: Reduce memory usage of struct siw_rx_streamGuoqing Jiang
We can reduce the memory of the struct by move some of it's member. Before, /* size: 144, cachelines: 3, members: 17 */ /* sum members: 124, holes: 3, sum holes: 12 */ /* sum bitfield members: 7 bits (0 bytes) */ /* padding: 7 */ /* bit_padding: 1 bits */ After /* size: 128, cachelines: 2, members: 17 */ /* padding: 3 */ /* bit_padding: 1 bits */ Link: https://lore.kernel.org/r/20231203092655.28102-3-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-12-04RDMA/siw: Move tx_cpu aheadGuoqing Jiang
We can reduce one cacheline for the usage of struct siw_qp. Before, /* size: 1928, cachelines: 31, members: 38 */ /* sum members: 1920, holes: 2, sum holes: 8 */ /* paddings: 4, sum paddings: 13 */ /* forced alignments: 3 */ after /* size: 1920, cachelines: 30, members: 38 */ /* paddings: 4, sum paddings: 13 */ /* forced alignments: 3 */ Link: https://lore.kernel.org/r/20231203092655.28102-2-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-11-15RDMA/siw: Update comments for siw_qp_sq_processGuoqing Jiang
There is no siw_sq_work_handler in code, change it with siw_tx_thread since siw_run_sq -> siw_sq_resume -> siw_qp_sq_process. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-18-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Introduce siw_destroy_cep_sockGuoqing Jiang
Add one helper to simplify code a bit. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310091735.oG7bTvLR-lkp@intel.com/` Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-17-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Only check attrs->cap.max_send_wr in siw_create_qpGuoqing Jiang
We can just check max_send_wr here given both max_send_wr and max_recv_wr are defined as u32 type, and we also need to ensure num_sqe (derived from max_send_wr) shouldn't be zero. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-16-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Fix typoGuoqing Jiang
Replace ORRQ with ORQ. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-15-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Remove siw_sk_save_upcallsGuoqing Jiang
Let's move it into siw_sk_assign_cm_upcalls, then we only need to get sk_callback_lock once. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-14-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Cleanup siw_acceptGuoqing Jiang
With the initialization of rv and the two added label, we can simplifiy code a bit. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-13-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Introduce siw_free_cm_idGuoqing Jiang
Factor out a helper to simplify code. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310091656.JlrmcNXB-lkp@intel.com/ Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-12-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Introduce siw_cep_set_free_and_putGuoqing Jiang
Add the helper which can be used in some places. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-11-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Add one parameter to siw_destroy_cpulistGuoqing Jiang
With that we can reuse it in siw_init_cpulist. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-10-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Introduce SIW_STAG_MAX_INDEXGuoqing Jiang
Add the macro to remove magic number in the code. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-9-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Factor out siw_rx_data helperGuoqing Jiang
Remove the redundant code given they share the same logic. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-8-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: No need to check term_info.valid before call siw_send_terminateGuoqing Jiang
Remove the redundate checking since siw_send_terminate check it inside. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-7-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Remove rcu from siw_qpGuoqing Jiang
Remove it since it is not used. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-6-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Remove goto lable in siw_mmapGuoqing Jiang
Let's remove it since the failure case only falls through to the useless label. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-5-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Use iov.iov_len in kernel_sendmsgGuoqing Jiang
We can pass iov.iov_len here. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-4-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Introduce siw_update_skb_rcvdGuoqing Jiang
There are some places share the same logic, factor a common helper for it. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-3-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-15RDMA/siw: Introduce siw_get_pageGuoqing Jiang
Add the wrapper function to get either pbl page or umem page. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-2-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-13RDMA/siw: Use crypto_shash_digest() in siw_qp_prepare_tx()Eric Biggers
Simplify siw_qp_prepare_tx() by using crypto_shash_digest() instead of an init+update+final sequence. This should also improve performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20231029045839.154071-1-ebiggers@kernel.org Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-13RDMA/siw: Use ib_umem_get() to pin user pagesBernard Metzler
Abandon siw private code to pin user pages during user memory registration, but use ib_umem_get() instead. This will help maintaining the driver in case of changes to the memory subsystem. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20231104075643.195186-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-31Merge tag 'v6.6' into rdma.git for-nextJason Gunthorpe
Resolve conflict by taking the spin_lock hunk from for-next: https://lore.kernel.org/r/20230928113851.5197a1ec@canb.auug.org.au Required for the next patch. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-02RDMA/siw: Annotate struct siw_pbl with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct siw_pbl. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230929180431.3005464-4-keescook@chromium.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-11RDMA/siw: Fix connection failure handlingBernard Metzler
In case immediate MPA request processing fails, the newly created endpoint unlinks the listening endpoint and is ready to be dropped. This special case was not handled correctly by the code handling the later TCP socket close, causing a NULL dereference crash in siw_cm_work_handler() when dereferencing a NULL listener. We now also cancel the useless MPA timeout, if immediate MPA request processing fails. This patch furthermore simplifies MPA processing in general: Scheduling a useless TCP socket read in sk_data_ready() upcall is now surpressed, if the socket is already moved out of TCP_ESTABLISHED state. Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230905145822.446263-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-11IB: Use capital "OR" for multiple licenses in SPDXKrzysztof Kozlowski
Documentation/process/license-rules.rst and checkpatch expect the SPDX identifier syntax for multiple licenses to use capital "OR". Correct it to keep consistent format and avoid copy-paste issues. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230823092912.122674-1-krzysztof.kozlowski@linaro.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22RDMA/siw: Call llist_reverse_order in siw_run_sqGuoqing Jiang
We can call the function to get fifo list. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230821133255.31111-4-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22RDMA/siw: Correct wrong debug messageGuoqing Jiang
We need to print num_sle first then pbl->max_buf per the condition. Also replace mem->pbl with pbl while at it. Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230821133255.31111-3-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22RDMA/siw: Balance the reference of cep->kref in the error pathGuoqing Jiang
The siw_connect can go to err in below after cep is allocated successfully: 1. If siw_cm_alloc_work returns failure. In this case socket is not assoicated with cep so siw_cep_put can't be called by siw_socket_disassoc. We need to call siw_cep_put twice since cep->kref is increased once after it was initialized. 2. If siw_cm_queue_work can't find a work, which means siw_cep_get is not called in siw_cm_queue_work, so cep->kref is increased twice by siw_cep_get and when associate socket with cep after it was initialized. So we need to call siw_cep_put three times (one in siw_socket_disassoc). 3. siw_send_mpareqrep returns error, this scenario is similar as 2. So we need to remove one siw_cep_put in the error path. Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230821133255.31111-2-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-07-31RDMA/siw: Fix tx thread initialization.Bernard Metzler
Immediately removing the siw module after insertion may crash in siw_stop_tx_thread(), if the according thread did not yet had a chance to initialize its wait queue and siw_stop_tx_thread() tries to wakeup that thread. Initializing the threads state before spwaning it fixes it. Reported-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230728114418.124328-1-bmt@zurich.ibm.com Tested-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-07-21RDMA/siw: Fabricate a GID on tun and loopback devicesChuck Lever
LOOPBACK and NONE (tunnel) devices have all-zero MAC addresses. Currently, siw_device_create() falls back to copying the IB device's name in those cases, because an all-zero MAC address breaks the RDMA core address resolution mechanism. However, at the point when siw_device_create() constructs a GID, the ib_device::name field is uninitialized, leaving the MAC address to remain in an all-zero state. Fabricate a random artificial GID for such devices, and ensure this artificial GID is returned for all device query operations. Link: https://lore.kernel.org/r/168960673260.3007.12378736853793339110.stgit@manet.1015granger.net Reported-by: Tom Talpey <tom@talpey.com> Fixes: a2d36b02c15d ("RDMA/siw: Enable siw on tunnel devices") Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-21RDMA/siw: use vmalloc_array and vcallocJulia Lawall
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Link: https://lore.kernel.org/r/20230627144339.144478-15-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-28Merge tag 'net-next-6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" * tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ...
2023-06-24tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usageDavid Howells
As MSG_SENDPAGE_NOTLAST is being phased out along with sendpage(), don't use it further in than the sendpage methods, but rather translate it to MSG_MORE and use that instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: Bernard Metzler <bmt@zurich.ibm.com> cc: Jason Gunthorpe <jgg@ziepe.ca> cc: Leon Romanovsky <leon@kernel.org> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Sitnicki <jakub@cloudflare.com> cc: David Ahern <dsahern@kernel.org> cc: Karsten Graul <kgraul@linux.ibm.com> cc: Wenjia Zhang <wenjia@linux.ibm.com> cc: Jan Karcher <jaka@linux.ibm.com> cc: "D. Wythe" <alibuda@linux.alibaba.com> cc: Tony Lu <tonylu@linux.alibaba.com> cc: Wen Gu <guwen@linux.alibaba.com> cc: Boris Pismenny <borisp@nvidia.com> cc: Steffen Klassert <steffen.klassert@secunet.com> cc: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20230623225513.2732256-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09mm/gup: remove vmas parameter from pin_user_pages()Lorenzo Stoakes
We are now in a position where no caller of pin_user_pages() requires the vmas parameter at all, so eliminate this parameter from the function and all callers. This clears the way to removing the vmas parameter from GUP altogether. Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> [qib] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> [drivers/media] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian König <christian.koenig@amd.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-23siw: Inline do_tcp_sendpages()David Howells
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it, allowing do_tcp_sendpages() to be removed. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Tom Talpey <tom@talpey.com> cc: Jason Gunthorpe <jgg@ziepe.ca> cc: Leon Romanovsky <leon@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-16RDMA: Add ib_virt_dma_to_page()Jason Gunthorpe
Make it clearer what is going on by adding a function to go back from the "virtual" dma_addr to a kva and another to a struct page. This is used in the ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib). Call them instead of a naked casting and virt_to_page() when working with dma_addr values encoded by the various ib_map functions. This also fixes the virt_to_page() casting problem Linus Walleij has been chasing. Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-03RDMA/siw: Remove namespace check from siw_netdev_event()Tetsuo Handa
syzbot is reporting that siw_netdev_event(NETDEV_UNREGISTER) cannot destroy siw_device created after unshare(CLONE_NEWNET) due to net namespace check. It seems that this check was by error there and should be removed. Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Suggested-by: Leon Romanovsky <leon@kernel.org> Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/a44e9ac5-44e2-d575-9e30-02483cc7ffd1@I-love.SAKURA.ne.jp Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-13RDMA/siw: Fix potential page_array out of range accessDaniil Dulov
When seg is equal to MAX_ARRAY, the loop should break, otherwise it will result in out of range access. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b9be6f18cf9e ("rdma/siw: transmit path") Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru> Link: https://lore.kernel.org/r/20230227091751.589612-1-d.dulov@aladdin.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Quite a small cycle this time, even with the rc8. I suppose everyone went to sleep over xmas. - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits) IB/mlx5: Extend debug control for CC parameters IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix math bugs in hfi1_can_pin_pages() RDMA/irdma: Add support for dmabuf pin memory regions RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys RDMA/rxe: Fix missing memory barriers in rxe_queue.h RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet RDMA/rxe: Remove rxe_alloc() RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size Subject: RDMA/rxe: Handle zero length rdma iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry() RDMA/mlx5: Use rdma_umem_for_each_dma_block() RDMA/umem: Remove unused 'work' member from struct ib_umem RDMA/irdma: Cap MSIX used to online CPUs + 1 RDMA/mlx5: Check reg_create() create for errors RDMA/restrack: Correct spelling RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish() ...
2023-02-06RDMA/siw: Fix user page pinning accountingBernard Metzler
To avoid racing with other user memory reservations, immediately account full amount of pages to be pinned. Fixes: 2251334dcac9 ("rdma/siw: application buffer management") Reported-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230202101000.402990-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-23net/sock: Introduce trace_sk_data_ready()Peilin Ye
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example: <...> iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable <...> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Fix two build warnings on 32 bit platforms It seems the linux-next CI and 0-day bot are not testing enough 32 bit configurations, as soon as you merged the rdma pull request there were two instant reports of warnings on these sytems that I would have thought should have been covered by time in linux-next Anyhow, here are the fixes so people don't hit problems with -Werror" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/siw: Fix pointer cast warning RDMA/rxe: Fix compile warnings on 32-bit
2022-12-16RDMA/siw: Fix pointer cast warningArnd Bergmann
The previous build fix left a remaining issue in configurations with 64-bit dma_addr_t on 32-bit architectures: drivers/infiniband/sw/siw/siw_qp_tx.c: In function 'siw_get_pblpage': drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 32 | return virt_to_page((void *)paddr); | ^ Use the same double cast here that the driver uses elsewhere to convert between dma_addr_t and void*. Fixes: 0d1b756acf60 ("RDMA/siw: Pass a pointer to virt_to_page()") Link: https://lore.kernel.org/r/20221215170347.2612403-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Usual size of updates, a new driver, and most of the bulk focusing on rxe: - Usual typos, style, and language updates - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma, mlx4, srp - Lots of RXE updates: * Improve reply error handling for bad MR operations * Code tidying * Debug printing uses common loggers * Remove half implemented RD related stuff * Support IBA's recently defined Atomic Write and Flush operations - erdma support for atomic operations - New driver 'mana' for Ethernet HW available in Azure VMs. This driver only supports DPDK" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits) IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces RDMA: Add missed netdev_put() for the netdevice_tracker RDMA/rxe: Enable RDMA FLUSH capability for rxe device RDMA/cm: Make QP FLUSHABLE for supported device RDMA/rxe: Implement flush completion RDMA/rxe: Implement flush execution in responder side RDMA/rxe: Implement RC RDMA FLUSH service in requester side RDMA/rxe: Extend rxe packet format to support flush RDMA/rxe: Allow registering persistent flag for pmem MR only RDMA/rxe: Extend rxe user ABI to support flush RDMA: Extend RDMA kernel verbs ABI to support flush RDMA: Extend RDMA user ABI to support flush RDMA/rxe: Fix incorrect responder length checking RDMA/rxe: Fix oops with zero length reads RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define RDMA/hns: Fix XRC caps on HIP08 RDMA/hns: Fix error code of CMD RDMA/hns: Fix page size cap from firmware RDMA/hns: Fix PBL page MTR find RDMA/hns: Fix AH attr queried by query_qp ...
2022-11-30RDMA/siw: remove FOLL_FORCE usageDavid Hildenbrand
GUP now supports reliable R/O long-term pinning in COW mappings, such that we break COW early. MAP_SHARED VMAs only use the shared zeropage so far in one corner case (DAXFS file with holes), which can be ignored because GUP does not support long-term pinning in fsdax (see check_vma_flags()). Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop using FOLL_FORCE, which is really only for ptrace access. Link: https://lkml.kernel.org/r/20221116102659.70287-13-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-15RDMA/siw: Set defined status for work completion with undefined statusBernard Metzler
A malicious user may write undefined values into memory mapped completion queue elements status or opcode. Undefined status or opcode values will result in out-of-bounds access to an array mapping siw internal representation of opcode and status to RDMA core representation when reaping CQ elements. While siw detects those undefined values, it did not correctly set completion status to a defined value, thus defeating the whole purpose of the check. This bug leads to the following Smatch static checker warning: drivers/infiniband/sw/siw/siw_cq.c:96 siw_reap_cqe() error: buffer overflow 'map_cqe_status' 10 <= 21 Fixes: bdf1da5df9da ("RDMA/siw: Fix immediate work request flush to completion queue") Link: https://lore.kernel.org/r/20221115170747.1263298-1-bmt@zurich.ibm.com Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-09RDMA/siw: Fix immediate work request flush to completion queueBernard Metzler
Correctly set send queue element opcode during immediate work request flushing in post sendqueue operation, if the QP is in ERROR state. An undefined ocode value results in out-of-bounds access to an array for mapping the opcode between siw internal and RDMA core representation in work completion generation. It resulted in a KASAN BUG report of type 'global-out-of-bounds' during NFSoRDMA testing. This patch further fixes a potential case of a malicious user which may write undefined values for completion queue elements status or opcode, if the CQ is memory mapped to user land. It avoids the same out-of-bounds access to arrays for status and opcode mapping as described above. Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Fixes: b0fff7317bb4 ("rdma/siw: completion queue methods") Reported-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20221107145057.895747-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-06Merge tag 'v6.0' into rdma.git for-nextJason Gunthorpe
Trvial merge conflicts against rdma.git for-rc resolved matching linux-next: drivers/infiniband/hw/hns/hns_roce_hw_v2.c drivers/infiniband/hw/hns/hns_roce_main.c https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>