summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-05-01arp: Factorise ip_route_output() call in arp_req_set() and arp_req_delete().Kuniyuki Iwashima
When ioctl(SIOCDARP/SIOCSARP) is issued for non-proxy entry (no ATF_COM) without arpreq.arp_dev[] set, arp_req_set() and arp_req_delete() looks up dev based on IPv4 address by ip_route_output(). Let's factorise the same code as arp_req_dev(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240430015813.71143-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01arp: Validate netmask earlier for SIOCDARP and SIOCSARP in arp_ioctl().Kuniyuki Iwashima
When ioctl(SIOCDARP/SIOCSARP) is issued with ATF_PUBL, r.arp_netmask must be 0.0.0.0 or 255.255.255.255. Currently, the netmask is validated in arp_req_delete_public() or arp_req_set_public() under rtnl_lock(). We have ATF_NETMASK test in arp_ioctl() before holding rtnl_lock(), so let's move the netmask validation there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240430015813.71143-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01arp: Move ATF_COM setting in arp_req_set().Kuniyuki Iwashima
In arp_req_set(), if ATF_PERM is set in arpreq.arp_flags, ATF_COM is set automatically. The flag will be used later for neigh_update() only when a neighbour entry is found. Let's set ATF_COM just before calling neigh_update(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240430015813.71143-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01rxrpc: Clients must accept conn from any addressJeffrey Altman
The find connection logic of Transarc's Rx was modified in the mid-1990s to support multi-homed servers which might send a response packet from an address other than the destination address in the received packet. The rules for accepting a packet by an Rx initiator (RX_CLIENT_CONNECTION) were altered to permit acceptance of a packet from any address provided that the port number was unchanged and all of the connection identifiers matched (Epoch, CID, SecurityClass, ...). This change applies the same rules to the Linux implementation which makes it consistent with IBM AFS 3.6, Arla, OpenAFS and AuriStorFS. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: Jeffrey Altman <jaltman@auristor.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Link: https://lore.kernel.org/r/20240419163057.4141728-1-marc.dionne@auristor.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01netfs, 9p: Implement helpers for new write codeDavid Howells
Implement the helpers for the new write code in 9p. There's now an optional ->prepare_write() that allows the filesystem to set the parameters for the next write, such as maximum size and maximum segment count, and an ->issue_write() that is called to initiate an (asynchronous) write operation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: v9fs@lists.linux.dev cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01ipv6: anycast: use call_rcu_hurry() in aca_put()Eric Dumazet
This is a followup of commit b5327b9a300e ("ipv6: use call_rcu_hurry() in fib6_info_release()"). I had another pmtu.sh failure, and found another lazy call_rcu() causing this failure. aca_free_rcu() calls fib6_info_release() which releases devices references. We must not delay it too much or risk unregister_netdevice/ref_tracker traces because references to netdev are not released in time. This should speedup device/netns dismantles when CONFIG_RCU_LAZY=y Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-01net: core: reject skb_copy(_expand) for fraglist GSO skbsFelix Fietkau
SKB_GSO_FRAGLIST skbs must not be linearized, otherwise they become invalid. Return NULL if such an skb is passed to skb_copy or skb_copy_expand, in order to prevent a crash on a potential later call to skb_gso_segment. Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-01net: bridge: fix multicast-to-unicast with fraglist GSOFelix Fietkau
Calling skb_copy on a SKB_GSO_FRAGLIST skb is not valid, since it returns an invalid linearized skb. This code only needs to change the ethernet header, so pskb_copy is the right function to call here. Fixes: 6db6f0eae605 ("bridge: multicast to unicast") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-01xfrm: Restrict SA direction attribute to specific netlink message typesAntony Antony
Reject the usage of the SA_DIR attribute in xfrm netlink messages when it's not applicable. This ensures that SA_DIR is only accepted for certain message types (NEWSA, UPDSA, and ALLOCSPI) Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-01xfrm: Add dir validation to "in" data path lookupAntony Antony
Introduces validation for the x->dir attribute within the XFRM input data lookup path. If the configured direction does not match the expected direction, input, increment the XfrmInStateDirError counter and drop the packet to ensure data integrity and correct flow handling. grep -vw 0 /proc/net/xfrm_stat XfrmInStateDirError 1 Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-01xfrm: Add dir validation to "out" data path lookupAntony Antony
Introduces validation for the x->dir attribute within the XFRM output data lookup path. If the configured direction does not match the expected direction, output, increment the XfrmOutStateDirError counter and drop the packet to ensure data integrity and correct flow handling. grep -vw 0 /proc/net/xfrm_stat XfrmOutPolError 1 XfrmOutStateDirError 1 Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-01xfrm: Add Direction to the SA in or outAntony Antony
This patch introduces the 'dir' attribute, 'in' or 'out', to the xfrm_state, SA, enhancing usability by delineating the scope of values based on direction. An input SA will restrict values pertinent to input, effectively segregating them from output-related values. And an output SA will restrict attributes for output. This change aims to streamline the configuration process and improve the overall consistency of SA attributes during configuration. This feature sets the groundwork for future patches, including the upcoming IP-TFS patch. Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-04-30netpoll: Fix race condition in netpoll_owner_activeBreno Leitao
KCSAN detected a race condition in netpoll: BUG: KCSAN: data-race in net_rx_action / netpoll_send_skb write (marked) to 0xffff8881164168b0 of 4 bytes by interrupt on cpu 10: net_rx_action (./include/linux/netpoll.h:90 net/core/dev.c:6712 net/core/dev.c:6822) <snip> read to 0xffff8881164168b0 of 4 bytes by task 1 on cpu 2: netpoll_send_skb (net/core/netpoll.c:319 net/core/netpoll.c:345 net/core/netpoll.c:393) netpoll_send_udp (net/core/netpoll.c:?) <snip> value changed: 0x0000000a -> 0xffffffff This happens because netpoll_owner_active() needs to check if the current CPU is the owner of the lock, touching napi->poll_owner non atomically. The ->poll_owner field contains the current CPU holding the lock. Use an atomic read to check if the poll owner is the current CPU. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240429100437.3487432-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30mptcp: ensure snd_nxt is properly initialized on connectPaolo Abeni
Christoph reported a splat hinting at a corrupted snd_una: WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> When fallback to TCP happens early on a client socket, snd_nxt is not yet initialized and any incoming ack will copy such value into snd_una. If the mptcp worker (dumbly) tries mptcp-level re-injection after such ack, that would unconditionally trigger a send buffer cleanup using 'bad' snd_una values. We could easily disable re-injection for fallback sockets, but such dumb behavior already helped catching a few subtle issues and a very low to zero impact in practice. Instead address the issue always initializing snd_nxt (and write_seq, for consistency) at connect time. Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485 Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240429-upstream-net-20240429-mptcp-snd_nxt-init-connect-v1-1-59ceac0a7dcb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net: move sysctl_mem_pcpu_rsv to net_hotdataEric Dumazet
sysctl_mem_pcpu_rsv is used in TCP fast path, move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net: add <net/proto_memory.h>Eric Dumazet
Move some proto memory definitions out of <net/sock.h> Very few files need them, and following patch will include <net/hotdata.h> from <net/proto_memory.h> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30tcp: move tcp_out_of_memory() to net/ipv4/tcp.cEric Dumazet
tcp_out_of_memory() has a single caller: tcp_check_oom(). Following patch will also make sk_memory_allocated() not anymore visible from <net/sock.h> and <net/tcp.h> Add const qualifier to sock argument of tcp_out_of_memory() and tcp_check_oom(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net: move sysctl_skb_defer_max to net_hotdataEric Dumazet
sysctl_skb_defer_max is used in TCP fast path, move it to net_hodata. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net: move sysctl_max_skb_frags to net_hotdataEric Dumazet
sysctl_max_skb_frags is used in TCP and MPTCP fast paths, move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30inet: introduce dst_rtable() helperEric Dumazet
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240429133009.1227754-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()Yunsheng Lin
rxrpc_alloc_data_txbuf() may be called with data_align being zero in none_alloc_txbuf() and rxkad_alloc_txbuf(), data_align is supposed to be an order-based alignment value, but zero is not a valid order-based alignment value, and '~(data_align - 1)' doesn't result in a valid mask-based alignment value for __page_frag_alloc_align(). Fix it by passing a valid order-based alignment value in none_alloc_txbuf() and rxkad_alloc_txbuf(). Also use page_frag_alloc_align() expecting an order-based alignment value in rxrpc_alloc_data_txbuf() to avoid doing the alignment converting operation and to catch possible invalid alignment value in the future. Remove the 'if (data_align)' checking too, as it is always true for a valid order-based alignment value. Fixes: 6b2536462fd4 ("rxrpc: Fix use of changed alignment param to page_frag_alloc_align()") Fixes: 49489bb03a50 ("rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags") CC: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240428111640.27306-1-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net: page_pool: support error injectionJakub Kicinski
Because of caching / recycling using the general page allocation failures to induce errors in page pool allocation is very hard. Add direct error injection support to page_pool_alloc_pages(). Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240429144426.743476-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net/smc: implement DMB-merged operations of loopback-ismWen Gu
This implements operations related to merging sndbuf with peer DMB in loopback-ism. The DMB won't be freed until no sndbuf is attached to it. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: adapt cursor update when sndbuf and peer DMB are mergedWen Gu
If the local sndbuf shares the same physical memory with peer DMB, the cursor update processing needs to be adapted to ensure that the data to be consumed won't be overwritten. So in this case, the fin_curs and sndbuf_space that were originally updated after sending the CDC message should be modified to not be update until the peer updates cons_curs. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: {at|de}tach sndbuf to peer DMB if supportedWen Gu
If the device used by SMC-D supports merging local sndbuf to peer DMB, then create sndbuf descriptor and attach it to peer DMB once peer token is obtained, and detach and free the sndbuf descriptor when the connection is freed. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: add operations to merge sndbuf with peer DMBWen Gu
In some scenarios using Emulated-ISM device, sndbuf can share the same physical memory region with peer DMB to avoid data copy from one side to the other. In such case the sndbuf is only a descriptor that describes the shared memory and does not actually occupy memory, it's more like a ghost buffer. +----------+ +----------+ | socket A | | socket B | +----------+ +----------+ | | +--------+ +--------+ | sndbuf | | DMB | | desc | | desc | +--------+ +--------+ | | | +----v-----+ +--------------------------> memory | +----------+ So here introduces three new SMC-D device operations to check if this feature is supported by device, and to {attach|detach} ghost sndbuf to peer DMB. For now only loopback-ism supports this. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: register loopback-ism into SMC-D device listWen Gu
After the loopback-ism device is ready, add it to the SMC-D device list as an ISMv2 device, and always keep it at the beginning to ensure it is preferred for providing a shortcut for data transfer within the same kernel. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: ignore loopback-ism when dumping SMC-D devicesWen Gu
Since loopback-ism is not a PCI device, the PCI information fed back by smc_nl_handle_smcd_dev() does not apply to loopback-ism. So currently ignore loopback-ism when dumping SMC-D devices. The netlink function of loopback-ism will be refactored when SMC netlink interface is updated. Link: https://lore.kernel.org/r/caab067b-f5c3-490f-9259-262624c236b4@linux.ibm.com/ Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: mark optional smcd_ops and check for support when calledWen Gu
Some operations are not supported by new introduced Emulated-ISM, so mark them as optional and check if the device supports them when called. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: implement DMB-related operations of loopback-ismWen Gu
This implements DMB (un)registration and data move operations of loopback-ism device. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: implement ID-related operations of loopback-ismWen Gu
This implements operations related to IDs for the loopback-ism device. loopback-ism uses an Extended GID that is a 128-bit GID instead of the existing ISM 64-bit GID, and uses the CHID defined with the reserved value 0xFFFF. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: introduce loopback-ism for SMC intra-OS shortcutWen Gu
This introduces a kind of Emulated-ISM device named loopback-ism for SMCv2.1. The loopback-ism device is currently exclusive for SMC usage, and aims to provide an SMC shortcut for sockets within the same kernel, leading to improved intra-OS traffic performance. Configuration of this feature is managed through the config SMC_LO. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: decouple ism_client from SMC-D DMB registrationWen Gu
The struct 'ism_client' is specialized for s390 platform firmware ISM. So replace it with 'void' to make SMCD DMB registration helper generic for both Emulated-ISM and existing ISM. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30sctp: prefer struct_size over open coded arithmeticErick Archer
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "ids" variable is a pointer to "struct sctp_assoc_ids" and this structure ends in a flexible array: struct sctp_assoc_ids { [...] sctp_assoc_t gaids_assoc_id[]; }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + size * count" in the kmalloc() function. Also, refactor the code adding the "ids_size" variable to avoid sizing twice. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer <erick.archer@outlook.com> Acked-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/PAXPR02MB724871DB78375AB06B5171C88B152@PAXPR02MB7248.eurprd02.prod.outlook.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30netdev: add queue statsXuan Zhuo
These stats are commonly. Support reporting those via netdev-genl queue stats. name: rx-hw-drops name: rx-hw-drop-overruns name: rx-csum-unnecessary name: rx-csum-none name: rx-csum-bad name: rx-hw-gro-packets name: rx-hw-gro-bytes name: rx-hw-gro-wire-packets name: rx-hw-gro-wire-bytes name: rx-hw-drop-ratelimits name: tx-hw-drops name: tx-hw-drop-errors name: tx-csum-none name: tx-needs-csum name: tx-hw-gso-packets name: tx-hw-gso-bytes name: tx-hw-gso-wire-packets name: tx-hw-gso-wire-bytes name: tx-hw-drop-ratelimits Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-29net: hsr: init prune_proxy_timer soonerEric Dumazet
We must initialize prune_proxy_timer before we attempt a del_timer_sync() on it. syzbot reported the following splat: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 1 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-01199-gfc48de77d69d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 assign_lock_key+0x238/0x270 kernel/locking/lockdep.c:976 register_lock_class+0x1cf/0x980 kernel/locking/lockdep.c:1289 __lock_acquire+0xda/0x1fd0 kernel/locking/lockdep.c:5014 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __timer_delete_sync+0x148/0x310 kernel/time/timer.c:1648 del_timer_sync include/linux/timer.h:185 [inline] hsr_dellink+0x33/0x80 net/hsr/hsr_netlink.c:132 default_device_exit_batch+0x956/0xa90 net/core/dev.c:11737 ops_exit_list net/core/net_namespace.c:175 [inline] cleanup_net+0x89d/0xcc0 net/core/net_namespace.c:637 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> ODEBUG: assert_init not available (active state 0) object: ffff88806d3fcd88 object type: timer_list hint: 0x0 WARNING: CPU: 1 PID: 11 at lib/debugobjects.c:517 debug_print_object+0x17a/0x1f0 lib/debugobjects.c:514 Fixes: 5055cccfc2d1 ("net: hsr: Provide RedBox support (HSR-SAN)") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lukasz Majewski <lukma@denx.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240426163355.2613767-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-04-29 We've added 147 non-merge commits during the last 32 day(s) which contain a total of 158 files changed, 9400 insertions(+), 2213 deletions(-). The main changes are: 1) Add an internal-only BPF per-CPU instruction for resolving per-CPU memory addresses and implement support in x86 BPF JIT. This allows inlining per-CPU array and hashmap lookups and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko. 2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song. 3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various atomics in bpf_arena which can be JITed as a single x86 instruction, from Alexei Starovoitov. 4) Add support for passing mark with bpf_fib_lookup helper, from Anton Protopopov. 5) Add a new bpf_wq API for deferring events and refactor sleepable bpf_timer code to keep common code where possible, from Benjamin Tissoires. 6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs to check when NULL is passed for non-NULLable parameters, from Eduard Zingerman. 7) Harden the BPF verifier's and/or/xor value tracking, from Harishankar Vishwanathan. 8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel crypto subsystem, from Vadim Fedorenko. 9) Various improvements to the BPF instruction set standardization doc, from Dave Thaler. 10) Extend libbpf APIs to partially consume items from the BPF ringbuffer, from Andrea Righi. 11) Bigger batch of BPF selftests refactoring to use common network helpers and to drop duplicate code, from Geliang Tang. 12) Support bpf_tail_call_static() helper for BPF programs with GCC 13, from Jose E. Marchesi. 13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF program to have code sections where preemption is disabled, from Kumar Kartikeya Dwivedi. 14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs, from David Vernet. 15) Extend the BPF verifier to allow different input maps for a given bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu. 16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan. 17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison the stack memory, from Martin KaFai Lau. 18) Improve xsk selftest coverage with new tests on maximum and minimum hardware ring size configurations, from Tushar Vyavahare. 19) Various ReST man pages fixes as well as documentation and bash completion improvements for bpftool, from Rameez Rehman & Quentin Monnet. 20) Fix libbpf with regards to dumping subsequent char arrays, from Quentin Deslandes. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits) bpf, docs: Clarify PC use in instruction-set.rst bpf_helpers.h: Define bpf_tail_call_static when building with GCC bpf, docs: Add introduction for use in the ISA Internet Draft selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args selftests/bpf: dummy_st_ops should reject 0 for non-nullable params bpf: check bpf_dummy_struct_ops program params for test runs selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops selftests/bpf: adjust dummy_st_ops_success to detect additional error bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable selftests/bpf: Add ring_buffer__consume_n test. bpf: Add bpf_guard_preempt() convenience macro selftests: bpf: crypto: add benchmark for crypto functions selftests: bpf: crypto skcipher algo selftests bpf: crypto: add skcipher to bpf crypto bpf: make common crypto API for TC/XDP programs bpf: update the comment for BTF_FIELDS_MAX selftests/bpf: Fix wq test. selftests/bpf: Use make_sockaddr in test_sock_addr selftests/bpf: Use connect_to_addr in test_sock_addr ... ==================== Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29Merge tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: - Fix an Oops in xs_tcp_tls_setup_socket - Fix an Oops due to missing error handling in nfs_net_init() * tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Handle error of rpc_proc_register() in nfs_net_init(). SUNRPC: add a missing rpc_stat for TCP TLS
2024-04-29ipv6: introduce dst_rt6_info() helperEric Dumazet
Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-29inet: use call_rcu_hurry() in inet_free_ifa()Eric Dumazet
This is a followup of commit c4e86b4363ac ("net: add two more call_rcu_hurry()") Our reference to ifa->ifa_dev must be freed ASAP to release the reference to the netdev the same way. inet_rcu_free_ifa() in_dev_put() -> in_dev_finish_destroy() -> netdev_put() This should speedup device/netns dismantles when CONFIG_RCU_LAZY=y Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-29net: give more chances to rcu in netdev_wait_allrefs_any()Eric Dumazet
This came while reviewing commit c4e86b4363ac ("net: add two more call_rcu_hurry()"). Paolo asked if adding one synchronize_rcu() would help. While synchronize_rcu() does not help, making sure to call rcu_barrier() before msleep(wait) is definitely helping to make sure lazy call_rcu() are completed. Instead of waiting ~100 seconds in my tests, the ref_tracker splats occurs one time only, and netdev_wait_allrefs_any() latency is reduced to the strict minimum. Ideally we should audit our call_rcu() users to make sure no refcount (or cascading call_rcu()) is held too long, because rcu_barrier() is quite expensive. Fixes: 0e4be9e57e8c ("net: use exponential backoff in netdev_wait_allrefs") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/all/28bbf698-befb-42f6-b561-851c67f464aa@kernel.org/T/#m76d73ed6b03cd930778ac4d20a777f22a08d6824 Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-26Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-04-26 We've added 12 non-merge commits during the last 22 day(s) which contain a total of 14 files changed, 168 insertions(+), 72 deletions(-). The main changes are: 1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page, from Puranjay Mohan. 2) Fix a crash in XDP with devmap broadcast redirect when the latter map is in process of being torn down, from Toke Høiland-Jørgensen. 3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF program runtime stats, from Xu Kuohai. 4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue, from Jason Xing. 5) Fix BPF verifier error message in resolve_pseudo_ldimm64, from Anton Protopopov. 6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item, from Andrii Nakryiko. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64 bpf, x86: Fix PROBE_MEM runtime load check bpf: verifier: prevent userspace memory access xdp: use flags field to disambiguate broadcast redirect arm32, bpf: Reimplement sign-extension mov instruction riscv, bpf: Fix incorrect runtime stats bpf, arm64: Fix incorrect runtime stats bpf: Fix a verifier verbose message bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers MAINTAINERS: Update email address for Puranjay Mohan bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition ==================== Link: https://lore.kernel.org/r/20240426224248.26197-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-26tcp: fix tcp_grow_skb() vs tstampsEric Dumazet
I forgot to call tcp_skb_collapse_tstamp() in the case we consume the second skb in write queue. Neal suggested to create a common helper used by tcp_mtu_probe() and tcp_grow_skb(). Fixes: 8ee602c63520 ("tcp: try to send bigger TSO packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20240425193450.411640-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-26rstreason: make it work in trace worldJason Xing
At last, we should let it work by introducing this reset reason in trace world. One of the possible expected outputs is: ... tcp_send_reset: skbaddr=xxx skaddr=xxx src=xxx dest=xxx state=TCP_ESTABLISHED reason=NOT_SPECIFIED Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26mptcp: introducing a helper into active reset logicJason Xing
Since we have mapped every mptcp reset reason definition in enum sk_rst_reason, introducing a new helper can cover some missing places where we have already set the subflow->reset_reason. Note: using SK_RST_REASON_NOT_SPECIFIED is the same as SK_RST_REASON_MPTCP_RST_EUNSPEC. They are both unknown. So we can convert it directly. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26mptcp: support rstreason for passive resetJason Xing
It relies on what reset options in the skb are as rfc8684 says. Reusing this logic can save us much energy. This patch replaces most of the prior NOT_SPECIFIED reasons. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26tcp: support rstreason for passive resetJason Xing
Reuse the dropreason logic to show the exact reason of tcp reset, so we can finally display the corresponding item in enum sk_reset_reason instead of reinventing new reset reasons. This patch replaces all the prior NOT_SPECIFIED reasons. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26rstreason: prepare for active resetJason Xing
Like what we did to passive reset: only passing possible reset reason in each active reset path. No functional changes. Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26rstreason: prepare for passive resetJason Xing
Adjust the parameter and support passing reason of reset which is for now NOT_SPECIFIED. No functional changes. Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26net l2tp: drop flow hash on forwardDavid Bauer
Drop the flow-hash of the skb when forwarding to the L2TP netdev. This avoids the L2TP qdisc from using the flow-hash from the outer packet, which is identical for every flow within the tunnel. This does not affect every platform but is specific for the ethernet driver. It depends on the platform including L4 information in the flow-hash. One such example is the Mediatek Filogic MT798x family of networking processors. Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David Bauer <mail@david-bauer.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240424171110.13701-1-mail@david-bauer.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>