summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-09-15ipv6: lockless IPV6_MULTICAST_ALL implementationEric Dumazet
Move np->mc_all to an atomic flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_RECVERR_RFC4884 implementationEric Dumazet
Move np->recverr_rfc4884 to an atomic flag to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_MINHOPCOUNT implementationEric Dumazet
Add one missing READ_ONCE() annotation in do_ipv6_getsockopt() and make IPV6_MINHOPCOUNT setsockopt() lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_MTU implementationEric Dumazet
np->frag_size can be read/written without holding socket lock. Add missing annotations and make IPV6_MTU setsockopt() lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_MULTICAST_HOPS implementationEric Dumazet
This fixes data-races around np->mcast_hops, and make IPV6_MULTICAST_HOPS lockless. Note that np->mcast_hops is never negative, thus can fit an u8 field instead of s16. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_MULTICAST_LOOP implementationEric Dumazet
Add inet6_{test|set|clear|assign}_bit() helpers. Note that I am using bits from inet->inet_flags, this might change in the future if we need more flags. While solving data-races accessing np->mc_loop, this patch also allows to implement lockless accesses to np->mcast_hops in the following patch. Also constify sk_mc_loop() argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_UNICAST_HOPS implementationEric Dumazet
Some np->hop_limit accesses are racy, when socket lock is not held. Add missing annotations and switch to full lockless implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14ipv6: mcast: Remove redundant comparison in igmp6_mcf_get_next()Gavrilov Ilia
The 'state->im' value will always be non-zero after the 'while' statement, so the check can be removed. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230912084100.1502379-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14ipv4: igmp: Remove redundant comparison in igmp_mcf_get_next()Gavrilov Ilia
The 'state->im' value will always be non-zero after the 'while' statement, so the check can be removed. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230912084039.1501984-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udplite: fix various data-racesEric Dumazet
udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy. Move udp->pcflag to udp->udp_flags for atomicity, and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen. Fixes: ba4e58eca8aa ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udplite: remove UDPLITE_BITEric Dumazet
This flag is set but never read, we can remove it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: annotate data-races around udp->encap_typeEric Dumazet
syzbot/KCSAN complained about UDP_ENCAP_L2TPINUDP setsockopt() racing. Add READ_ONCE()/WRITE_ONCE() to document races on this lockless field. syzbot report was: BUG: KCSAN: data-race in udp_lib_setsockopt / udp_lib_setsockopt read-write to 0xffff8881083603fa of 1 bytes by task 16557 on cpu 0: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff8881083603fa of 1 bytes by task 16554 on cpu 1: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x05 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16554 Comm: syz-executor.5 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GROEric Dumazet
Move udp->encap_enabled to udp->udp_flags. Add udp_test_and_set_bit() helper to allow lockless udp_tunnel_encap_enable() implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flagsEric Dumazet
These are read locklessly, move them to udp_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: add missing WRITE_ONCE() around up->encap_rcvEric Dumazet
UDP_ENCAP_ESPINUDP_NON_IKE setsockopt() writes over up->encap_rcv while other cpus read it. Fixes: 067b207b281d ("[UDP]: Cleanup UDP encapsulation code") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: move udp->gro_enabled to udp->udp_flagsEric Dumazet
syzbot reported that udp->gro_enabled can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: move udp->no_check6_rx to udp->udp_flagsEric Dumazet
syzbot reported that udp->no_check6_rx can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: move udp->no_check6_tx to udp->udp_flagsEric Dumazet
syzbot reported that udp->no_check6_tx can be read locklessly. Use one atomic bit from udp->udp_flags Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: introduce udp->udp_flagsEric Dumazet
According to syzbot, it is time to use proper atomic flags for various UDP flags. Add udp_flags field, and convert udp->corkflag to first bit in it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().Kuniyuki Iwashima
syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by updating kcm_tx_msg(head)->last_skb if partial data is copied so that the following sendmsg() will resume from the skb. However, we cannot know how many bytes were copied when we get the error. Thus, we could mess up the MSG_MORE queue. When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we do so for UDP by udp_flush_pending_frames(). Even without this change, when the error occurred, the following sendmsg() resumed from a wrong skb and the queue was messed up. However, we have yet to get such a report, and only syzkaller stumbled on it. So, this can be changed safely. Note this does not change SOCK_SEQPACKET behaviour. Fixes: c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14vsock: send SIGPIPE on write to shutdowned socketArseniy Krasnov
POSIX requires to send SIGPIPE on write to SOCK_STREAM socket which was shutdowned with SHUT_WR flag or its peer was shutdowned with SHUT_RD flag. Also we must not send SIGPIPE if MSG_NOSIGNAL flag is set. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-13netfilter: nf_tables: Fix entries val in rule reset audit logPhil Sutter
The value in idx and the number of rules handled in that particular __nf_tables_dump_rules() call is not identical. The former is a cursor to pick up from if multiple netlink messages are needed, so its value is ever increasing. Fixing this is not just a matter of subtracting s_idx from it, though: When resetting rules in multiple chains, __nf_tables_dump_rules() is called for each and cb->args[0] is not adjusted in between. Introduce a dedicated counter to record the number of rules reset in this call in a less confusing way. While being at it, prevent the direct return upon buffer exhaustion: Any rules previously dumped into that skb would evade audit logging otherwise. Fixes: 9b5ba5c9c5109 ("netfilter: nf_tables: Unbreak audit log reset") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-13netfilter: conntrack: fix extension size tableFlorian Westphal
The size table is incorrect due to copypaste error, this reserves more size than needed. TSTAMP reserved 32 instead of 16 bytes. TIMEOUT reserved 16 instead of 8 bytes. Fixes: 5f31edc0676b ("netfilter: conntrack: move extension sizes into core") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-13NFSv4.1: fix pnfs MDS=DS session trunkingOlga Kornievskaia
Currently, when GETDEVICEINFO returns multiple locations where each is a different IP but the server's identity is same as MDS, then nfs4_set_ds_client() finds the existing nfs_client structure which has the MDS's max_connect value (and if it's 1), then the 1st IP on the DS's list will get dropped due to MDS trunking rules. Other IPs would be added as they fall under the pnfs trunking rules. For the list of IPs the 1st goes thru calling nfs4_set_ds_client() which will eventually call nfs4_add_trunk() and call into rpc_clnt_test_and_add_xprt() which has the check for MDS trunking. The other IPs (after the 1st one), would call rpc_clnt_add_xprt() which doesn't go thru that check. nfs4_add_trunk() is called when MDS trunking is happening and it needs to enforce the usage of max_connect mount option of the 1st mount. However, this shouldn't be applied to pnfs flow. Instead, this patch proposed to treat MDS=DS as DS trunking and make sure that MDS's max_connect limit does not apply to the 1st IP returned in the GETDEVICEINFO list. It does so by marking the newly created client with a new flag NFS_CS_PNFS which then used to pass max_connect value to use into the rpc_clnt_test_and_add_xprt() instead of the existing rpc client's max_connect value set by the MDS connection. For example, mount was done without max_connect value set so MDS's rpc client has cl_max_connect=1. Upon calling into rpc_clnt_test_and_add_xprt() and using rpc client's value, the caller passes in max_connect value which is previously been set in the pnfs path (as a part of handling GETDEVICEINFO list of IPs) in nfs4_set_ds_client(). However, when NFS_CS_PNFS flag is not set and we know we are doing MDS trunking, comparing a new IP of the same server, we then set the max_connect value to the existing MDS's value and pass that into rpc_clnt_test_and_add_xprt(). Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13Revert "SUNRPC: Fail faster on bad verifier"Trond Myklebust
This reverts commit 0701214cd6e66585a999b132eb72ae0489beb724. The premise of this commit was incorrect. There are exactly 2 cases where rpcauth_checkverf() will return an error: 1) If there was an XDR decode problem (i.e. garbage data). 2) If gss_validate() had a problem verifying the RPCSEC_GSS MIC. In the second case, there are again 2 subcases: a) The GSS context expires, in which case gss_validate() will force a new context negotiation on retry by invalidating the cred. b) The sequence number check failed because an RPC call timed out, and the client retransmitted the request using a new sequence number, as required by RFC2203. In neither subcase is this a fatal error. Reported-by: Russell Cattelan <cattelan@thebarn.com> Fixes: 0701214cd6e6 ("SUNRPC: Fail faster on bad verifier") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13SUNRPC: Mark the cred for revalidation if the server rejects itTrond Myklebust
If the server rejects the credential as being stale, or bad, then we should mark it for revalidation before retransmitting. Fixes: 7f5667a5f8c4 ("SUNRPC: Clean up rpc_verify_header()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13net/sched: cls_route: make netlink errors meaningfulPedro Tammela
Use netlink extended ack and parsing policies to return more meaningful errors instead of the relying solely on errnos. Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13wifi: mac80211: fix mesh id corruption on 32 bit systemsFelix Fietkau
Since the changed field size was increased to u64, mesh_bss_info_changed pulls invalid bits from the first 3 bytes of the mesh id, clears them, and passes them on to ieee80211_link_info_change_notify, because ifmsh->mbss_changed was not updated to match its size. Fix this by turning into ifmsh->mbss_changed into an unsigned long array with 64 bit size. Fixes: 15ddba5f4311 ("wifi: mac80211: consistently use u64 for BSS changes") Reported-by: Thomas Hühn <thomas.huehn@hs-nordhausen.de> Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230913050134.53536-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13xfrm: fix a data-race in xfrm_gen_index()Eric Dumazet
xfrm_gen_index() mutual exclusion uses net->xfrm.xfrm_policy_lock. This means we must use a per-netns idx_generator variable, instead of a static one. Alternative would be to use an atomic variable. syzbot reported: BUG: KCSAN: data-race in xfrm_sk_policy_insert / xfrm_sk_policy_insert write to 0xffffffff87005938 of 4 bytes by task 29466 on cpu 0: xfrm_gen_index net/xfrm/xfrm_policy.c:1385 [inline] xfrm_sk_policy_insert+0x262/0x640 net/xfrm/xfrm_policy.c:2347 xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639 do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffffffff87005938 of 4 bytes by task 29460 on cpu 1: xfrm_sk_policy_insert+0x13e/0x640 xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639 do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00006ad8 -> 0x00006b18 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 29460 Comm: syz-executor.1 Not tainted 6.5.0-rc5-syzkaller-00243-g9106536c1aa3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Fixes: 1121994c803f ("netns xfrm: policy insertion in netns") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-09-13tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.Kuniyuki Iwashima
Since bhash2 was introduced, the example below does not work as expected. These two bind() should conflict, but the 2nd bind() now succeeds. from socket import * s1 = socket(AF_INET6, SOCK_STREAM) s1.bind(('::ffff:127.0.0.1', 0)) s2 = socket(AF_INET, SOCK_STREAM) s2.bind(('127.0.0.1', s1.getsockname()[1])) During the 2nd bind() in inet_csk_get_port(), inet_bind2_bucket_find() fails to find the 1st socket's tb2, so inet_bind2_bucket_create() allocates a new tb2 for the 2nd socket. Then, we call inet_csk_bind_conflict() that checks conflicts in the new tb2 by inet_bhash2_conflict(). However, the new tb2 does not include the 1st socket, thus the bind() finally succeeds. In this case, inet_bind2_bucket_match() must check if AF_INET6 tb2 has the conflicting v4-mapped-v6 address so that inet_bind2_bucket_find() returns the 1st socket's tb2. Note that if we bind two sockets to 127.0.0.1 and then ::FFFF:127.0.0.1, the 2nd bind() fails properly for the same reason mentinoed in the previous commit. Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13tcp: Fix bind() regression for v4-mapped-v6 wildcard address.Kuniyuki Iwashima
Andrei Vagin reported bind() regression with strace logs. If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4 socket to 127.0.0.1, the 2nd bind() should fail but now succeeds. from socket import * s1 = socket(AF_INET6, SOCK_STREAM) s1.bind(('::ffff:0.0.0.0', 0)) s2 = socket(AF_INET, SOCK_STREAM) s2.bind(('127.0.0.1', s1.getsockname()[1])) During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check if tb has the v4-mapped-v6 wildcard address. The example above does not work after commit 5456262d2baa ("net: Fix incorrect address comparison when searching for a bind2 bucket"), but the blamed change is not the commit. Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated as 0.0.0.0, and the sequence above worked by chance. Technically, this case has been broken since bhash2 was introduced. Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0, the 2nd bind() fails properly because we fall back to using bhash to detect conflicts for the v4-mapped-v6 address. Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: Andrei Vagin <avagin@google.com> Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13tcp: Factorise sk_family-independent comparison in ↵Kuniyuki Iwashima
inet_bind2_bucket_match(_addr_any). This is a prep patch to make the following patches cleaner that touch inet_bind2_bucket_match() and inet_bind2_bucket_match_addr_any(). Both functions have duplicated comparison for netns, port, and l3mdev. Let's factorise them. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-09-11 (i40e, iavf) This series contains updates to i40e and iavf drivers. Andrii ensures all VSIs are cleaned up for remove in i40e. Brett reworks logic for setting promiscuous mode that can, currently, cause incorrect states on iavf. --- v2: - Remove redundant i40e_vsi_free_q_vectors() and kfree() calls (patch 1) v1: https://lore.kernel.org/netdev/20230905180521.887861-1-anthony.l.nguyen@intel.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-12tcp: defer regular ACK while processing socket backlogEric Dumazet
This idea came after a particular workload requested the quickack attribute set on routes, and a performance drop was noticed for large bulk transfers. For high throughput flows, it is best to use one cpu running the user thread issuing socket system calls, and a separate cpu to process incoming packets from BH context. (With TSO/GRO, bottleneck is usually the 'user' cpu) Problem is the user thread can spend a lot of time while holding the socket lock, forcing BH handler to queue most of incoming packets in the socket backlog. Whenever the user thread releases the socket lock, it must first process all accumulated packets in the backlog, potentially adding latency spikes. Due to flood mitigation, having too many packets in the backlog increases chance of unexpected drops. Backlog processing unfortunately shifts a fair amount of cpu cycles from the BH cpu to the 'user' cpu, thus reducing max throughput. This patch takes advantage of the backlog processing, and the fact that ACK are mostly cumulative. The idea is to detect we are in the backlog processing and defer all eligible ACK into a single one, sent from tcp_release_cb(). This saves cpu cycles on both sides, and network resources. Performance of a single TCP flow on a 200Gbit NIC: - Throughput is increased by 20% (100Gbit -> 120Gbit). - Number of generated ACK per second shrinks from 240,000 to 40,000. - Number of backlog drops per second shrinks from 230 to 0. Benchmark context: - Regular netperf TCP_STREAM (no zerocopy) - Intel(R) Xeon(R) Platinum 8481C (Saphire Rapids) - MAX_SKB_FRAGS = 17 (~60KB per GRO packet) This feature is guarded by a new sysctl, and enabled by default: /proc/sys/net/ipv4/tcp_backlog_ack_defer Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12net: call prot->release_cb() when processing backlogEric Dumazet
__sk_flush_backlog() / sk_flush_backlog() are used when TCP recvmsg()/sendmsg() process large chunks, to not let packets in the backlog too long. It makes sense to call tcp_release_cb() to also process actions held in sk->sk_tsq_flags for smoother scheduling. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12tcp: no longer release socket ownership in tcp_release_cb()Eric Dumazet
This partially reverts c3f9b01849ef ("tcp: tcp_release_cb() should release socket ownership"). prequeue has been removed by Florian in commit e7942d0633c4 ("tcp: remove prequeue support") __tcp_checksum_complete_user() being gone, we no longer have to release socket ownership in tcp_release_cb(). This is a prereq for third patch in the series ("net: call prot->release_cb() when processing backlog"). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12net: dst: remove unnecessary input parameter in dst_alloc and dst_initZhengchao Shao
Since commit 1202cdd66531("Remove DECnet support from kernel") has been merged, all callers pass in the initial_ref value of 1 when they call dst_alloc(). Therefore, remove initial_ref when the dst_alloc() is declared and replace initial_ref with 1 in dst_alloc(). Also when all callers call dst_init(), the value of initial_ref is 1. Therefore, remove the input parameter initial_ref of the dst_init() and replace initial_ref with the value 1 in dst_init. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20230911125045.346390-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()Liu Jian
I got the below warning when do fuzzing test: BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470 Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9 CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G OE Hardware name: linux,dummy-virt (DT) Workqueue: pencrypt_parallel padata_parallel_worker Call trace: dump_backtrace+0x0/0x420 show_stack+0x34/0x44 dump_stack+0x1d0/0x248 __kasan_report+0x138/0x140 kasan_report+0x44/0x6c __asan_load4+0x94/0xd0 scatterwalk_copychunks+0x320/0x470 skcipher_next_slow+0x14c/0x290 skcipher_walk_next+0x2fc/0x480 skcipher_walk_first+0x9c/0x110 skcipher_walk_aead_common+0x380/0x440 skcipher_walk_aead_encrypt+0x54/0x70 ccm_encrypt+0x13c/0x4d0 crypto_aead_encrypt+0x7c/0xfc pcrypt_aead_enc+0x28/0x84 padata_parallel_worker+0xd0/0x2dc process_one_work+0x49c/0xbdc worker_thread+0x124/0x880 kthread+0x210/0x260 ret_from_fork+0x10/0x18 This is because the value of rec_seq of tls_crypto_info configured by the user program is too large, for example, 0xffffffffffffff. In addition, TLS is asynchronously accelerated. When tls_do_encryption() returns -EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow, skmsg is released before the asynchronous encryption process ends. As a result, the UAF problem occurs during the asynchronous processing of the encryption module. If the operation is asynchronous and the encryption module returns EINPROGRESS, do not free the record information. Fixes: 635d93981786 ("net/tls: free record only on encryption error") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-11wifi: cfg80211: fix cqm_config access raceJohannes Berg
Max Schulze reports crashes with brcmfmac. The reason seems to be a race between userspace removing the CQM config and the driver calling cfg80211_cqm_rssi_notify(), where if the data is freed while cfg80211_cqm_rssi_notify() runs it will crash since it assumes wdev->cqm_config is set. This can't be fixed with a simple non-NULL check since there's nothing we can do for locking easily, so use RCU instead to protect the pointer, but that requires pulling the updates out into an asynchronous worker so they can sleep and call back into the driver. Since we need to change the free anyway, also change it to go back to the old settings if changing the settings fails. Reported-and-tested-by: Max Schulze <max.schulze@online.de> Closes: https://lore.kernel.org/r/ac96309a-8d8d-4435-36e6-6d152eb31876@online.de Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11wifi: cfg80211: validate AP phy operation before starting itAditya Kumar Singh
Many regulatories can have HE/EHT Operation as not permitted. In such cases, AP should not be allowed to start if it is using a channel having the no operation flag set. However, currently there is no such check in place. Fix this issue by validating such IEs sent during start AP against the channel flags. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230905064857.1503-1-quic_adisi@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11wifi: cfg80211/mac80211: hold link BSSes when assoc fails for MLO connectionWen Gong
When connect to MLO AP with more than one link, and the assoc response of AP is not success, then cfg80211_unhold_bss() is not called for all the links' cfg80211_bss except the primary link which means the link used by the latest successful association request. Thus the hold value of the cfg80211_bss is not reset to 0 after the assoc fail, and then the __cfg80211_unlink_bss() will not be called for the cfg80211_bss by __cfg80211_bss_expire(). Then the AP always looks exist even the AP is shutdown or reconfigured to another type, then it will lead error while connecting it again. The detail info are as below. When connect with muti-links AP, cfg80211_hold_bss() is called by cfg80211_mlme_assoc() for each cfg80211_bss of all the links. When assoc response from AP is not success(such as status_code==1), the ieee80211_link_data of non-primary link(sdata->link[link_id]) is NULL because ieee80211_assoc_success()->ieee80211_vif_update_links() is not called for the links. Then struct cfg80211_rx_assoc_resp resp in cfg80211_rx_assoc_resp() and struct cfg80211_connect_resp_params cr in __cfg80211_connect_result() will only have the data of the primary link, and finally function cfg80211_connect_result_release_bsses() only call cfg80211_unhold_bss() for the primary link. Then cfg80211_bss of the other links will never free because its hold is always > 0 now. Hence assign value for the bss and status from assoc_data since it is valid for this case. Also assign value of addr from assoc_data when the link is NULL because the addrs of assoc_data and link both represent the local link addr and they are same value for success connection. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Link: https://lore.kernel.org/r/20230825070055.28164-1-quic_wgong@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11netfilter: nf_tables: disallow element removal on anonymous setsPablo Neira Ayuso
Anonymous sets need to be populated once at creation and then they are bound to rule since 938154b93be8 ("netfilter: nf_tables: reject unbound anonymous set before commit phase"), otherwise transaction reports EINVAL. Userspace does not need to delete elements of anonymous sets that are not yet bound, reject this with EOPNOTSUPP. From flush command path, skip anonymous sets, they are expected to be bound already. Otherwise, EINVAL is hit at the end of this transaction for unbound sets. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-11kcm: Fix memory leak in error path of kcm_sendmsg()Shigeru Yoshida
syzbot reported a memory leak like below: BUG: memory leak unreferenced object 0xffff88810b088c00 (size 240): comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s) hex dump (first 32 bytes): 00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634 [<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline] [<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815 [<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline] [<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748 [<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494 [<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548 [<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577 [<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 [<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append newly allocated skbs to 'head'. If some bytes are copied, an error occurred, and jumped to out_error label, 'last_skb' is left unmodified. A later kcm_sendmsg() will use an obsoleted 'last_skb' reference, corrupting the 'head' frag_list and causing the leak. This patch fixes this issue by properly updating the last allocated skb in 'last_skb'. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-and-tested-by: syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11hsr: Fix uninit-value access in fill_frame_info()Ziyang Xuan
Syzbot reports the following uninit-value access problem. ===================================================== BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:601 [inline] BUG: KMSAN: uninit-value in hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616 fill_frame_info net/hsr/hsr_forward.c:601 [inline] hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616 hsr_dev_xmit+0x192/0x330 net/hsr/hsr_device.c:223 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3560 __dev_queue_xmit+0x34d0/0x52a0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3087 [inline] packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:730 [inline] sock_sendmsg net/socket.c:753 [inline] __sys_sendto+0x781/0xa30 net/socket.c:2176 __do_sys_sendto net/socket.c:2188 [inline] __se_sys_sendto net/socket.c:2184 [inline] __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 Uninit was created at: slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523 kmalloc_reserve+0x148/0x470 net/core/skbuff.c:559 __alloc_skb+0x318/0x740 net/core/skbuff.c:644 alloc_skb include/linux/skbuff.h:1286 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6299 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2794 packet_alloc_skb net/packet/af_packet.c:2936 [inline] packet_snd net/packet/af_packet.c:3030 [inline] packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:730 [inline] sock_sendmsg net/socket.c:753 [inline] __sys_sendto+0x781/0xa30 net/socket.c:2176 __do_sys_sendto net/socket.c:2188 [inline] __se_sys_sendto net/socket.c:2184 [inline] __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 It is because VLAN not yet supported in hsr driver. Return error when protocol is ETH_P_8021Q in fill_frame_info() now to fix it. Fixes: 451d8123f897 ("net: prp: add packet handling support") Reported-by: syzbot+bf7e6250c7ce248f3ec9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bf7e6250c7ce248f3ec9 Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in ↵Guangguan Wang
smcr_port_add While doing smcr_port_add, there maybe linkgroup add into or delete from smc_lgr_list.list at the same time, which may result kernel crash. So, use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add. The crash calltrace show below: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014 Workqueue: events smc_ib_port_event_work [smc] RIP: 0010:smcr_port_add+0xa6/0xf0 [smc] RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297 RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918 R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4 R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08 FS: 0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0 PKRU: 55555554 Call Trace: smc_ib_port_event_work+0x18f/0x380 [smc] process_one_work+0x19b/0x340 worker_thread+0x30/0x370 ? process_one_work+0x340/0x340 kthread+0x114/0x130 ? __kthread_cancel_work+0x50/0x50 ret_from_fork+0x1f/0x30 Fixes: 1f90a05d9ff9 ("net/smc: add smcr_port_add() and smcr_link_up() processing") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10net/smc: bugfix for smcr v2 server connect success statisticGuangguan Wang
In the macro SMC_STAT_SERV_SUCC_INC, the smcd_version is used to determin whether to increase the v1 statistic or the v2 statistic. It is correct for SMCD. But for SMCR, smcr_version should be used. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-08net: ipv4: fix one memleak in __inet_del_ifa()Liu Jian
I got the below warning when do fuzzing test: unregister_netdevice: waiting for bond0 to become free. Usage count = 2 It can be repoduced via: ip link add bond0 type bond sysctl -w net.ipv4.conf.bond0.promote_secondaries=1 ip addr add 4.117.174.103/0 scope 0x40 dev bond0 ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0 ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0 ip addr del 4.117.174.103/0 scope 0x40 dev bond0 ip link delete bond0 type bond In this reproduction test case, an incorrect 'last_prim' is found in __inet_del_ifa(), as a result, the secondary address(0.0.0.4/0 scope 0x40) is lost. The memory of the secondary address is leaked and the reference of in_device and net_device is leaked. Fix this problem: Look for 'last_prim' starting at location of the deleted IP and inserting the promoted IP into the location of 'last_prim'. Fixes: 0ff60a45678e ("[IPV4]: Fix secondary IP addresses after promotion") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-07Merge tag 'net-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking updates from Jakub Kicinski: "Including fixes from netfilter and bpf. Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream" * tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs() Revert "net: team: do not use dynamic lockdep key" net: hns3: remove GSO partial feature bit net: hns3: fix the port information display when sfp is absent net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue net: hns3: fix debugfs concurrency issue between kfree buffer and read net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read() net: hns3: Support query tx timeout threshold by debugfs net: hns3: fix tx timeout issue net: phy: Provide Module 4 KSZ9477 errata (DS80000754C) netfilter: nf_tables: Unbreak audit log reset netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID netfilter: nfnetlink_osf: avoid OOB read netfilter: nftables: exthdr: fix 4-byte stack OOB write selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc bpf: bpf_sk_storage: Fix invalid wait context lockdep report s390/bpf: Pass through tail call counter in trampolines ...
2023-09-08netfilter: nft_set_hash: try later when GC hits EAGAIN on iterationPablo Neira Ayuso
Skip GC run if iterator rewinds to the beginning with EAGAIN, otherwise GC might collect the same element more than once. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>