summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-10-13net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not setMartin KaFai Lau
It was reported that there is a compiler warning on the unused variable "sin_addr_len" in af_inet.c when CONFIG_CGROUP_BPF is not set. This patch is to address it similar to the ipv6 counterpart in inet6_getname(). It is to "return sin_addr_len;" instead of "return sizeof(*sin);". Fixes: fefba7d1ae19 ("bpf: Propagate modified uaddrlen from cgroup sockaddr programs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/bpf/20231013185702.3993710-1-martin.lau@linux.dev Closes: https://lore.kernel.org/bpf/20231013114007.2fb09691@canb.auug.org.au/
2023-10-11bpf: Implement cgroup sockaddr hooks for unix socketsDaan De Meyer
These hooks allows intercepting connect(), getsockname(), getpeername(), sendmsg() and recvmsg() for unix sockets. The unix socket hooks get write access to the address length because the address length is not fixed when dealing with unix sockets and needs to be modified when a unix socket address is modified by the hook. Because abstract socket unix addresses start with a NUL byte, we cannot recalculate the socket address in kernelspace after running the hook by calculating the length of the unix socket path using strlen(). These hooks can be used when users want to multiplex syscall to a single unix socket to multiple different processes behind the scenes by redirecting the connect() and other syscalls to process specific sockets. We do not implement support for intercepting bind() because when using bind() with unix sockets with a pathname address, this creates an inode in the filesystem which must be cleaned up. If we rewrite the address, the user might try to clean up the wrong file, leaking the socket in the filesystem where it is never cleaned up. Until we figure out a solution for this (and a use case for intercepting bind()), we opt to not allow rewriting the sockaddr in bind() calls. We also implement recvmsg() support for connected streams so that after a connect() that is modified by a sockaddr hook, any corresponding recmvsg() on the connected socket can also be modified to make the connected program think it is connected to the "intended" remote. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpfDaan De Meyer
As prep for adding unix socket support to the cgroup sockaddr hooks, let's add a kfunc bpf_sock_addr_set_sun_path() that allows modifying a unix sockaddr from bpf. While this is already possible for AF_INET and AF_INET6, we'll need this kfunc when we add unix socket support since modifying the address for those requires modifying both the address and the sockaddr length. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-4-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11bpf: Propagate modified uaddrlen from cgroup sockaddr programsDaan De Meyer
As prep for adding unix socket support to the cgroup sockaddr hooks, let's propagate the sockaddr length back to the caller after running a bpf cgroup sockaddr hook program. While not important for AF_INET or AF_INET6, the sockaddr length is important when working with AF_UNIX sockaddrs as the size of the sockaddr cannot be determined just from the address family or the sockaddr's contents. __cgroup_bpf_run_filter_sock_addr() is modified to take the uaddrlen as an input/output argument. After running the program, the modified sockaddr length is stored in the uaddrlen pointer. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-3-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-09bpf: Derive source IP addr via bpf_*_fib_lookup()Martynas Pumputis
Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address: struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT; /* the p.ipv4_src now contains the source address */ The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21Merge tag 'net-6.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE - netfilter: fix entries val in rule reset audit log - eth: stmmac: fix incorrect rxq|txq_stats reference Previous releases - regressions: - ipv4: fix null-deref in ipv4_link_failure - netfilter: - fix several GC related issues - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP - eth: team: fix null-ptr-deref when team device type is changed - eth: i40e: fix VF VLAN offloading when port VLAN is configured - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB Previous releases - always broken: - core: fix ETH_P_1588 flow dissector - mptcp: fix several connection hang-up conditions - bpf: - avoid deadlock when using queue and stack maps from NMI - add override check to kprobe multi link attach - hsr: properly parse HSRv1 supervisor frames. - eth: igc: fix infinite initialization loop with early XDP redirect - eth: octeon_ep: fix tx dma unmap len values in SG - eth: hns3: fix GRE checksum offload issue" * tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast() igc: Expose tx-usecs coalesce setting to user octeontx2-pf: Do xdp_do_flush() after redirects. bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI net: ena: Flush XDP packets on error. net/handshake: Fix memory leak in __sock_create() and sock_alloc_file() net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev' netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP netfilter: nf_tables: fix memleak when more than 255 elements expired netfilter: nf_tables: disable toggling dormant table state more than once vxlan: Add missing entries to vxlan_get_size() net: rds: Fix possible NULL-pointer dereference team: fix null-ptr-deref when team device type is changed net: bridge: use DEV_STATS_INC() net: hns3: add 5ms delay before clear firmware reset irq source net: hns3: fix fail to delete tc flower rules during reset issue net: hns3: only enable unicast promisc when mac table full net: hns3: fix GRE checksum offload issue net: hns3: add cmdq check for vf periodic service task net: stmmac: fix incorrect rxq|txq_stats reference ...
2023-09-21vsock/virtio: MSG_ZEROCOPY flag supportArseniy Krasnov
This adds handling of MSG_ZEROCOPY flag on transmission path: 1) If this flag is set and zerocopy transmission is possible (enabled in socket options and transport allows zerocopy), then non-linear skb will be created and filled with the pages of user's buffer. Pages of user's buffer are locked in memory by 'get_user_pages()'. 2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. Reason of this change is that '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so to decrease this field correctly, proper skb destructor is needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'. 3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'. If this callback is set, then transport needs extra check to be able to send provided number of buffers in zerocopy mode. Currently, the only transport that needs this callback set is virtio, because this transport adds new buffers to the virtio queue and we need to check, that number of these buffers is less than size of the queue (it is required by virtio spec). vhost and loopback transports don't need this check. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21vsock/virtio: non-linear skb handling for tapArseniy Krasnov
For tap device new skb is created and data from the current skb is copied to it. This adds copying data from non-linear skb to new the skb. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21vsock/virtio: support to send non-linear skbArseniy Krasnov
For non-linear skb use its pages from fragment array as buffers in virtio tx queue. These pages are already pinned by 'get_user_pages()' during such skb creation. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21vsock/virtio/vhost: read data from non-linear skbArseniy Krasnov
This is preparation patch for MSG_ZEROCOPY support. It adds handling of non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with 'skb_copy_datagram_iter()'. Main advantage of the second one is that it can handle paged part of the skb by using 'kmap()' on each page, but if there are no pages in the skb, it behaves like simple copying to iov iterator. This patch also adds new field to the control block of skb - this value shows current offset in the skb to read next portion of data (it doesn't matter linear it or not). Idea behind this field is that 'skb_copy_datagram_iter()' handles both types of skb internally - it just needs an offset from which to copy data from the given skb. This offset is incremented on each read from skb. This approach allows to simplify handling of both linear and non-linear skbs, because for linear skb we need to call 'skb_pull()' after reading data from it, while in non-linear case we need to update 'data_len'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21Merge tag 'nf-23-09-20' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter updates for net The following three patches fix regressions in the netfilter subsystem: 1. Reject attempts to repeatedly toggle the 'dormant' flag in a single transaction. Doing so makes nf_tables lose track of the real state vs. the desired state. This ends with an attempt to unregister hooks that were never registered in the first place, which yields a splat. 2. Fix element counting in the new nftables garbage collection infra that came with 6.5: More than 255 expired elements wraps a counter which results in memory leak. 3. Since 6.4 ipset can BUG when a set is renamed while a CREATE command is in progress, fix from Jozsef Kadlecsik. * tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP netfilter: nf_tables: fix memleak when more than 255 elements expired netfilter: nf_tables: disable toggling dormant table state more than once ==================== Link: https://lore.kernel.org/r/20230920084156.4192-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-20net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()Jinjie Ruan
When making CONFIG_DEBUG_KMEMLEAK=y and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, modprobe handshake-test and then rmmmod handshake-test, the below memory leak is detected. The struct socket_alloc which is allocated by alloc_inode_sb() in __sock_create() is not freed. And the struct dentry which is allocated by __d_alloc() in sock_alloc_file() is not freed. Since fput() will call file->f_op->release() which is sock_close() here and it will call __sock_release(). and fput() will call dput(dentry) to free the struct dentry. So replace sock_release() with fput() to fix the below memory leak. After applying this patch, the following memory leak is never detected. unreferenced object 0xffff888109165840 (size 768): comm "kunit_try_catch", pid 1852, jiffies 4294685807 (age 976.262s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa0209ba2>] 0xffffffffa0209ba2 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810f472008 (size 192): comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 08 20 47 0f 81 88 ff ff ......... G..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0209bbb>] 0xffffffffa0209bbb [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810958e580 (size 224): comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0209bbb>] 0xffffffffa0209bbb [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926dc88 (size 192): comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 88 dc 26 09 81 88 ff ff ..........&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208fdc>] 0xffffffffa0208fdc [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810a241380 (size 224): comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208fdc>] 0xffffffffa0208fdc [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109165040 (size 768): comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa0208860>] 0xffffffffa0208860 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926d568 (size 192): comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 68 d5 26 09 81 88 ff ff ........h.&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208879>] 0xffffffffa0208879 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810a240580 (size 224): comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.347s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208879>] 0xffffffffa0208879 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109164c40 (size 768): comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa0208541>] 0xffffffffa0208541 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926cd18 (size 192): comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 18 cd 26 09 81 88 ff ff ..........&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa020855a>] 0xffffffffa020855a [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810a240200 (size 224): comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa020855a>] 0xffffffffa020855a [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109164840 (size 768): comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa02093e2>] 0xffffffffa02093e2 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926cab8 (size 192): comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 b8 ca 26 09 81 88 ff ff ..........&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa02093fb>] 0xffffffffa02093fb [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810a240040 (size 224): comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa02093fb>] 0xffffffffa02093fb [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109166440 (size 768): comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa02097c1>] 0xffffffffa02097c1 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926c398 (size 192): comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 98 c3 26 09 81 88 ff ff ..........&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa02097da>] 0xffffffffa02097da [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888107e0b8c0 (size 224): comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa02097da>] 0xffffffffa02097da [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109164440 (size 768): comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.487s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa020824e>] 0xffffffffa020824e [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810f4cf698 (size 192): comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 98 f6 4c 0f 81 88 ff ff ..........L..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208267>] 0xffffffffa0208267 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888107e0b000 (size 224): comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208267>] 0xffffffffa0208267 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 Fixes: 88232ec1ec5e ("net/handshake: Add Kunit tests for the handshake consumer API") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-20wifi: cfg80211: make read-only array centers_80mhz static constColin Ian King
Don't populate the read-only array lanes on the stack, instead make it static const. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-20netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAPJozsef Kadlecsik
Kyle Zeng reported that there is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP in netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on a wrong `set`, triggering the `BUG_ON(set->ref == 0);` check in it. The race is caused by using the wrong reference counter, i.e. the ref counter instead of ref_netlink. Fixes: 24e227896bbf ("netfilter: ipset: Add schedule point in call_ad().") Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Closes: https://lore.kernel.org/netfilter-devel/ZPZqetxOmH+w%2Fmyc@westworld/#r Tested-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-20netfilter: nf_tables: fix memleak when more than 255 elements expiredFlorian Westphal
When more than 255 elements expired we're supposed to switch to a new gc container structure. This never happens: u8 type will wrap before reaching the boundary and nft_trans_gc_space() always returns true. This means we recycle the initial gc container structure and lose track of the elements that came before. While at it, don't deref 'gc' after we've passed it to call_rcu. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-20netfilter: nf_tables: disable toggling dormant table state more than onceFlorian Westphal
nft -f -<<EOF add table ip t add table ip t { flags dormant; } add chain ip t c { type filter hook input priority 0; } add table ip t EOF Triggers a splat from nf core on next table delete because we lose track of right hook register state: WARNING: CPU: 2 PID: 1597 at net/netfilter/core.c:501 __nf_unregister_net_hook RIP: 0010:__nf_unregister_net_hook+0x41b/0x570 nf_unregister_net_hook+0xb4/0xf0 __nf_tables_unregister_hook+0x160/0x1d0 [..] The above should have table in *active* state, but in fact no hooks were registered. Reject on/off/on games rather than attempting to fix this. Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Reported-by: "Lee, Cherie-Anne" <cherie.lee@starlabs.sg> Cc: Bing-Jhong Billy Jheng <billy@starlabs.sg> Cc: info@starlabs.sg Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-20net: rds: Fix possible NULL-pointer dereferenceArtem Chernyshev
In rds_rdma_cm_event_handler_cmn() check, if conn pointer exists before dereferencing it as rdma_set_service_type() argument Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: fd261ce6a30e ("rds: rdma: update rdma transport for tos") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-19ipv6: lockless IPV6_ADDR_PREFERENCES implementationEric Dumazet
We have data-races while reading np->srcprefs Switch the field to a plain byte, add READ_ONCE() and WRITE_ONCE() annotations where needed, and IPV6_ADDR_PREFERENCES setsockopt() can now be lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230918142321.1794107-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-19net: bridge: use DEV_STATS_INC()Eric Dumazet
syzbot/KCSAN reported data-races in br_handle_frame_finish() [1] This function can run from multiple cpus without mutual exclusion. Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. Handles updates to dev->stats.tx_dropped while we are at it. [1] BUG: KCSAN: data-race in br_handle_frame_finish / br_handle_frame_finish read-write to 0xffff8881374b2178 of 8 bytes by interrupt on cpu 1: br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189 br_nf_hook_thresh+0x1ed/0x220 br_nf_pre_routing_finish_ipv6+0x50f/0x540 NF_HOOK include/linux/netfilter.h:304 [inline] br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178 br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508 nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline] nf_hook_bridge_pre net/bridge/br_input.c:272 [inline] br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417 __netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417 __netif_receive_skb_one_core net/core/dev.c:5521 [inline] __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637 process_backlog+0x21f/0x380 net/core/dev.c:5965 __napi_poll+0x60/0x3b0 net/core/dev.c:6527 napi_poll net/core/dev.c:6594 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6727 __do_softirq+0xc1/0x265 kernel/softirq.c:553 run_ksoftirqd+0x17/0x20 kernel/softirq.c:921 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 read-write to 0xffff8881374b2178 of 8 bytes by interrupt on cpu 0: br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189 br_nf_hook_thresh+0x1ed/0x220 br_nf_pre_routing_finish_ipv6+0x50f/0x540 NF_HOOK include/linux/netfilter.h:304 [inline] br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178 br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508 nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline] nf_hook_bridge_pre net/bridge/br_input.c:272 [inline] br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417 __netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417 __netif_receive_skb_one_core net/core/dev.c:5521 [inline] __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637 process_backlog+0x21f/0x380 net/core/dev.c:5965 __napi_poll+0x60/0x3b0 net/core/dev.c:6527 napi_poll net/core/dev.c:6594 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6727 __do_softirq+0xc1/0x265 kernel/softirq.c:553 do_softirq+0x5e/0x90 kernel/softirq.c:454 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] batadv_tt_local_purge+0x1a8/0x1f0 net/batman-adv/translation-table.c:1356 batadv_tt_purge+0x2b/0x630 net/batman-adv/translation-table.c:3560 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 value changed: 0x00000000000d7190 -> 0x00000000000d7191 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 14848 Comm: kworker/u4:11 Not tainted 6.6.0-rc1-syzkaller-00236-gad8a69f361b9 #0 Fixes: 1c29fc4989bc ("[BRIDGE]: keep track of received multicast packets") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230918091351.1356153-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-18Merge tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "Various O_DIRECT related fixes from Trond: - Error handling - Locking issues - Use the correct commit info for joining page groups - Fixes for rescheduling IO Sunrpc bad verifier fixes: - Report EINVAL errors from connect() - Revalidate creds that the server has rejected - Revert "SUNRPC: Fail faster on bad verifier" Misc: - Fix pNFS session trunking when MDS=DS - Fix zero-value filehandles for post-open getattr operations - Fix compiler warning about tautological comparisons - Revert 'SUNRPC: clean up integer overflow check' before Trond's fix" * tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Silence compiler complaints about tautological comparisons Revert "SUNRPC: clean up integer overflow check" NFSv4.1: fix zero value filehandle in post open getattr NFSv4.1: fix pnfs MDS=DS session trunking Revert "SUNRPC: Fail faster on bad verifier" SUNRPC: Mark the cred for revalidation if the server rejects it NFS/pNFS: Report EINVAL errors from connect() to the server NFS: More fixes for nfs_direct_write_reschedule_io() NFS: Use the correct commit info in nfs_join_page_group() NFS: More O_DIRECT accounting fixes for error paths NFS: Fix O_DIRECT locking issues NFS: Fix error handling for O_DIRECT write scheduling
2023-09-18ax25: Kconfig: Update link for linux-ax25.orgPeter Lafreniere
http://linux-ax25.org has been down for nearly a year. Its official replacement is https://linux-ax25.in-berlin.de. Change all references to the old site in the ax25 Kconfig to its replacement. Link: https://marc.info/?m=166792551600315 Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18mptcp: fix dangling connection hang-upPaolo Abeni
According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d114d57 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/430 Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18mptcp: rename timer related helper to less confusing namesPaolo Abeni
The msk socket uses to different timeout to track close related events and retransmissions. The existing helpers do not indicate clearly which timer they actually touch, making the related code quite confusing. Change the existing helpers name to avoid such confusion. No functional change intended. This patch is linked to the next one ("mptcp: fix dangling connection hang-up"). The two patches are supposed to be backported together. Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18mptcp: process pending subflow error on closePaolo Abeni
On incoming TCP reset, subflow closing could happen before error propagation. That in turn could cause the socket error being ignored, and a missing socket state transition, as reported by Daire-Byrne. Address the issues explicitly checking for subflow socket error at close time. To avoid code duplication, factor-out of __mptcp_error_report() a new helper implementing the relevant bits. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/429 Fixes: 15cc10453398 ("mptcp: deliver ssk errors to msk") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18mptcp: move __mptcp_error_report in protocol.cPaolo Abeni
This will simplify the next patch ("mptcp: process pending subflow error on close"). No functional change intended. Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18mptcp: fix bogus receive window shrinkage with multiple subflowsPaolo Abeni
In case multiple subflows race to update the mptcp-level receive window, the subflow losing the race should use the window value provided by the "winning" subflow to update it's own tcp-level rcv_wnd. To such goal, the current code bogusly uses the mptcp-level rcv_wnd value as observed before the update attempt. On unlucky circumstances that may lead to TCP-level window shrinkage, and stall the other end. Address the issue feeding to the rcv wnd update the correct value. Fixes: f3589be0c420 ("mptcp: never shrink offered window") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/427 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18ceph: Annotate struct ceph_monmap with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ceph_monmap. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18tipc: Use size_add() in calls to struct_size()Gustavo A. R. Silva
If, for any reason, the open-coded arithmetic causes a wraparound, the protection that `struct_size()` adds against potential integer overflows is defeated. Fix this by hardening call to `struct_size()` with `size_add()`. Fixes: e034c6d23bc4 ("tipc: Use struct_size() helper") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18tls: Use size_add() in call to struct_size()Gustavo A. R. Silva
If, for any reason, the open-coded arithmetic causes a wraparound, the protection that `struct_size()` adds against potential integer overflows is defeated. Fix this by hardening call to `struct_size()` with `size_add()`. Fixes: b89fec54fd61 ("tls: rx: wrap decrypt params in a struct") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18net: hsr: Add __packed to struct hsr_sup_tlv.Sebastian Andrzej Siewior
Struct hsr_sup_tlv describes HW layout and therefore it needs a __packed attribute to ensure the compiler does not add any padding. Due to the size and __packed attribute of the structs that use hsr_sup_tlv it has no functional impact. Add __packed to struct hsr_sup_tlv. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18net: hsr: Properly parse HSRv1 supervisor frames.Lukasz Majewski
While adding support for parsing the redbox supervision frames, the author added `pull_size' and `total_pull_size' to track the amount of bytes that were pulled from the skb during while parsing the skb so it can be reverted/ pushed back at the end. In the process probably copy&paste error occurred and for the HSRv1 case the ethhdr was used instead of the hsr_tag. Later the hsr_tag was used instead of hsr_sup_tag. The later error didn't matter because both structs have the size so HSRv0 was still working. It broke however HSRv1 parsing because struct ethhdr is larger than struct hsr_tag. Reinstate the old pulling flow and pull first ethhdr, hsr_tag in v1 case followed by hsr_sup_tag. [bigeasy: commit message] Fixes: eafaa88b3eb7 ("net: hsr: Add support for redbox supervision frames")' Suggested-by: Tristram.Ha@microchip.com Signed-off-by: Lukasz Majewski <lukma@denx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18dccp: fix dccp_v4_err()/dccp_v6_err() againEric Dumazet
dh->dccph_x is the 9th byte (offset 8) in "struct dccp_hdr", not in the "byte 7" as Jann claimed. We need to make sure the ICMP messages are big enough, using more standard ways (no more assumptions). syzbot reported: BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2667 [inline] BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2681 [inline] BUG: KMSAN: uninit-value in dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94 pskb_may_pull_reason include/linux/skbuff.h:2667 [inline] pskb_may_pull include/linux/skbuff.h:2681 [inline] dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94 icmpv6_notify+0x4c7/0x880 net/ipv6/icmp.c:867 icmpv6_rcv+0x19d5/0x30d0 ip6_protocol_deliver_rcu+0xda6/0x2a60 net/ipv6/ip6_input.c:438 ip6_input_finish net/ipv6/ip6_input.c:483 [inline] NF_HOOK include/linux/netfilter.h:304 [inline] ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492 ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x5db/0x870 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:304 [inline] ipv6_rcv+0xda/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5523 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5637 netif_receive_skb_internal net/core/dev.c:5723 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5782 tun_rx_batched+0x83b/0x920 tun_get_user+0x564c/0x6940 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:1985 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x15c0 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Uninit was created at: slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:559 __alloc_skb+0x318/0x740 net/core/skbuff.c:650 alloc_skb include/linux/skbuff.h:1286 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6313 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2795 tun_alloc_skb drivers/net/tun.c:1531 [inline] tun_get_user+0x23cf/0x6940 drivers/net/tun.c:1846 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:1985 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x15c0 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd CPU: 0 PID: 4995 Comm: syz-executor153 Not tainted 6.6.0-rc1-syzkaller-00014-ga747acc0b752 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Fixes: 977ad86c2a1b ("dccp: Fix out of bounds access in DCCP error handler") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jann Horn <jannh@google.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18ncsi: Propagate carrier gain/loss events to the NCSI controllerJohnathan Mantey
Report the carrier/no-carrier state for the network interface shared between the BMC and the passthrough channel. Without this functionality the BMC is unable to reconfigure the NIC in the event of a re-cabling to a different subnet. Signed-off-by: Johnathan Mantey <johnathanx.mantey@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17ipv4: fix null-deref in ipv4_link_failureKyle Zeng
Currently, we assume the skb is associated with a device before calling __ip_options_compile, which is not always the case if it is re-routed by ipvs. When skb->dev is NULL, dev_net(skb->dev) will become null-dereference. This patch adds a check for the edge case and switch to use the net_device from the rtable when skb->dev is NULL. Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Cc: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your *net-next* tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 79 files changed, 5275 insertions(+), 600 deletions(-). The main changes are: 1) Basic BTF validation in libbpf, from Andrii Nakryiko. 2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi. 3) next_thread cleanups, from Oleg Nesterov. 4) Add mcpu=v4 support to arm32, from Puranjay Mohan. 5) Add support for __percpu pointers in bpf progs, from Yonghong Song. 6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang. 7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman", Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle), Song Liu, Stanislav Fomichev, Yonghong Song ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: introduce possibility to expose info about nested devlinksJiri Pirko
In mlx5, there is a devlink instance created for PCI device. Also, one separate devlink instance is created for auxiliary device that represents the netdev of uplink port. This relation is currently invisible to the devlink user. Benefit from the rel infrastructure and allow for nested devlink instance to set the relationship for the nested-in devlink instance. Note that there may be many nested instances, therefore use xarray to hold the list of rel_indexes for individual nested instances. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: convert linecard nested devlink to new rel infrastructureJiri Pirko
Benefit from the newly introduced rel infrastructure, treat the linecard nested devlink instances in the same way as port function instances. Convert the code to use the rel infrastructure. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: expose peer SF devlink instanceJiri Pirko
Introduce a new helper devl_port_fn_devlink_set() to be used by driver assigning a devlink instance to the peer devlink port function. Expose this to user over new netlink attribute nested under port function nest to expose devlink handle related to the port function. This is particularly helpful for user to understand the relationship between devlink instances created for SFs and the port functions they belong to. Note that caller of devlink_port_notify() needs to hold devlink instance lock, put the assertion to devl_port_fn_devlink_set() to make this requirement explicit. Also note the limitations that only allow to make this assignment for registered objects. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: introduce object and nested devlink relationship infraJiri Pirko
It is a bit tricky to maintain relationship between devlink objects and nested devlink instances due to following aspects: 1) Locking. It is necessary to lock the devlink instance that contains the object first, only after that to lock the nested instance. 2) Lifetimes. Objects (e.g devlink port) may be removed before the nested devlink instance. 3) Notifications. If nested instance changes (e.g. gets registered/unregistered) the nested-in object needs to send appropriate notifications. Resolve this by introducing an xarray that holds 1:1 relationships between devlink object and related nested devlink instance. Use that xarray index to get the object/nested devlink instance on the other side. Provide necessary helpers: devlink_rel_nested_in_add/clear() to add and clear the relationship. devlink_rel_nested_in_notify() to call the nested-in object to send notifications during nested instance register/unregister/netns change. devlink_rel_devlink_handle_put() to be used by nested-in object fill function to fill the nested handle. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: extend devlink_nl_put_nested_handle() with attrtype argJiri Pirko
As the next patch is going to call this helper with need to fill another type of nested attribute, pass it over function arg. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: move devlink_nl_put_nested_handle() into netlink.cJiri Pirko
As the next patch is going to call this helper out of the linecard.c, move to netlink.c. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: put netnsid to nested handleJiri Pirko
If netns of devlink instance and nested devlink instance differs, put netnsid attr to indicate that. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: move linecard struct into linecard.cJiri Pirko
Instead of exposing linecard struct, expose a simple helper to get the linecard index, which is all is needed outside linecard.c. Move the linecard struct to linecard.c and keep it private similar to the rest of the devlink objects. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17netdev: expose DPLL pin handle for netdeviceJiri Pirko
In case netdevice represents a SyncE port, the user needs to understand the connection between netdevice and associated DPLL pin. There might me multiple netdevices pointing to the same pin, in case of VF/SF implementation. Add a IFLA Netlink attribute to nest the DPLL pin handle, similar to how it is implemented for devlink port. Add a struct dpll_pin pointer to netdev and protect access to it by RTNL. Expose netdev_dpll_pin_set() and netdev_dpll_pin_clear() helpers to the drivers so they can set/clear the DPLL pin relationship to netdev. Note that during the lifetime of struct dpll_pin the pin handle does not change. Therefore it is save to access it lockless. It is drivers responsibility to call netdev_dpll_pin_clear() before dpll_pin_put(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16tcp: new TCP_INFO stats for RTO eventsAananth V
The 2023 SIGCOMM paper "Improving Network Availability with Protective ReRoute" has indicated Linux TCP's RTO-triggered txhash rehashing can effectively reduce application disruption during outages. To better measure the efficacy of this feature, this patch adds three more detailed stats during RTO recovery and exports via TCP_INFO. Applications and monitoring systems can leverage this data to measure the network path diversity and end-to-end repair latency during network outages to improve their network infrastructure. The following counters are added to tcp_sock in order to track RTO events over the lifetime of a TCP socket. 1. u16 total_rto - Counts the total number of RTO timeouts. 2. u16 total_rto_recoveries - Counts the total number of RTO recoveries. 3. u32 total_rto_time - Counts the total time spent (ms) in RTO recoveries. (time spent in CA_Loss and CA_Recovery states) To compute total_rto_time, we add a new u32 rto_stamp field to tcp_sock. rto_stamp records the start timestamp (ms) of the last RTO recovery (CA_Loss). Corresponding fields are also added to the tcp_info struct. Signed-off-by: Aananth V <aananthv@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16tcp: call tcp_try_undo_recovery when an RTOd TFO SYNACK is ACKedAananth V
For passive TCP Fast Open sockets that had SYN/ACK timeout and did not send more data in SYN_RECV, upon receiving the final ACK in 3WHS, the congestion state may awkwardly stay in CA_Loss mode unless the CA state was undone due to TCP timestamp checks. However, if tcp_rcv_synrecv_state_fastopen() decides not to undo, then we should enter CA_Open, because at that point we have received an ACK covering the retransmitted SYNACKs. Currently, the icsk_ca_state is only set to CA_Open after we receive an ACK for a data-packet. This is because tcp_ack does not call tcp_fastretrans_alert (and tcp_process_loss) if !prior_packets Note that tcp_process_loss() calls tcp_try_undo_recovery(), so having tcp_rcv_synrecv_state_fastopen() decide that if we're in CA_Loss we should call tcp_try_undo_recovery() is consistent with that, and low risk. Fixes: dad8cea7add9 ("tcp: fix TFO SYNACK undo to avoid double-timestamp-undo") Signed-off-by: Aananth V <aananthv@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16net: core: Use the bitmap API to allocate bitmapsAndy Shevchenko
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them. It is less verbose and it improves the type checking and semantic. While at it, add missing header inclusion (should be bitops.h, but with the above change it becomes bitmap.h). Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230911154534.4174265-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your *net* tree. We've added 21 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 450 insertions(+), 36 deletions(-). The main changes are: 1) Adjust bpf_mem_alloc buckets to match ksize(), from Hou Tao. 2) Check whether override is allowed in kprobe mult, from Jiri Olsa. 3) Fix btf_id symbol generation with ld.lld, from Jiri and Nick. 4) Fix potential deadlock when using queue and stack maps from NMI, from Toke Høiland-Jørgensen. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Biju Das, Björn Töpel, Dan Carpenter, Daniel Borkmann, Eduard Zingerman, Hsin-Wei Hung, Marcus Seyfarth, Nathan Chancellor, Satya Durga Srinivasu Prabhala, Song Liu, Stephen Rothwell ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16net: add truesize debug checks in skb_{add|coalesce}_rx_frag()Eric Dumazet
It can be time consuming to track driver bugs, that might be detected too late from this confusing warning in skb_try_coalesce() WARN_ON_ONCE(delta < len); Add sanity check in skb_add_rx_frag() and skb_coalesce_rx_frag() to better track bug origin for CONFIG_DEBUG_NET=y builds. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>