summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-03-17net: dsa: Add missing of_node_put() in dsa_port_parse_ofMiaoqian Lin
The device_node pointer is returned by of_parse_phandle() with refcount incremented. We should use of_node_put() on it when done. Fixes: 6d4e5c570c2d ("net: dsa: get port type at parse time") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220316082602.10785-1-linmq006@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-16Merge branch 'master' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-03-16 1) Fix a kernel-info-leak in pfkey. From Haimin Zhang. 2) Fix an incorrect check of the return value of ipv6_skip_exthdr. From Sabrina Dubroca. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: esp6: fix check on ipv6_skip_exthdr's return value af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register ==================== Link: https://lore.kernel.org/r/20220316121142.3142336-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14net/packet: fix slab-out-of-bounds access in packet_recvmsg()Eric Dumazet
syzbot found that when an AF_PACKET socket is using PACKET_COPY_THRESH and mmap operations, tpacket_rcv() is queueing skbs with garbage in skb->cb[], triggering a too big copy [1] Presumably, users of af_packet using mmap() already gets correct metadata from the mapped buffer, we can simply make sure to clear 12 bytes that might be copied to user space later. BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline] BUG: KASAN: stack-out-of-bounds in packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489 Write of size 165 at addr ffffc9000385fb78 by task syz-executor233/3631 CPU: 0 PID: 3631 Comm: syz-executor233 Not tainted 5.17.0-rc7-syzkaller-02396-g0b3660695e80 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0xf/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 memcpy+0x39/0x60 mm/kasan/shadow.c:66 memcpy include/linux/fortify-string.h:225 [inline] packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_recvmsg net/socket.c:962 [inline] ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632 ___sys_recvmsg+0x127/0x200 net/socket.c:2674 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fdfd5954c29 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcf8e71e48 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fdfd5954c29 RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000005 RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcf8e71e60 R13: 00000000000f4240 R14: 000000000000c1ff R15: 00007ffcf8e71e54 </TASK> addr ffffc9000385fb78 is located in stack of task syz-executor233/3631 at offset 32 in frame: ____sys_recvmsg+0x0/0x600 include/linux/uio.h:246 this frame has 1 object: [32, 160) 'addr' Memory state around the buggy address: ffffc9000385fa80: 00 04 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 ffffc9000385fb00: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 >ffffc9000385fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 ^ ffffc9000385fc00: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 ffffc9000385fc80: f1 f1 f1 00 f2 f2 f2 00 f2 f2 f2 00 00 00 00 00 ================================================================== Fixes: 0fb375fb9b93 ("[AF_PACKET]: Allow for > 8 byte hardware addresses.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220312232958.3535620-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net coming late in the 5.17-rc process: 1) Revert port remap to mitigate shadowing service ports, this is causing problems in existing setups and this mitigation can be achieved with explicit ruleset, eg. ... tcp sport < 16386 tcp dport >= 32768 masquerade random This patches provided a built-in policy similar to the one described above. 2) Disable register tracking infrastructure in nf_tables. Florian reported two issues: - Existing expressions with no implemented .reduce interface that causes data-store on register should cancel the tracking. - Register clobbering might be possible storing data on registers that are larger than 32-bits. This might lead to generating incorrect ruleset bytecode. These two issues are scheduled to be addressed in the next release cycle. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: disable register tracking Revert "netfilter: conntrack: tag conntracks picked up in local out hook" Revert "netfilter: nat: force port remap to prevent shadowing well-known ports" ==================== Link: https://lore.kernel.org/r/20220312220315.64531-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14esp6: fix check on ipv6_skip_exthdr's return valueSabrina Dubroca
Commit 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr") introduced an incorrect check, which leads to all ESP packets over either TCPv6 or UDPv6 encapsulation being dropped. In this particular case, offset is negative, since skb->data points to the ESP header in the following chain of headers, while skb->network_header points to the IPv6 header: IPv6 | ext | ... | ext | UDP | ESP | ... That doesn't seem to be a problem, especially considering that if we reach esp6_input_done2, we're guaranteed to have a full set of headers available (otherwise the packet would have been dropped earlier in the stack). However, it means that the return value will (intentionally) be negative. We can make the test more specific, as the expected return value of ipv6_skip_exthdr will be the (negated) size of either a UDP header, or a TCP header with possible options. In the future, we should probably either make ipv6_skip_exthdr explicitly accept negative offsets (and adjust its return value for error cases), or make ipv6_skip_exthdr only take non-negative offsets (and audit all callers). Fixes: 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-12netfilter: nf_tables: disable register trackingPablo Neira Ayuso
The register tracking infrastructure is incomplete, it might lead to generating incorrect ruleset bytecode, disable it by now given we are late in the release process. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-11vsock: each transport cycles only on its own socketsJiyong Park
When iterating over sockets using vsock_for_each_connected_socket, make sure that a transport filters out sockets that don't belong to the transport. There actually was an issue caused by this; in a nested VM configuration, destroying the nested VM (which often involves the closing of /dev/vhost-vsock if there was h2g connections to the nested VM) kills not only the h2g connections, but also all existing g2h connections to the (outmost) host which are totally unrelated. Tested: Executed the following steps on Cuttlefish (Android running on a VM) [1]: (1) Enter into an `adb shell` session - to have a g2h connection inside the VM, (2) open and then close /dev/vhost-vsock by `exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb session is not reset. [1] https://android.googlesource.com/device/google/cuttlefish/ Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiyong Park <jiyong@google.com> Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11net: ipv6: fix skb_over_panic in __ip6_append_dataTadeusz Struk
Syzbot found a kernel bug in the ipv6 stack: LINK: https://syzkaller.appspot.com/bug?id=205d6f11d72329ab8d62a610c44c5e7e25415580 The reproducer triggers it by sending a crafted message via sendmmsg() call, which triggers skb_over_panic, and crashes the kernel: skbuff: skb_over_panic: text:ffffffff84647fb4 len:65575 put:65575 head:ffff888109ff0000 data:ffff888109ff0088 tail:0x100af end:0xfec0 dev:<NULL> Update the check that prevents an invalid packet with MTU equal to the fregment header size to eat up all the space for payload. The reproducer can be found here: LINK: https://syzkaller.appspot.com/text?tag=ReproC&x=1648c83fb00000 Reported-by: syzbot+e223cf47ec8ae183f2a0@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20220310232538.1044947-1-tadeusz.struk@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10Merge tag 'net-5.17-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, and ipsec. Current release - regressions: - Bluetooth: fix unbalanced unlock in set_device_flags() - Bluetooth: fix not processing all entries on cmd_sync_work, make connect with qualcomm and intel adapters reliable - Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0" - xdp: xdp_mem_allocator can be NULL in trace_mem_connect() - eth: ice: fix race condition and deadlock during interface enslave Current release - new code bugs: - tipc: fix incorrect order of state message data sanity check Previous releases - regressions: - esp: fix possible buffer overflow in ESP transformation - dsa: unlock the rtnl_mutex when dsa_master_setup() fails - phy: meson-gxl: fix interrupt handling in forced mode - smsc95xx: ignore -ENODEV errors when device is unplugged Previous releases - always broken: - xfrm: fix tunnel mode fragmentation behavior - esp: fix inter address family tunneling on GSO - tipc: fix null-deref due to race when enabling bearer - sctp: fix kernel-infoleak for SCTP sockets - eth: macb: fix lost RX packet wakeup race in NAPI receive - eth: intel stop disabling VFs due to PF error responses - eth: bcmgenet: don't claim WOL when its not available" * tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) xdp: xdp_mem_allocator can be NULL in trace_mem_connect(). ice: Fix race condition during interface enslave net: phy: meson-gxl: improve link-up behavior net: bcmgenet: Don't claim WOL when its not available net: arc_emac: Fix use after free in arc_mdio_probe() sctp: fix kernel-infoleak for SCTP sockets net: phy: correct spelling error of media in documentation net: phy: DP83822: clear MISR2 register to disable interrupts gianfar: ethtool: Fix refcount leak in gfar_get_ts_info selftests: pmtu.sh: Kill nettest processes launched in subshell. selftests: pmtu.sh: Kill tcpdump processes launched by subshell. NFC: port100: fix use-after-free in port100_send_complete net/mlx5e: SHAMPO, reduce TIR indication net/mlx5e: Lag, Only handle events from highest priority multipath entry net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE net/mlx5: Fix a race on command flush flow net/mlx5: Fix size field in bufferx_reg struct ax25: Fix NULL pointer dereference in ax25_kill_by_device net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr net: ethernet: lpc_eth: Handle error for clk_enable ...
2022-03-10xdp: xdp_mem_allocator can be NULL in trace_mem_connect().Sebastian Andrzej Siewior
Since the commit mentioned below __xdp_reg_mem_model() can return a NULL pointer. This pointer is dereferenced in trace_mem_connect() which leads to segfault. The trace points (mem_connect + mem_disconnect) were put in place to pair connect/disconnect using the IDs. The ID is only assigned if __xdp_reg_mem_model() does not return NULL. That connect trace point is of no use if there is no ID. Skip that connect trace point if xdp_alloc is NULL. [ Toke Høiland-Jørgensen delivered the reasoning for skipping the trace point ] Fixes: 4a48ef70b93b8 ("xdp: Allow registering memory model without rxq reference") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/YikmmXsffE+QajTB@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10sctp: fix kernel-infoleak for SCTP socketsEric Dumazet
syzbot reported a kernel infoleak [1] of 4 bytes. After analysis, it turned out r->idiag_expires is not initialized if inet_sctp_diag_fill() calls inet_diag_msg_common_fill() Make sure to clear idiag_timer/idiag_retrans/idiag_expires and let inet_diag_msg_sctpasoc_fill() fill them again if needed. [1] BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:154 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x6ef/0x25a0 lib/iov_iter.c:668 instrument_copy_to_user include/linux/instrumented.h:121 [inline] copyout lib/iov_iter.c:154 [inline] _copy_to_iter+0x6ef/0x25a0 lib/iov_iter.c:668 copy_to_iter include/linux/uio.h:162 [inline] simple_copy_to_iter+0xf3/0x140 net/core/datagram.c:519 __skb_datagram_iter+0x2d5/0x11b0 net/core/datagram.c:425 skb_copy_datagram_iter+0xdc/0x270 net/core/datagram.c:533 skb_copy_datagram_msg include/linux/skbuff.h:3696 [inline] netlink_recvmsg+0x669/0x1c80 net/netlink/af_netlink.c:1977 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] __sys_recvfrom+0x795/0xa10 net/socket.c:2097 __do_sys_recvfrom net/socket.c:2115 [inline] __se_sys_recvfrom net/socket.c:2111 [inline] __x64_sys_recvfrom+0x19d/0x210 net/socket.c:2111 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3247 [inline] __kmalloc_node_track_caller+0xe0c/0x1510 mm/slub.c:4975 kmalloc_reserve net/core/skbuff.c:354 [inline] __alloc_skb+0x545/0xf90 net/core/skbuff.c:426 alloc_skb include/linux/skbuff.h:1158 [inline] netlink_dump+0x3e5/0x16c0 net/netlink/af_netlink.c:2248 __netlink_dump_start+0xcf8/0xe90 net/netlink/af_netlink.c:2373 netlink_dump_start include/linux/netlink.h:254 [inline] inet_diag_handler_cmd+0x2e7/0x400 net/ipv4/inet_diag.c:1341 sock_diag_rcv_msg+0x24a/0x620 netlink_rcv_skb+0x40c/0x7e0 net/netlink/af_netlink.c:2494 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:277 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x1093/0x1360 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x14d9/0x1720 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] sock_write_iter+0x594/0x690 net/socket.c:1061 do_iter_readv_writev+0xa7f/0xc70 do_iter_write+0x52c/0x1500 fs/read_write.c:851 vfs_writev fs/read_write.c:924 [inline] do_writev+0x645/0xe00 fs/read_write.c:967 __do_sys_writev fs/read_write.c:1040 [inline] __se_sys_writev fs/read_write.c:1037 [inline] __x64_sys_writev+0xe5/0x120 fs/read_write.c:1037 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Bytes 68-71 of 2508 are uninitialized Memory access of size 2508 starts at ffff888114f9b000 Data copied to user address 00007f7fe09ff2e0 CPU: 1 PID: 3478 Comm: syz-executor306 Not tainted 5.17.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20220310001145.297371-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10af_key: add __GFP_ZERO flag for compose_sadb_supported in function ↵Haimin Zhang
pfkey_register Add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register to initialize the buffer of supp_skb to fix a kernel-info-leak issue. 1) Function pfkey_register calls compose_sadb_supported to request a sk_buff. 2) compose_sadb_supported calls alloc_sbk to allocate a sk_buff, but it doesn't zero it. 3) If auth_len is greater 0, then compose_sadb_supported treats the memory as a struct sadb_supported and begins to initialize. But it just initializes the field sadb_supported_len and field sadb_supported_exttype without field sadb_supported_reserved. Reported-by: TCS Robot <tcs_robot@tencent.com> Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-09Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-03-09 1) Fix IPv6 PMTU discovery for xfrm interfaces. From Lina Wang. 2) Revert failing for policies and states that are configured with XFRMA_IF_ID 0. It broke a user configuration. From Kai Lueke. 3) Fix a possible buffer overflow in the ESP output path. 4) Fix ESP GSO for tunnel and BEET mode on inter address family tunnels. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09ax25: Fix NULL pointer dereference in ax25_kill_by_deviceDuoming Zhou
When two ax25 devices attempted to establish connection, the requester use ax25_create(), ax25_bind() and ax25_connect() to initiate connection. The receiver use ax25_rcv() to accept connection and use ax25_create_cb() in ax25_rcv() to create ax25_cb, but the ax25_cb->sk is NULL. When the receiver is detaching, a NULL pointer dereference bug caused by sock_hold(sk) in ax25_kill_by_device() will happen. The corresponding fail log is shown below: =============================================================== BUG: KASAN: null-ptr-deref in ax25_device_event+0xfd/0x290 Call Trace: ... ax25_device_event+0xfd/0x290 raw_notifier_call_chain+0x5e/0x70 dev_close_many+0x174/0x220 unregister_netdevice_many+0x1f7/0xa60 unregister_netdevice_queue+0x12f/0x170 unregister_netdev+0x13/0x20 mkiss_close+0xcd/0x140 tty_ldisc_release+0xc0/0x220 tty_release_struct+0x17/0xa0 tty_release+0x62d/0x670 ... This patch add condition check in ax25_kill_by_device(). If s->sk is NULL, it will goto if branch to kill device. Fixes: 4e0f718daf97 ("ax25: improve the incomplete fix to avoid UAF and NPD bugs") Reported-by: Thomas Osterried <thomas@osterried.de> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08tipc: fix incorrect order of state message data sanity checkTung Nguyen
When receiving a state message, function tipc_link_validate_msg() is called to validate its header portion. Then, its data portion is validated before it can be accessed correctly. However, current data sanity check is done after the message header is accessed to update some link variables. This commit fixes this issue by moving the data sanity check to the beginning of state message handling and right after the header sanity check. Fixes: 9aa422ad3266 ("tipc: improve size validations for received domain records") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/20220308021200.9245-1-tung.q.nguyen@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08Revert "netfilter: conntrack: tag conntracks picked up in local out hook"Florian Westphal
This was a prerequisite for the ill-fated "netfilter: nat: force port remap to prevent shadowing well-known ports". As this has been reverted, this change can be backed out too. Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-08Revert "netfilter: nat: force port remap to prevent shadowing well-known ports"Florian Westphal
This reverts commit 878aed8db324bec64f3c3f956e64d5ae7375a5de. This change breaks existing setups where conntrack is used with asymmetric paths. In these cases, the NAT transformation occurs on the syn-ack instead of the syn: 1. SYN x:12345 -> y -> 443 // sent by initiator, receiverd by responder 2. SYNACK y:443 -> x:12345 // First packet seen by conntrack, as sent by responder 3. tuple_force_port_remap() gets called, sees: 'tcp from 443 to port 12345 NAT' -> pick a new source port, inititor receives 4. SYNACK y:$RANDOM -> x:12345 // connection is never established While its possible to avoid the breakage with NOTRACK rules, a kernel update should not break working setups. An alternative to the revert is to augment conntrack to tag mid-stream connections plus more code in the nat core to skip NAT for such connections, however, this leads to more interaction/integration between conntrack and NAT. Therefore, revert, users will need to add explicit nat rules to avoid port shadowing. Link: https://lore.kernel.org/netfilter-devel/20220302105908.GA5852@breakpoint.cc/#R Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2051413 Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-07net: Fix esp GSO on inter address family tunnels.Steffen Klassert
The esp tunnel GSO handlers use skb_mac_gso_segment to push the inner packet to the segmentation handlers. However, skb_mac_gso_segment takes the Ethernet Protocol ID from 'skb->protocol' which is wrong for inter address family tunnels. We fix this by introducing a new skb_eth_gso_segment function. This function can be used if it is necessary to pass the Ethernet Protocol ID directly to the segmentation handler. First users of this function will be the esp4 and esp6 tunnel segmentation handlers. Fixes: c35fe4106b92 ("xfrm: Add mode handlers for IPsec on layer 2") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07esp: Fix BEET mode inter address family tunneling on GSOSteffen Klassert
The xfrm{4,6}_beet_gso_segment() functions did not correctly set the SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 gso types for the address family tunneling case. Fix this by setting these gso types. Fixes: 384a46ea7bdc7 ("esp4: add gso_segment for esp4 beet mode") Fixes: 7f9e40eb18a99 ("esp6: add gso_segment for esp6 beet mode") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07esp: Fix possible buffer overflow in ESP transformationSteffen Klassert
The maximum message size that can be send is bigger than the maximum site that skb_page_frag_refill can allocate. So it is possible to write beyond the allocated buffer. Fix this by doing a fallback to COW in that case. v2: Avoid get get_order() costs as suggested by Linus Torvalds. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Reported-by: valis <sec@valis.email> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07xen/9p: use alloc/free_pages_exact()Juergen Gross
Instead of __get_free_pages() and free_pages() use alloc_pages_exact() and free_pages_exact(). This is in preparation of a change of gnttab_end_foreign_access() which will prohibit use of high-order pages. By using the local variable "order" instead of ring->intf->ring_order in the error path of xen_9pfs_front_alloc_dataring() another bug is fixed, as the error path can be entered before ring->intf->ring_order is being set. By using alloc_pages_exact() the size in bytes is specified for the allocation, which fixes another bug for the case of order < (PAGE_SHIFT - XEN_PAGE_SHIFT). This is part of CVE-2022-23041 / XSA-396. Reported-by: Simon Gaiser <simon@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- V4: - new patch
2022-03-06net: dsa: unlock the rtnl_mutex when dsa_master_setup() failsVladimir Oltean
After the blamed commit, dsa_tree_setup_master() may exit without calling rtnl_unlock(), fix that. Fixes: c146f9bc195a ("net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-06Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"Kai Lueke
This reverts commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 because ID 0 was meant to be used for configuring the policy/state without matching for a specific interface (e.g., Cilium is affected, see https://github.com/cilium/cilium/pull/18789 and https://github.com/cilium/cilium/pull/19019). Signed-off-by: Kai Lueke <kailueke@linux.microsoft.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-04tipc: fix kernel panic when enabling bearerTung Nguyen
When enabling a bearer on a node, a kernel panic is observed: [ 4.498085] RIP: 0010:tipc_mon_prep+0x4e/0x130 [tipc] ... [ 4.520030] Call Trace: [ 4.520689] <IRQ> [ 4.521236] tipc_link_build_proto_msg+0x375/0x750 [tipc] [ 4.522654] tipc_link_build_state_msg+0x48/0xc0 [tipc] [ 4.524034] __tipc_node_link_up+0xd7/0x290 [tipc] [ 4.525292] tipc_rcv+0x5da/0x730 [tipc] [ 4.526346] ? __netif_receive_skb_core+0xb7/0xfc0 [ 4.527601] tipc_l2_rcv_msg+0x5e/0x90 [tipc] [ 4.528737] __netif_receive_skb_list_core+0x20b/0x260 [ 4.530068] netif_receive_skb_list_internal+0x1bf/0x2e0 [ 4.531450] ? dev_gro_receive+0x4c2/0x680 [ 4.532512] napi_complete_done+0x6f/0x180 [ 4.533570] virtnet_poll+0x29c/0x42e [virtio_net] ... The node in question is receiving activate messages in another thread after changing bearer status to allow message sending/ receiving in current thread: thread 1 | thread 2 -------- | -------- | tipc_enable_bearer() | test_and_set_bit_lock() | tipc_bearer_xmit_skb() | | tipc_l2_rcv_msg() | tipc_rcv() | __tipc_node_link_up() | tipc_link_build_state_msg() | tipc_link_build_proto_msg() | tipc_mon_prep() | { | ... | // null-pointer dereference | u16 gen = mon->dom_gen; | ... | } // Not being executed yet | tipc_mon_create() | { | ... | // allocate | mon = kzalloc(); | ... | } | Monitoring pointer in thread 2 is dereferenced before monitoring data is allocated in thread 1. This causes kernel panic. This commit fixes it by allocating the monitoring data before enabling the bearer to receive messages. Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Reported-by: Shuang Li <shuali@redhat.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03Merge tag 'for-net-2022-03-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix regression with processing of MGMT commands - Fix unbalanced unlock in Set Device Flags * tag 'for-net-2022-03-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_sync: Fix not processing all entries on cmd_sync_work Bluetooth: hci_core: Fix unbalanced unlock in set_device_flags() ==================== Link: https://lore.kernel.org/r/20220303210743.314679-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()Eric Dumazet
While investigating on why a synchronize_net() has been added recently in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report() might drop skbs in some cases. Discussion about removing synchronize_net() from ipv6_mc_down() will happen in a different thread. Fixes: f185de28d9ae ("mld: add new workqueues for process mld events") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto changeVladimir Oltean
The blamed commit said one thing but did another. It explains that we should restore the "return err" to the original "goto out_unwind_tagger", but instead it replaced it with "goto out_unlock". When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a multi-switch tree, the switches would end up not using the same tagging protocol. Fixes: 0b0e2ff10356 ("net: dsa: restore error path of dsa_tree_change_tag_proto") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03net: dcb: disable softirqs in dcbnl_flush_dev()Vladimir Oltean
Ido Schimmel points out that since commit 52cff74eef5d ("dcbnl : Disable software interrupts before taking dcb_lock"), the DCB API can be called by drivers from softirq context. One such in-tree example is the chelsio cxgb4 driver: dcb_rpl -> cxgb4_dcb_handle_fw_update -> dcb_ieee_setapp If the firmware for this driver happened to send an event which resulted in a call to dcb_ieee_setapp() at the exact same time as another DCB-enabled interface was unregistering on the same CPU, the softirq would deadlock, because the interrupted process was already holding the dcb_lock in dcbnl_flush_dev(). Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as in the rest of the dcbnl code. Fixes: 91b0383fef06 ("net: dcb: flush lingering app table entries for unregistered devices") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03Bluetooth: hci_sync: Fix not processing all entries on cmd_sync_workLuiz Augusto von Dentz
hci_cmd_sync_queue can be called multiple times, each adding a hci_cmd_sync_work_entry, before hci_cmd_sync_work is run so this makes sure they are all dequeued properly otherwise it creates a backlog of entries that are never run. Link: https://lore.kernel.org/all/CAJCQCtSeUtHCgsHXLGrSTWKmyjaQDbDNpP4rb0i+RE+L2FTXSA@mail.gmail.com/T/ Fixes: 6a98e3836fa20 ("Bluetooth: Add helper for serialized HCI command execution") Tested-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-03-03Bluetooth: hci_core: Fix unbalanced unlock in set_device_flags()Hans de Goede
There is only one "goto done;" in set_device_flags() and this happens *before* hci_dev_lock() is called, move the done label to after the hci_dev_unlock() to fix the following unlock balance: [ 31.493567] ===================================== [ 31.493571] WARNING: bad unlock balance detected! [ 31.493576] 5.17.0-rc2+ #13 Tainted: G C E [ 31.493581] ------------------------------------- [ 31.493584] bluetoothd/685 is trying to release lock (&hdev->lock) at: [ 31.493594] [<ffffffffc07603f5>] set_device_flags+0x65/0x1f0 [bluetooth] [ 31.493684] but there are no more locks to release! Note this bug has been around for a couple of years, but before commit fe92ee6425a2 ("Bluetooth: hci_core: Rework hci_conn_params flags") supported_flags was hardcoded to "((1U << HCI_CONN_FLAG_MAX) - 1)" so the check for unsupported flags which does the "goto done;" never triggered. Fixes: fe92ee6425a2 ("Bluetooth: hci_core: Rework hci_conn_params flags") Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-03-03net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by serverD. Wythe
The problem of SMC_CLC_DECL_ERR_REGRMB on the server is very clear. Based on the fact that whether a new SMC connection can be accepted or not depends on not only the limit of conn nums, but also the available entries of rtoken. Since the rtoken release is trigger by peer, while the conn nums is decrease by local, tons of thing can happen in this time difference. This only thing that needs to be mentioned is that now all connection creations are completely protected by smc_server_lgr_pending lock, it's enough to check only the available entries in rtokens_used_mask. Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by clientD. Wythe
The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB in client dues to following execution sequence: Server Conn A: Server Conn B: Client Conn B: smc_lgr_unregister_conn smc_lgr_register_conn smc_clc_send_accept -> smc_rtoken_add smcr_buf_unuse -> Client Conn A: smc_rtoken_delete smc_lgr_unregister_conn() makes current link available to assigned to new incoming connection, while smcr_buf_unuse() has not executed yet, which means that smc_rtoken_add may fail because of insufficient rtoken_entry, reversing their execution order will avoid this problem. Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-02tcp: make tcp_read_sock() more robustEric Dumazet
If recv_actor() returns an incorrect value, tcp_read_sock() might loop forever. Instead, issue a one time warning and make sure to make progress. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20220302161723.3910001-2-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02bpf, sockmap: Do not ignore orig_len parameterEric Dumazet
Currently, sk_psock_verdict_recv() returns skb->len This is problematic because tcp_read_sock() might have passed orig_len < skb->len, due to the presence of TCP urgent data. This causes an infinite loop from tcp_read_sock() Followup patch will make tcp_read_sock() more robust vs bad actors. Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20220302161723.3910001-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02net: fix up skbs delta_truesize in UDP GRO frag_listlena wang
The truesize for a UDP GRO packet is added by main skb and skbs in main skb's frag_list: skb_gro_receive_list p->truesize += skb->truesize; The commit 53475c5dd856 ("net: fix use-after-free when UDP GRO with shared fraglist") introduced a truesize increase for frag_list skbs. When uncloning skb, it will call pskb_expand_head and trusesize for frag_list skbs may increase. This can occur when allocators uses __netdev_alloc_skb and not jump into __alloc_skb. This flow does not use ksize(len) to calculate truesize while pskb_expand_head uses. skb_segment_list err = skb_unclone(nskb, GFP_ATOMIC); pskb_expand_head if (!skb->sk || skb->destructor == sock_edemux) skb->truesize += size - osize; If we uses increased truesize adding as delta_truesize, it will be larger than before and even larger than previous total truesize value if skbs in frag_list are abundant. The main skb truesize will become smaller and even a minus value or a huge value for an unsigned int parameter. Then the following memory check will drop this abnormal skb. To avoid this error we should use the original truesize to segment the main skb. Fixes: 53475c5dd856 ("net: fix use-after-free when UDP GRO with shared fraglist") Signed-off-by: lena wang <lena.wang@mediatek.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1646133431-8948-1-git-send-email-lena.wang@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02Merge tag 'batadv-net-pullrequest-20220302' of ↵Jakub Kicinski
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Remove redundant iflink requests, by Sven Eckelmann (2 patches) - Don't expect inter-netns unique iflink indices, by Sven Eckelmann * tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge: batman-adv: Don't expect inter-netns unique iflink indices batman-adv: Request iflink once in batadv_get_real_netdevice batman-adv: Request iflink once in batadv-on-batadv check ==================== Link: https://lore.kernel.org/r/20220302163049.101957-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02nl80211: Update bss channel on channel switch for P2P_CLIENTSreeramya Soratkal
The wdev channel information is updated post channel switch only for the station mode and not for the other modes. Due to this, the P2P client still points to the old value though it moved to the new channel when the channel change is induced from the P2P GO. Update the bss channel after CSA channel switch completion for P2P client interface as well. Signed-off-by: Sreeramya Soratkal <quic_ssramya@quicinc.com> Link: https://lore.kernel.org/r/1646114600-31479-1-git-send-email-quic_ssramya@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-02batman-adv: Don't expect inter-netns unique iflink indicesSven Eckelmann
The ifindex doesn't have to be unique for multiple network namespaces on the same machine. $ ip netns add test1 $ ip -net test1 link add dummy1 type dummy $ ip netns add test2 $ ip -net test2 link add dummy2 type dummy $ ip -net test1 link show dev dummy1 6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff $ ip -net test2 link show dev dummy2 6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff But the batman-adv code to walk through the various layers of virtual interfaces uses this assumption because dev_get_iflink handles it internally and doesn't return the actual netns of the iflink. And dev_get_iflink only documents the situation where ifindex == iflink for physical devices. But only checking for dev->netdev_ops->ndo_get_iflink is also not an option because ipoib_get_iflink implements it even when it sometimes returns an iflink != ifindex and sometimes iflink == ifindex. The caller must therefore make sure itself to check both netns and iflink + ifindex for equality. Only when they are equal, a "physical" interface was detected which should stop the traversal. On the other hand, vxcan_get_iflink can also return 0 in case there was currently no valid peer. In this case, it is still necessary to stop. Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface") Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02batman-adv: Request iflink once in batadv_get_real_netdeviceSven Eckelmann
There is no need to call dev_get_iflink multiple times for the same net_device in batadv_get_real_netdevice. And since some of the ndo_get_iflink callbacks are dynamic (for example via RCUs like in vxcan_get_iflink), it could easily happen that the returned values are not stable. The pre-checks before __dev_get_by_index are then of course bogus. Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02batman-adv: Request iflink once in batadv-on-batadv checkSven Eckelmann
There is no need to call dev_get_iflink multiple times for the same net_device in batadv_is_on_batman_iface. And since some of the .ndo_get_iflink callbacks are dynamic (for example via RCUs like in vxcan_get_iflink), it could easily happen that the returned values are not stable. The pre-checks before __dev_get_by_index are then of course bogus. Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-01net: dsa: restore error path of dsa_tree_change_tag_protoVladimir Oltean
When the DSA_NOTIFIER_TAG_PROTO returns an error, the user space process which initiated the protocol change exits the kernel processing while still holding the rtnl_mutex. So any other process attempting to lock the rtnl_mutex would deadlock after such event. The error handling of DSA_NOTIFIER_TAG_PROTO was inadvertently changed by the blamed commit, introducing this regression. We must still call rtnl_unlock(), and we must still call DSA_NOTIFIER_TAG_PROTO for the old protocol. The latter is due to the limiting design of notifier chains for cross-chip operations, which don't have a built-in error recovery mechanism - we should look into using notifier_call_chain_robust for that. Fixes: dc452a471dba ("net: dsa: introduce tagger-owned storage for private and shared data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220228141715.146485-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01Merge tag 'for-net-2022-03-01' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix regression with scanning not working in some systems. * tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: Fix not checking MGMT cmd pending queue ==================== Link: https://lore.kernel.org/r/20220302004330.125536-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01Bluetooth: Fix not checking MGMT cmd pending queueBrian Gix
A number of places in the MGMT handlers we examine the command queue for other commands (in progress but not yet complete) that will interact with the process being performed. However, not all commands go into the queue if one of: 1. There is no negative side effect of consecutive or redundent commands 2. The command is entirely perform "inline". This change examines each "pending command" check, and if it is not needed, deletes the check. Of the remaining pending command checks, we make sure that the command is in the pending queue by using the mgmt_pending_add/mgmt_pending_remove pair rather than the mgmt_pending_new/mgmt_pending_free pair. Link: https://lore.kernel.org/linux-bluetooth/f648f2e11bb3c2974c32e605a85ac3a9fac944f1.camel@redhat.com/T/ Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-03-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Use kfree_rcu(ptr, rcu) variant, using kfree_rcu(ptr) was not intentional. From Eric Dumazet. 2) Use-after-free in netfilter hook core, from Eric Dumazet. 3) Missing rcu read lock side for netfilter egress hook, from Florian Westphal. 4) nf_queue assume state->sk is full socket while it might not be. Invoke sock_gen_put(), from Florian Westphal. 5) Add selftest to exercise the reported KASAN splat in 4) 6) Fix possible use-after-free in nf_queue in case sk_refcnt is 0. Also from Florian. 7) Use input interface index only for hardware offload, not for the software plane. This breaks tc ct action. Patch from Paul Blakey. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: net/sched: act_ct: Fix flow table lookup failure with no originating ifindex netfilter: nf_queue: handle socket prefetch netfilter: nf_queue: fix possible use-after-free selftests: netfilter: add nfqueue TCP_NEW_SYN_RECV socket race test netfilter: nf_queue: don't assume sk is full socket netfilter: egress: silence egress hook lockdep splats netfilter: fix use-after-free in __nf_register_net_hook() netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant ==================== Link: https://lore.kernel.org/r/20220301215337.378405-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01net/sched: act_ct: Fix flow table lookup failure with no originating ifindexPaul Blakey
After cited commit optimizted hw insertion, flow table entries are populated with ifindex information which was intended to only be used for HW offload. This tuple ifindex is hashed in the flow table key, so it must be filled for lookup to be successful. But tuple ifindex is only relevant for the netfilter flowtables (nft), so it's not filled in act_ct flow table lookup, resulting in lookup failure, and no SW offload and no offload teardown for TCP connection FIN/RST packets. To fix this, add new tc ifindex field to tuple, which will only be used for offloading, not for lookup, as it will not be part of the tuple hash. Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tuple iifidx") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-01Merge tag 'wireless-for-net-2022-03-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless johannes Berg says: ==================== Some last-minute fixes: * rfkill - add missing rfill_soft_blocked() when disabled * cfg80211 - handle a nla_memdup() failure correctly - fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo in Makefile * mac80211 - fix EAPOL handling in 802.3 RX path - reject setting up aggregation sessions before connection is authorized to avoid timeouts or similar - handle some SAE authentication steps correctly - fix AC selection in mesh forwarding * iwlwifi - remove TWT support as it causes firmware crashes when the AP isn't behaving correctly - check debugfs pointer before dereferncing it ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01cfg80211: fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typoJohannes Berg
The kbuild change here accidentally removed not only the unquoting, but also the last character of the variable name. Fix that. Fixes: 129ab0d2d9f3 ("kbuild: do not quote string values in include/config/auto.conf") Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20220221155512.1d25895f7c5f.I50fa3d4189fcab90a2896fe8cae215035dae9508@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-01xfrm: fix tunnel model fragmentation behaviorLina Wang
in tunnel mode, if outer interface(ipv4) is less, it is easily to let inner IPV6 mtu be less than 1280. If so, a Packet Too Big ICMPV6 message is received. When send again, packets are fragmentized with 1280, they are still rejected with ICMPV6(Packet Too Big) by xfrmi_xmit2(). According to RFC4213 Section3.2.2: if (IPv4 path MTU - 20) is less than 1280 if packet is larger than 1280 bytes Send ICMPv6 "packet too big" with MTU=1280 Drop packet else Encapsulate but do not set the Don't Fragment flag in the IPv4 header. The resulting IPv4 packet might be fragmented by the IPv4 layer on the encapsulator or by some router along the IPv4 path. endif else if packet is larger than (IPv4 path MTU - 20) Send ICMPv6 "packet too big" with MTU = (IPv4 path MTU - 20). Drop packet. else Encapsulate and set the Don't Fragment flag in the IPv4 header. endif endif Packets should be fragmentized with ipv4 outer interface, so change it. After it is fragemtized with ipv4, there will be double fragmenation. No.48 & No.51 are ipv6 fragment packets, No.48 is double fragmentized, then tunneled with IPv4(No.49& No.50), which obey spec. And received peer cannot decrypt it rightly. 48 2002::10 2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50) 49 0x0000 (0) 2002::10 2002::11 1304 IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44) 50 0x0000 (0) 2002::10 2002::11 200 ESP (SPI=0x00035000) 51 2002::10 2002::11 180 Echo (ping) request 52 0x56dc 2002::10 2002::11 248 IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50) xfrm6_noneed_fragment has fixed above issues. Finally, it acted like below: 1 0x6206 192.168.1.138 192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2] 2 0x6206 2002::10 2002::11 88 IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50) 3 0x0000 2002::10 2002::11 248 ICMPv6 Echo (ping) request Signed-off-by: Lina Wang <lina.wang@mediatek.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-01netfilter: nf_queue: handle socket prefetchFlorian Westphal
In case someone combines bpf socket assign and nf_queue, then we will queue an skb who references a struct sock that did not have its reference count incremented. As we leave rcu protection, there is no guarantee that skb->sk is still valid. For refcount-less skb->sk case, try to increment the reference count and then override the destructor. In case of failure we have two choices: orphan the skb and 'delete' preselect or let nf_queue() drop the packet. Do the latter, it should not happen during normal operation. Fixes: cf7fbe660f2d ("bpf: Add socket assign support") Acked-by: Joe Stringer <joe@cilium.io> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-01netfilter: nf_queue: fix possible use-after-freeFlorian Westphal
Eric Dumazet says: The sock_hold() side seems suspect, because there is no guarantee that sk_refcnt is not already 0. On failure, we cannot queue the packet and need to indicate an error. The packet will be dropped by the caller. v2: split skb prefetch hunk into separate change Fixes: 271b72c7fa82c ("udp: RCU handling for Unicast packets.") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>