summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-09-07Merge tag 'net-5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes and stragglers from Jakub Kicinski: "Networking stragglers and fixes, including changes from netfilter, wireless and can. Current release - regressions: - qrtr: revert check in qrtr_endpoint_post(), fixes audio and wifi - ip_gre: validate csum_start only on pull - bnxt_en: fix 64-bit doorbell operation on 32-bit kernels - ionic: fix double use of queue-lock, fix a sleeping in atomic - can: c_can: fix null-ptr-deref on ioctl() - cs89x0: disable compile testing on powerpc Current release - new code bugs: - bridge: mcast: fix vlan port router deadlock, consistently disable BH Previous releases - regressions: - dsa: tag_rtl4_a: fix egress tags, only port 0 was working - mptcp: fix possible divide by zero - netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex - netfilter: socket: icmp6: fix use-after-scope - stmmac: fix MAC not working when system resume back with WoL active Previous releases - always broken: - ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address - seg6: set fc_nlinfo in nh_create_ipv4, nh_create_ipv6 - mptcp: only send extra TCP acks in eligible socket states - dsa: lantiq_gswip: fix maximum frame length - stmmac: fix overall budget calculation for rxtx_napi - bnxt_en: fix firmware version reporting via devlink - renesas: sh_eth: add missing barrier to fix freeing wrong tx descriptor Stragglers: - netfilter: conntrack: switch to siphash - netfilter: refuse insertion if chain has grown too large - ncsi: add get MAC address command to get Intel i210 MAC address" * tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits) ieee802154: Remove redundant initialization of variable ret net: stmmac: fix MAC not working when system resume back with WoL active net: phylink: add suspend/resume support net: renesas: sh_eth: Fix freeing wrong tx descriptor bonding: 3ad: pass parameter bond_params by reference cxgb3: fix oops on module removal can: c_can: fix null-ptr-deref on ioctl() can: rcar_canfd: add __maybe_unused annotation to silence warning net: wwan: iosm: Unify IO accessors used in the driver net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors net: qcom/emac: Replace strlcpy with strscpy ip6_gre: Revert "ip6_gre: add validation for csum_start" net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static selftests/bpf: Test XDP bonding nest and unwind bonding: Fix negative jump label count on nested bonding MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry stmmac: dwmac-loongson:Fix missing return value iwlwifi: fix printk format warnings in uefi.c net: create netdev->dev_addr assignment helpers bnxt_en: Fix possible unintended driver initiated error recovery ...
2021-09-07ieee802154: Remove redundant initialization of variable retColin Ian King
The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06ip6_gre: Revert "ip6_gre: add validation for csum_start"Willem de Bruijn
This reverts commit 9cf448c200ba9935baa94e7a0964598ce947db9d. This commit was added for equivalence with a similar fix to ip_gre. That fix proved to have a bug. Upon closer inspection, ip6_gre is not susceptible to the original bug. So revert the unnecessary extra check. In short, ipgre_xmit calls skb_pull to remove ipv4 headers previously inserted by dev_hard_header. ip6gre_tunnel_xmit does not. Link: https://lore.kernel.org/netdev/CA+FuTSe+vJgTVLc9SojGuN-f9YQ+xWLPKE_S4f=f+w+_P2hgUg@mail.gmail.com/#t Fixes: 9cf448c200ba ("ip6_gre: add validation for csum_start") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05ip_gre: validate csum_start only on pullWillem de Bruijn
The GRE tunnel device can pull existing outer headers in ipge_xmit. This is a rare path, apparently unique to this device. The below commit ensured that pulling does not move skb->data beyond csum_start. But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and thus csum_start is irrelevant. Refine to exclude this. At the same time simplify and strengthen the test. Simplify, by moving the check next to the offending pull, making it more self documenting and removing an unnecessary branch from other code paths. Strengthen, by also ensuring that the transport header is correct and therefore the inner headers will be after skb_reset_inner_headers. The transport header is set to csum_start in skb_partial_csum_set. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") Reported-by: Ido Schimmel <idosch@idosch.org> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL addressAntonio Quartulli
GRE interfaces are not Ether-like and therefore it is not possible to generate the v6LL address the same way as (for example) GRETAP devices. With default settings, a GRE interface will attempt generating its v6LL address using the EUI64 approach, but this will fail when the local endpoint of the GRE tunnel is set to "any". In this case the GRE interface will end up with no v6LL address, thus violating RFC4291. SIT interfaces already implement a different logic to ensure that a v6LL address is always computed. Change the GRE v6LL generation logic to follow the same approach as SIT. This way GRE interfaces will always have a v6LL address as well. Behaviour of GRETAP interfaces has not been changed as they behave like classic Ether-like interfaces. To avoid code duplication sit_add_v4_addrs() has been renamed to add_v4_addrs() and adapted to handle also the IP6GRE/GRE cases. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04Merge tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Better client responsiveness when server isn't replying - Use refcount_t in sunrpc rpc_client refcount tracking - Add srcaddr and dst_port to the sunrpc sysfs info files - Add basic support for connection sharing between servers with multiple NICs` Bugfixes and Cleanups: - Sunrpc tracepoint cleanups - Disconnect after ib_post_send() errors to avoid deadlocks - Fix for tearing down rpcrdma_reps - Fix a potential pNFS layoutget livelock loop - pNFS layout barrier fixes - Fix a potential memory corruption in rpc_wake_up_queued_task_set_status() - Fix reconnection locking - Fix return value of get_srcport() - Remove rpcrdma_post_sends() - Remove pNFS dead code - Remove copy size restriction for inter-server copies - Overhaul the NFS callback service - Clean up sunrpc TCP socket shutdowns - Always provide aligned buffers to RPC read layers" * tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (39 commits) NFS: Always provide aligned buffers to the RPC read layers NFSv4.1 add network transport when session trunking is detected SUNRPC enforce creation of no more than max_connect xprts NFSv4 introduce max_connect mount options SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs SUNRPC keep track of number of transports to unique addresses NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox SUNRPC: Tweak TCP socket shutdown in the RPC client SUNRPC: Simplify socket shutdown when not reusing TCP ports NFSv4.2: remove restriction of copy size for inter-server copy. NFS: Clean up the synopsis of callback process_op() NFS: Extract the xdr_init_encode/decode() calls from decode_compound NFS: Remove unused callback void decoder NFS: Add a private local dispatcher for NFSv4 callback operations SUNRPC: Eliminate the RQ_AUTHERR flag SUNRPC: Set rq_auth_stat in the pg_authenticate() callout SUNRPC: Add svc_rqst::rq_auth_stat SUNRPC: Add dst_port to the sysfs xprt info file SUNRPC: Add srcaddr as a file in sysfs sunrpc: Fix return value of get_srcport() ...
2021-09-04fq_codel: reject silly quantum parametersEric Dumazet
syzbot found that forcing a big quantum attribute would crash hosts fast, essentially using this: tc qd replace dev eth0 root fq_codel quantum 4294967295 This is because fq_codel_dequeue() would have to loop ~2^31 times in : if (flow->deficit <= 0) { flow->deficit += q->quantum; list_move_tail(&flow->flowchain, &q->old_flows); goto begin; } SFQ max quantum is 2^19 (half a megabyte) Lets adopt a max quantum of one megabyte for FQ_CODEL. Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Protect nft_ct template with global mutex, from Pavel Skripkin. 2) Two recent commits switched inet rt and nexthop exception hashes from jhash to siphash. If those two spots are problematic then conntrack is affected as well, so switch voer to siphash too. While at it, add a hard upper limit on chain lengths and reject insertion if this is hit. Patches from Florian Westphal. 3) Fix use-after-scope in nf_socket_ipv6 reported by KASAN, from Benjamin Hesmans. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: socket: icmp6: fix use-after-scope netfilter: refuse insertion if chain has grown too large netfilter: conntrack: switch to siphash netfilter: conntrack: sanitize table size default settings netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex ==================== Link: https://lore.kernel.org/r/20210903163020.13741-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-03Merge tag 'kbuild-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add -s option (strict mode) to merge_config.sh to make it fail when any symbol is redefined. - Show a warning if a different compiler is used for building external modules. - Infer --target from ARCH for CC=clang to let you cross-compile the kernel without CROSS_COMPILE. - Make the integrated assembler default (LLVM_IAS=1) for CC=clang. - Add <linux/stdarg.h> to the kernel source instead of borrowing <stdarg.h> from the compiler. - Add Nick Desaulniers as a Kbuild reviewer. - Drop stale cc-option tests. - Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG to handle symbols in inline assembly. - Show a warning if 'FORCE' is missing for if_changed rules. - Various cleanups * tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits) kbuild: redo fake deps at include/ksym/*.h kbuild: clean up objtool_args slightly modpost: get the *.mod file path more simply checkkconfigsymbols.py: Fix the '--ignore' option kbuild: merge vmlinux_link() between ARCH=um and other architectures kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh kbuild: merge vmlinux_link() between the ordinary link and Clang LTO kbuild: remove stale *.symversions kbuild: remove unused quiet_cmd_update_lto_symversions gen_compile_commands: extract compiler command from a series of commands x86: remove cc-option-yn test for -mtune= arc: replace cc-option-yn uses with cc-option s390: replace cc-option-yn uses with cc-option ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild sparc: move the install rule to arch/sparc/Makefile security: remove unneeded subdir-$(CONFIG_...) kbuild: sh: remove unused install script kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y kbuild: Switch to 'f' variants of integrated assembler flag kbuild: Shuffle blank line to improve comment meaning ...
2021-09-03netfilter: socket: icmp6: fix use-after-scopeBenjamin Hesmans
Bug reported by KASAN: BUG: KASAN: use-after-scope in inet6_ehashfn (net/ipv6/inet6_hashtables.c:40) Call Trace: (...) inet6_ehashfn (net/ipv6/inet6_hashtables.c:40) (...) nf_sk_lookup_slow_v6 (net/ipv6/netfilter/nf_socket_ipv6.c:91 net/ipv6/netfilter/nf_socket_ipv6.c:146) It seems that this bug has already been fixed by Eric Dumazet in the past in: commit 78296c97ca1f ("netfilter: xt_socket: fix a stack corruption bug") But a variant of the same issue has been introduced in commit d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match") `daddr` and `saddr` potentially hold a reference to ipv6_var that is no longer in scope when the call to `nf_socket_get_sock_v6` is made. Fixes: d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match") Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-03net: remove the unnecessary check in cipso_v4_doi_free王贇
The commit 733c99ee8be9 ("net: fix NULL pointer reference in cipso_v4_doi_free") was merged by a mistake, this patch try to cleanup the mess. And we already have the commit e842cb60e8ac ("net: fix NULL pointer reference in cipso_v4_doi_free") which fixed the root cause of the issue mentioned in it's description. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03net: bridge: mcast: fix vlan port router deadlockNikolay Aleksandrov
Before vlan/port mcast router support was added br_multicast_set_port_router was used only with bh already disabled due to the bridge port lock, but that is no longer the case and when it is called to configure a vlan/port mcast router we can deadlock with the timer, so always disable bh to make sure it can be called from contexts with both enabled and disabled bh. Fixes: 2796d846d74a ("net: bridge: vlan: convert mcast router global option to per-vlan entry") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03seg6_iptunnel: Remove redundant initialization of variable errColin Ian King
The variable err is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03tipc: clean up inconsistent indentingColin Ian King
There is a statement that is indented one character too deeply, clean this up. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03skbuff: clean up inconsistent indentingColin Ian King
There is a statement that is indented one character too deeply, clean this up. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03mptcp: Only send extra TCP acks in eligible socket statesMat Martineau
Recent changes exposed a bug where specifically-timed requests to the path manager netlink API could trigger a divide-by-zero in __tcp_select_window(), as syzkaller does: divide error: 0000 [#1] SMP KASAN NOPTI CPU: 0 PID: 9667 Comm: syz-executor.0 Not tainted 5.14.0-rc6+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:__tcp_select_window+0x509/0xa60 net/ipv4/tcp_output.c:3016 Code: 44 89 ff e8 c9 29 e9 fd 45 39 e7 0f 8d 20 ff ff ff e8 db 28 e9 fd 44 89 e3 e9 13 ff ff ff e8 ce 28 e9 fd 44 89 e0 44 89 e3 99 <f7> 7c 24 04 29 d3 e9 fc fe ff ff e8 b7 28 e9 fd 44 89 f1 48 89 ea RSP: 0018:ffff888031ccf020 EFLAGS: 00010216 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000040000 RDX: 0000000000000000 RSI: ffff88811532c080 RDI: 0000000000000002 RBP: 0000000000000000 R08: ffffffff835807c2 R09: 0000000000000000 R10: 0000000000000004 R11: ffffed1020b92441 R12: 0000000000000000 R13: 1ffff11006399e08 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fa4c8344700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2f424000 CR3: 000000003e4e2003 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: tcp_select_window net/ipv4/tcp_output.c:264 [inline] __tcp_transmit_skb+0xc00/0x37a0 net/ipv4/tcp_output.c:1351 __tcp_send_ack.part.0+0x3ec/0x760 net/ipv4/tcp_output.c:3972 __tcp_send_ack net/ipv4/tcp_output.c:3978 [inline] tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3978 mptcp_pm_nl_addr_send_ack+0x1ab/0x380 net/mptcp/pm_netlink.c:654 mptcp_pm_remove_addr+0x161/0x200 net/mptcp/pm.c:58 mptcp_nl_remove_id_zero_address+0x197/0x460 net/mptcp/pm_netlink.c:1328 mptcp_nl_cmd_del_addr+0x98b/0xd40 net/mptcp/pm_netlink.c:1359 genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792 netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0x14e/0x190 net/socket.c:724 ____sys_sendmsg+0x709/0x870 net/socket.c:2403 ___sys_sendmsg+0xff/0x170 net/socket.c:2457 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae mptcp_pm_nl_addr_send_ack() was attempting to send a TCP ACK on the first subflow in the MPTCP socket's connection list without validating that the subflow was in a suitable connection state. To address this, always validate subflow state when sending extra ACKs on subflows for address advertisement or subflow priority change. Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/229 Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03pktgen: remove unused variableEric Dumazet
pktgen_thread_worker() no longer needs wait variable, delete it. Fixes: ef87979c273a ("pktgen: better scheduler friendliness") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-02Set fc_nlinfo in nh_create_ipv4, nh_create_ipv6Ryoga Saito
This patch fixes kernel NULL pointer dereference when creating nexthop which is bound with SRv6 decapsulation. In the creation of nexthop, __seg6_end_dt_vrf_build is called. __seg6_end_dt_vrf_build expects fc_lninfo in fib6_config is set correctly, but it isn't set in nh_create_ipv6, which causes kernel crash. Here is steps to reproduce kernel crash: 1. modprobe vrf 2. ip -6 nexthop add encap seg6local action End.DT4 vrftable 1 dev eth0 We got the following message: [ 901.370336] BUG: kernel NULL pointer dereference, address: 0000000000000ba0 [ 901.371658] #PF: supervisor read access in kernel mode [ 901.372672] #PF: error_code(0x0000) - not-present page [ 901.373672] PGD 0 P4D 0 [ 901.374248] Oops: 0000 [#1] SMP PTI [ 901.374944] CPU: 0 PID: 8593 Comm: ip Not tainted 5.14-051400-generic #202108310811-Ubuntu [ 901.376404] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module_el8.2.0+320+13f867d7 04/01/2014 [ 901.377907] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf] [ 901.379182] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 <48> 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00 [ 901.382652] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282 [ 901.383746] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8 [ 901.385436] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000 [ 901.386924] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40 [ 901.388537] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000 [ 901.389937] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b [ 901.391226] FS: 00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000 [ 901.392737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 901.393803] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0 [ 901.395122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 901.396496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 901.397833] PKRU: 55555554 [ 901.398578] Call Trace: [ 901.399144] l3mdev_ifindex_lookup_by_table_id+0x3b/0x70 [ 901.400179] __seg6_end_dt_vrf_build+0x34/0xd0 [ 901.401067] seg6_end_dt4_build+0x16/0x20 [ 901.401904] seg6_local_build_state+0x271/0x430 [ 901.402797] lwtunnel_build_state+0x81/0x130 [ 901.403645] fib_nh_common_init+0x82/0x100 [ 901.404465] ? sock_def_readable+0x4b/0x80 [ 901.405285] fib6_nh_init+0x115/0x7c0 [ 901.406033] nh_create_ipv6.isra.0+0xe1/0x140 [ 901.406932] rtm_new_nexthop+0x3b7/0xeb0 [ 901.407828] rtnetlink_rcv_msg+0x152/0x3a0 [ 901.408663] ? rtnl_calcit.isra.0+0x130/0x130 [ 901.409535] netlink_rcv_skb+0x55/0x100 [ 901.410319] rtnetlink_rcv+0x15/0x20 [ 901.411026] netlink_unicast+0x1a8/0x250 [ 901.411813] netlink_sendmsg+0x238/0x470 [ 901.412602] ? _copy_from_user+0x2b/0x60 [ 901.413394] sock_sendmsg+0x65/0x70 [ 901.414112] ____sys_sendmsg+0x218/0x290 [ 901.414929] ? copy_msghdr_from_user+0x5c/0x90 [ 901.415814] ___sys_sendmsg+0x81/0xc0 [ 901.416559] ? fsnotify_destroy_marks+0x27/0xf0 [ 901.417447] ? call_rcu+0xa4/0x230 [ 901.418153] ? kmem_cache_free+0x23f/0x410 [ 901.418972] ? dentry_free+0x37/0x70 [ 901.419705] ? mntput_no_expire+0x4c/0x260 [ 901.420574] __sys_sendmsg+0x62/0xb0 [ 901.421297] __x64_sys_sendmsg+0x1f/0x30 [ 901.422057] do_syscall_64+0x5c/0xc0 [ 901.422756] ? syscall_exit_to_user_mode+0x27/0x50 [ 901.423675] ? __x64_sys_close+0x12/0x40 [ 901.424462] ? do_syscall_64+0x69/0xc0 [ 901.425219] ? irqentry_exit_to_user_mode+0x9/0x20 [ 901.426149] ? irqentry_exit+0x19/0x30 [ 901.426901] ? exc_page_fault+0x89/0x160 [ 901.427709] ? asm_exc_page_fault+0x8/0x30 [ 901.428536] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 901.429514] RIP: 0033:0x7fe493945747 [ 901.430248] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 901.433549] RSP: 002b:00007ffe9932cf68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 901.434981] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe493945747 [ 901.436303] RDX: 0000000000000000 RSI: 00007ffe9932cfe0 RDI: 0000000000000003 [ 901.437607] RBP: 00000000613053f7 R08: 0000000000000001 R09: 00007ffe9932d07c [ 901.438990] R10: 000055f4a903a010 R11: 0000000000000246 R12: 0000000000000001 [ 901.440340] R13: 0000000000000001 R14: 000055f4a802b163 R15: 000055f4a8042020 [ 901.441630] Modules linked in: vrf nls_utf8 isofs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit rapl input_leds joydev serio_raw qemu_fw_cfg mac_hid sch_fq_codel drm virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd virtio_net net_failover cryptd psmouse virtio_blk failover i2c_piix4 pata_acpi floppy [ 901.450808] CR2: 0000000000000ba0 [ 901.451514] ---[ end trace c27b934b99ade304 ]--- [ 901.452403] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf] [ 901.453626] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 <48> 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00 [ 901.456910] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282 [ 901.457912] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8 [ 901.459238] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000 [ 901.460552] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40 [ 901.461882] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000 [ 901.463208] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b [ 901.464529] FS: 00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000 [ 901.466058] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 901.467189] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0 [ 901.468515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 901.469858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 901.471139] PKRU: 55555554 Signed-off-by: Ryoga Saito <contact@proelbtn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-02net: qrtr: revert check in qrtr_endpoint_post()Dan Carpenter
I tried to make this check stricter as a hardenning measure but it broke audo and wifi on these devices so revert it. Fixes: aaa8e4922c88 ("net: qrtr: make checks in qrtr_endpoint_post() stricter") Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-02ipv6: change return type from int to void for mld_process_v2Jiwon Kim
The mld_process_v2 only returned 0. So, the return type is changed to void. Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-01net/ncsi: add get MAC address command to get Intel i210 MAC addressIvan Mikhaylov
This patch adds OEM Intel GMA command and response handler for it. Signed-off-by: Brad Ho <Brad_Ho@phoenix.com> Signed-off-by: Paul Fertser <fercerpav@gmail.com> Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com> Link: https://lore.kernel.org/r/20210830171806.119857-2-i.mikhaylov@yadro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-01mptcp: fix possible divide by zeroPaolo Abeni
Florian noted that if mptcp_alloc_tx_skb() allocation fails in __mptcp_push_pending(), we can end-up invoking mptcp_push_release()/tcp_push() with a zero mss, causing a divide by 0 error. This change addresses the issue refactoring the skb allocation code checking if skb collapsing will happen for sure and doing the skb allocation only after such check. Skb allocation will now happen only after the call to tcp_send_mss() which correctly initializes mss_now. As side bonuses we now fill the skb tx cache only when needed, and this also clean-up a bit the output path. v1 -> v2: - use lockdep_assert_held_once() - Jakub - fix indentation - Jakub Reported-by: Florian Westphal <fw@strlen.de> Fixes: 724cfd2ee8aa ("mptcp: allocate TX skbs in msk context") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-01Merge tag 'tty-5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial updates from Greg KH: "Here is the "big" set of tty/serial driver patches for 5.15-rc1 Nothing major in here at all, just some driver updates and more cleanups on old tty apis and code that needed it that includes: - tty.h cleanup of things that didn't belong in it - other tty cleanups by Jiri - driver cleanups - rs485 support added to amba-pl011 driver - dts updates - stm32 serial driver updates - other minor fixes and driver updates All have been in linux-next for a while with no reported problems" * tag 'tty-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (83 commits) tty: serial: uartlite: Use read_poll_timeout for a polling loop tty: serial: uartlite: Use constants in early_uartlite_putc tty: Fix data race between tiocsti() and flush_to_ldisc() serial: vt8500: Use of_device_get_match_data serial: tegra: Use of_device_get_match_data serial: 8250_ingenic: Use of_device_get_match_data tty: serial: linflexuart: Remove redundant check to simplify the code tty: serial: fsl_lpuart: do software reset for imx7ulp and imx8qxp tty: serial: fsl_lpuart: enable two stop bits for lpuart32 tty: serial: fsl_lpuart: fix the wrong mapbase value mxser: use semi-colons instead of commas tty: moxa: use semi-colons instead of commas tty: serial: fsl_lpuart: check dma_tx_in_progress in tx dma callback tty: replace in_irq() with in_hardirq() serial: sh-sci: fix break handling for sysrq serial: stm32: use devm_platform_get_and_ioremap_resource() serial: stm32: use the defined variable to simplify code Revert "arm pl011 serial: support multi-irq request" tty: serial: samsung: Add Exynos850 SoC data tty: serial: samsung: Fix driver data macros style ...
2021-09-01mptcp: Fix duplicated argument in protocol.hWan Jiabing
Fix the following coccicheck warning: ./net/mptcp/protocol.h:36:50-73: duplicated argument to & or | The OPTION_MPTCP_MPJ_SYNACK here is duplicate. Here should be OPTION_MPTCP_MPJ_ACK. Fixes: 74c7dfbee3e18 ("mptcp: consolidate in_opt sub-options fields in a bitmask") Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-01net: dsa: tag_rtl4_a: Fix egress tagsLinus Walleij
I noticed that only port 0 worked on the RTL8366RB since we started to use custom tags. It turns out that the format of egress custom tags is actually different from ingress custom tags. While the lower bits just contain the port number in ingress tags, egress tags need to indicate destination port by setting the bit for the corresponding port. It was working on port 0 because port 0 added 0x00 as port number in the lower bits, and if you do this the packet appears at all ports, including the intended port. Ooops. Fix this and all ports work again. Use the define for shifting the "type A" into place while we're at it. Tested on the D-Link DIR-685 by sending traffic to each of the ports in turn. It works. Fixes: 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags") Cc: DENG Qingfang <dqfext@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31Merge tag 'net-next-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Enable memcg accounting for various networking objects. BPF: - Introduce bpf timers. - Add perf link and opaque bpf_cookie which the program can read out again, to be used in libbpf-based USDT library. - Add bpf_task_pt_regs() helper to access user space pt_regs in kprobes, to help user space stack unwinding. - Add support for UNIX sockets for BPF sockmap. - Extend BPF iterator support for UNIX domain sockets. - Allow BPF TCP congestion control progs and bpf iterators to call bpf_setsockopt(), e.g. to switch to another congestion control algorithm. Protocols: - Support IOAM Pre-allocated Trace with IPv6. - Support Management Component Transport Protocol. - bridge: multicast: add vlan support. - netfilter: add hooks for the SRv6 lightweight tunnel driver. - tcp: - enable mid-stream window clamping (by user space or BPF) - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD - more accurate DSACK processing for RACK-TLP - mptcp: - add full mesh path manager option - add partial support for MP_FAIL - improve use of backup subflows - optimize option processing - af_unix: add OOB notification support. - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the router. - mac80211: Target Wake Time support in AP mode. - can: j1939: extend UAPI to notify about RX status. Driver APIs: - Add page frag support in page pool API. - Many improvements to the DSA (distributed switch) APIs. - ethtool: extend IRQ coalesce uAPI with timer reset modes. - devlink: control which auxiliary devices are created. - Support CAN PHYs via the generic PHY subsystem. - Proper cross-chip support for tag_8021q. - Allow TX forwarding for the software bridge data path to be offloaded to capable devices. Drivers: - veth: more flexible channels number configuration. - openvswitch: introduce per-cpu upcall dispatch. - Add internet mix (IMIX) mode to pktgen. - Transparently handle XDP operations in the bonding driver. - Add LiteETH network driver. - Renesas (ravb): - support Gigabit Ethernet IP - NXP Ethernet switch (sja1105): - fast aging support - support for "H" switch topologies - traffic termination for ports under VLAN-aware bridge - Intel 1G Ethernet - support getcrosststamp() with PCIe PTM (Precision Time Measurement) for better time sync - support Credit-Based Shaper (CBS) offload, enabling HW traffic prioritization and bandwidth reservation - Broadcom Ethernet (bnxt) - support pulse-per-second output - support larger Rx rings - Mellanox Ethernet (mlx5) - support ethtool RSS contexts and MQPRIO channel mode - support LAG offload with bridging - support devlink rate limit API - support packet sampling on tunnels - Huawei Ethernet (hns3): - basic devlink support - add extended IRQ coalescing support - report extended link state - Netronome Ethernet (nfp): - add conntrack offload support - Broadcom WiFi (brcmfmac): - add WPA3 Personal with FT to supported cipher suites - support 43752 SDIO device - Intel WiFi (iwlwifi): - support scanning hidden 6GHz networks - support for a new hardware family (Bz) - Xen pv driver: - harden netfront against malicious backends - Qualcomm mobile - ipa: refactor power management and enable automatic suspend - mhi: move MBIM to WWAN subsystem interfaces Refactor: - Ambient BPF run context and cgroup storage cleanup. - Compat rework for ndo_ioctl. Old code removal: - prism54 remove the obsoleted driver, deprecated by the p54 driver. - wan: remove sbni/granch driver" * tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits) net: Add depends on OF_NET for LiteX's LiteETH ipv6: seg6: remove duplicated include net: hns3: remove unnecessary spaces net: hns3: add some required spaces net: hns3: clean up a type mismatch warning net: hns3: refine function hns3_set_default_feature() ipv6: remove duplicated 'net/lwtunnel.h' include net: w5100: check return value after calling platform_get_resource() net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx() net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource() net: mdio-ipq4019: Make use of devm_platform_ioremap_resource() fou: remove sparse errors ipv4: fix endianness issue in inet_rtm_getroute_build_skb() octeontx2-af: Set proper errorcode for IPv4 checksum errors octeontx2-af: Fix static code analyzer reported issues octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg octeontx2-af: Fix loop in free and unmap counter af_unix: fix potential NULL deref in unix_dgram_connect() dpaa2-eth: Replace strlcpy with strscpy octeontx2-af: Use NDC TX for transmit packet data ...
2021-08-31Merge tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "New features: - Support for server-side disconnect injection via debugfs - Protocol definitions for new RPC_AUTH_TLS authentication flavor Performance improvements: - Reduce page allocator traffic in the NFSD splice read actor - Reduce CPU utilization in svcrdma's Send completion handler Notable bug fixes: - Stabilize lockd operation when re-exporting NFS mounts - Fix the use of %.*s in NFSD tracepoints - Fix /proc/sys/fs/nfs/nsm_use_hostnames" * tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits) nfsd: fix crash on LOCKT on reexported NFSv3 nfs: don't allow reexport reclaims lockd: don't attempt blocking locks on nfs reexports nfs: don't atempt blocking locks on nfs reexports Keep read and write fds with each nlm_file lockd: update nlm_lookup_file reexport comment nlm: minor refactoring nlm: minor nlm_lookup_file argument change lockd: lockd server-side shouldn't set fl_ops SUNRPC: Add documentation for the fail_sunrpc/ directory SUNRPC: Server-side disconnect injection SUNRPC: Move client-side disconnect injection SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free() nfsd4: Fix forced-expiry locking rpc: fix gss_svc_init cleanup on failure SUNRPC: Add RPC_AUTH_TLS protocol numbers lockd: change the proc_handler for nsm_use_hostnames sysctl: introduce new proc handler proc_dobool SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() ...
2021-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/linux/netdevice.h net/socket.c d0efb16294d1 ("net: don't unconditionally copy_from_user a struct ifreq for socket ioctls") 876f0bf9d0d5 ("net: socket: simplify dev_ifconf handling") 29c4964822aa ("net: socket: rework compat_ifreq_ioctl()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-31ipv6: seg6: remove duplicated includeLv Ruyi
Remove all but the first include of net/lwtunnel.h from 'seg6_local.c. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31ipv6: remove duplicated 'net/lwtunnel.h' includeLv Ruyi
Remove all but the first include of net/lwtunnel.h from seg6_iptunnel.c. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31fou: remove sparse errorsEric Dumazet
We need to add __rcu qualifier to avoid these errors: net/ipv4/fou.c:250:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:250:18: expected struct net_offload const **offloads net/ipv4/fou.c:250:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:251:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:251:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:251:15: struct net_offload const * net/ipv4/fou.c:272:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:272:18: expected struct net_offload const **offloads net/ipv4/fou.c:272:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:273:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:273:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:273:15: struct net_offload const * net/ipv4/fou.c:442:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:442:18: expected struct net_offload const **offloads net/ipv4/fou.c:442:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:443:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:443:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:443:15: struct net_offload const * net/ipv4/fou.c:489:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:489:18: expected struct net_offload const **offloads net/ipv4/fou.c:489:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:490:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:490:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:490:15: struct net_offload const * net/ipv4/udp_offload.c:170:26: warning: incorrect type in assignment (different address spaces) net/ipv4/udp_offload.c:170:26: expected struct net_offload const **offloads net/ipv4/udp_offload.c:170:26: got struct net_offload const [noderef] __rcu ** net/ipv4/udp_offload.c:171:23: error: incompatible types in comparison expression (different address spaces): net/ipv4/udp_offload.c:171:23: struct net_offload const [noderef] __rcu * net/ipv4/udp_offload.c:171:23: struct net_offload const * Fixes: efc98d08e1ec ("fou: eliminate IPv4,v6 specific GRO functions") Fixes: 8bce6d7d0d1e ("udp: Generalize skb_udp_segment") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31ipv4: fix endianness issue in inet_rtm_getroute_build_skb()Eric Dumazet
The UDP length field should be in network order. This removes the following sparse error: net/ipv4/route.c:3173:27: warning: incorrect type in assignment (different base types) net/ipv4/route.c:3173:27: expected restricted __be16 [usertype] len net/ipv4/route.c:3173:27: got unsigned long Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31af_unix: fix potential NULL deref in unix_dgram_connect()Eric Dumazet
syzbot was able to trigger NULL deref in unix_dgram_connect() [1] This happens in if (unix_peer(sk)) sk->sk_state = other->sk_state = TCP_ESTABLISHED; // crash because @other is NULL Because locks have been dropped, unix_peer() might be non NULL, while @other is NULL (AF_UNSPEC case) We need to move code around, so that we no longer access unix_peer() and sk_state while locks have been released. [1] general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 0 PID: 10341 Comm: syz-executor239 Not tainted 5.14.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226 Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00 RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3 R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740 R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0 FS: 00007f3eb0052700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004787d0 CR3: 0000000029c0a000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __sys_connect_file+0x155/0x1a0 net/socket.c:1890 __sys_connect+0x161/0x190 net/socket.c:1907 __do_sys_connect net/socket.c:1917 [inline] __se_sys_connect net/socket.c:1914 [inline] __x64_sys_connect+0x6f/0xb0 net/socket.c:1914 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x446a89 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3eb0052208 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00000000004cc4d8 RCX: 0000000000446a89 RDX: 000000000000006e RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00000000004cc4d0 R08: 00007f3eb0052700 R09: 0000000000000000 R10: 00007f3eb0052700 R11: 0000000000000246 R12: 00000000004cc4dc R13: 00007ffd791e79cf R14: 00007f3eb0052300 R15: 0000000000022000 Modules linked in: ---[ end trace 4eb809357514968c ]--- RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226 Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00 RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3 R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740 R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0 FS: 00007f3eb0052700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd791fe960 CR3: 0000000029c0a000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Alexei Starovoitov <ast@kernel.org> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31net: bridge: use mld2r_ngrec instead of icmpv6_dataunMichelleJin
br_ip6_multicast_mld2_report function uses icmp6h to parse mld2_report packet. mld2r_ngrec defines mld2r_hdr.icmp6_dataun.un_data16[1] in include/net/mld.h. So, it is more compact to use mld2r rather than icmp6h. By doing printk test, it is confirmed that icmp6h->icmp6_dataun.un_data16[1] and mld2r->mld2r_ngrec are indeed equivalent. Also, sizeof(*mld2r) and sizeof(*icmp6h) are equivalent, too. Signed-off-by: MichelleJin <shjy180909@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failedXiyu Yang
The reference counting issue happens in one exception handling path of cbq_change_class(). When failing to get tcf_block, the function forgets to decrease the refcount of "rtab" increased by qdisc_put_rtab(), causing a refcount leak. Fix this issue by jumping to "failure" label when get tcf_block failed. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30Merge tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: - cancellation cleanups (Hao, Pavel) - io-wq accounting cleanup (Hao) - io_uring submit locking fix (Hao) - io_uring link handling fixes (Hao) - fixed file improvements (wangyangbo, Pavel) - allow updates of linked timeouts like regular timeouts (Pavel) - IOPOLL fix (Pavel) - remove batched file get optimization (Pavel) - improve reference handling (Pavel) - IRQ task_work batching (Pavel) - allow pure fixed file, and add support for open/accept (Pavel) - GFP_ATOMIC RT kernel fix - multiple CQ ring waiter improvement - funnel IRQ completions through task_work - add support for limiting async workers explicitly - add different clocksource support for timeouts - io-wq wakeup race fix - lots of cleanups and improvement (Pavel et al) * tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-block: (87 commits) io-wq: fix wakeup race when adding new work io-wq: wqe and worker locks no longer need to be IRQ safe io-wq: check max_worker limits if a worker transitions bound state io_uring: allow updating linked timeouts io_uring: keep ltimeouts in a list io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts io-wq: provide a way to limit max number of workers io_uring: add build check for buf_index overflows io_uring: clarify io_req_task_cancel() locking io_uring: add task-refs-get helper io_uring: fix failed linkchain code logic io_uring: remove redundant req_set_fail() io_uring: don't free request to slab io_uring: accept directly into fixed file table io_uring: hand code io_accept() fd installing io_uring: openat directly into fixed fd table net: add accept helper not installing fd io_uring: fix io_try_cancel_userdata race for iowq io_uring: IRQ rw completion batching io_uring: batch task work locking ...
2021-08-30Merge tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "Sitting on top of the core block changes, here are the driver changes for the 5.15 merge window: - NVMe updates via Christoph: - suspend improvements for devices with an HMB (Keith Busch) - handle double completions more gacefull (Sagi Grimberg) - cleanup the selects for the nvme core code a bit (Sagi Grimberg) - don't update queue count when failing to set io queues (Ruozhu Li) - various nvmet connect fixes (Amit Engel) - cleanup lightnvm leftovers (Keith Busch, me) - small cleanups (Colin Ian King, Hou Pu) - add tracing for the Set Features command (Hou Pu) - CMB sysfs cleanups (Keith Busch) - add a mutex_destroy call (Keith Busch) - remove lightnvm subsystem. It's served its purpose and ultimately led to zoned nvme support, we no longer need it (Christoph) - revert floppy O_NDELAY fix (Denis) - nbd fixes (Hou, Pavel, Baokun) - nbd locking fixes (Tetsuo) - nbd device removal fixes (Christoph) - raid10 rcu warning fix (Xiao) - raid1 write behind fix (Guoqing) - rnbd fixes (Gioh, Md Haris) - misc fixes (Colin)" * tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block: (42 commits) Revert "floppy: reintroduce O_NDELAY fix" raid1: ensure write behind bio has less than BIO_MAX_VECS sectors md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard nbd: remove nbd->destroy_complete nbd: only return usable devices from nbd_find_unused nbd: set nbd->index before releasing nbd_index_mutex nbd: prevent IDR lookups from finding partially initialized devices nbd: reset NBD to NULL when restarting in nbd_genl_connect nbd: add missing locking to the nbd_dev_add error path nvme: remove the unused NVME_NS_* enum nvme: remove nvm_ndev from ns nvme: Have NVME_FABRICS select NVME_CORE instead of transport drivers block: nbd: add sanity check for first_minor nvmet: check that host sqsize does not exceed ctrl MQES nvmet: avoid duplicate qid in connect cmd nvmet: pass back cntlid on successful completion nvme-rdma: don't update queue count when failing to set io queues nvme-tcp: don't update queue count when failing to set io queues nvme-tcp: pair send_mutex init with destroy nvme: allow user toggling hmb usage ...
2021-08-30Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== bpf-next 2021-08-31 We've added 116 non-merge commits during the last 17 day(s) which contain a total of 126 files changed, 6813 insertions(+), 4027 deletions(-). The main changes are: 1) Add opaque bpf_cookie to perf link which the program can read out again, to be used in libbpf-based USDT library, from Andrii Nakryiko. 2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu. 3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang. 4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch to another congestion control algorithm during init, from Martin KaFai Lau. 5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima. 6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta. 7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT} progs, from Xu Liu and Stanislav Fomichev. 8) Support for __weak typed ksyms in libbpf, from Hao Luo. 9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky. 10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov. 11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann. 12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian, Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others. 13) Another big batch to revamp XDP samples in order to give them consistent look and feel, from Kumar Kartikeya Dwivedi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits) MAINTAINERS: Remove self from powerpc BPF JIT selftests/bpf: Fix potential unreleased lock samples: bpf: Fix uninitialized variable in xdp_redirect_cpu selftests/bpf: Reduce more flakyness in sockmap_listen bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS bpf: selftests: Add dctcp fallback test bpf: selftests: Add connect_to_fd_opts to network_helpers bpf: selftests: Add sk_state to bpf_tcp_helpers.h bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt selftests: xsk: Preface options with opt selftests: xsk: Make enums lower case selftests: xsk: Generate packets from specification selftests: xsk: Generate packet directly in umem selftests: xsk: Simplify cleanup of ifobjects selftests: xsk: Decrease sending speed selftests: xsk: Validate tx stats on tx thread selftests: xsk: Simplify packet validation in xsk tests selftests: xsk: Rename worker_* functions that are not thread entry points selftests: xsk: Disassociate umem size with packets sent selftests: xsk: Remove end-of-test packet ... ==================== Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30sch_htb: Fix inconsistency when leaf qdisc creation failsMaxim Mikityanskiy
In HTB offload mode, qdiscs of leaf classes are grafted to netdev queues. sch_htb expects the dev_queue field of these qdiscs to point to the corresponding queues. However, qdisc creation may fail, and in that case noop_qdisc is used instead. Its dev_queue doesn't point to the right queue, so sch_htb can lose track of used netdev queues, which will cause internal inconsistencies. This commit fixes this bug by keeping track of the netdev queue inside struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the new field, the two values are synced on writes, and WARNs are added to assert equality of the two values. The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue, the driver used to pass the old and new queue IDs to sch_htb. Now that there is a new field (offload_queue) in struct htb_class that needs to be updated on this operation, the driver will pass the old class ID to sch_htb instead (it already knows the new class ID). Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30net: ipv4: Fix the warning for dereferenceYajun Deng
Add a if statements to avoid the warning. Dan Carpenter report: The patch faf482ca196a: "net: ipv4: Move ip_options_fragment() out of loop" from Aug 23, 2021, leads to the following Smatch complaint: net/ipv4/ip_output.c:833 ip_do_fragment() warn: variable dereferenced before check 'iter.frag' (see line 828) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: faf482ca196a ("net: ipv4: Move ip_options_fragment() out of loop") Link: https://lore.kernel.org/netdev/20210830073802.GR7722@kadam/T/#t Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: qrtr: make checks in qrtr_endpoint_post() stricterDan Carpenter
These checks are still not strict enough. The main problem is that if "cb->type == QRTR_TYPE_NEW_SERVER" is true then "len - hdrlen" is guaranteed to be 4 but we need to be at least 16 bytes. In fact, we can reject everything smaller than sizeof(*pkt) which is 20 bytes. Also I don't like the ALIGN(size, 4). It's better to just insist that data is needs to be aligned at the start. Fixes: 0baa99ee353c ("net: qrtr: Allow non-immediate node routing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30fix array-index-out-of-bounds in taprio_changeHaimin Zhang
syzbot report an array-index-out-of-bounds in taprio_change index 16 is out of range for type '__u16 [16]' that's because mqprio->num_tc is lager than TC_MAX_QUEUE,so we check the return value of netdev_set_num_tc. Reported-by: syzbot+2b3e5fb6c7ef285a94f6@syzkaller.appspotmail.com Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30net: fix NULL pointer reference in cipso_v4_doi_free王贇
In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc failed, we sometime observe panic: BUG: kernel NULL pointer dereference, address: ... RIP: 0010:cipso_v4_doi_free+0x3a/0x80 ... Call Trace: netlbl_cipsov4_add_std+0xf4/0x8c0 netlbl_cipsov4_add+0x13f/0x1b0 genl_family_rcv_msg_doit.isra.15+0x132/0x170 genl_rcv_msg+0x125/0x240 This is because in cipso_v4_doi_free() there is no check on 'doi_def->map.std' when doi_def->type got value 1, which is possibe, since netlbl_cipsov4_add_std() haven't initialize it before alloc 'doi_def->map.std'. This patch just add the check to prevent panic happen in similar cases. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ipv4: make exception cache less predictibleEric Dumazet
Even after commit 6457378fe796 ("ipv4: use siphash instead of Jenkins in fnhe_hashfun()"), an attacker can still use brute force to learn some secrets from a victim linux host. One way to defeat these attacks is to make the max depth of the hash table bucket a random value. Before this patch, each bucket of the hash table used to store exceptions could contain 6 items under attack. After the patch, each bucket would contains a random number of items, between 6 and 10. The attacker can no longer infer secrets. This is slightly increasing memory size used by the hash table, by 50% in average, we do not expect this to be a problem. This patch is more complex than the prior one (IPv6 equivalent), because IPv4 was reusing the oldest entry. Since we need to be able to evict more than one entry per update_or_create_fnhe() call, I had to replace fnhe_oldest() with fnhe_remove_oldest(). Also note that we will queue extra kfree_rcu() calls under stress, which hopefully wont be a too big issue. Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: David Ahern <dsahern@kernel.org> Tested-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ipv6: make exception cache less predictibleEric Dumazet
Even after commit 4785305c05b2 ("ipv6: use siphash in rt6_exception_hash()"), an attacker can still use brute force to learn some secrets from a victim linux host. One way to defeat these attacks is to make the max depth of the hash table bucket a random value. Before this patch, each bucket of the hash table used to store exceptions could contain 6 items under attack. After the patch, each bucket would contains a random number of items, between 6 and 10. The attacker can no longer infer secrets. This is slightly increasing memory size used by the hash table, we do not expect this to be a problem. Following patch is dealing with the same issue in IPv4. Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Wei Wang <weiwan@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Clean up and consolidate ct ecache infrastructure by merging ct and expect notifiers, from Florian Westphal. 2) Missing counters and timestamp in nfnetlink_queue and _log conntrack information. 3) Missing error check for xt_register_template() in iptables mangle, as a incremental fix for the previous pull request, also from Florian Westphal. 4) Add netfilter hooks for the SRv6 lightweigh tunnel driver, from Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl to make sure existing netfilter rulesets do not break. There is a static key to disable the hooks by default. The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable impact in the seg6_input path for non-netfilter users: similar numbers with and without this patch. This is a sample of the perf report output: 11.67% kpktgend_0 [ipv6] [k] ipv6_get_saddr_eval 7.89% kpktgend_0 [ipv6] [k] __ipv6_addr_label 7.52% kpktgend_0 [ipv6] [k] __ipv6_dev_get_saddr 6.63% kpktgend_0 [kernel.vmlinux] [k] asm_exc_nmi 4.74% kpktgend_0 [ipv6] [k] fib6_node_lookup_1 3.48% kpktgend_0 [kernel.vmlinux] [k] pskb_expand_head 3.33% kpktgend_0 [ipv6] [k] ip6_rcv_core.isra.29 3.33% kpktgend_0 [ipv6] [k] seg6_do_srh_encap 2.53% kpktgend_0 [ipv6] [k] ipv6_dev_get_saddr 2.45% kpktgend_0 [ipv6] [k] fib6_table_lookup 2.24% kpktgend_0 [kernel.vmlinux] [k] ___cache_free 2.16% kpktgend_0 [ipv6] [k] ip6_pol_route 2.11% kpktgend_0 [kernel.vmlinux] [k] __ipv6_addr_type ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30netfilter: refuse insertion if chain has grown too largeFlorian Westphal
Also add a stat counter for this that gets exported both via old /proc interface and ctnetlink. Assuming the old default size of 16536 buckets and max hash occupancy of 64k, this results in 128k insertions (origin+reply), so ~8 entries per chain on average. The revised settings in this series will result in about two entries per bucket on average. This allows a hard-limit ceiling of 64. This is not tunable at the moment, but its possible to either increase nf_conntrack_buckets or decrease nf_conntrack_max to reduce average lengths. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-30netfilter: conntrack: switch to siphashFlorian Westphal
Replace jhash in conntrack and nat core with siphash. While at it, use the netns mix value as part of the input key rather than abuse the seed value. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-30netfilter: conntrack: sanitize table size default settingsFlorian Westphal
conntrack has two distinct table size settings: nf_conntrack_max and nf_conntrack_buckets. The former limits how many conntrack objects are allowed to exist in each namespace. The second sets the size of the hashtable. As all entries are inserted twice (once for original direction, once for reply), there should be at least twice as many buckets in the table than the maximum number of conntrack objects that can exist at the same time. Change the default multiplier to 1 and increase the chosen bucket sizes. This results in the same nf_conntrack_max settings as before but reduces the average bucket list length. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-30netfilter: add netfilter hooks to SRv6 data planeRyoga Saito
This patch introduces netfilter hooks for solving the problem that conntrack couldn't record both inner flows and outer flows. This patch also introduces a new sysctl toggle for enabling lightweight tunnel netfilter hooks. Signed-off-by: Ryoga Saito <contact@proelbtn.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>