summaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2021-10-14icmp: fix icmp_ext_echo_iio parsing in icmp_build_probeXin Long
In icmp_build_probe(), the icmp_ext_echo_iio parsing should be done step by step and skb_header_pointer() return value should always be checked, this patch fixes 3 places in there: - On case ICMP_EXT_ECHO_CTYPE_NAME, it should only copy ident.name from skb by skb_header_pointer(), its len is ident_len. Besides, the return value of skb_header_pointer() should always be checked. - On case ICMP_EXT_ECHO_CTYPE_INDEX, move ident_len check ahead of skb_header_pointer(), and also do the return value check for skb_header_pointer(). - On case ICMP_EXT_ECHO_CTYPE_ADDR, before accessing iio->ident.addr. ctype3_hdr.addrlen, skb_header_pointer() should be called first, then check its return value and ident_len. On subcases ICMP_AFI_IP and ICMP_AFI_IP6, also do check for ident. addr.ctype3_hdr.addrlen and skb_header_pointer()'s return value. On subcase ICMP_AFI_IP, the len for skb_header_pointer() should be "sizeof(iio->extobj_hdr) + sizeof(iio->ident.addr.ctype3_hdr) + sizeof(struct in_addr)" or "ident_len". v1->v2: - To make it more clear, call skb_header_pointer() once only for iio->indent's parsing as Jakub Suggested. v2->v3: - The extobj_hdr.length check against sizeof(_iio) should be done before calling skb_header_pointer(), as Eric noticed. Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/31628dd76657ea62f5cf78bb55da6b35240831f1.1634205050.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ip: use dev_addr_set() in tunnelsJakub Kicinski
Use dev_addr_set() instead of writing to netdev->dev_addr directly in ip tunnels drivers. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07net: prefer socket bound to interface when not in VRFMike Manning
The commit 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF") modified compute_score() so that a device match is always made, not just in the case of an l3mdev skb, then increments the score also for unbound sockets. This ensures that sockets bound to an l3mdev are never selected when not in a VRF. But as unbound and bound sockets are now scored equally, this results in the last opened socket being selected if there are matches in the default VRF for an unbound socket and a socket bound to a dev that is not an l3mdev. However, handling prior to this commit was to always select the bound socket in this case. Reinstate this handling by incrementing the score only for bound sockets. The required isolation due to choosing between an unbound socket and a socket bound to an l3mdev remains in place due to the device match always being made. The same approach is taken for compute_score() for stream sockets. Fixes: 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF") Fixes: e78190581aff ("net: ensure unbound stream socket to be chosen when not in a VRF") Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/cf0a8523-b362-1edf-ee78-eef63cbbb428@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-05bpf: Enable TCP congestion control kfunc from modulesKumar Kartikeya Dwivedi
This commit moves BTF ID lookup into the newly added registration helper, in a way that the bbr, cubic, and dctcp implementation set up their sets in the bpf_tcp_ca kfunc_btf_set list, while the ones not dependent on modules are looked up from the wrapper function. This lifts the restriction for them to be compiled as built in objects, and can be loaded as modules if required. Also modify Makefile.modfinal to call resolve_btfids for each module. Note that since kernel kfunc_ids never overlap with module kfunc_ids, we only match the owner for module btf id sets. See following commits for background on use of: CONFIG_X86 ifdef: 569c484f9995 (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86) CONFIG_DYNAMIC_FTRACE ifdef: 7aae231ac93b (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE) Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-6-memxor@gmail.com
2021-10-05bpf: Introduce BPF support for kernel module function callsKumar Kartikeya Dwivedi
This change adds support on the kernel side to allow for BPF programs to call kernel module functions. Userspace will prepare an array of module BTF fds that is passed in during BPF_PROG_LOAD using fd_array parameter. In the kernel, the module BTFs are placed in the auxilliary struct for bpf_prog, and loaded as needed. The verifier then uses insn->off to index into the fd_array. insn->off 0 is reserved for vmlinux BTF (for backwards compat), so userspace must use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is sorted based on offset in an array, and each offset corresponds to one descriptor, with a max limit up to 256 such module BTFs. We also change existing kfunc_tab to distinguish each element based on imm, off pair as each such call will now be distinct. Another change is to check_kfunc_call callback, which now include a struct module * pointer, this is to be used in later patch such that the kfunc_id and module pointer are matched for dynamically registered BTF sets from loadable modules, so that same kfunc_id in two modules doesn't lead to check_kfunc_call succeeding. For the duration of the check_kfunc_call, the reference to struct module exists, as it returns the pointer stored in kfunc_btf_tab. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-2-memxor@gmail.com
2021-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net (v2) The following patchset contains Netfilter fixes for net: 1) Move back the defrag users fields to the global netns_nf area. Kernel fails to boot if conntrack is builtin and kernel is booted with: nf_conntrack.enable_hooks=1. From Florian Westphal. 2) Rule event notification is missing relevant context such as the position handle and the NLM_F_APPEND flag. 3) Rule replacement is expanded to add + delete using the existing rule handle, reverse order of this operation so it makes sense from rule notification standpoint. 4) Propagate to userspace the NLM_F_CREATE and NLM_F_EXCL flags from the rule notification path. Patches #2, #3 and #4 are used by 'nft monitor' and 'iptables-monitor' userspace utilities which are not correctly representing the following operations through netlink notifications: - rule insertions - rule addition/insertion from position handle - create table/chain/set/map/flowtable/... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/phy/bcm7xxx.c d88fd1b546ff ("net: phy: bcm7xxx: Fixed indirect MMD operations") f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165") net/sched/sch_api.c b193e15ac69d ("net: prevent user from passing illegal stab size") 69508d43334e ("net_sched: Use struct_size() and flex_array_size() helpers") Both cases trivial - adjacent code additions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30net: snmp: inline snmp_get_cpu_field()Eric Dumazet
This trivial function is called ~90,000 times on 256 cpus hosts, when reading /proc/net/netstat. And this number keeps inflating. Inlining it saves many cycles. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30tcp: adjust rcv_ssthresh according to sk_reserved_memWei Wang
When user sets SO_RESERVE_MEM socket option, in order to utilize the reserved memory when in memory pressure state, we adjust rcv_ssthresh according to the available reserved memory for the socket, instead of using 4 * advmss always. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30tcp: adjust sndbuf according to sk_reserved_memWei Wang
If user sets SO_RESERVE_MEM socket option, in order to fully utilize the reserved memory in memory pressure state on the tx path, we modify the logic in sk_stream_moderate_sndbuf() to set sk_sndbuf according to available reserved memory, instead of MIN_SOCK_SNDBUF, and adjust it when new data is acked. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: add new socket option SO_RESERVE_MEMWei Wang
This socket option provides a mechanism for users to reserve a certain amount of memory for the socket to use. When this option is set, kernel charges the user specified amount of memory to memcg, as well as sk_forward_alloc. This amount of memory is not reclaimable and is available in sk_forward_alloc for this socket. With this socket option set, the networking stack spends less cycles doing forward alloc and reclaim, which should lead to better system performance, with the cost of an amount of pre-allocated and unreclaimable memory, even under memory pressure. Note: This socket option is only available when memory cgroup is enabled and we require this reserved memory to be charged to the user's memcg. We hope this could avoid mis-behaving users to abused this feature to reserve a large amount on certain sockets and cause unfairness for others. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net/ipv4/datagram.c: remove superfluous header files from datagram.cMianhan Liu
datagram.c hasn't use any macro or function declared in linux/ip.h. Thus, these files can be removed from datagram.c safely without affecting the compilation of the net/ipv4 module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28net: ipv4: remove superfluous header files from fib_notifier.cMianhan Liu
fib_notifier.c hasn't use any macro or function declared in net/netns/ipv4.h. Thus, these files can be removed from fib_notifier.c safely without affecting the compilation of the net/ipv4 module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Link: https://lore.kernel.org/r/20210928164011.1454-1-liumh1@shanghaitech.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-28net: udp: annotate data race around udp_sk(sk)->corkflagEric Dumazet
up->corkflag field can be read or written without any lock. Annotate accesses to avoid possible syzbot/KCSAN reports. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1Florian Westphal
This is a revert of 7b1957b049 ("netfilter: nf_defrag_ipv4: use net_generic infra") and a partial revert of 8b0adbe3e3 ("netfilter: nf_defrag_ipv6: use net_generic infra"). If conntrack is builtin and kernel is booted with: nf_conntrack.enable_hooks=1 .... kernel will fail to boot due to a NULL deref in nf_defrag_ipv4_enable(): Its called before the ipv4 defrag initcall is made, so net_generic() returns NULL. To resolve this, move the user refcount back to struct net so calls to those functions are possible even before their initcalls have run. Fixes: 7b1957b04956 ("netfilter: nf_defrag_ipv4: use net_generic infra") Fixes: 8b0adbe3e38d ("netfilter: nf_defrag_ipv6: use net_generic infra"). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-27net/ipv4/tcp_nv.c: remove superfluous header files from tcp_nv.cMianhan Liu
tcp_nv.c hasn't use any macro or function declared in mm.h. Thus, these files can be removed from tcp_nv.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net 1) ipset limits the max allocatable memory via kvmalloc() to MAX_INT, from Jozsef Kadlecsik. 2) Check ip_vs_conn_tab_bits value to be in the range specified in Kconfig, from Andrea Claudi. 3) Initialize fragment offset in ip6tables, from Jeremy Sowden. 4) Make conntrack hash chain length random, from Florian Westphal. 5) Add zone ID to conntrack and NAT hashtuple again, also from Florian. 6) Add selftests for bidirectional zone support and colliding tuples, from Florian Westphal. 7) Unlink table before synchronize_rcu when cleaning tables with owner, from Florian. 8) ipset limits the max allocatable memory via kvmalloc() to MAX_INT. 9) Release conntrack entries via workqueue in masquerade, from Florian. 10) Fix bogus net_init in iptables raw table definition, also from Florian. 11) Work around missing softdep in log extensions, from Florian Westphal. 12) Serialize hash resizes and cleanups with mutex, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: conntrack: serialize hash resizes and cleanups netfilter: log: work around missing softdep backend module netfilter: iptable_raw: drop bogus net_init annotation netfilter: nf_nat_masquerade: defer conntrack walk to work queue netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic netfilter: nf_tables: Fix oversized kvmalloc() calls netfilter: nf_tables: unlink table before deleting it selftests: netfilter: add zone stress test with colliding tuples selftests: netfilter: add selftest for directional zone support netfilter: nat: include zone id in nat table hash again netfilter: conntrack: include zone id in tuple hash again netfilter: conntrack: make max chain length random netfilter: ip6_tables: zero-initialize fragment offset ipvs: check that ip_vs_conn_tab_bits is between 8 and 20 netfilter: ipset: Fix oversized kvmalloc() calls ==================== Link: https://lore.kernel.org/r/20210924221113.348767-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-24tcp: tracking packets with CE marks in BW rate sampleYuchung Cheng
In order to track CE marks per rate sample (one round trip), TCP needs a per-skb header field to record the tp->delivered_ce count when the skb was sent. To make space, we replace the "last_in_flight" field which is used exclusively for NV congestion control. The stat needed by NV can be alternatively approximated by existing stats tcp_sock delivered and mss_cache. This patch counts the number of packets delivered which have CE marks in the rate sample, using similar approach of delivery accounting. Cc: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24net: ipv4: Fix rtnexthop len when RTA_FLOW is presentXiao Liang
Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop() to get the length of rtnexthop correct. Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info") Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
net/mptcp/protocol.c 977d293e23b4 ("mptcp: ensure tx skbs always have the MPTCP ext") efe686ffce01 ("mptcp: ensure tx skbs always have the MPTCP ext") same patch merged in both trees, keep net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23tcp: remove sk_{tr}x_skb_cacheEric Dumazet
This reverts the following patches : - commit 2e05fcae83c4 ("tcp: fix compile error if !CONFIG_SYSCTL") - commit 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues") - commit 472c2e07eef0 ("tcp: add one skb cache for tx") - commit 8b27dae5a2e8 ("tcp: add one skb cache for rx") Having a cache of one skb (in each direction) per TCP socket is fragile, since it can cause a significant increase of memory needs, and not good enough for high speed flows anyway where more than one skb is needed. We want instead to add a generic infrastructure, with more flexible per-cpu caches, for alien NUMA nodes. Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23tcp: make tcp_build_frag() staticPaolo Abeni
After the previous patch the mentioned helper is used only inside its compilation unit: let's make it static. RFC -> v1: - preserve the tcp_build_frag() helper (Eric) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23tcp: expose the tcp_mark_push() and tcp_skb_entail() helpersPaolo Abeni
the tcp_skb_entail() helper is actually skb_entail(), renamed to provide proper scope. The two helper will be used by the next patch. RFC -> v1: - rename skb_entail to tcp_skb_entail (Eric) Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23nexthop: Fix memory leaks in nexthop notification chain listenersIdo Schimmel
syzkaller discovered memory leaks [1] that can be reduced to the following commands: # ip nexthop add id 1 blackhole # devlink dev reload pci/0000:06:00.0 As part of the reload flow, mlxsw will unregister its netdevs and then unregister from the nexthop notification chain. Before unregistering from the notification chain, mlxsw will receive delete notifications for nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw will not receive notifications for nexthops using netdevs that are not dismantled as part of the reload flow. For example, the blackhole nexthop above that internally uses the loopback netdev as its nexthop device. One way to fix this problem is to have listeners flush their nexthop tables after unregistering from the notification chain. This is error-prone as evident by this patch and also not symmetric with the registration path where a listener receives a dump of all the existing nexthops. Therefore, fix this problem by replaying delete notifications for the listener being unregistered. This is symmetric to the registration path and also consistent with the netdev notification chain. The above means that unregister_nexthop_notifier(), like register_nexthop_notifier(), will have to take RTNL in order to iterate over the existing nexthops and that any callers of the function cannot hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN driver. To avoid a deadlock, change the latter to unregister its nexthop listener without holding RTNL, making it symmetric to the registration path. [1] unreferenced object 0xffff88806173d600 (size 512): comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s) hex dump (first 32 bytes): 41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff A..`......sa.... 08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00 ..sa............ backtrace: [<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522 [<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline] [<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline] [<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231 [<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline] [<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239 [<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83 [<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline] [<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306 [<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244 [<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline] [<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline] [<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913 [<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572 [<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504 [<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590 [<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340 [<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929 [<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline] [<ffffffff832fde84>] sock_sendmsg net/socket.c:724 [inline] [<ffffffff832fde84>] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409 [<ffffffff83304a44>] ___sys_sendmsg+0x104/0x170 net/socket.c:2463 [<ffffffff83304c01>] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492 [<ffffffff83304d5d>] __do_sys_sendmsg net/socket.c:2501 [inline] [<ffffffff83304d5d>] __se_sys_sendmsg net/socket.c:2499 [inline] [<ffffffff83304d5d>] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499 Fixes: 2a014b200bbd ("mlxsw: spectrum_router: Add support for nexthop objects") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23net/ipv4/xfrm4_tunnel.c: remove superfluous header files from xfrm4_tunnel.cMianhan Liu
xfrm4_tunnel.c hasn't use any macro or function declared in mutex.h and ip.h Thus, these files can be removed from xfrm4_tunnel.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-09-21net/ipv4/sysctl_net_ipv4.c: remove superfluous header files from ↵Mianhan Liu
sysctl_net_ipv4.c sysctl_net_ipv4.c hasn't use any macro or function declared in igmp.h, inetdevice.h, mm.h, module.h, nsproxy.h, swap.h, inet_frag.h, route.h and snmp.h. Thus, these files can be removed from sysctl_net_ipv4.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21net/ipv4/syncookies.c: remove superfluous header files from syncookies.cMianhan Liu
syncookies.c hasn't use any macro or function declared in slab.h and random.h, Thus, these files can be removed from syncookies.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21net/ipv4/udp_tunnel_core.c: remove superfluous header files from ↵Mianhan Liu
udp_tunnel_core.c udp_tunnel_core.c hasn't use any macro or function declared in udp.h, types.h, and net_namespace.h. Thus, these files can be removed from udp_tunnel_core.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21netfilter: iptable_raw: drop bogus net_init annotationFlorian Westphal
This is a leftover from the times when this function was wired up via pernet_operations. Now its called when userspace asks for the table. With CONFIG_NET_NS=n, iptable_raw_table_init memory has been discarded already and we get a kernel crash. Other tables are fine, __net_init annotation was removed already. Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Reported-by: youling 257 <youling257@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-20net/ipv4/tcp_minisocks.c: remove superfluous header files from tcp_minisocks.cMianhan Liu
tcp_minisocks.c hasn't use any macro or function declared in mm.h, module.h, slab.h, sysctl.h, workqueue.h, static_key.h and inet_common.h. Thus, these files can be removed from tcp_minisocks.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-20net/ipv4/tcp_fastopen.c: remove superfluous header files from tcp_fastopen.cMianhan Liu
tcp_fastopen.c hasn't use any macro or function declared in crypto.h, err.h, init.h, list.h, rculist.h and inetpeer.h. Thus, these files can be removed from tcp_fastopen.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-20net/ipv4/route.c: remove superfluous header files from route.cMianhan Liu
route.c hasn't use any macro or function declared in uaccess.h, types.h, string.h, sockios.h, times.h, protocol.h, arp.h and l3mdev.h. Thus, these files can be removed from route.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-20nexthop: Fix division by zero while replacing a resilient groupIdo Schimmel
The resilient nexthop group torture tests in fib_nexthop.sh exposed a possible division by zero while replacing a resilient group [1]. The division by zero occurs when the data path sees a resilient nexthop group with zero buckets. The tests replace a resilient nexthop group in a loop while traffic is forwarded through it. The tests do not specify the number of buckets while performing the replacement, resulting in the kernel allocating a stub resilient table (i.e, 'struct nh_res_table') with zero buckets. This table should never be visible to the data path, but the old nexthop group (i.e., 'oldg') might still be used by the data path when the stub table is assigned to it. Fix this by only assigning the stub table to the old nexthop group after making sure the group is no longer used by the data path. Tested with fib_nexthops.sh: Tests passed: 222 Tests failed: 0 [1] divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 1850 Comm: ping Not tainted 5.14.0-custom-10271-ga86eb53057fe #1107 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:nexthop_select_path+0x2d2/0x1a80 [...] Call Trace: fib_select_multipath+0x79b/0x1530 fib_select_path+0x8fb/0x1c10 ip_route_output_key_hash_rcu+0x1198/0x2da0 ip_route_output_key_hash+0x190/0x340 ip_route_output_flow+0x21/0x120 raw_sendmsg+0x91d/0x2e10 inet_sendmsg+0x9e/0xe0 __sys_sendto+0x23d/0x360 __x64_sys_sendto+0xe1/0x1b0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: stable@vger.kernel.org Fixes: 283a72a5599e ("nexthop: Add implementation of resilient next-hop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19NET: IPV4: fix error "do not initialise globals to 0"wangzhitong
this patch fixes below Errors reported by checkpatch ERROR: do not initialise globals to 0 +int cipso_v4_rbm_optfmt = 0; Signed-off-by: wangzhitong <wangzhitong@uniontech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers""Eric Dumazet
This reverts commit d7807a9adf4856171f8441f13078c33941df48ab. As mentioned in https://lkml.org/lkml/2021/9/13/1819 5 years old commit 919483096bfe ("ipv4: fix memory leaks in ip_cmsg_send() callers") was a correct fix. ip_cmsg_send() can loop over multiple cmsghdr() If IP_RETOPTS has been successful, but following cmsghdr generates an error, we do not free ipc.ok If IP_RETOPTS is not successful, we have freed the allocated temporary space, not the one currently in ipc.opt. Sure, code could be refactored, but let's not bring back old bugs. Fixes: d7807a9adf48 ("Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()zhenggy
Commit 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") may directly retrans a multiple segments TSO/GSO packet without split, Since this commit, we can no longer assume that a retransmitted packet is a single segment. This patch fixes the tp->undo_retrans accounting in tcp_sacktag_one() that use the actual segments(pcount) of the retransmitted packet. Before that commit (10d3be569243), the assumption underlying the tp->undo_retrans-- seems correct. Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: zhenggy <zhenggy@chinatelecom.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-13udp_tunnel: Fix udp_tunnel_nic work-queue typeAya Levin
Turn udp_tunnel_nic work-queue to an ordered work-queue. This queue holds the UDP-tunnel configuration commands of the different netdevs. When the netdevs are functions of the same NIC the order of execution may be crucial. Problem example: NIC with 2 PFs, both PFs declare offload quota of up to 3 UDP-ports. $ifconfig eth2 1.1.1.1/16 up $ip link add eth2_19503 type vxlan id 5049 remote 1.1.1.2 dev eth2 dstport 19053 $ip link set dev eth2_19503 up $ip link add eth2_19504 type vxlan id 5049 remote 1.1.1.3 dev eth2 dstport 19054 $ip link set dev eth2_19504 up $ip link add eth2_19505 type vxlan id 5049 remote 1.1.1.4 dev eth2 dstport 19055 $ip link set dev eth2_19505 up $ip link add eth2_19506 type vxlan id 5049 remote 1.1.1.5 dev eth2 dstport 19056 $ip link set dev eth2_19506 up NIC RX port offload infrastructure offloads the first 3 UDP-ports (on all devices which sets NETIF_F_RX_UDP_TUNNEL_PORT feature) and not UDP-port 19056. So both PFs gets this offload configuration. $ip link set dev eth2_19504 down This triggers udp-tunnel-core to remove the UDP-port 19504 from offload-ports-list and offload UDP-port 19056 instead. In this scenario it is important that the UDP-port of 19504 will be removed from both PFs before trying to add UDP-port 19056. The NIC can stop offloading a UDP-port only when all references are removed. Otherwise the NIC may report exceeding of the offload quota. Fixes: cc4e3835eff4 ("udp_tunnel: add central NIC RX port offload infrastructure") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-13Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"Yajun Deng
This reverts commit 919483096bfe75dda338e98d56da91a263746a0a. There is only when ip_options_get() return zero need to free. It already called kfree() when return error. Fixes: 919483096bfe ("ipv4: fix memory leaks in ip_cmsg_send() callers") Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05ip_gre: validate csum_start only on pullWillem de Bruijn
The GRE tunnel device can pull existing outer headers in ipge_xmit. This is a rare path, apparently unique to this device. The below commit ensured that pulling does not move skb->data beyond csum_start. But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and thus csum_start is irrelevant. Refine to exclude this. At the same time simplify and strengthen the test. Simplify, by moving the check next to the offending pull, making it more self documenting and removing an unnecessary branch from other code paths. Strengthen, by also ensuring that the transport header is correct and therefore the inner headers will be after skb_reset_inner_headers. The transport header is set to csum_start in skb_partial_csum_set. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") Reported-by: Ido Schimmel <idosch@idosch.org> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03net: remove the unnecessary check in cipso_v4_doi_free王贇
The commit 733c99ee8be9 ("net: fix NULL pointer reference in cipso_v4_doi_free") was merged by a mistake, this patch try to cleanup the mess. And we already have the commit e842cb60e8ac ("net: fix NULL pointer reference in cipso_v4_doi_free") which fixed the root cause of the issue mentioned in it's description. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-02Set fc_nlinfo in nh_create_ipv4, nh_create_ipv6Ryoga Saito
This patch fixes kernel NULL pointer dereference when creating nexthop which is bound with SRv6 decapsulation. In the creation of nexthop, __seg6_end_dt_vrf_build is called. __seg6_end_dt_vrf_build expects fc_lninfo in fib6_config is set correctly, but it isn't set in nh_create_ipv6, which causes kernel crash. Here is steps to reproduce kernel crash: 1. modprobe vrf 2. ip -6 nexthop add encap seg6local action End.DT4 vrftable 1 dev eth0 We got the following message: [ 901.370336] BUG: kernel NULL pointer dereference, address: 0000000000000ba0 [ 901.371658] #PF: supervisor read access in kernel mode [ 901.372672] #PF: error_code(0x0000) - not-present page [ 901.373672] PGD 0 P4D 0 [ 901.374248] Oops: 0000 [#1] SMP PTI [ 901.374944] CPU: 0 PID: 8593 Comm: ip Not tainted 5.14-051400-generic #202108310811-Ubuntu [ 901.376404] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module_el8.2.0+320+13f867d7 04/01/2014 [ 901.377907] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf] [ 901.379182] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 <48> 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00 [ 901.382652] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282 [ 901.383746] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8 [ 901.385436] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000 [ 901.386924] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40 [ 901.388537] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000 [ 901.389937] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b [ 901.391226] FS: 00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000 [ 901.392737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 901.393803] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0 [ 901.395122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 901.396496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 901.397833] PKRU: 55555554 [ 901.398578] Call Trace: [ 901.399144] l3mdev_ifindex_lookup_by_table_id+0x3b/0x70 [ 901.400179] __seg6_end_dt_vrf_build+0x34/0xd0 [ 901.401067] seg6_end_dt4_build+0x16/0x20 [ 901.401904] seg6_local_build_state+0x271/0x430 [ 901.402797] lwtunnel_build_state+0x81/0x130 [ 901.403645] fib_nh_common_init+0x82/0x100 [ 901.404465] ? sock_def_readable+0x4b/0x80 [ 901.405285] fib6_nh_init+0x115/0x7c0 [ 901.406033] nh_create_ipv6.isra.0+0xe1/0x140 [ 901.406932] rtm_new_nexthop+0x3b7/0xeb0 [ 901.407828] rtnetlink_rcv_msg+0x152/0x3a0 [ 901.408663] ? rtnl_calcit.isra.0+0x130/0x130 [ 901.409535] netlink_rcv_skb+0x55/0x100 [ 901.410319] rtnetlink_rcv+0x15/0x20 [ 901.411026] netlink_unicast+0x1a8/0x250 [ 901.411813] netlink_sendmsg+0x238/0x470 [ 901.412602] ? _copy_from_user+0x2b/0x60 [ 901.413394] sock_sendmsg+0x65/0x70 [ 901.414112] ____sys_sendmsg+0x218/0x290 [ 901.414929] ? copy_msghdr_from_user+0x5c/0x90 [ 901.415814] ___sys_sendmsg+0x81/0xc0 [ 901.416559] ? fsnotify_destroy_marks+0x27/0xf0 [ 901.417447] ? call_rcu+0xa4/0x230 [ 901.418153] ? kmem_cache_free+0x23f/0x410 [ 901.418972] ? dentry_free+0x37/0x70 [ 901.419705] ? mntput_no_expire+0x4c/0x260 [ 901.420574] __sys_sendmsg+0x62/0xb0 [ 901.421297] __x64_sys_sendmsg+0x1f/0x30 [ 901.422057] do_syscall_64+0x5c/0xc0 [ 901.422756] ? syscall_exit_to_user_mode+0x27/0x50 [ 901.423675] ? __x64_sys_close+0x12/0x40 [ 901.424462] ? do_syscall_64+0x69/0xc0 [ 901.425219] ? irqentry_exit_to_user_mode+0x9/0x20 [ 901.426149] ? irqentry_exit+0x19/0x30 [ 901.426901] ? exc_page_fault+0x89/0x160 [ 901.427709] ? asm_exc_page_fault+0x8/0x30 [ 901.428536] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 901.429514] RIP: 0033:0x7fe493945747 [ 901.430248] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 901.433549] RSP: 002b:00007ffe9932cf68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 901.434981] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe493945747 [ 901.436303] RDX: 0000000000000000 RSI: 00007ffe9932cfe0 RDI: 0000000000000003 [ 901.437607] RBP: 00000000613053f7 R08: 0000000000000001 R09: 00007ffe9932d07c [ 901.438990] R10: 000055f4a903a010 R11: 0000000000000246 R12: 0000000000000001 [ 901.440340] R13: 0000000000000001 R14: 000055f4a802b163 R15: 000055f4a8042020 [ 901.441630] Modules linked in: vrf nls_utf8 isofs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit rapl input_leds joydev serio_raw qemu_fw_cfg mac_hid sch_fq_codel drm virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd virtio_net net_failover cryptd psmouse virtio_blk failover i2c_piix4 pata_acpi floppy [ 901.450808] CR2: 0000000000000ba0 [ 901.451514] ---[ end trace c27b934b99ade304 ]--- [ 901.452403] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf] [ 901.453626] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 <48> 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00 [ 901.456910] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282 [ 901.457912] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8 [ 901.459238] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000 [ 901.460552] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40 [ 901.461882] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000 [ 901.463208] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b [ 901.464529] FS: 00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000 [ 901.466058] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 901.467189] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0 [ 901.468515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 901.469858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 901.471139] PKRU: 55555554 Signed-off-by: Ryoga Saito <contact@proelbtn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31fou: remove sparse errorsEric Dumazet
We need to add __rcu qualifier to avoid these errors: net/ipv4/fou.c:250:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:250:18: expected struct net_offload const **offloads net/ipv4/fou.c:250:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:251:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:251:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:251:15: struct net_offload const * net/ipv4/fou.c:272:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:272:18: expected struct net_offload const **offloads net/ipv4/fou.c:272:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:273:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:273:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:273:15: struct net_offload const * net/ipv4/fou.c:442:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:442:18: expected struct net_offload const **offloads net/ipv4/fou.c:442:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:443:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:443:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:443:15: struct net_offload const * net/ipv4/fou.c:489:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:489:18: expected struct net_offload const **offloads net/ipv4/fou.c:489:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:490:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:490:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:490:15: struct net_offload const * net/ipv4/udp_offload.c:170:26: warning: incorrect type in assignment (different address spaces) net/ipv4/udp_offload.c:170:26: expected struct net_offload const **offloads net/ipv4/udp_offload.c:170:26: got struct net_offload const [noderef] __rcu ** net/ipv4/udp_offload.c:171:23: error: incompatible types in comparison expression (different address spaces): net/ipv4/udp_offload.c:171:23: struct net_offload const [noderef] __rcu * net/ipv4/udp_offload.c:171:23: struct net_offload const * Fixes: efc98d08e1ec ("fou: eliminate IPv4,v6 specific GRO functions") Fixes: 8bce6d7d0d1e ("udp: Generalize skb_udp_segment") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31ipv4: fix endianness issue in inet_rtm_getroute_build_skb()Eric Dumazet
The UDP length field should be in network order. This removes the following sparse error: net/ipv4/route.c:3173:27: warning: incorrect type in assignment (different base types) net/ipv4/route.c:3173:27: expected restricted __be16 [usertype] len net/ipv4/route.c:3173:27: got unsigned long Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== bpf-next 2021-08-31 We've added 116 non-merge commits during the last 17 day(s) which contain a total of 126 files changed, 6813 insertions(+), 4027 deletions(-). The main changes are: 1) Add opaque bpf_cookie to perf link which the program can read out again, to be used in libbpf-based USDT library, from Andrii Nakryiko. 2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu. 3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang. 4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch to another congestion control algorithm during init, from Martin KaFai Lau. 5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima. 6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta. 7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT} progs, from Xu Liu and Stanislav Fomichev. 8) Support for __weak typed ksyms in libbpf, from Hao Luo. 9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky. 10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov. 11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann. 12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian, Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others. 13) Another big batch to revamp XDP samples in order to give them consistent look and feel, from Kumar Kartikeya Dwivedi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits) MAINTAINERS: Remove self from powerpc BPF JIT selftests/bpf: Fix potential unreleased lock samples: bpf: Fix uninitialized variable in xdp_redirect_cpu selftests/bpf: Reduce more flakyness in sockmap_listen bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS bpf: selftests: Add dctcp fallback test bpf: selftests: Add connect_to_fd_opts to network_helpers bpf: selftests: Add sk_state to bpf_tcp_helpers.h bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt selftests: xsk: Preface options with opt selftests: xsk: Make enums lower case selftests: xsk: Generate packets from specification selftests: xsk: Generate packet directly in umem selftests: xsk: Simplify cleanup of ifobjects selftests: xsk: Decrease sending speed selftests: xsk: Validate tx stats on tx thread selftests: xsk: Simplify packet validation in xsk tests selftests: xsk: Rename worker_* functions that are not thread entry points selftests: xsk: Disassociate umem size with packets sent selftests: xsk: Remove end-of-test packet ... ==================== Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30net: ipv4: Fix the warning for dereferenceYajun Deng
Add a if statements to avoid the warning. Dan Carpenter report: The patch faf482ca196a: "net: ipv4: Move ip_options_fragment() out of loop" from Aug 23, 2021, leads to the following Smatch complaint: net/ipv4/ip_output.c:833 ip_do_fragment() warn: variable dereferenced before check 'iter.frag' (see line 828) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: faf482ca196a ("net: ipv4: Move ip_options_fragment() out of loop") Link: https://lore.kernel.org/netdev/20210830073802.GR7722@kadam/T/#t Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30ipv4: make exception cache less predictibleEric Dumazet
Even after commit 6457378fe796 ("ipv4: use siphash instead of Jenkins in fnhe_hashfun()"), an attacker can still use brute force to learn some secrets from a victim linux host. One way to defeat these attacks is to make the max depth of the hash table bucket a random value. Before this patch, each bucket of the hash table used to store exceptions could contain 6 items under attack. After the patch, each bucket would contains a random number of items, between 6 and 10. The attacker can no longer infer secrets. This is slightly increasing memory size used by the hash table, by 50% in average, we do not expect this to be a problem. This patch is more complex than the prior one (IPv6 equivalent), because IPv4 was reusing the oldest entry. Since we need to be able to evict more than one entry per update_or_create_fnhe() call, I had to replace fnhe_oldest() with fnhe_remove_oldest(). Also note that we will queue extra kfree_rcu() calls under stress, which hopefully wont be a too big issue. Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: David Ahern <dsahern@kernel.org> Tested-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Clean up and consolidate ct ecache infrastructure by merging ct and expect notifiers, from Florian Westphal. 2) Missing counters and timestamp in nfnetlink_queue and _log conntrack information. 3) Missing error check for xt_register_template() in iptables mangle, as a incremental fix for the previous pull request, also from Florian Westphal. 4) Add netfilter hooks for the SRv6 lightweigh tunnel driver, from Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl to make sure existing netfilter rulesets do not break. There is a static key to disable the hooks by default. The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable impact in the seg6_input path for non-netfilter users: similar numbers with and without this patch. This is a sample of the perf report output: 11.67% kpktgend_0 [ipv6] [k] ipv6_get_saddr_eval 7.89% kpktgend_0 [ipv6] [k] __ipv6_addr_label 7.52% kpktgend_0 [ipv6] [k] __ipv6_dev_get_saddr 6.63% kpktgend_0 [kernel.vmlinux] [k] asm_exc_nmi 4.74% kpktgend_0 [ipv6] [k] fib6_node_lookup_1 3.48% kpktgend_0 [kernel.vmlinux] [k] pskb_expand_head 3.33% kpktgend_0 [ipv6] [k] ip6_rcv_core.isra.29 3.33% kpktgend_0 [ipv6] [k] seg6_do_srh_encap 2.53% kpktgend_0 [ipv6] [k] ipv6_dev_get_saddr 2.45% kpktgend_0 [ipv6] [k] fib6_table_lookup 2.24% kpktgend_0 [kernel.vmlinux] [k] ___cache_free 2.16% kpktgend_0 [ipv6] [k] ip6_pol_route 2.11% kpktgend_0 [kernel.vmlinux] [k] __ipv6_addr_type ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/David S. Miller
ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-08-27 1) Remove an unneeded extra variable in esp4 esp_ssg_unref. From Corey Minyard. 2) Add a configuration option to change the default behaviour to block traffic if there is no matching policy. Joint work with Christian Langrock and Antony Antony. 3) Fix a shift-out-of-bounce bug reported from syzbot. From Pavel Skripkin. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26tcp: enable mid stream window clampNeil Spring
The TCP_WINDOW_CLAMP socket option is defined in tcp(7) to "Bound the size of the advertised window to this value." Window clamping is distributed across two variables, window_clamp ("Maximal window to advertise" in tcp.h) and rcv_ssthresh ("Current window clamp"). This patch updates the function where the window clamp is set to also reduce the current window clamp, rcv_sshthresh, if needed. With this, setting the TCP_WINDOW_CLAMP option has the documented effect of limiting the window. Signed-off-by: Neil Spring <ntspring@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210825210117.1668371-1-ntspring@fb.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>