summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-02-03ethtool: Get link mode in use instead of speed and duplex parametersDanielle Ratson
Currently, when user space queries the link's parameters, as speed and duplex, each parameter is passed from the driver to ethtool. Instead, get the link mode bit in use, and derive each of the parameters from it in ethtool. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03ethtool: Extend link modes settings uAPI with lanesDanielle Ratson
Currently, when auto negotiation is on, the user can advertise all the linkmodes which correspond to a specific speed, but does not have a similar selector for the number of lanes. This is significant when a specific speed can be achieved using different number of lanes. For example, 2x50 or 4x25. Add 'ETHTOOL_A_LINKMODES_LANES' attribute and expand 'struct ethtool_link_settings' with lanes field in order to implement a new lanes-selector that will enable the user to advertise a specific number of lanes as well. When auto negotiation is off, lanes parameter can be forced only if the driver supports it. Add a capability bit in 'struct ethtool_ops' that allows ethtool know if the driver can handle the lanes parameter when auto negotiation is off, so if it does not, an error message will be returned when trying to set lanes. Example: $ ethtool -s swp1 lanes 4 $ ethtool swp1 Settings for swp1: Supported ports: [ FIBRE ] Supported link modes: 1000baseKX/Full 10000baseKR/Full 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 25000baseCR/Full 25000baseSR/Full 50000baseCR2/Full 100000baseSR4/Full 100000baseCR4/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Supported FEC modes: Not reported Advertised link modes: 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 100000baseSR4/Full 100000baseCR4/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Speed: Unknown! Duplex: Unknown! (255) Auto-negotiation: on Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Link detected: no Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03ethtool: Validate master slave configuration before rtnl_lock()Danielle Ratson
Create a new function for input validations to be called before rtnl_lock() and move the master slave validation to that function. This would be a cleanup for next patch that would add another validation to the new function. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignoredVladimir Oltean
The bridge emits VLAN filtering events and quite a few others via switchdev with orig_dev = br->dev. After the blamed commit, these events started getting ignored. The point of the patch was to not offload switchdev objects for ports that didn't go through dsa_port_bridge_join, because the configuration is unsupported: - ports that offload a bonding/team interface go through dsa_port_bridge_join when that bonding/team interface is later bridged with another switch port or LAG - ports that don't offload LAG don't get notified of the bridge that is on top of that LAG. Sadly, a check is missing, which is that the orig_dev is equal to the bridge device. This check is compatible with the original intention, because ports that don't offload bridging because they use a software LAG don't have dp->bridge_dev set. On a semi-related note, we should not offload switchdev objects or populate dp->bridge_dev if the driver doesn't implement .port_bridge_join either. However there is no regression associated with that, so it can be done separately. Fixes: 5696c8aedfcc ("net: dsa: Don't offload port attributes on standalone ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Tested-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20210202233109.1591466-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03net/qrtr: restrict user-controlled length in qrtr_tun_write_iter()Sabyrzhan Tasbolatov
syzbot found WARNING in qrtr_tun_write_iter [1] when write_iter length exceeds KMALLOC_MAX_SIZE causing order >= MAX_ORDER condition. Additionally, there is no check for 0 length write. [1] WARNING: mm/page_alloc.c:5011 [..] Call Trace: alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267 alloc_pages include/linux/gfp.h:547 [inline] kmalloc_order+0x2e/0xb0 mm/slab_common.c:837 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853 kmalloc include/linux/slab.h:557 [inline] kzalloc include/linux/slab.h:682 [inline] qrtr_tun_write_iter+0x8a/0x180 net/qrtr/tun.c:83 call_write_iter include/linux/fs.h:1901 [inline] Reported-by: syzbot+c2a7e5c5211605a90865@syzkaller.appspotmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Link: https://lore.kernel.org/r/20210202092059.1361381-1-snovitoll@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04netfilter: flowtable: fix tcp and udp header checksum updateSven Auhagen
When updating the tcp or udp header checksum on port nat the function inet_proto_csum_replace2 with the last parameter pseudohdr as true. This leads to an error in the case that GRO is used and packets are split up in GSO. The tcp or udp checksum of all packets is incorrect. The error is probably masked due to the fact the most network driver implement tcp/udp checksum offloading. It also only happens when GRO is applied and not on single packets. The error is most visible when using a pppoe connection which is not triggering the tcp/udp checksum offload. Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure") Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-04net, veth: Alloc skb in bulk for ndo_xdp_xmitLorenzo Bianconi
Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine in order to alloc skbs in bulk for XDP_PASS verdict. Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list. The proposed approach has been tested in the following scenario: eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS XDP_REDIRECT: xdp_redirect_map bpf sample XDP_PASS: xdp_rxq_info bpf sample traffic generator: pkt_gen sending udp traffic on a remote device bpf-next master: ~3.64Mpps bpf-next + skb bulking allocation: ~3.79Mpps Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/a14a30d3c06fff24e13f836c733d80efc0bd6eb5.1611957532.git.lorenzo@kernel.org
2021-02-04netfilter: nftables: fix possible UAF over chains from packet path in netnsPablo Neira Ayuso
Although hooks are released via call_rcu(), chain and rule objects are immediately released while packets are still walking over these bits. This patch adds the .pre_exit callback which is invoked before synchronize_rcu() in the netns framework to stay safe. Remove a comment which is not valid anymore since the core does not use synchronize_net() anymore since 8c873e219970 ("netfilter: core: free hooks with call_rcu"). Suggested-by: Florian Westphal <fw@strlen.de> Fixes: df05ef874b28 ("netfilter: nf_tables: release objects on netns destruction") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-04netfilter: xt_recent: Fix attempt to update deleted entryJozsef Kadlecsik
When both --reap and --update flag are specified, there's a code path at which the entry to be updated is reaped beforehand, which then leads to kernel crash. Reap only entries which won't be updated. Fixes kernel bugzilla #207773. Link: https://bugzilla.kernel.org/show_bug.cgi?id=207773 Reported-by: Reindl Harald <h.reindl@thelounge.net> Fixes: 0079c5aee348 ("netfilter: xt_recent: add an entry reaper") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-03net: indirect call helpers for ipv4/ipv6 dst_check functionsBrian Vazquez
This patch avoids the indirect call for the common case: ip6_dst_check and ipv4_dst_check Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03net: use indirect call helpers for dst_mtuBrian Vazquez
This patch avoids the indirect call for the common case: ip6_mtu and ipv4_mtu Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03net: use indirect call helpers for dst_outputBrian Vazquez
This patch avoids the indirect call for the common case: ip6_output and ip_output Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03net: use indirect call helpers for dst_inputBrian Vazquez
This patch avoids the indirect call for the common case: ip_local_deliver and ip6_input Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03Bluetooth: Fix crash in mgmt_add_adv_patterns_monitor_completeHoward Chung
If hci_add_adv_monitor is a pending command(e.g. forward to msft_add_monitor_pattern), it is possible that mgmt_add_adv_patterns_monitor_complete gets called before cmd->user_data gets set, which will cause a crash when we try to get the moniter handle through cmd->user_data in mgmt_add_adv_patterns_monitor_complete. This moves the cmd->user_data assignment earlier than hci_add_adv_monitor. RIP: 0010:mgmt_add_adv_patterns_monitor_complete+0x82/0x187 [bluetooth] Code: 1e bf 03 00 00 00 be 52 00 00 00 4c 89 ea e8 9e e4 02 00 49 89 c6 48 85 c0 0f 84 06 01 00 00 48 89 5d b8 4c 89 fb 4d 8b 7e 30 <41> 0f b7 47 18 66 89 45 c0 45 84 e4 75 5a 4d 8b 56 28 48 8d 4d c8 RSP: 0018:ffffae81807dbcb8 EFLAGS: 00010286 RAX: ffff91c4bdf723c0 RBX: 0000000000000000 RCX: ffff91c4e5da5b80 RDX: ffff91c405680000 RSI: 0000000000000052 RDI: ffff91c49d654c00 RBP: ffffae81807dbd00 R08: ffff91c49fb157e0 R09: ffff91c49fb157e0 R10: 000000000002a4f0 R11: ffffffffc0819cfd R12: 0000000000000000 R13: ffff91c405680000 R14: ffff91c4bdf723c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff91c4ea300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 0000000133612002 CR4: 00000000003606e0 Call Trace: ? msft_le_monitor_advertisement_cb+0x111/0x141 [bluetooth] hci_event_packet+0x425e/0x631c [bluetooth] ? printk+0x59/0x73 ? __switch_to_asm+0x41/0x70 ? msft_le_set_advertisement_filter_enable_cb+0xa6/0xa6 [bluetooth] ? bt_dbg+0xb4/0xbb [bluetooth] ? __switch_to_asm+0x41/0x70 hci_rx_work+0x101/0x319 [bluetooth] process_one_work+0x257/0x506 worker_thread+0x10d/0x284 kthread+0x14c/0x154 ? process_one_work+0x506/0x506 ? kthread_blkcg+0x2c/0x2c ret_from_fork+0x1f/0x40 Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Reviewed-by: Manish Mandlik <mmandlik@chromium.org> Reviewed-by: Archie Pusaka <apusaka@chromium.org> Signed-off-by: Howard Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-02-02inet: do not export inet_gro_{receive|complete}Eric Dumazet
inet_gro_receive() and inet_gro_complete() are part of GRO engine which can not be modular. Similarly, inet_gso_segment() does not need to be exported, being part of GSO stack. In other words, net/ipv6/ip6_offload.o is part of vmlinux, regardless of CONFIG_IPV6. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210202154145.1568451-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge tag 'mac80211-next-for-net-next-2021-02-02' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time, only RTNL locking reduction fallout. - cfg80211_dev_rename() requires RTNL - cfg80211_change_iface() and cfg80211_set_encryption() require wiphy mutex (was missing in wireless extensions) - cfg80211_destroy_ifaces() requires wiphy mutex - netdev registration can fail due to notifiers, and then notifiers are "unrolled", need to handle this properly * tag 'mac80211-next-for-net-next-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: cfg80211: fix netdev registration deadlock cfg80211: call cfg80211_destroy_ifaces() with wiphy lock held wext: call cfg80211_set_encryption() with wiphy lock held wext: call cfg80211_change_iface() with wiphy lock held nl80211: call cfg80211_dev_rename() under RTNL ==================== Link: https://lore.kernel.org/r/20210202144106.38207-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: add the mibs for ADD_ADDR with portGeliang Tang
This patch adds the mibs for ADD_ADDR with port: MPTCP_MIB_PORTADD for received ADD_ADDR suboption with a port number. MPTCP_MIB_PORTSYNRX, MPTCP_MIB_PORTSYNACKRX, MPTCP_MIB_PORTACKRX, for received MP_JOIN's SYN or SYN/ACK or ACK with a port number which is different from the msk's port number. MPTCP_MIB_MISMATCHPORTSYNRX and MPTCP_MIB_MISMATCHPORTACKRX, for received SYN or ACK MP_JOIN with a mismatched port-number. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: deal with MPTCP_PM_ADDR_ATTR_PORT in PM netlinkGeliang Tang
This patch adds MPTCP_PM_ADDR_ATTR_PORT filling and parsing in PM netlink. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: enable use_port when invoke addresses_equalGeliang Tang
When dealing with the addresses list local_addr_list or anno_list, we should enable the function addresses_equal's parameter use_port. And enable it in address_zero too. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: add port number check for MP_JOINGeliang Tang
This patch adds two new helpers, subflow_use_different_sport and subflow_use_different_dport, to check whether the subflow's source or destination port number is different from the msk's port number. When receiving the MP_JOIN's SYN/SYNACK/ACK, we do these port number checks and print out the different port numbers. And furthermore, when receiving the MP_JOIN's SYN/ACK, we also use a new helper mptcp_pm_sport_in_anno_list to check whether this port number is announced. If it isn't, we need to abort this connection. This patch also populates the local address's port field in local_address. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: add a new helper subflow_req_create_thmacGeliang Tang
This patch adds a new helper named subflow_req_create_thmac, which is extracted from subflow_token_join_request. It initializes subflow_req's local_nonce and thmac fields, those are the more expensive to populate. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: drop unused skb in subflow_token_join_requestGeliang Tang
This patch drops the unused parameter skb in subflow_token_join_request. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: create the listening socket for new portGeliang Tang
This patch creates a listening socket when an address with a port-number is added by PM netlink. Then binds the new port to the socket, and listens for new connections. When the address is removed or the addresses are flushed by PM netlink, release the listening socket. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: send ack for every add_addrGeliang Tang
This patch changes the sending ACK conditions for the ADD_ADDR, send an ACK packet for any ADD_ADDR, not just when ipv6 addresses or port numbers are included. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/139 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: create subflow or signal addr for newly added addressGeliang Tang
Currently, when a new MPTCP endpoint is added, the existing MPTCP sockets are not affected. This patch implements a new function mptcp_nl_add_subflow_or_signal_addr, invoked when an address is added from PM netlink. This function traverses the MPTCP sockets list and invokes mptcp_pm_create_subflow_or_signal_addr to try to create a subflow or signal an address for the newly added address, if local constraint allows that. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/19 Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: drop *_max fields in mptcp_pm_dataGeliang Tang
This patch drops the per-msk values add_addr_signal_max, add_addr_accept_max, local_addr_max and subflows_max fields in struct mptcp_pm_data, uses the pernet *_max values instead. And adds four new helpers to get the pernet *_max values separately. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: use WRITE_ONCE for the pernet *_maxGeliang Tang
This patch uses WRITE_ONCE() for all the pernet add_addr_signal_max, add_addr_accept_max, local_addr_max and subflows_max fields in struct pm_nl_pernet to avoid concurrency issues. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipv6: Emit notification when fib hardware flags are changedAmit Cohen
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. It is also possible for a route already installed in hardware to change its action and therefore its flags. For example, a host route that is trapping packets can be "promoted" to perform decapsulation following the installation of an IPinIP/VXLAN tunnel. Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed. The aim is to provide an indication to user-space (e.g., routing daemons) about the state of the route in hardware. Introduce a sysctl that controls this behavior. Keep the default value at 0 (i.e., do not emit notifications) for several reasons: - Multiple RTM_NEWROUTE notification per-route might confuse existing routing daemons. - Convergence reasons in routing daemons. - The extra notifications will negatively impact the insertion rate. - Not all users are interested in these notifications. Move fib6_info_hw_flags_set() to C file because it is no longer a short function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipv4: Emit notification when fib hardware flags are changedAmit Cohen
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. It is also possible for a route already installed in hardware to change its action and therefore its flags. For example, a host route that is trapping packets can be "promoted" to perform decapsulation following the installation of an IPinIP/VXLAN tunnel. Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed. The aim is to provide an indication to user-space (e.g., routing daemons) about the state of the route in hardware. Introduce a sysctl that controls this behavior. Keep the default value at 0 (i.e., do not emit notifications) for several reasons: - Multiple RTM_NEWROUTE notification per-route might confuse existing routing daemons. - Convergence reasons in routing daemons. - The extra notifications will negatively impact the insertion rate. - Not all users are interested in these notifications. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipv4: Publish fib_nlmsg_size()Amit Cohen
Publish fib_nlmsg_size() to allow it to be used later on from fib_alias_hw_flags_set(). Remove the inline keyword since it shouldn't be used inside C files. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipv4: Pass fib_rt_info as const to fib_dump_info()Amit Cohen
fib_dump_info() does not change 'fri', so pass it as 'const'. It will later allow us to invoke fib_dump_info() from fib_alias_hw_flags_set(). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: fix up truesize of cloned skb in skb_prepare_for_shift()Marco Elver
Avoid the assumption that ksize(kmalloc(S)) == ksize(kmalloc(S)): when cloning an skb, save and restore truesize after pskb_expand_head(). This can occur if the allocator decides to service an allocation of the same size differently (e.g. use a different size class, or pass the allocation on to KFENCE). Because truesize is used for bookkeeping (such as sk_wmem_queued), a modified truesize of a cloned skb may result in corrupt bookkeeping and relevant warnings (such as in sk_stream_kill_queues()). Link: https://lkml.kernel.org/r/X9JR/J6dMMOy1obu@elver.google.com Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210201160420.2826895-1-elver@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: fix length of MP_PRIO suboptionDavide Caratti
With version 0 of the protocol it was legal to encode the 'Subflow Id' in the MP_PRIO suboption, to specify which subflow would change its 'Backup' flag. This has been removed from v1 specification: thus, according to RFC 8684 §3.3.8, the resulting 'Length' for MP_PRIO changed from 4 to 3 byte. Current Linux generates / parses MP_PRIO according to the old spec, using 'Length' equal to 4, and hardcoding 1 as 'Subflow Id'; RFC compliance can improve if we change 'Length' in other to become 3, leaving a 'Nop' after the MP_PRIO suboption. In this way the kernel will emit and accept *only* MP_PRIO suboptions that are compliant to version 1 of the MPTCP protocol. unpatched 5.11-rc kernel: [root@bottarga ~]# tcpdump -tnnr unpatched.pcap | grep prio reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1) dropped privs to tcpdump IP 10.0.3.2.48433 > 10.0.1.1.10006: Flags [.], ack 1, win 502, options [nop,nop,TS val 4032325513 ecr 1876514270,mptcp prio non-backup id 1,mptcp dss ack 14084896651682217737], length 0 patched 5.11-rc kernel: [root@bottarga ~]# tcpdump -tnnr patched.pcap | grep prio reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1) dropped privs to tcpdump IP 10.0.3.2.49735 > 10.0.1.1.10006: Flags [.], ack 1, win 502, options [nop,nop,TS val 1276737699 ecr 2686399734,mptcp prio non-backup,nop,mptcp dss ack 18433038869082491686], length 0 Changes since v2: - when accounting for option space, don't increment 'TCPOLEN_MPTCP_PRIO' and use 'TCPOLEN_MPTCP_PRIO_ALIGN' instead, thanks to Matthieu Baerts. Changes since v1: - refactor patch to avoid using 'TCPOLEN_MPTCP_PRIO' with its old value, thanks to Geliang Tang. Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matteo Croce <mcroce@linux.microsoft.com> Link: https://lore.kernel.org/r/846cdd41e6ad6ec88ef23fee1552ab39c2f5a3d1.1612184361.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge tag 'net-5.11-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.11-rc7, including fixes from bpf and mac80211 trees. Current release - regressions: - ip_tunnel: fix mtu calculation - mlx5: fix function calculation for page trees Previous releases - regressions: - vsock: fix the race conditions in multi-transport support - neighbour: prevent a dead entry from updating gc_list - dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add Previous releases - always broken: - bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF cgroup getsockopt infra when user space is trying to race against optlen, from Loris Reiff. - bpf: add missing fput() in BPF inode storage map update helper - udp: ipv4: manipulate network header of NATed UDP GRO fraglist - mac80211: fix station rate table updates on assoc - r8169: work around RTL8125 UDP HW bug - igc: report speed and duplex as unknown when device is runtime suspended - rxrpc: fix deadlock around release of dst cached on udp tunnel" * tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary net: ipa: fix two format specifier errors net: ipa: use the right accessor in ipa_endpoint_status_skip() net: ipa: be explicit about endianness net: ipa: add a missing __iomem attribute net: ipa: pass correct dma_handle to dma_free_coherent() r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS net: mvpp2: TCAM entry enable should be written after SRAM data net: lapb: Copy the skb before sending a packet net/mlx5e: Release skb in case of failure in tc update skb net/mlx5e: Update max_opened_tc also when channels are closed net/mlx5: Fix leak upon failure of rule creation net/mlx5: Fix function calculation for page trees docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc udp: ipv4: manipulate network header of NATed UDP GRO fraglist net: ip_tunnel: fix mtu calculation vsock: fix the race conditions in multi-transport support net: sched: replaced invalid qdisc tree flush helper in qdisc_replace ibmvnic: device remove has higher precedence over reset ...
2021-02-02net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundaryAndreas Oetken
sup_multicast_addr is passed to ether_addr_equal for address comparison which casts the address inputs to u16 leading to an unaligned access. Aligning the sup_multicast_addr to u16 boundary fixes the issue. Signed-off-by: Andreas Oetken <andreas.oetken@siemens.com> Link: https://lore.kernel.org/r/20210202090304.2740471-1-ennoerlangen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGSSabyrzhan Tasbolatov
syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS control message is passed with user-controlled 0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition. The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400. So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10 is the closest limit, with 0x10 leftover. Same condition is currently done in rds_cmsg_rdma_args(). [1] WARNING: mm/page_alloc.c:5011 [..] Call Trace: alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267 alloc_pages include/linux/gfp.h:547 [inline] kmalloc_order+0x2e/0xb0 mm/slab_common.c:837 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853 kmalloc_array include/linux/slab.h:592 [inline] kcalloc include/linux/slab.h:621 [inline] rds_rdma_extra_size+0xb2/0x3b0 net/rds/rdma.c:568 rds_rm_size net/rds/send.c:928 [inline] Reported-by: syzbot+1bd2b07f93745fa38425@syzkaller.appspotmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Link: https://lore.kernel.org/r/20210201203233.1324704-1-snovitoll@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: lapb: Copy the skb before sending a packetXie He
When sending a packet, we will prepend it with an LAPB header. This modifies the shared parts of a cloned skb, so we should copy the skb rather than just clone it, before we prepend the header. In "Documentation/networking/driver.rst" (the 2nd point), it states that drivers shouldn't modify the shared parts of a cloned skb when transmitting. The "dev_queue_xmit_nit" function in "net/core/dev.c", which is called when an skb is being sent, clones the skb and sents the clone to AF_PACKET sockets. Because the LAPB drivers first remove a 1-byte pseudo-header before handing over the skb to us, if we don't copy the skb before prepending the LAPB header, the first byte of the packets received on AF_PACKET sockets can be corrupted. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20210201055706.415842-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge tag 'mac80211-for-net-2021-02-02' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two fixes: - station rate tables were not updated correctly after association, leading to bad configuration - rtl8723bs (staging) was initializing data incorrectly after the previous fix and needed to move the init later * tag 'mac80211-for-net-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: staging: rtl8723bs: Move wiphy setup to after reading the regulatory settings from the chip mac80211: fix station rate table updates on assoc ==================== Link: https://lore.kernel.org/r/20210202143505.37610-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Bluetooth: Fix null pointer dereference in amp_read_loc_assoc_final_dataGopal Tiwari
kernel panic trace looks like: #5 [ffffb9e08698fc80] do_page_fault at ffffffffb666e0d7 #6 [ffffb9e08698fcb0] page_fault at ffffffffb70010fe [exception RIP: amp_read_loc_assoc_final_data+63] RIP: ffffffffc06ab54f RSP: ffffb9e08698fd68 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8c8845a5a000 RCX: 0000000000000004 RDX: 0000000000000000 RSI: ffff8c8b9153d000 RDI: ffff8c8845a5a000 RBP: ffffb9e08698fe40 R8: 00000000000330e0 R9: ffffffffc0675c94 R10: ffffb9e08698fe58 R11: 0000000000000001 R12: ffff8c8b9cbf6200 R13: 0000000000000000 R14: 0000000000000000 R15: ffff8c8b2026da0b ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffb9e08698fda8] hci_event_packet at ffffffffc0676904 [bluetooth] #8 [ffffb9e08698fe50] hci_rx_work at ffffffffc06629ac [bluetooth] #9 [ffffb9e08698fe98] process_one_work at ffffffffb66f95e7 hcon->amp_mgr seems NULL triggered kernel panic in following line inside function amp_read_loc_assoc_final_data set_bit(READ_LOC_AMP_ASSOC_FINAL, &mgr->state); Fixed by checking NULL for mgr. Signed-off-by: Gopal Tiwari <gtiwari@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-02-01udp: ipv4: manipulate network header of NATed UDP GRO fraglistDongseok Yi
UDP/IP header of UDP GROed frag_skbs are not updated even after NAT forwarding. Only the header of head_skb from ip_finish_output_gso -> skb_gso_segment is updated but following frag_skbs are not updated. A call path skb_mac_gso_segment -> inet_gso_segment -> udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list does not try to update UDP/IP header of the segment list but copy only the MAC header. Update port, addr and check of each skb of the segment list in __udp_gso_segment_list. It covers both SNAT and DNAT. Fixes: 9fd1ff5d2ac7 (udp: Support UDP fraglist GRO/GSO.) Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://lore.kernel.org/r/1611962007-80092-1-git-send-email-dseok.yi@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net: ip_tunnel: fix mtu calculationVadim Fedorenko
dev->hard_header_len for tunnel interface is set only when header_ops are set too and already contains full overhead of any tunnel encapsulation. That's why there is not need to use this overhead twice in mtu calc. Fixes: fdafed459998 ("ip_gre: set dev->hard_header_len and dev->needed_headroom properly") Reported-by: Slava Bacherikov <mail@slava.cc> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1611959267-20536-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01vsock: fix the race conditions in multi-transport supportAlexander Popov
There are multiple similar bugs implicitly introduced by the commit c0cfa2d8a788fcf4 ("vsock: add multi-transports support") and commit 6a2c0962105ae8ce ("vsock: prevent transport modules unloading"). The bug pattern: [1] vsock_sock.transport pointer is copied to a local variable, [2] lock_sock() is called, [3] the local variable is used. VSOCK multi-transport support introduced the race condition: vsock_sock.transport value may change between [1] and [2]. Let's copy vsock_sock.transport pointer to local variables after the lock_sock() call. Fixes: c0cfa2d8a788fcf4 ("vsock: add multi-transports support") Signed-off-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Link: https://lore.kernel.org/r/20210201084719.2257066-1-alex.popov@linux.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01SUNRPC: Fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding multiple break statements instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-02-01cfg80211: fix netdev registration deadlockJohannes Berg
If register_netdevice() fails after having called cfg80211's netdev notifier (cfg80211_netdev_notifier_call) it will call the notifier again with UNREGISTER. This would then lock the wiphy mutex because we're marked as registered, which causes a deadlock. Fix this by separately keeping track of whether or not we're in the middle of registering to also skip the notifier call on this unregister. Reported-by: syzbot+2ae0ca9d7737ad1a62b7@syzkaller.appspotmail.com Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver") Link: https://lore.kernel.org/r/20210201192048.ed8bad436737.I7cae042c44b15f80919a285799a15df467e9d42d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-02-01Bluetooth: Skip eSCO 2M params when not supportedYu Liu
If a peer device doesn't support eSCO 2M we should skip the params that use it when setting up sync connection since they will always fail. Signed-off-by: Yu Liu <yudiliu@google.com> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-02-01net: sunrpc: xprtsock.c: Corrected few spellings ,in commentsBhaskar Chowdhury
Few trivial and rudimentary spell corrections. Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-02-01SUNRPC: correct error code comment in xs_tcp_setup_socket()Calum Mackay
This comment was introduced by commit 6ea44adce915 ("SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()"). I believe EIO was a typo at the time: it should have been EAGAIN. Subsequently, commit 0445f92c5d53 ("SUNRPC: Fix disconnection races") changed that to ENOTCONN. Rather than trying to keep the comment here in sync with the code in xprt_force_disconnect(), make the point in a non-specific way. Fixes: 6ea44adce915 ("SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()") Signed-off-by: Calum Mackay <calum.mackay@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-02-01SUNRPC: Fix NFS READs that start at non-page-aligned offsetsChuck Lever
Anj Duvnjak reports that the Kodi.tv NFS client is not able to read video files from a v5.10.11 Linux NFS server. The new sendpage-based TCP sendto logic was not attentive to non- zero page_base values. nfsd_splice_read() sets that field when a READ payload starts in the middle of a page. The Linux NFS client rarely emits an NFS READ that is not page- aligned. All of my testing so far has been with Linux clients, so I missed this one. Reported-by: A. Duvnjak <avian@extremenerds.net> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211471 Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: A. Duvnjak <avian@extremenerds.net>
2021-02-01mac80211: fix station rate table updates on assocFelix Fietkau
If the driver uses .sta_add, station entries are only uploaded after the sta is in assoc state. Fix early station rate table updates by deferring them until the sta has been uploaded. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210201083324.3134-1-nbd@nbd.name [use rcu_access_pointer() instead since we won't dereference here] Signed-off-by: Johannes Berg <johannes.berg@intel.com>