summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-01-22mac80211: introduce aql_enable node in debugfsLorenzo Bianconi
Introduce aql_enable node in debugfs in order to enable/disable aql. This is useful for debugging purpose. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/e7a934d5d84e4796c4f97ea5de4e66c824296b07.1610214851.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-22cfg80211: Add phyrate conversion support for extended MCS in 60GHz bandMax Chen
The current phyrate conversion does not include extended MCS and provides incorrect rates. Add a flag for extended MCS in DMG and add corresponding phyrate table for the correct conversions using base MCS in DMG specs. Signed-off-by: Max Chen <mxchen@codeaurora.org> Link: https://lore.kernel.org/r/1609977050-7089-2-git-send-email-mxchen@codeaurora.org [reduce data size, make a single WARN] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-22cfg80211: add VHT rate entries for MCS-10 and MCS-11Arend van Spriel
Observed the warning in cfg80211_calculate_bitrate_vht() using an 11ac chip reporting MCS-11. Since devices reporting non-standard MCS-9 is already supported add similar entries for MCS-10 and MCS-11. Actually, the value of MCS-9@20MHz is slightly off so corrected that. Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/20210105105839.3795-1-arend.vanspriel@broadcom.com [fix array size] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-22mac80211: reduce peer HE MCS/NSS to own capabilitiesWen Gong
For VHT capbility, we do intersection of MCS and NSS for peers in mac80211, to simplify drivers. Add this for HE as well. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/1609816120-9411-3-git-send-email-wgong@codeaurora.org [reword commit message, style cleanups, fix endian annotations] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-21Merge branch 'master' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-01-21 1) Fix a rare panic on SMP systems when packet reordering happens between anti replay check and update. From Shmulik Ladkani. 2) Fix disable_xfrm sysctl when used on xfrm interfaces. From Eyal Birger. 3) Fix a race in PF_KEY when the availability of crypto algorithms is set. From Cong Wang. 4) Fix a return value override in the xfrm policy selftests. From Po-Hsu Lin. 5) Fix an integer wraparound in xfrm_policy_addr_delta. From Visa Hankala. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Fix wraparound in xfrm_policy_addr_delta() selftests: xfrm: fix test return value override issue in xfrm_policy.sh af_key: relax availability checks for skb size calculation xfrm: fix disable_xfrm sysctl when used on xfrm interfaces xfrm: Fix oops in xfrm_replay_advance_bmp ==================== Link: https://lore.kernel.org/r/20210121121558.621339-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-21libceph: fix "Boolean result is used in bitwise operation" warningIlya Dryomov
This line dates back to 2013, but cppcheck complained because commit 2f713615ddd9 ("libceph: move msgr1 protocol implementation to its own file") moved it. Add parenthesis to silence the warning. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-01-21mac80211: remove NSS number of 160MHz if not support 160MHz for HEWen Gong
When it does not support 160MHz in HE phy capabilities information, it should not treat the NSS number of 160MHz as a valid number, otherwise the final NSS will be set to 0. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/1609816120-9411-2-git-send-email-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-21mac80211: 160MHz with extended NSS BW in CSAShay Bar
Upon receiving CSA with 160MHz extended NSS BW from associated AP, STA should set the HT operation_mode based on new_center_freq_seg1 because it is later used as ccfs2 in ieee80211_chandef_vht_oper(). Signed-off-by: Aviad Brikman <aviad.brikman@celeno.com> Signed-off-by: Shay Bar <shay.bar@celeno.com> Link: https://lore.kernel.org/r/20201222064714.24888-1-shay.bar@celeno.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-21mac80211: add LDPC encoding to ieee80211_parse_tx_radiotapPhilipp Borgers
This patch adds support for LDPC encoding to the radiotap tx parse function. Piror to this change adding the LDPC flag to the radiotap header did not encode frames with LDPC. Signed-off-by: Philipp Borgers <borgers@mi.fu-berlin.de> Link: https://lore.kernel.org/r/20201219170710.11706-1-borgers@mi.fu-berlin.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-21mac80211: add rx decapsulation offload supportFelix Fietkau
This allows drivers to pass 802.3 frames to mac80211, with some restrictions: - the skb must be passed with a valid sta - fast-rx needs to be active for the sta - monitor mode needs to be disabled mac80211 will tell the driver when it is safe to enable rx decap offload for a particular station. In order to implement support, a driver must: - call ieee80211_hw_set(hw, SUPPORTS_RX_DECAP_OFFLOAD) - implement ops->sta_set_decap_offload - mark 802.3 frames with RX_FLAG_8023 If it doesn't want to enable offload for some vif types, it can mask out IEEE80211_OFFLOAD_DECAP_ENABLED in vif->offload_flags from within the .add_interface or .update_vif_offload driver ops Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201218184718.93650-6-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-21net/fq_impl: do not maintain a backlog-sorted list of flowsFelix Fietkau
A sorted flow list is only needed to drop packets in the biggest flow when hitting the overmemory condition. By scanning flows only when needed, we can avoid paying the cost of maintaining the list under normal conditions In order to avoid scanning lots of empty flows and touching too many cold cache lines, a bitmap of flows with backlog is maintained Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201218184718.93650-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-21net/fq_impl: drop get_default_func, move default flow to fq_tinFelix Fietkau
Simplifies the code and prepares for a rework of scanning for flows on overmemory drop. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201218184718.93650-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-21Merge branch 'tty-splice' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into tty-next Fixes both the "splice/sendfile to a tty" and "splice/sendfile from a tty" regression from 5.10. * 'tty-splice' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux: tty: teach the n_tty ICANON case about the new "cookie continuations" too tty: teach n_tty line discipline about the new "cookie continuations" tty: clean up legacy leftovers from n_tty line discipline tty: implement read_iter tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer tty: implement write_iter
2021-01-20ip_gre: remove CRC flag from dev features in gre_gso_segmentXin Long
This patch is to let it always do CRC checksum in sctp_gso_segment() by removing CRC flag from the dev features in gre_gso_segment() for SCTP over GRE, just as it does in Commit 527beb8ef9c0 ("udp: support sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP. It could set csum/csum_start in GSO CB properly in sctp_gso_segment() after that commit, so it would do checksum with gso_make_checksum() in gre_gso_segment(), and Commit 622e32b7d4a6 ("net: gre: recompute gre csum for sctp over gre tunnels") can be reverted now. Note that when need_csum is false, we can still leave CRC checksum of SCTP to HW by not clearing this CRC flag if it's supported, as Jakub and Alex noticed. v1->v2: - improve the changelog. - fix "rev xmas tree" in varibles declaration. v2->v3: - remove CRC flag from dev features only when need_csum is true. Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/00439f24d5f69e2c6fa2beadc681d056c15c258f.1610772251.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20udp: not remove the CRC flag from dev features when need_csum is falseXin Long
In __skb_udp_tunnel_segment(), when it's a SCTP over VxLAN/GENEVE packet and need_csum is false, which means the outer udp checksum doesn't need to be computed, csum_start and csum_offset could be used by the inner SCTP CRC CSUM for SCTP HW CRC offload. So this patch is to not remove the CRC flag from dev features when need_csum is false. Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/1e81b700642498546eaa3f298e023fd7ad394f85.1610776757.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20net/sched: cls_flower add CT_FLAGS_INVALID flag supportwenxu
This patch add the TCA_FLOWER_KEY_CT_FLAGS_INVALID flag to match the ct_state with invalid for conntrack. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/1611045110-682-1-git-send-email-wenxu@ucloud.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20net: inline rollback_registered_many()Jakub Kicinski
Similar to the change for rollback_registered() - rollback_registered_many() was a part of unregister_netdevice_many() minus the net_set_todo(), which is no longer needed. Functionally this patch moves the list_empty() check back after: BUG_ON(dev_boot_phase); ASSERT_RTNL(); but I can't find any reason why that would be an issue. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20net: move rollback_registered_many()Jakub Kicinski
Move rollback_registered_many() and add a temporary forward declaration to make merging the code into unregister_netdevice_many() easier to review. No functional changes. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20net: inline rollback_registered()Jakub Kicinski
rollback_registered() is a local helper, it's common for driver code to call unregister_netdevice_queue(dev, NULL) when they want to unregister netdevices under rtnl_lock. Inline rollback_registered() and adjust the only remaining caller. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20net: move net_set_todo inside rollback_registered()Jakub Kicinski
Commit 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.") moved net_set_todo() outside of rollback_registered() so that rollback_registered() can be used in the failure path of register_netdevice() but without risking a double free. Since commit cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state."), however, we have a better way of handling that condition, since destructors don't call free_netdev() directly. After the change in commit c269a24ce057 ("net: make free_netdev() more lenient with unregistering devices") we can now move net_set_todo() back. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20nexthop: Specialize rtm_nh_policyPetr Machata
This policy is currently only used for creation of new next hops and new next hop groups. Rename it accordingly and remove the two attributes that are not valid in that context: NHA_GROUPS and NHA_MASTER. For consistency with other policies, do not mention policy array size in the declarator, and replace NHA_MAX for ARRAY_SIZE as appropriate. Note that with this commit, NHA_MAX and __NHA_MAX are not used anymore. Leave them in purely as a user API. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20nexthop: Use a dedicated policy for nh_valid_dump_req()Petr Machata
This function uses the global nexthop policy, but only accepts four particular attributes. Create a new policy that only includes the four supported attributes, and use it. Convert the loop to a series of ifs. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20nexthop: Use a dedicated policy for nh_valid_get_del_req()Petr Machata
This function uses the global nexthop policy only to then bounce all arguments except for NHA_ID. Instead, just create a new policy that only includes the one allowed attribute. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20tty: convert tty_ldisc_ops 'read()' function to take a kernel pointerLinus Torvalds
The tty line discipline .read() function was passed the final user pointer destination as an argument, which doesn't match the 'write()' function, and makes it very inconvenient to do a splice method for ttys. This is a conversion to use a kernel buffer instead. NOTE! It does this by passing the tty line discipline ->read() function an additional "cookie" to fill in, and an offset into the cookie data. The line discipline can fill in the cookie data with its own private information, and then the reader will repeat the read until either the cookie is cleared or it runs out of data. The only real user of this is N_HDLC, which can use this to handle big packets, even if the kernel buffer is smaller than the whole packet. Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-20bpf: Split cgroup_bpf_enabled per attach typeStanislav Fomichev
When we attach any cgroup hook, the rest (even if unused/unattached) start to contribute small overhead. In particular, the one we want to avoid is __cgroup_bpf_run_filter_skb which does two redirections to get to the cgroup and pushes/pulls skb. Let's split cgroup_bpf_enabled to be per-attach to make sure only used attach types trigger. I've dropped some existing high-level cgroup_bpf_enabled in some places because BPF_PROG_CGROUP_XXX_RUN macros usually have another cgroup_bpf_enabled check. I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for GETPEERNAME/GETSOCKNAME because type for cgroup_bpf_enabled[type] has to be constant and known at compile time. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-4-sdf@google.com
2021-01-20bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVEStanislav Fomichev
Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE. We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom call in do_tcp_getsockopt using the on-stack data. This removes 3% overhead for locking/unlocking the socket. Without this patch: 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt | --3.30%--__cgroup_bpf_run_filter_getsockopt | --0.81%--__kmalloc With the patch applied: 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern Note, exporting uapi/tcp.h requires removing netinet/tcp.h from test_progs.h because those headers have confliciting definitions. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-2-sdf@google.com
2021-01-20net, xdp: Introduce xdp_build_skb_from_frame utility routineLorenzo Bianconi
Introduce xdp_build_skb_from_frame utility routine to build the skb from xdp_frame. Respect to __xdp_build_skb_from_frame, xdp_build_skb_from_frame will allocate the skb object. Rely on xdp_build_skb_from_frame in veth driver. Introduce missing xdp metadata support in veth_xdp_rcv_one routine. Add missing metadata support in veth_xdp_rcv_one(). Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/94ade9e853162ae1947941965193190da97457bc.1610475660.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-20net, xdp: Introduce __xdp_build_skb_from_frame utility routineLorenzo Bianconi
Introduce __xdp_build_skb_from_frame utility routine to build the skb from xdp_frame. Rely on __xdp_build_skb_from_frame in cpumap code. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/4f9f4c6b3dd3933770c617eb6689dbc0c6e25863.1610475660.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/can/dev.c commit 03f16c5075b2 ("can: dev: can_restart: fix use after free bug") commit 3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir") Code move. drivers/net/dsa/b53/b53_common.c commit 8e4052c32d6b ("net: dsa: b53: fix an off by one in checking "vlan->vid"") commit b7a9e0da2d1c ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects") Field rename. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20Merge tag 'net-5.11-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and can trees. Current release - regressions: - nfc: nci: fix the wrong NCI_CORE_INIT parameters Current release - new code bugs: - bpf: allow empty module BTFs Previous releases - regressions: - bpf: fix signed_{sub,add32}_overflows type handling - tcp: do not mess with cloned skbs in tcp_add_backlog() - bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach - bpf: don't leak memory in bpf getsockopt when optlen == 0 - tcp: fix potential use-after-free due to double kfree() - mac80211: fix encryption issues with WEP - devlink: use right genl user_ptr when handling port param get/set - ipv6: set multicast flag on the multicast route - tcp: fix TCP_USER_TIMEOUT with zero window Previous releases - always broken: - bpf: local storage helpers should check nullness of owner ptr passed - mac80211: fix incorrect strlen of .write in debugfs - cls_flower: call nla_ok() before nla_next() - skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too" * tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits) net: systemport: free dev before on error path net: usb: cdc_ncm: don't spew notifications net: mscc: ocelot: Fix multicast to the CPU port tcp: Fix potential use-after-free due to double kfree() bpf: Fix signed_{sub,add32}_overflows type handling can: peak_usb: fix use after free bugs can: vxcan: vxcan_xmit: fix use after free bug can: dev: can_restart: fix use after free bug tcp: fix TCP socket rehash stats mis-accounting net: dsa: b53: fix an off by one in checking "vlan->vid" tcp: do not mess with cloned skbs in tcp_add_backlog() selftests: net: fib_tests: remove duplicate log test net: nfc: nci: fix the wrong NCI_CORE_INIT parameters sh_eth: Fix power down vs. is_opened flag ordering net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled netfilter: rpfilter: mask ecn bits before fib lookup udp: mask TOS bits in udp_v4_early_demux() xsk: Clear pool even for inactive queues bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback sh_eth: Make PHY access aware of Runtime PM to fix reboot crash ...
2021-01-20tcp: Fix potential use-after-free due to double kfree()Kuniyuki Iwashima
Receiving ACK with a valid SYN cookie, cookie_v4_check() allocates struct request_sock and then can allocate inet_rsk(req)->ireq_opt. After that, tcp_v4_syn_recv_sock() allocates struct sock and copies ireq_opt to inet_sk(sk)->inet_opt. Normally, tcp_v4_syn_recv_sock() inserts the full socket into ehash and sets NULL to ireq_opt. Otherwise, tcp_v4_syn_recv_sock() has to reset inet_opt by NULL and free the full socket. The commit 01770a1661657 ("tcp: fix race condition when creating child sockets from syncookies") added a new path, in which more than one cores create full sockets for the same SYN cookie. Currently, the core which loses the race frees the full socket without resetting inet_opt, resulting in that both sock_put() and reqsk_put() call kfree() for the same memory: sock_put sk_free __sk_free sk_destruct __sk_destruct sk->sk_destruct/inet_sock_destruct kfree(rcu_dereference_protected(inet->inet_opt, 1)); reqsk_put reqsk_free __reqsk_free req->rsk_ops->destructor/tcp_v4_reqsk_destructor kfree(rcu_dereference_protected(inet_rsk(req)->ireq_opt, 1)); Calling kmalloc() between the double kfree() can lead to use-after-free, so this patch fixes it by setting NULL to inet_opt before sock_put(). As a side note, this kind of issue does not happen for IPv6. This is because tcp_v6_syn_recv_sock() clones both ipv6_opt and pktopts which correspond to ireq_opt in IPv4. Fixes: 01770a166165 ("tcp: fix race condition when creating child sockets from syncookies") CC: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210118055920.82516-1-kuniyu@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-01-20 1) Fix wrong bpf_map_peek_elem_proto helper callback, from Mircea Cirjaliu. 2) Fix signed_{sub,add32}_overflows type truncation, from Daniel Borkmann. 3) Fix AF_XDP to also clear pools for inactive queues, from Maxim Mikityanskiy. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix signed_{sub,add32}_overflows type handling xsk: Clear pool even for inactive queues bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback ==================== Link: https://lore.kernel.org/r/20210120163439.8160-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19tcp: fix TCP socket rehash stats mis-accountingYuchung Cheng
The previous commit 32efcc06d2a1 ("tcp: export count for rehash attempts") would mis-account rehashing SNMP and socket stats: a. During handshake of an active open, only counts the first SYN timeout b. After handshake of passive and active open, stop updating after (roughly) TCP_RETRIES1 recurring RTOs c. After the socket aborts, over count timeout_rehash by 1 This patch fixes this by checking the rehash result from sk_rethink_txhash. Fixes: 32efcc06d2a1 ("tcp: export count for rehash attempts") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20210119192619.1848270-1-ycheng@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: fix GSO for SG-enabled devicesPaolo Abeni
The commit dbd50f238dec ("net: move the hsize check to the else block in skb_segment") introduced a data corruption for devices supporting scatter-gather. The problem boils down to signed/unsigned comparison given unexpected results: if signed 'hsize' is negative, it will be considered greater than a positive 'len', which is unsigned. This commit addresses resorting to the old checks order, so that 'hsize' never has a negative value when compared with 'len'. v1 -> v2: - reorder hsize checks instead of explicit cast (Alex) Bisected-by: Matthieu Baerts <matthieu.baerts@tessares.net> Fixes: dbd50f238dec ("net: move the hsize check to the else block in skb_segment") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/861947c2d2d087db82af93c21920ce8147d15490.1611074818.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19tcp: do not mess with cloned skbs in tcp_add_backlog()Eric Dumazet
Heiner Kallweit reported that some skbs were sent with the following invalid GSO properties : - gso_size > 0 - gso_type == 0 This was triggerring a WARN_ON_ONCE() in rtl8169_tso_csum_v2. Juerg Haefliger was able to reproduce a similar issue using a lan78xx NIC and a workload mixing TCP incoming traffic and forwarded packets. The problem is that tcp_add_backlog() is writing over gso_segs and gso_size even if the incoming packet will not be coalesced to the backlog tail packet. While skb_try_coalesce() would bail out if tail packet is cloned, this overwriting would lead to corruptions of other packets cooked by lan78xx, sharing a common super-packet. The strategy used by lan78xx is to use a big skb, and split it into all received packets using skb_clone() to avoid copies. The drawback of this strategy is that all the small skb share a common struct skb_shared_info. This patch rewrites TCP gso_size/gso_segs handling to only happen on the tail skb, since skb_try_coalesce() made sure it was not cloned. Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-by: Juerg Haefliger <juergh@canonical.com> Tested-by: Juerg Haefliger <juergh@canonical.com> Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=209423 Link: https://lore.kernel.org/r/20210119164900.766957-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19taprio: boolean values to a bool variableJiapeng Zhong
Fix the following coccicheck warnings: ./net/sched/sch_taprio.c:393:3-16: WARNING: Assignment of 0/1 to bool variable. ./net/sched/sch_taprio.c:375:2-15: WARNING: Assignment of 0/1 to bool variable. ./net/sched/sch_taprio.c:244:4-19: WARNING: Assignment of 0/1 to bool variable. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Link: https://lore.kernel.org/r/1610958662-71166-1-git-send-email-abaci-bugfix@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: nfc: nci: fix the wrong NCI_CORE_INIT parametersBongsu Jeon
Fix the code because NCI_CORE_INIT_CMD includes two parameters in NCI2.0 but there is no parameters in NCI1.x. Fixes: bcd684aace34 ("net/nfc/nci: Support NCI 2.x initial sequence") Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Link: https://lore.kernel.org/r/20210118205522.317087-1-bongsu.jeon@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabledTariq Toukan
With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be logically done when RXCSUM offload is off. Fixes: 14136564c8ee ("net: Add TLS RX offload feature") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19net: add inline function skb_csum_is_sctpXin Long
This patch is to define a inline function skb_csum_is_sctp(), and also replace all places where it checks if it's a SCTP CSUM skb. This function would be used later in many networking drivers in the following patches. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19netfilter: rpfilter: mask ecn bits before fib lookupGuillaume Nault
RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt() treats Not-ECT or ECT(1) packets in a different way than those with ECT(0) or CE. Reproducer: Create two netns, connected with a veth: $ ip netns add ns0 $ ip netns add ns1 $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1 $ ip -netns ns0 link set dev veth01 up $ ip -netns ns1 link set dev veth10 up $ ip -netns ns0 address add 192.0.2.10/32 dev veth01 $ ip -netns ns1 address add 192.0.2.11/32 dev veth10 Add a route to ns1 in ns0: $ ip -netns ns0 route add 192.0.2.11/32 dev veth01 In ns1, only packets with TOS 4 can be routed to ns0: $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS is 4: $ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT ... 0% packet loss ... $ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1) ... 0% packet loss ... $ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0) ... 0% packet loss ... $ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE ... 0% packet loss ... Now use iptable's rpfilter module in ns1: $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP Not-ECT and ECT(1) packets still pass: $ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT ... 0% packet loss ... $ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1) ... 0% packet loss ... But ECT(0) and ECN packets are dropped: $ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0) ... 100% packet loss ... $ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE ... 100% packet loss ... After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore. Fixes: 8f97339d3feb ("netfilter: add ipv4 reverse path filter match") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19udp: mask TOS bits in udp_v4_early_demux()Guillaume Nault
udp_v4_early_demux() is the only function that calls ip_mc_validate_source() with a TOS that hasn't been masked with IPTOS_RT_MASK. This results in different behaviours for incoming multicast UDPv4 packets, depending on if ip_mc_validate_source() is called from the early-demux path (udp_v4_early_demux) or from the regular input path (ip_route_input_noref). ECN would normally not be used with UDP multicast packets, so the practical consequences should be limited on that side. However, IPTOS_RT_MASK is used to also masks the TOS' high order bits, to align with the non-early-demux path behaviour. Reproducer: Setup two netns, connected with veth: $ ip netns add ns0 $ ip netns add ns1 $ ip -netns ns0 link set dev lo up $ ip -netns ns1 link set dev lo up $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1 $ ip -netns ns0 link set dev veth01 up $ ip -netns ns1 link set dev veth10 up $ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01 $ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10 In ns0, add route to multicast address 224.0.2.0/24 using source address 198.51.100.10: $ ip -netns ns0 address add 198.51.100.10/32 dev lo $ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10 In ns1, define route to 198.51.100.10, only for packets with TOS 4: $ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10 Also activate rp_filter in ns1, so that incoming packets not matching the above route get dropped: $ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1 Now try to receive packets on 224.0.2.11: $ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof - In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is, tos 6 for socat): $ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6 The "test0" message is properly received by socat in ns1, because early-demux has no cached dst to use, so source address validation is done by ip_route_input_mc(), which receives a TOS that has the ECN bits masked. Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0): $ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6 The "test1" message isn't received by socat in ns1, because, now, early-demux has a cached dst to use and calls ip_mc_validate_source() immediately, without masking the ECN bits. Fixes: bc044e8db796 ("udp: perform source validation for mcast early demux") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19xsk: Clear pool even for inactive queuesMaxim Mikityanskiy
The number of queues can change by other means, rather than ethtool. For example, attaching an mqprio qdisc with num_tc > 1 leads to creating multiple sets of TX queues, which may be then destroyed when mqprio is deleted. If an AF_XDP socket is created while mqprio is active, dev->_tx[queue_id].pool will be filled, but then real_num_tx_queues may decrease with deletion of mqprio, which will mean that the pool won't be NULLed, and a further increase of the number of TX queues may expose a dangling pointer. To avoid any potential misbehavior, this commit clears pool for RX and TX queues, regardless of real_num_*_queues, still taking into consideration num_*_queues to avoid overflows. Fixes: 1c1efc2af158 ("xsk: Create and free buffer pool independently from umem") Fixes: a41b4f3c58dd ("xsk: simplify xdp_clear_umem_at_qid implementation") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20210118160333.333439-1-maximmi@mellanox.com
2021-01-19Merge tag 'nfsd-5.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Avoid exposing parent of root directory in NFSv3 READDIRPLUS results - Fix a tracepoint change that went in the initial 5.11 merge * tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Move the svc_xdr_recvfrom tracepoint again nfsd4: readdirplus shouldn't return parent of export
2021-01-19net: core: devlink: use right genl user_ptr when handling port param get/setOleksandr Mazur
Fix incorrect user_ptr dereferencing when handling port param get/set: idx [0] stores the 'struct devlink' pointer; idx [1] stores the 'struct devlink_port' pointer; Fixes: 637989b5d77e ("devlink: Always use user_ptr[0] for devlink and simplify post_doit") CC: Parav Pandit <parav@mellanox.com> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Link: https://lore.kernel.org/r/20210119085333.16833-1-vadym.kochan@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/tls: Except bond interface from some TLS checksTariq Toukan
In the tls_dev_event handler, ignore tlsdev_ops requirement for bond interfaces, they do not exist as the interaction is done directly with the lower device. Also, make the validate function pass when it's called with the upper bond interface. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/tls: Device offload to use lowest netdevice in chainTariq Toukan
Do not call the tls_dev_ops of upper devices. Instead, ask them for the proper lowest device and communicate with it directly. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: netdevice: Add operation ndo_sk_get_lower_devTariq Toukan
ndo_sk_get_lower_dev returns the lower netdev that corresponds to a given socket. Additionally, we implement a helper netdev_sk_get_lowest_dev() to get the lowest one in chain. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net_sched: fix RTNL deadlock again caused by request_module()Cong Wang
tcf_action_init_1() loads tc action modules automatically with request_module() after parsing the tc action names, and it drops RTNL lock and re-holds it before and after request_module(). This causes a lot of troubles, as discovered by syzbot, because we can be in the middle of batch initializations when we create an array of tc actions. One of the problem is deadlock: CPU 0 CPU 1 rtnl_lock(); for (...) { tcf_action_init_1(); -> rtnl_unlock(); -> request_module(); rtnl_lock(); for (...) { tcf_action_init_1(); -> tcf_idr_check_alloc(); // Insert one action into idr, // but it is not committed until // tcf_idr_insert_many(), then drop // the RTNL lock in the _next_ // iteration -> rtnl_unlock(); -> rtnl_lock(); -> a_o->init(); -> tcf_idr_check_alloc(); // Now waiting for the same index // to be committed -> request_module(); -> rtnl_lock() // Now waiting for RTNL lock } rtnl_unlock(); } rtnl_unlock(); This is not easy to solve, we can move the request_module() before this loop and pre-load all the modules we need for this netlink message and then do the rest initializations. So the loop breaks down to two now: for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { struct tc_action_ops *a_o; a_o = tc_action_load_ops(name, tb[i]...); ops[i - 1] = a_o; } for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(ops[i - 1]...); } Although this looks serious, it only has been reported by syzbot, so it seems hard to trigger this by humans. And given the size of this patch, I'd suggest to make it to net-next and not to backport to stable. This patch has been tested by syzbot and tested with tdc.py by me. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18tcp: fix TCP_USER_TIMEOUT with zero windowEnke Chen
The TCP session does not terminate with TCP_USER_TIMEOUT when data remain untransmitted due to zero window. The number of unanswered zero-window probes (tcp_probes_out) is reset to zero with incoming acks irrespective of the window size, as described in tcp_probe_timer(): RFC 1122 4.2.2.17 requires the sender to stay open indefinitely as long as the receiver continues to respond probes. We support this by default and reset icsk_probes_out with incoming ACKs. This counter, however, is the wrong one to be used in calculating the duration that the window remains closed and data remain untransmitted. Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the actual issue. In this patch a new timestamp is introduced for the socket in order to track the elapsed time for the zero-window probes that have not been answered with any non-zero window ack. Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT") Reported-by: William McCall <william.mccall@gmail.com> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210115223058.GA39267@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18ipv6: set multicast flag on the multicast routeMatteo Croce
The multicast route ff00::/8 is created with type RTN_UNICAST: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium Set the type to RTN_MULTICAST which is more appropriate. Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info") Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>