From 33d915d9e8ce811d8958915ccd18d71a66c7c495 Mon Sep 17 00:00:00 2001 From: Manikanta Pubbisetty Date: Wed, 8 May 2019 14:55:33 +0530 Subject: {nl,mac}80211: allow 4addr AP operation on crypto controlled devices As per the current design, in the case of sw crypto controlled devices, it is the device which advertises the support for AP/VLAN iftype based on it's ability to tranmsit packets encrypted in software (In VLAN functionality, group traffic generated for a specific VLAN group is always encrypted in software). Commit db3bdcb9c3ff ("mac80211: allow AP_VLAN operation on crypto controlled devices") has introduced this change. Since 4addr AP operation also uses AP/VLAN iftype, this conditional way of advertising AP/VLAN support has broken 4addr AP mode operation on crypto controlled devices which do not support VLAN functionality. In the case of ath10k driver, not all firmwares have support for VLAN functionality but all can support 4addr AP operation. Because AP/VLAN support is not advertised for these devices, 4addr AP operations are also blocked. Fix this by allowing 4addr operation on devices which do not support AP/VLAN iftype but can support 4addr AP operation (decision is based on the wiphy flag WIPHY_FLAG_4ADDR_AP). Cc: stable@vger.kernel.org Fixes: db3bdcb9c3ff ("mac80211: allow AP_VLAN operation on crypto controlled devices") Signed-off-by: Manikanta Pubbisetty Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 87dae868707e..948139690a58 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -3839,7 +3839,8 @@ struct cfg80211_ops { * on wiphy_new(), but can be changed by the driver if it has a good * reason to override the default * @WIPHY_FLAG_4ADDR_AP: supports 4addr mode even on AP (with a single station - * on a VLAN interface) + * on a VLAN interface). This flag also serves an extra purpose of + * supporting 4ADDR AP mode on devices which do not support AP/VLAN iftype. * @WIPHY_FLAG_4ADDR_STATION: supports 4addr mode even as a station * @WIPHY_FLAG_CONTROL_PORT_PROTOCOL: This device supports setting the * control port protocol ethertype. The device also honours the -- cgit From f0d2ca1531377e7da888913e277eefac05a59b6f Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Wed, 12 Jun 2019 17:18:38 +0200 Subject: net: ethtool: Allow matching on vlan DEI bit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Using ethtool, users can specify a classification action matching on the full vlan tag, which includes the DEI bit (also previously called CFI). However, when converting the ethool_flow_spec to a flow_rule, we use dissector keys to represent the matching patterns. Since the vlan dissector key doesn't include the DEI bit, this information was silently discarded when translating the ethtool flow spec in to a flow_rule. This commit adds the DEI bit into the vlan dissector key, and allows propagating the information to the driver when parsing the ethtool flow spec. Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator") Reported-by: Michał Mirosław Signed-off-by: Maxime Chevallier Signed-off-by: David S. Miller --- include/net/flow_dissector.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/net') diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index 7c5a8d9a8d2a..dfabc0503446 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -46,6 +46,7 @@ struct flow_dissector_key_tags { struct flow_dissector_key_vlan { u16 vlan_id:12, + vlan_dei:1, vlan_priority:3; __be16 vlan_tpid; }; -- cgit From e1ae5c2ea4783b1fd87be250f9fcc9d9e1a6ba3f Mon Sep 17 00:00:00 2001 From: Stephen Suryaputra Date: Mon, 10 Jun 2019 10:32:50 -0400 Subject: vrf: Increment Icmp6InMsgs on the original netdev Get the ingress interface and increment ICMP counters based on that instead of skb->dev when the the dev is a VRF device. This is a follow up on the following message: https://www.spinics.net/lists/netdev/msg560268.html v2: Avoid changing skb->dev since it has unintended effect for local delivery (David Ahern). Signed-off-by: Stephen Suryaputra Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/net/addrconf.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/net') diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 2f67ae854ff0..becdad576859 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -309,6 +309,22 @@ static inline struct inet6_dev *__in6_dev_get(const struct net_device *dev) return rcu_dereference_rtnl(dev->ip6_ptr); } +/** + * __in6_dev_stats_get - get inet6_dev pointer for stats + * @dev: network device + * @skb: skb for original incoming interface if neeeded + * + * Caller must hold rcu_read_lock or RTNL, because this function + * does not take a reference on the inet6_dev. + */ +static inline struct inet6_dev *__in6_dev_stats_get(const struct net_device *dev, + const struct sk_buff *skb) +{ + if (netif_is_l3_master(dev)) + dev = dev_get_by_index_rcu(dev_net(dev), inet6_iif(skb)); + return __in6_dev_get(dev); +} + /** * __in6_dev_get_safely - get inet6_dev pointer from netdevice * @dev: network device -- cgit From ede61ca474a0348b975d9824565b66c7595461de Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 14 Jun 2019 16:22:19 -0700 Subject: tcp: add tcp_rx_skb_cache sysctl Instead of relying on rps_needed, it is safer to use a separate static key, since we do not want to enable TCP rx_skb_cache by default. This feature can cause huge increase of memory usage on hosts with millions of sockets. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/net/sock.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/net') diff --git a/include/net/sock.h b/include/net/sock.h index e9d769c04637..b02645e2dfad 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2433,13 +2433,11 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags) * This routine must be called with interrupts disabled or with the socket * locked so that the sk_buff queue operation is ok. */ +DECLARE_STATIC_KEY_FALSE(tcp_rx_skb_cache_key); static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb) { __skb_unlink(skb, &sk->sk_receive_queue); - if ( -#ifdef CONFIG_RPS - !static_branch_unlikely(&rps_needed) && -#endif + if (static_branch_unlikely(&tcp_rx_skb_cache_key) && !sk->sk_rx_skb_cache) { sk->sk_rx_skb_cache = skb; skb_orphan(skb); -- cgit From 0b7d7f6b22084a3156f267c85303908a8f4c9a08 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 14 Jun 2019 16:22:20 -0700 Subject: tcp: add tcp_tx_skb_cache sysctl Feng Tang reported a performance regression after introduction of per TCP socket tx/rx caches, for TCP over loopback (netperf) There is high chance the regression is caused by a change on how well the 32 KB per-thread page (current->task_frag) can be recycled, and lack of pcp caches for order-3 pages. I could not reproduce the regression myself, cpus all being spinning on the mm spinlocks for page allocs/freeing, regardless of enabling or disabling the per tcp socket caches. It seems best to disable the feature by default, and let admins enabling it. MM layer either needs to provide scalable order-3 pages allocations, or could attempt a trylock on zone->lock if the caller only attempts to get a high-order page and is able to fallback to order-0 ones in case of pressure. Tests run on a 56 cores host (112 hyper threads) - 35.49% netperf [kernel.vmlinux] [k] queued_spin_lock_slowpath - 35.49% queued_spin_lock_slowpath - 18.18% get_page_from_freelist - __alloc_pages_nodemask - 18.18% alloc_pages_current skb_page_frag_refill sk_page_frag_refill tcp_sendmsg_locked tcp_sendmsg inet_sendmsg sock_sendmsg __sys_sendto __x64_sys_sendto do_syscall_64 entry_SYSCALL_64_after_hwframe __libc_send + 17.31% __free_pages_ok + 31.43% swapper [kernel.vmlinux] [k] intel_idle + 9.12% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string + 6.53% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string + 0.69% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath + 0.68% netperf [kernel.vmlinux] [k] skb_release_data + 0.52% netperf [kernel.vmlinux] [k] tcp_sendmsg_locked 0.46% netperf [kernel.vmlinux] [k] _raw_spin_lock_irqsave Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet Reported-by: Feng Tang Signed-off-by: David S. Miller --- include/net/sock.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/net') diff --git a/include/net/sock.h b/include/net/sock.h index b02645e2dfad..7d7f4ce63bb2 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1463,12 +1463,14 @@ static inline void sk_mem_uncharge(struct sock *sk, int size) __sk_mem_reclaim(sk, 1 << 20); } +DECLARE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key); static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) { sock_set_flag(sk, SOCK_QUEUE_SHRUNK); sk->sk_wmem_queued -= skb->truesize; sk_mem_uncharge(sk, skb->truesize); - if (!sk->sk_tx_skb_cache && !skb_cloned(skb)) { + if (static_branch_unlikely(&tcp_tx_skb_cache_key) && + !sk->sk_tx_skb_cache && !skb_cloned(skb)) { skb_zcopy_clear(skb, true); sk->sk_tx_skb_cache = skb; return; -- cgit From ce27ec60648d8e066227cb2f58b1d3d4f7253d08 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 14 Jun 2019 16:22:21 -0700 Subject: net: add high_order_alloc_disable sysctl/static key >From linux-3.7, (commit 5640f7685831 "net: use a per task frag allocator") TCP sendmsg() has preferred using order-3 allocations. While it gives good results for most cases, we had reports that heavy uses of TCP over loopback were hitting a spinlock contention in page allocations/freeing. This commits adds a sysctl so that admins can opt-in for order-0 allocations. Hopefully mm layer might optimize order-3 allocations in the future since it could give us a nice boost (see 8 lines of following benchmark) The following benchmark shows a win when more than 8 TCP_STREAM threads are running (56 x86 cores server in my tests) for thr in {1..30} do sysctl -wq net.core.high_order_alloc_disable=0 T0=`./super_netperf $thr -H 127.0.0.1 -l 15` sysctl -wq net.core.high_order_alloc_disable=1 T1=`./super_netperf $thr -H 127.0.0.1 -l 15` echo $thr:$T0:$T1 done 1: 49979: 37267 2: 98745: 76286 3: 141088: 110051 4: 177414: 144772 5: 197587: 173563 6: 215377: 208448 7: 241061: 234087 8: 267155: 263373 9: 295069: 297402 10: 312393: 335213 11: 340462: 368778 12: 371366: 403954 13: 412344: 443713 14: 426617: 473580 15: 474418: 507861 16: 503261: 538539 17: 522331: 563096 18: 532409: 567084 19: 550824: 605240 20: 525493: 641988 21: 564574: 665843 22: 567349: 690868 23: 583846: 710917 24: 588715: 736306 25: 603212: 763494 26: 604083: 792654 27: 602241: 796450 28: 604291: 797993 29: 611610: 833249 30: 577356: 841062 Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/net/sock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/sock.h b/include/net/sock.h index 7d7f4ce63bb2..6cbc16136357 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2534,6 +2534,8 @@ extern int sysctl_optmem_max; extern __u32 sysctl_wmem_default; extern __u32 sysctl_rmem_default; +DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); + static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_wmem ? */ -- cgit From 3b4929f65b0d8249f19a50245cd88ed1a2f78cff Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 17 May 2019 17:17:22 -0700 Subject: tcp: limit payload size of sacked skbs Jonathan Looney reported that TCP can trigger the following crash in tcp_shifted_skb() : BUG_ON(tcp_skb_pcount(skb) < pcount); This can happen if the remote peer has advertized the smallest MSS that linux TCP accepts : 48 An skb can hold 17 fragments, and each fragment can hold 32KB on x86, or 64KB on PowerPC. This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs can overflow. Note that tcp_sendmsg() builds skbs with less than 64KB of payload, so this problem needs SACK to be enabled. SACK blocks allow TCP to coalesce multiple skbs in the retransmit queue, thus filling the 17 fragments to maximal capacity. CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet Reported-by: Jonathan Looney Acked-by: Neal Cardwell Reviewed-by: Tyler Hicks Cc: Yuchung Cheng Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller --- include/net/tcp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/tcp.h b/include/net/tcp.h index ac2f53fbfa6b..582c0caa9811 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -51,6 +51,8 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define MAX_TCP_HEADER (128 + MAX_HEADER) #define MAX_TCP_OPTION_SPACE 40 +#define TCP_MIN_SND_MSS 48 +#define TCP_MIN_GSO_SIZE (TCP_MIN_SND_MSS - MAX_TCP_OPTION_SPACE) /* * Never offer a window over 32767 without using window scaling. Some -- cgit From 5f3e2bf008c2221478101ee72f5cb4654b9fc363 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 6 Jun 2019 09:15:31 -0700 Subject: tcp: add tcp_min_snd_mss sysctl Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet Suggested-by: Jonathan Looney Acked-by: Neal Cardwell Cc: Yuchung Cheng Cc: Tyler Hicks Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller --- include/net/netns/ipv4.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/net') diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 7698460a3dd1..623cfbb7b8dc 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -117,6 +117,7 @@ struct netns_ipv4 { #endif int sysctl_tcp_mtu_probing; int sysctl_tcp_base_mss; + int sysctl_tcp_min_snd_mss; int sysctl_tcp_probe_threshold; u32 sysctl_tcp_probe_interval; -- cgit