summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-04-01sock_map: Simplify sock_map_link() a bitCong Wang
sock_map_link() passes down map progs, but it is confusing to see both map progs and psock progs. Make the map progs more obvious by retrieving it directly with sock_map_progs() inside sock_map_link(). Now it is aligned with sock_map_link_no_progs() too. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-8-xiyou.wangcong@gmail.com
2021-04-01skmsg: Use GFP_KERNEL in sk_psock_create_ingress_msg()Cong Wang
This function is only called in process context. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-7-xiyou.wangcong@gmail.com
2021-04-01skmsg: Use rcu work for destroying psockCong Wang
The RCU callback sk_psock_destroy() only queues work psock->gc, so we can just switch to rcu work to simplify the code. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-6-xiyou.wangcong@gmail.com
2021-04-01skmsg: Avoid lock_sock() in sk_psock_backlog()Cong Wang
We do not have to lock the sock to avoid losing sk_socket, instead we can purge all the ingress queues when we close the socket. Sending or receiving packets after orphaning socket makes no sense. We do purge these queues when psock refcnt reaches zero but here we want to purge them explicitly in sock_map_close(). There are also some nasty race conditions on testing bit SK_PSOCK_TX_ENABLED and queuing/canceling the psock work, we can expand psock->ingress_lock a bit to protect them too. As noticed by John, we still have to lock the psock->work, because the same work item could be running concurrently on different CPU's. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-5-xiyou.wangcong@gmail.com
2021-04-01net: Introduce skb_send_sock() for sock_mapCong Wang
We only have skb_send_sock_locked() which requires callers to use lock_sock(). Introduce a variant skb_send_sock() which locks on its own, callers do not need to lock it any more. This will save us from adding a ->sendmsg_locked for each protocol. To reuse the code, pass function pointers to __skb_send_sock() and build skb_send_sock() and skb_send_sock_locked() on top. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-4-xiyou.wangcong@gmail.com
2021-04-01skmsg: Introduce a spinlock to protect ingress_msgCong Wang
Currently we rely on lock_sock to protect ingress_msg, it is too big for this, we can actually just use a spinlock to protect this list like protecting other skb queues. __tcp_bpf_recvmsg() is still special because of peeking, it still has to use lock_sock. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-3-xiyou.wangcong@gmail.com
2021-04-01skmsg: Lock ingress_skb when purgingCong Wang
Currently we purge the ingress_skb queue only when psock refcnt goes down to 0, so locking the queue is not necessary, but in order to be called during ->close, we have to lock it here. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-2-xiyou.wangcong@gmail.com
2021-03-31xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory modelOng Boon Leong
xdp_return_frame() may be called outside of NAPI context to return xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with napi_direct = false. For page_pool memory model, __xdp_return() calls xdp_return_frame_no_direct() unconditionally and below false negative kernel BUG throw happened under preempt-rt build: [ 430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884 [ 430.451678] caller is __xdp_return+0x1ff/0x2e0 [ 430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G U E 5.12.0-rc2+ #45 Changes in v2: - This patch fixes the issue by making xdp_return_frame_no_direct() is only called if napi_direct = true, as recommended for better by Jesper Dangaard Brouer. Thanks! Fixes: 2539650fadbf ("xdp: Helpers for disabling napi_direct of xdp_return_frame") Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31ipv6: remove extra dev_hold() for fallback tunnelsEric Dumazet
My previous commits added a dev_hold() in tunnels ndo_init(), but forgot to remove it from special functions setting up fallback tunnels. Fallback tunnels do call their respective ndo_init() This leads to various reports like : unregister_netdevice: waiting for ip6gre0 to become free. Usage count = 2 Fixes: 48bb5697269a ("ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 6289a98f0817 ("sit: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 40cb881b5aaa ("ip6_vti: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 7f700334be9a ("ip6_gre: proper dev_{hold|put} in ndo_[un]init methods") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31net/tipc: fix missing destroy_workqueue() on error in tipc_crypto_start()Yang Yingliang
Add the missing destroy_workqueue() before return from tipc_crypto_start() in the error handling case. Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31ipv6: convert elligible sysctls to u8Eric Dumazet
Convert most sysctls that can fit in a byte. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31tcp: convert tcp_comp_sack_nr sysctl to u8Eric Dumazet
tcp_comp_sack_nr max value was already 255. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31ipv4: convert igmp_link_local_mcast_reports sysctl to u8Eric Dumazet
This sysctl is a bool, can use less storage. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31ipv4: convert fib_multipath_{use_neigh|hash_policy} sysctls to u8Eric Dumazet
Make room for better packing of netns_ipv4 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31ipv4: convert udp_l3mdev_accept sysctl to u8Eric Dumazet
Reduce footprint of sysctls. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31ipv4: convert fib_notify_on_flag_change sysctl to u8Eric Dumazet
Reduce footprint of sysctls. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-03-31 1) Fix ipv4 pmtu checks for xfrm anf vti interfaces. From Eyal Birger. 2) There are situations where the socket passed to xfrm_output_resume() is not the same as the one attached to the skb. Use the socket passed to xfrm_output_resume() to avoid lookup failures when xfrm is used with VRFs. From Evan Nimmo. 3) Make the xfrm_state_hash_generation sequence counter per network namespace because but its write serialization lock is also per network namespace. Write protection is insufficient otherwise. From Ahmed S. Darwish. 4) Fixup sctp featue flags when used with esp offload. From Xin Long. 5) xfrm BEET mode doesn't support fragments for inner packets. This is a limitation of the protocol, so no fix possible. Warn at least to notify the user about that situation. From Xin Long. 6) Fix NULL pointer dereference on policy lookup when namespaces are uses in combination with esp offload. 7) Fix incorrect transformation on esp offload when packets get segmented at layer 3. 8) Fix some user triggered usages of WARN_ONCE in the xfrm compat layer. From Dmitry Safonov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31qrtr: Convert qrtr_ports from IDR to XArrayMatthew Wilcox (Oracle)
The XArray interface is easier for this driver to use. Also fixes a bug reported by the improper use of GFP_ATOMIC. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31net/rds: Fix a use after free in rds_message_map_pagesLv Yunlong
In rds_message_map_pages, the rm is freed by rds_message_put(rm). But rm is still used by rm->data.op_sg in return value. My patch assigns ERR_CAST(rm->data.op_sg) to err before the rm is freed to avoid the uaf. Fixes: 7dba92037baf3 ("net/rds: Use ERR_PTR for rds_message_alloc_sgs()") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet
Same reasons than for the previous commits : 6289a98f0817 ("sit: proper dev_{hold|put} in ndo_[un]init methods") 40cb881b5aaa ("ip6_vti: proper dev_{hold|put} in ndo_[un]init methods") 7f700334be9a ("ip6_gre: proper dev_{hold|put} in ndo_[un]init methods") After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. [1] WARNING: CPU: 1 PID: 21059 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 21059 Comm: syz-executor.4 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc900025aefe8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520004b5def RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff888023488568 R13: ffff8880254e9000 R14: 00000000dfd82cfd R15: ffff88802ee2d7c0 FS: 00007f13bc590700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0943e74000 CR3: 0000000025273000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] ip6_tnl_dev_uninit+0x370/0x3d0 net/ipv6/ip6_tunnel.c:387 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 ip6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_tunnel.c:263 ip6_tnl_newlink+0x312/0x580 net/ipv6/ip6_tunnel.c:2052 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31ethtool: support FEC settings over netlinkJakub Kicinski
Add FEC API to netlink. This is not a 1-to-1 conversion. FEC settings already depend on link modes to tell user which modes are supported. Take this further an use link modes for manual configuration. Old struct ethtool_fecparam is still used to talk to the drivers, so we need to translate back and forth. We can revisit the internal API if number of FEC encodings starts to grow. Enforce only one active FEC bit (by using a bit position rather than another mask). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31neighbour: Disregard DEAD dst in neigh_updateTong Zhu
After a short network outage, the dst_entry is timed out and put in DST_OBSOLETE_DEAD. We are in this code because arp reply comes from this neighbour after network recovers. There is a potential race condition that dst_entry is still in DST_OBSOLETE_DEAD. With that, another neighbour lookup causes more harm than good. In best case all packets in arp_queue are lost. This is counterproductive to the original goal of finding a better path for those packets. I observed a worst case with 4.x kernel where a dst_entry in DST_OBSOLETE_DEAD state is associated with loopback net_device. It leads to an ethernet header with all zero addresses. A packet with all zero source MAC address is quite deadly with mac80211, ath9k and 802.11 block ack. It fails ieee80211_find_sta_by_ifaddr in ath9k (xmit.c). Ath9k flushes tx queue (ath_tx_complete_aggr). BAW (block ack window) is not updated. BAW logic is damaged and ath9k transmission is disabled. Signed-off-by: Tong Zhu <zhutong@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31netfilter: add helper function to set up the nfnetlink header and use itPablo Neira Ayuso
This patch adds a helper function to set up the netlink and nfnetlink headers. Update existing codebase to use it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nftables: add helper function to set the base sequence numberPablo Neira Ayuso
This patch adds a helper function to calculate the base sequence number field that is stored in the nfnetlink header. Use the helper function whenever possible. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nftables: remove unnecessary spin_lock_init()Yang Yingliang
The spinlock nf_tables_destroy_list_lock is initialized statically. It is unnecessary to initialize by spin_lock_init(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: flowtable: dst_check() from garbage collector pathPablo Neira Ayuso
Move dst_check() to the garbage collector path. Stale routes trigger the flow entry teardown state which makes affected flows go back to the classic forwarding path to re-evaluate flow offloading. IPv6 requires the dst cookie to work, store it in the flow_tuple, otherwise dst_check() always fails. Fixes: e5075c0badaa ("netfilter: flowtable: call dst_check() to fall back to classic forwarding") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31audit: log nftables configuration change events once per tableRichard Guy Briggs
Reduce logging of nftables events to a level similar to iptables. Restore the table field to list the table, adding the generation. Indicate the op as the most significant operation in the event. A couple of sample events: type=PROCTITLE msg=audit(2021-03-18 09:30:49.801:143) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid type=SYSCALL msg=audit(2021-03-18 09:30:49.801:143) : arch=x86_64 syscall=sendmsg success=yes exit=172 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=roo t sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null) type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv6 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv4 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=inet entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=PROCTITLE msg=audit(2021-03-18 09:30:49.839:144) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid type=SYSCALL msg=audit(2021-03-18 09:30:49.839:144) : arch=x86_64 syscall=sendmsg success=yes exit=22792 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=r oot sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null) type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv6 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv4 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=inet entries=165 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld The issue was originally documented in https://github.com/linux-audit/audit-kernel/issues/124 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nft_log: perform module load from nf_tablesFlorian Westphal
modprobe calls from the nf_logger_find_get() API causes deadlock in very special cases because they occur with the nf_tables transaction mutex held. In the specific case of nf_log, deadlock is via: A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \ -> pernet_ops rwsem -> wait for C B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B Earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load the backend module during a transaction. For nft_log we would have to add a softdep for both nfnetlink_log or nf_log_syslog, since we do not know in advance which of the two backends are going to be configured. This defers the modprobe op until after the transaction mutex is released. Tested-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nf_log: add module softdepsFlorian Westphal
xt_LOG has no direct dependency on the syslog-based logger, it relies on the nf_log core to probe the requested backend. Now that all syslog-based loggers reside in the same module, we can just add a soft dependency on nf_log_syslog and let modprobe take care of it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nf_log_common: merge with nf_log_syslogFlorian Westphal
Remove nf_log_common. Now that all per-af modules have been merged there is no longer a need to provide a helper module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nf_log_bridge: merge with nf_log_syslogFlorian Westphal
Provide bridge log support from nf_log_syslog. After the merge there is no need to load the "real packet loggers", all of them now reside in the same module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom()Chuck Lever
This, to me, seems less cluttered and less redundant. I was hoping it could help reduce lock contention on the dto_q lock by reducing the size of the critical section, but alas, the only improvement is readability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_argChuck Lever
These fields are no longer used. The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on x86_64, down from 2440 bytes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31svcrdma: Remove sc_read_complete_qChuck Lever
Now that svc_rdma_recvfrom() waits for Read completion, sc_read_complete_q is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31svcrdma: Single-stage RDMA ReadChuck Lever
Currently the generic RPC server layer calls svc_rdma_recvfrom() twice to retrieve an RPC message that uses Read chunks. I'm not exactly sure why this design was chosen originally. Instead, let's wait for the Read chunk completion inline in the first call to svc_rdma_recvfrom(). The goal is to eliminate some page allocator churn. rdma_read_complete() replaces pages in the second svc_rqst by calling put_page() repeatedly while the upper layer waits for the request to be constructed, which adds unnecessary NFS WRITE round- trip latency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com>
2021-03-30mptcp: remove id 0 addressGeliang Tang
This patch added a new function mptcp_nl_remove_id_zero_address to remove the id 0 address. In this function, traverse all the existing msk sockets to find the msk matched the input IP address. Then fill the removing list with id 0, and pass it to mptcp_pm_remove_addr and mptcp_pm_remove_subflow. Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30mptcp: unify RM_ADDR and RM_SUBFLOW receivingGeliang Tang
There are some duplicate code in mptcp_pm_nl_rm_addr_received and mptcp_pm_nl_rm_subflow_received. This patch unifies them into a new function named mptcp_pm_nl_rm_addr_or_subflow. In it, use the input parameter rm_type to identify it's now removing an address or a subflow. Suggested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30mptcp: remove all subflows involving id 0 addressGeliang Tang
There's only one subflow involving the non-zero id address, but there may be multi subflows involving the id 0 address. Here's an example: local_id=0, remote_id=0 local_id=1, remote_id=0 local_id=0, remote_id=1 If the removing address id is 0, all the subflows involving the id 0 address need to be removed. In mptcp_pm_nl_rm_addr_received/mptcp_pm_nl_rm_subflow_received, the "break" prevents the iteration to the next subflow, so this patch dropped them. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30net: fix icmp_echo_enable_probe sysctlEric Dumazet
sysctl_icmp_echo_enable_probe is an u8. ipv4_net_table entry should use .maxlen = sizeof(u8). .proc_handler = proc_dou8vec_minmax, Fixes: f1b8fa9fa586 ("net: add sysctl for enabling RFC 8335 PROBE messages") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30udp: never accept GSO_FRAGLIST packetsPaolo Abeni
Currently the UDP protocol delivers GSO_FRAGLIST packets to the sockets without the expected segmentation. This change addresses the issue introducing and maintaining a couple of new fields to explicitly accept SKB_GSO_UDP_L4 or GSO_FRAGLIST packets. Additionally updates udp_unexpected_gso() accordingly. UDP sockets enabling UDP_GRO stil keep accept_udp_fraglist zeroed. v1 -> v2: - use 2 bits instead of a whole GSO bitmask (Willem) Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30udp: properly complete L4 GRO over UDP tunnel packetPaolo Abeni
After the previous patch, the stack can do L4 UDP aggregation on top of a UDP tunnel. In such scenario, udp{4,6}_gro_complete will be called twice. This function will enter its is_flist branch immediately, even though that is only correct on the second call, as GSO_FRAGLIST is only relevant for the inner packet. Instead, we need to try first UDP tunnel-based aggregation, if the GRO packet requires that. This patch changes udp{4,6}_gro_complete to skip the frag list processing when while encap_mark == 1, identifying processing of the outer tunnel header. Additionally, clears the field in udp_gro_complete() so that we can enter the frag list path on the next round, for the inner header. v1 -> v2: - hopefully clarified the commit message Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30udp: skip L4 aggregation for UDP tunnel packetsPaolo Abeni
If NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD are enabled, and there are UDP tunnels available in the system, udp_gro_receive() could end-up doing L4 aggregation (either SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST) at the outer UDP tunnel level for packets effectively carrying and UDP tunnel header. That could cause inner protocol corruption. If e.g. the relevant packets carry a vxlan header, different vxlan ids will be ignored/ aggregated to the same GSO packet. Inner headers will be ignored, too, so that e.g. TCP over vxlan push packets will be held in the GRO engine till the next flush, etc. Just skip the SKB_GSO_UDP_L4 and SKB_GSO_FRAGLIST code path if the current packet could land in a UDP tunnel, and let udp_gro_receive() do GRO via udp_sk(sk)->gro_receive. The check implemented in this patch is broader than what is strictly needed, as the existing UDP tunnel could be e.g. configured on top of a different device: we could end-up skipping GRO at-all for some packets. Anyhow, that is a very thin corner case and covering it will add quite a bit of complexity. v1 -> v2: - hopefully clarify the commit message Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30udp: fixup csum for GSO receive slow pathPaolo Abeni
When UDP packets generated locally by a socket with UDP_SEGMENT traverse the following path: UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx) -> UDP socket (no UDP_GRO) ip_summed will be set to CHECKSUM_PARTIAL at creation time and such checksum mode will be preserved in the above path up to the UDP tunnel receive code where we have: __iptunnel_pull_header() -> skb_pull_rcsum() -> skb_postpull_rcsum() -> __skb_postpull_rcsum() The latter will convert the skb to CHECKSUM_NONE. The UDP GSO packet will be later segmented as part of the rx socket receive operation, and will present a CHECKSUM_NONE after segmentation. Additionally the segmented packets UDP CB still refers to the original GSO packet len. Overall that causes unexpected/wrong csum validation errors later in the UDP receive path. We could possibly address the issue with some additional checks and csum mangling in the UDP tunnel code. Since the issue affects only this UDP receive slow path, let's set a suitable csum status there. Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking an UDP encapsulation present a valid checksum when landing to udp_queue_rcv_skb(), as the UDP checksum has been validated by the GRO engine. v2 -> v3: - even more verbose commit message and comments v1 -> v2: - restrict the csum update to the packets strictly needing them - hopefully clarify the commit message and code comments Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30net/decnet: Delete obsolete TODO fileWang Qing
The TODO file here has not been updated from 2005, and the function development described in the file have been implemented or abandoned. Its existence will mislead developers seeking to view outdated information. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30net/ax25: Delete obsolete TODO fileWang Qing
The TODO file here has not been updated for 13 years, and the function development described in the file have been implemented or abandoned. Its existence will mislead developers seeking to view outdated information. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31netfilter: conntrack: do not print icmpv6 as unknown via /procPablo Neira Ayuso
/proc/net/nf_conntrack shows icmpv6 as unknown. Fixes: 09ec82f5af99 ("netfilter: conntrack: remove protocol name from l4proto struct") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: flowtable: fix NAT IPv6 offload manglingPablo Neira Ayuso
Fix out-of-bound access in the address array. Fixes: 5c27d8d76ce8 ("netfilter: nf_flow_table_offload: add IPv6 support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nf_log_netdev: merge with nf_log_syslogFlorian Westphal
Provide netdev family support from the nf_log_syslog module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nf_log_ipv6: merge with nf_log_syslogFlorian Westphal
This removes the nf_log_ipv6 module, the functionality is now provided by nf_log_syslog. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: nf_log_arp: merge with nf_log_syslogFlorian Westphal
similar to previous change: nf_log_syslog now covers ARP logging as well, the old nf_log_arp module is removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>