summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-08-31net: ipv6: remove unused arg exact_dif in compute_scoreMiaohe Lin
The arg exact_dif is not used anymore, remove it. inet6_exact_dif_match() is no longer needed after the above is removed, remove it too. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31tipc: Remove unused macro TIPC_NACK_INTVYueHaibing
There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31tipc: Remove unused macro TIPC_FWD_MSGYueHaibing
There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31mptcp: Remove unused macro MPTCP_SAME_STATEYueHaibing
There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31net: clean up codestyleMiaohe Lin
This is a pure codestyle cleanup patch. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31net: Use helper macro IP_MAX_MTU in __ip_append_data()Miaohe Lin
What 0xFFFF means here is actually the max mtu of a ip packet. Use help macro IP_MAX_MTU here. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31openvswitch: using ip6_fragment in ipv6_stubwenxu
Using ipv6_stub->ipv6_fragment to avoid the netfilter dependency Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31ipv6: add ipv6_fragment hook in ipv6_stubwenxu
Add ipv6_fragment to ipv6_stub to avoid calling netfilter when access ip6_fragment. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31xsk: Add shared umem support between devicesMagnus Karlsson
Add support to share a umem between different devices. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same device. Note that when sharing a umem between devices, just as in the case of sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket (with two setsockopts, one for each ring) before you do the bind with the XDP_SHARED_UMEM flag. This so that the single-producer single-consumer semantics of the rings can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-13-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Add shared umem support between queue idsMagnus Karlsson
Add support to share a umem between queue ids on the same device. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same queue id and device, and you shared one set of fill and completion rings. However, note that when sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket before you do the bind with the XDP_SHARED_UMEM flag. This so that the single-producer single-consumer semantics can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-12-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Enable sharing of dma mappingsMagnus Karlsson
Enable the sharing of dma mappings by moving them out from the buffer pool. Instead we put each dma mapped umem region in a list in the umem structure. If dma has already been mapped for this umem and device, it is not mapped again and the existing dma mappings are reused. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-9-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Move addrs from buffer pool to umemMagnus Karlsson
Replicate the addrs pointer in the buffer pool to the umem. This mapping will be the same for all buffer pools sharing the same umem. In the buffer pool we leave the addrs pointer for performance reasons. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-8-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Move xsk_tx_list and its lock to buffer poolMagnus Karlsson
Move the xsk_tx_list and the xsk_tx_list_lock from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queues. There is one xsk_tx_list per device and queue id, so it should be located in the buffer pool. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-7-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Move queue_id, dev and need_wakeup to buffer poolMagnus Karlsson
Move queue_id, dev, and need_wakeup from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queues. There is one buffer pool per dev and queue id, so these variables should belong to the buffer pool, not the umem. Need_wakeup is also something that is set on a per napi level, so there is usually one per device and queue id. So move this to the buffer pool too. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-6-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Move fill and completion rings to buffer poolMagnus Karlsson
Move the fill and completion rings from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queue ids. In this case, we need one fill and completion ring per queue id. As the buffer pool is per queue id and napi id this is a natural place for it and one umem struture can be shared between these buffer pools. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-5-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Create and free buffer pool independently from umemMagnus Karlsson
Create and free the buffer pool independently from the umem. Move these operations that are performed on the buffer pool from the umem create and destroy functions to new create and destroy functions just for the buffer pool. This so that in later commits we can instantiate multiple buffer pools per umem when sharing a umem between HW queues and/or devices. We also erradicate the back pointer from the umem to the buffer pool as this will not work when we introduce the possibility to have multiple buffer pools per umem. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-4-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: i40e: ice: ixgbe: mlx5: Rename xsk zero-copy driver interfacesMagnus Karlsson
Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly it is about replacing the umem name from the function names with xsk_buff and also have them take the a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they have the same naming convention as the internal functions in xsk_queue.h. This so that it will be clearer what they do and also for consistency. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umemMagnus Karlsson
Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. But later commits will add other entities to it. These are going to be entities that are different between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
2020-08-31netlink: policy: correct validation type checkJohannes Berg
In the policy export for binary attributes I erroneously used a != NLA_VALIDATE_NONE comparison instead of checking for the two possible values, which meant that if a validation function pointer ended up aliasing the min/max as negatives, we'd hit a warning in nla_get_range_unsigned(). Fix this to correctly check for only the two types that should be handled here, i.e. range with or without warn-too-long. Reported-by: syzbot+353df1490da781637624@syzkaller.appspotmail.com Fixes: 8aa26c575fb3 ("netlink: make NLA_BINARY validation more flexible") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Do not delete clash entries on reply, let them expire instead, from Florian Westphal. 2) Do not report EAGAIN to nfnetlink, otherwise this enters a busy loop. Update nfnetlink_unicast() to translate EAGAIN to ENOBUFS. 3) Remove repeated words in code comments, from Randy Dunlap. 4) Several patches for the flowtable selftests, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-30tipc: fix using smp_processor_id() in preemptibleTuong Lien
The 'this_cpu_ptr()' is used to obtain the AEAD key' TFM on the current CPU for encryption, however the execution can be preemptible since it's actually user-space context, so the 'using smp_processor_id() in preemptible' has been observed. We fix the issue by using the 'get/put_cpu_ptr()' API which consists of a 'preempt_disable()' instead. Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-29netfilter: nft_socket: add wildcard supportBalazs Scheidler
Add NFT_SOCKET_WILDCARD to match to wildcard socket listener. Signed-off-by: Balazs Scheidler <bazsi77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-29netfilter: conntrack: do not auto-delete clash entries on replyFlorian Westphal
Its possible that we have more than one packet with the same ct tuple simultaneously, e.g. when an application emits n packets on same UDP socket from multiple threads. NAT rules might be applied to those packets. With the right set of rules, n packets will be mapped to m destinations, where at least two packets end up with the same destination. When this happens, the existing clash resolution may merge the skb that is processed after the first has been received with the identical tuple already in hash table. However, its possible that this identical tuple is a NAT_CLASH tuple. In that case the second skb will be sent, but no reply can be received since the reply that is processed first removes the NAT_CLASH tuple. Do not auto-delete, this gives a 1 second window for replies to be passed back to originator. Packets that are coming later (udp stream case) will not be affected: they match the original ct entry, not a NAT_CLASH one. Also prevent NAT_CLASH entries from getting offloaded. Fixes: 6a757c07e51f ("netfilter: conntrack: allow insertion of clashing entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: nfnetlink: nfnetlink_unicast() reports EAGAIN instead of ENOBUFSPablo Neira Ayuso
Frontend callback reports EAGAIN to nfnetlink to retry a command, this is used to signal that module autoloading is required. Unfortunately, nlmsg_unicast() reports EAGAIN in case the receiver socket buffer gets full, so it enters a busy-loop. This patch updates nfnetlink_unicast() to turn EAGAIN into ENOBUFS and to use nlmsg_unicast(). Remove the flags field in nfnetlink_unicast() since this is always MSG_DONTWAIT in the existing code which is exactly what nlmsg_unicast() passes to netlink_unicast() as parameter. Fixes: 96518518cc41 ("netfilter: add nftables") Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: delete repeated wordsRandy Dunlap
Drop duplicated words in net/netfilter/ and net/ipv4/netfilter/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: xt_HMARK: Use ip_is_fragment() helperYueHaibing
Use ip_is_fragment() to simpify code. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: conntrack: remove unneeded nf_ct_putFlorian Westphal
We can delay refcount increment until we reassign the existing entry to the current skb. A 0 refcount can't happen while the nf_conn object is still in the hash table and parallel mutations are impossible because we hold the bucket lock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: conntrack: add clash resolution stat counterFlorian Westphal
There is a misconception about what "insert_failed" means. We increment this even when a clash got resolved, so it might not indicate a problem. Add a dedicated counter for clash resolution and only increment insert_failed if a clash cannot be resolved. For the old /proc interface, export this in place of an older stat that got removed a while back. For ctnetlink, export this with a new attribute. Also correct an outdated comment that implies we add a duplicate tuple -- we only add the (unique) reply direction. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: conntrack: remove ignore statsFlorian Westphal
This counter increments when nf_conntrack_in sees a packet that already has a conntrack attached or when the packet is marked as UNTRACKED. Neither is an error. The former is normal for loopback traffic. The second happens for certain ICMPv6 packets or when nftables/ip(6)tables rules are in place. In case someone needs to count UNTRACKED packets, or packets that are marked as untracked before conntrack_in this can be done with both nftables and ip(6)tables rules. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: conntrack: do not increment two error counters at same timeFlorian Westphal
The /proc interface for nf_conntrack displays the "error" counter as "icmp_error". It makes sense to not increment "invalid" when failing to handle an icmp packet since those are special. For example, its possible for conntrack to see partial and/or fragmented packets inside icmp errors. This should be a separate event and not get mixed with the "invalid" counter. Likewise, remove the "error" increment for errors from get_l4proto(). After this, the error counter will only increment for errors coming from icmp(v6) packet handling. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: nf_tables: add userdata attributes to nft_tableJose M. Guisado Gomez
Enables storing userdata for nft_table. Field udata points to user data and udlen store its length. Adds new attribute flag NFTA_TABLE_USERDATA Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28ipvs: Fix uninit-value in do_ip_vs_set_ctl()Peilin Ye
do_ip_vs_set_ctl() is referencing uninitialized stack value when `len` is zero. Fix it. Reported-by: syzbot+23b5f9e7caf61d9a3898@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=46ebfb92a8a812621a001ef04d90dfa459520fe2 Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: ip6t_NPT: rewrite addresses in ICMPv6 original packetMichael Zhou
Detect and rewrite a prefix embedded in an ICMPv6 original packet that was rewritten by a corresponding DNPT/SNPT rule so it will be recognised by the host that sent the original packet. Example Rules in effect on the 1:2:3:4::/64 + 5:6:7:8::/64 side router: * SNPT src-pfx 1:2:3:4::/64 dst-pfx 5:6:7:8::/64 * DNPT src-pfx 5:6:7:8::/64 dst-pfx 1:2:3:4::/64 No rules on the 9:a:b:c::/64 side. 1. 1:2:3:4::1 sends UDP packet to 9:a:b:c::1 2. Router applies SNPT changing src to 5:6:7:8::ffef::1 3. 9:a:b:c::1 receives packet with (src 5:6:7:8::ffef::1 dst 9:a:b:c::1) and replies with ICMPv6 port unreachable to 5:6:7:8::ffef::1, including original packet (src 5:6:7:8::ffef::1 dst 9:a:b:c::1) 4. Router forwards ICMPv6 packet with (src 9:a:b:c::1 dst 5:6:7:8::ffef::1) including original packet (src 5:6:7:8::ffef::1 dst 9:a:b:c::1) and applies DNPT changing dst to 1:2:3:4::1 5. 1:2:3:4::1 receives ICMPv6 packet with (src 9:a:b:c::1 dst 1:2:3:4::1) including original packet (src 5:6:7:8::ffef::1 dst 9:a:b:c::1). It doesn't recognise the original packet as the src doesn't match anything it originally sent With this change, at step 4, DNPT will also rewrite the original packet src to 1:2:3:4::1, so at step 5, 1:2:3:4::1 will recognise the ICMPv6 error and provide feedback to the application properly. Conversely, SNPT will help when ICMPv6 errors are sent from the translated network. 1. 9:a:b:c::1 sends UDP packet to 5:6:7:8::ffef::1 2. Router applies DNPT changing dst to 1:2:3:4::1 3. 1:2:3:4::1 receives packet with (src 9:a:b:c::1 dst 1:2:3:4::1) and replies with ICMPv6 port unreachable to 9:a:b:c::1 including original packet (src 9:a:b:c::1 dst 1:2:3:4::1) 4. Router forwards ICMPv6 packet with (src 1:2:3:4::1 dst 9:a:b:c::1) including original packet (src 9:a:b:c::1 dst 1:2:3:4::1) and applies SNPT changing src to 5:6:7:8::ffef::1 5. 9:a:b:c::1 receives ICMPv6 packet with (src 5:6:7:8::ffef::1 dst 9:a:b:c::1) including original packet (src 9:a:b:c::1 dst 1:2:3:4::1). It doesn't recognise the original packet as the dst doesn't match anything it already sent The change to SNPT means the ICMPv6 original packet dst will be rewritten to 5:6:7:8::ffef::1 in step 4, allowing the error to be properly recognised in step 5. Signed-off-by: Michael Zhou <mzhou@cse.unsw.edu.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netlabel: remove unused param from audit_log_format()Alex Dewar
Commit d3b990b7f327 ("netlabel: fix problems with mapping removal") added a check to return an error if ret_val != 0, before ret_val is later used in a log message. Now it will unconditionally print "... res=1". So just drop the check. Addresses-Coverity: ("Dead code") Fixes: d3b990b7f327 ("netlabel: fix problems with mapping removal") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28net_sched: fix error path in red_init()Cong Wang
When ->init() fails, ->destroy() is called to clean up. So it is unnecessary to clean up in red_init(), and it would cause some refcount underflow. Fixes: aee9caa03fc3 ("net: sched: sch_red: Add qevents "early_drop" and "mark"") Reported-and-tested-by: syzbot+b33c1cb0a30ebdc8a5f9@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+e5ea5f8a3ecfd4427a1c@syzkaller.appspotmail.com Cc: Petr Machata <petrm@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28net: add option to not create fall-back tunnels in root-ns as wellMahesh Bandewar
The sysctl that was added earlier by commit 79134e6ce2c ("net: do not create fallback tunnels for non-default namespaces") to create fall-back only in root-ns. This patch enhances that behavior to provide option not to create fallback tunnels in root-ns as well. Since modules that create fallback tunnels could be built-in and setting the sysctl value after booting is pointless, so added a kernel cmdline options to change this default. The default setting is preseved for backward compatibility. The kernel command line option of fb_tunnels=initns will set the sysctl value to 1 and will create fallback tunnels only in initns while kernel cmdline fb_tunnels=none will set the sysctl value to 2 and fallback tunnels are skipped in every netns. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Maciej Zenczykowski <maze@google.com> Cc: Jian Yang <jianyang@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28netlink: fix a data race in netlink_rcv_wake()zhudi
The data races were reported by KCSAN: BUG: KCSAN: data-race in netlink_recvmsg / skb_queue_tail write (marked) to 0xffff8c0986e5a8c8 of 8 bytes by interrupt on cpu 3: skb_queue_tail+0xcc/0x120 __netlink_sendskb+0x55/0x80 netlink_broadcast_filtered+0x465/0x7e0 nlmsg_notify+0x8f/0x120 rtnl_notify+0x8e/0xb0 __neigh_notify+0xf2/0x120 neigh_update+0x927/0xde0 arp_process+0x8a3/0xf50 arp_rcv+0x27c/0x3b0 __netif_receive_skb_core+0x181c/0x1840 __netif_receive_skb+0x38/0xf0 netif_receive_skb_internal+0x77/0x1c0 napi_gro_receive+0x1bd/0x1f0 e1000_clean_rx_irq+0x538/0xb20 [e1000] e1000_clean+0x5e4/0x1340 [e1000] net_rx_action+0x310/0x9d0 __do_softirq+0xe8/0x308 irq_exit+0x109/0x110 do_IRQ+0x7f/0xe0 ret_from_intr+0x0/0x1d 0xffffffffffffffff read to 0xffff8c0986e5a8c8 of 8 bytes by task 1463 on cpu 0: netlink_recvmsg+0x40b/0x820 sock_recvmsg+0xc9/0xd0 ___sys_recvmsg+0x1a4/0x3b0 __sys_recvmsg+0x86/0x120 __x64_sys_recvmsg+0x52/0x70 do_syscall_64+0xb5/0x360 entry_SYSCALL_64_after_hwframe+0x65/0xca 0xffffffffffffffff Since the write is under sk_receive_queue->lock but the read is done as lockless. so fix it by using skb_queue_empty_lockless() instead of skb_queue_empty() for the read in netlink_rcv_wake() Signed-off-by: zhudi <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28bpf: Relax max_entries check for most of the inner map typesMartin KaFai Lau
Most of the maps do not use max_entries during verification time. Thus, those map_meta_equal() do not need to enforce max_entries when it is inserted as an inner map during runtime. The max_entries check is removed from the default implementation bpf_map_meta_equal(). The prog_array_map and xsk_map are exception. Its map_gen_lookup uses max_entries to generate inline lookup code. Thus, they will implement its own map_meta_equal() to enforce max_entries. Since there are only two cases now, the max_entries check is not refactored and stays in its own .c file. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011813.1970516-1-kafai@fb.com
2020-08-28bpf: Add map_meta_equal map opsMartin KaFai Lau
Some properties of the inner map is used in the verification time. When an inner map is inserted to an outer map at runtime, bpf_map_meta_equal() is currently used to ensure those properties of the inserting inner map stays the same as the verification time. In particular, the current bpf_map_meta_equal() checks max_entries which turns out to be too restrictive for most of the maps which do not use max_entries during the verification time. It limits the use case that wants to replace a smaller inner map with a larger inner map. There are some maps do use max_entries during verification though. For example, the map_gen_lookup in array_map_ops uses the max_entries to generate the inline lookup code. To accommodate differences between maps, the map_meta_equal is added to bpf_map_ops. Each map-type can decide what to check when its map is used as an inner map during runtime. Also, some map types cannot be used as an inner map and they are currently black listed in bpf_map_meta_alloc() in map_in_map.c. It is not unusual that the new map types may not aware that such blacklist exists. This patch enforces an explicit opt-in and only allows a map to be used as an inner map if it has implemented the map_meta_equal ops. It is based on the discussion in [1]. All maps that support inner map has its map_meta_equal points to bpf_map_meta_equal in this patch. A later patch will relax the max_entries check for most maps. bpf_types.h counts 28 map types. This patch adds 23 ".map_meta_equal" by using coccinelle. -5 for BPF_MAP_TYPE_PROG_ARRAY BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE BPF_MAP_TYPE_STRUCT_OPS BPF_MAP_TYPE_ARRAY_OF_MAPS BPF_MAP_TYPE_HASH_OF_MAPS The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc() is moved such that the same error is returned. [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28Merge tag 'mac80211-next-for-davem-2020-08-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time we have: * some code to support SAE (WPA3) offload in AP mode * many documentation (wording) fixes/updates * netlink policy updates, including the use of NLA_RANGE with binary attributes * regulatory improvements for adjacent frequency bands * and a few other small additions/refactorings/cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28Merge tag 'mac80211-for-davem-2020-08-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have: * fixes for AQL (airtime queue limits) * reduce packet loss detection false positives * a small channel number fix for the 6 GHz band * a fix for 80+80/160 MHz negotiation * an nl80211 attribute (NL80211_ATTR_HE_6GHZ_CAPABILITY) fix * add a missing sanity check for the regulatory code ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27rxrpc: Fix memory leak in rxkad_verify_response()Dinghao Liu
Fix a memory leak in rxkad_verify_response() whereby the response buffer doesn't get freed if we fail to allocate a ticket buffer. Fixes: ef68622da9cc ("rxrpc: Handle temporary errors better in rxkad security") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27Merge tag 'rxrpc-fixes-20200820' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc, afs: Fix probing issues Here are some fixes for rxrpc and afs to fix issues in the RTT measuring in rxrpc and thence the Volume Location server probing in afs: (1) Move the serial number of a received ACK into a local variable to simplify the next patch. (2) Fix the loss of RTT samples due to extra interposed ACKs causing baseline information to be discarded too early. This is a particular problem for afs when it sends a single very short call to probe a server it hasn't talked to recently. (3) Fix rxrpc_kernel_get_srtt() to indicate whether it actually has seen any valid samples or not. (4) Remove a field that's set/woken, but never read/waited on. (5) Expose the RTT and other probe information through procfs to make debugging of this stuff easier. (6) Fix VL rotation in afs to only use summary information from VL probing and not the probe running state (which gets clobbered when next a probe is issued). (7) Fix VL rotation to actually return the error aggregated from the probe errors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27batman-adv: bla: fix type misuse for backbone_gw hash indexingLinus Lüssing
It seems that due to a copy & paste error the void pointer in batadv_choose_backbone_gw() is cast to the wrong type. Fixing this by using "struct batadv_bla_backbone_gw" instead of "struct batadv_bla_claim" which better matches the caller's side. For now it seems that we were lucky because the two structs both have their orig/vid and addr/vid in the beginning. However I stumbled over this issue when I was trying to add some debug variables in front of "orig" in batadv_backbone_gw, which caused hash lookups to fail. Fixes: 07568d0369f9 ("batman-adv: don't rely on positions in struct for hashing") Signed-off-by: Linus Lüssing <ll@simonwunderlich.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2020-08-27net: Fix some commentsMiaohe Lin
Fix some comments, including wrong function name, duplicated word and so on. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27tipc: fix use-after-free in tipc_bcast_get_modeHoang Huu Le
Syzbot has reported those issues as: ================================================================== BUG: KASAN: use-after-free in tipc_bcast_get_mode+0x3ab/0x400 net/tipc/bcast.c:759 Read of size 1 at addr ffff88805e6b3571 by task kworker/0:6/3850 CPU: 0 PID: 3850 Comm: kworker/0:6 Not tainted 5.8.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events tipc_net_finalize_work Thread 1's call trace: [...] kfree+0x103/0x2c0 mm/slab.c:3757 <- bcbase releasing tipc_bcast_stop+0x1b0/0x2f0 net/tipc/bcast.c:721 tipc_exit_net+0x24/0x270 net/tipc/core.c:112 [...] Thread 2's call trace: [...] tipc_bcast_get_mode+0x3ab/0x400 net/tipc/bcast.c:759 <- bcbase has already been freed by Thread 1 tipc_node_broadcast+0x9e/0xcc0 net/tipc/node.c:1744 tipc_nametbl_publish+0x60b/0x970 net/tipc/name_table.c:752 tipc_net_finalize net/tipc/net.c:141 [inline] tipc_net_finalize+0x1fa/0x310 net/tipc/net.c:131 tipc_net_finalize_work+0x55/0x80 net/tipc/net.c:150 [...] ================================================================== BUG: KASAN: use-after-free in tipc_named_reinit+0xef/0x290 net/tipc/name_distr.c:344 Read of size 8 at addr ffff888052ab2000 by task kworker/0:13/30628 CPU: 0 PID: 30628 Comm: kworker/0:13 Not tainted 5.8.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events tipc_net_finalize_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1f0/0x31e lib/dump_stack.c:118 print_address_description+0x66/0x5a0 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report+0x132/0x1d0 mm/kasan/report.c:530 tipc_named_reinit+0xef/0x290 net/tipc/name_distr.c:344 tipc_net_finalize+0x85/0xe0 net/tipc/net.c:138 tipc_net_finalize_work+0x50/0x70 net/tipc/net.c:150 process_one_work+0x789/0xfc0 kernel/workqueue.c:2269 worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415 kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 [...] Freed by task 14058: save_stack mm/kasan/common.c:48 [inline] set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x114/0x170 mm/kasan/common.c:455 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x220 mm/slab.c:3757 tipc_exit_net+0x29/0x50 net/tipc/core.c:113 ops_exit_list net/core/net_namespace.c:186 [inline] cleanup_net+0x708/0xba0 net/core/net_namespace.c:603 process_one_work+0x789/0xfc0 kernel/workqueue.c:2269 worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415 kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 Fix it by calling flush_scheduled_work() to make sure the tipc_net_finalize_work() stopped before releasing bcbase object. Reported-by: syzbot+6ea1f7a8df64596ef4d7@syzkaller.appspotmail.com Reported-by: syzbot+e9cc557752ab126c1b99@syzkaller.appspotmail.com Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27nl80211: support SAE authentication offload in AP modeChung-Hsien Hsu
Let drivers advertise support for AP-mode SAE authentication offload with a new NL80211_EXT_FEATURE_SAE_OFFLOAD_AP flag. Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com> Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com> Link: https://lore.kernel.org/r/20200817073316.33402-4-stanley.hsu@cypress.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-27mac80211: rename csa counters to countdown countersJohn Crispin
We want to reuse the functions and structs for other counters such as BSS color change. Rename them to more generic names. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200811080107.3615705-2-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-27nl80211: rename csa counter attributes countdown countersJohn Crispin
We want to reuse the attributes for other counters such as BSS color change. Rename them to more generic names. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200811080107.3615705-1-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-27nl80211: add support for setting fixed HE rate/gi/ltfMiles Hu
This patch adds the nl80211 structs, definitions, policies and parsing code required to pass fixed HE rate, GI and LTF settings. Signed-off-by: Miles Hu <milehu@codeaurora.org> Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200804081630.2013619-1-john@phrozen.org [fix comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>