summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-09-15batman-adv: mcast: fix duplicate mcast packets in BLA backbone from LANLinus Lüssing
Scenario: * Multicast frame send from a BLA backbone (multiple nodes with their bat0 bridged together, with BLA enabled) Issue: * BLA backbone nodes receive the frame multiple times on bat0 For multicast frames received via batman-adv broadcast packets the originator of the broadcast packet is checked before decapsulating and forwarding the frame to bat0 (batadv_bla_is_backbone_gw()-> batadv_recv_bcast_packet()). If it came from a node which shares the same BLA backbone with us then it is not forwarded to bat0 to avoid a loop. When sending a multicast frame in a non-4-address batman-adv unicast packet we are currently missing this check - and cannot do so because the batman-adv unicast packet has no originator address field. However, we can simply fix this on the sender side by only sending the multicast frame via unicasts to interested nodes which do not share the same BLA backbone with us. This also nicely avoids some unnecessary transmissions on mesh side. Note that no infinite loop was observed, probably because of dropping via batadv_interface_tx()->batadv_bla_tx(). However the duplicates still utterly confuse switches/bridges, ICMPv6 duplicate address detection and neighbor discovery and therefore leads to long delays before being able to establish TCP connections, for instance. And it also leads to the Linux bridge printing messages like: "br-lan: received packet on eth1 with own address as source address ..." Fixes: 2d3f6ccc4ea5 ("batman-adv: Modified forwarding behaviour for multicast packets") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-09-14xsk: Fix refcount warning in xp_dma_mapMagnus Karlsson
Fix a potential refcount warning that a zero value is increased to one in xp_dma_map, by initializing the refcount to one to start with, instead of zero plus a refcount_inc(). Fixes: 921b68692abb ("xsk: Enable sharing of dma mappings") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/1600095036-23868-1-git-send-email-magnus.karlsson@gmail.com
2020-09-14xsk: Fix number of pinned pages/umem size discrepancyBjörn Töpel
For AF_XDP sockets, there was a discrepancy between the number of of pinned pages and the size of the umem region. The size of the umem region is used to validate the AF_XDP descriptor addresses. The logic that pinned the pages covered by the region only took whole pages into consideration, creating a mismatch between the size and pinned pages. A user could then pass AF_XDP addresses outside the range of pinned pages, but still within the size of the region, crashing the kernel. This change correctly calculates the number of pages to be pinned. Further, the size check for the aligned mode is simplified. Now the code simply checks if the size is divisible by the chunk size. Fixes: bbff2f321a86 ("xsk: new descriptor addressing scheme") Reported-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200910075609.7904-1-bjorn.topel@gmail.com
2020-09-14tcp: schedule EPOLLOUT after a partial sendmsgSoheil Hassas Yeganeh
For EPOLLET, applications must call sendmsg until they get EAGAIN. Otherwise, there is no guarantee that EPOLLOUT is sent if there was a failure upon memory allocation. As a result on high-speed NICs, userspace observes multiple small sendmsgs after a partial sendmsg until EAGAIN, since TCP can send 1-2 TSOs in between two sendmsg syscalls: // One large partial send due to memory allocation failure. sendmsg(20MB) = 2MB // Many small sends until EAGAIN. sendmsg(18MB) = 64KB sendmsg(17.9MB) = 128KB sendmsg(17.8MB) = 64KB ... sendmsg(...) = EAGAIN // At this point, userspace can assume an EPOLLOUT. To fix this, set the SOCK_NOSPACE on all partial sendmsg scenarios to guarantee that we send EPOLLOUT after partial sendmsg. After this commit userspace can assume that it will receive an EPOLLOUT after the first partial sendmsg. This EPOLLOUT will benefit from sk_stream_write_space() logic delaying the EPOLLOUT until significant space is available in write queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14tcp: return EPOLLOUT from tcp_poll only when notsent_bytes is half the limitSoheil Hassas Yeganeh
If there was any event available on the TCP socket, tcp_poll() will be called to retrieve all the events. In tcp_poll(), we call sk_stream_is_writeable() which returns true as long as we are at least one byte below notsent_lowat. This will result in quite a few spurious EPLLOUT and frequent tiny sendmsg() calls as a result. Similar to sk_stream_write_space(), use __sk_stream_is_writeable with a wake value of 1, so that we set EPOLLOUT only if half the space is available for write. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14net: sched: initialize with 0 before setting erspan md->uXin Long
In fl_set_erspan_opt(), all bits of erspan md was set 1, as this function is also used to set opt MASK. However, when setting for md->u.index for opt VALUE, the rest bits of the union md->u will be left 1. It would cause to fail the match of the whole md when version is 1 and only index is set. This patch is to fix by initializing with 0 before setting erspan md->u. Reported-by: Shuang Li <shuali@redhat.com> Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14lwtunnel: only keep the available bits when setting vxlan md->gbpXin Long
As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata on vxlan rx/tx path, only dont_learn/policy_applied/policy_id fields can be set to or parse from the packet for vxlan gbp option. So do the mask when set it in lwtunnel, as it does in act_tunnel_key and cls_flower. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14net: sched: only keep the available bits when setting vxlan md->gbpXin Long
As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata on vxlan rx/tx path, only dont_learn/policy_applied/policy_id fields can be set to or parse from the packet for vxlan gbp option. So we'd better do the mask when set it in act_tunnel_key and cls_flower. Otherwise, when users don't know these bits, they may configure with a value which can never be matched. Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14tipc: use skb_unshare() instead in tipc_buf_append()Xin Long
In tipc_buf_append() it may change skb's frag_list, and it causes problems when this skb is cloned. skb_unclone() doesn't really make this skb's flag_list available to change. Shuang Li has reported an use-after-free issue because of this when creating quite a few macvlan dev over the same dev, where the broadcast packets will be cloned and go up to the stack: [ ] BUG: KASAN: use-after-free in pskb_expand_head+0x86d/0xea0 [ ] Call Trace: [ ] dump_stack+0x7c/0xb0 [ ] print_address_description.constprop.7+0x1a/0x220 [ ] kasan_report.cold.10+0x37/0x7c [ ] check_memory_region+0x183/0x1e0 [ ] pskb_expand_head+0x86d/0xea0 [ ] process_backlog+0x1df/0x660 [ ] net_rx_action+0x3b4/0xc90 [ ] [ ] Allocated by task 1786: [ ] kmem_cache_alloc+0xbf/0x220 [ ] skb_clone+0x10a/0x300 [ ] macvlan_broadcast+0x2f6/0x590 [macvlan] [ ] macvlan_process_broadcast+0x37c/0x516 [macvlan] [ ] process_one_work+0x66a/0x1060 [ ] worker_thread+0x87/0xb10 [ ] [ ] Freed by task 3253: [ ] kmem_cache_free+0x82/0x2a0 [ ] skb_release_data+0x2c3/0x6e0 [ ] kfree_skb+0x78/0x1d0 [ ] tipc_recvmsg+0x3be/0xa40 [tipc] So fix it by using skb_unshare() instead, which would create a new skb for the cloned frag and it'll be safe to change its frag_list. The similar things were also done in sctp_make_reassembled_event(), which is using skb_copy(). Reported-by: Shuang Li <shuali@redhat.com> Fixes: 37e22164a8a3 ("tipc: rename and move message reassembly function") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14tipc: Fix memory leak in tipc_group_create_member()Peilin Ye
tipc_group_add_to_tree() returns silently if `key` matches `nkey` of an existing node, causing tipc_group_create_member() to leak memory. Let tipc_group_add_to_tree() return an error in such a case, so that tipc_group_create_member() can handle it properly. Fixes: 75da2163dbb6 ("tipc: introduce communication groups") Reported-and-tested-by: syzbot+f95d90c454864b3b5bc9@syzkaller.appspotmail.com Cc: Hillf Danton <hdanton@sina.com> Link: https://syzkaller.appspot.com/bug?id=048390604fe1b60df34150265479202f10e13aff Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14__netif_receive_skb_core: don't untag vlan from skb on DSA masterVladimir Oltean
A DSA master interface has upper network devices, each representing an Ethernet switch port attached to it. Demultiplexing the source ports and setting skb->dev accordingly is done through the catch-all ETH_P_XDSA packet_type handler. Catch-all because DSA vendors have various header implementations, which can be placed anywhere in the frame: before the DMAC, before the EtherType, before the FCS, etc. So, the ETH_P_XDSA handler acts like an rx_handler more than anything. It is unlikely for the DSA master interface to have any other upper than the DSA switch interfaces themselves. Only maybe a bridge upper*, but it is very likely that the DSA master will have no 8021q upper. So __netif_receive_skb_core() will try to untag the VLAN, despite the fact that the DSA switch interface might have an 8021q upper. So the skb will never reach that. So far, this hasn't been a problem because most of the possible placements of the DSA switch header mentioned in the first paragraph will displace the VLAN header when the DSA master receives the frame, so __netif_receive_skb_core() will not actually execute any VLAN-specific code for it. This only becomes a problem when the DSA switch header does not displace the VLAN header (for example with a tail tag). What the patch does is it bypasses the untagging of the skb when there is a DSA switch attached to this net device. So, DSA is the only packet_type handler which requires seeing the VLAN header. Once skb->dev will be changed, __netif_receive_skb_core() will be invoked again and untagging, or delivery to an 8021q upper, will happen in the RX of the DSA switch interface itself. *see commit 9eb8eff0cf2f ("net: bridge: allow enslaving some DSA master network devices". This is actually the reason why I prefer keeping DSA as a packet_type handler of ETH_P_XDSA rather than converting to an rx_handler. Currently the rx_handler code doesn't support chaining, and this is a problem because a DSA master might be bridged. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14ipv4: Initialize flowi4_multipath_hash in data pathDavid Ahern
flowi4_multipath_hash was added by the commit referenced below for tunnels. Unfortunately, the patch did not initialize the new field for several fast path lookups that do not initialize the entire flow struct to 0. Fix those locations. Currently, flowi4_multipath_hash is random garbage and affects the hash value computed by fib_multipath_hash for multipath selection. Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash") Signed-off-by: David Ahern <dsahern@gmail.com> Cc: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14net: try to avoid unneeded backlog flushPaolo Abeni
flush_all_backlogs() may cause deadlock on systems running processes with FIFO scheduling policy. The above is critical in -RT scenarios, where user-space specifically ensure no network activity is scheduled on the CPU running the mentioned FIFO process, but still get stuck. This commit tries to address the problem checking the backlog status on the remote CPUs before scheduling the flush operation. If the backlog is empty, we can skip it. v1 -> v2: - explicitly clear flushed cpu mask - Eric Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14Merge tag 'rxrpc-next-20200914' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fixes for the connection manager rewrite Here are some fixes for the connection manager rewrite: (1) Fix a goto to the wrong place in error handling. (2) Fix a missing NULL pointer check. (3) The stored allocation error needs to be stored signed. (4) Fix a leak of connection bundle when clearing connections due to net namespace exit. (5) Fix an overget of the bundle when setting up a new client conn. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14tcp: remove SOCK_QUEUE_SHRUNKEric Dumazet
SOCK_QUEUE_SHRUNK is currently used by TCP as a temporary state that remembers if some room has been made in the rtx queue by an incoming ACK packet. This is later used from tcp_check_space() before considering to send EPOLLOUT. Problem is: If we receive SACK packets, and no packet is removed from RTX queue, we can send fresh packets, thus moving them from write queue to rtx queue and eventually empty the write queue. This stall can happen if TCP_NOTSENT_LOWAT is used. With this fix, we no longer risk stalling sends while holes are repaired, and we can fully use socket sndbuf. This also removes a cache line dirtying for typical RPC workloads. Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14net/packet: Fix a comment about hard_header_len and headroom allocationXie He
This comment is outdated and no longer reflects the actual implementation of af_packet.c. Reasons for the new comment: 1. In af_packet.c, the function packet_snd first reserves a headroom of length (dev->hard_header_len + dev->needed_headroom). Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, which calls dev->header_ops->create, to create the link layer header. If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of length (dev->hard_header_len), and checks if the user has provided a header sized between (dev->min_header_len) and (dev->hard_header_len) (in dev_validate_header). This shows the developers of af_packet.c expect hard_header_len to be consistent with header_ops. 2. In af_packet.c, the function packet_sendmsg_spkt has a FIXME comment. That comment states that prepending an LL header internally in a driver is considered a bug. I believe this bug can be fixed by setting hard_header_len to 0, making the internal header completely invisible to af_packet.c (and requesting the headroom in needed_headroom instead). 3. There is a commit for a WiFi driver: commit 9454f7a895b8 ("mwifiex: set needed_headroom, not hard_header_len") According to the discussion about it at: https://patchwork.kernel.org/patch/11407493/ The author tried to set the WiFi driver's hard_header_len to the Ethernet header length, and request additional header space internally needed by setting needed_headroom. This means this usage is already adopted by driver developers. Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Brian Norris <briannorris@chromium.org> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: call tcp_cleanup_rbuf on subflowsPaolo Abeni
That is needed to let the subflows announce promptly when new space is available in the receive buffer. tcp_cleanup_rbuf() is currently a static function, drop the scope modifier and add a declaration in the TCP header. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: allow picking different xmit subflowsPaolo Abeni
Update the scheduler to less trivial heuristic: cache the last used subflow, and try to send on it a reasonably long burst of data. When the burst or the subflow send space is exhausted, pick the subflow with the lower ratio between write space and send buffer - that is, the subflow with the greater relative amount of free space. v1 -> v2: - fix 32 bit build breakage due to 64bits div - fix checkpath issues (uint64_t -> u64) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: allow creating non-backup subflowsPaolo Abeni
Currently the 'backup' attribute of local endpoint is ignored. Let's use it for the MP_JOIN handshake Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: move address attribute into mptcp_addr_infoPaolo Abeni
So that can be accessed easily from the subflow creation helper. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: add OoO related mibsPaolo Abeni
Add a bunch of MPTCP mibs related to MPTCP OoO data processing. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: cleanup mptcp_subflow_discard_data()Paolo Abeni
There is no need to use the tcp_read_sock(), we can simply drop the skb. Additionally try to look at the next buffer for in order data. This both simplifies the code and avoid unneeded indirect calls. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: move ooo skbs into msk out of order queue.Paolo Abeni
Add an RB-tree to cope with OoO (at MPTCP level) data. __mptcp_move_skb() insert into the RB tree "future" data, eventually coalescing skb as allowed by the MPTCP DSN. To simplify sequence accounting, move the DSN inside the cb. After successfully enqueuing in sequence data, check if we can use any data from the RB tree. Additionally move the data_fin check after spooling data from the OoO tree, otherwise we could miss shutdown events. The RB tree code is copied as verbatim as possible from tcp_data_queue_ofo(), with a few simplifications due to the fact that MPTCP doesn't need to cope with sacks. All bugs here are added by me. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: introduce and use mptcp_try_coalesce()Paolo Abeni
Factor-out existing code, will be re-used by the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: basic sndbuf autotuningPaolo Abeni
Let the msk sendbuf track the size of the larger subflow's send window, so that we ensure mptcp_sendmsg() does not exceed MPTCP-level send window. The update is performed just before try to send any data. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: trigger msk processing even for OoO dataPaolo Abeni
This is a prerequisite to allow receiving data from multiple subflows without re-injection. Instead of dropping the OoO - "future" data in subflow_check_data_avail(), call into __mptcp_move_skbs() and let the msk drop that. To avoid code duplication factor out the mptcp_subflow_discard_data() helper. Note that __mptcp_move_skbs() can now find multiple subflows with data avail (comprising to-be-discarded data), so must update the byte counter incrementally. v1 -> v2: - fix checkpatch issues (unsigned -> unsigned int) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: set data_ready status bit in subflow_check_data_avail()Paolo Abeni
This simplify mptcp_subflow_data_available() and will made follow-up patches simpler. Additionally remove the unneeded checks on subflow copied_seq: we always whole skbs out of subflows. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14mptcp: rethink 'is writable' conditionalPaolo Abeni
Currently, when checking for the 'msk is writable' condition, we look at the individual subflows write space. That works well while we send data via a single subflow, but will not as soon as we will enable concurrent xmit on multiple subflows. With this change msk becomes writable when the following conditions hold: - the socket has some free write space - there is at least a subflow with write free space Additionally we need to set the NOSPACE bit on all subflows before blocking. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14batman-adv: Add missing include for in_interrupt()Sven Eckelmann
The fix for receiving (internally generated) bla packets outside the interrupt context introduced the usage of in_interrupt(). But this functionality is only defined in linux/preempt.h which was not included with the same patch. Fixes: 279e89b2281a ("batman-adv: bla: use netif_rx_ni when not in interrupt context") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-09-14Bluetooth: pause/resume advertising around suspendDaniel Winkler
Currently, the controller will continue advertising when the system enters suspend. This patch makes sure that all advertising instances are paused when entering suspend, and resumed when suspend exits. The Advertising and Suspend/Resume test suites were both run on this change on 4.19 kernel with both hardware offloaded multi-advertising and software rotated multi-advertising. In addition, a new test was added that performs the following steps: * Register 3 advertisements via bluez RegisterAdvertisement * Verify reception of all advertisements by remote peer * Enter suspend on DUT * Verify failure to receive all advertisements by remote peer * Exit suspend on DUT * Verify reception of all advertisements by remote peer Signed-off-by: Daniel Winkler <danielwinkler@google.com> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-14rxrpc: Fix an overget of the conn bundle when setting up a client connDavid Howells
When setting up a client connection, a second ref is accidentally obtained on the connection bundle (we get one when allocating the conn and a second one when adding the conn to the bundle). Fix it to only use the ref obtained by rxrpc_alloc_client_connection() and not to add a second when adding the candidate conn to the bundle. Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Signed-off-by: David Howells <dhowells@redhat.com>
2020-09-14rxrpc: Fix conn bundle leak in net-namespace exitDavid Howells
When the network namespace exits, rxrpc_clean_up_local_conns() needs to unbundle each client connection it evicts. Fix it to do this. kernel BUG at net/rxrpc/conn_object.c:481! RIP: 0010:rxrpc_destroy_all_connections.cold+0x11/0x13 net/rxrpc/conn_object.c:481 Call Trace: rxrpc_exit_net+0x1a4/0x2e0 net/rxrpc/net_ns.c:119 ops_exit_list+0xb0/0x160 net/core/net_namespace.c:186 cleanup_net+0x4ea/0xa00 net/core/net_namespace.c:603 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by: syzbot+52071f826a617b9c76ed@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
2020-09-14rxrpc: Fix rxrpc_bundle::alloc_error to be signedDavid Howells
The alloc_error field in the rxrpc_bundle struct should be signed as it has negative error codes assigned to it. Checks directly on it may then fail, and may produce a warning like this: net/rxrpc/conn_client.c:662 rxrpc_wait_for_channel() warn: 'bundle->alloc_error' is unsigned Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com>
2020-09-14rxrpc: Fix an error goto in rxrpc_connect_call()David Howells
Fix an error-handling goto in rxrpc_connect_call() whereby it will jump to free the bundle it failed to allocate. Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com>
2020-09-13Bluetooth: Emit controller suspend and resume eventsAbhishek Pandit-Subedi
Emit controller suspend and resume events when we are ready for suspend and we've resumed from suspend. The controller suspend event will report whatever suspend state was successfully entered. The controller resume event will check the first HCI event that was received after we finished preparing for suspend and, if it was a connection event, store the address of the peer that caused the event. If it was not a connection event, we mark the wake reason as an unexpected event. Here is a sample btmon trace with these events: @ MGMT Event: Controller Suspended (0x002d) plen 1 Suspend state: Page scanning and/or passive scanning (2) @ MGMT Event: Controller Resumed (0x002e) plen 8 Wake reason: Remote wake due to peer device connection (2) LE Address: CD:F3:CD:13:C5:9A (OUI CD-F3-CD) Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-13Bluetooth: Add suspend reason for device disconnectAbhishek Pandit-Subedi
Update device disconnect event with reason 0x5 to indicate that device disconnected because the controller is suspending. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-13Bluetooth: Add mgmt suspend and resume eventsAbhishek Pandit-Subedi
Add the controller suspend and resume events, which will signal when Bluetooth has completed preparing for suspend and when it's ready for resume. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-13Bluetooth: Set ext scan response only when it existsAbhishek Pandit-Subedi
Only set extended scan response only when it exists. Otherwise, clear the scan response data. Per the core spec v5.2, Vol 4, Part E, 7.8.55 If the advertising set is non-scannable and the Host uses this command other than to discard existing data, the Controller shall return the error code Invalid HCI Command Parameters (0x12). On WCN3991, the controller correctly responds with Invalid Parameters when this is sent. That error causes __hci_req_hci_power_on to fail with -EINVAL and LE devices can't connect because background scanning isn't configured. Here is an hci trace of where this issue occurs during power on: < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 Handle: 0x00 Properties: 0x0010 Use legacy advertising PDUs: ADV_NONCONN_IND Min advertising interval: 181.250 msec (0x0122) Max advertising interval: 181.250 msec (0x0122) Channel map: 37, 38, 39 (0x07) Own address type: Random (0x01) Peer address type: Public (0x00) Peer address: 00:00:00:00:00:00 (OUI 00-00-00) Filter policy: Allow Scan Request from Any, Allow Connect... TX power: 127 dbm (0x7f) Primary PHY: LE 1M (0x01) Secondary max skip: 0x00 Secondary PHY: LE 1M (0x01) SID: 0x00 Scan request notifications: Disabled (0x00) > HCI Event: Command Complete (0x0e) plen 5 LE Set Extended Advertising Parameters (0x08|0x0036) ncmd 1 Status: Success (0x00) TX power (selected): 9 dbm (0x09) < HCI Command: LE Set Advertising Set Random Address (0x08|0x0035) plen 7 Advertising handle: 0x00 Advertising random address: 08:FD:55:ED:22:28 (OUI 08-FD-55) > HCI Event: Command Complete (0x0e) plen 4 LE Set Advertising Set Random Address (0x08|0x0035) ncmd Status: Success (0x00) < HCI Command: LE Set Extended Scan Response Data (0x08|0x0038) plen 35 Handle: 0x00 Operation: Complete scan response data (0x03) Fragment preference: Minimize fragmentation (0x01) Data length: 0x0d Name (short): Chromebook > HCI Event: Command Complete (0x0e) plen 4 LE Set Extended Scan Response Data (0x08|0x0038) ncmd 1 Status: Invalid HCI Command Parameters (0x12) Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-13Bluetooth: Only mark socket zapped after unlockingAbhishek Pandit-Subedi
Since l2cap_sock_teardown_cb doesn't acquire the channel lock before setting the socket as zapped, it could potentially race with l2cap_sock_release which frees the socket. Thus, wait until the cleanup is complete before marking the socket as zapped. This race was reproduced on a JBL GO speaker after the remote device rejected L2CAP connection due to resource unavailability. Here is a dmesg log with debug logs from a repro of this bug: [ 3465.424086] Bluetooth: hci_core.c:hci_acldata_packet() hci0 len 16 handle 0x0003 flags 0x0002 [ 3465.424090] Bluetooth: hci_conn.c:hci_conn_enter_active_mode() hcon 00000000cfedd07d mode 0 [ 3465.424094] Bluetooth: l2cap_core.c:l2cap_recv_acldata() conn 000000007eae8952 len 16 flags 0x2 [ 3465.424098] Bluetooth: l2cap_core.c:l2cap_recv_frame() len 12, cid 0x0001 [ 3465.424102] Bluetooth: l2cap_core.c:l2cap_raw_recv() conn 000000007eae8952 [ 3465.424175] Bluetooth: l2cap_core.c:l2cap_sig_channel() code 0x03 len 8 id 0x0c [ 3465.424180] Bluetooth: l2cap_core.c:l2cap_connect_create_rsp() dcid 0x0045 scid 0x0000 result 0x02 status 0x00 [ 3465.424189] Bluetooth: l2cap_core.c:l2cap_chan_put() chan 000000006acf9bff orig refcnt 4 [ 3465.424196] Bluetooth: l2cap_core.c:l2cap_chan_del() chan 000000006acf9bff, conn 000000007eae8952, err 111, state BT_CONNECT [ 3465.424203] Bluetooth: l2cap_sock.c:l2cap_sock_teardown_cb() chan 000000006acf9bff state BT_CONNECT [ 3465.424221] Bluetooth: l2cap_core.c:l2cap_chan_put() chan 000000006acf9bff orig refcnt 3 [ 3465.424226] Bluetooth: hci_core.h:hci_conn_drop() hcon 00000000cfedd07d orig refcnt 6 [ 3465.424234] BUG: spinlock bad magic on CPU#2, kworker/u17:0/159 [ 3465.425626] Bluetooth: hci_sock.c:hci_sock_sendmsg() sock 000000002bb0cb64 sk 00000000a7964053 [ 3465.430330] lock: 0xffffff804410aac0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 3465.430332] Causing a watchdog bite! Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reported-by: Balakrishna Godavarthi <bgodavar@codeaurora.org> Reviewed-by: Manish Mandlik <mmandlik@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-11Revert "net: dsa: Add more convenient functions for installing port VLANs"Vladimir Oltean
This reverts commit 314f76d7a68bab0516aa52877944e6aacfa0fc3f. Citing that commit message, the call graph was: dsa_slave_vlan_rx_add_vid dsa_port_setup_8021q_tagging | | | | | +-------------+ | | v v dsa_port_vid_add dsa_slave_port_obj_add | | +-------+ +-------+ | | v v dsa_port_vlan_add Now that tag_8021q has its own ops structure, it no longer relies on dsa_port_vid_add, and therefore on the dsa_switch_ops to install its VLANs. So dsa_port_vid_add now only has one single caller. So we can simplify the call graph to what it was before, aka: dsa_slave_vlan_rx_add_vid dsa_slave_port_obj_add | | +-------+ +-------+ | | v v dsa_port_vlan_add Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11net: dsa: tag_8021q: add a context structureVladimir Oltean
While working on another tag_8021q driver implementation, some things became apparent: - It is not mandatory for a DSA driver to offload the tag_8021q VLANs by using the VLAN table per se. For example, it can add custom TCAM rules that simply encapsulate RX traffic, and redirect & decapsulate rules for TX traffic. For such a driver, it makes no sense to receive the tag_8021q configuration through the same callback as it receives the VLAN configuration from the bridge and the 8021q modules. - Currently, sja1105 (the only tag_8021q user) sets a priv->expect_dsa_8021q variable to distinguish between the bridge calling, and tag_8021q calling. That can be improved, to say the least. - The crosschip bridging operations are, in fact, stateful already. The list of crosschip_links must be kept by the caller and passed to the relevant tag_8021q functions. So it would be nice if the tag_8021q configuration was more self-contained. This patch attempts to do that. Create a struct dsa_8021q_context which encapsulates a struct dsa_switch, and has 2 function pointers for adding and deleting a VLAN. These will replace the previous channel to the driver, which was through the .port_vlan_add and .port_vlan_del callbacks of dsa_switch_ops. Also put the list of crosschip_links into this dsa_8021q_context. Drivers that don't support cross-chip bridging can simply omit to initialize this list, as long as they dont call any cross-chip function. The sja1105_vlan_add and sja1105_vlan_del functions are refactored into a smaller sja1105_vlan_add_one, which now has 2 entry points: - sja1105_vlan_add, from struct dsa_switch_ops - sja1105_dsa_8021q_vlan_add, from the tag_8021q ops But even this change is fairly trivial. It just reflects the fact that for sja1105, the VLANs from these 2 channels end up in the same hardware table. However that is not necessarily true in the general sense (and that's the reason for making this change). The rest of the patch is mostly plain refactoring of "ds" -> "ctx". The dsa_8021q_context structure needs to be propagated because adding a VLAN is now done through the ops function pointers inside of it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11net: dsa: tag_8021q: setup tagging via a single function callVladimir Oltean
There is no point in calling dsa_port_setup_8021q_tagging for each individual port. Additionally, it will become more difficult to do that when we'll have a context structure to tag_8021q (next patch). So refactor this now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11taprio: Fix allowing too small intervalsVinicius Costa Gomes
It's possible that the user specifies an interval that couldn't allow any packet to be transmitted. This also avoids the issue of the hrtimer handler starving the other threads because it's running too often. The solution is to reject interval sizes that according to the current link speed wouldn't allow any packet to be transmitted. Reported-by: syzbot+8267241609ae8c23b248@syzkaller.appspotmail.com Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11bridge: mcast: Fix incomplete MDB dumpIdo Schimmel
Each MDB entry is encoded in a nested netlink attribute called 'MDBA_MDB_ENTRY'. In turn, this attribute contains another nested attributed called 'MDBA_MDB_ENTRY_INFO', which encodes a single port group entry within the MDB entry. The cited commit added the ability to restart a dump from a specific port group entry. However, on failure to add a port group entry to the dump the entire MDB entry (stored in 'nest2') is removed, resulting in missing port group entries. Fix this by finalizing the MDB entry with the partial list of already encoded port group entries. Fixes: 5205e919c9f0 ("net: bridge: mcast: add support for src list and filter mode dumping") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11ipv6: remove redundant assignment to variable errColin Ian King
The variable err is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Also re-order variable declarations in reverse Christmas tree ordering. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11Bluetooth: Add MGMT capability flags for tx power and ext advertisingDaniel Winkler
For new advertising features, it will be important for userspace to know the capabilities of the controller and kernel. If the controller and kernel support extended advertising, we include flags indicating hardware offloading support and support for setting tx power of adv instances. In the future, vendor-specific commands may allow the setting of tx power in advertising instances, but for now this feature is only marked available if extended advertising is supported. This change is manually verified in userspace by ensuring the advertising manager's supported_flags field is updated with new flags on hatch chromebook (ext advertising supported). Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-11Bluetooth: Report num supported adv instances for hw offloadingDaniel Winkler
Here we make sure we properly report the number of supported advertising slots when we are using hardware offloading. If no hardware offloading is available, we default this value to HCI_MAX_ADV_INSTANCES for use in software rotation as before. This change has been tested on kukui (no ext adv) and hatch (ext adv) chromebooks by verifying "SupportedInstances" shows 5 (the default) and 6 (slots supported by controller), respectively. Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-11Bluetooth: sco: new getsockopt options BT_SNDMTU/BT_RCVMTUJoseph Hwang
This patch defines new getsockopt options BT_SNDMTU/BT_RCVMTU for SCO socket to be compatible with other bluetooth sockets. These new options return the same value as option SCO_OPTIONS which is already present on existing kernels. Signed-off-by: Joseph Hwang <josephsih@chromium.org> Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-11Bluetooth: Re-order clearing suspend tasksAbhishek Pandit-Subedi
Unregister_pm_notifier is a blocking call so suspend tasks should be cleared beforehand. Otherwise, the notifier will wait for completion before returning (and we encounter a 2s timeout on resume). Fixes: 0e9952804ec9c8 (Bluetooth: Clear suspend tasks on unregister) Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-09-11Bluetooth: Fix memory leak in read_adv_mon_features()Peilin Ye
read_adv_mon_features() is leaking memory. Free `rp` before returning. Fixes: e5e1e7fd470c ("Bluetooth: Add handler of MGMT_OP_READ_ADV_MONITOR_FEATURES") Reported-and-tested-by: syzbot+f7f6e564f4202d8601c6@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=f7f6e564f4202d8601c6 Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>