summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-06-16can: bcm/raw/isotp: use per module netdevice notifierTetsuo Handa
syzbot is reporting hung task at register_netdevice_notifier() [1] and unregister_netdevice_notifier() [2], for cleanup_net() might perform time consuming operations while CAN driver's raw/bcm/isotp modules are calling {register,unregister}_netdevice_notifier() on each socket. Change raw/bcm/isotp modules to call register_netdevice_notifier() from module's __init function and call unregister_netdevice_notifier() from module's __exit function, as with gw/j1939 modules are doing. Link: https://syzkaller.appspot.com/bug?id=391b9498827788b3cc6830226d4ff5be87107c30 [1] Link: https://syzkaller.appspot.com/bug?id=1724d278c83ca6e6df100a2e320c10d991cf2bce [2] Link: https://lore.kernel.org/r/54a5f451-05ed-f977-8534-79e7aa2bcc8f@i-love.sakura.ne.jp Cc: linux-stable <stable@vger.kernel.org> Reported-by: syzbot <syzbot+355f8edb2ff45d5f95fa@syzkaller.appspotmail.com> Reported-by: syzbot <syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Tested-by: syzbot <syzbot+355f8edb2ff45d5f95fa@syzkaller.appspotmail.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-16can: j1939: fix Use-after-Free, hold skb ref while in useOleksij Rempel
This patch fixes a Use-after-Free found by the syzbot. The problem is that a skb is taken from the per-session skb queue, without incrementing the ref count. This leads to a Use-after-Free if the skb is taken concurrently from the session queue due to a CTS. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/r/20210521115720.7533-1-o.rempel@pengutronix.de Cc: Hillf Danton <hdanton@sina.com> Cc: linux-stable <stable@vger.kernel.org> Reported-by: syzbot+220c1a29987a9a490903@syzkaller.appspotmail.com Reported-by: syzbot+45199c1b73b4013525cf@syzkaller.appspotmail.com Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-16ethtool: add a stricter length checkJakub Kicinski
There has been a few errors in the ethtool reply size calculations, most of those are hard to trigger during basic testing because of skb size rounding up and netdev names being shorter than max. Add a more precise check. This change will affect the value of payload length displayed in case of -EMSGSIZE but that should be okay, "payload length" isn't a well defined term here. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16inet_diag: add support for tw_markMaciej Żenczykowski
Timewait sockets have included mark since approx 4.18. Cc: Eric Dumazet <edumazet@google.com> Cc: Jon Maxwell <jmaxwell37@gmail.com> Fixes: 00483690552c ("tcp: Add mark for TIMEWAIT sockets") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16xfrm: avoid compiler warning when ipv6 is disabledFlorian Westphal
with CONFIG_IPV6=n: xfrm_output.c:140:12: warning: 'xfrm6_hdr_offset' defined but not used Fixes: 9acf4d3b9ec1 ("xfrm: ipv6: add xfrm6_hdr_offset helper") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-15netlabel: Fix memory leak in netlbl_mgmt_add_commonLiu Shixin
Hulk Robot reported memory leak in netlbl_mgmt_add_common. The problem is non-freed map in case of netlbl_domhsh_add() failed. BUG: memory leak unreferenced object 0xffff888100ab7080 (size 96): comm "syz-executor537", pid 360, jiffies 4294862456 (age 22.678s) hex dump (first 32 bytes): 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................ backtrace: [<0000000008b40026>] netlbl_mgmt_add_common.isra.0+0xb2a/0x1b40 [<000000003be10950>] netlbl_mgmt_add+0x271/0x3c0 [<00000000c70487ed>] genl_family_rcv_msg_doit.isra.0+0x20e/0x320 [<000000001f2ff614>] genl_rcv_msg+0x2bf/0x4f0 [<0000000089045792>] netlink_rcv_skb+0x134/0x3d0 [<0000000020e96fdd>] genl_rcv+0x24/0x40 [<0000000042810c66>] netlink_unicast+0x4a0/0x6a0 [<000000002e1659f0>] netlink_sendmsg+0x789/0xc70 [<000000006e43415f>] sock_sendmsg+0x139/0x170 [<00000000680a73d7>] ____sys_sendmsg+0x658/0x7d0 [<0000000065cbb8af>] ___sys_sendmsg+0xf8/0x170 [<0000000019932b6c>] __sys_sendmsg+0xd3/0x190 [<00000000643ac172>] do_syscall_64+0x37/0x90 [<000000009b79d6dc>] entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 63c416887437 ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15net: inline function get_net_ns_by_fd if NET_NS is disabledChangbin Du
The function get_net_ns_by_fd() could be inlined when NET_NS is not enabled. Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15net/sched: cls_flower: Remove match on n_protoBoris Sukholitko
The following flower filters fail to match packets: tc filter add dev eth0 ingress protocol 0x8864 flower \ action simple sdata hi64 tc filter add dev eth0 ingress protocol 802.1q flower \ vlan_ethtype 0x8864 action simple sdata "hi vlan" The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is being dissected by __skb_flow_dissect and it's internal protocol is being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES tunnel is transparent to the callers of __skb_flow_dissect. OTOH, in the filters above, cls_flower configures its key->basic.n_proto to the ETH_P_PPP_SES value configured by the user. Matching on this key fails because of __skb_flow_dissect "transparency" mentioned above. In the following, I would argue that the problem lies with cls_flower, unnessary attempting key->basic.n_proto match. There are 3 close places in fl_set_key in cls_flower setting up mask->basic.n_proto. They are (in reverse order of appearance in the code) due to: (a) No vlan is given: use TCA_FLOWER_KEY_ETH_TYPE parameter (b) One vlan tag is given: use TCA_FLOWER_KEY_VLAN_ETH_TYPE (c) Two vlans are given: use TCA_FLOWER_KEY_CVLAN_ETH_TYPE The match in case (a) is unneeded because flower has no its own eth_type parameter. It was removed by Jamal Hadi Salim in commit 488b41d020fb06428b90289f70a41210718f52b7 in iproute2. For TCA_FLOWER_KEY_ETH_TYPE the userspace uses the generic tc filter protocol field. Therefore the match for the case (a) is done by tc itself. The matches in cases (b), (c) are unneeded because the protocol will appear in and will be matched by flow_dissector_key_vlan.vlan_tpid. Therefore in the best case, key->basic.n_proto will try to repeat vlan key match again. The below patch removes mask->basic.n_proto setting and resets it to 0 in case (c). Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15bpf: Support socket migration by eBPF.Kuniyuki Iwashima
This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to check if the attached eBPF program is capable of migrating sockets. When the eBPF program is attached, we run it for socket migration if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or net.ipv4.tcp_migrate_req is enabled. Currently, the expected_attach_type is not enforced for the BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the earlier idea in the commit aac3fc320d94 ("bpf: Post-hooks for sys_bind") to fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type(). Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to select a new listener based on the child socket. migrating_sk varies depending on if it is migrating a request in the accept queue or during 3WHS. - accept_queue : sock (ESTABLISHED/SYN_RECV) - 3WHS : request_sock (NEW_SYN_RECV) In the eBPF program, we can select a new listener by BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning SK_DROP. This feature is useful when listeners have different settings at the socket API level or when we want to free resources as soon as possible. - SK_PASS with selected_sk, select it as a new listener - SK_PASS with selected_sk NULL, fallbacks to the random selection - SK_DROP, cancel the migration. There is a noteworthy point. We select a listening socket in three places, but we do not have struct skb at closing a listener or retransmitting a SYN+ACK. On the other hand, some helper functions do not expect skb is NULL (e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer() in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb temporarily before running the eBPF program. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp
2021-06-15bpf: Support BPF_FUNC_get_socket_cookie() for BPF_PROG_TYPE_SK_REUSEPORT.Kuniyuki Iwashima
We will call sock_reuseport.prog for socket migration in the next commit, so the eBPF program has to know which listener is closing to select a new listener. We can currently get a unique ID of each listener in the userspace by calling bpf_map_lookup_elem() for BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map. This patch makes the pointer of sk available in sk_reuseport_md so that we can get the ID by BPF_FUNC_get_socket_cookie() in the eBPF program. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/netdev/20201119001154.kapwihc2plp4f7zc@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/bpf/20210612123224.12525-9-kuniyu@amazon.co.jp
2021-06-15tcp: Migrate TCP_NEW_SYN_RECV requests at receiving the final ACK.Kuniyuki Iwashima
This patch also changes the code to call reuseport_migrate_sock() and inet_reqsk_clone(), but unlike the other cases, we do not call inet_reqsk_clone() right after reuseport_migrate_sock(). Currently, in the receive path for TCP_NEW_SYN_RECV sockets, its listener has three kinds of refcnt: (A) for listener itself (B) carried by reuqest_sock (C) sock_hold() in tcp_v[46]_rcv() While processing the req, (A) may disappear by close(listener). Also, (B) can disappear by accept(listener) once we put the req into the accept queue. So, we have to hold another refcnt (C) for the listener to prevent use-after-free. For socket migration, we call reuseport_migrate_sock() to select a listener with (A) and to increment the new listener's refcnt in tcp_v[46]_rcv(). This refcnt corresponds to (C) and is cleaned up later in tcp_v[46]_rcv(). Thus we have to take another refcnt (B) for the newly cloned request_sock. In inet_csk_complete_hashdance(), we hold the count (B), clone the req, and try to put the new req into the accept queue. By migrating req after winning the "own_req" race, we can avoid such a worst situation: CPU 1 looks up req1 CPU 2 looks up req1, unhashes it, then CPU 1 loses the race CPU 3 looks up req2, unhashes it, then CPU 2 loses the race ... Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-8-kuniyu@amazon.co.jp
2021-06-15tcp: Migrate TCP_NEW_SYN_RECV requests at retransmitting SYN+ACKs.Kuniyuki Iwashima
As with the preceding patch, this patch changes reqsk_timer_handler() to call reuseport_migrate_sock() and inet_reqsk_clone() to migrate in-flight requests at retransmitting SYN+ACKs. If we can select a new listener and clone the request, we resume setting the SYN+ACK timer for the new req. If we can set the timer, we call inet_ehash_insert() to unhash the old req and put the new req into ehash. The noteworthy point here is that by unhashing the old req, another CPU processing it may lose the "own_req" race in tcp_v[46]_syn_recv_sock() and drop the final ACK packet. However, the new timer will recover this situation. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-7-kuniyu@amazon.co.jp
2021-06-15tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.Kuniyuki Iwashima
When we call close() or shutdown() for listening sockets, each child socket in the accept queue are freed at inet_csk_listen_stop(). If we can get a new listener by reuseport_migrate_sock() and clone the request by inet_reqsk_clone(), we try to add it into the new listener's accept queue by inet_csk_reqsk_queue_add(). If it fails, we have to call __reqsk_free() to call sock_put() for its listener and free the cloned request. After putting the full socket into ehash, tcp_v[46]_syn_recv_sock() sets NULL to ireq_opt/pktopts in struct inet_request_sock, but ipv6_opt can be non-NULL. So, we have to set NULL to ipv6_opt of the old request to avoid double free. Note that we do not update req->rsk_listener and instead clone the req to migrate because another path may reference the original request. If we protected it by RCU, we would need to add rcu_read_lock() in many places. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/bpf/20210612123224.12525-6-kuniyu@amazon.co.jp
2021-06-15tcp: Add reuseport_migrate_sock() to select a new listener.Kuniyuki Iwashima
reuseport_migrate_sock() does the same check done in reuseport_listen_stop_sock(). If the reuseport group is capable of migration, reuseport_migrate_sock() selects a new listener by the child socket hash and increments the listener's sk_refcnt beforehand. Thus, if we fail in the migration, we have to decrement it later. We will support migration by eBPF in the later commits. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-5-kuniyu@amazon.co.jp
2021-06-15tcp: Keep TCP_CLOSE sockets in the reuseport group.Kuniyuki Iwashima
When we close a listening socket, to migrate its connections to another listener in the same reuseport group, we have to handle two kinds of child sockets. One is that a listening socket has a reference to, and the other is not. The former is the TCP_ESTABLISHED/TCP_SYN_RECV sockets, and they are in the accept queue of their listening socket. So we can pop them out and push them into another listener's queue at close() or shutdown() syscalls. On the other hand, the latter, the TCP_NEW_SYN_RECV socket is during the three-way handshake and not in the accept queue. Thus, we cannot access such sockets at close() or shutdown() syscalls. Accordingly, we have to migrate immature sockets after their listening socket has been closed. Currently, if their listening socket has been closed, TCP_NEW_SYN_RECV sockets are freed at receiving the final ACK or retransmitting SYN+ACKs. At that time, if we could select a new listener from the same reuseport group, no connection would be aborted. However, we cannot do that because reuseport_detach_sock() sets NULL to sk_reuseport_cb and forbids access to the reuseport group from closed sockets. This patch allows TCP_CLOSE sockets to remain in the reuseport group and access it while any child socket references them. The point is that reuseport_detach_sock() was called twice from inet_unhash() and sk_destruct(). This patch replaces the first reuseport_detach_sock() with reuseport_stop_listen_sock(), which checks if the reuseport group is capable of migration. If capable, it decrements num_socks, moves the socket backwards in socks[] and increments num_closed_socks. When all connections are migrated, sk_destruct() calls reuseport_detach_sock() to remove the socket from socks[], decrement num_closed_socks, and set NULL to sk_reuseport_cb. By this change, closed or shutdowned sockets can keep sk_reuseport_cb. Consequently, calling listen() after shutdown() can cause EADDRINUSE or EBUSY in inet_csk_bind_conflict() or reuseport_add_sock() which expects such sockets not to have the reuseport group. Therefore, this patch also loosens such validation rules so that a socket can listen again if it has a reuseport group with num_closed_socks more than 0. When such sockets listen again, we handle them in reuseport_resurrect(). If there is an existing reuseport group (reuseport_add_sock() path), we move the socket from the old group to the new one and free the old one if necessary. If there is no existing group (reuseport_alloc() path), we allocate a new reuseport group, detach sk from the old one, and free it if necessary, not to break the current shutdown behaviour: - we cannot carry over the eBPF prog of shutdowned sockets - we cannot attach/detach an eBPF prog to/from listening sockets via shutdowned sockets Note that when the number of sockets gets over U16_MAX, we try to detach a closed socket randomly to make room for the new listening socket in reuseport_grow(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-4-kuniyu@amazon.co.jp
2021-06-15tcp: Add num_closed_socks to struct sock_reuseport.Kuniyuki Iwashima
As noted in the following commit, a closed listener has to hold the reference to the reuseport group for socket migration. This patch adds a field (num_closed_socks) to struct sock_reuseport to manage closed sockets within the same reuseport group. Moreover, this and the following commits introduce some helper functions to split socks[] into two sections and keep TCP_LISTEN and TCP_CLOSE sockets in each section. Like a double-ended queue, we will place TCP_LISTEN sockets from the front and TCP_CLOSE sockets from the end. TCP_LISTEN----------> <-------TCP_CLOSE +---+---+ --- +---+ --- +---+ --- +---+ | 0 | 1 | ... | i | ... | j | ... | k | +---+---+ --- +---+ --- +---+ --- +---+ i = num_socks - 1 j = max_socks - num_closed_socks k = max_socks - 1 This patch also extends reuseport_add_sock() and reuseport_grow() to support num_closed_socks. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-3-kuniyu@amazon.co.jp
2021-06-15net: Introduce net.ipv4.tcp_migrate_req.Kuniyuki Iwashima
This commit adds a new sysctl option: net.ipv4.tcp_migrate_req. If this option is enabled or eBPF program is attached, we will be able to migrate child sockets from a listener to another in the same reuseport group after close() or shutdown() syscalls. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-2-kuniyu@amazon.co.jp
2021-06-14Bluetooth: SMP: Fix crash when receiving new connection when debug is enabledLuiz Augusto von Dentz
When receiving a new connection pchan->conn won't be initialized so the code cannot use bt_dev_dbg as the pointer to hci_dev won't be accessible. Fixes: 2e1614f7d61e4 ("Bluetooth: SMP: Convert BT_ERR/BT_DBG to bt_dev_err/bt_dev_dbg") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-06-14net: flow_dissector: fix RPS on DSA mastersVladimir Oltean
After the blamed patch, __skb_flow_dissect() on the DSA master stopped adjusting for the length of the DSA headers. This is because it was told to adjust only if the needed_headroom is zero, aka if there is no DSA header. Of course, the adjustment should be done only if there _is_ a DSA header. Modify the comment too so it is clearer. Fixes: 4e50025129ef ("net: dsa: generalize overhead for taggers that use both headers and trailers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14net: core: devlink: add dropped stats traps fieldOleksandr Mazur
Whenever query statistics is issued for trap, devlink subsystem would also fill-in statistics 'dropped' field. This field indicates the number of packets HW dropped and failed to report to the device driver, and thus - to the devlink subsystem itself. In case if device driver didn't register callback for hard drop statistics querying, 'dropped' field will be omitted and not filled. Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14net: qrtr: fix OOB Read in qrtr_endpoint_postPavel Skripkin
Syzbot reported slab-out-of-bounds Read in qrtr_endpoint_post. The problem was in wrong _size_ type: if (len != ALIGN(size, 4) + hdrlen) goto err; If size from qrtr_hdr is 4294967293 (0xfffffffd), the result of ALIGN(size, 4) will be 0. In case of len == hdrlen and size == 4294967293 in header this check won't fail and skb_put_data(skb, data + hdrlen, size); will read out of bound from data, which is hdrlen allocated block. Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets") Reported-and-tested-by: syzbot+1917d778024161609247@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14net: dsa: dsa_slave_phy_connect(): extend phy's flags with port specific phy ↵Oleksij Rempel
flags The current get_phy_flags() is only processed when we connect to a PHY via a designed phy-handle property via phylink_of_phy_connect(), but if we fallback on the internal MDIO bus created by a switch and take the dsa_slave_phy_connect() path then we would not be processing that flag and using it at PHY connection time. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14mld: avoid unnecessary high order page allocation in mld_newpack()Taehee Yoo
If link mtu is too big, mld_newpack() allocates high-order page. But most mld packets don't need high-order page. So, it might waste unnecessary pages. To avoid this, it makes mld_newpack() try to allocate order-0 page. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14ipv6: fib6: remove redundant initialization of variable errColin Ian King
The variable err is being initialized with a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14ipv4: Fix device used for dst_alloc with local routesDavid Ahern
Oliver reported a use case where deleting a VRF device can hang waiting for the refcnt to drop to 0. The root cause is that the dst is allocated against the VRF device but cached on the loopback device. The use case (added to the selftests) has an implicit VRF crossing due to the ordering of the FIB rules (lookup local is before the l3mdev rule, but the problem occurs even if the FIB rules are re-ordered with local after l3mdev because the VRF table does not have a default route to terminate the lookup). The end result is is that the FIB lookup returns the loopback device as the nexthop, but the ingress device is in a VRF. The mismatch causes the dst alloc against the VRF device but then cached on the loopback. The fix is to bring the trick used for IPv6 (see ip6_rt_get_dev_rcu): pick the dst alloc device based the fib lookup result but with checks that the result has a nexthop device (e.g., not an unreachable or prohibit entry). Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant") Reported-by: Oliver Herms <oliver.peter.herms@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14ethtool: strset: fix message length calculationJakub Kicinski
Outer nest for ETHTOOL_A_STRSET_STRINGSETS is not accounted for. This may result in ETHTOOL_MSG_STRSET_GET producing a warning like: calculated message payload length (684) not sufficient WARNING: CPU: 0 PID: 30967 at net/ethtool/netlink.c:369 ethnl_default_doit+0x87a/0xa20 and a splat. As usually with such warnings three conditions must be met for the warning to trigger: - there must be no skb size rounding up (e.g. reply_size of 684); - string set must be per-device (so that the header gets populated); - the device name must be at least 12 characters long. all in all with current user space it looks like reading priv flags is the only place this could potentially happen. Or with syzbot :) Reported-by: syzbot+59aa77b92d06cd5a54f2@syzkaller.appspotmail.com Fixes: 71921690f974 ("ethtool: provide string sets with STRSET_GET request") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14sch_cake: revise docs for RFC 8622 LE PHB supportTyson Moore
Commit b8392808eb3fc28e ("sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling") added the LE mark to the Bulk tin. Update the comments to reflect the change. Signed-off-by: Tyson Moore <tyson@tyson.me> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14xfrm: Fix error reporting in xfrm_state_construct.Steffen Klassert
When memory allocation for XFRMA_ENCAP or XFRMA_COADDR fails, the error will not be reported because the -ENOMEM assignment to the err variable is overwritten before. Fix this by moving these two in front of the function so that memory allocation failures will be reported. Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-14Merge tag 'v5.13-rc6' into tty-nextGreg Kroah-Hartman
We want the tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-13sunrpc: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()Anna Schumaker
This seems to happen fairly easily during READ_PLUS testing on NFS v4.2. I found that we could end up accessing xdr->buf->pages[pgnr] with a pgnr greater than the number of pages in the array. So let's just return early if we're setting base to a point at the end of the page data and let xdr_set_tail_base() handle setting up the buffer pointers instead. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Fixes: 8d86e373b0ef ("SUNRPC: Clean up helpers xdr_set_iov() and xdr_set_page_base()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-12wwan: add interface creation supportJohannes Berg
Add support to create (and destroy) interfaces via a new rtnetlink kind "wwan". The responsible driver has to use the new wwan_register_ops() to make this possible. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12rtnetlink: add IFLA_PARENT_[DEV|DEV_BUS]_NAMEJohannes Berg
In some cases, for example in the upcoming WWAN framework changes, there's no natural "parent netdev", so sometimes dummy netdevs are created or similar. IFLA_PARENT_DEV_NAME is a new attribute intended to contain a device (sysfs, struct device) name that can be used instead when creating a new netdev, if the rtnetlink family implements it. As suggested by Parav Pandit, we also introduce IFLA_PARENT_DEV_BUS_NAME attribute in order to uniquely identify a device on the system (with bus/name pair). ip-link(8) support for the generic parent device attributes will help us avoid code duplication, so no other link type will require a custom code to handle the parent name attribute. E.g. the WWAN interface creation command will looks like this: $ ip link add wwan0-1 parent-dev wwan0 type wwan channel-id 1 So, some future subsystem (or driver) FOO will have an interface creation command that looks like this: $ ip link add foo1-3 parent-dev foo1 type foo bar-id 3 baz-type Y Below is an example of dumping link info of a random device with these new attributes: $ ip --details link show wlp0s20f3 4: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000 ... parent_bus pci parent_dev 0000:00:14.3 Co-developed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Co-developed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12rtnetlink: add alloc() method to rtnl_link_opsJohannes Berg
In order to make rtnetlink ops that can create different kinds of devices, like what we want to add to the WWAN framework, the priv_size and setup parameters aren't quite sufficient. Make this easier to manage by allowing ops to allocate their own netdev via an @alloc method that gets the tb netlink data. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12net: make get_net_ns return error if NET_NS is disabledChangbin Du
There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled. The reason is that nsfs tries to access ns->ops but the proc_ns_operations is not implemented in this case. [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [7.670268] pgd = 32b54000 [7.670544] [00000010] *pgd=00000000 [7.671861] Internal error: Oops: 5 [#1] SMP ARM [7.672315] Modules linked in: [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16 [7.673309] Hardware name: Generic DT based system [7.673642] PC is at nsfs_evict+0x24/0x30 [7.674486] LR is at clear_inode+0x20/0x9c The same to tun SIOCGSKNS command. To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is disabled. Meanwhile move it to right place net/core/net_namespace.c. Signed-off-by: Changbin Du <changbin.du@gmail.com> Fixes: c62cce2caee5 ("net: add an ioctl to get a socket network namespace") Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12net/af_iucv: clean up some forward declarationsJulian Wiedmann
The forward declarations for the iucv_handler callbacks are causing various compile warnings with gcc-11. Reshuffle the code to get rid of these prototypes. Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11vsock/loopback: enable SEQPACKET for transportArseny Krasnov
Add SEQPACKET ops for loopback transport and 'seqpacket_allow()' callback. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: enable SEQPACKET for transportArseny Krasnov
To make transport work with SOCK_SEQPACKET add two things: 1) SOCK_SEQPACKET ops for virtio transport and 'seqpacket_allow()' callback. 2) Handling of SEQPACKET bit: guest tries to negotiate it with vhost, so feature will be enabled only if bit is negotiated with device. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: rest of SOCK_SEQPACKET supportArseny Krasnov
Small updates to make SOCK_SEQPACKET work: 1) Send SHUTDOWN on socket close for SEQPACKET type. 2) Set SEQPACKET packet type during send. 3) Set 'VIRTIO_VSOCK_SEQ_EOR' bit in flags for last packet of message. 4) Implement data check function for SEQPACKET. 5) Check for max datagram size. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: add SEQPACKET receive logicArseny Krasnov
Update current receive logic for SEQPACKET support: performs check for packet and socket types on receive(if mismatch, then reset connection). Increment EOR counter on receive. Also if buffer of new packet was appended to buffer of last packet in rx queue, update flags of last packet with flags of new packet. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: dequeue callback for SOCK_SEQPACKETArseny Krasnov
Callback fetches RW packets from rx queue of socket until whole record is copied(if user's buffer is full, user is not woken up). This is done to not stall sender, because if we wake up user and it leaves syscall, nobody will send credit update for rest of record, and sender will wait for next enter of read syscall at receiver's side. So if user buffer is full, we just send credit update and drop data. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: simplify credit update function APIArseny Krasnov
This function is static and 'hdr' arg was always NULL. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11virtio/vsock: set packet's type in virtio_transport_send_pkt_info()Arseny Krasnov
There is no need to set type of packet which differs from type of socket, so move passing type of packet from 'info' structure to 'virtio_transport_send_pkt_info()' function. Since at current time only stream type is supported, set it directly in 'virtio_ transport_send_pkt_info()', so callers don't need to set it. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: update comments for stream socketsArseny Krasnov
Replace 'stream' to 'connection oriented' in comments as SEQPACKET is also connection oriented. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: rest of SEQPACKET supportArseny Krasnov
Add socket ops for SEQPACKET type and .seqpacket_allow() callback to query transports if they support SEQPACKET. Also split path for data check for STREAM and SEQPACKET branches. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: implement send logic for SEQPACKETArseny Krasnov
Update current stream enqueue function for SEQPACKET support: 1) Call transport's seqpacket enqueue callback. 2) Return value from enqueue function is whole record length or error for SOCK_SEQPACKET. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: implement SEQPACKET receive loopArseny Krasnov
Add receive loop for SEQPACKET. It looks like receive loop for STREAM, but there are differences: 1) It doesn't call notify callbacks. 2) It doesn't care about 'SO_SNDLOWAT' and 'SO_RCVLOWAT' values, because there is no sense for these values in SEQPACKET case. 3) It waits until whole record is received. 4) It processes and sets 'MSG_TRUNC' flag. So to avoid extra conditions for two types of socket inside one loop, two independent functions were created. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: separate receive data loopArseny Krasnov
Some code in receive data loop could be shared between SEQPACKET and STREAM sockets, while another part is type specific, so move STREAM specific data receive logic to '__vsock_stream_recvmsg()' dedicated function, while checks, that will be same for both STREAM and SEQPACKET sockets, stays in 'vsock_connectible_recvmsg()'. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: separate wait data loopArseny Krasnov
Wait loop for data could be shared between SEQPACKET and STREAM sockets, so move it to dedicated function. While moving the code around, let's update an old comment. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: update functions for connectible socketArseny Krasnov
Prepare af_vsock.c for SEQPACKET support: rename some functions such as setsockopt(), getsockopt(), connect(), recvmsg(), sendmsg() in general manner, because they are shared with stream sockets. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11net: devres: Correct a grammatical errorgushengxian
Correct a grammatical error. Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net>