summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-08-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Encap offset calculation is incorrect in esp6, from Sabrina Dubroca. 2) Better parameter validation in pfkey_dump(), from Mark Salyzyn. 3) Fix several clang issues on powerpc in selftests, from Tanner Love. 4) cmsghdr_from_user_compat_to_kern() uses the wrong length, from Al Viro. 5) Out of bounds access in mlx5e driver, from Raed Salem. 6) Fix transfer buffer memleak in lan78xx, from Johan Havold. 7) RCU fixups in rhashtable, from Herbert Xu. 8) Fix ipv6 nexthop refcnt leak, from Xiyu Yang. 9) vxlan FDB dump must be done under RCU, from Ido Schimmel. 10) Fix use after free in mlxsw, from Ido Schimmel. 11) Fix map leak in HASH_OF_MAPS bpf code, from Andrii Nakryiko. 12) Fix bug in mac80211 Tx ack status reporting, from Vasanthakumar Thiagarajan. 13) Fix memory leaks in IPV6_ADDRFORM code, from Cong Wang. 14) Fix bpf program reference count leaks in mlx5 during mlx5e_alloc_rq(), from Xin Xiong. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits) vxlan: fix memleak of fdb rds: Prevent kernel-infoleak in rds_notify_queue_get() net/sched: The error lable position is corrected in ct_init_module net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq net/mlx5e: E-Switch, Specify flow_source for rule with no in_port net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring net/mlx5e: CT: Support restore ipv6 tunnel net: gemini: Fix missing clk_disable_unprepare() in error path of gemini_ethernet_port_probe() ionic: unlock queue mutex in error path atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent net: ethernet: mtk_eth_soc: fix MTU warnings net: nixge: fix potential memory leak in nixge_probe() devlink: ignore -EOPNOTSUPP errors on dumpit rxrpc: Fix race between recvmsg and sendmsg on immediate call failure MAINTAINERS: Replace Thor Thayer as Altera Triple Speed Ethernet maintainer selftests/bpf: fix netdevsim trap_flow_action_cookie read ipv6: fix memory leaks on IPV6_ADDRFORM path net/bpfilter: Initialize pos in __bpfilter_process_sockopt igb: reinit_locked() should be called with rtnl_lock e1000e: continue to init PHY even when failed to disable ULP ...
2020-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2020-07-31 The following pull-request contains BPF updates for your *net* tree. We've added 5 non-merge commits during the last 21 day(s) which contain a total of 5 files changed, 126 insertions(+), 18 deletions(-). The main changes are: 1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko. 2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no btf_vmlinux is available, from Peilin Ye. 3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig. 4) Fix a cgroup sockopt verifier test by specifying expected attach type, from Jean-Philippe Brucker. Note that when net gets merged into net-next later on, there is a small merge conflict in kernel/bpf/btf.c between commit 5b801dfb7feb ("bpf: Fix NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree and commit 138b9a0511c7 ("bpf: Remove btf_id helpers resolving") from the net-next tree. Resolve as follows: remove the old hunk with the __btf_resolve_helper_id() function. Change the btf_resolve_helper_id() so it actually tests for a NULL btf_vmlinux and bails out: int btf_resolve_helper_id(struct bpf_verifier_log *log, const struct bpf_func_proto *fn, int arg) { int id; if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID || !btf_vmlinux) return -EINVAL; id = fn->btf_id[arg]; if (!id || id > btf_vmlinux->nr_types) return -EINVAL; return id; } Let me know if you run into any others issues (CC'ing Jiri Olsa so he's in the loop with regards to merge conflict resolution). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-07-31 1) Fix policy matching with mark and mask on userspace interfaces. From Xin Long. 2) Several fixes for the new ESP in TCP encapsulation. From Sabrina Dubroca. 3) Fix crash when the hold queue is used. The assumption that xdst->path and dst->child are not a NULL pointer only if dst->xfrm is not a NULL pointer is true with the exception of using the hold queue. Fix this by checking for hold queue usage before dereferencing xdst->path or dst->child. 4) Validate pfkey_dump parameter before sending them. From Mark Salyzyn. 5) Fix the location of the transport header with ESP in UDPv6 encapsulation. From Sabrina Dubroca. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31rds: Prevent kernel-infoleak in rds_notify_queue_get()Peilin Ye
rds_notify_queue_get() is potentially copying uninitialized kernel stack memory to userspace since the compiler may leave a 4-byte hole at the end of `cmsg`. In 2016 we tried to fix this issue by doing `= { 0 };` on `cmsg`, which unfortunately does not always initialize that 4-byte hole. Fix it by using memset() instead. Cc: stable@vger.kernel.org Fixes: f037590fff30 ("rds: fix a leak of kernel memory") Fixes: bdbe6fbc6a2f ("RDS: recv.c") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31net/sched: The error lable position is corrected in ct_init_moduleliujian
Exchange the positions of the err_tbl_init and err_register labels in ct_init_module function. Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: liujian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30Merge tag 'mac80211-for-davem-2020-07-30' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of more changes: * remove a warning that can trigger in certain races * check a function pointer before using it * check before adding 6 GHz to avoid a warning in mesh * fix two memory leaks in mesh * fix a TX status bug leading to a memory leak ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30devlink: ignore -EOPNOTSUPP errors on dumpitJakub Kicinski
Number of .dumpit functions try to ignore -EOPNOTSUPP errors. Recent change missed that, and started reporting all errors but -EMSGSIZE back from dumps. This leads to situation like this: $ devlink dev info devlink answers: Operation not supported Dump should not report an error just because the last device to be queried could not provide an answer. To fix this and avoid similar confusion make sure we clear err properly, and not leave it set to an error if we don't terminate the iteration. Fixes: c62c2cfb801b ("net: devlink: don't ignore errors during dumpit") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30rxrpc: Fix race between recvmsg and sendmsg on immediate call failureDavid Howells
There's a race between rxrpc_sendmsg setting up a call, but then failing to send anything on it due to an error, and recvmsg() seeing the call completion occur and trying to return the state to the user. An assertion fails in rxrpc_recvmsg() because the call has already been released from the socket and is about to be released again as recvmsg deals with it. (The recvmsg_q queue on the socket holds a ref, so there's no problem with use-after-free.) We also have to be careful not to end up reporting an error twice, in such a way that both returns indicate to userspace that the user ID supplied with the call is no longer in use - which could cause the client to malfunction if it recycles the user ID fast enough. Fix this by the following means: (1) When sendmsg() creates a call after the point that the call has been successfully added to the socket, don't return any errors through sendmsg(), but rather complete the call and let recvmsg() retrieve them. Make sendmsg() return 0 at this point. Further calls to sendmsg() for that call will fail with ESHUTDOWN. Note that at this point, we haven't send any packets yet, so the server doesn't yet know about the call. (2) If sendmsg() returns an error when it was expected to create a new call, it means that the user ID wasn't used. (3) Mark the call disconnected before marking it completed to prevent an oops in rxrpc_release_call(). (4) recvmsg() will then retrieve the error and set MSG_EOR to indicate that the user ID is no longer known by the kernel. An oops like the following is produced: kernel BUG at net/rxrpc/recvmsg.c:605! ... RIP: 0010:rxrpc_recvmsg+0x256/0x5ae ... Call Trace: ? __init_waitqueue_head+0x2f/0x2f ____sys_recvmsg+0x8a/0x148 ? import_iovec+0x69/0x9c ? copy_msghdr_from_user+0x5c/0x86 ___sys_recvmsg+0x72/0xaa ? __fget_files+0x22/0x57 ? __fget_light+0x46/0x51 ? fdget+0x9/0x1b do_recvmmsg+0x15e/0x232 ? _raw_spin_unlock+0xa/0xb ? vtime_delta+0xf/0x25 __x64_sys_recvmmsg+0x2c/0x2f do_syscall_64+0x4c/0x78 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 357f5ef64628 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()") Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30ipv6: fix memory leaks on IPV6_ADDRFORM pathCong Wang
IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket to IPv4, particularly struct ipv6_ac_socklist. Similar to struct ipv6_mc_socklist, we should just close it on this path. This bug can be easily reproduced with the following C program: #include <stdio.h> #include <string.h> #include <sys/types.h> #include <sys/socket.h> #include <arpa/inet.h> int main() { int s, value; struct sockaddr_in6 addr; struct ipv6_mreq m6; s = socket(AF_INET6, SOCK_DGRAM, 0); addr.sin6_family = AF_INET6; addr.sin6_port = htons(5000); inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr); connect(s, (struct sockaddr *)&addr, sizeof(addr)); inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr); m6.ipv6mr_interface = 5; setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6)); value = AF_INET; setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value)); close(s); return 0; } Reported-by: ch3332xr@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31net/bpfilter: Initialize pos in __bpfilter_process_sockoptChristoph Hellwig
__bpfilter_process_sockopt never initialized the pos variable passed to the pipe write. This has been mostly harmless in the past as pipes ignore the offset, but the switch to kernel_write now verified the position, which can lead to a failure depending on the exact stack initialization pattern. Initialize the variable to zero to make rw_verify_area happy. Fixes: 6955a76fbcd5 ("bpfilter: switch to kernel_write") Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Reported-by: Rodrigo Madera <rodrigo.madera@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Rodrigo Madera <rodrigo.madera@gmail.com> Tested-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/bpf/20200730160900.187157-1-hch@lst.de
2020-07-30Bluetooth: fix kernel oops in store_pending_adv_reportAlain Michaud
Fix kernel oops observed when an ext adv data is larger than 31 bytes. This can be reproduced by setting up an advertiser with advertisement larger than 31 bytes. The issue is not sensitive to the advertisement content. In particular, this was reproduced with an advertisement of 229 bytes filled with 'A'. See stack trace below. This is fixed by not catching ext_adv as legacy adv are only cached to be able to concatenate a scanable adv with its scan response before sending it up through mgmt. With ext_adv, this is no longer necessary. general protection fault: 0000 [#1] SMP PTI CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic #41-Ubuntu Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020 Workqueue: hci0 hci_rx_work [bluetooth] RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth] Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75 RSP: 0018:ffffbc6a40493c70 EFLAGS: 00010286 RAX: 4141414141414141 RBX: 000000000000001b RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9903e76c100f RDI: ffff9904289d4b28 RBP: ffffbc6a40493c70 R08: 0000000093570362 R09: 0000000000000000 R10: 0000000000000000 R11: ffff9904344eae38 R12: ffff9904289d4000 R13: 0000000000000000 R14: 00000000ffffffa3 R15: ffff9903e76c100f FS: 0000000000000000(0000) GS:ffff990434580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007feed125a000 CR3: 00000001b860a003 CR4: 00000000003606e0 Call Trace: process_adv_report+0x12e/0x560 [bluetooth] hci_le_meta_evt+0x7b2/0xba0 [bluetooth] hci_event_packet+0x1c29/0x2a90 [bluetooth] hci_rx_work+0x19b/0x360 [bluetooth] process_one_work+0x1eb/0x3b0 worker_thread+0x4d/0x400 kthread+0x104/0x140 Fixes: c215e9397b00 ("Bluetooth: Process extended ADV report event") Reported-by: Andy Nguyen <theflow@google.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Balakrishna Godavarthi <bgodavar@codeaurora.org> Signed-off-by: Alain Michaud <alainm@chromium.org> Tested-by: Sonny Sasaka <sonnysasaka@chromium.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-30mac80211: remove STA txq pending airtime underflow warningFelix Fietkau
This warning can trigger if there is a mismatch between frames that were sent with the sta pointer set vs tx status frames reported for the sta address. This can happen due to race conditions on re-creating stations, or even in the case of .sta_add/remove being used instead of .sta_state, which can cause frames to be sent to a station that has not been uploaded yet. If there is an actual underflow issue, it should show up in the device airtime warning below, so it is better to remove this one. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200725084533.13829-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-30mac80211: Fix bug in Tx ack status reporting in 802.3 xmit pathVasanthakumar Thiagarajan
Allocated ack_frame id from local->ack_status_frames is not really stored in the tx_info for 802.3 Tx path. Due to this, tx ack status is not reported and ack_frame id is not freed for the buffers requiring tx ack status. Also move the memset to 0 of tx_info before IEEE80211_TX_CTL_REQ_TX_STATUS flag assignment. Fixes: 50ff477a8639 ("mac80211: add 802.11 encapsulation offloading support") Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@codeaurora.org> Link: https://lore.kernel.org/r/1595427617-1713-1-git-send-email-vthiagar@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-30cfg80211: check vendor command doit pointer before useJulian Squires
In the case where a vendor command does not implement doit, and has no flags set, doit would not be validated and a NULL pointer dereference would occur, for example when invoking the vendor command via iw. I encountered this while developing new vendor commands. Perhaps in practice it is advisable to always implement doit along with dumpit, but it seems reasonable to me to always check doit anyway, not just when NEED_WDEV. Signed-off-by: Julian Squires <julian@cipht.net> Link: https://lore.kernel.org/r/20200706211353.2366470-1-julian@cipht.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-30mac80211: mesh: Free pending skb when destroying a mpathRemi Pommarel
A mpath object can hold reference on a list of skb that are waiting for mpath resolution to be sent. When destroying a mpath this skb list should be cleaned up in order to not leak memory. Fixing that kind of leak: unreferenced object 0xffff0000181c9300 (size 1088): comm "openvpn", pid 1782, jiffies 4295071698 (age 80.416s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 f9 80 36 00 00 00 00 00 ..........6..... 02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<000000004bc6a443>] kmem_cache_alloc+0x1a4/0x2f0 [<000000002caaef13>] sk_prot_alloc.isra.39+0x34/0x178 [<00000000ceeaa916>] sk_alloc+0x34/0x228 [<00000000ca1f1d04>] inet_create+0x198/0x518 [<0000000035626b1c>] __sock_create+0x134/0x328 [<00000000a12b3a87>] __sys_socket+0xb0/0x158 [<00000000ff859f23>] __arm64_sys_socket+0x40/0x58 [<00000000263486ec>] el0_svc_handler+0xd0/0x1a0 [<0000000005b5157d>] el0_svc+0x8/0xc unreferenced object 0xffff000012973a40 (size 216): comm "openvpn", pid 1782, jiffies 4295082137 (age 38.660s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 c0 06 16 00 00 ff ff 00 93 1c 18 00 00 ff ff ................ backtrace: [<000000004bc6a443>] kmem_cache_alloc+0x1a4/0x2f0 [<0000000023c8c8f9>] __alloc_skb+0xc0/0x2b8 [<000000007ad950bb>] alloc_skb_with_frags+0x60/0x320 [<00000000ef90023a>] sock_alloc_send_pskb+0x388/0x3c0 [<00000000104fb1a3>] sock_alloc_send_skb+0x1c/0x28 [<000000006919d2dd>] __ip_append_data+0xba4/0x11f0 [<0000000083477587>] ip_make_skb+0x14c/0x1a8 [<0000000024f3d592>] udp_sendmsg+0xaf0/0xcf0 [<000000005aabe255>] inet_sendmsg+0x5c/0x80 [<000000008651ea08>] __sys_sendto+0x15c/0x218 [<000000003505c99b>] __arm64_sys_sendto+0x74/0x90 [<00000000263486ec>] el0_svc_handler+0xd0/0x1a0 [<0000000005b5157d>] el0_svc+0x8/0xc Fixes: 2bdaf386f99c (mac80211: mesh: move path tables into if_mesh) Signed-off-by: Remi Pommarel <repk@triplefau.lt> Link: https://lore.kernel.org/r/20200704135419.27703-1-repk@triplefau.lt Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-30mac80211: mesh: Free ie data when leaving meshRemi Pommarel
At ieee80211_join_mesh() some ie data could have been allocated (see copy_mesh_setup()) and need to be cleaned up when leaving the mesh. This fixes the following kmemleak report: unreferenced object 0xffff0000116bc600 (size 128): comm "wpa_supplicant", pid 608, jiffies 4294898983 (age 293.484s) hex dump (first 32 bytes): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 0............... 00 0f ac 08 00 00 00 00 c4 65 40 00 00 00 00 00 .........e@..... backtrace: [<00000000bebe439d>] __kmalloc_track_caller+0x1c0/0x330 [<00000000a349dbe1>] kmemdup+0x28/0x50 [<0000000075d69baa>] ieee80211_join_mesh+0x6c/0x3b8 [mac80211] [<00000000683bb98b>] __cfg80211_join_mesh+0x1e8/0x4f0 [cfg80211] [<0000000072cb507f>] nl80211_join_mesh+0x520/0x6b8 [cfg80211] [<0000000077e9bcf9>] genl_family_rcv_msg+0x374/0x680 [<00000000b1bd936d>] genl_rcv_msg+0x78/0x108 [<0000000022c53788>] netlink_rcv_skb+0xb0/0x1c0 [<0000000011af8ec9>] genl_rcv+0x34/0x48 [<0000000069e41f53>] netlink_unicast+0x268/0x2e8 [<00000000a7517316>] netlink_sendmsg+0x320/0x4c0 [<0000000069cba205>] ____sys_sendmsg+0x354/0x3a0 [<00000000e06bab0f>] ___sys_sendmsg+0xd8/0x120 [<0000000037340728>] __sys_sendmsg+0xa4/0xf8 [<000000004fed9776>] __arm64_sys_sendmsg+0x44/0x58 [<000000001c1e5647>] el0_svc_handler+0xd0/0x1a0 Fixes: c80d545da3f7 (mac80211: Let userspace enable and configure vendor specific path selection.) Signed-off-by: Remi Pommarel <repk@triplefau.lt> Link: https://lore.kernel.org/r/20200704135007.27292-1-repk@triplefau.lt Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-30mac80211: fix warning in 6 GHz IE addition in mesh modeRajkumar Manoharan
The commit 24a2042cb22f ("mac80211: add HE 6 GHz Band Capability element") failed to check device capability before adding HE 6 GHz capability element. Below warning is reported in 11ac device in mesh. Fix that by checking device capability at HE 6 GHz cap IE addition in mesh beacon and association request. WARNING: CPU: 1 PID: 1897 at net/mac80211/util.c:2878 ieee80211_ie_build_he_6ghz_cap+0x149/0x150 [mac80211] [ 3138.720358] Call Trace: [ 3138.720361] ieee80211_mesh_build_beacon+0x462/0x530 [mac80211] [ 3138.720363] ieee80211_start_mesh+0xa8/0xf0 [mac80211] [ 3138.720365] __cfg80211_join_mesh+0x122/0x3e0 [cfg80211] [ 3138.720368] nl80211_join_mesh+0x3d3/0x510 [cfg80211] Fixes: 24a2042cb22f ("mac80211: add HE 6 GHz Band Capability element") Reported-by: Markus Theil <markus.theil@tu-ilmenau.de> Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Link: https://lore.kernel.org/r/1593656424-18240-1-git-send-email-rmanohar@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-30espintcp: count packets dropped in espintcp_rcvSabrina Dubroca
Currently, espintcp_rcv drops packets silently, which makes debugging issues difficult. Count packets as either XfrmInHdrError (when the packet was too short or contained invalid data) or XfrmInError (for other issues). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-07-30espintcp: handle short messages instead of breaking the encap socketSabrina Dubroca
Currently, short messages (less than 4 bytes after the length header) will break the stream of messages. This is unnecessary, since we can still parse messages even if they're too short to contain any usable data. This is also bogus, as keepalive messages (a single 0xff byte), though not needed with TCP encapsulation, should be allowed. This patch changes the stream parser so that short messages are accepted and dropped in the kernel. Messages that contain a valid SPI or non-ESP header are processed as before. Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Reported-by: Andrew Cagney <cagney@libreswan.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-07-29Merge tag '9p-for-5.8-2' of git://github.com/martinetd/linux into masterLinus Torvalds
Pull 9p fixes from Dominique Martinet: "A couple of syzcaller fixes for 5.8 The first one in particular has been quite noisy ("broke" in -rc5) so this would be worth landing even this late even if users likely won't see a difference" * tag '9p-for-5.8-2' of git://github.com/martinetd/linux: 9p/trans_fd: Fix concurrency del of req_list in p9_fd_cancelled/p9_read_work net/9p: validate fds in p9_fd_open
2020-07-29mlxsw: spectrum: Use different trap group for externally routed packetsIdo Schimmel
Cited commit mistakenly removed the trap group for externally routed packets (e.g., via the management interface) and grouped locally routed and externally routed packet traps under the same group, thereby subjecting them to the same policer. This can result in problems, for example, when FRR is restarted and suddenly all transient traffic is trapped to the CPU because of a default route through the management interface. Locally routed packets required to re-establish a BGP connection will never reach the CPU and the routing tables will not be re-populated. Fix this by using a different trap group for externally routed packets. Fixes: 8110668ecd9a ("mlxsw: spectrum_trap: Register layer 3 control traps") Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-29ipv4: Silence suspicious RCU usage warningIdo Schimmel
fib_trie_unmerge() is called with RTNL held, but not from an RCU read-side critical section. This leads to the following warning [1] when the FIB alias list in a leaf is traversed with hlist_for_each_entry_rcu(). Since the function is always called with RTNL held and since modification of the list is protected by RTNL, simply use hlist_for_each_entry() and silence the warning. [1] WARNING: suspicious RCU usage 5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Not tainted ----------------------------- net/ipv4/fib_trie.c:1867 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/164: #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0 stack backtrace: CPU: 0 PID: 164 Comm: ip Not tainted 5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x153/0x15d fib_trie_unmerge+0x608/0xdb0 fib_unmerge+0x44/0x360 fib4_rule_configure+0xc8/0xad0 fib_nl_newrule+0x37a/0x1dd0 rtnetlink_rcv_msg+0x4f7/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 ____sys_sendmsg+0x879/0xa00 ___sys_sendmsg+0x122/0x190 __sys_sendmsg+0x103/0x1d0 __x64_sys_sendmsg+0x7d/0xb0 do_syscall_64+0x54/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc80a234e97 Code: Bad RIP value. RSP: 002b:00007ffef8b66798 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc80a234e97 RDX: 0000000000000000 RSI: 00007ffef8b66800 RDI: 0000000000000003 RBP: 000000005f141b1c R08: 0000000000000001 R09: 0000000000000000 R10: 00007fc80a2a8ac0 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 00007ffef8b67008 R15: 0000556fccb10020 Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28ipv6: Fix nexthop refcnt leak when creating ipv6 route infoXiyu Yang
ip6_route_info_create() invokes nexthop_get(), which increases the refcount of the "nh". When ip6_route_info_create() returns, local variable "nh" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of ip6_route_info_create(). When nexthops can not be used with source routing, the function forgets to decrease the refcnt increased by nexthop_get(), causing a refcnt leak. Fix this issue by pulling up the error source routing handling when nexthops can not be used with source routing. Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28xfrm: esp6: fix the location of the transport header with encapsulationSabrina Dubroca
commit 17175d1a27c6 ("xfrm: esp6: fix encapsulation header offset computation") changed esp6_input_done2 to correctly find the size of the IPv6 header that precedes the TCP/UDP encapsulation header, but didn't adjust the final call to skb_set_transport_header, which I assumed was correct in using skb_network_header_len. Xiumei Mu reported that when we create xfrm states that include port numbers in the selector, traffic from the user sockets is dropped. It turns out that we get a state mismatch in __xfrm_policy_check, because we end up trying to compare the encapsulation header's ports with the selector that's based on user traffic ports. Fixes: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP") Fixes: 26333c37fc28 ("xfrm: add IPv6 support for espintcp") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-07-27fix a braino in cmsghdr_from_user_compat_to_kern()Al Viro
commit 547ce4cfb34c ("switch cmsghdr_from_user_compat_to_kern() to copy_from_user()") missed one of the places where ucmlen should've been replaced with cmsg.cmsg_len, now that we are fetching the entire struct rather than doing it field-by-field. As the result, compat sendmsg() with several different-sized cmsg attached started to fail with EINVAL. Trivial to fix, fortunately. Fixes: 547ce4cfb34c ("switch cmsghdr_from_user_compat_to_kern() to copy_from_user()") Reported-by: Nick Bowler <nbowler@draconx.ca> Tested-by: Nick Bowler <nbowler@draconx.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27mptcp: fix joined subflows with unblocking skMatthieu Baerts
Unblocking sockets used for outgoing connections were not containing inet info about the initial connection due to a typo there: the value of "err" variable is negative in the kernelspace. This fixes the creation of additional subflows where the remote port has to be reused if the other host didn't announce another one. This also fixes inet_diag showing blank info about MPTCP sockets from unblocking sockets doing a connect(). Fixes: 41be81a8d3d0 ("mptcp: fix unblocking connect()") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net into masterLinus Torvalds
Pull networking fixes from David Miller: 1) Fix RCU locaking in iwlwifi, from Johannes Berg. 2) mt76 can access uninitialized NAPI struct, from Felix Fietkau. 3) Fix race in updating pause settings in bnxt_en, from Vasundhara Volam. 4) Propagate error return properly during unbind failures in ax88172a, from George Kennedy. 5) Fix memleak in adf7242_probe, from Liu Jian. 6) smc_drv_probe() can leak, from Wang Hai. 7) Don't muck with the carrier state if register_netdevice() fails in the bonding driver, from Taehee Yoo. 8) Fix memleak in dpaa_eth_probe, from Liu Jian. 9) Need to check skb_put_padto() return value in hsr_fill_tag(), from Murali Karicheri. 10) Don't lose ionic RSS hash settings across FW update, from Shannon Nelson. 11) Fix clobbered SKB control block in act_ct, from Wen Xu. 12) Missing newlink in "tx_timeout" sysfs output, from Xiongfeng Wang. 13) IS_UDPLITE cleanup a long time ago, incorrectly handled transformations involving UDPLITE_RECV_CC. From Miaohe Lin. 14) Unbalanced locking in netdevsim, from Taehee Yoo. 15) Suppress false-positive error messages in qed driver, from Alexander Lobakin. 16) Out of bounds read in ax25_connect and ax25_sendmsg, from Peilin Ye. 17) Missing SKB release in cxgb4's uld_send(), from Navid Emamdoost. 18) Uninitialized value in geneve_changelink(), from Cong Wang. 19) Fix deadlock in xen-netfront, from Andera Righi. 19) flush_backlog() frees skbs with IRQs disabled, so should use dev_kfree_skb_irq() instead of kfree_skb(). From Subash Abhinov Kasiviswanathan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) drivers/net/wan: lapb: Corrected the usage of skb_cow dev: Defer free of skbs in flush_backlog qrtr: orphan socket in qrtr_release() xen-netfront: fix potential deadlock in xennet_remove() flow_offload: Move rhashtable inclusion to the source file geneve: fix an uninitialized value in geneve_changelink() bonding: check return value of register_netdevice() in bond_newlink() tcp: allow at most one TLP probe per flight AX.25: Prevent integer overflows in connect and sendmsg cxgb4: add missing release on skb in uld_send() net: atlantic: fix PTP on AQC10X AX.25: Prevent out-of-bounds read in ax25_sendmsg() sctp: shrink stream outq when fails to do addstream reconf sctp: shrink stream outq only when new outcnt < old outcnt AX.25: Fix out-of-bounds read in ax25_connect() enetc: Remove the mdio bus on PF probe bailout net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload net: ethernet: ave: Fix error returns in ave_init drivers/net/wan/x25_asy: Fix to make it work ipvs: fix the connection sync failed in some cases ...
2020-07-24dev: Defer free of skbs in flush_backlogSubash Abhinov Kasiviswanathan
IRQs are disabled when freeing skbs in input queue. Use the IRQ safe variant to free skbs here. Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24qrtr: orphan socket in qrtr_release()Cong Wang
We have to detach sock from socket in qrtr_release(), otherwise skb->sk may still reference to this socket when the skb is released in tun->queue, particularly sk->sk_wq still points to &sock->wq, which leads to a UAF. Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com Fixes: 28fb4e59a47d ("net: qrtr: Expose tunneling endpoint to user space") Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24flow_offload: Move rhashtable inclusion to the source fileHerbert Xu
I noticed that touching linux/rhashtable.h causes lib/vsprintf.c to be rebuilt. This dependency came through a bogus inclusion in the file net/flow_offload.h. This patch moves it to the right place. This patch also removes a lingering rhashtable inclusion in cls_api created by the same commit. Fixes: 4e481908c51b ("flow_offload: move tc indirect block to...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Fix NAT hook deletion when table is dormant, from Florian Westphal. 2) Fix IPVS sync stalls, from guodeqing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23tcp: allow at most one TLP probe per flightYuchung Cheng
Previously TLP may send multiple probes of new data in one flight. This happens when the sender is cwnd limited. After the initial TLP containing new data is sent, the sender receives another ACK that acks partial inflight. It may re-arm another TLP timer to send more, if no further ACK returns before the next TLP timeout (PTO) expires. The sender may send in theory a large amount of TLP until send queue is depleted. This only happens if the sender sees such irregular uncommon ACK pattern. But it is generally undesirable behavior during congestion especially. The original TLP design restrict only one TLP probe per inflight as published in "Reducing Web Latency: the Virtue of Gentle Aggression", SIGCOMM 2013. This patch changes TLP to send at most one probe per inflight. Note that if the sender is app-limited, TLP retransmits old data and did not have this issue. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23AX.25: Prevent integer overflows in connect and sendmsgDan Carpenter
We recently added some bounds checking in ax25_connect() and ax25_sendmsg() and we so we removed the AX25_MAX_DIGIS checks because they were no longer required. Unfortunately, I believe they are required to prevent integer overflows so I have added them back. Fixes: 8885bb0621f0 ("AX.25: Prevent out-of-bounds read in ax25_sendmsg()") Fixes: 2f2a7ffad5c6 ("AX.25: Fix out-of-bounds read in ax25_connect()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22AX.25: Prevent out-of-bounds read in ax25_sendmsg()Peilin Ye
Checks on `addr_len` and `usax->sax25_ndigis` are insufficient. ax25_sendmsg() can go out of bounds when `usax->sax25_ndigis` equals to 7 or 8. Fix it. It is safe to remove `usax->sax25_ndigis > AX25_MAX_DIGIS`, since `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)` Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22sctp: shrink stream outq when fails to do addstream reconfXin Long
When adding a stream with stream reconf, the new stream firstly is in CLOSED state but new out chunks can still be enqueued. Then once gets the confirmation from the peer, the state will change to OPEN. However, if the peer denies, it needs to roll back the stream. But when doing that, it only sets the stream outcnt back, and the chunks already in the new stream don't get purged. It caused these chunks can still be dequeued in sctp_outq_dequeue_data(). As its stream is still in CLOSE, the chunk will be enqueued to the head again by sctp_outq_head_data(). This chunk will never be sent out, and the chunks after it can never be dequeued. The assoc will be 'hung' in a dead loop of sending this chunk. To fix it, this patch is to purge these chunks already in the new stream by calling sctp_stream_shrink_out() when failing to do the addstream reconf. Fixes: 11ae76e67a17 ("sctp: implement receiver-side procedures for the Reconf Response Parameter") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22sctp: shrink stream outq only when new outcnt < old outcntXin Long
It's not necessary to go list_for_each for outq->out_chunk_list when new outcnt >= old outcnt, as no chunk with higher sid than new (outcnt - 1) exists in the outqueue. While at it, also move the list_for_each code in a new function sctp_stream_shrink_out(), which will be used in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22AX.25: Fix out-of-bounds read in ax25_connect()Peilin Ye
Checks on `addr_len` and `fsa->fsa_ax25.sax25_ndigis` are insufficient. ax25_connect() can go out of bounds when `fsa->fsa_ax25.sax25_ndigis` equals to 7 or 8. Fix it. This issue has been reported as a KMSAN uninit-value bug, because in such a case, ax25_connect() reaches into the uninitialized portion of the `struct sockaddr_storage` statically allocated in __sys_connect(). It is safe to remove `fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS` because `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)`. Reported-by: syzbot+c82752228ed975b0a623@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=55ef9d629f3b3d7d70b69558015b63b48d01af66 Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22af_key: pfkey_dump needs parameter validationMark Salyzyn
In pfkey_dump() dplen and splen can both be specified to access the xfrm_address_t structure out of bounds in__xfrm_state_filter_match() when it calls addr_match() with the indexes. Return EINVAL if either are out of range. Signed-off-by: Mark Salyzyn <salyzyn@android.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kernel-team@android.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-07-22ipvs: fix the connection sync failed in some casesguodeqing
The sync_thread_backup only checks sk_receive_queue is empty or not, there is a situation which cannot sync the connection entries when sk_receive_queue is empty and sk_rmem_alloc is larger than sk_rcvbuf, the sync packets are dropped in __udp_enqueue_schedule_skb, this is because the packets in reader_queue is not read, so the rmem is not reclaimed. Here I add the check of whether the reader_queue of the udp sock is empty or not to solve this problem. Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Reported-by: zhouxudong <zhouxudong8@huawei.com> Signed-off-by: guodeqing <geffrey.guo@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-21net: udp: Fix wrong clean up for IS_UDPLITE macroMiaohe Lin
We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is checked. Fixes: b2bf1e2659b1 ("[UDP]: Clean up for IS_UDPLITE macro") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21net-sysfs: add a newline when printing 'tx_timeout' by sysfsXiongfeng Wang
When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to add a newline for easy reading. root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout 0root@syzkaller:~# Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21udp: Improve load balancing for SO_REUSEPORT.Kuniyuki Iwashima
Currently, SO_REUSEPORT does not work well if connected sockets are in a UDP reuseport group. Then reuseport_has_conns() returns true and the result of reuseport_select_sock() is discarded. Also, unconnected sockets have the same score, hence only does the first unconnected socket in udp_hslot always receive all packets sent to unconnected sockets. So, the result of reuseport_select_sock() should be used for load balancing. The noteworthy point is that the unconnected sockets placed after connected sockets in sock_reuseport.socks will receive more packets than others because of the algorithm in reuseport_select_sock(). index | connected | reciprocal_scale | result --------------------------------------------- 0 | no | 20% | 40% 1 | no | 20% | 20% 2 | yes | 20% | 0% 3 | no | 20% | 40% 4 | yes | 20% | 0% If most of the sockets are connected, this can be a problem, but it still works better than now. Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21udp: Copy has_conns in reuseport_grow().Kuniyuki Iwashima
If an unconnected socket in a UDP reuseport group connect()s, has_conns is set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all sockets in udp_hslot looking for the connected socket with the highest score. However, when the number of sockets bound to the port exceeds max_socks, reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2() to return without scanning all sockets, resulting in that packets sent to connected sockets may be distributed to unconnected sockets. Therefore, reuseport_grow() should copy has_conns. Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20tipc: allow to build NACK message in link timeout functionTung Nguyen
Commit 02288248b051 ("tipc: eliminate gap indicator from ACK messages") eliminated sending of the 'gap' indicator in regular ACK messages and only allowed to build NACK message with enabled probe/probe_reply. However, necessary correction for building NACK message was missed in tipc_link_timeout() function. This leads to significant delay and link reset (due to retransmission failure) in lossy environment. This commit fixes it by setting the 'probe' flag to 'true' when the receive deferred queue is not empty. As a result, NACK message will be built to send back to another peer. Fixes: 02288248b051 ("tipc: eliminate gap indicator from ACK messages") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20net/sched: act_ct: fix restore the qdisc_skb_cb after defragwenxu
The fragment packets do defrag in tcf_ct_handle_fragments will clear the skb->cb which make the qdisc_skb_cb clear too. So the qdsic_skb_cb should be store before defrag and restore after that. It also update the pkt_len after all the fragments finish the defrag to one packet and make the following actions counter correct. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20net: hsr: check for return value of skb_put_padto()Murali Karicheri
skb_put_padto() can fail. So check for return type and return NULL for skb. Caller checks for skb and acts correctly if it is NULL. Fixes: 6d6148bc78d2 ("net: hsr: fix incorrect lsdu size in the tag of HSR frames for small frames") Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20net/smc: fix dmb buffer shortageKarsten Graul
There is a current limit of 1920 registered dmb buffers per ISM device for smc-d. One link group can contain 255 connections, each connection is using one dmb buffer. When the connection is closed then the registered buffer is held in a queue and is reused by the next connection. When a link group is 'full' then another link group is created and uses an own buffer pool. The link groups are added to a list using list_add() which puts a new link group to the first position in the list. In the situation that many connections are opened (>1920) and a few of them stay open while others are closed quickly we end up with at least 8 link groups. For a new connection a matching link group is looked up, iterating over the list of link groups. The trailing 7 link groups all have registered dmb buffers which could be reused, while the first link group has only a few dmb buffers and then hit the 1920 limit. Because the first link group is not full (255 connection limit not reached) it is chosen and finally the connection falls back to TCP because there is no dmb buffer available in this link group. There are multiple ways to fix that: using list_add_tail() allows to scan older link groups first for free buffers which ensures that buffers are reused first. This fixes the problem for smc-r link groups as well. For smc-d there is an even better way to address this problem because smc-d does not have the 255 connections per link group limit. So fix the problem for smc-d by allowing large link groups. Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20net/smc: put slot when connection is killedKarsten Graul
To get a send slot smc_wr_tx_get_free_slot() is called, which might wait for a free slot. When smc_wr_tx_get_free_slot() returns there is a check if the connection was killed in the meantime. In that case don't only return an error, but also put back the free slot. Fixes: b290098092e4 ("net/smc: cancel send and receive for terminated socket") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATADavid Howells
rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as if rxrpc_recvmsg() indicating ENODATA if there's nothing for it to read. Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read as this particular error doesn't get stored in ->sk_err by the networking core. Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive errors (there's no way for it to report which call, if any, the error was caused by). Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net/smc: fix restoring of fallback changesKarsten Graul
When a listen socket is closed then all non-accepted sockets in its accept queue are to be released. Inside __smc_release() the helper smc_restore_fallback_changes() restores the changes done to the socket without to check if the clcsocket has a file set. This can result in a crash. Fix this by checking the file pointer first. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: f536dffc0b79 ("net/smc: fix closing of fallback SMC sockets") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>