summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-04-30mptcp: avoid a WARN on bad input.Paolo Abeni
Syzcaller has found a way to trigger the WARN_ON_ONCE condition in check_fully_established(). The root cause is a legit fallback to TCP scenario, so replace the WARN with a plain message on a more strict condition. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: move option parsing into mptcp_incoming_options()Paolo Abeni
The mptcp_options_received structure carries several per packet flags (mp_capable, mp_join, etc.). Such fields must be cleared on each packet, even on dropped ones or packet not carrying any MPTCP options, but the current mptcp code clears them only on TCP option reset. On several races/corner cases we end-up with stray bits in incoming options, leading to WARN_ON splats. e.g.: [ 171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713 [ 171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531) [ 171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata [ 171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95 [ 171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531) [ 171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c [ 171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282 [ 171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e [ 171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955 [ 171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca [ 171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020 [ 171.228460] FS: 00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000 [ 171.230065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0 [ 171.232586] Call Trace: [ 171.233109] <IRQ> [ 171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691) [ 171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832) [ 171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1)) [ 171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217) [ 171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822) [ 171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730) [ 171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009) [ 171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1)) [ 171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232) [ 171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252) [ 171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539) [ 171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135) [ 171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640) [ 171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293) [ 171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083) [ 171.282358] </IRQ> We could address the issue clearing explicitly the relevant fields in several places - tcp_parse_option, tcp_fast_parse_options, possibly others. Instead we move the MPTCP option parsing into the already existing mptcp ingress hook, so that we need to clear the fields in a single place. This allows us dropping an MPTCP hook from the TCP code and removing the quite large mptcp_options_received from the tcp_sock struct. On the flip side, the MPTCP sockets will traverse the option space twice (in tcp_parse_option() and in mptcp_incoming_options(). That looks acceptable: we already do that for syn and 3rd ack packets, plain TCP socket will benefit from it, and even MPTCP sockets will experience better code locality, reducing the jumps between TCP and MPTCP code. v1 -> v2: - rebased on current '-net' tree Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: consolidate synack processing.Paolo Abeni
Currently the MPTCP code uses 2 hooks to process syn-ack packets, mptcp_rcv_synsent() and the sk_rx_dst_set() callback. We can drop the first, moving the relevant code into the latter, reducing the hooking into the TCP code. This is also needed by the next patch. v1 -> v2: - use local tcp sock ptr instead of casting the sk variable several times - DaveM Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29mptcp: replace mptcp_disconnect with a stubFlorian Westphal
Paolo points out that mptcp_disconnect is bogus: "lock_sock(sk); looks suspicious (lock should be already held by the caller) And call to: tcp_disconnect(sk, flags); too, sk is not a tcp socket". ->disconnect() gets called from e.g. inet_stream_connect when one tries to disassociate a connected socket again (to re-connect without closing the socket first). MPTCP however uses mptcp_stream_connect, not inet_stream_connect, for the mptcp-socket connect call. inet_stream_connect only gets called indirectly, for the tcp socket, so any ->disconnect() calls end up calling tcp_disconnect for that tcp subflow sk. This also explains why syzkaller has not yet reported a problem here. So for now replace this with a stub that doesn't do anything. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/14 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29netfilter: nf_osf: avoid passing pointer to local varArnd Bergmann
gcc-10 points out that a code path exists where a pointer to a stack variable may be passed back to the caller: net/netfilter/nfnetlink_osf.c: In function 'nf_osf_hdr_ctx_init': cc1: warning: function may return address of local variable [-Wreturn-local-addr] net/netfilter/nfnetlink_osf.c:171:16: note: declared here 171 | struct tcphdr _tcph; | ^~~~~ I am not sure whether this can happen in practice, but moving the variable declaration into the callers avoids the problem. Fixes: 31a9c29210e2 ("netfilter: nf_osf: add struct nf_osf_hdr_ctx") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-28net/x25: Fix null-ptr-deref in x25_disconnectYueHaibing
We should check null before do x25_neigh_put in x25_disconnect, otherwise may cause null-ptr-deref like this: #include <sys/socket.h> #include <linux/x25.h> int main() { int sck_x25; sck_x25 = socket(AF_X25, SOCK_SEQPACKET, 0); close(sck_x25); return 0; } BUG: kernel NULL pointer dereference, address: 00000000000000d8 CPU: 0 PID: 4817 Comm: t2 Not tainted 5.7.0-rc3+ #159 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3- RIP: 0010:x25_disconnect+0x91/0xe0 Call Trace: x25_release+0x18a/0x1b0 __sock_release+0x3d/0xc0 sock_close+0x13/0x20 __fput+0x107/0x270 ____fput+0x9/0x10 task_work_run+0x6d/0xb0 exit_to_usermode_loop+0x102/0x110 do_syscall_64+0x23c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Reported-by: syzbot+6db548b615e5aeefdce2@syzkaller.appspotmail.com Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28Merge tag 'nfs-rdma-for-5.7-2' of ↵Trond Myklebust
git://git.linux-nfs.org/projects/anna/linux-nfs NFSoRDMA Client Fixes for Linux 5.7 Bugfixes: - Restore wake-up-all to rpcrdma_cm_event_handler() - Otherwise the client won't respond to server disconnect requests - Fix tracepoint use-after-free race - Fix usage of xdr_stream_encode_item_{present, absent} - These functions return a size on success, and not 0 Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-28SUNRPC: defer slow parts of rpc_free_client() to a workqueue.NeilBrown
The rpciod workqueue is on the write-out path for freeing dirty memory, so it is important that it never block waiting for memory to be allocated - this can lead to a deadlock. rpc_execute() - which is often called by an rpciod work item - calls rcp_task_release_client() which can lead to rpc_free_client(). rpc_free_client() makes two calls which could potentially block wating for memory allocation. rpc_clnt_debugfs_unregister() calls into debugfs and will block while any of the debugfs files are being accessed. In particular it can block while any of the 'open' methods are being called and all of these use malloc for one thing or another. So this can deadlock if the memory allocation waits for NFS to complete some writes via rpciod. rpc_clnt_remove_pipedir() can take the inode_lock() and while it isn't obvious that memory allocations can happen while the lock it held, it is safer to assume they might and to not let rpciod call rpc_clnt_remove_pipedir(). So this patch moves these two calls (together with the final kfree() and rpciod_down()) into a work-item to be run from the system work-queue. rpciod can continue its important work, and the final stages of the free can happen whenever they happen. I have seen this deadlock on a 4.12 based kernel where debugfs used synchronize_srcu() when removing objects. synchronize_srcu() requires a workqueue and there were no free workther threads and none could be allocated. While debugsfs no longer uses SRCU, I believe the deadlock is still possible. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-27bpf: Fix sk_psock refcnt leak when receiving messageXiyu Yang
tcp_bpf_recvmsg() invokes sk_psock_get(), which returns a reference of the specified sk_psock object to "psock" with increased refcnt. When tcp_bpf_recvmsg() returns, local variable "psock" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in several exception handling paths of tcp_bpf_recvmsg(). When those error scenarios occur such as "flags" includes MSG_ERRQUEUE, the function forgets to decrease the refcnt increased by sk_psock_get(), causing a refcnt leak. Fix this issue by calling sk_psock_put() or pulling up the error queue read handling when those error scenarios occur. Fixes: e7a5f1f1cd000 ("bpf/sockmap: Read psock ingress_msg before sk_receive_queue") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1587872115-42805-1-git-send-email-xiyuyang19@fudan.edu.cn
2020-04-27Merge tag 'batadv-net-for-davem-20200427' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - fix random number generation in network coding, by George Spelvin - fix reference counter leaks, by Xiyu Yang (3 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27sch_sfq: validate silly quantum valuesEric Dumazet
syzbot managed to set up sfq so that q->scaled_quantum was zero, triggering an infinite loop in sfq_dequeue() More generally, we must only accept quantum between 1 and 2^18 - 7, meaning scaled_quantum must be in [1, 0x7FFF] range. Otherwise, we also could have a loop in sfq_dequeue() if scaled_quantum happens to be 0x8000, since slot->allot could indefinitely switch between 0 and 0x8000. Fixes: eeaeb068f139 ("sch_sfq: allow big packets and be fair") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27sch_choke: avoid potential panic in choke_reset()Eric Dumazet
If choke_init() could not allocate q->tab, we would crash later in choke_reset(). BUG: KASAN: null-ptr-deref in memset include/linux/string.h:366 [inline] BUG: KASAN: null-ptr-deref in choke_reset+0x208/0x340 net/sched/sch_choke.c:326 Write of size 8 at addr 0000000000000000 by task syz-executor822/7022 CPU: 1 PID: 7022 Comm: syz-executor822 Not tainted 5.7.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 __kasan_report.cold+0x5/0x4d mm/kasan/report.c:515 kasan_report+0x33/0x50 mm/kasan/common.c:625 check_memory_region_inline mm/kasan/generic.c:187 [inline] check_memory_region+0x141/0x190 mm/kasan/generic.c:193 memset+0x20/0x40 mm/kasan/common.c:85 memset include/linux/string.h:366 [inline] choke_reset+0x208/0x340 net/sched/sch_choke.c:326 qdisc_reset+0x6b/0x520 net/sched/sch_generic.c:910 dev_deactivate_queue.constprop.0+0x13c/0x240 net/sched/sch_generic.c:1138 netdev_for_each_tx_queue include/linux/netdevice.h:2197 [inline] dev_deactivate_many+0xe2/0xba0 net/sched/sch_generic.c:1195 dev_deactivate+0xf8/0x1c0 net/sched/sch_generic.c:1233 qdisc_graft+0xd25/0x1120 net/sched/sch_api.c:1051 tc_modify_qdisc+0xbab/0x1a00 net/sched/sch_api.c:1670 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5454 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362 ___sys_sendmsg+0x100/0x170 net/socket.c:2416 __sys_sendmsg+0xec/0x1b0 net/socket.c:2449 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 Fixes: 77e62da6e60c ("sch_choke: drop all packets in queue during reset") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checksEric Dumazet
My intent was to not let users set a zero drop_batch_size, it seems I once again messed with min()/max(). Fixes: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27net/tls: Fix sk_psock refcnt leak when in tls_data_ready()Xiyu Yang
tls_data_ready() invokes sk_psock_get(), which returns a reference of the specified sk_psock object to "psock" with increased refcnt. When tls_data_ready() returns, local variable "psock" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of tls_data_ready(). When "psock->ingress_msg" is empty but "psock" is not NULL, the function forgets to decrease the refcnt increased by sk_psock_get(), causing a refcnt leak. Fix this issue by calling sk_psock_put() on all paths when "psock" is not NULL. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27net/x25: Fix x25_neigh refcnt leak when x25 disconnectXiyu Yang
x25_connect() invokes x25_get_neigh(), which returns a reference of the specified x25_neigh object to "x25->neighbour" with increased refcnt. When x25 connect success and returns, the reference still be hold by "x25->neighbour", so the refcount should be decreased in x25_disconnect() to keep refcount balanced. The reference counting issue happens in x25_disconnect(), which forgets to decrease the refcnt increased by x25_get_neigh() in x25_connect(), causing a refcnt leak. Fix this issue by calling x25_neigh_put() before x25_disconnect() returns. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27net/tls: Fix sk_psock refcnt leak in bpf_exec_tx_verdict()Xiyu Yang
bpf_exec_tx_verdict() invokes sk_psock_get(), which returns a reference of the specified sk_psock object to "psock" with increased refcnt. When bpf_exec_tx_verdict() returns, local variable "psock" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of bpf_exec_tx_verdict(). When "policy" equals to NULL but "psock" is not NULL, the function forgets to decrease the refcnt increased by sk_psock_get(), causing a refcnt leak. Fix this issue by calling sk_psock_put() on this error path before bpf_exec_tx_verdict() returns. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27vsock/virtio: fix multiple packet delivery to monitoring devicesStefano Garzarella
In virtio_transport.c, if the virtqueue is full, the transmitting packet is queued up and it will be sent in the next iteration. This causes the same packet to be delivered multiple times to monitoring devices. We want to continue to deliver packets to monitoring devices before it is put in the virtqueue, to avoid that replies can appear in the packet capture before the transmitted packet. This patch fixes the issue, adding a new flag (tap_delivered) in struct virtio_vsock_pkt, to check if the packet is already delivered to monitoring devices. In vhost/vsock.c, we are splitting packets, so we must set 'tap_delivered' to false when we queue up the same virtio_vsock_pkt to handle the remaining bytes. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27SUNRPC: Revert 241b1f419f0e ("SUNRPC: Remove xdr_buf_trim()")Chuck Lever
I've noticed that when krb5i or krb5p security is in use, retransmitted requests are missing the server's duplicate reply cache. The computed checksum on the retransmitted request does not match the cached checksum, resulting in the server performing the retransmitted request again instead of returning the cached reply. The assumptions made when removing xdr_buf_trim() were not correct. In the send paths, the upper layer has already set the segment lengths correctly, and shorting the buffer's content is simply a matter of reducing buf->len. xdr_buf_trim() is the right answer in the receive/unwrap path on both the client and the server. The buffer segment lengths have to be shortened one-by-one. On the server side in particular, head.iov_len needs to be updated correctly to enable nfsd_cache_csum() to work correctly. The simple buf->len computation doesn't do that, and that results in checksumming stale data in the buffer. The problem isn't noticed until there's significant instability of the RPC transport. At that point, the reliability of retransmit detection on the server becomes crucial. Fixes: 241b1f419f0e ("SUNRPC: Remove xdr_buf_trim()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-04-27SUNRPC: Fix GSS privacy computation of auth->au_ralignChuck Lever
When the au_ralign field was added to gss_unwrap_resp_priv, the wrong calculation was used. Setting au_rslack == au_ralign is probably correct for kerberos_v1 privacy, but kerberos_v2 privacy adds additional GSS data after the clear text RPC message. au_ralign needs to be smaller than au_rslack in that fairly common case. When xdr_buf_trim() is restored to gss_unwrap_kerberos_v2(), it does exactly what I feared it would: it trims off part of the clear text RPC message. However, that's because rpc_prepare_reply_pages() does not set up the rq_rcv_buf's tail correctly because au_ralign is too large. Fixing the au_ralign computation also corrects the alignment of rq_rcv_buf->pages so that the client does not have to shift reply data payloads after they are received. Fixes: 35e77d21baa0 ("SUNRPC: Add rpc_auth::au_ralign field") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-04-27SUNRPC: Add "@len" parameter to gss_unwrap()Chuck Lever
Refactor: This is a pre-requisite to fixing the client-side ralign computation in gss_unwrap_resp_priv(). The length value is passed in explicitly rather that as the value of buf->len. This will subsequently allow gss_unwrap_kerberos_v1() to compute a slack and align value, instead of computing it in gss_unwrap_resp_priv(). Fixes: 35e77d21baa0 ("SUNRPC: Add rpc_auth::au_ralign field") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-04-26netfilter: nat: never update the UDP checksum when it's 0Guillaume Nault
If the UDP header of a local VXLAN endpoint is NAT-ed, and the VXLAN device has disabled UDP checksums and enabled Tx checksum offloading, then the skb passed to udp_manip_pkt() has hdr->check == 0 (outer checksum disabled) and skb->ip_summed == CHECKSUM_PARTIAL (inner packet checksum offloaded). Because of the ->ip_summed value, udp_manip_pkt() tries to update the outer checksum with the new address and port, leading to an invalid checksum sent on the wire, as the original null checksum obviously didn't take the old address and port into account. So, we can't take ->ip_summed into account in udp_manip_pkt(), as it might not refer to the checksum we're acting on. Instead, we can base the decision to update the UDP checksum entirely on the value of hdr->check, because it's null if and only if checksum is disabled: * A fully computed checksum can't be 0, since a 0 checksum is represented by the CSUM_MANGLED_0 value instead. * A partial checksum can't be 0, since the pseudo-header always adds at least one non-zero value (the UDP protocol type 0x11) and adding more values to the sum can't make it wrap to 0 as the carry is then added to the wrapped number. * A disabled checksum uses the special value 0. The problem seems to be there from day one, although it was probably not visible before UDP tunnels were implemented. Fixes: 5b1158e909ec ("[NETFILTER]: Add NAT support for nf_conntrack") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-25net: remove obsolete commentEric Dumazet
Commit b656722906ef ("net: Increase the size of skb_frag_t") removed the 16bit limitation of a frag on some 32bit arches. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25mptcp: fix race in msk status updatePaolo Abeni
Currently subflow_finish_connect() changes unconditionally any msk socket status other than TCP_ESTABLISHED. If an unblocking connect() races with close(), we can end-up triggering: IPv4: Attempt to release TCP socket in state 1 00000000e32b8b7e when the msk socket is disposed. Be sure to enter the established status only from SYN_SENT. Fixes: c3c123d16c0e ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix memory leak in netfilter flowtable, from Roi Dayan. 2) Ref-count leaks in netrom and tipc, from Xiyu Yang. 3) Fix warning when mptcp socket is never accepted before close, from Florian Westphal. 4) Missed locking in ovs_ct_exit(), from Tonghao Zhang. 5) Fix large delays during PTP synchornization in cxgb4, from Rahul Lakkireddy. 6) team_mode_get() can hang, from Taehee Yoo. 7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver, from Niklas Schnelle. 8) Fix handling of bpf XADD on BTF memory, from Jann Horn. 9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson. 10) Missing queue memory release in iwlwifi pcie code, from Johannes Berg. 11) Fix NULL deref in macvlan device event, from Taehee Yoo. 12) Initialize lan87xx phy correctly, from Yuiko Oshino. 13) Fix looping between VRF and XFRM lookups, from David Ahern. 14) etf packet scheduler assumes all sockets are full sockets, which is not necessarily true. From Eric Dumazet. 15) Fix mptcp data_fin handling in RX path, from Paolo Abeni. 16) fib_select_default() needs to handle nexthop objects, from David Ahern. 17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun. 18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca. 19) Correct rx/tx stats in bcmgenet driver, from Doug Berger. 20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix from Luke Nelson. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits) selftests/bpf: Fix a couple of broken test_btf cases tools/runqslower: Ensure own vmlinux.h is picked up first bpf: Make bpf_link_fops static bpftool: Respect the -d option in struct_ops cmd selftests/bpf: Add test for freplace program with expected_attach_type bpf: Propagate expected_attach_type when verifying freplace programs bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd bpf, x86_32: Fix logic error in BPF_LDX zero-extension bpf, x86_32: Fix clobbering of dst for BPF_JSET bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension bpf: Fix reStructuredText markup net: systemport: suppress warnings on failed Rx SKB allocations net: bcmgenet: suppress warnings on failed Rx SKB allocations macsec: avoid to set wrong mtu mac80211: sta_info: Add lockdep condition for RCU list usage mac80211: populate debugfs only after cfg80211 init net: bcmgenet: correct per TX/RX ring statistics net: meth: remove spurious copyright text net: phy: bcm84881: clear settings on link down chcr: Fix CPU hard lockup ...
2020-04-24Merge tag 'mac80211-for-net-2020-04-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just three changes: * fix a wrong GFP_KERNEL in hwsim * fix the debugfs mess after the mac80211 registration race fix * suppress false-positive RCU list lockdep warnings ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24mac80211: sta_info: Add lockdep condition for RCU list usageMadhuparna Bhowmik
The function sta_info_get_by_idx() uses RCU list primitive. It is called with local->sta_mtx held from mac80211/cfg.c. Add lockdep expression to avoid any false positive RCU list warnings. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Link: https://lore.kernel.org/r/20200409082906.27427-1-madhuparnabhowmik10@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-04-24mac80211: populate debugfs only after cfg80211 initJohannes Berg
When fixing the initialization race, we neglected to account for the fact that debugfs is initialized in wiphy_register(), and some debugfs things went missing (or rather were rerooted to the global debugfs root). Fix this by adding debugfs entries only after wiphy_register(). This requires some changes in the rate control code since it currently adds debugfs at alloc time, which can no longer be done after the reordering. Reported-by: Jouni Malinen <j@w1.fi> Reported-by: kernel test robot <rong.a.chen@intel.com> Reported-by: Hauke Mehrtens <hauke@hauke-m.de> Reported-by: Felix Fietkau <nbd@nbd.name> Cc: stable@vger.kernel.org Fixes: 52e04b4ce5d0 ("mac80211: fix race in ieee80211_register_hw()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/20200423111344.0e00d3346f12.Iadc76a03a55093d94391fc672e996a458702875d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-04-23net/x25: Fix x25_neigh refcnt leak when receiving frameXiyu Yang
x25_lapb_receive_frame() invokes x25_get_neigh(), which returns a reference of the specified x25_neigh object to "nb" with increased refcnt. When x25_lapb_receive_frame() returns, local variable "nb" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one path of x25_lapb_receive_frame(). When pskb_may_pull() returns false, the function forgets to decrease the refcnt increased by x25_get_neigh(), causing a refcnt leak. Fix this issue by calling x25_neigh_put() when pskb_may_pull() returns false. Fixes: cb101ed2c3c7 ("x25: Handle undersized/fragmented skbs") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23mptcp/pm_netlink.c : add check for nla_put_in/6_addrBo YU
Normal there should be checked for nla_put_in6_addr like other usage in net. Detected by CoverityScan, CID# 1461639 Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Signed-off-by: Bo YU <tsu.yubo@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23Merge tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds
Pull nfsd fixes from Chuck Lever: "The first set of 5.7-rc fixes for NFS server issues. These were all unresolved at the time the 5.7 window opened, and needed some additional time to ensure they were correctly addressed. They are ready now. At the moment I know of one more urgent issue regarding the NFS server. A fix has been tested and is under review. I expect to send one more pull request, containing this fix (which now consists of 3 patches). Fixes: - Address several use-after-free and memory leak bugs - Prevent a backchannel livelock" * tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: svcrdma: Fix leak of svc_rdma_recv_ctxt objects svcrdma: Fix trace point use-after-free race SUNRPC: Fix backchannel RPC soft lockups SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge nfsd: memory corruption in nfsd4_lock()
2020-04-22ipv4: Update fib_select_default to handle nexthop objectsDavid Ahern
A user reported [0] hitting the WARN_ON in fib_info_nh: [ 8633.839816] ------------[ cut here ]------------ [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381 ... [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381 ... [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286 [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000 [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000 [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0 [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60 [ 8633.839857] FS: 00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000 [ 8633.839858] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0 [ 8633.839867] Call Trace: [ 8633.839871] ip_route_output_key_hash_rcu+0x421/0x890 [ 8633.839873] ip_route_output_key_hash+0x5e/0x80 [ 8633.839876] ip_route_output_flow+0x1a/0x50 [ 8633.839878] __ip4_datagram_connect+0x154/0x310 [ 8633.839880] ip4_datagram_connect+0x28/0x40 [ 8633.839882] __sys_connect+0xd6/0x100 ... The WARN_ON is triggered in fib_select_default which is invoked when there are multiple default routes. Update the function to use fib_info_nhc and convert the nexthop checks to use fib_nh_common. Add test case that covers the affected code path. [0] https://github.com/FRRouting/frr/issues/6089 Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22netlabel: Kconfig: Update reference for NetLabel Tools projectSalvatore Bonaccorso
The NetLabel Tools project has moved from http://netlabel.sf.net to a GitHub project. Update to directly refer to the new home for the tools. Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22mptcp: fix data_fin handing in RX pathPaolo Abeni
The data fin flag is set only via a DSS option, but mptcp_incoming_options() copies it unconditionally from the provided RX options. Since we do not clear all the mptcp sock RX options in a socket free/alloc cycle, we can end-up with a stray data_fin value while parsing e.g. MPC packets. That would lead to mapping data corruption and will trigger a few WARN_ON() in the RX path. Instead of adding a costly memset(), fetch the data_fin flag only for DSS packets - when we always explicitly initialize such bit at option parsing time. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22sctp: Fix SHUTDOWN CTSN Ack in the peer restart caseJere Leppänen
When starting shutdown in sctp_sf_do_dupcook_a(), get the value for SHUTDOWN Cumulative TSN Ack from the new association, which is reconstructed from the cookie, instead of the old association, which the peer doesn't have anymore. Otherwise the SHUTDOWN is either ignored or replied to with an ABORT by the peer because CTSN Ack doesn't match the peer's Initial TSN. Fixes: bdf6fa52f01b ("sctp: handle association restarts when the socket is closed.") Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22sctp: Fix bundling of SHUTDOWN with COOKIE-ACKJere Leppänen
When we start shutdown in sctp_sf_do_dupcook_a(), we want to bundle the SHUTDOWN with the COOKIE-ACK to ensure that the peer receives them at the same time and in the correct order. This bundling was broken by commit 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc"), which assigns a transport for the COOKIE-ACK, but not for the SHUTDOWN. Fix this by passing a reference to the COOKIE-ACK chunk as an argument to sctp_sf_do_9_2_start_shutdown() and onward to sctp_make_shutdown(). This way the SHUTDOWN chunk is assigned the same transport as the COOKIE-ACK chunk, which allows them to be bundled. In sctp_sf_do_9_2_start_shutdown(), the void *arg parameter was previously unused. Now that we're taking it into use, it must be a valid pointer to a chunk, or NULL. There is only one call site where it's not, in sctp_sf_autoclose_timer_expire(). Fix that too. Fixes: 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc") Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22net: dsa: don't fail to probe if we couldn't set the MTUVladimir Oltean
There is no reason to fail the probing of the switch if the MTU couldn't be configured correctly (either the switch port itself, or the host port) for whatever reason. MTU-sized traffic probably won't work, sure, but we can still probably limp on and support some form of communication anyway, which the users would probably appreciate more. Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22sched: etf: do not assume all sockets are full blownEric Dumazet
skb->sk does not always point to a full blown socket, we need to use sk_fullsock() before accessing fields which only make sense on full socket. BUG: KASAN: use-after-free in report_sock_error+0x286/0x300 net/sched/sch_etf.c:141 Read of size 1 at addr ffff88805eb9b245 by task syz-executor.5/9630 CPU: 1 PID: 9630 Comm: syz-executor.5 Not tainted 5.7.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382 __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511 kasan_report+0x33/0x50 mm/kasan/common.c:625 report_sock_error+0x286/0x300 net/sched/sch_etf.c:141 etf_enqueue_timesortedlist+0x389/0x740 net/sched/sch_etf.c:170 __dev_xmit_skb net/core/dev.c:3710 [inline] __dev_queue_xmit+0x154a/0x30a0 net/core/dev.c:4021 neigh_hh_output include/net/neighbour.h:499 [inline] neigh_output include/net/neighbour.h:508 [inline] ip6_finish_output2+0xfb5/0x25b0 net/ipv6/ip6_output.c:117 __ip6_finish_output+0x442/0xab0 net/ipv6/ip6_output.c:143 ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x239/0x810 net/ipv6/ip6_output.c:176 dst_output include/net/dst.h:435 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_xmit+0xe1a/0x2090 net/ipv6/ip6_output.c:280 tcp_v6_send_synack+0x4e7/0x960 net/ipv6/tcp_ipv6.c:521 tcp_rtx_synack+0x10d/0x1a0 net/ipv4/tcp_output.c:3916 inet_rtx_syn_ack net/ipv4/inet_connection_sock.c:669 [inline] reqsk_timer_handler+0x4c2/0xb40 net/ipv4/inet_connection_sock.c:763 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1405 expire_timers kernel/time/timer.c:1450 [inline] __run_timers kernel/time/timer.c:1774 [inline] __run_timers kernel/time/timer.c:1741 [inline] run_timer_softirq+0x623/0x1600 kernel/time/timer.c:1787 __do_softirq+0x26c/0x9f7 kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x192/0x1d0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:546 [inline] smp_apic_timer_interrupt+0x19e/0x600 arch/x86/kernel/apic/apic.c:1140 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:des_encrypt+0x157/0x9c0 lib/crypto/des.c:792 Code: 85 22 06 00 00 41 31 dc 41 8b 4d 04 44 89 e2 41 83 e4 3f 4a 8d 3c a5 60 72 72 88 81 e2 3f 3f 3f 3f 48 89 f8 48 c1 e8 03 31 d9 <0f> b6 34 28 48 89 f8 c1 c9 04 83 e0 07 83 c0 03 40 38 f0 7c 09 40 RSP: 0018:ffffc90003b5f6c0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff10e4e55 RBX: 00000000d2f846d0 RCX: 00000000d2f846d0 RDX: 0000000012380612 RSI: ffffffff839863ca RDI: ffffffff887272a8 RBP: dffffc0000000000 R08: ffff888091d0a380 R09: 0000000000800081 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000012 R13: ffff8880a8ae8078 R14: 00000000c545c93e R15: 0000000000000006 cipher_crypt_one crypto/cipher.c:75 [inline] crypto_cipher_encrypt_one+0x124/0x210 crypto/cipher.c:82 crypto_cbcmac_digest_update+0x1b5/0x250 crypto/ccm.c:830 crypto_shash_update+0xc4/0x120 crypto/shash.c:119 shash_ahash_update+0xa3/0x110 crypto/shash.c:246 crypto_ahash_update include/crypto/hash.h:547 [inline] hash_sendmsg+0x518/0xad0 crypto/algif_hash.c:102 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x308/0x7e0 net/socket.c:2362 ___sys_sendmsg+0x100/0x170 net/socket.c:2416 __sys_sendmmsg+0x195/0x480 net/socket.c:2506 __do_sys_sendmmsg net/socket.c:2535 [inline] __se_sys_sendmmsg net/socket.c:2532 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2532 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x45c829 Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f6d9528ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004fc080 RCX: 000000000045c829 RDX: 0000000000000001 RSI: 0000000020002640 RDI: 0000000000000004 RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000008d7 R14: 00000000004cb7aa R15: 00007f6d9528f6d4 Fixes: 4b15c7075352 ("net/sched: Make etf report drops on error_queue") Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finishDavid Ahern
IPSKB_XFRM_TRANSFORMED and IP6SKB_XFRM_TRANSFORMED are skb flags set by xfrm code to tell other skb handlers that the packet has been passed through the xfrm output functions. Simplify the code and just always set them rather than conditionally based on netfilter enabled thus making the flag available for other users. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-21SUNRPC: Remove unreachable error conditionXiyu Yang
rpc_clnt_test_and_add_xprt() invokes rpc_call_null_helper(), which return the value of rpc_run_task() to "task". Since rpc_run_task() is impossible to return an ERR pointer, there is no need to add the IS_ERR() condition on "task" here. So we need to remove it. Fixes: 7f554890587c ("SUNRPC: Allow addition of new transports to a struct rpc_clnt") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-21cgroup, netclassid: remove double cond_reschedJiri Slaby
Commit 018d26fcd12a ("cgroup, netclassid: periodically release file_lock on classid") added a second cond_resched to write_classid indirectly by update_classid_task. Remove the one in write_classid. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) flow_block_cb memleak in nf_flow_table_offload_del_cb(), from Roi Dayan. 2) Fix error path handling in nf_nat_inet_register_fn(), from Hillf Danton. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-21batman-adv: Fix refcnt leak in batadv_v_ogm_processXiyu Yang
batadv_v_ogm_process() invokes batadv_hardif_neigh_get(), which returns a reference of the neighbor object to "hardif_neigh" with increased refcount. When batadv_v_ogm_process() returns, "hardif_neigh" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling paths of batadv_v_ogm_process(). When batadv_v_ogm_orig_get() fails to get the orig node and returns NULL, the refcnt increased by batadv_hardif_neigh_get() is not decreased, causing a refcnt leak. Fix this issue by jumping to "out" label when batadv_v_ogm_orig_get() fails to get the orig node. Fixes: 9323158ef9f4 ("batman-adv: OGMv2 - implement originators logic") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-04-21batman-adv: Fix refcnt leak in batadv_store_throughput_overrideXiyu Yang
batadv_show_throughput_override() invokes batadv_hardif_get_by_netdev(), which gets a batadv_hard_iface object from net_dev with increased refcnt and its reference is assigned to a local pointer 'hard_iface'. When batadv_store_throughput_override() returns, "hard_iface" becomes invalid, so the refcount should be decreased to keep refcount balanced. The issue happens in one error path of batadv_store_throughput_override(). When batadv_parse_throughput() returns NULL, the refcnt increased by batadv_hardif_get_by_netdev() is not decreased, causing a refcnt leak. Fix this issue by jumping to "out" label when batadv_parse_throughput() returns NULL. Fixes: 0b5ecc6811bd ("batman-adv: add throughput override attribute to hard_ifaces") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-04-21batman-adv: Fix refcnt leak in batadv_show_throughput_overrideXiyu Yang
batadv_show_throughput_override() invokes batadv_hardif_get_by_netdev(), which gets a batadv_hard_iface object from net_dev with increased refcnt and its reference is assigned to a local pointer 'hard_iface'. When batadv_show_throughput_override() returns, "hard_iface" becomes invalid, so the refcount should be decreased to keep refcount balanced. The issue happens in the normal path of batadv_show_throughput_override(), which forgets to decrease the refcnt increased by batadv_hardif_get_by_netdev() before the function returns, causing a refcnt leak. Fix this issue by calling batadv_hardif_put() before the batadv_show_throughput_override() returns in the normal path. Fixes: 0b5ecc6811bd ("batman-adv: add throughput override attribute to hard_ifaces") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-04-21batman-adv: fix batadv_nc_random_weight_tqGeorge Spelvin
and change to pseudorandom numbers, as this is a traffic dithering operation that doesn't need crypto-grade. The previous code operated in 4 steps: 1. Generate a random byte 0 <= rand_tq <= 255 2. Multiply it by BATADV_TQ_MAX_VALUE - tq 3. Divide by 255 (= BATADV_TQ_MAX_VALUE) 4. Return BATADV_TQ_MAX_VALUE - rand_tq This would apperar to scale (BATADV_TQ_MAX_VALUE - tq) by a random value between 0/255 and 255/255. But! The intermediate value between steps 3 and 4 is stored in a u8 variable. So it's truncated, and most of the time, is less than 255, after which the division produces 0. Specifically, if tq is odd, the product is always even, and can never be 255. If tq is even, there's exactly one random byte value that will produce a product byte of 255. Thus, the return value is 255 (511/512 of the time) or 254 (1/512 of the time). If we assume that the truncation is a bug, and the code is meant to scale the input, a simpler way of looking at it is that it's returning a random value between tq and BATADV_TQ_MAX_VALUE, inclusive. Well, we have an optimized function for doing just that. Fixes: 3c12de9a5c75 ("batman-adv: network coding - code and transmit packets if possible") Signed-off-by: George Spelvin <lkml@sdf.org> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-04-20mptcp: drop req socket remote_key* fieldsPaolo Abeni
We don't need them, as we can use the current ingress opt data instead. Setting them in syn_recv_sock() may causes inconsistent mptcp socket status, as per previous commit. Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20mptcp: avoid flipping mp_capable field in syn_recv_sock()Paolo Abeni
If multiple CPUs races on the same req_sock in syn_recv_sock(), flipping such field can cause inconsistent child socket status. When racing, the CPU losing the req ownership may still change the mptcp request socket mp_capable flag while the CPU owning the request is cloning the socket, leaving the child socket with 'is_mptcp' set but no 'mp_capable' flag. Such socket will stay with 'conn' field cleared, heading to oops in later mptcp callback. Address the issue tracking the fallback status in a local variable. Fixes: 58b09919626b ("mptcp: create msk early") Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20mptcp: handle mptcp listener destruction via rcuFlorian Westphal
Following splat can occur during self test: BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160 Read of size 8 at addr ffff888100c35c28 by task mptcp_connect/4808 subflow_data_ready+0x156/0x160 tcp_child_process+0x6a3/0xb30 tcp_v4_rcv+0x2231/0x3730 ip_protocol_deliver_rcu+0x5c/0x860 ip_local_deliver_finish+0x220/0x360 ip_local_deliver+0x1c8/0x4e0 ip_rcv_finish+0x1da/0x2f0 ip_rcv+0xd0/0x3c0 __netif_receive_skb_one_core+0xf5/0x160 __netif_receive_skb+0x27/0x1c0 process_backlog+0x21e/0x780 net_rx_action+0x35f/0xe90 do_softirq+0x4c/0x50 [..] This occurs when accessing subflow_ctx->conn. Problem is that tcp_child_process() calls listen sockets' sk_data_ready() notification, but it doesn't hold the listener lock. Another cpu calling close() on the listener will then cause transition of refcount to 0. Fixes: 58b09919626bf ("mptcp: create msk early") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20ipv6: fix restrict IPV6_ADDRFORM operationJohn Haxby
Commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a problem found by syzbot an unfortunate logic error meant that it also broke IPV6_ADDRFORM. Rearrange the checks so that the earlier test is just one of the series of checks made before moving the socket from IPv6 to IPv4. Fixes: b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") Signed-off-by: John Haxby <john.haxby@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20net: openvswitch: ovs_ct_exit to be done under ovs_lockTonghao Zhang
syzbot wrote: | ============================= | WARNING: suspicious RCU usage | 5.7.0-rc1+ #45 Not tainted | ----------------------------- | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!! | | other info that might help us debug this: | rcu_scheduler_active = 2, debug_locks = 1 | ... | | stack backtrace: | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 | Workqueue: netns cleanup_net | Call Trace: | ... | ovs_ct_exit | ovs_exit_net | ops_exit_list.isra.7 | cleanup_net | process_one_work | worker_thread To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add lockdep_ovsl_is_held as optional lockdep expression. Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Yi-Hung Wei <yihung.wei@gmail.com> Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>