summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix regression in multi-SKB responses to RTM_GETADDR, from Arthur Gautier. 2) Fix ipv6 frag parsing in openvswitch, from Yi-Hung Wei. 3) Unbounded recursion in ipv4 and ipv6 GUE tunnels, from Stefano Brivio. 4) Use after free in hns driver, from Yonglong Liu. 5) icmp6_send() needs to handle the case of NULL skb, from Eric Dumazet. 6) Missing rcu read lock in __inet6_bind() when operating on mapped addresses, from David Ahern. 7) Memory leak in tipc-nl_compat_publ_dump(), from Gustavo A. R. Silva. 8) Fix PHY vs r8169 module loading ordering issues, from Heiner Kallweit. 9) Fix bridge vlan memory leak, from Ido Schimmel. 10) Dev refcount leak in AF_PACKET, from Jason Gunthorpe. 11) Infoleak in ipv6_local_error(), flow label isn't completely initialized. From Eric Dumazet. 12) Handle mv88e6390 errata, from Andrew Lunn. 13) Making vhost/vsock CID hashing consistent, from Zha Bin. 14) Fix lack of UMH cleanup when it unexpectedly exits, from Taehee Yoo. 15) Bridge forwarding must clear skb->tstamp, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) bnxt_en: Fix context memory allocation. bnxt_en: Fix ring checking logic on 57500 chips. mISDN: hfcsusb: Use struct_size() in kzalloc() net: clear skb->tstamp in bridge forwarding path net: bpfilter: disallow to remove bpfilter module while being used net: bpfilter: restart bpfilter_umh when error occurred net: bpfilter: use cleanup callback to release umh_info umh: add exit routine for UMH process isdn: i4l: isdn_tty: Fix some concurrency double-free bugs vhost/vsock: fix vhost vsock cid hashing inconsistent net: stmmac: Prevent RX starvation in stmmac_napi_poll() net: stmmac: Fix the logic of checking if RX Watchdog must be enabled net: stmmac: Check if CBS is supported before configuring net: stmmac: dwxgmac2: Only clear interrupts that are active net: stmmac: Fix PCI module removal leak tools/bpf: fix bpftool map dump with bitfields tools/bpf: test btf bitfield with >=256 struct member offset bpf: fix bpffs bitfield pretty print net: ethernet: mediatek: fix warning in phy_start_aneg tcp: change txhash on SYN-data timeout ...
2019-01-11net: clear skb->tstamp in bridge forwarding pathPaolo Abeni
Matteo reported forwarding issues inside the linux bridge, if the enslaved interfaces use the fq qdisc. Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths"), we need to clear the tstamp field in the bridge forwarding path. Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Reported-and-tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: bpfilter: disallow to remove bpfilter module while being usedTaehee Yoo
The bpfilter.ko module can be removed while functions of the bpfilter.ko are executing. so panic can occurred. in order to protect that, locks can be used. a bpfilter_lock protects routines in the __bpfilter_process_sockopt() but it's not enough because __exit routine can be executed concurrently. Now, the bpfilter_umh can not run in parallel. So, the module do not removed while it's being used and it do not double-create UMH process. The members of the umh_info and the bpfilter_umh_ops are protected by the bpfilter_umh_ops.lock. test commands: while : do iptables -I FORWARD -m string --string ap --algo kmp & modprobe -rv bpfilter & done splat looks like: [ 298.623435] BUG: unable to handle kernel paging request at fffffbfff807440b [ 298.628512] #PF error: [normal kernel read fault] [ 298.633018] PGD 124327067 P4D 124327067 PUD 11c1a3067 PMD 119eb2067 PTE 0 [ 298.638859] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 298.638859] CPU: 0 PID: 2997 Comm: iptables Not tainted 4.20.0+ #154 [ 298.638859] RIP: 0010:__mutex_lock+0x6b9/0x16a0 [ 298.638859] Code: c0 00 00 e8 89 82 ff ff 80 bd 8f fc ff ff 00 0f 85 d9 05 00 00 48 8b 85 80 fc ff ff 48 bf 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 38 00 0f 85 1d 0e 00 00 48 8b 85 c8 fc ff ff 49 39 47 58 c6 [ 298.638859] RSP: 0018:ffff88810e7777a0 EFLAGS: 00010202 [ 298.638859] RAX: 1ffffffff807440b RBX: ffff888111bd4d80 RCX: 0000000000000000 [ 298.638859] RDX: 1ffff110235ff806 RSI: ffff888111bd5538 RDI: dffffc0000000000 [ 298.638859] RBP: ffff88810e777b30 R08: 0000000080000002 R09: 0000000000000000 [ 298.638859] R10: 0000000000000000 R11: 0000000000000000 R12: fffffbfff168a42c [ 298.638859] R13: ffff888111bd4d80 R14: ffff8881040e9a05 R15: ffffffffc03a2000 [ 298.638859] FS: 00007f39e3758700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 [ 298.638859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 298.638859] CR2: fffffbfff807440b CR3: 000000011243e000 CR4: 00000000001006f0 [ 298.638859] Call Trace: [ 298.638859] ? mutex_lock_io_nested+0x1560/0x1560 [ 298.638859] ? kasan_kmalloc+0xa0/0xd0 [ 298.638859] ? kmem_cache_alloc+0x1c2/0x260 [ 298.638859] ? __alloc_file+0x92/0x3c0 [ 298.638859] ? alloc_empty_file+0x43/0x120 [ 298.638859] ? alloc_file_pseudo+0x220/0x330 [ 298.638859] ? sock_alloc_file+0x39/0x160 [ 298.638859] ? __sys_socket+0x113/0x1d0 [ 298.638859] ? __x64_sys_socket+0x6f/0xb0 [ 298.638859] ? do_syscall_64+0x138/0x560 [ 298.638859] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 298.638859] ? __alloc_file+0x92/0x3c0 [ 298.638859] ? init_object+0x6b/0x80 [ 298.638859] ? cyc2ns_read_end+0x10/0x10 [ 298.638859] ? cyc2ns_read_end+0x10/0x10 [ 298.638859] ? hlock_class+0x140/0x140 [ 298.638859] ? sched_clock_local+0xd4/0x140 [ 298.638859] ? sched_clock_local+0xd4/0x140 [ 298.638859] ? check_flags.part.37+0x440/0x440 [ 298.638859] ? __lock_acquire+0x4f90/0x4f90 [ 298.638859] ? set_rq_offline.part.89+0x140/0x140 [ ... ] Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: bpfilter: restart bpfilter_umh when error occurredTaehee Yoo
The bpfilter_umh will be stopped via __stop_umh() when the bpfilter error occurred. The bpfilter_umh() couldn't start again because there is no restart routine. The section of the bpfilter_umh_{start/end} is no longer .init.rodata because these area should be reused in the restart routine. hence the section name is changed to .bpfilter_umh. The bpfilter_ops->start() is restart callback. it will be called when bpfilter_umh is stopped. The stop bit means bpfilter_umh is stopped. this bit is set by both start and stop routine. Before this patch, Test commands: $ iptables -vnL $ kill -9 <pid of bpfilter_umh> $ iptables -vnL [ 480.045136] bpfilter: write fail -32 $ iptables -vnL All iptables commands will fail. After this patch, Test commands: $ iptables -vnL $ kill -9 <pid of bpfilter_umh> $ iptables -vnL $ iptables -vnL Now, all iptables commands will work. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11net: bpfilter: use cleanup callback to release umh_infoTaehee Yoo
Now, UMH process is killed, do_exit() calls the umh_info->cleanup callback to release members of the umh_info. This patch makes bpfilter_umh's cleanup routine to use the umh_info->cleanup callback. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-01-11 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix TCP-BPF support for correctly setting the initial window via TCP_BPF_IW on an active TFO sender, from Yuchung. 2) Fix a panic in BPF's stack_map_get_build_id()'s ELF parsing on 32 bit archs caused by page_address() returning NULL, from Song. 3) Fix BTF pretty print in kernel and bpftool when bitfield member offset is greater than 256. Also add test cases, from Yonghong. 4) Fix improper argument handling in xdp1 sample, from Ioana. 5) Install missing tcp_server.py and tcp_client.py files from BPF selftests, from Anders. 6) Add test_libbpf to gitignore in libbpf and BPF selftests, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-10tcp: change txhash on SYN-data timeoutYuchung Cheng
Previously upon SYN timeouts the sender recomputes the txhash to try a different path. However this does not apply on the initial timeout of SYN-data (active Fast Open). Therefore an active IPv6 Fast Open connection may incur one second RTO penalty to take on a new path after the second SYN retransmission uses a new flow label. This patch removes this undesirable behavior so Fast Open changes the flow label just like the regular connections. This also helps avoid falsely disabling Fast Open on the sender which triggers after two consecutive SYN timeouts on Fast Open. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-10ipv6: fix kernel-infoleak in ipv6_local_error()Eric Dumazet
This patch makes sure the flow label in the IPv6 header forged in ipv6_local_error() is initialized. BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 CPU: 1 PID: 24675 Comm: syz-executor1 Not tainted 4.20.0-rc7+ #4 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613 kmsan_internal_check_memory+0x455/0xb00 mm/kmsan/kmsan.c:675 kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601 _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 copy_to_user include/linux/uaccess.h:177 [inline] move_addr_to_user+0x2e9/0x4f0 net/socket.c:227 ___sys_recvmsg+0x5d7/0x1140 net/socket.c:2284 __sys_recvmsg net/socket.c:2327 [inline] __do_sys_recvmsg net/socket.c:2337 [inline] __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334 __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x457ec9 Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f8750c06c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9 RDX: 0000000000002000 RSI: 0000000020000400 RDI: 0000000000000005 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8750c076d4 R13: 00000000004c4a60 R14: 00000000004d8140 R15: 00000000ffffffff Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline] kmsan_save_stack mm/kmsan/kmsan.c:219 [inline] kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:439 __msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200 ipv6_recv_error+0x1e3f/0x1eb0 net/ipv6/datagram.c:475 udpv6_recvmsg+0x398/0x2ab0 net/ipv6/udp.c:335 inet_recvmsg+0x4fb/0x600 net/ipv4/af_inet.c:830 sock_recvmsg_nosec net/socket.c:794 [inline] sock_recvmsg+0x1d1/0x230 net/socket.c:801 ___sys_recvmsg+0x4d5/0x1140 net/socket.c:2278 __sys_recvmsg net/socket.c:2327 [inline] __do_sys_recvmsg net/socket.c:2337 [inline] __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334 __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline] kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158 kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176 kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2759 [inline] __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383 __kmalloc_reserve net/core/skbuff.c:137 [inline] __alloc_skb+0x309/0xa20 net/core/skbuff.c:205 alloc_skb include/linux/skbuff.h:998 [inline] ipv6_local_error+0x1a7/0x9e0 net/ipv6/datagram.c:334 __ip6_append_data+0x129f/0x4fd0 net/ipv6/ip6_output.c:1311 ip6_make_skb+0x6cc/0xcf0 net/ipv6/ip6_output.c:1775 udpv6_sendmsg+0x3f8e/0x45d0 net/ipv6/udp.c:1384 inet_sendmsg+0x54a/0x720 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg net/socket.c:631 [inline] __sys_sendto+0x8c4/0xac0 net/socket.c:1788 __do_sys_sendto net/socket.c:1800 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:1796 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 Bytes 4-7 of 28 are uninitialized Memory access of size 28 starts at ffff8881937bfce0 Data copied to user address 0000000020000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-10net/core/neighbour: tell kmemleak about hash tablesKonstantin Khlebnikov
This fixes false-positive kmemleak reports about leaked neighbour entries: unreferenced object 0xffff8885c6e4d0a8 (size 1024): comm "softirq", pid 0, jiffies 4294922664 (age 167640.804s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 20 2c f3 83 ff ff ff ff ........ ,...... 08 c0 ef 5f 84 88 ff ff 01 8c 7d 02 01 00 00 00 ..._......}..... backtrace: [<00000000748509fe>] ip6_finish_output2+0x887/0x1e40 [<0000000036d7a0d8>] ip6_output+0x1ba/0x600 [<0000000027ea7dba>] ip6_send_skb+0x92/0x2f0 [<00000000d6e2111d>] udp_v6_send_skb.isra.24+0x680/0x15e0 [<000000000668a8be>] udpv6_sendmsg+0x18c9/0x27a0 [<000000004bd5fa90>] sock_sendmsg+0xb3/0xf0 [<000000008227b29f>] ___sys_sendmsg+0x745/0x8f0 [<000000008698009d>] __sys_sendmsg+0xde/0x170 [<00000000889dacf1>] do_syscall_64+0x9b/0x400 [<0000000081cdb353>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000005767ed39>] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-10ip: on queued skb use skb_header_pointer instead of pskb_may_pullWillem de Bruijn
Commit 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull") avoided a read beyond the end of the skb linear segment by calling pskb_may_pull. That function can trigger a BUG_ON in pskb_expand_head if the skb is shared, which it is when when peeking. It can also return ENOMEM. Avoid both by switching to safer skb_header_pointer. Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull") Reported-by: syzbot <syzkaller@googlegroups.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-09bpf: correctly set initial window on active Fast Open senderYuchung Cheng
The existing BPF TCP initial congestion window (TCP_BPF_IW) does not to work on (active) Fast Open sender. This is because it changes the (initial) window only if data_segs_out is zero -- but data_segs_out is also incremented on SYN-data. This patch fixes the issue by proerly accounting for SYN-data additionally. Fixes: fc7478103c84 ("bpf: Adds support for setting initial cwnd") Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-08packet: Do not leak dev refcounts on error exitJason Gunthorpe
'dev' is non NULL when the addr_len check triggers so it must goto a label that does the dev_put otherwise dev will have a leaked refcount. This bug causes the ib_ipoib module to become unloadable when using systemd-network as it triggers this check on InfiniBand links. Fixes: 99137b7888f4 ("packet: validate address length") Reported-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08net: bridge: Fix VLANs memory leakIdo Schimmel
When adding / deleting VLANs to / from a bridge port, the bridge driver first tries to propagate the information via switchdev and falls back to the 8021q driver in case the underlying driver does not support switchdev. This can result in a memory leak [1] when VXLAN and mlxsw ports are enslaved to the bridge: $ ip link set dev vxlan0 master br0 # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev # notification and the bridge driver adds the VLAN on 'vxlan0' via the # 8021q driver $ bridge vlan add vid 10 dev vxlan0 pvid untagged # mlxsw port is enslaved to the bridge $ ip link set dev swp1 master br0 # mlxsw processes the switchdev notification and the 8021q driver is # skipped $ bridge vlan del vid 10 dev vxlan0 This results in 'struct vlan_info' and 'struct vlan_vid_info' being leaked, as they were allocated by the 8021q driver during VLAN addition, but never freed as the 8021q driver was skipped during deletion. Fix this by introducing a new VLAN private flag that indicates whether the VLAN was added on the port by switchdev or the 8021q driver. If the VLAN was added by the 8021q driver, then we make sure to delete it via the 8021q driver as well. [1] unreferenced object 0xffff88822d20b1e8 (size 256): comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s) hex dump (first 32 bytes): e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00 .B.............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330 [<00000000e0178b02>] vlan_vid_add+0x661/0x920 [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00 [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90 [<000000003535392c>] br_vlan_info+0x132/0x410 [<00000000aedaa9dc>] br_afspec+0x75c/0x870 [<00000000f5716133>] br_setlink+0x3dc/0x6d0 [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30 [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80 [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0 [<000000008be8d614>] rtnetlink_rcv+0x21/0x30 [<000000009ab2ca25>] netlink_unicast+0x52f/0x740 [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50 [<000000005d1e2050>] sock_sendmsg+0xbe/0x120 [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0 [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270 unreferenced object 0xffff888227454308 (size 32): comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s) hex dump (first 32 bytes): 88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff .. -...... -.... 81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330 [<0000000018050631>] vlan_vid_add+0x3e6/0x920 [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00 [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90 [<000000003535392c>] br_vlan_info+0x132/0x410 [<00000000aedaa9dc>] br_afspec+0x75c/0x870 [<00000000f5716133>] br_setlink+0x3dc/0x6d0 [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30 [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80 [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0 [<000000008be8d614>] rtnetlink_rcv+0x21/0x30 [<000000009ab2ca25>] netlink_unicast+0x52f/0x740 [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50 [<000000005d1e2050>] sock_sendmsg+0xbe/0x120 [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0 [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270 Fixes: d70e42b22dd4 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: bridge@lists.linux-foundation.org Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07Merge ra.kernel.org:/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-01-08 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix BSD'ism in sendmsg(2) to rewrite unspecified IPv6 dst for unconnected UDP sockets with [::1] _after_ cgroup BPF invocation, from Andrey. 2) Follow-up fix to the speculation fix where we need to reject a corner case for sanitation when ptr and scalars are mixed in the same alu op. Also, some unrelated minor doc fixes, from Daniel. 3) Fix BPF kselftest's incorrect uses of create_and_get_cgroup() by not assuming fd of zero value to be the result of an error case, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07libceph: allow setting abort_on_full for rbdDongsheng Yang
Introduce a new option abort_on_full, default to false. Then we can get -ENOSPC when the pool is full, or reaches quota. [ Don't show abort_on_full in /proc/mounts. ] Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-01-07smc: move unhash as early as possible in smc_release()Cong Wang
In smc_release() we release smc->clcsock before unhash the smc sock, but a parallel smc_diag_dump() may be still reading smc->clcsock, therefore this could cause a use-after-free as reported by syzbot. Reported-and-tested-by: syzbot+fbd1e5476e4c94c7b34e@syzkaller.appspotmail.com Fixes: 51f1de79ad8e ("net/smc: replace sock_put worker by socket refcounting") Cc: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07net: bridge: fix a bug on using a neighbour cache entry without checking its ↵JianJhen Chen
state When handling DNAT'ed packets on a bridge device, the neighbour cache entry from lookup was used without checking its state. It means that a cache entry in the NUD_STALE state will be used directly instead of entering the NUD_DELAY state to confirm the reachability of the neighbor. This problem becomes worse after commit 2724680bceee ("neigh: Keep neighbour cache entries if number of them is small enough."), since all neighbour cache entries in the NUD_STALE state will be kept in the neighbour table as long as the number of cache entries does not exceed the value specified in gc_thresh1. This commit validates the state of a neighbour cache entry before using the entry. Signed-off-by: JianJhen Chen <kchen@synology.com> Reviewed-by: JinLin Chen <jlchen@synology.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07tipc: fix memory leak in tipc_nl_compat_publ_dumpGustavo A. R. Silva
There is a memory leak in case genlmsg_put fails. Fix this by freeing *args* before return. Addresses-Coverity-ID: 1476406 ("Resource leak") Fixes: 46273cf7e009 ("tipc: fix a missing check of genlmsg_put") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07rds: use DIV_ROUND_UP instead of ceilJacob Wen
Yes indeed, DIV_ROUND_UP is in kernel.h. Signed-off-by: Jacob Wen <jian.w.wen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07can: gw: ensure DLC boundaries after CAN frame modificationOliver Hartkopp
Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN frame modification rule that makes the data length code a higher value than the available CAN frame data size. In combination with a configured checksum calculation where the result is stored relatively to the end of the data (e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in skb_shared_info) can be rewritten which finally can cause a system crash. Michael Kubecek suggested to drop frames that have a DLC exceeding the available space after the modification process and provided a patch that can handle CAN FD frames too. Within this patch we also limit the length for the checksum calculations to the maximum of Classic CAN data length (8). CAN frames that are dropped by these additional checks are counted with the CGW_DELETED counter which indicates misconfigurations in can-gw rules. This fixes CVE-2019-3701. Reported-by: Muyu Yu <ieatmuttonchuan@gmail.com> Reported-by: Marcus Meissner <meissner@suse.de> Suggested-by: Michal Kubecek <mkubecek@suse.cz> Tested-by: Muyu Yu <ieatmuttonchuan@gmail.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> # >= v3.2 Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-06jump_label: move 'asm goto' support test to KconfigMasahiro Yamada
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label". The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this: #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL) # define HAVE_JUMP_LABEL #endif We can improve this by testing 'asm goto' support in Kconfig, then make JUMP_LABEL depend on CC_HAS_ASM_GOTO. Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match to the real kernel capability. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2019-01-05ipv6: Take rcu_read_lock in __inet6_bind for mapped addressesDavid Ahern
I realized the last patch calls dev_get_by_index_rcu in a branch not holding the rcu lock. Add the calls to rcu_read_lock and rcu_read_unlock. Fixes: ec90ad334986 ("ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-05Merge tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "A fairly quiet round: a couple of messenger performance improvements from myself and a few cap handling fixes from Zheng" * tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-client: ceph: don't encode inode pathes into reconnect message ceph: update wanted caps after resuming stale session ceph: skip updating 'wanted' caps if caps are already issued ceph: don't request excl caps when mount is readonly ceph: don't update importing cap's mseq when handing cap export libceph: switch more to bool in ceph_tcp_sendmsg() libceph: use MSG_SENDPAGE_NOTLAST with ceph_tcp_sendpage() libceph: use sock_no_sendpage() as a fallback in ceph_tcp_sendpage() libceph: drop last_piece logic from write_partial_message_data() ceph: remove redundant assignment ceph: cleanup splice_dentry()
2019-01-04bpf: Fix [::] -> [::1] rewrite in sys_sendmsgAndrey Ignatov
sys_sendmsg has supported unspecified destination IPv6 (wildcard) for unconnected UDP sockets since 876c7f41. When [::] is passed by user as destination, sys_sendmsg rewrites it with [::1] to be consistent with BSD (see "BSD'ism" comment in the code). This didn't work when cgroup-bpf was enabled though since the rewrite [::] -> [::1] happened before passing control to cgroup-bpf block where fl6.daddr was updated with passed by user sockaddr_in6.sin6_addr (that might or might not be changed by BPF program). That way if user passed [::] as dst IPv6 it was first rewritten with [::1] by original code from 876c7f41, but then rewritten back with [::] by cgroup-bpf block. It happened even when BPF_CGROUP_UDP6_SENDMSG program was not present (CONFIG_CGROUP_BPF=y was enough). The fix is to apply BSD'ism after cgroup-bpf block so that [::] is replaced with [::1] no matter where it came from: passed by user to sys_sendmsg or set by BPF_CGROUP_UDP6_SENDMSG program. Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg") Reported-by: Nitin Rawat <nitin.rawat@intel.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-04ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped addressDavid Ahern
Similar to c5ee066333eb ("ipv6: Consider sk_bound_dev_if when binding a socket to an address"), binding a socket to v4 mapped addresses needs to consider if the socket is bound to a device. This problem also exists from the beginning of git history. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04ipv6: make icmp6_send() robust against null skb->devEric Dumazet
syzbot was able to crash one host with the following stack trace : kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 8625 Comm: syz-executor4 Not tainted 4.20.0+ #8 RIP: 0010:dev_net include/linux/netdevice.h:2169 [inline] RIP: 0010:icmp6_send+0x116/0x2d30 net/ipv6/icmp.c:426 icmpv6_send smack_socket_sock_rcv_skb security_sock_rcv_skb sk_filter_trim_cap __sk_receive_skb dccp_v6_do_rcv release_sock This is because a RX packet found socket owned by user and was stored into socket backlog. Before leaving RCU protected section, skb->dev was cleared in __sk_receive_skb(). When socket backlog was finally handled at release_sock() time, skb was fed to smack_socket_sock_rcv_skb() then icmp6_send() We could fix the bug in smack_socket_sock_rcv_skb(), or simply make icmp6_send() more robust against such possibility. In the future we might provide to icmp6_send() the net pointer instead of infering it. Fixes: d66a8acbda92 ("Smack: Inform peer that IPv6 traffic has been blocked") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Piotr Sawicki <p.sawicki2@partner.samsung.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04fou6: Prevent unbounded recursion in GUE error handlerStefano Brivio
I forgot to deal with IPv6 in commit 11789039da53 ("fou: Prevent unbounded recursion in GUE error handler"). Now syzbot reported what might be the same type of issue, caused by gue6_err(), that is, handling exceptions for direct UDP encapsulation in GUE (UDP-in-UDP) leads to unbounded recursion in the GUE exception handler. As it probably doesn't make sense to set up GUE this way, and it's currently not even possible to configure this, skip exception handling for UDP (or UDP-Lite) packets encapsulated in UDP (or UDP-Lite) packets with GUE on IPv6. Reported-by: syzbot+4ad25edc7a33e4ab91e0@syzkaller.appspotmail.com Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: b8a51b38e4d4 ("fou, fou6: ICMP error handlers for FoU and GUE") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04fou: Prevent unbounded recursion in GUE error handler also with UDP-LiteStefano Brivio
In commit 11789039da53 ("fou: Prevent unbounded recursion in GUE error handler"), I didn't take care of the case where UDP-Lite is encapsulated into UDP or UDP-Lite with GUE. From a syzbot report about a possibly similar issue with GUE on IPv6, I just realised the same thing might happen with a UDP-Lite inner payload. Also skip exception handling for inner UDP-Lite protocol. Fixes: 11789039da53 ("fou: Prevent unbounded recursion in GUE error handler") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04openvswitch: Fix IPv6 later frags parsingYi-Hung Wei
The previous commit fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags") introduces IP protocol number parsing for IPv6 later frags that can mess up the network header length calculation logic, i.e. nh_len < 0. However, the network header length calculation is mainly for deriving the transport layer header in the key extraction process which the later fragment does not apply. Therefore, this commit skips the network header length calculation to fix the issue. Reported-by: Chris Mi <chrism@mellanox.com> Reported-by: Greg Rose <gvrose8192@gmail.com> Fixes: fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04net, skbuff: do not prefer skb allocation fails earlyDavid Rientjes
Commit dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic") replaced __GFP_REPEAT in alloc_skb_with_frags() with __GFP_RETRY_MAYFAIL when the allocation may directly reclaim. The previous behavior would require reclaim up to 1 << order pages for skb aligned header_len of order > PAGE_ALLOC_COSTLY_ORDER before failing, otherwise the allocations in alloc_skb() would loop in the page allocator looking for memory. __GFP_RETRY_MAYFAIL makes both allocations failable under memory pressure, including for the HEAD allocation. This can cause, among many other things, write() to fail with ENOTCONN during RPC when under memory pressure. These allocations should succeed as they did previous to dcda9b04713c even if it requires calling the oom killer and additional looping in the page allocator to find memory. There is no way to specify the previous behavior of __GFP_REPEAT, but it's unlikely to be necessary since the previous behavior only guaranteed that 1 << order pages would be reclaimed before failing for order > PAGE_ALLOC_COSTLY_ORDER. That reclaim is not guaranteed to be contiguous memory, so repeating for such large orders is usually not beneficial. Removing the setting of __GFP_RETRY_MAYFAIL to restore the previous behavior, specifically not allowing alloc_skb() to fail for small orders and oom kill if necessary rather than allowing RPCs to fail. Fixes: dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic") Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04netlink: fixup regression in RTM_GETADDRArthur Gautier
This commit fixes a regression in AF_INET/RTM_GETADDR and AF_INET6/RTM_GETADDR. Before this commit, the kernel would stop dumping addresses once the first skb was full and end the stream with NLMSG_DONE(-EMSGSIZE). The error shouldn't be sent back to netlink_dump so the callback is kept alive. The userspace is expected to call back with a new empty skb. Changes from V1: - The error is not handled in netlink_dump anymore but rather in inet_dump_ifaddr and inet6_dump_addr directly as suggested by David Ahern. Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes") Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes") Cc: David Ahern <dsahern@gmail.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Arthur Gautier <baloo@gandi.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Several fixes here. Basically split down the line between newly introduced regressions and long existing problems: 1) Double free in tipc_enable_bearer(), from Cong Wang. 2) Many fixes to nf_conncount, from Florian Westphal. 3) op->get_regs_len() can throw an error, check it, from Yunsheng Lin. 4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman driver, from Scott Wood. 5) Inifnite loop in fib_empty_table(), from Yue Haibing. 6) Use after free in ax25_fillin_cb(), from Cong Wang. 7) Fix socket locking in nr_find_socket(), also from Cong Wang. 8) Fix WoL wakeup enable in r8169, from Heiner Kallweit. 9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani. 10) Fix ptr_ring wrap during queue swap, from Cong Wang. 11) Missing shutdown callback in hinic driver, from Xue Chaojing. 12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano Brivio. 13) BPF out of bounds speculation fixes from Daniel Borkmann" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits) ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: Fix dump of specific table with strict checking bpf: add various test cases to selftests bpf: prevent out of bounds speculation on pointer arithmetic bpf: fix check_map_access smin_value test when pointer contains offset bpf: restrict unknown scalars of mixed signed bounds for unprivileged bpf: restrict stack pointer arithmetic for unprivileged bpf: restrict map value pointer arithmetic for unprivileged bpf: enable access to ax register also from verifier rewrite bpf: move tmp variable into ax register in interpreter bpf: move {prev_,}insn_idx into verifier env isdn: fix kernel-infoleak in capi_unlocked_ioctl ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error net/hamradio/6pack: use mod_timer() to rearm timers net-next/hinic:add shutdown callback net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT ip: validate header length on virtual device xmit tap: call skb_probe_transport_header after setting skb->dev ptr_ring: wrap back ->producer in __ptr_ring_swap_queue() net: rds: remove unnecessary NULL check ...
2019-01-02ipv6: Consider sk_bound_dev_if when binding a socket to an addressDavid Ahern
IPv6 does not consider if the socket is bound to a device when binding to an address. The result is that a socket can be bound to eth0 and then bound to the address of eth1. If the device is a VRF, the result is that a socket can only be bound to an address in the default VRF. Resolve by considering the device if sk_bound_dev_if is set. This problem exists from the beginning of git history. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02ipv6: Fix dump of specific table with strict checkingDavid Ahern
Dump of a specific table with strict checking enabled is looping. The problem is that the end of the table dump is not marked in the cb. When dumping a specific table, cb args 0 and 1 are not used (they are the hash index and entry with an hash table index when dumping all tables). Re-use args[0] to hold a 'done' flag for the specific table dump. Fixes: 13e38901d46ca ("net/ipv6: Plumb support for filtering route dumps") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Yet another double DMA-unmap # v4.20 Features: - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG - Per-xprt rdma receive workqueues - Drop support for FMR memory registration - Make port= mount option optional for RDMA mounts Other bugfixes and cleanups: - Remove unused nfs4_xdev_fs_type declaration - Fix comments for behavior that has changed - Remove generic RPC credentials by switching to 'struct cred' - Fix crossing mountpoints with different auth flavors - Various xprtrdma fixes from testing and auditing the close code - Fixes for disconnect issues when using xprtrdma with krb5 - Clean up and improve xprtrdma trace points - Fix NFS v4.2 async copy reboot recovery" * tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) sunrpc: convert to DEFINE_SHOW_ATTRIBUTE sunrpc: Add xprt after nfs4_test_session_trunk() sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS sunrpc: handle ENOMEM in rpcb_getport_async NFS: remove unnecessary test for IS_ERR(cred) xprtrdma: Prevent leak of rpcrdma_rep objects NFSv4.2 fix async copy reboot recovery xprtrdma: Don't leak freed MRs xprtrdma: Add documenting comment for rpcrdma_buffer_destroy xprtrdma: Replace outdated comment for rpcrdma_ep_post xprtrdma: Update comments in frwr_op_send SUNRPC: Fix some kernel doc complaints SUNRPC: Simplify defining common RPC trace events NFS: Fix NFSv4 symbolic trace point output xprtrdma: Trace mapping, alloc, and dereg failures xprtrdma: Add trace points for calls to transport switch methods xprtrdma: Relocate the xprtrdma_mr_map trace points xprtrdma: Clean up of xprtrdma chunk trace points xprtrdma: Remove unused fields from rpcrdma_ia xprtrdma: Cull dprintk() call sites ...
2019-01-02Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Thanks to Vasily Averin for fixing a use-after-free in the containerized NFSv4.2 client, and cleaning up some convoluted backchannel server code in the process. Otherwise, miscellaneous smaller bugfixes and cleanup" * tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits) nfs: fixed broken compilation in nfs_callback_up_net() nfs: minor typo in nfs4_callback_up_net() sunrpc: fix debug message in svc_create_xprt() sunrpc: make visible processing error in bc_svc_process() sunrpc: remove unused xpo_prep_reply_hdr callback sunrpc: remove svc_rdma_bc_class sunrpc: remove svc_tcp_bc_class sunrpc: remove unused bc_up operation from rpc_xprt_ops sunrpc: replace svc_serv->sv_bc_xprt by boolean flag sunrpc: use-after-free in svc_process_common() sunrpc: use SVC_NET() in svcauth_gss_* functions nfsd: drop useless LIST_HEAD lockd: Show pid of lockd for remote locks NFSD remove OP_CACHEME from 4.2 op_flags nfsd: Return EPERM, not EACCES, in some SETATTR cases sunrpc: fix cache_head leak due to queued request nfsd: clean up indentation, increase indentation in switch statement svcrdma: Optimize the logic that selects the R_key to invalidate nfsd: fix a warning in __cld_pipe_upcall() nfsd4: fix crash on writing v4_end_grace before nfsd startup ...
2019-01-02Merge tag '9p-for-4.21' of git://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: "Missing prototype warning fix and a syzkaller fix when a 9p server advertises a too small msize" * tag '9p-for-4.21' of git://github.com/martinetd/linux: 9p/net: put a lower bound on msize net/9p: include trans_common.h to fix missing prototype warning.
2019-01-02ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() errorStefano Brivio
In ip6_neigh_lookup(), we must not return errors coming from neigh_create(): if creation of a neighbour entry fails, the lookup should return NULL, in the same way as it's done in __neigh_lookup(). Otherwise, callers legitimately checking for a non-NULL return value of the lookup function might dereference an invalid pointer. For instance, on neighbour table overflow, ndisc_router_discovery() crashes ndisc_update() by passing ERR_PTR(-ENOBUFS) as 'neigh' argument. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: f8a1b43b709d ("net/ipv6: Create a neigh_lookup for FIB entries") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02sunrpc: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: Add xprt after nfs4_test_session_trunk()Santosh kumar pradhan
Multipathing: In case of NFSv3, rpc_clnt_test_and_add_xprt() adds the xprt to xprt switch (i.e. xps) if rpc_call_null_helper() returns success. But in case of NFSv4.1, it needs to do EXCHANGEID to verify the path along with check for session trunking. Add the xprt in nfs4_test_session_trunk() only when nfs4_detect_session_trunking() returns success. Also release refcount hold by rpc_clnt_setup_test_and_add_xprt(). Signed-off-by: Santosh kumar pradhan <santoshkumar.pradhan@wdc.com> Tested-by: Suresh Jayaraman <suresh.jayaraman@wdc.com> Reported-by: Aditya Agnihotri <aditya.agnihotri@wdc.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFSJ. Bruce Fields
It's OK to sleep here, we just don't want to recurse into the filesystem as a writeout could be waiting on this. Future work: the documentation for GFP_NOFS says "Please try to avoid using this flag directly and instead use memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't recurse into the FS layer with a short explanation why. All allocation requests will inherit GFP_NOFS implicitly." But I'm not sure where to do this. Should the workqueue be arranging that for us in the case of workqueues created with WQ_MEM_RECLAIM? Reported-by: Trond Myklebust <trondmy@hammer.space> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: handle ENOMEM in rpcb_getport_asyncJ. Bruce Fields
If we ignore the error we'll hit a null dereference a little later. Reported-by: syzbot+4b98281f2401ab849f4b@syzkaller.appspotmail.com Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Prevent leak of rpcrdma_rep objectsChuck Lever
If a reply has been processed but the RPC is later retransmitted anyway, the req->rl_reply field still contains the only pointer to the old rpcrdma rep. When the next reply comes in, the reply handler will stomp on the rl_reply field, leaking the old rep. A trace event is added to capture such leaks. This problem seems to be worsened by the restructuring of the RPC Call path in v4.20. Fully addressing this issue will require at least a re-architecture of the disconnect logic, which is not appropriate during -rc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Don't leak freed MRsChuck Lever
Defensive clean up. Don't set frwr->fr_mr until we know that the scatterlist allocation has succeeded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Add documenting comment for rpcrdma_buffer_destroyChuck Lever
Make a note of the function's dependency on an earlier ib_drain_qp. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Replace outdated comment for rpcrdma_ep_postChuck Lever
Since commit 7c8d9e7c8863 ("xprtrdma: Move Receive posting to Receive handler"), rpcrdma_ep_post is no longer responsible for posting Receive buffers. Update the documenting comment to reflect this change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Update comments in frwr_op_sendChuck Lever
Commit f2877623082b ("xprtrdma: Chain Send to FastReg WRs") was written before commit ce5b37178283 ("xprtrdma: Replace all usage of "frmr" with "frwr""), but was merged afterwards. Thus it still refers to FRMR and MWs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02SUNRPC: Fix some kernel doc complaintsChuck Lever
Clean up some warnings observed when building with "make W=1". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Trace mapping, alloc, and dereg failuresChuck Lever
These are rare, but can be helpful at tracking down DMAR and other problems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>