linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-03-08	flow_dissector: Move ARP dissection into a separate function	Jiri Pirko
	Make the main flow_dissect function a bit smaller and move the ARP dissection into a separate function. Along with that, do the ARP header processing only in case the flow dissection user requires it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-08	netfilter: nf_reject: remove unused variable	Taehee Yoo
	variable oiph is not used. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-08	netfilter: bridge: remove unneeded rcu_read_lock	Florian Westphal
	as comment says, the function is always called with rcu read lock held. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-08	netfilter: nf_nat_sctp: fix ICMP packet to be dropped accidently	Ying Xue
	Regarding RFC 792, the first 64 bits of the original SCTP datagram's data could be contained in ICMP packet, such as: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| Type \| Code \| Checksum \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| unused \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| Internet Header + 64 bits of Original Data Datagram \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ However, according to RFC 4960, SCTP datagram header is as below: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| Source Port Number \| Destination Port Number \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| Verification Tag \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| Checksum \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ It means only the first three fields of SCTP header can be carried in ICMP packet except for Checksum field. At present in sctp_manip_pkt(), no matter whether the packet is ICMP or not, it always calculates SCTP packet checksum. However, not only the calculation of checksum is unnecessary for ICMP, but also it causes another fatal issue that ICMP packet is dropped. The header size of SCTP is used to identify whether the writeable length of skb is bigger than skb->len through skb_make_writable() in sctp_manip_pkt(). But when it deals with ICMP packet, skb_make_writable() directly returns false as the writeable length of skb is bigger than skb->len. Subsequently ICMP is dropped. Now we correct this misbahavior. When sctp_manip_pkt() handles ICMP packet, 8 bytes rather than the whole SCTP header size is used to check if writeable length of skb is overflowed. Meanwhile, as it's meaningless to calculate checksum when packet is ICMP, the computation of checksum is ignored as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-08	netfilter: don't track fragmented packets	Florian Westphal
	Andrey reports syzkaller splat caused by NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb))); in ipv4 nat. But this assertion (and the comment) are wrong, this function does see fragments when IP_NODEFRAG setsockopt is used. As conntrack doesn't track packets without complete l4 header, only the first fragment is tracked. Because applying nat to first packet but not the rest makes no sense this also turns off tracking of all fragments. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-08	mac80211: reject/clear user rate mask if not usable	Johannes Berg
	If the user rate mask results in no (basic) rates being usable, clear it. Also, if we're already operating when it's set, reject it instead. Technically, selecting basic rates as the criterion is a bit too restrictive, but calculating the usable rates over all stations (e.g. in AP mode) is harder, and all stations must support the basic rates. Similarly, in client mode, the basic rates will be used anyway for control frames. This fixes the "no supported rates (...) in rate_mask ..." warning that occurs on TX when you've selected a rate mask that's not compatible with the connection (e.g. an AP that enables only the rates 36, 48, 54 and you've selected only 6, 9, 12.) Reported-by: Kirtika Ruchandani <kirtika@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-07	Merge branch 'master' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-03-06 1) Fix lockdep splat on xfrm policy subsystem initialization. From Florian Westphal. 2) When using socket policies on IPv4-mapped IPv6 addresses, we access the flow informations of the wrong address family what leads to an out of bounds access. Fix this by using the family we get with the dst_entry, like we do it for the standard policy lookup. 3) vti6 can report a PMTU below IPV6_MIN_MTU. Fix this by adding a check for that before sending a ICMPV6_PKT_TOOBIG message. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	ipv6: reorder icmpv6_init() and ip6_mr_init()	WANG Cong
	Andrey reported the following kernel crash: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88001f311700 task.stack: ffff88001f6e8000 RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618 RSP: 0018:ffff88001f6ef418 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 1ffff10003edde8c RCX: ffffc900043ee000 RDX: 0000000000000004 RSI: ffffffff83e3b3f8 RDI: 0000000000000020 RBP: ffff88001f6ef508 R08: fffffbfff0dcc5d8 R09: 0000000000000000 R10: ffffffff86e62ec0 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88001f6ef4e0 R15: ffff8800380a0040 FS: 00007f7a52cec700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000061c500 CR3: 000000001f1ae000 CR4: 00000000000006f0 DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432 sock_release+0x8d/0x1e0 net/socket.c:597 __sock_create+0x39d/0x880 net/socket.c:1226 sock_create_kern+0x3f/0x50 net/socket.c:1243 inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526 icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954 ops_init+0x10a/0x550 net/core/net_namespace.c:115 setup_net+0x261/0x660 net/core/net_namespace.c:291 copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396 9pnet_virtio: no channels available for device ./file1 create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205 SYSC_unshare kernel/fork.c:2281 [inline] SyS_unshare+0x64e/0x1000 kernel/fork.c:2231 entry_SYSCALL_64_fastpath+0x1f/0xc2 This is because net->ipv6.mr6_tables is not initialized at that point, ip6mr_rules_init() is not called yet, therefore on the error path when we iterator the list, we trigger this oops. Fix this by reordering ip6mr_rules_init() before icmpv6_sk_init(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	dccp: fix use-after-free in dccp_feat_activate_values	Eric Dumazet
	Dmitry reported crashes in DCCP stack [1] Problem here is that when I got rid of listener spinlock, I missed the fact that DCCP stores a complex state in struct dccp_request_sock, while TCP does not. Since multiple cpus could access it at the same time, we need to add protection. [1] BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541 at addr ffff88003713be68 Read of size 8 by task syz-executor2/8457 CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x292/0x398 lib/dump_stack.c:51 kasan_object_err+0x1c/0x70 mm/kasan/report.c:162 print_address_description mm/kasan/report.c:200 [inline] kasan_report_error mm/kasan/report.c:289 [inline] kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311 kasan_report mm/kasan/report.c:332 [inline] __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332 dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541 dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121 dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457 dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186 dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902 </IRQ> do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328 do_softirq kernel/softirq.c:176 [inline] __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181 local_bh_enable include/linux/bottom_half.h:31 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline] ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123 ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162 ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501 inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179 dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141 dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280 dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362 dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 SYSC_sendto+0x660/0x810 net/socket.c:1687 SyS_sendto+0x40/0x50 net/socket.c:1655 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x4458b9 RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9 RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017 RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020 R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8 R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700 Object at ffff88003713be50, in cache kmalloc-64 size: 64 Allocated: PID = 8446 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605 kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738 kmalloc include/linux/slab.h:490 [inline] dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467 dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487 __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741 dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949 dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012 dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423 dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217 dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377 dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606 dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632 sk_backlog_rcv include/net/sock.h:893 [inline] __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479 dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 Freed: PID = 15 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 [inline] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578 slab_free_hook mm/slub.c:1355 [inline] slab_free_freelist_hook mm/slub.c:1377 [inline] slab_free mm/slub.c:2954 [inline] kfree+0xe8/0x2b0 mm/slub.c:3874 dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418 dccp_feat_entry_destructor net/dccp/feat.c:416 [inline] dccp_feat_list_pop net/dccp/feat.c:541 [inline] dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543 dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121 dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457 dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186 dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 Memory state around the buggy address: ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb ^ Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump	Alexey Khoroshilov
	Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races	Sowmini Varadhan
	Commit a93d01f5777e ("RDS: TCP: avoid bad page reference in rds_tcp_listen_data_ready") added the function rds_tcp_listen_sock_def_readable() to handle the case when a partially set-up acceptor socket drops into rds_tcp_listen_data_ready(). However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going through a tear-down via rds_tcp_listen_stop(), the (ready)() will be null and we would hit a panic of the form BUG: unable to handle kernel NULL pointer dereference at (null) IP: (null) : ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp] tcp_data_queue+0x39d/0x5b0 tcp_rcv_established+0x2e5/0x660 tcp_v4_do_rcv+0x122/0x220 tcp_v4_rcv+0x8b7/0x980 : In the above case, it is not fatal to encounter a NULL value for ready- we should just drop the packet and let the flush of the acceptor thread finish gracefully. In general, the tear-down sequence for listen() and accept() socket that is ensured by this commit is: rtn->rds_tcp_listen_sock = NULL; / prevent any new accepts */ In rds_tcp_listen_stop(): serialize with, and prevent, further callbacks using lock_sock() flush rds_wq flush acceptor workq sock_release(listen socket) Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	rds: tcp: Reorder initialization sequence in rds_tcp_init to avoid races	Sowmini Varadhan
	Order of initialization in rds_tcp_init needs to be done so that resources are set up and destroyed in the correct synchronization sequence with both the data path, as well as netns create/destroy path. Specifically, - we must call register_pernet_subsys and get the rds_tcp_netid before calling register_netdevice_notifier, otherwise we risk the sequence 1. register_netdevice_notifier sets up netdev notifier callback 2. rds_tcp_dev_event -> rds_tcp_kill_sock uses netid 0, and finds the wrong rtn, resulting in a panic with string that is of the form: BUG: unable to handle kernel NULL pointer dereference at 000000000000000d IP: rds_tcp_kill_sock+0x3a/0x1d0 [rds_tcp] : - the rds_tcp_incoming_slab kmem_cache must be initialized before the datapath starts up. The latter can happen any time after the pernet_subsys registration of rds_tcp_net_ops, whose -> init function sets up the listen socket. If the rds_tcp_incoming_slab has not been set up at that time, a panic of the form below may be encountered BUG: unable to handle kernel NULL pointer dereference at 0000000000000014 IP: kmem_cache_alloc+0x90/0x1c0 : rds_tcp_data_recv+0x1e7/0x370 [rds_tcp] tcp_read_sock+0x96/0x1c0 rds_tcp_recv_path+0x65/0x80 [rds_tcp] : Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	rds: tcp: Take explicit refcounts on struct net	Sowmini Varadhan
	It is incorrect for the rds_connection to piggyback on the sock_net() refcount for the netns because this gives rise to a chicken-and-egg problem during rds_conn_destroy. Instead explicitly take a ref on the net, and hold the netns down till the connection tear-down is complete. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	decnet: Use TCP nagle macro instead of literal number in decnet	Gao Feng
	Use existing TCP nagle macro TCP_NAGLE_OFF and TCP_NAGLE_CORK instead of the literal number 1 and 2 in the current decnet codes. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	net: fix socket refcounting in skb_complete_tx_timestamp()	Eric Dumazet
	TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt By the time TX completion happens, sk_refcnt might be already 0. sock_hold()/sock_put() would then corrupt critical state, like sk_wmem_alloc and lead to leaks or use after free. Fixes: 62bccb8cdb69 ("net-timestamp: Make the clone operation stand-alone from phy timestamping") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	net: fix socket refcounting in skb_complete_wifi_ack()	Eric Dumazet
	TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt By the time TX completion happens, sk_refcnt might be already 0. sock_hold()/sock_put() would then corrupt critical state, like sk_wmem_alloc. Fixes: bf7fa551e0ce ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	rxrpc: Call state should be read with READ_ONCE() under some circumstances	David Howells
	The call state may be changed at any time by the data-ready routine in response to received packets, so if the call state is to be read and acted upon several times in a function, READ_ONCE() must be used unless the call state lock is held. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	tcp: fix various issues for sockets morphing to listen state	Eric Dumazet
	Dmitry Vyukov reported a divide by 0 triggered by syzkaller, exploiting tcp_disconnect() path that was never really considered and/or used before syzkaller ;) I was not able to reproduce the bug, but it seems issues here are the three possible actions that assumed they would never trigger on a listener. 1) tcp_write_timer_handler 2) tcp_delack_timer_handler 3) MTU reduction Only IPv6 MTU reduction was properly testing TCP_CLOSE and TCP_LISTEN states from tcp_v6_mtu_reduced() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07	libceph: osd_request_timeout option	Ilya Dryomov
	osd_request_timeout specifies how many seconds to wait for a response from OSDs before returning -ETIMEDOUT from an OSD request. 0 (default) means no limit. osd_request_timeout is osdkeepalive-precise -- in-flight requests are swept through every osdkeepalive seconds. With ack vs commit behaviour gone, abort_request() is really simple. This is based on a patch from Artur Molchanov <artur.molchanov@synesis.ru>. Tested-by: Artur Molchanov <artur.molchanov@synesis.ru> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2017-03-07	libceph: don't set weight to IN when OSD is destroyed	Ilya Dryomov
	Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info, osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted. This changes the result of applying an incremental for clients, not just OSDs. Because CRUSH computations are obviously affected, pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on object placement, resulting in misdirected requests. Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f. Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals") Link: http://tracker.ceph.com/issues/19122 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2017-03-07	libceph: fix crush_decode() for older maps	Ilya Dryomov
	Older (shorter) CRUSH maps too need to be finalized. Fixes: 66a0e2d579db ("crush: remove mutable part of CRUSH map") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-03-07	mac80211: remove ieee80211_tx_rate_control.max_rate_idx	Johannes Berg
	As promised a little more than 7 years ago, remove it now since nothing uses it anymore. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-07	mac80211: ignore VHT membership selector when parsing rates	Johannes Berg
	There isn't really much harm in not ignoring, since it doesn't represent a valid rate, but since we already ignore the HT one also ignore VHT. Also simplify the code a bit. Fix a typo in the related comment (pointed out by Arend) while at it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	ipv6: Provide ipv6 version of "disable_policy" sysctl	David Forster
	This provides equivalent functionality to the existing ipv4 "disable_policy" systcl. ie. Allows IPsec processing to be skipped on terminating packets on a per-interface basis. Signed-off-by: David Forster <dforster@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-06	netfilter: nf_tables: add nft_set_lookup()	Pablo Neira Ayuso
	This new function consolidates set lookup via either name or ID by introducing a new nft_set_lookup() function. Replace existing spots where we can use this too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06	netfilter: nf_tables: validate the expr explicitly after init successfully	Liping Zhang
	When we want to validate the expr's dependency or hooks, we must do two things to accomplish it. First, write a X_validate callback function and point ->validate to it. Second, call X_validate in init routine. This is very common, such as fib, nat, reject expr and so on ... It is a little ugly, since we will call X_validate in the expr's init routine, it's better to do it in nf_tables_newexpr. So we can avoid to do this again and again. After doing this, the second step listed above is not useful anymore, remove them now. Patch was tested by nftables/tests/py/nft-test.py and nftables/tests/shell/run-tests.sh. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06	netfilter: arp_tables: remove redundant check on ret being non-zero	Colin Ian King
	ret is initialized to zero and if it is set to non-zero in the xt_entry_foreach loop then we exit via the out_free label. Hence the check for ret being non-zero is redundant and can be removed. Detected by CoverityScan, CID#1357132 ("Logically Dead Code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06	netfilter: Use pr_cont where appropriate	Joe Perches
	Logging output was changed when simple printks without KERN_CONT are now emitted on a new line and KERN_CONT is required to continue lines so use pr_cont. Miscellanea: o realign arguments o use print_hex_dump instead of a local variant Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06	netfilter: nft_hash: support of symmetric hash	Laura Garcia Liebana
	This patch provides symmetric hash support according to source ip address and port, and destination ip address and port. For this purpose, the __skb_get_hash_symmetric() is used to identify the flow as it uses FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL flag by default. The new attribute NFTA_HASH_TYPE has been included to support different types of hashing functions. Currently supported NFT_HASH_JENKINS through jhash and NFT_HASH_SYM through symhash. The main difference between both types are: - jhash requires an expression with sreg, symhash doesn't. - symhash supports modulus and offset, but not seed. Examples: nft add rule ip nat prerouting ct mark set jhash ip saddr mod 2 nft add rule ip nat prerouting ct mark set symhash mod 2 By default, jenkins hash will be used if no hash type is provided for compatibility reasons. Signed-off-by: Laura Garcia Liebana <laura.garcia@zevenet.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06	netfilter: nft_hash: rename nft_hash to nft_jhash	Laura Garcia Liebana
	This patch renames the local nft_hash structure and functions to nft_jhash in order to prepare the nft_hash module code to add new hash functions. Signed-off-by: Laura Garcia Liebana <laura.garcia@zevenet.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06	netfilter: nft_exthdr: Allow checking TCP option presence, too	Phil Sutter
	Honor NFT_EXTHDR_F_PRESENT flag so we check if the TCP option is present. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-06	cfg80211: Share Channel DFS state across wiphys of same DFS domain	Vasanthakumar Thiagarajan
	Sharing DFS channel state across multiple wiphys (radios) could be useful with multiple radios on the system. When one radio completes CAC and markes the channel available another radio can use this information and start beaconing without really doing CAC. Whenever there is a state change in dfs channel associated to a particular wiphy the the same state change is propagated to other wiphys having the same DFS reg domain configuration. Also when a new wiphy is created the dfs channel state of other existing wiphys of same DFS domain is copied. Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	cfg80211: Disallow moving out of operating DFS channel in non-ETSI	Vasanthakumar Thiagarajan
	For non-ETSI regulatory domain, CAC result on DFS channel may not be valid once moving out of that channel (as done during remain-on-channel, scannning and off-channel tx). Running CAC on an operating DFS channel after every off-channel operation will only add complexity and disturb the current link. Better do not allow any off-channel switch from a DFS operating channel in non-ETSI domain. Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	cfg80211: Make pre-CAC results valid only for ETSI domain	Vasanthakumar Thiagarajan
	DFS requirement for ETSI domain (section 4.7.1.4 in ETSI EN 301 893 V1.8.1) is the only one which explicitly states that once DFS channel is marked as available afer the CAC, this channel will remain in available state even moving to a different operating channel. But the same is not explicitly stated in FCC DFS requirement. Also, Pre-CAC requriements are not explicitly mentioned in FCC requirement. Current implementation in keeping DFS channel in available state is same as described in ETSI domain. For non-ETSI DFS domain, this patch gives a grace period of 2 seconds since the completion of successful CAC before moving the channel's DFS state to 'usable' from 'available' state. The same grace period is checked against the channel's dfs_state_entered timestamp while deciding if a DFS channel is available for operation. There is a new radar event, NL80211_RADAR_PRE_CAC_EXPIRED, reported when DFS channel is moved from available to usable state after the grace period. Also make sure the DFS channel state is reset to usable once the beaconing operation on that channel is brought down (like stop_ap, leave_ibss and leave_mesh) in non-ETSI domain. Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	mac80211: Use setup_timer instead of init_timer	Ondřej Lysoněk
	Use setup_timer() and setup_deferrable_timer() to set the data and function timer fields. It makes the code cleaner and will allow for easier change of the timer struct internals. Signed-off-by: Ondřej Lysoněk <ondrej.lysonek@seznam.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: <linux-wireless@vger.kernel.org> Cc: <netdev@vger.kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	mac80211: fix mesh fail_avg check	Manoharan, Rajkumar
	Mesh failure average never be more than 100. Only in case of fixed path, average will be more than threshold limit (95%). With recent EWMA changes it may go upto 99 as it is scaled to 100. It make sense to return maximum metric when average is greater than threshold limit. Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	mac80211: encode rate type (legacy, HT, VHT) with fewer bits	Johannes Berg
	We don't really need three different bits for each, since the types are mutually exclusive. Use just two bits for it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	cfg80211: refactor cfg80211_calculate_bitrate()	Johannes Berg
	This function contains the HT calculations, which makes no sense - split that out into a separate function. As a side effect, this makes the 60G flag independent from HT_MCS so remove the MCS one from wil6210 (also deleting a duplicate assignment.) Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	mac80211: remove local pointer from rate_ctrl_ref	Johannes Berg
	This pointer really isn't needed, so remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	ieee80211: rename CCFS1/CCFS2 to CCFS0/CCFS1	Johannes Berg
	This matches the spec, and otherwise things are really confusing with the next patch adding CCFS2. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	mac80211: Print text for disassociation reason	Arkadiusz Miskiewicz
	When disassociation happens only numeric reason is printed in ieee80211_rx_mgmt_disassoc(). Add text variant, too. Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	cfg80211: combine two nested ifs into a single condition	Johannes Berg
	Combine two instances of having two nested if statements into a single one with a combined condition to reduce the indentation. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	mac80211: Add set_cqm_rssi_range_config	Andrew Zaborowski
	Support .set_cqm_rssi_range_config if the beacons are available for processing in mac80211. There's no reason that this couldn't be offloaded by mac80211-based drivers but there's no driver method for that added in this patch. Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	cfg80211: Accept multiple RSSI thresholds for CQM	Andrew Zaborowski
	Change the SET CQM command's RSSI threshold attribute to accept any number of thresholds as a sorted array. The API should be backwards compatible so that if one s32 threshold value is passed, the old mechanism is enabled. The netlink event generated is the same in both cases. cfg80211 handles an arbitrary number of RSSI thresholds but drivers have to provide a method (set_cqm_rssi_range_config) that configures a range set by a high and a low value. Drivers have to call back when the RSSI goes out of that range and there's no additional event for each time the range is reconfigured as there was with the current one-threshold API. This method doesn't have a hysteresis parameter because there's no benefit to the cfg80211 code from having the hysteresis be handled by hardware/driver in terms of the number of wakeups. At the same time it would likely be less consistent between drivers if offloaded or done in the drivers. Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-06	mac80211: use DECLARE_EWMA for mesh_fail_avg	Manoharan, Rajkumar
	As moving average is not considering fractional part, it will get stuck at the same level after certain state. For example, with current values, it can get stuck at 96. Fortunately the current threshold 95%, but if it were increased to 96 or more mesh paths would never be deactivated. Fix failure average movement by using EWMA helpers, which does take into account fractional parts. Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com> [johannes: pick a larger EWMA factor for more precision with the limited range that we will feed into it, adjust to new API] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-04	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) Fix double-free in batman-adv, from Sven Eckelmann. 2) Fix packet stats for fast-RX path, from Joannes Berg. 3) Netfilter's ip_route_me_harder() doesn't handle request sockets properly, fix from Florian Westphal. 4) Fix sendmsg deadlock in rxrpc, from David Howells. 5) Add missing RCU locking to transport hashtable scan, from Xin Long. 6) Fix potential packet loss in mlxsw driver, from Ido Schimmel. 7) Fix race in NAPI handling between poll handlers and busy polling, from Eric Dumazet. 8) TX path in vxlan and geneve need proper RCU locking, from Jakub Kicinski. 9) SYN processing in DCCP and TCP need to disable BH, from Eric Dumazet. 10) Properly handle net_enable_timestamp() being invoked from IRQ context, also from Eric Dumazet. 11) Fix crash on device-tree systems in xgene driver, from Alban Bedel. 12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de Melo. 13) Fix use-after-free in netvsc driver, from Dexuan Cui. 14) Fix max MTU setting in bonding driver, from WANG Cong. 15) xen-netback hash table can be allocated from softirq context, so use GFP_ATOMIC. From Anoob Soman. 16) Fix MAC address change bug in bgmac driver, from Hari Vyas. 17) strparser needs to destroy strp_wq on module exit, from WANG Cong. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) strparser: destroy workqueue on module exit sfc: fix IPID endianness in TSOv2 sfc: avoid max() in array size rds: remove unnecessary returned value check rxrpc: Fix potential NULL-pointer exception nfp: correct DMA direction in XDP DMA sync nfp: don't tell FW about the reserved buffer space net: ethernet: bgmac: mac address change bug net: ethernet: bgmac: init sequence bug xen-netback: don't vfree() queues under spinlock xen-netback: keep a local pointer for vif in backend_disconnect() netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups netfilter: nf_conntrack_sip: fix wrong memory initialisation can: flexcan: fix typo in comment can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer can: gs_usb: fix coding style can: gs_usb: Don't use stack memory for USB transfers ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines ixgbe: update the rss key on h/w, when ethtool ask for it ...
2017-03-04	batman-adv: Initialize gw sel_class via batadv_algo	Sven Eckelmann
	The gateway selection class variable is shared between different algorithm versions. But the interpretation of the content is algorithm specific. The initialization is therefore also algorithm specific. But this was implemented incorrectly and the initialization for BATMAN_V always overwrote the value previously written for BATMAN_IV. This could only be avoided when BATMAN_V was disabled during compile time. Using a special batadv_algo hook for this initialization avoids this problem. Fixes: 50164d8f500f ("batman-adv: B.A.T.M.A.N. V - implement GW selection logic") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-03-04	batman-adv: Keep fragments equally sized	Sven Eckelmann
	The batman-adv fragmentation packets have the design problem that they cannot be refragmented and cannot handle padding by the underlying link. The latter often leads to problems when networks are incorrectly configured and don't use a common MTU. The sender could for example fragment a 1271 byte frame (plus external ethernet header (14) and batadv unicast header (10)) to fit in a 1280 bytes large MTU of the underlying link (max. 1294 byte frames). This would create a 1294 bytes large frame (fragment 2) and a 55 bytes large frame (fragment 1). The extra 54 bytes are the fragment header (20) added to each fragment and the external ethernet header (14) for the second fragment. Let us assume that the next hop is then not able to transport 1294 bytes to its next hop. The 1294 byte large frame will be dropped but the 55 bytes large fragment will still be forwarded to its destination. Or let us assume that the underlying hardware requires that each frame has a minimum size (e.g. 60 bytes). Then it will pad the 55 bytes frame to 60 bytes. The receiver of the 60 bytes frame will no longer be able to correctly assemble the two frames together because it is not aware that 5 bytes of the 60 bytes frame are padding and don't belong to the reassembled frame. This can partly be avoided by splitting frames more equally. In this example, the 675 and 674 bytes large fragment frames could both potentially reach its destination without being too large or too small. Reported-by: Martin Weinelt <martin@darmstadt.freifunk.net> Fixes: ee75ed88879a ("batman-adv: Fragment and send skbs larger than mtu") Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-03-03	Merge branch 'work.misc' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc final vfs updates from Al Viro: "A few unrelated patches that got beating in -next. Everything else will have to go into the next window ;-/" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: hfs: fix hfs_readdir() selftest for default_file_splice_read() infoleak 9p: constify ->d_name handling
2017-03-03	strparser: destroy workqueue on module exit	WANG Cong
	Fixes: 43a0c6751a32 ("strparser: Stream parser for messages") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>