summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-08-31net/ipv4: Add extack message that dev is required for ONLINKDavid Ahern
Make IPv4 consistent with IPv6 and return an extack message that the ONLINK flag requires a nexthop device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31tcp: change IPv6 flow-label upon receiving spurious retransmissionYuchung Cheng
Currently a Linux IPv6 TCP sender will change the flow label upon timeouts to potentially steer away from a data path that has gone bad. However this does not help if the problem is on the ACK path and the data path is healthy. In this case the receiver is likely to receive repeated spurious retransmission because the sender couldn't get the ACKs in time and has recurring timeouts. This patch adds another feature to mitigate this problem. It leverages the DSACK states in the receiver to change the flow label of the ACKs to speculatively re-route the ACK packets. In order to allow triggering on the second consecutive spurious RTO, the receiver changes the flow label upon sending a second consecutive DSACK for a sequence number below RCV.NXT. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31Revert "packet: switch kvzalloc to allocate memory"Eric Dumazet
This reverts commit 71e41286203c017d24f041a7cd71abea7ca7b1e0. mmap()/munmap() can not be backed by kmalloced pages : We fault in : VM_BUG_ON_PAGE(PageSlab(page), page); unmap_single_vma+0x8a/0x110 unmap_vmas+0x4b/0x90 unmap_region+0xc9/0x140 do_munmap+0x274/0x360 vm_munmap+0x81/0xc0 SyS_munmap+0x2b/0x40 do_syscall_64+0x13e/0x1c0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: 71e41286203c ("packet: switch kvzalloc to allocate memory") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: John Sperbeck <jsperbeck@google.com> Bisected-by: John Sperbeck <jsperbeck@google.com> Cc: Zhang Yu <zhangyu31@baidu.com> Cc: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31net_sched: add missing tcf_lock for act_connmarkCong Wang
According to the new locking rule, we have to take tcf_lock for both ->init() and ->dump(), as RTNL will be removed. However, it is missing for act_connmark. Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31Revert "net: sched: act: add extack for lookup callback"Cong Wang
This reverts commit 331a9295de23 ("net: sched: act: add extack for lookup callback"). This extack is never used after 6 months... In fact, it can be just set in the caller, right after ->lookup(). Cc: Alexander Aring <aring@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-09-01 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus. 2) BPF verifier improvements by giving each register its own liveness chain which allows to simplify and getting rid of skip_callee() logic, from Edward. 3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap and percpu lru hashmap. Also add generic percpu formatted print on bpftool so the same can be dumped there, from Yonghong. 4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and TCP_SAVED_SYN options to allow reflection of tos/tclass from received SYN packet, from Nikita. 5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2 interaction and removal of incorrect shutdown() calls, from John. 6) Few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01xsk: i40e: get rid of useless struct xdp_umem_propsMagnus Karlsson
This commit gets rid of the structure xdp_umem_props. It was there to be able to break a dependency at one point, but this is no longer needed. The values in the struct are instead stored directly in the xdp_umem structure. This simplifies the xsk code as well as af_xdp zero-copy drivers and as a bonus gets rid of one internal header file. The i40e driver is also adapted to the new interface in this commit. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01xsk: remove unnecessary assignmentPrashant Bhole
Since xdp_umem_query() was added one assignment of bpf.command was missed from cleanup. Removing the assignment statement. Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockoptNikita V. Shirokov
Adding support for two new bpf get/set sockopts: TCP_SAVE_SYN (set) and TCP_SAVED_SYN (get). This would allow for bpf program to build logic based on data from ingress SYN packet (e.g. doing tcp's tos/ tclass reflection (see sample prog)) and do it transparently from userspace program point of view. Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01xdp: remove redundant variable 'headroom'Colin Ian King
Variable 'headroom' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable ‘headroom’ set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-31netfilter: nf_tables: release chain in flushing setTaehee Yoo
When element of verdict map is deleted, the delete routine should release chain. however, flush element of verdict map routine doesn't release chain. test commands: %nft add table ip filter %nft add chain ip filter c1 %nft add map ip filter map1 { type ipv4_addr : verdict \; } %nft add element ip filter map1 { 1 : jump c1 } %nft flush map ip filter map1 %nft flush ruleset splat looks like: [ 4895.170899] kernel BUG at net/netfilter/nf_tables_api.c:1415! [ 4895.178114] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 4895.178880] CPU: 0 PID: 1670 Comm: nft Not tainted 4.18.0+ #55 [ 4895.178880] RIP: 0010:nf_tables_chain_destroy.isra.28+0x39/0x220 [nf_tables] [ 4895.178880] Code: fc ff df 53 48 89 fb 48 83 c7 50 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 09 3c 03 7f 05 e8 3e 4c 25 e1 8b 43 50 85 c0 74 02 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 [ 4895.228342] RSP: 0018:ffff88010b98f4c0 EFLAGS: 00010202 [ 4895.234841] RAX: 0000000000000001 RBX: ffff8801131c6968 RCX: ffff8801146585b0 [ 4895.234841] RDX: 1ffff10022638d37 RSI: ffff8801191a9348 RDI: ffff8801131c69b8 [ 4895.234841] RBP: ffff8801146585a8 R08: 1ffff1002323526a R09: 0000000000000000 [ 4895.234841] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200 [ 4895.234841] R13: dead000000000100 R14: ffffffffa3638af8 R15: dffffc0000000000 [ 4895.234841] FS: 00007f6d188e6700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 4895.234841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4895.234841] CR2: 00007ffe72b8df88 CR3: 000000010e2d4000 CR4: 00000000001006f0 [ 4895.234841] Call Trace: [ 4895.234841] nf_tables_commit+0x2704/0x2c70 [nf_tables] [ 4895.234841] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink] [ 4895.234841] ? nf_tables_setelem_notify.constprop.48+0x1a0/0x1a0 [nf_tables] [ 4895.323824] ? __lock_is_held+0x9d/0x130 [ 4895.323824] ? kasan_unpoison_shadow+0x30/0x40 [ 4895.333299] ? kasan_kmalloc+0xa9/0xc0 [ 4895.333299] ? kmem_cache_alloc_trace+0x2c0/0x310 [ 4895.333299] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink] [ 4895.333299] nfnetlink_rcv_batch+0xdb9/0x11b0 [nfnetlink] [ 4895.333299] ? debug_show_all_locks+0x290/0x290 [ 4895.333299] ? nfnetlink_net_init+0x150/0x150 [nfnetlink] [ 4895.333299] ? sched_clock_cpu+0xe5/0x170 [ 4895.333299] ? sched_clock_local+0xff/0x130 [ 4895.333299] ? sched_clock_cpu+0xe5/0x170 [ 4895.333299] ? find_held_lock+0x39/0x1b0 [ 4895.333299] ? sched_clock_local+0xff/0x130 [ 4895.333299] ? memset+0x1f/0x40 [ 4895.333299] ? nla_parse+0x33/0x260 [ 4895.333299] ? ns_capable_common+0x6e/0x110 [ 4895.333299] nfnetlink_rcv+0x2c0/0x310 [nfnetlink] [ ... ] Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-31netfilter: kconfig: nat related expression depend on nftables coreFlorian Westphal
NF_TABLES_IPV4 is now boolean so it is possible to set NF_TABLES=m NF_TABLES_IPV4=y NFT_CHAIN_NAT_IPV4=y which causes: nft_chain_nat_ipv4.c:(.text+0x6d): undefined reference to `nft_do_chain' Wrap NFT_CHAIN_NAT_IPV4 and related nat expressions with NF_TABLES to restore the dependency. Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-30xsk: include XDP meta data in AF_XDP framesBjörn Töpel
Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did not include XDP meta data in the data buffers copied out to the user application. In this commit, we check if meta data is available, and if so, it is prepended to the frame. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30notifier: Remove notifier header file wherever not usedMukesh Ojha
The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org
2018-08-30mac80211: always account for A-MSDU header changesJohannes Berg
In the error path of changing the SKB headroom of the second A-MSDU subframe, we would not account for the already-changed length of the first frame that just got converted to be in A-MSDU format and thus is a bit longer now. Fix this by doing the necessary accounting. It would be possible to reorder the operations, but that would make the code more complex (to calculate the necessary pad), and the headroom expansion should not fail frequently enough to make that worthwhile. Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-30mac80211: do not convert to A-MSDU if frag/subframe limitedLorenzo Bianconi
Do not start to aggregate packets in a A-MSDU frame (converting the first subframe to A-MSDU, adding the header) if max_tx_fragments or max_amsdu_subframes limits are already exceeded by it. In particular, this happens when drivers set the limit to 1 to avoid A-MSDUs at all. Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> [reword commit message to be more precise] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-30cfg80211: nl80211_update_ft_ies() to validate NL80211_ATTR_IEArunk Khandavalli
nl80211_update_ft_ies() tried to validate NL80211_ATTR_IE with is_valid_ie_attr() before dereferencing it, but that helper function returns true in case of NULL pointer (i.e., attribute not included). This can result to dereferencing a NULL pointer. Fix that by explicitly checking that NL80211_ATTR_IE is included. Fixes: 355199e02b83 ("cfg80211: Extend support for IEEE 802.11r Fast BSS Transition") Signed-off-by: Arunk Khandavalli <akhandav@codeaurora.org> Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29Merge tag 'mac80211-next-for-davem-2018-08-29' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Only a few changes at this point: * new channels in 60 GHz * clarify (average) ACK signal reporting API * expose ieee80211_send_layer2_update() for all drivers * start/stop mac80211's TXQs properly when required * avoid regulatory restore with IE ignoring * spelling: contidion -> condition * fully implement WFA Multi-AP backhaul ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net_sched: reject unknown tcfa_action valuesPaolo Abeni
After the commit 802bfb19152c ("net/sched: user-space can't set unknown tcfa_action values"), unknown tcfa_action values are converted to TC_ACT_UNSPEC, but the common agreement is instead rejecting such configurations. This change also introduces a helper to simplify the destruction of a single action, avoiding code duplication. v1 -> v2: - helper is now static and renamed according to act_* convention - updated extack message, according to the new behavior Fixes: 802bfb19152c ("net/sched: user-space can't set unknown tcfa_action values") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net/tls: Calculate nsg for zerocopy path without skb_cow_data.Doron Roberts-Kedes
decrypt_skb fails if the number of sg elements required to map it is greater than MAX_SKB_FRAGS. nsg must always be calculated, but skb_cow_data adds unnecessary memcpy's for the zerocopy case. The new function skb_nsg calculates the number of scatterlist elements required to map the skb without the extra overhead of skb_cow_data. This patch reduces memcpy by 50% on my encrypted NBD benchmarks. Reported-by: Vakul Garg <Vakul.garg@nxp.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Tested-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29ip: fail fast on IP defrag errorsPeter Oskolkov
The current behavior of IP defragmentation is inconsistent: - some overlapping/wrong length fragments are dropped without affecting the queue; - most overlapping fragments cause the whole frag queue to be dropped. This patch brings consistency: if a bad fragment is detected, the whole frag queue is dropped. Two major benefits: - fail fast: corrupted frag queues are cleared immediately, instead of by timeout; - testing of overlapping fragments is now much easier: any kind of random fragment length mutation now leads to the frag queue being discarded (IP packet dropped); before this patch, some overlaps were "corrected", with tests not seeing expected packet drops. Note that in one case (see "if (end&7)" conditional) the current behavior is preserved as there are concerns that this could be legitimate padding. Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29ethtool: drop get_settings and set_settings callbacksMichal Kubecek
Since [gs]et_settings ethtool_ops callbacks have been deprecated in February 2016, all in tree NIC drivers have been converted to provide [gs]et_link_ksettings() and out of tree drivers have had enough time to do the same. Drop get_settings() and set_settings() and implement both ETHTOOL_[GS]SET and ETHTOOL_[GS]LINKSETTINGS only using [gs]et_link_ksettings(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net: rtnl: return early from rtnl_unregister_all when protocol isn't registeredSabrina Dubroca
rtnl_unregister_all(PF_INET6) gets called from inet6_init in cases when no handler has been registered for PF_INET6 yet, for example if ip6_mr_init() fails. Abort and avoid a NULL pointer deref in that case. Example of panic (triggered by faking a failure of register_pernet_subsys): general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI [...] RIP: 0010:rtnl_unregister_all+0x17e/0x2a0 [...] Call Trace: ? rtnetlink_net_init+0x250/0x250 ? sock_unregister+0x103/0x160 ? kernel_getsockopt+0x200/0x200 inet6_init+0x197/0x20d Fixes: e2fddf5e96df ("[IPV6]: Make af_inet6 to check ip6_route_init return value.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29ipv6: fix cleanup ordering for pingv6 registrationSabrina Dubroca
Commit 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") contains an error in the cleanup path of inet6_init(): when proto_register(&pingv6_prot, 1) fails, we try to unregister &pingv6_prot. When rawv6_init() fails, we skip unregistering &pingv6_prot. Example of panic (triggered by faking a failure of proto_register(&pingv6_prot, 1)): general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI [...] RIP: 0010:__list_del_entry_valid+0x79/0x160 [...] Call Trace: proto_unregister+0xbb/0x550 ? trace_preempt_on+0x6f0/0x6f0 ? sock_no_shutdown+0x10/0x10 inet6_init+0x153/0x1b8 Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29ipv6: fix cleanup ordering for ip6_mr failureSabrina Dubroca
Commit 15e668070a64 ("ipv6: reorder icmpv6_init() and ip6_mr_init()") moved the cleanup label for ipmr_fail, but should have changed the contents of the cleanup labels as well. Now we can end up cleaning up icmpv6 even though it hasn't been initialized (jump to icmp_fail or ipmr_fail). Simply undo things in the reverse order of their initialization. Example of panic (triggered by faking a failure of icmpv6_init): kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI [...] RIP: 0010:__list_del_entry_valid+0x79/0x160 [...] Call Trace: ? lock_release+0x8a0/0x8a0 unregister_pernet_operations+0xd4/0x560 ? ops_free_list+0x480/0x480 ? down_write+0x91/0x130 ? unregister_pernet_subsys+0x15/0x30 ? down_read+0x1b0/0x1b0 ? up_read+0x110/0x110 ? kmem_cache_create_usercopy+0x1b4/0x240 unregister_pernet_subsys+0x1d/0x30 icmpv6_cleanup+0x1d/0x30 inet6_init+0x1b5/0x23f Fixes: 15e668070a64 ("ipv6: reorder icmpv6_init() and ip6_mr_init()") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net/ncsi: remove duplicated include from ncsi-netlink.cYueHaibing
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net/sched: act_pedit: fix dump of extended layered opDavide Caratti
in the (rare) case of failure in nla_nest_start(), missing NULL checks in tcf_pedit_key_ex_dump() can make the following command # tc action add action pedit ex munge ip ttl set 64 dereference a NULL pointer: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000007d1cd067 P4D 800000007d1cd067 PUD 7acd3067 PMD 0 Oops: 0002 [#1] SMP PTI CPU: 0 PID: 3336 Comm: tc Tainted: G E 4.18.0.pedit+ #425 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_pedit_dump+0x19d/0x358 [act_pedit] Code: be 02 00 00 00 48 89 df 66 89 44 24 20 e8 9b b1 fd e0 85 c0 75 46 8b 83 c8 00 00 00 49 83 c5 08 48 03 83 d0 00 00 00 4d 39 f5 <66> 89 04 25 00 00 00 00 0f 84 81 01 00 00 41 8b 45 00 48 8d 4c 24 RSP: 0018:ffffb5d4004478a8 EFLAGS: 00010246 RAX: ffff8880fcda2070 RBX: ffff8880fadd2900 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffb5d4004478ca RDI: ffff8880fcda206e RBP: ffff8880fb9cb900 R08: 0000000000000008 R09: ffff8880fcda206e R10: ffff8880fadd2900 R11: 0000000000000000 R12: ffff8880fd26cf40 R13: ffff8880fc957430 R14: ffff8880fc957430 R15: ffff8880fb9cb988 FS: 00007f75a537a740(0000) GS:ffff8880fda00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007a2fa005 CR4: 00000000001606f0 Call Trace: ? __nla_reserve+0x38/0x50 tcf_action_dump_1+0xd2/0x130 tcf_action_dump+0x6a/0xf0 tca_get_fill.constprop.31+0xa3/0x120 tcf_action_add+0xd1/0x170 tc_ctl_action+0x137/0x150 rtnetlink_rcv_msg+0x263/0x2d0 ? _cond_resched+0x15/0x40 ? rtnl_calcit.isra.30+0x110/0x110 netlink_rcv_skb+0x4d/0x130 netlink_unicast+0x1a3/0x250 netlink_sendmsg+0x2ae/0x3a0 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x26f/0x2d0 ? do_wp_page+0x8e/0x5f0 ? handle_pte_fault+0x6c3/0xf50 ? __handle_mm_fault+0x38e/0x520 ? __sys_sendmsg+0x5e/0xa0 __sys_sendmsg+0x5e/0xa0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f75a4583ba0 Code: c3 48 8b 05 f2 62 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d fd c3 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24 RSP: 002b:00007fff60ee7418 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fff60ee7540 RCX: 00007f75a4583ba0 RDX: 0000000000000000 RSI: 00007fff60ee7490 RDI: 0000000000000003 RBP: 000000005b842d3e R08: 0000000000000002 R09: 0000000000000000 R10: 00007fff60ee6ea0 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff60ee7554 R14: 0000000000000001 R15: 000000000066c100 Modules linked in: act_pedit(E) ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul ext4 crc32_pclmul mbcache ghash_clmulni_intel jbd2 pcbc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer cryptd glue_helper snd joydev pcspkr soundcore virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk virtio_console failover qxl crc32c_intel drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix virtio_pci libata virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_pedit] CR2: 0000000000000000 Like it's done for other TC actions, give up dumping pedit rules and return an error if nla_nest_start() returns NULL. Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29tipc: switch to rhashtable iteratorCong Wang
syzbot reported a use-after-free in tipc_group_fill_sock_diag(), where tipc_group_fill_sock_diag() still reads tsk->group meanwhile tipc_group_delete() just deletes it in tipc_release(). tipc_nl_sk_walk() aims to lock this sock when walking each sock in the hash table to close race conditions with sock changes like this one, by acquiring tsk->sk.sk_lock.slock spinlock, unfortunately this doesn't work at all. All non-BH call path should take lock_sock() instead to make it work. tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu() where RCU read lock is required, this is the reason why lock_sock() can't be taken on this path. This could be resolved by switching to rhashtable iterator API's, where taking a sleepable lock is possible. Also, the iterator API's are friendly for restartable calls like diag dump, the last position is remembered behind the scence, all we need to do here is saving the iterator into cb->args[]. I tested this with parallel tipc diag dump and thousands of tipc socket creation and release, no crash or memory leak. Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29tipc: fix a missing rhashtable_walk_exit()Cong Wang
rhashtable_walk_exit() must be paired with rhashtable_walk_enter(). Fixes: 40f9f4397060 ("tipc: Fix tipc_sk_reinit race conditions") Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29vti6: remove !skb->ignore_df check from vti6_xmit()Alexey Kodanev
Before the commit d6990976af7c ("vti6: fix PMTU caching and reporting on xmit") '!skb->ignore_df' check was always true because the function skb_scrub_packet() was called before it, resetting ignore_df to zero. In the commit, skb_scrub_packet() was moved below, and now this check can be false for the packet, e.g. when sending it in the two fragments, this prevents successful PMTU updates in such case. The next attempts to send the packet lead to the same tx error. Moreover, vti6 initial MTU value relies on PMTU adjustments. This issue can be reproduced with the following LTP test script: udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000 Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29xsk: expose xdp_umem_get_{data,dma} to driversBjörn Töpel
Move the xdp_umem_get_{data,dma} functions to include/net/xdp_sock.h, so that the upcoming zero-copy implementation in the Ethernet drivers can utilize them. Also, supply some dummy function implementations for CONFIG_XDP_SOCKETS=n configs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29xdp: export xdp_rxq_info_unreg_mem_modelBjörn Töpel
Export __xdp_rxq_info_unreg_mem_model as xdp_rxq_info_unreg_mem_model, so it can be used from netdev drivers. Also, add additional checks for the memory type. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPYBjörn Töpel
This commit adds proper MEM_TYPE_ZERO_COPY support for convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might make sense to implement a more sophisticated thread-safe alloc/free scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is required in the fast-path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29bpf: fix sg shift repair start offset in bpf_msg_pull_dataDaniel Borkmann
When we perform the sg shift repair for the scatterlist ring, we currently start out at i = first_sg + 1. However, this is not correct since the first_sg could point to the sge sitting at slot MAX_SKB_FRAGS - 1, and a subsequent i = MAX_SKB_FRAGS will access the scatterlist ring (sg) out of bounds. Add the sk_msg_iter_var() helper for iterating through the ring, and apply the same rule for advancing to the next ring element as we do elsewhere. Later work will use this helper also in other places. Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29bpf: fix shift upon scatterlist ring wrap-around in bpf_msg_pull_dataDaniel Borkmann
If first_sg and last_sg wraps around in the scatterlist ring, then we need to account for that in the shift as well. E.g. crafting such msgs where this is the case leads to a hang as shift becomes negative. E.g. consider the following scenario: first_sg := 14 |=> shift := -12 msg->sg_start := 10 last_sg := 3 | msg->sg_end := 5 round 1: i := 15, move_from := 3, sg[15] := sg[ 3] round 2: i := 0, move_from := -12, sg[ 0] := sg[-12] round 3: i := 1, move_from := -11, sg[ 1] := sg[-11] round 4: i := 2, move_from := -10, sg[ 2] := sg[-10] [...] round 13: i := 11, move_from := -1, sg[ 2] := sg[ -1] round 14: i := 12, move_from := 0, sg[ 2] := sg[ 0] round 15: i := 13, move_from := 1, sg[ 2] := sg[ 1] round 16: i := 14, move_from := 2, sg[ 2] := sg[ 2] round 17: i := 15, move_from := 3, sg[ 2] := sg[ 3] [...] This means we will loop forever and never hit the msg->sg_end condition to break out of the loop. When we see that the ring wraps around, then the shift should be MAX_SKB_FRAGS - first_sg + last_sg - 1. Meaning, the remainder slots from the tail of the ring and the head until last_sg combined. Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29bpf: fix msg->data/data_end after sg shift repair in bpf_msg_pull_dataDaniel Borkmann
In the current code, msg->data is set as sg_virt(&sg[i]) + start - offset and msg->data_end relative to it as msg->data + bytes. Using iterator i to point to the updated starting scatterlist element holds true for some cases, however not for all where we'd end up pointing out of bounds. It is /correct/ for these ones: 1) When first finding the starting scatterlist element (sge) where we find that the page is already privately owned by the msg and where the requested bytes and headroom fit into the sge's length. However, it's /incorrect/ for the following ones: 2) After we made the requested area private and updated the newly allocated page into first_sg slot of the scatterlist ring; when we find that no shift repair of the ring is needed where we bail out updating msg->data and msg->data_end. At that point i will point to last_sg, which in this case is the next elem of first_sg in the ring. The sge at that point might as well be invalid (e.g. i == msg->sg_end), which we use for setting the range of sg_virt(&sg[i]). The correct one would have been first_sg. 3) Similar as in 2) but when we find that a shift repair of the ring is needed. In this case we fix up all sges and stop once we've reached the end. In this case i will point to will point to the new msg->sg_end, and the sge at that point will be invalid. Again here the requested range sits in first_sg. Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29netfilter: nf_tables: rework ct timeout set supportFlorian Westphal
Using a private template is problematic: 1. We can't assign both a zone and a timeout policy (zone assigns a conntrack template, so we hit problem 1) 2. Using a template needs to take care of ct refcount, else we'll eventually free the private template due to ->use underflow. This patch reworks template policy to instead work with existing conntrack. As long as such conntrack has not yet been placed into the hash table (unconfirmed) we can still add the timeout extension. The only caveat is that we now need to update/correct ct->timeout to reflect the initial/new state, otherwise the conntrack entry retains the default 'new' timeout. Side effect of this change is that setting the policy must now occur from chains that are evaluated *after* the conntrack lookup has taken place. No released kernel contains the timeout policy feature yet, so this change should be ok. Changes since v2: - don't handle 'ct is confirmed case' - after previous patch, no need to special-case tcp/dccp/sctp timeout anymore Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-29netfilter: conntrack: place 'new' timeout in first location tooFlorian Westphal
tcp, sctp and dccp trackers re-use the userspace ctnetlink states to index their timeout arrays, which means timeout[0] is never used. Copy the 'new' state (syn-sent, dccp-request, ..) to 0 as well so external users can simply read it off timeouts[0] without need to differentiate dccp/sctp/tcp and udp/icmp/gre/generic. The alternative is to map all array accesses to 'i - 1', but that is a much more intrusive change. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-29mac80211: avoid kernel panic when building AMSDU from non-linear SKBSara Sharon
When building building AMSDU from non-linear SKB, we hit a kernel panic when trying to push the padding to the tail. Instead, put the padding at the head of the next subframe. This also fixes the A-MSDU subframes to not have the padding accounted in the length field and not have pad at all for the last subframe, both required by the spec. Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support") Signed-off-by: Sara Sharon <sara.sharon@intel.com> Reviewed-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29mac80211: mesh: fix HWMP sequence numbering to follow standardYuan-Chi Pang
IEEE 802.11-2016 14.10.8.3 HWMP sequence numbering says: If it is a target mesh STA, it shall update its own HWMP SN to maximum (current HWMP SN, target HWMP SN in the PREQ element) + 1 immediately before it generates a PREP element in response to a PREQ element. Signed-off-by: Yuan-Chi Pang <fu3mo6goo@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29cfg80211: clarify frames covered by average ACK signal reportBalaji Pothunoori
Modify the API to include all ACK frames in average ACK signal strength reporting, not just ACKs for data frames. Make exposing the data conditional on implementing the extended feature flag. This is how it was really implemented in mac80211, update the code there to use the new defines and clean up some of the setting code. Keep nl80211.h source compatibility by keeping the old names. Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org> [rewrite commit log, change compatibility to be old=new instead of the other way around, update kernel-doc, roll in mac80211 changes, make mac80211 depend on valid bit instead of HW flag] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29mac80211: add missing WFA Multi-AP backhaul STA Rx requirementSathishkumar Muruganandam
The current mac80211 WDS (4-address mode) can be used to cover most of the Multi-AP requirements for Data frames per the WFA Multi-AP Specification v1.0. When configuring AP/STA interfaces in 4-address mode, they are able to function as fronthaul AP/backhaul STA of Multi-AP device complying below Tx, Rx requirements except one missing STA Rx requirement added by this patch. Multi-AP specification section 14.1 describes the following requirements: Transmitter requirements ------------------------ 1. Fronthaul AP i) When DA!=RA of backhaul STA, must use 4-address format ii) When DA==RA of backhaul STA, shall use either 3-address or 4-address format with RA updated with STA MAC (mac80211 support 4-address format via AP/VLAN interface) 2. Backhaul STA i) When SA!=TA of backhaul STA, must use 4-address format ii) When SA==TA of backhaul STA, shall use either 3-address or 4-address format with RA updated with AP MAC (mac80211 support 4-address format via use_4addr) Receiver requirements --------------------- 1. Fronthaul AP i) When SA!=TA of backhaul STA, must support receiving 4-address format frames ii) When SA==TA of backhaul STA, must support receiving both 3-address and 4-address format frames (mac80211 support both 3-addr & 4-addr via AP/VLAN interface) 2. Backhaul STA i) When DA!=RA of backhaul STA, must support receiving 4-address format frames ii) When DA==RA of backhaul STA, must support receiving both 3-address and 4-address format frames (mac80211 support only receiving 4-address format via use_4addr) This patch addresses the above Rx requirement (ii) for backhaul STA to receive unicast (DA==RA) 3-address frames in addition to 4-address frames. The current design doesn't accept 3-address frames when configured in 4-address mode (use_4addr). Hence add a check to allow 3-address frames when DA==RA of backhaul STA (adhering to Table 9-26 of IEEE Std 802.11™-2016). This case was tested with a bridged station interface when associated with a non-mac80211 based vendor AP implementation using 3-address frames for WDS. STA was able to support the Multi-AP Rx requirement when DA==RA. No issues, no loops seen when tested with mac80211 based AP as well. Verified and confirmed all other Tx and Rx requirements of AP and STA for Multi-AP respectively. They all work using the current mac80211-WDS design. Signed-off-by: Sathishkumar Muruganandam <murugana@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29xfrm: allow driver to quietly refuse offloadShannon Nelson
If the "offload" attribute is used to create an IPsec SA and the .xdo_dev_state_add() fails, the SA creation fails. However, if the "offload" attribute is used on a device that doesn't offer it, the attribute is quietly ignored and the SA is created without an offload. Along the same line of that second case, it would be good to have a way for the device to refuse to offload an SA without failing the whole SA creation. This patch adds that feature by allowing the driver to return -EOPNOTSUPP as a signal that the SA may be fine, it just can't be offloaded. This allows the user a little more flexibility in requesting offloads and not needing to know every detail at all times about each specific NIC when trying to create SAs. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-08-29esp: remove redundant define esphHaishuang Yan
The pointer 'esph' is defined but is never used hence it is redundant and canbe removed. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-08-29xfrm: Make function xfrmi_get_link_net() staticWei Yongjun
Fixes the following sparse warning: net/xfrm/xfrm_interface.c:745:12: warning: symbol 'xfrmi_get_link_net' was not declared. Should it be static? Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-08-28bpf: fix several offset tests in bpf_msg_pull_dataDaniel Borkmann
While recently going over bpf_msg_pull_data(), I noticed three issues which are fixed in here: 1) When we attempt to find the first scatterlist element (sge) for the start offset, we add len to the offset before we check for start < offset + len, whereas it should come after when we iterate to the next sge to accumulate the offsets. For example, given a start offset of 12 with a sge length of 8 for the first sge in the list would lead us to determine this sge as the first sge thinking it covers first 16 bytes where start is located, whereas start sits in subsequent sges so we would end up pulling in the wrong data. 2) After figuring out the starting sge, we have a short-cut test in !msg->sg_copy[i] && bytes <= len. This checks whether it's not needed to make the page at the sge private where we can just exit by updating msg->data and msg->data_end. However, the length test is not fully correct. bytes <= len checks whether the requested bytes (end - start offsets) fit into the sge's length. The part that is missing is that start must not be sge length aligned. Meaning, the start offset into the sge needs to be accounted as well on top of the requested bytes as otherwise we can access the sge out of bounds. For example the sge could have length of 8, our requested bytes could have length of 8, but at a start offset of 4, so we also would need to pull in 4 bytes of the next sge, when we jump to the out label we do set msg->data to sg_virt(&sg[i]) + start - offset and msg->data_end to msg->data + bytes which would be oob. 3) The subsequent bytes < copy test for finding the last sge has the same issue as in point 2) but also it tests for less than rather than less or equal to. Meaning if the sge length is of 8 and requested bytes of 8 while having the start aligned with the sge, we would unnecessarily go and pull in the next sge as well to make it private. Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-28nl80211: Pass center frequency in kHz instead of MHzHaim Dreyfuss
freq_reg_info expects to get the frequency in kHz. Instead we accidently pass it in MHz. Thus, currently the function always return ERR rule. Fix that. Fixes: 50f32718e125 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command") Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> [fix kHz/MHz in commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28nl80211: Fix nla_put_u8 to u16 for NL80211_WMMR_TXOPHaim Dreyfuss
TXOP (also known as Channel Occupancy Time) is u16 and should be added using nla_put_u16 instead of u8, fix that. Fixes: 50f32718e125 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command") Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28cfg80211: Add support for 60GHz band channels 5 and 6Alexei Avshalom Lazar
The current support in the 60GHz band is for channels 1-4. Add support for channels 5 and 6. This requires enlarging ieee80211_channel.center_freq from u16 to u32. Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28mac80211: add stop/start logic for software TXQsManikanta Pubbisetty
Sometimes, it is required to stop the transmissions momentarily and resume it later; stopping the txqs becomes very critical in scenarios where the packet transmission has to be ceased completely. For example, during the hardware restart, during off channel operations, when initiating CSA(upon detecting a radar on the DFS channel), etc. The TX queue stop/start logic in mac80211 works well in stopping the TX when drivers make use of netdev queues, i.e, when Qdiscs in network layer take care of traffic scheduling. Since the devices implementing wake_tx_queue can run without Qdiscs, packets will be handed to mac80211 directly without queueing them in the netdev queues. Also, mac80211 does not invoke any of the netif_stop_*/netif_wake_* APIs if wake_tx_queue is implemented. Since the queues are not stopped in this case, transmissions can continue and this will impact negatively on the operation of the wireless device. For example, During hardware restart, we stop the netdev queues so that packets are not sent to the driver. Since ath10k implements wake_tx_queue, TX queues will not be stopped and packets might reach the hardware while it is restarting; this can make hardware unresponsive and the only possible option for recovery is to reboot the entire system. There is another problem to this, it is observed that the packets were sent on the DFS channel for a prolonged duration after radar detection impacting the channel closing time. We can still invoke netif stop/wake APIs when wake_tx_queue is implemented but this could lead to packet drops in network layer; adding stop/start logic for software TXQs in mac80211 instead makes more sense; the change proposed adds the same in mac80211. Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>