summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-08-17netfilter: conntrack: do not dump other netns's conntrack entries via procLiping Zhang
We should skip the conntracks that belong to a different namespace, otherwise other unrelated netns's conntrack entries will be dumped via /proc/net/nf_conntrack. Fixes: 56d52d4892d0 ("netfilter: conntrack: use a single hashtable for all namespaces") Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-16Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost fixes from Michael Tsirkin: - test fixes - a vsock fix * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: tools/virtio: add dma stubs vhost/test: fix after swiotlb changes vhost/vsock: drop space available check for TX vq ringtest: test build fix
2016-08-15tipc: fix NULL pointer dereference in shutdown()Vegard Nossum
tipc_msg_create() can return a NULL skb and if so, we shouldn't try to call tipc_node_xmit_skb() on it. general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000 RIP: 0010:[<ffffffff830bb46b>] [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140 RSP: 0018:ffff8800595bfce8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0 RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580 RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000 R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000 FS: 00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0 DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 Stack: 0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208 ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018 0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611 Call Trace: [<ffffffff830c84a3>] tipc_shutdown+0x553/0x880 [<ffffffff825b4a3b>] SyS_shutdown+0x14b/0x170 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 <80> 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00 RIP [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140 RSP <ffff8800595bfce8> ---[ end trace 57b0484e351e71f1 ]--- I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure userspace is equipped to handle that. Anyway, this is better than a GPF and looks somewhat consistent with other tipc_msg_create() callers. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-15switchdev: Put export declaration in the right placeOr Gerlitz
Move exporting of switchdev_port_same_parent_id to be right below it and not elsewhere. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-15gre: set inner_protocol on xmitSimon Horman
Ensure that the inner_protocol is set on transmit so that GSO segmentation, which relies on that field, works correctly. This is achieved by setting the inner_protocol in gre_build_header rather than each caller of that function. It ensures that the inner_protocol is set when gre_fb_xmit() is used to transmit GRE which was not previously the case. I have observed this is not the case when OvS transmits GRE using lwtunnel metadata (which it always does). Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol") Cc: Pravin Shelar <pshelar@ovn.org> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-15net: ipv6: Fix ping to link-local addresses.Lorenzo Colitti
ping_v6_sendmsg does not set flowi6_oif in response to sin6_scope_id or sk_bound_dev_if, so it is not possible to use these APIs to ping an IPv6 address on a different interface. Instead, it sets flowi6_iif, which is incorrect but harmless. Stop setting flowi6_iif, and support various ways of setting oif in the same priority order used by udpv6_sendmsg. Tested: https://android-review.googlesource.com/#/c/254470/ Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-14net: make net namespace sysctls belong to container's ownerDmitry Torokhov
If net namespace is attached to a user namespace let's make container's root owner of sysctls affecting said network namespace instead of global root. This also allows us to clean up net_ctl_permissions() because we do not need to fudge permissions anymore for the container's owner since it now owns the objects in question. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-14netns: do not call pernet ops for not yet set up init_net namespaceDmitry Torokhov
When CONFIG_NET_NS is disabled, registering pernet operations causes init() to be called immediately with init_net as an argument. Unfortunately this leads to some pernet ops, such as proc_net_ns_init() to be called too early, when init_net namespace has not been fully initialized. This causes issues when we want to change pernet ops to use more data from the net namespace in question, for example reference user namespace that owns our network namespace. To fix this we could either play game of musical chairs and rearrange init order, or we could do the same as when CONFIG_NET_NS is enabled, and postpone calling pernet ops->init() until namespace is set up properly. Note that we can not simply undo commit ed160e839d2e ("[NET]: Cleanup pernet operation without CONFIG_NET_NS") and use the same implementations for __register_pernet_operations() and __unregister_pernet_operations(), because many pernet ops are marked as __net_initdata and will be discarded, which wreaks havoc on our ops lists. Here we rely on the fact that we only use lists until init_net is fully initialized, which happens much earlier than discarding __net_initdata sections. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-15vhost/vsock: drop space available check for TX vqGerard Garcia
Remove unnecessary use of enable/disable callback notifications and the incorrect more space available check. The virtio_transport_tx_work handles when the TX virtqueue has more buffers available. Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-13net: remove type_check from dev_get_nest_level()Sabrina Dubroca
The idea for type_check in dev_get_nest_level() was to count the number of nested devices of the same type (currently, only macvlan or vlan devices). This prevented the false positive lockdep warning on configurations such as: eth0 <--- macvlan0 <--- vlan0 <--- macvlan1 However, this doesn't prevent a warning on a configuration such as: eth0 <--- macvlan0 <--- vlan0 eth1 <--- vlan1 <--- macvlan1 In this case, all the locks end up with a nesting subclass of 1, so lockdep thinks that there is still a deadlock: - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then take (vlan_netdev_xmit_lock_key, 1) - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then take (macvlan_netdev_addr_lock_key, 1) By removing the linktype check in dev_get_nest_level() and always incrementing the nesting depth, lockdep considers this configuration valid. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13net: ipv6: Do not keep IPv6 addresses when IPv6 is disabledMike Manning
If IPv6 is disabled when the option is set to keep IPv6 addresses on link down, userspace is unaware of this as there is no such indication via netlink. The solution is to remove the IPv6 addresses in this case, which results in netlink messages indicating removal of addresses in the usual manner. This fix also makes the behavior consistent with the case of having IPv6 disabled first, which stops IPv6 addresses from being added. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Mike Manning <mmanning@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13Merge tag 'mac80211-next-for-davem-2016-08-12' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Not much for -next so far, but here it goes: * send more nl80211 events for interfaces * remove useless network/transport offset mangling code * validate beacon intervals identically for all interface types * use driver rate estimates for mesh * fix a compiler type/signedness warning ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13net/sctp: always initialise sctp_ht_iter::start_failVegard Nossum
sctp_transport_seq_start() does not currently clear iter->start_fail on success, but relies on it being zero when it is allocated (by seq_open_net()). This can be a problem in the following sequence: open() // allocates iter (and implicitly sets iter->start_fail = 0) read() - iter->start() // fails and sets iter->start_fail = 1 - iter->stop() // doesn't call sctp_transport_walk_stop() (correct) read() again - iter->start() // succeeds, but doesn't change iter->start_fail - iter->stop() // doesn't call sctp_transport_walk_stop() (wrong) We should initialize sctp_ht_iter::start_fail to zero if ->start() succeeds, otherwise it's possible that we leave an old value of 1 there, which will cause ->stop() to not call sctp_transport_walk_stop(), which causes all sorts of problems like not calling rcu_read_unlock() (and preempt_enable()), eventually leading to more warnings like this: BUG: sleeping function called from invalid context at mm/slab.h:388 in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2 Preemption disabled at:[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150 [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280 [<ffffffff83295892>] _raw_spin_lock+0x12/0x40 [<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150 [<ffffffff82ec665f>] sctp_transport_walk_start+0x2f/0x60 [<ffffffff82edda1d>] sctp_transport_seq_start+0x4d/0x150 [<ffffffff81439e50>] traverse+0x170/0x850 [<ffffffff8143aeec>] seq_read+0x7cc/0x1180 [<ffffffff814f996c>] proc_reg_read+0xbc/0x180 [<ffffffff813d0384>] do_loop_readv_writev+0x134/0x210 [<ffffffff813d2a95>] do_readv_writev+0x565/0x660 [<ffffffff813d6857>] vfs_readv+0x67/0xa0 [<ffffffff813d6c16>] do_preadv+0x126/0x170 [<ffffffff813d710c>] SyS_preadv+0xc/0x10 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff83296225>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Notice that this is a subtly different stacktrace from the one in commit 5fc382d875 ("net/sctp: terminate rhashtable walk correctly"). Cc: Xin Long <lucien.xin@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-By: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13net/irda: handle iriap_register_lsap() allocation failureVegard Nossum
If iriap_register_lsap() fails to allocate memory, self->lsap is set to NULL. However, none of the callers handle the failure and irlmp_connect_request() will happily dereference it: iriap_register_lsap: Unable to allocated LSAP! ================================================================================ UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2 member access within null pointer of type 'struct lsap_cb' CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3 ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880 ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a Call Trace: [<ffffffff82344f40>] dump_stack+0xac/0xfc [<ffffffff8242f5a8>] ubsan_epilogue+0xd/0x8a [<ffffffff824302bf>] __ubsan_handle_type_mismatch+0x157/0x411 [<ffffffff83b7bdbc>] irlmp_connect_request+0x7ac/0x970 [<ffffffff83b77cc0>] iriap_connect_request+0xa0/0x160 [<ffffffff83b77f48>] state_s_disconnect+0x88/0xd0 [<ffffffff83b78904>] iriap_do_client_event+0x94/0x120 [<ffffffff83b77710>] iriap_getvaluebyclass_request+0x3e0/0x6d0 [<ffffffff83ba6ebb>] irda_find_lsap_sel+0x1eb/0x630 [<ffffffff83ba90c8>] irda_connect+0x828/0x12d0 [<ffffffff833c0dfb>] SYSC_connect+0x22b/0x340 [<ffffffff833c7e09>] SyS_connect+0x9/0x10 [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0 [<ffffffff845f946a>] entry_SYSCALL64_slow_path+0x25/0x25 ================================================================================ The bug seems to have been around since forever. There's more problems with missing error checks in iriap_init() (and indeed all of irda_init()), but that's a bigger problem that needs very careful review and testing. This patch will fix the most serious bug (as it's easily reached from unprivileged userspace). I have tested my patch with a reproducer. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13bpf: fix write helpers with regards to non-linear partsDaniel Borkmann
Fix the bpf_try_make_writable() helper and all call sites we have in BPF, it's currently defect with regards to skbs when the write_len spans into non-linear parts, no matter if cloned or not. There are multiple issues at once. First, using skb_store_bits() is not correct since even if we have a cloned skb, page frags can still be shared. To really make them private, we need to pull them in via __pskb_pull_tail() first, which also gets us a private head via pskb_expand_head() implicitly. This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(), bpf_l4_csum_replace(). Really, the only thing reasonable and working here is to call skb_ensure_writable() before any write operation. Meaning, via pskb_may_pull() it makes sure that parts we want to access are pulled in and if not does so plus unclones the skb implicitly. If our write_len still fits the headlen and we're cloned and our header of the clone is not writable, then we need to make a private copy via pskb_expand_head(). skb_store_bits() is a bit misleading and only safe to store into non-linear data in different contexts such as 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory"). For above BPF helper functions, it means after fixed bpf_try_make_writable(), we've pulled in enough, so that we operate always based on skb->data. Thus, the call to skb_header_pointer() and skb_store_bits() becomes superfluous. In bpf_skb_store_bytes(), the len check is unnecessary too since it can only pass in maximum of BPF stack size, so adding offset is guaranteed to never overflow. Also bpf_l3/4_csum_replace() helpers must test for proper offset alignment since they use __sum16 pointer for writing resulting csum. The remaining helpers that change skb data not discussed here yet are bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The vlan helpers internally call either skb_ensure_writable() (pop case) and skb_cow_head() (push case, for head expansion), respectively. Similarly, bpf_skb_proto_xlat() takes care to not mangle page frags. Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action") Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers") Fixes: 3697649ff29e ("bpf: try harder on clones when writing into skb") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13calipso: fix resource leak on calipso_genopt failureColin Ian King
Currently, if calipso_genopt fails then the error exit path does not free the ipv6_opt_hdr new causing a memory leak. Fix this by kfree'ing new on the error exit path. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13netfilter: remove ip_conntrack* sysctl compat codePablo Neira Ayuso
This backward compatibility has been around for more than ten years, since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and the conntrack utility got adopted by many people in the user community according to what I observed on the netfilter user mailing list. So let's get rid of this. Note that nf_conntrack_htable_size and unsigned int nf_conntrack_max do not need to be exported as symbol anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-12bpf: fix bpf_skb_in_cgroup helper namingDaniel Borkmann
While hashing out BPF's current_task_under_cgroup helper bits, it came to discussion that the skb_in_cgroup helper name was suboptimally chosen. Tejun says: So, I think in_cgroup should mean that the object is in that particular cgroup while under_cgroup in the subhierarchy of that cgroup. Let's rename the other subhierarchy test to under too. I think that'd be a lot less confusing going forward. [...] It's more intuitive and gives us the room to implement the real "in" test if ever necessary in the future. Since this touches uapi bits, we need to change this as long as v4.8 is not yet officially released. Thus, change the helper enum and rename related bits. Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto") Reference: http://patchwork.ozlabs.org/patch/658500/ Suggested-by: Sargun Dhillon <sargun@sargun.me> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-08-12sit: make function ipip6_valid_ip_proto() staticWei Yongjun
Fixes the following sparse warning: net/ipv6/sit.c:1129:6: warning: symbol 'ipip6_valid_ip_proto' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12Merge tag 'batadv-next-for-davem-20160812' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature patchset includes the following changes (mostly chronological order): - bump version strings, by Simon Wunderlich - kerneldoc clean up, by Sven Eckelmann - enable RTNL automatic loading and according documentation changes, by Sven Eckelmann (2 patches) - fix/improve interface removal and associated locking, by Sven Eckelmann (3 patches) - clean up unused variables, by Linus Luessing - implement Gateway selection code for B.A.T.M.A.N. V by Antonio Quartulli (4 patches) - rewrite TQ comparison by Markus Pargmann - fix Cocinelle warnings on bool vs integers (by Fenguang Wu/Intels kbuild test robot) and bitwise arithmetic operations (by Linus Luessing) - rewrite packet creation for forwarding for readability and to avoid reference count mistakes, by Linus Luessing - use kmem_cache for translation table, which results in more efficient storing of translation table entries, by Sven Eckelmann - rewrite/clarify reference handling for send_skb_unicast, by Sven Eckelmann - fix debug messages when updating routes, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user needs multiple different security services (e.g. krb5i and krb5p). - Stable patch to fix a regression introduced by the use of SO_REUSEPORT, and that prevented the use of multiple different NFS versions to the same server. - TCP socket reconnection timer fixes. - Patch from Neil to disable the use of IPv6 temporary addresses" * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Cap the transport reconnection timer at 1/2 lease period NFSv4: Cleanup the setting of the nfs4 lease period SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout SUNRPC: Fix reconnection timeouts NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED SUNRPC: disable the use of IPv6 temporary addresses. SUNRPC: allow for upcalls for same uid but different gss service SUNRPC: Fix up socket autodisconnect SUNRPC: Handle EADDRNOTAVAIL on connection failures
2016-08-12netfilter: nf_tables: add hash expressionLaura Garcia Liebana
This patch adds a new hash expression, this provides jhash support but this can be extended to support for other hash functions. The modulus and seed already comes embedded into this new expression. Use case example: ... meta mark set hash ip saddr mod 10 Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-12xfrm: policy: convert policy_lock to spinlockFlorian Westphal
After earlier patches conversions all spots acquire the writer lock and we can now convert this to a normal spinlock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: don't acquire policy lock in xfrm_spd_getinfoFlorian Westphal
It doesn't seem that important. We now get inconsistent view of the counters, but those are stale anyway right after we drop the lock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: only use rcu in xfrm_sk_policy_lookupFlorian Westphal
Don't acquire the readlock anymore and rely on rcu alone. In case writer on other CPU changed policy at the wrong moment (after we obtained sk policy pointer but before we could obtain the reference) just repeat the lookup. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: make xfrm_policy_lookup_bytype locklessFlorian Westphal
side effect: no longer disables BH (should be fine). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: use atomic_inc_not_zero in rcu sectionFlorian Westphal
If we don't hold the policy lock anymore the refcnt might already be 0, i.e. policy struct is about to be free'd. Switch to atomic_inc_not_zero to avoid this. On removal policies are already unlinked from the tables (lists) before the last _put occurs so we are not supposed to find the same 'dead' entry on the next loop, so its safe to just repeat the lookup. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: add sequence count to sync with hash resizeFlorian Westphal
Once xfrm_policy_lookup_bytype doesn't grab xfrm_policy_lock anymore its possible for a hash resize to occur in parallel. Use sequence counter to block lookup in case a resize is in progress and to also re-lookup in case hash table was altered in the mean time (might cause use to not find the best-match). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: prepare policy_bydst hash for rcu lookupsFlorian Westphal
Since commit 56f047305dd4b6b617 ("xfrm: add rcu grace period in xfrm_policy_destroy()") xfrm policy objects are already free'd via rcu. In order to make more places lockless (i.e. use rcu_read_lock instead of grabbing read-side of policy rwlock) we only need to: - use rcu_assign_pointer to store address of new hash table backend memory - add rcu barrier so that freeing of old memory is delayed (expansion and free happens from system workqueue, so synchronize_rcu is fine) - use rcu_dereference to fetch current address of the hash table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12xfrm: policy: use rcu versions for iteration and list add/delFlorian Westphal
This is required once we allow lockless readers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12netfilter: nf_tables: rename set implementationsPablo Neira Ayuso
Use nft_set_* prefix for backend set implementations, thus we can use nft_hash for the new hash expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-12ipvs: use nf_ct_kill helperFlorian Westphal
Once timer is removed from nf_conn struct we cannot open-code the removal sequence anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-12netfilter: use_nf_conn_expires helper in more placesFlorian Westphal
... so we don't need to touch all of these places when we get rid of the timer in nf_conn. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-12netfilter: nf_dup4: remove redundant checksum recalculationLiping Zhang
IP header checksum will be recalculated at ip_local_out, so there's no need to calculated it here, remove it. Also update code comments to illustrate it, and delete the misleading comments about checksum recalculation. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-12netfilter: physdev: add missed blankHangbin Liu
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-12netfilter: conntrack: Only need first 4 bytes to get l4proto portsGao Feng
We only need first 4 bytes instead of 8 bytes to get the ports of tcp/udp/dccp/sctp/udplite in their pkt_to_tuple function. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost fixes and cleanups from Michael Tsirkin: "Misc fixes and cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio/s390: deprecate old transport virtio/s390: keep early_put_chars virtio_blk: Fix a slient kernel panic virtio-vsock: fix include guard typo vhost/vsock: fix vhost virtio_vsock_pkt use-after-free 9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc() virtio: fix error handling for debug builds virtio: fix memory leak in virtqueue_add()
2016-08-11nl80211: explicitly check enum nl80211_mesh_power_modeJohannes Berg
Different gcc versions appear to be treating enum with different signedness, causing warnings with the out parameter one way or the other. Just use the correct type to avoid all that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-11mac80211: call get_expected_throughput only after adding stationMaxim Altshul
Depending on which method the driver implements, userspace could call this (indirectly, by getting station info) before the driver knows about the station, possibly causing it to misbehave. Therefore, add a check for sta->uploaded which indicates that the driver knows about the station. Signed-off-by: Maxim Altshul <maxim.altshul@ti.com> [reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-11cfg80211: identically validate beacon interval for AP/MESH/IBSSPurushottam Kushwaha
Beacon interval interface combinations validation was missing for MESH/IBSS join, add those. Johannes: also move the beacon interval check disallowing really tiny and really big intervals into the common function, which adds it for AP mode. Signed-off-by: Purushottam Kushwaha <pkushwah@qti.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-11cfg80211: always notify userspace when wireless netdev is removedDenis Kenzior
This change alters the semantics of NL80211_CMD_DEL_INTERFACE events by always sending this event whenever a net_device object associated with a wdev is destroyed. Prior to this change, this event was only emitted as a result of NL80211_CMD_DEL_INTERFACE command sent from userspace. This allows userspace to reliably detect when wireless interfaces have been removed, e.g. due to USB removal events, etc. For wireless device objects without an associated net_device (e.g. NL80211_IFTYPE_P2P_DEVICE), the NL80211_CMD_DEL_INTERFACE event is now generated inside cfg80211_unregister_wdev. Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-11cfg80211: always notify userspace of new wireless netdevsDenis Kenzior
This change alters the semantics of NL80211_CMD_NEW_INTERFACE events by always sending this event whenever a new net_device object associated with a wdev is registered. Prior to this change, this event was only sent as a result of NL80211_CMD_NEW_INTERFACE command sent from userspace. This allows userspace to reliably detect new wireless interfaces (e.g. due to hardware hot-plug events, etc). For wdevs created without an associated net_device object (e.g. NL80211_IFTYPE_P2P_DEVICE), the NL80211_CMD_NEW_INTERFACE event is still generated inside the relevant nl80211 command handler. Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-11mac80211: remove skb header offset mangling in ieee80211_build_hdrFelix Fietkau
Since the code only touches the MAC headers, the offsets to the network/transport headers remain the same throughout this function. Remove pointless pieces of code that try to 'preserve' them. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-11mac80211: mesh: Add support for HW RC implementationMaxim Altshul
Mesh HWMP module will be able to rely on the HW RC algorithm if it exists, for path metric calculations. This allows the metric calculation mechanism to calculate a correct metric, based on PER and last TX rate both via HW RC algorithm if it exists or via parameters collected by the SW. Signed-off-by: Maxim Altshul <maxim.altshul@ti.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-11net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_keyAlexey Kodanev
Running LTP 'icmp-uni-basic.sh -6 -p ipcomp -m tunnel' test over openvswitch + veth can trigger kernel panic: BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0 IP: [<ffffffff8169d1d2>] xfrm_input+0x82/0x750 ... [<ffffffff816d472e>] xfrm6_rcv_spi+0x1e/0x20 [<ffffffffa082c3c2>] xfrm6_tunnel_rcv+0x42/0x50 [xfrm6_tunnel] [<ffffffffa082727e>] tunnel6_rcv+0x3e/0x8c [tunnel6] [<ffffffff8169f365>] ip6_input_finish+0xd5/0x430 [<ffffffff8169fc53>] ip6_input+0x33/0x90 [<ffffffff8169f1d5>] ip6_rcv_finish+0xa5/0xb0 ... It seems that tunnel.ip6 can have garbage values and also dereferenced without a proper check, only tunnel.ip4 is being verified. Fix it by adding one more if block for AF_INET6 and initialize tunnel.ip6 with NULL inside xfrm6_rcv_spi() (which is similar to xfrm4_rcv_spi()). Fixes: 049f8e2 ("xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_input") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-10openvswitch: do not ignore netdev errors when creating tunnel vportsMartynas Pumputis
The creation of a tunnel vport (geneve, gre, vxlan) brings up a corresponding netdev, a multi-step operation which can fail. For example, changing a vxlan vport's netdev state to 'up' binds the vport's socket to a UDP port - if the binding fails (e.g. due to the port being in use), the error is currently ignored giving the appearance that the tunnel vport creation completed successfully. Signed-off-by: Martynas Pumputis <martynas@weave.works> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10tipc: fix variable dereference before NULL checkParthasarathy Bhuvaragan
In commit cf6f7e1d5109 ("tipc: dump monitor attributes"), I dereferenced a pointer before checking if its valid. This is reported by static check Smatch as: net/tipc/monitor.c:733 tipc_nl_add_monitor_peer() warn: variable dereferenced before check 'mon' (see line 731) In this commit, we check for a valid monitor before proceeding with any other operation. Fixes: cf6f7e1d5109 ("tipc: dump monitor attributes") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10rps: Inspect PPTP encapsulated by GRE to get flow hashGao Feng
The PPTP is encapsulated by GRE header with that GRE_VERSION bits must contain one. But current GRE RPS needs the GRE_VERSION must be zero. So RPS does not work for PPTP traffic. In my test environment, there are four MIPS cores, and all traffic are passed through by PPTP. As a result, only one core is 100% busy while other three cores are very idle. After this patch, the usage of four cores are balanced well. Signed-off-by: Gao Feng <fgao@ikuai8.com> Reviewed-by: Philip Prindeville <philipp@redfish-solutions.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10net: sched: convert qdisc linked list to hashtableJiri Kosina
Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10net: resolve symbol conflicts with generic hashtable.hJiri Kosina
This is a preparatory patch for converting qdisc linked list into a hashtable. As we'll need to include hashtable.h in netdevice.h, we first have to make sure that this will not introduce symbol conflicts for any of the netdevice.h users. Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>