path: root/net
Age | Commit message | Author
2017-09-28 | packet: only test po->has_vnet_hdr once in packet_snd | Willem de Bruijn
Packet socket option po->has_vnet_hdr can be updated concurrently with other operations if no ring is attached. Do not test the option twice in packet_snd, as the value may change in between calls. A race on setsockopt disable may cause a packet > mtu to be sent without having GSO options set. Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
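The fix follows a read-once pattern: sample the option into a local variable and base every later decision on that snapshot. Below is a minimal userspace sketch of the pattern, assuming a hypothetical `struct model_sock` whose flag another thread may toggle. It is not the kernel's `packet_snd()` code, and the field names are illustrative only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a packet socket: another thread may flip
 * has_vnet_hdr at any time through a setsockopt()-like path. */
struct model_sock {
	atomic_bool has_vnet_hdr;
};

/* Decisions a send path derives from the option (illustrative names). */
struct send_plan {
	bool parse_vnet_hdr; /* read the virtio_net_hdr, may allow len > mtu */
	bool apply_gso;      /* later: attach the GSO options to the packet */
};

/* Load the option exactly once; every later decision uses the local
 * snapshot, so a concurrent toggle cannot make the two disagree. */
static struct send_plan plan_send(struct model_sock *s)
{
	bool has_vnet_hdr = atomic_load(&s->has_vnet_hdr);
	struct send_plan p = {
		.parse_vnet_hdr = has_vnet_hdr,
		.apply_gso = has_vnet_hdr,
	};
	return p;
}
```

Because both fields come from the same load, no interleaving with a concurrent `atomic_store()` can yield a plan that skips the mtu check via the header but then omits the GSO options.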
2017-09-28 | packet: in packet_do_bind, test fanout with bind_lock held | Willem de Bruijn
Once a socket has po->fanout set, it remains a member of the group until it is destroyed. The prot_hook must be constant and identical across sockets in the group. If fanout_add races with packet_do_bind between the test of po->fanout and taking the lock, the bind call may make type or dev inconsistent with that of the fanout group. Hold po->bind_lock when testing po->fanout to avoid this race. I had to introduce artificial delay (local_bh_enable) to actually observe the race. Fixes: dc99f600698d ("packet: Add fanout support.") Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | rtnetlink: rtnl_have_link_slave_info doesn't need rtnl | Florian Westphal
It can be switched to RCU. Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | rtnetlink: add helpers to dump netnsid information | Florian Westphal
Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | rtnetlink: add helpers to dump vf information | Florian Westphal
Similar to earlier patches, split out more parts of this function to better see what is happening and where we assume rtnl is locked. Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | rtnetlink: add helper to put master and link ifindexes | Florian Westphal
rtnl_fill_ifinfo currently requires the caller to hold the rtnl mutex. Unfortunately the function is quite large, which makes it harder to see which spots require the lock, which spots assume it and which ones could do without. Add helpers to factor out the ifindex dumping; one can use RCU to avoid the rtnl dependency. Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | net: dsa: Fix network device registration order | Florian Fainelli
We cannot be registering the network device first, then setting its carrier off and finally connecting it to a PHY; doing that leaves a window during which the carrier is at best inconsistent, and at worst the device is not usable without a down/up sequence, since the network device is visible to user space with possibly no PHY device attached. Re-order steps so that they make logical sense. This fixes some devices where the port was not usable after, e.g., an unbind then bind of the driver. Fixes: 0071f56e46da ("dsa: Register netdev before phy") Fixes: 91da11f870f0 ("net: Distributed Switch Architecture protocol support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | net_sched: use idr to allocate u32 filter handles | Cong Wang
Instead of calling u32_lookup_ht() in a loop to find an unused handle, just switch to the idr API to allocate new handles. u32 filters are special as the handle could contain a hash table id and a key id, so we need two IDRs to allocate each of them. Cc: Chris Mi <chrism@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | net_sched: use idr to allocate basic filter handles | Cong Wang
Instead of calling basic_get() in a loop to find an unused handle, just switch to the idr API to allocate new handles. Cc: Chris Mi <chrism@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | net_sched: use idr to allocate bpf filter handles | Cong Wang
Instead of calling cls_bpf_get() in a loop to find an unused handle, just switch to the idr API to allocate new handles. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Chris Mi <chrism@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
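The three net_sched conversions replace a "probe candidate handles until a lookup misses" loop with a dedicated ID allocator. The kernel IDR is a radix-tree based map that also stores a pointer per ID; the sketch below only models the allocation side with a 64-bit bitmap, assuming GCC or Clang for `__builtin_ctzll()`. `struct model_idr`, `model_idr_alloc()` and `model_idr_remove()` are hypothetical names, not kernel APIs.

```c
#include <stdint.h>

#define MODEL_MAX_IDS 64 /* capacity of this toy allocator */

/* Hypothetical toy allocator: bit n of used is set when ID n is taken. */
struct model_idr {
	uint64_t used;
};

/* Returns the lowest free ID in [start, end), or -1 if none is free.
 * A single bit scan replaces probing candidate IDs one lookup at a time. */
static int model_idr_alloc(struct model_idr *idr, int start, int end)
{
	if (start < 0)
		start = 0;
	if (end > MODEL_MAX_IDS)
		end = MODEL_MAX_IDS;
	if (start >= end)
		return -1;

	uint64_t candidates = ~idr->used;
	candidates &= ~UINT64_C(0) << start;            /* drop IDs below start */
	if (end < MODEL_MAX_IDS)
		candidates &= (UINT64_C(1) << end) - 1; /* drop IDs >= end */
	if (candidates == 0)
		return -1;

	int id = __builtin_ctzll(candidates);           /* lowest free bit */
	idr->used |= UINT64_C(1) << id;
	return id;
}

/* Releases an ID so it can be handed out again. */
static void model_idr_remove(struct model_idr *idr, int id)
{
	if (id >= 0 && id < MODEL_MAX_IDS)
		idr->used &= ~(UINT64_C(1) << id);
}
```

For u32, the commit text describes two allocators, one for hash-table IDs and one for key IDs; in this sketch that would simply be two independent `struct model_idr` instances.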
2017-09-28 | inetpeer: speed up inetpeer_invalidate_tree() | Eric Dumazet
As measured in my prior patch ("sch_netem: faster rb tree removal"), rbtree_postorder_for_each_entry_safe() is nice looking but much slower than using rb_next() directly, except when the tree is small enough to fit in CPU caches (in which case the cost is the same). From: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | inetpeer: fix RCU lookup() again | Eric Dumazet
My prior fix was not complete, as we were dereferencing a pointer three times per node, not twice as I initially thought. Fixes: 4cc5b44b29a9 ("inetpeer: fix RCU lookup()") Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB tree") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 | batman-adv: Fix "line over 80 characters" checkpatch warning | Sven Eckelmann
Fixes: 242c1a28eb61 ("net: Remove useless function skb_header_release") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-09-28 | batman-adv: Remove unnecessary parentheses | Sven Eckelmann
checkpatch introduced with commit 63b7c73ec86b ("checkpatch: add --strict check for ifs with unnecessary parentheses") an additional test which identifies some unnecessary parentheses. Remove these unnecessary parentheses to avoid the warnings and to unify the coding style slightly more. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-09-28 | batman-adv: Start new development cycle | Simon Wunderlich
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-09-28 | xfrm: don't call xfrm_policy_cache_flush under xfrm_state_lock | Artem Savkov
I might be wrong but it doesn't look like xfrm_state_lock is required for xfrm_policy_cache_flush and calling it under this lock triggers both "sleeping function called from invalid context" and "possible circular locking dependency detected" warnings on flush. Fixes: ec30d78c14a8 xfrm: add xdst pcpu cache Signed-off-by: Artem Savkov <asavkov@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-09-27 | net: mroute: Check if rule is a default rule | Yotam Gigi
When the ipmr starts, it adds one default FIB rule that matches all packets and sends them to the DEFAULT (multicast) FIB table. A more complex rule can be added by the user to specify that, for a specific interface, a packet should be looked up either in an arbitrary table or according to the l3mdev of the interface. For drivers that want to offload the ipmr logic into hardware but don't want to offload all the FIB rules functionality, provide a function that can indicate whether the FIB rule is the default multicast rule, in which case only one routing table is needed. This way, a driver can register to the FIB notification chain, get notifications about FIB rules added and trigger some kind of an internal abort mechanism when a non-default rule is added by the user. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 | net: ipmr: Add MFC offload indication | Yotam Gigi
Allow drivers registered to the FIB notification chain to indicate whether a multicast MFC route is offloaded or not, similarly to unicast routes. The indication of whether a route is offloaded is done using the mfc_flags field on an mfc_cache struct, and the information is sent to userspace via the RTNetlink interface only. Currently, MFC routes are either offloaded or not, thus there is no need to add per-VIF offload indication. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 | ipmr: Send FIB notifications on MFC and VIF entries | Yotam Gigi
Use the newly introduced notification chain to send events upon VIF and MFC addition and deletion. The MFC notifications are sent only on resolved MFC entries, as unresolved entries cannot be offloaded. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 | ipmr: Add FIB notification access functions | Yotam Gigi
Make the ipmr module register as a FIB notifier. To do that, implement both the ipmr_seq_read and ipmr_dump ops. The ipmr_seq_read op returns a sequence counter that is incremented on every notification-related operation done by the ipmr. To implement that, add a sequence counter in the netns_ipv4 struct and increment it whenever a new MFC route or VIF is added or deleted. The sequence operations are protected by the RTNL lock. The ipmr_dump op iterates the list of MFC routes and the list of VIF entries and sends notifications about them. The entries are dumped under RCU, and the VIF dump also takes mrt_lock, as the vif->dev field can change under RCU. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 | ipmr: Add reference count to MFC entries | Yotam Gigi
Next commits will introduce MFC notifications through the atomic fib_notification chain, thus allowing modules to be aware of MFC entries. Because modules may need to hold a reference to an MFC entry, add a reference count to MFC entries to prevent them from being freed while these modules use them. Currently, reference counting is done only on resolved MFC entries. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
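The hold/put discipline described for MFC entries can be sketched in userspace with C11 atomics. `struct model_entry` and its helpers are hypothetical stand-ins, not `struct mfc_cache` or the kernel's refcount API; the point is only that the entry is freed when the last holder drops its reference.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted cache entry. */
struct model_entry {
	atomic_int refcount;
};

static struct model_entry *model_entry_create(void)
{
	struct model_entry *e = malloc(sizeof(*e));

	if (e)
		atomic_init(&e->refcount, 1); /* the creator's reference */
	return e;
}

/* A notifier consumer takes its own reference before using the entry. */
static void model_entry_hold(struct model_entry *e)
{
	atomic_fetch_add(&e->refcount, 1);
}

/* Drops one reference and frees the entry when it was the last one.
 * Returns true if this call freed the entry. */
static bool model_entry_put(struct model_entry *e)
{
	if (atomic_fetch_sub(&e->refcount, 1) == 1) {
		free(e);
		return true;
	}
	return false;
}
```

With this scheme, the table owner deleting an entry only drops its own reference; a module still holding one keeps the memory valid until its matching put.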
2017-09-26 | sctp: Fix a big endian bug in sctp_diag_dump() | Dan Carpenter
The sctp_for_each_transport() function takes a pointer to int. The cb->args[] array holds longs, so the callee only accesses the first 32 bits of the long in memory. On little-endian systems those are the low 32 bits, so it works, but on 64-bit big-endian machines they are the high 32 bits, so it breaks. Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
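The sketch below illustrates the byte-order dependence. A hypothetical `model_walk()` stands in for a helper that takes an `int *` cursor. `buggy_advance()` models passing `(int *)&long_slot` with `memcpy()` on the first `sizeof(int)` bytes (avoiding the undefined behaviour of the real cast), and `fixed_advance()` shows the usual fix of going through a real `int` and copying the result back.

```c
#include <string.h>

/* Hypothetical helper that advances an int cursor. */
static void model_walk(int *pos)
{
	*pos += 1;
}

/* Models the bug: the callee's int-sized read and write land on the
 * first sizeof(int) bytes of the long slot, whatever their significance. */
static long buggy_advance(long slot)
{
	int view;

	memcpy(&view, &slot, sizeof(view)); /* callee reads the first bytes */
	model_walk(&view);
	memcpy(&slot, &view, sizeof(view)); /* callee writes the first bytes */
	return slot;
}

/* Fixed pattern: keep the cursor in a real int and convert explicitly. */
static long fixed_advance(long slot)
{
	int pos = (int)slot;

	model_walk(&pos);
	return (long)pos;
}
```

On a little-endian machine both functions return 6 for an input of 5; on a 64-bit big-endian machine only `fixed_advance()` does.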
2017-09-26 | net: dsa: use phy_ethtool_nway_reset | Vivien Didelot
Use phy_ethtool_nway_reset now that dsa_slave_nway_reset does exactly the same. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | net: dsa: use phy_ethtool_set_link_ksettings | Vivien Didelot
Use phy_ethtool_set_link_ksettings now that dsa_slave_set_link_ksettings does exactly the same. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | net: dsa: use phy_ethtool_get_link_ksettings | Vivien Didelot
Use phy_ethtool_get_link_ksettings now that dsa_slave_get_link_ksettings does exactly the same. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | net: dsa: use slave device phydev | Vivien Didelot
There is no need to store a phy_device in dsa_slave_priv since net_device already provides one. Simply s/p->phy/dev->phydev/. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | net: dsa: return -ENODEV if there is no slave PHY | Vivien Didelot
Instead of returning -EOPNOTSUPP when a slave device has no PHY, directly return -ENODEV as ethtool and phylib do. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | bpf: add meta pointer for direct access | Daniel Borkmann
This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data. struct xdp_buff is therefore laid out such that we first point to data_hard_start, then data_meta directly prepended to data, followed by data_end marking the end of the packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room. xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as a DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF, thus having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be a multiple of 4 bytes, up to 32 bytes large. Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1, as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail.
The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from its original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
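The layout and checks described above can be modelled in userspace. The sketch is hypothetical: it represents the four `xdp_buff` pointers as byte offsets into one buffer, and `model_adjust_meta()` and `model_meta_readable()` only mirror the rules stated in the text (metadata sits directly before `data`, stays inside the headroom, is a multiple of 4 bytes and at most 32 bytes; `data_meta == data + 1` marks a driver without support). The error codes are chosen for the sketch and are not claimed to match the kernel helper's return values.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of xdp_buff, using offsets instead of pointers. */
struct model_xdp {
	size_t data_hard_start; /* start of the headroom */
	size_t data_meta;       /* start of metadata, directly before data */
	size_t data;            /* start of packet data */
	size_t data_end;        /* end of packet data */
};

#define MODEL_META_MAX 32 /* maximum metadata size stated in the text */

/* Moves data_meta by offset (negative grows the metadata area). */
static int model_adjust_meta(struct model_xdp *x, long offset)
{
	if (x->data_meta > x->data)
		return -EOPNOTSUPP; /* driver set data_meta = data + 1 */

	long meta = (long)x->data_meta + offset;
	if (meta < (long)x->data_hard_start || meta > (long)x->data)
		return -EINVAL;     /* would leave the headroom or pass data */

	long metalen = (long)x->data - meta;
	if (metalen % 4 != 0 || metalen > MODEL_META_MAX)
		return -EACCES;     /* not a multiple of 4 or larger than 32 */

	x->data_meta = (size_t)meta;
	return 0;
}

/* Bounds check a program performs before reading n metadata bytes;
 * with data_meta == data + 1 it can never succeed. */
static bool model_meta_readable(const struct model_xdp *x, size_t n)
{
	return x->data_meta + n <= x->data;
}
```

For example, growing the area by 8 bytes from `data_meta == data == 256` succeeds and leaves 8 readable bytes, while asking for 6 or 40 bytes is rejected by the size rule.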
2017-09-26 | bpf: rename bpf_compute_data_end into bpf_compute_data_pointers | Daniel Borkmann
Just do the rename into bpf_compute_data_pointers() as we'll add one more pointer here to recompute. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | netfilter: ipset: pernet ops must be unregistered last | Florian Westphal
Removing the ipset module leaves a small window where one cpu performs module removal while another runs a command like 'ipset flush'. ipset uses net_generic(), unregistering the pernet ops frees this storage area. Fix it by first removing the user-visible api handlers and the pernet ops last. Fixes: 1785e8f473082 ("netfiler: ipset: Add net namespace for ipset") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-26 | netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses | Jozsef Kadlecsik
A wrong comparison prevented the hash types from adding a range with more than 2^31 addresses, yet the operation was reported as a success. Fixes Netfilter's bugzilla id #1005, reported by Oleg Serditov and Oliver Ford. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-26 | netfilter: xt_socket: Restore mark from full sockets only | Subash Abhinov Kasiviswanathan
An out-of-bounds error was detected on an ARM64 target with an Android-based kernel 4.9. This occurs while trying to restore the mark on an skb from an inet request socket.

    BUG: KASAN: slab-out-of-bounds in socket_match.isra.2+0xc8/0x1f0 net/netfilter/xt_socket.c:248
    Read of size 4 at addr ffffffc06a8d824c by task syz-fuzzer/1532
    CPU: 7 PID: 1532 Comm: syz-fuzzer Tainted: G W O 4.9.41+ #1
    Call trace:
    [<ffffff900808d2f8>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:76
    [<ffffff900808d760>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226
    [<ffffff90085f7dc8>] __dump_stack lib/dump_stack.c:15 [inline]
    [<ffffff90085f7dc8>] dump_stack+0xe4/0x134 lib/dump_stack.c:51
    [<ffffff900830f358>] print_address_description+0x68/0x258 mm/kasan/report.c:248
    [<ffffff900830f770>] kasan_report_error mm/kasan/report.c:347 [inline]
    [<ffffff900830f770>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371
    [<ffffff900830fdec>] kasan_report+0x5c/0x70 mm/kasan/report.c:372
    [<ffffff900830de98>] check_memory_region_inline mm/kasan/kasan.c:308 [inline]
    [<ffffff900830de98>] __asan_load4+0x88/0xa0 mm/kasan/kasan.c:740
    [<ffffff90097498f8>] socket_match.isra.2+0xc8/0x1f0 net/netfilter/xt_socket.c:248
    [<ffffff9009749a5c>] socket_mt4_v1_v2_v3+0x3c/0x48 net/netfilter/xt_socket.c:272
    [<ffffff90097f7e4c>] ipt_do_table+0x54c/0xad8 net/ipv4/netfilter/ip_tables.c:311
    [<ffffff90097fcf14>] iptable_mangle_hook+0x6c/0x220 net/ipv4/netfilter/iptable_mangle.c:90
    ...
    Allocated by task 1532:
    save_stack_trace_tsk+0x0/0x2a0 arch/arm64/kernel/stacktrace.c:131
    save_stack_trace+0x28/0x38 arch/arm64/kernel/stacktrace.c:215
    save_stack mm/kasan/kasan.c:495 [inline]
    set_track mm/kasan/kasan.c:507 [inline]
    kasan_kmalloc+0xd8/0x188 mm/kasan/kasan.c:599
    kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:537
    slab_post_alloc_hook mm/slab.h:417 [inline]
    slab_alloc_node mm/slub.c:2728 [inline]
    slab_alloc mm/slub.c:2736 [inline]
    kmem_cache_alloc+0x14c/0x2e8 mm/slub.c:2741
    reqsk_alloc include/net/request_sock.h:87 [inline]
    inet_reqsk_alloc+0x4c/0x238 net/ipv4/tcp_input.c:6236
    tcp_conn_request+0x2b0/0xea8 net/ipv4/tcp_input.c:6341
    tcp_v4_conn_request+0xe0/0x100 net/ipv4/tcp_ipv4.c:1256
    tcp_rcv_state_process+0x384/0x18a8 net/ipv4/tcp_input.c:5926
    tcp_v4_do_rcv+0x2f0/0x3e0 net/ipv4/tcp_ipv4.c:1430
    tcp_v4_rcv+0x1278/0x1350 net/ipv4/tcp_ipv4.c:1709
    ip_local_deliver_finish+0x174/0x3e0 net/ipv4/ip_input.c:216

v1->v2: Change socket_mt6_v1_v2_v3() as well as mentioned by Eric
v2->v3: Put the correct fixes tag

Fixes: 01555e74bde5 ("netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-26 | l2tp: fix race condition in l2tp_tunnel_delete | Sabrina Dubroca
If we try to delete the same tunnel twice, the first delete operation does a lookup (l2tp_tunnel_get), finds the tunnel, calls l2tp_tunnel_delete, which queues it for deletion by l2tp_tunnel_del_work. The second delete operation also finds the tunnel and calls l2tp_tunnel_delete. If the workqueue has already fired and started running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the same tunnel a second time, and try to free the socket again. Add a dead flag to prevent firing the workqueue twice. Then we can remove the check of queue_work's result that was meant to prevent that race but doesn't.

Reproducer:

    ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000
    ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000
    ip link set l2tp1 up
    ip l2tp del tunnel tunnel_id 3000
    ip l2tp del tunnel tunnel_id 3000

Fixes: f8ccac0e4493 ("l2tp: put tunnel socket release on a workqueue")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
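A "dead" flag works when it is set with an atomic test-and-set, so exactly one deleter wins. The commit text does not say which primitive the kernel patch uses; this hypothetical sketch uses C11 `atomic_exchange()`, and `teardown_queued` merely counts how often the teardown work would have been queued.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tunnel model, not struct l2tp_tunnel. */
struct model_tunnel {
	atomic_bool dead;
	int teardown_queued; /* stands in for queued teardown work */
};

static void model_tunnel_init(struct model_tunnel *t)
{
	atomic_init(&t->dead, false);
	t->teardown_queued = 0;
}

/* Test-and-set on "dead": only the caller that flips it from false to
 * true queues the teardown; every later caller returns early. */
static bool model_tunnel_delete(struct model_tunnel *t)
{
	if (atomic_exchange(&t->dead, true))
		return false;      /* someone already started deleting */
	t->teardown_queued++;      /* stands in for queue_work() */
	return true;
}
```

Running the reproducer's two `ip l2tp del tunnel` calls against this model queues the teardown once; the second call sees the flag already set and does nothing.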
2017-09-26 | vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit | Alexey Kodanev
When running LTP IPsec tests, KASan might report:

    BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
    Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
    ...
    Call Trace:
    <IRQ>
    dump_stack+0x63/0x89
    print_address_description+0x7c/0x290
    kasan_report+0x28d/0x370
    ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
    __asan_report_load4_noabort+0x19/0x20
    vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
    ? vti_init_net+0x190/0x190 [ip_vti]
    ? save_stack_trace+0x1b/0x20
    ? save_stack+0x46/0xd0
    dev_hard_start_xmit+0x147/0x510
    ? icmp_echo.part.24+0x1f0/0x210
    __dev_queue_xmit+0x1394/0x1c60
    ...
    Freed by task 0:
    save_stack_trace+0x1b/0x20
    save_stack+0x46/0xd0
    kasan_slab_free+0x70/0xc0
    kmem_cache_free+0x81/0x1e0
    kfree_skbmem+0xb1/0xe0
    kfree_skb+0x75/0x170
    kfree_skb_list+0x3e/0x60
    __dev_queue_xmit+0x1298/0x1c60
    dev_queue_xmit+0x10/0x20
    neigh_resolve_output+0x3a8/0x740
    ip_finish_output2+0x5c0/0xe70
    ip_finish_output+0x4ba/0x680
    ip_output+0x1c1/0x3a0
    xfrm_output_resume+0xc65/0x13d0
    xfrm_output+0x1e4/0x380
    xfrm4_output_finish+0x5c/0x70

This can be fixed by reading skb->len before calling dst_output(), since the skb may already have been freed by the time dst_output() returns.

Fixes: b9959fd3b0fa ("vti: switch to new ip tunnel code")
Fixes: 22e1b23dafa8 ("vti6: Support inter address family tunneling.")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
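The fix is an ordering rule: read every field needed afterwards (here the length used for statistics) before handing the buffer to a call that may free it. This userspace sketch is hypothetical: `struct model_buf` stands in for an skb and `model_output()` for `dst_output()`, and it always frees the buffer to make the hazard concrete.

```c
#include <stdlib.h>

/* Hypothetical buffer standing in for an skb. */
struct model_buf {
	size_t len;
	unsigned char *data;
};

static struct model_buf *model_buf_new(size_t len)
{
	struct model_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = calloc(len ? len : 1, 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

/* Stand-in for dst_output(): takes ownership and may free the buffer
 * (this model always does, as on a drop or completion path). */
static int model_output(struct model_buf *b)
{
	free(b->data);
	free(b);
	return 0;
}

/* Reads the length while we still own b, never after handing it off. */
static size_t model_xmit(struct model_buf *b, size_t *tx_bytes)
{
	size_t pkt_len = b->len;   /* safe: b is still ours here */
	int err = model_output(b); /* b must not be touched after this */

	if (err == 0)
		*tx_bytes += pkt_len;
	return pkt_len;
}
```

Reading `b->len` after `model_output()` instead would be exactly the use-after-free pattern the KASAN report shows.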
2017-09-26 | kcm: Remove redundant unlikely() | Tobias Klauser
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | ipv6: Remove redundant unlikely() | Tobias Klauser
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | datagram: Remove redundant unlikely() | Tobias Klauser
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
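Why `unlikely(IS_ERR(p))` is redundant becomes clear from the shape of the error-pointer helpers. Below is a userspace re-creation modelled on the kernel's include/linux/err.h, assuming GCC/Clang `__builtin_expect()`; it is a sketch for illustration, not the kernel header itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-creation; the kernel uses MAX_ERRNO 4095. */
#define MAX_ERRNO 4095
#define unlikely(x) __builtin_expect(!!(x), 0)

/* The branch hint already lives inside IS_ERR_VALUE(). */
#define IS_ERR_VALUE(x) unlikely((uintptr_t)(x) >= (uintptr_t)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error; /* encode -errno in the top page */
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Callers gain nothing from wrapping this in another unlikely(). */
static inline bool IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE(ptr);
}
```

A valid object address, or NULL, falls below the last 4095 addresses and is not an error; `ERR_PTR(-12)` lands in that range and round-trips through `PTR_ERR()`.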
2017-09-26 | netfilter: ipvs: full-functionality option for ECN encapsulation in tunnel | Vadim Fedorenko
IPVS tunnel mode works as a simple tunnel (see RFC 3168), copying the ECN field to the outer header. This results in packet drops on egress tunnels when the egress tunnel operates as ECN-capable with the full-functionality option (like the ip_tunnel and ip6_tunnel kernel modules), according to the RFC 3168 section 9.1.1 recommendation. This patch implements the ECN full-functionality option in the ipvs xmit code. Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Signed-off-by: Vadim Fedorenko <vfedorenko@yandex-team.ru> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
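The difference between the two behaviours fits in a few lines. This sketch assumes the full-functionality rule from RFC 3168 section 9.1.1 as follows: Not-ECT, ECT(0) and ECT(1) are copied to the outer header, but an inner CE is encoded as ECT(0) on the outer header. Check the RFC before relying on it. The function and enum names are hypothetical; the outer TOS bits above the ECN field (the DSCP) are preserved.

```c
#include <stdint.h>

/* ECN codepoints in the low two bits of the TOS/traffic-class byte. */
enum {
	ECN_NOT_ECT = 0x0,
	ECN_ECT_1   = 0x1,
	ECN_ECT_0   = 0x2,
	ECN_CE      = 0x3,
	ECN_MASK    = 0x3,
};

/* Copy behaviour described in the commit: outer ECN = inner ECN. */
static uint8_t ecn_encap_copy(uint8_t outer_tos, uint8_t inner_tos)
{
	return (uint8_t)((outer_tos & ~ECN_MASK) | (inner_tos & ECN_MASK));
}

/* Full-functionality sketch: copy ECT/Not-ECT, map inner CE to ECT(0). */
static uint8_t ecn_encap_full(uint8_t outer_tos, uint8_t inner_tos)
{
	uint8_t inner_ecn = inner_tos & ECN_MASK;
	uint8_t outer_ecn = (inner_ecn == ECN_CE) ? ECN_ECT_0 : inner_ecn;

	return (uint8_t)((outer_tos & ~ECN_MASK) | outer_ecn);
}
```

For an outer TOS of 0xb8 (DSCP EF) and an inner CE mark, the copy variant yields 0xbb while the full-functionality variant yields 0xba.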
2017-09-25 | neigh: make struct neigh_table::entry_size unsigned int | Alexey Dobriyan
Key length can't be negative. Leave comparisons against nla_len() signed just in case a truncated attribute can sneak in there.

Space savings:

    add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-7 (-7)
    function                          old   new  delta
    pneigh_delete                     273   272     -1
    mlx5e_rep_netevent_event         1415  1414     -1
    mlx5e_create_encap_header_ipv6   1194  1193     -1
    mlx5e_create_encap_header_ipv4   1071  1070     -1
    cxgb4_l2t_get                    1104  1103     -1
    __pneigh_lookup                    69    68     -1
    __neigh_create                   2452  2451     -1

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | net: speed up skb_rbtree_purge() | Eric Dumazet
As measured in my prior patch ("sch_netem: faster rb tree removal"), rbtree_postorder_for_each_entry_safe() is nice looking but much slower than using rb_next() directly, except when the tree is small enough to fit in CPU caches (in which case the cost is the same).

Also note that there is not even an increase of text size:

    $ size net/core/skbuff.o.before net/core/skbuff.o
       text    data     bss     dec     hex filename
      40711    1298       0   42009    a419 net/core/skbuff.o.before
      40711    1298       0   42009    a419 net/core/skbuff.o

From: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | sch_netem: faster rb tree removal | Eric Dumazet
While running TCP tests involving netem storing millions of packets, I had the idea to speed up tfifo_reset() and did experiments.

I tried the rbtree_postorder_for_each_entry_safe() method that is used in skb_rbtree_purge() but discovered it was slower than the current tfifo_reset() method.

I measured the time taken to release skbs with three occupation levels (10^4, 10^5 and 10^6 skbs) and three methods:

1) (current 'naive' method)

    while ((p = rb_first(&q->t_root))) {
        struct sk_buff *skb = netem_rb_to_skb(p);

        rb_erase(p, &q->t_root);
        rtnl_kfree_skbs(skb, skb);
    }

2) Use rb_next() instead of rb_first() in the loop:

    p = rb_first(&q->t_root);
    while (p) {
        struct sk_buff *skb = netem_rb_to_skb(p);

        p = rb_next(p);
        rb_erase(&skb->rbnode, &q->t_root);
        rtnl_kfree_skbs(skb, skb);
    }

3) "optimized" method using rbtree_postorder_for_each_entry_safe()

    struct sk_buff *skb, *next;

    rbtree_postorder_for_each_entry_safe(skb, next, &q->t_root, rbnode) {
        rtnl_kfree_skbs(skb, skb);
    }
    q->t_root = RB_ROOT;

Results:

    method_1:while (rb_first()) rb_erase() 10000 skbs in 690378 ns (69 ns per skb)
    method_2:rb_first; while (p) { p = rb_next(p); ...} 10000 skbs in 541846 ns (54 ns per skb)
    method_3:rbtree_postorder_for_each_entry_safe() 10000 skbs in 868307 ns (86 ns per skb)

    method_1:while (rb_first()) rb_erase() 99996 skbs in 7804021 ns (78 ns per skb)
    method_2:rb_first; while (p) { p = rb_next(p); ...} 100000 skbs in 5942456 ns (59 ns per skb)
    method_3:rbtree_postorder_for_each_entry_safe() 100000 skbs in 11584940 ns (115 ns per skb)

    method_1:while (rb_first()) rb_erase() 1000000 skbs in 108577838 ns (108 ns per skb)
    method_2:rb_first; while (p) { p = rb_next(p); ...} 1000000 skbs in 82619635 ns (82 ns per skb)
    method_3:rbtree_postorder_for_each_entry_safe() 1000000 skbs in 127328743 ns (127 ns per skb)

Method 2) is simply faster, probably because it maintains a smaller working set. Note that this is the method we use in tcp_ofo_queue() already.
I will also change skb_rbtree_purge() in a second patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
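The per-skb figures in the results are consistent with plain integer division of the total time by the skb count. The helpers below (hypothetical names) reproduce that arithmetic and compute the relative time saved by one method over another.

```c
#include <stdint.h>

/* Nanoseconds per skb, truncated like the reported figures. */
static uint64_t ns_per_skb(uint64_t total_ns, uint64_t skbs)
{
	return skbs ? total_ns / skbs : 0;
}

/* Percentage of time method b saves relative to method a
 * (negative if b is slower). */
static double percent_saved(uint64_t a_ns, uint64_t b_ns)
{
	if (a_ns == 0)
		return 0.0;
	return 100.0 * ((double)a_ns - (double)b_ns) / (double)a_ns;
}
```

For the 10^6 run, `percent_saved(108577838, 82619635)` is about 24% (method 2 versus method 1) and `percent_saved(127328743, 82619635)` about 35% (method 2 versus method 3).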
2017-09-25 | net: dsa: add port enable and disable helpers | Vivien Didelot
Provide dsa_port_enable and dsa_port_disable helpers to respectively enable and disable a switch port. This makes the dsa_port_set_state_now helper static. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | net: dsa: make slave close symmetrical to open | Vivien Didelot
The DSA slave open function configures the unicast MAC addresses on the master device, enables the switch port, changes its STP state, then starts the PHY device. Make the close function symmetric by first stopping the PHY device, then changing the STP state, disabling the switch port and restoring the master device. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | l2tp: fix race between l2tp_session_delete() and l2tp_tunnel_closeall() | Guillaume Nault
There are several ways to remove L2TP sessions:

 * deleting a session explicitly using the netlink interface (with L2TP_CMD_SESSION_DELETE),
 * deleting the session's parent tunnel (either by closing the tunnel's file descriptor or using the netlink interface),
 * closing the PPPOL2TP file descriptor of a PPP pseudo-wire.

In some cases, when these methods are used concurrently on the same session, the session can be removed twice, leading to use-after-free bugs.

This patch adds a 'dead' flag, used by l2tp_session_delete() and l2tp_tunnel_closeall() to prevent them from stepping on each other's toes.

The session deletion path used when closing a PPPOL2TP file descriptor doesn't need to be adapted. It already has to ensure that a session remains valid for the lifetime of its PPPOL2TP file descriptor. So it takes an extra reference on the session in the ->session_close() callback (pppol2tp_session_close()), which is eventually dropped in the ->sk_destruct() callback of the PPPOL2TP socket (pppol2tp_session_destruct()).

Still, __l2tp_session_unhash() and l2tp_session_queue_purge() can be called twice and even concurrently for a given session, but thanks to proper locking and re-initialisation of list fields, this is not an issue.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | l2tp: ensure sessions are freed after their PPPOL2TP socket | Guillaume Nault
If l2tp_tunnel_delete() or l2tp_tunnel_closeall() deletes a session right after pppol2tp_release() orphaned its socket, then the 'sock' variable of the pppol2tp_session_close() callback is NULL. Yet the session is still used by pppol2tp_release(). Therefore we need to take an extra reference in any case, to prevent l2tp_tunnel_delete() or l2tp_tunnel_closeall() from freeing the session. Since the pppol2tp_session_close() callback is only set if the session is associated with a PPPOL2TP socket, and both l2tp_tunnel_delete() and l2tp_tunnel_closeall() hold the PPPOL2TP socket before calling pppol2tp_session_close(), we're sure that pppol2tp_session_close() and pppol2tp_session_destruct() are paired and called in the right order. So the reference taken by the former will be released by the latter. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | net: nfc: llcp_core: use setup_timer() helper. | Allen Pais
Use the setup_timer() function instead of initializing the timer's function and data fields directly. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | net: nfc: hci: llc_shdlc: use setup_timer() helper. | Allen Pais
Use the setup_timer() function instead of initializing the timer's function and data fields directly. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | net: nfc: hci: use setup_timer() helper. | Allen Pais
Use the setup_timer() function instead of initializing the timer's function and data fields directly. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | net: af_packet: use setup_timer() helper. | Allen Pais
Use the setup_timer() function instead of initializing the timer's function and data fields directly. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 | IB: Correct MR length field to be 64-bit | Parav Pandit
The ib_mr->length field represents the length of the MR in bytes as per the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION). Currently the ib_mr->length field is defined as only a 32-bit field. This might result in truncation and failed WRs for consumers that register memory regions larger than 4GB and whose WRs access such MRs. This patch makes the length 64-bit to avoid such truncation. Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Faisal Latif <faisal.latif@intel.com> Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
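The truncation is ordinary modular narrowing: storing a byte count in a 32-bit field keeps only its low 32 bits. The structs below are hypothetical stand-ins, not `struct ib_mr`. They show a 5 GiB region being recorded as 1 GiB, and a 4 GiB region as 0, in the 32-bit case.

```c
#include <stdint.h>

/* Hypothetical length fields of two widths. */
struct mr_len32 { uint32_t length; };
struct mr_len64 { uint64_t length; };

/* What a 32-bit field ends up holding for a given byte count. */
static uint64_t stored_len32(uint64_t bytes)
{
	struct mr_len32 mr = { .length = (uint32_t)bytes }; /* keeps bytes mod 2^32 */
	return mr.length;
}

/* A 64-bit field keeps the full byte count. */
static uint64_t stored_len64(uint64_t bytes)
{
	struct mr_len64 mr = { .length = bytes };
	return mr.length;
}
```

Any WR whose range relies on the truncated length would then be checked against far too small a region, which is how the failures described above arise.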