summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-02-02netfilter: nf_tables: Eliminate duplicated code in nf_tables_table_enable()Feng
If something fails in nf_tables_table_enable(), it unregisters the chains. But the rollback code is the same as nf_tables_table_disable() almostly, except there is one counter check. Now create one wrapper function to eliminate the duplicated codes. Signed-off-by: Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix handling of interrupt status in stmmac driver. Just because we have masked the event from generating interrupts, doesn't mean the bit won't still be set in the interrupt status register. From Alexey Brodkin. 2) Fix DMA API debugging splats in gianfar driver, from Arseny Solokha. 3) Fix off-by-one error in __ip6_append_data(), from Vlad Yasevich. 4) cls_flow does not match on icmpv6 codes properly, from Simon Horman. 5) Initial MAC address can be set incorrectly in some scenerios, from Ivan Vecera. 6) Packet header pointer arithmetic fix in ip6_tnl_parse_tlv_end_lim(), from Dan Carpenter. 7) Fix divide by zero in __tcp_select_window(), from Eric Dumazet. 8) Fix crash in iwlwifi when unregistering thermal zone, from Jens Axboe. 9) Check for DMA mapping errors in starfire driver, from Alexey Khoroshilov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits) tcp: fix 0 divide in __tcp_select_window() ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() net: fix ndo_features_check/ndo_fix_features comment ordering net/sched: matchall: Fix configuration race be2net: fix initial MAC setting ipv6: fix flow labels when the traffic class is non-0 net: thunderx: avoid dereferencing xcv when NULL net/sched: cls_flower: Correct matching on ICMPv6 code ipv6: Paritially checksum full MTU frames net/mlx4_core: Avoid command timeouts during VF driver device shutdown gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page net: ethtool: add support for 2500BaseT and 5000BaseT link modes can: bcm: fix hrtimer/tasklet termination in bcm op removal net: adaptec: starfire: add checks for dma mapping errors net: phy: micrel: KSZ8795 do not set SUPPORTED_[Asym_]Pause can: Fix kernel panic at security_sock_rcv_skb net: macb: Fix 64 bit addressing support for GEM stmmac: Discard masked flags in interrupt status register net/mlx5e: Check ets capability before ets query FW command net/mlx5e: Fix update of hash function/key via ethtool ...
2017-02-01net/sched: act_psample: Remove unnecessary ASSERT_RTNLYotam Gigi
The ASSERT_RTNL is not necessary in the init function, as it does not touch any rtnl protected structures, as opposed to the mirred action which does have to hold a net device. Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01net/sched: act_sample: Fix error path in initYotam Gigi
Fix error path of in sample init, by releasing the tc hash in case of failure in psample_group creation. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01tcp: fix 0 divide in __tcp_select_window()Eric Dumazet
syszkaller fuzzer was able to trigger a divide by zero, when TCP window scaling is not enabled. SO_RCVBUF can be used not only to increase sk_rcvbuf, also to decrease it below current receive buffers utilization. If mss is negative or 0, just return a zero TCP window. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()Dan Carpenter
Casting is a high precedence operation but "off" and "i" are in terms of bytes so we need to have some parenthesis here. Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01net: ipv6: add NLM_F_APPEND in notifications when applicableDavid Ahern
IPv6 does not set the NLM_F_APPEND flag in notifications to signal that a NEWROUTE is an append versus a new route or a replaced one. Add the flag if the request has it. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01net: reduce skb_warn_bad_offload() noiseEric Dumazet
Dmitry reported warnings occurring in __skb_gso_segment() [1] All SKB_GSO_DODGY producers can allow user space to feed packets that trigger the current check. We could prevent them from doing so, rejecting packets, but this might add regressions to existing programs. It turns out our SKB_GSO_DODGY handlers properly set up checksum information that is needed anyway when packets needs to be segmented. By checking again skb_needs_check() after skb_mac_gso_segment(), we should remove these pesky warnings, at a very minor cost. With help from Willem de Bruijn [1] WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434 lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1 ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20 Call Trace: [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179 [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542 [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565 [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434 [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706 [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline] [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969 [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383 [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424 [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline] [<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955 [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline] [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631 [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954 [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988 [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline] [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995 [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01net/sched: matchall: Fix configuration raceYotam Gigi
In the current version, the matchall internal state is split into two structs: cls_matchall_head and cls_matchall_filter. This makes little sense, as matchall instance supports only one filter, and there is no situation where one exists and the other does not. In addition, that led to some races when filter was deleted while packet was processed. Unify that two structs into one, thus simplifying the process of matchall creation and deletion. As a result, the new, delete and get callbacks have a dummy implementation where all the work is done in destroy and change callbacks, as was done in cls_cgroup. Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01rtnetlink: Handle IFLA_MASTER parameter when processing rtnl_newlinkTheuns Verwoerd
Allow a master interface to be specified as one of the parameters when creating a new interface via rtnl_newlink. Previously this would require invoking interface creation, waiting for it to complete, and then separately binding that new interface to a master. In particular, this is used when creating a macvlan child interface for VRRP in a VRF configuration, allowing the interface creator to specify directly what master interface should be inherited by the child, without having to deal with asynchronous complications and potential race conditions. Signed-off-by: Theuns Verwoerd <theuns.verwoerd@alliedtelesis.co.nz> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-02-01 1) Some typo fixes, from Alexander Alemayhu. 2) Don't acquire state lock in get_mtu functions. The only rece against a dead state does not matter. From Florian Westphal. 3) Remove xfrm4_state_fini, it is unused for more than 10 years. From Florian Westphal. 4) Various rcu usage improvements. From Florian Westphal. 5) Properly handle crypto arrors in ah4/ah6. From Gilad Ben-Yossef. 6) Try to avoid skb linearization in esp4 and esp6. 7) The esp trailer is now set up in different places, add a helper for this. 8) With the upcomming usage of gro_cells in IPsec, a gro merged skb can have a secpath. Drop it before freeing or reusing the skb. 9) Add a xfrm dummy network device for napi. With this we can use gro_cells from within xfrm, it allows IPsec GRO without impact on the generic networking code. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-31net: ethtool: convert large order kmalloc allocations to vzallocAlexei Starovoitov
under memory pressure 'ethtool -S' command may warn: [ 2374.385195] ethtool: page allocation failure: order:4, mode:0x242c0c0 [ 2374.405573] CPU: 12 PID: 40211 Comm: ethtool Not tainted [ 2374.423071] Call Trace: [ 2374.423076] [<ffffffff8148cb29>] dump_stack+0x4d/0x64 [ 2374.423080] [<ffffffff811667cb>] warn_alloc_failed+0xeb/0x150 [ 2374.423082] [<ffffffff81169cd3>] ? __alloc_pages_direct_compact+0x43/0xf0 [ 2374.423084] [<ffffffff8116a25c>] __alloc_pages_nodemask+0x4dc/0xbf0 [ 2374.423091] [<ffffffffa0023dc2>] ? cmd_exec+0x722/0xcd0 [mlx5_core] [ 2374.423095] [<ffffffff811b3dcc>] alloc_pages_current+0x8c/0x110 [ 2374.423097] [<ffffffff81168859>] alloc_kmem_pages+0x19/0x90 [ 2374.423099] [<ffffffff81186e5e>] kmalloc_order_trace+0x2e/0xe0 [ 2374.423101] [<ffffffff811c0084>] __kmalloc+0x204/0x220 [ 2374.423105] [<ffffffff816c269e>] dev_ethtool+0xe4e/0x1f80 [ 2374.423106] [<ffffffff816b967e>] ? dev_get_by_name_rcu+0x5e/0x80 [ 2374.423108] [<ffffffff816d6926>] dev_ioctl+0x156/0x560 [ 2374.423111] [<ffffffff811d4c68>] ? mem_cgroup_commit_charge+0x78/0x3c0 [ 2374.423117] [<ffffffff8169d542>] sock_do_ioctl+0x42/0x50 [ 2374.423119] [<ffffffff8169d9c3>] sock_ioctl+0x1b3/0x250 [ 2374.423121] [<ffffffff811f0f42>] do_vfs_ioctl+0x92/0x580 [ 2374.423123] [<ffffffff8100222b>] ? do_audit_syscall_entry+0x4b/0x70 [ 2374.423124] [<ffffffff8100287c>] ? syscall_trace_enter_phase1+0xfc/0x120 [ 2374.423126] [<ffffffff811f14a9>] SyS_ioctl+0x79/0x90 [ 2374.423127] [<ffffffff81002bb0>] do_syscall_64+0x50/0xa0 [ 2374.423129] [<ffffffff817e19bc>] entry_SYSCALL64_slow_path+0x25/0x25 ~1160 mlx5 counters ~= order 4 allocation which is unlikely to succeed under memory pressure. Convert them to vzalloc() as ethtool_get_regs() does. Also take care of drivers without counters similar to commit 67ae7cf1eeda ("ethtool: Allow zero-length register dumps again") and reduce warn_on to warn_on_once. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-31svcrpc: fix oops in absence of krb5 moduleJ. Bruce Fields
Olga Kornievskaia says: "I ran into this oops in the nfsd (below) (4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try to mount the server with krb5 where the server doesn't have the rpcsec_gss_krb5 module built." The problem is that rsci.cred is copied from a svc_cred structure that gss_proxy didn't properly initialize. Fix that. [120408.542387] general protection fault: 0000 [#1] SMP ... [120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16 [120408.567037] Hardware name: VMware, Inc. VMware Virtual = Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000 [120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss] ... [120408.584946] ? rsc_free+0x55/0x90 [auth_rpcgss] [120408.585901] gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss] [120408.587017] svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss] [120408.588257] ? __enqueue_entity+0x6c/0x70 [120408.589101] svcauth_gss_accept+0x391/0xb90 [auth_rpcgss] [120408.590212] ? try_to_wake_up+0x4a/0x360 [120408.591036] ? wake_up_process+0x15/0x20 [120408.592093] ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc] [120408.593177] svc_authenticate+0xe1/0x100 [sunrpc] [120408.594168] svc_process_common+0x203/0x710 [sunrpc] [120408.595220] svc_process+0x105/0x1c0 [sunrpc] [120408.596278] nfsd+0xe9/0x160 [nfsd] [120408.597060] kthread+0x101/0x140 [120408.597734] ? nfsd_destroy+0x60/0x60 [nfsd] [120408.598626] ? kthread_park+0x90/0x90 [120408.599448] ret_from_fork+0x22/0x30 Fixes: 1d658336b05f "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth" Cc: stable@vger.kernel.org Cc: Simo Sorce <simo@redhat.com> Reported-by: Olga Kornievskaia <kolga@netapp.com> Tested-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-30net/sched: cls_flower: Correct matching on ICMPv6 codeSimon Horman
When matching on the ICMPv6 code ICMPV6_CODE rather than ICMPV4_CODE attributes should be used. This corrects what appears to be a typo. Sample usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ipv6 parent ffff: flower \ indev eth0 ip_proto icmpv6 type 128 code 0 action drop Without this change the code parameter above is effectively ignored. Fixes: 7b684884fbfa ("net/sched: cls_flower: Support matching on ICMP type and code") Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30Merge tag 'linux-can-fixes-for-4.10-20170130' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-01-30 this is a pull request of one patch. The patch is by Oliver Hartkopp and fixes the hrtimer/tasklet termination in bcm op removal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30smc: some potential use after free bugsDan Carpenter
Say we got really unlucky and these failed on the last iteration, then it could lead to a use after free bug. Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net: dsa: Add plumbing for port mirroringFlorian Fainelli
Add necessary plumbing at the slave network device level to have switch drivers implement ndo_setup_tc() and most particularly the cls_matchall classifier. We add support for two switch operations: port_add_mirror and port_del_mirror() which configure, on a per-port basis the mirror parameters requested from the cls_matchall classifier. Code is largely borrowed from the Mellanox Spectrum switch driver. Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30ipv6: Paritially checksum full MTU framesVlad Yasevich
IPv6 will mark data that is smaller that mtu - headersize as CHECKSUM_PARTIAL, but if the data will completely fill the mtu, the packet checksum will be computed in software instead. Extend the conditional to include the data that fills the mtu as well. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30lwtunnel: remove device arg to lwtunnel_build_stateDavid Ahern
Nothing about lwt state requires a device reference, so remove the input argument. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net: Avoid receiving packets with an l3mdev on unbound UDP socketsRobert Shearman
Packets arriving in a VRF currently are delivered to UDP sockets that aren't bound to any interface. TCP defaults to not delivering packets arriving in a VRF to unbound sockets. IP route lookup and socket transmit both assume that unbound means using the default table and UDP applications that haven't been changed to be aware of VRFs may not function correctly in this case since they may not be able to handle overlapping IP address ranges, or be able to send packets back to the original sender if required. So add a sysctl, udp_l3mdev_accept, to control this behaviour with it being analgous to the existing tcp_l3mdev_accept, namely to allow a process to have a VRF-global listen socket. Have this default to off as this is the behaviour that users will expect, given that there is no explicit mechanism to set unmodified VRF-unaware application into a default VRF. Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net: dsa: Hook {get,set}_rxnfc ethtool operationsFlorian Fainelli
In preparation for adding support for CFP/TCAMP in the bcm_sf2 driver add the plumbing to call into driver specific {get,set}_rxnfc operations. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30can: bcm: fix hrtimer/tasklet termination in bcm op removalOliver Hartkopp
When removing a bcm tx operation either a hrtimer or a tasklet might run. As the hrtimer triggers its associated tasklet and vice versa we need to take care to mutually terminate both handlers. Reported-by: Michael Josenhans <michael.josenhans@web.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Michael Josenhans <michael.josenhans@web.de> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-01-30xfrm: Add a dummy network device for napi.Steffen Klassert
This patch adds a dummy network device so that we can use gro_cells for IPsec GRO. With this, we handle IPsec GRO with no impact on the generic networking code. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-30net: Drop secpath on free after gro merge.Steffen Klassert
With a followup patch, a gro merged skb can have a secpath. So drop it before freeing or reusing the skb. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-29net: add devm version of alloc_etherdev_mqs functionRafał Miłecki
This patch adds devm_alloc_etherdev_mqs function and devm_alloc_etherdev macro. These can be used for simpler netdev allocation without having to care about calling free_netdev. Thanks to this change drivers, their error paths and removal paths may get simpler by a bit. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29Merge tag 'batadv-next-for-davem-20170128' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are two fixes for batman-adv for net-next: - fix double call of dev_queue_xmit(), caused by the recent introduction of net_xmit_eval(), by Sven Eckelmann - Fix includes for IS_ERR/ERR_PTR, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29tcp: include locally failed retries in retransmission statsYuchung Cheng
Currently the retransmission stats are not incremented if the retransmit fails locally. But we always increment the other packet counters that track total packet/bytes sent. Awkwardly while we don't count these failed retransmits in RETRANSSEGS, we do count them in FAILEDRETRANS. If the qdisc is dropping many packets this could under-estimate TCP retransmission rate substantially from both SNMP or per-socket TCP_INFO stats. This patch changes this by always incrementing retransmission stats on retransmission attempts and failures. Another motivation is to properly track retransmists in SCM_TIMESTAMPING_OPT_STATS. Since SCM_TSTAMP_SCHED collection is triggered in tcp_transmit_skb(), If tp->total_retrans is incremented after the function, we'll always mis-count by the amount of the latest retransmission. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29tcp: record pkts sent and retransmisttedYuchung Cheng
Add two stats in SCM_TIMESTAMPING_OPT_STATS: TCP_NLA_DATA_SEGS_OUT: total data packets sent including retransmission TCP_NLA_TOTAL_RETRANS: total data packets retransmitted The names are picked to be consistent with corresponding fields in TCP_INFO. This allows applications that are using the timestamping API to measure latency stats to also retrive retransmission rate of application write. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29openvswitch: Simplify do_execute_actions().andy zhou
do_execute_actions() implements a worthwhile optimization: in case an output action is the last action in an action list, skb_clone() can be avoided by outputing the current skb. However, the implementation is more complicated than necessary. This patch simplify this logic. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29net: dsa: pass bridge device when a port leavesVivien Didelot
Upon reception of the NETDEV_CHANGEUPPER, a leaving port is already unbridged, so reflect this by assigning the port's bridge_dev pointer to NULL before calling the port_bridge_leave DSA driver operation. Now that the bridge_dev pointer is exposed to the drivers, reflecting the current state of the DSA switch fabric is necessary for the drivers to adjust their port based VLANs correctly. Pass the bridge device pointer to the port_bridge_leave operation so that drivers have all information to re-program their chips properly, and do not need to cache it anymore. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29net: dsa: move bridge device in dsa_portVivien Didelot
Move the bridge_dev pointer from dsa_slave_priv to dsa_port so that DSA drivers can access this information and remove the need to cache it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29net: dsa: store a dsa_port in dsa_slave_privVivien Didelot
Store a pointer to the dsa_port structure in the dsa_slave_priv structure, instead of the switch/port index. This will allow to store more information such as the bridge device, needed in DSA drivers for multi-chip configuration. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29net: dsa: add ds and index to dsa_portVivien Didelot
Add the physical switch instance and port index a DSA port belongs to to the dsa_port structure. That can be used later to retrieve information about a physical port when configuring a switch fabric, or lighten up struct dsa_slave_priv. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29net: dsa: use ds->num_ports when possibleVivien Didelot
The dsa_switch structure contains the number of ports. Use it where the structure is valid instead of the DSA_MAX_PORTS value. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29net: dsa: variable number of portsVivien Didelot
Change the ports[DSA_MAX_PORTS] array of the dsa_switch structure for a zero-length array, allocated at the same time as the dsa_switch structure itself. A dsa_switch_alloc() helper is provided for that. This commit brings no functional change yet since we pass DSA_MAX_PORTS as the number of ports for the moment. Future patches can update the DSA drivers separately to support dynamic number of ports. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29can: Fix kernel panic at security_sock_rcv_skbEric Dumazet
Zhang Yanmin reported crashes [1] and provided a patch adding a synchronize_rcu() call in can_rx_unregister() The main problem seems that the sockets themselves are not RCU protected. If CAN uses RCU for delivery, then sockets should be freed only after one RCU grace period. Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's ease stable backports with the following fix instead. [1] BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0 Call Trace: <IRQ> [<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60 [<ffffffff81d55771>] sk_filter+0x41/0x210 [<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0 [<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0 [<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370 [<ffffffff81f07af9>] can_receive+0xd9/0x120 [<ffffffff81f07beb>] can_rcv+0xab/0x100 [<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0 [<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0 [<ffffffff81d37f67>] process_backlog+0x127/0x280 [<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0 [<ffffffff810c88d4>] __do_softirq+0x184/0x440 [<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40 [<ffffffff810c8bed>] do_softirq+0x1d/0x20 [<ffffffff81d30085>] netif_rx_ni+0xe5/0x110 [<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520 [<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230 [<ffffffff810e3baf>] process_one_work+0x24f/0x670 [<ffffffff810e44ed>] worker_thread+0x9d/0x6f0 [<ffffffff810e4450>] ? rescuer_thread+0x480/0x480 [<ffffffff810ebafc>] kthread+0x12c/0x150 [<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70 Reported-by: Zhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28Merge tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Stable patches: - NFSv4.1: Fix a deadlock in layoutget - NFSv4 must not bump sequence ids on NFS4ERR_MOVED errors - NFSv4 Fix a regression with OPEN EXCLUSIVE4 mode - Fix a memory leak when removing the SUNRPC module Bugfixes: - Fix a reference leak in _pnfs_return_layout" * tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS: Fix a reference leak in _pnfs_return_layout nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED" SUNRPC: cleanup ida information when removing sunrpc module NFSv4.0: always send mode in SETATTR after EXCLUSIVE4 nfs: Don't increment lock sequence ID after NFS4ERR_MOVED NFSv4.1: Fix a deadlock in layoutget
2017-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two trivial overlapping changes conflicts in MPLS and mlx5. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28batman-adv: Fix includes for IS_ERR/ERR_PTRSven Eckelmann
IS_ERR/ERR_PTR are not defined in linux/device.h but in linux/err.h. The files using these macros therefore have to include the correct one. Reported-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-28batman-adv: Fix double call of dev_queue_xmitSven Eckelmann
The net_xmit_eval has side effects because it is not making sure that e isn't evaluated twice. #define net_xmit_eval(e) ((e) == NET_XMIT_CN ? 0 : (e)) The code requested by David Miller [1] return net_xmit_eval(dev_queue_xmit(skb)); will get transformed into return ((dev_queue_xmit(skb)) == NET_XMIT_CN ? 0 : (dev_queue_xmit(skb))) dev_queue_xmit will therefore be tried again (with an already consumed skb) whenever the return code is not NET_XMIT_CN. [1] https://lkml.kernel.org/r/20170125.225624.965229145391320056.davem@davemloft.net Fixes: c33705188c49 ("batman-adv: Treat NET_XMIT_CN as transmit successfully") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP DF on transmit). 2) Netfilter needs to reflect the fwmark when sending resets, from Pau Espin Pedrol. 3) nftable dump OOPS fix from Liping Zhang. 4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit, from Rolf Neugebauer. 5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd Bergmann. 6) Fix regression in handling of NETIF_F_SG in harmonize_features(), from Eric Dumazet. 7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern. 8) tcp_fastopen_create_child() needs to setup tp->max_window, from Alexey Kodanev. 9) Missing kmemdup() failure check in ipv6 segment routing code, from Eric Dumazet. 10) Don't execute unix_bind() under the bindlock, otherwise we deadlock with splice. From WANG Cong. 11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer, therefore callers must reload cached header pointers into that skb. Fix from Eric Dumazet. 12) Fix various bugs in legacy IRQ fallback handling in alx driver, from Tobias Regnery. 13) Do not allow lwtunnel drivers to be unloaded while they are referenced by active instances, from Robert Shearman. 14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven. 15) Fix a few regressions from virtio_net XDP support, from John Fastabend and Jakub Kicinski. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits) ISDN: eicon: silence misleading array-bounds warning net: phy: micrel: add support for KSZ8795 gtp: fix cross netns recv on gtp socket gtp: clear DF bit on GTP packet tx gtp: add genl family modules alias tcp: don't annotate mark on control socket from tcp_v6_send_response() ravb: unmap descriptors when freeing rings virtio_net: reject XDP programs using header adjustment virtio_net: use dev_kfree_skb for small buffer XDP receive r8152: check rx after napi is enabled r8152: re-schedule napi for tx r8152: avoid start_xmit to schedule napi when napi is disabled r8152: avoid start_xmit to call napi_schedule during autosuspend net: dsa: Bring back device detaching in dsa_slave_suspend() net: phy: leds: Fix truncated LED trigger names net: phy: leds: Break dependency of phy.h on phy_led_triggers.h net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash net-next: ethernet: mediatek: change the compatible string Documentation: devicetree: change the mediatek ethernet compatible string bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status(). ...
2017-01-27net: adjust skb->truesize in pskb_expand_head()Eric Dumazet
Slava Shwartsman reported a warning in skb_try_coalesce(), when we detect skb->truesize is completely wrong. In his case, issue came from IPv6 reassembly coping with malicious datagrams, that forced various pskb_may_pull() to reallocate a bigger skb->head than the one allocated by NIC driver before entering GRO layer. Current code does not change skb->truesize, leaving this burden to callers if they care enough. Blindly changing skb->truesize in pskb_expand_head() is not easy, as some producers might track skb->truesize, for example in xmit path for back pressure feedback (sk->sk_wmem_alloc) We can detect the cases where it should be safe to change skb->truesize : 1) skb is not attached to a socket. 2) If it is attached to a socket, destructor is sock_edemux() My audit gave only two callers doing their own skb->truesize manipulation. I had to remove skb parameter in sock_edemux macro when CONFIG_INET is not set to avoid a compile error. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27tcp: don't annotate mark on control socket from tcp_v6_send_response()Pablo Neira
Unlike ipv4, this control socket is shared by all cpus so we cannot use it as scratchpad area to annotate the mark that we pass to ip6_xmit(). Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket family caches the flowi6 structure in the sctp_transport structure, so we cannot use to carry the mark unless we later on reset it back, which I discarded since it looks ugly to me. Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27net/ipv6: support more tunnel interfaces for EUI64 link-local generationFelix Jia
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27net/ipv6: allow sysctl to change link-local address generation modeFelix Jia
The address generation mode for IPv6 link-local can only be configured by netlink messages. This patch adds the ability to change the address generation mode via sysctl. v1 -> v2 Removed the rtnl lock and switch to use RCU lock to iterate through the netdev list. v2 -> v3 Removed the addrgenmode variable from the idev structure and use the systcl storage for the flag. Simplifed the logic for sysctl handling by removing the supported for all operation. Added support for more types of tunnel interfaces for link-local address generation. Based the patches from net-next. v3 -> v4 Removed unnecessary whitespace changes. Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26net: ipv6: ignore null_entry on route dumpsDavid Ahern
lkp-robot reported a BUG: [ 10.151226] BUG: unable to handle kernel NULL pointer dereference at 00000198 [ 10.152525] IP: rt6_fill_node+0x164/0x4b8 [ 10.153307] *pdpt = 0000000012ee5001 *pde = 0000000000000000 [ 10.153309] [ 10.154492] Oops: 0000 [#1] [ 10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70ee162-dirty #10 [ 10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 10.158254] task: d0deb000 task.stack: d0e0c000 [ 10.159059] EIP: rt6_fill_node+0x164/0x4b8 [ 10.159780] EFLAGS: 00010296 CPU: 0 [ 10.160404] EAX: 00000000 EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44 [ 10.161469] ESI: 00000000 EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4 [ 10.162534] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 10.163482] CR0: 80050033 CR2: 00000198 CR3: 10d94660 CR4: 000006b0 [ 10.164535] Call Trace: [ 10.164993] ? paravirt_sched_clock+0x9/0xd [ 10.165727] ? sched_clock+0x9/0xc [ 10.166329] ? sched_clock_cpu+0x19/0xe9 [ 10.166991] ? lock_release+0x13e/0x36c [ 10.167652] rt6_dump_route+0x4c/0x56 [ 10.168276] fib6_dump_node+0x1d/0x3d [ 10.168913] fib6_walk_continue+0xab/0x167 [ 10.169611] fib6_walk+0x2a/0x40 [ 10.170182] inet6_dump_fib+0xfb/0x1e0 [ 10.170855] netlink_dump+0xcd/0x21f This happens when the loopback device is set down and a ipv6 fib route dump is requested. ip6_null_entry is the root of all ipv6 fib tables making it integrated into the table and hence passed to the ipv6 route dump code. The null_entry route uses the loopback device for dst.dev but may not have rt6i_idev set because of the order in which initializations are done -- ip6_route_net_init is run before addrconf_init has initialized the loopback device. Fixing the initialization order is a much bigger problem with no obvious solution thus far. The BUG is triggered when the loopback is set down and the netif_running check added by a1a22c1206 fails. The fill_node descends to checking rt->rt6i_idev for ignore_routes_with_linkdown and since rt6i_idev is NULL it faults. The null_entry route should not be processed in a dump request. Catch and ignore. This check is done in rt6_dump_route as it is the highest place in the callchain with knowledge of both the route and the network namespace. Fixes: a1a22c1206("net: ipv6: Keep nexthop of multipath route on admin down") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26net: ipv6: remove skb_reserve in getrouteDavid Ahern
Remove skb_reserve and skb_reset_mac_header from inet6_rtm_getroute. The allocated skb is not passed through the routing engine (like it is for IPv4) and has not since the beginning of git time. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26net: dsa: Move ports assignment closer to error checkingFlorian Fainelli
Move the assignment of ports in _dsa_register_switch() closer to where it is checked, no functional change. Re-order declarations to be preserve the inverted christmas tree style. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26net: dsa: Suffix function manipulating device_node with _dnFlorian Fainelli
Make it clear that these functions take a device_node structure pointer Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26net: dsa: Make most functions take a dsa_port argumentFlorian Fainelli
In preparation for allowing platform data, and therefore no valid device_node pointer, make most DSA functions takes a pointer to a dsa_port structure whenever possible. While at it, introduce a dsa_port_is_valid() helper function which checks whether port->dn is NULL or not at the moment. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>