summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-04-24batman-adv: Check skb size before using encapsulated ETH+VLAN headerSven Eckelmann
The encapsulated ethernet and VLAN header may be outside the received ethernet frame. Thus the skb buffer size has to be checked before it can be parsed to find out if it encapsulates another batman-adv packet. Fixes: 420193573f11 ("batman-adv: softif bridge loop avoidance") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, mostly from Florian Westphal to sort out the lack of sufficient validation in x_tables and connlabel preparation patches to add nf_tables support. They are: 1) Ensure we don't go over the ruleset blob boundaries in mark_source_chains(). 2) Validate that target jumps land on an existing xt_entry. This extra sanitization comes with a performance penalty when loading the ruleset. 3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables. 4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables. 5) Make sure the minimal possible target size in x_tables. 6) Similar to #3, add xt_compat_check_entry_offsets() for compat code. 7) Check that standard target size is valid. 8) More sanitization to ensure that the target_offset field is correct. 9) Add xt_check_entry_match() to validate that matches are well-formed. 10-12) Three patch to reduce the number of parameters in translate_compat_table() for {arp,ip,ip6}tables by using a container structure. 13) No need to return value from xt_compat_match_from_user(), so make it void. 14) Consolidate translate_table() so it can be used by compat code too. 15) Remove obsolete check for compat code, so we keep consistent with what was already removed in the native layout code (back in 2007). 16) Get rid of target jump validation from mark_source_chains(), obsoleted by #2. 17) Introduce xt_copy_counters_from_user() to consolidate counter copying, and use it from {arp,ip,ip6}tables. 18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump functions. 19) Move nf_connlabel_match() to xt_connlabel. 20) Skip event notification if connlabel did not change. 21) Update of nf_connlabels_get() to make the upcoming nft connlabel support easier. 23) Remove spinlock to read protocol state field in conntrack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23xfrm: align nlattr properly when neededNicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23libnl: nla_put_msecs(): align on a 64-bit areaNicolas Dichtel
nla_data() is now aligned on a 64-bit area. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23libnl: nla_put_be64(): align on a 64-bit areaNicolas Dichtel
nla_data() is now aligned on a 64-bit area. A temporary version (nla_put_be64_32bit()) is added for nla_put_net64(). This function is removed in the next patch. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23libnl: nla_put_le64(): align on a 64-bit areaNicolas Dichtel
nla_data() is now aligned on a 64-bit area. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix memory leak in iwlwifi, from Matti Gottlieb. 2) Add missing registration of netfilter arp_tables into initial namespace, from Florian Westphal. 3) Fix potential NULL deref in DecNET routing code. 4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry Ivanov. 5) Fix dst ref counting in VRF, from David Ahern. 6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck. 7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause. 8) Ravalidate IPV6 datagram socket cached routes properly, particularly with UDP, from Martin KaFai Lau. 9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang. 10) Fix stats typing in bcmgenet driver, from Eric Dumazet. 11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing, from Joe Stringer. 12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown. 13) atl2 doesn't actually support scatter-gather, so don't advertise the feature. From Ben Hucthings. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits) openvswitch: use flow protocol when recalculating ipv6 checksums Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets atl2: Disable unimplemented scatter/gather feature net/mlx4_en: Split SW RX dropped counter per RX ring net/mlx4_core: Don't allow to VF change global pause settings net/mlx4_core: Avoid repeated calls to pci enable/disable net/mlx4_core: Implement pci_resume callback net: phy: spi_ks8895: Don't leak references to SPI devices net: ethernet: davinci_emac: Fix platform_data overwrite net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable qede: Fix single MTU sized packet from firmware GRO flow qede: Fix setting Skb network header qede: Fix various memory allocation error flows for fastpath tcp: Merge tx_flags and tskey in tcp_shifted_skb tcp: Merge tx_flags and tskey in tcp_collapse_retrans drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks openvswitch: Orphan skbs before IPv6 defrag Revert "Prevent NUll pointer dereference with two PHYs on cpsw" VSOCK: Only check error on skb_recv_datagram when skb is NULL ...
2016-04-21openvswitch: use flow protocol when recalculating ipv6 checksumsSimon Horman
When using masked actions the ipv6_proto field of an action to set IPv6 fields may be zero rather than the prevailing protocol which will result in skipping checksum recalculation. This patch resolves the problem by relying on the protocol in the flow key rather than that in the set field action. Fixes: 83d2b9ba1abc ("net: openvswitch: Support masked set actions.") Cc: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net: Add support for IP ID mangling TSO in cases that require encapsulationAlexander Duyck
This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports NETIF_F_TSO. This way if needed a device can then later enable the TSO with IP ID mangling and the tunnels on top of that device can then also make use of the IP ID mangling as well. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21tcp: Merge tx_flags and tskey in tcp_shifted_skbMartin KaFai Lau
After receiving sacks, tcp_shifted_skb() will collapse skbs if possible. tx_flags and tskey also have to be merged. This patch reuses the tcp_skb_collapse_tstamp() to handle them. BPF Output Before: ~~~~~ <no-output-due-to-missing-tstamp-event> BPF Output After: ~~~~~ <...>-2024 [007] d.s. 88.644374: : ee_data:14599 Packetdrill Script: ~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 1460) = 1460 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 13140) = 13140 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 0.400 close(4) = 0 0.400 > F. 14601:14601(0) ack 1 0.500 < F. 1:1(0) ack 14602 win 257 0.500 > . 14602:14602(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21tcp: Merge tx_flags and tskey in tcp_collapse_retransMartin KaFai Lau
If two skbs are merged/collapsed during retransmission, the current logic does not merge the tx_flags and tskey. The end result is the SCM_TSTAMP_ACK timestamp could be missing for a packet. The patch: 1. Merge the tx_flags 2. Overwrite the prev_skb's tskey with the next_skb's tskey BPF Output Before: ~~~~~~ <no-output-due-to-missing-tstamp-event> BPF Output After: ~~~~~~ packetdrill-2092 [001] d.s. 453.998486: : ee_data:1459 Packetdrill Script: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 write(4, ..., 11680) = 11680 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 > P. 1:731(730) ack 1 0.200 > P. 731:1461(730) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:13141(4380) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 13141 win 257 0.400 close(4) = 0 0.400 > F. 13141:13141(0) ack 1 0.500 < F. 1:1(0) ack 13142 win 257 0.500 > . 13142:13142(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21ip6mr: align RTA_MFC_STATS on 64-bitNicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21ipmr: align RTA_MFC_STATS on 64-bitNicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21rtnl: use the new API to align IFLA_STATS*Nicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21NLA_BINARY misuse bug in HSRPeter Heise
Removed .type field from NLA to do proper length checking. Reported by Daniel Borkmann and Julia Lawall. Signed-off-by: Peter Heise <peter.heise@airbus.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diagXin Long
EXPIRES_IN_MS macro comes from net/ipv4/inet_diag.c and dates back to before jiffies_to_msecs() has been introduced. Now we can remove it and use jiffies_to_msecs(). Suggested-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acksMartin KaFai Lau
Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received, it could incorrectly think that a skb has already been acked and queue a SCM_TSTAMP_ACK cmsg to the sk->sk_error_queue. In tcp_ack_tstamp(), it checks 'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'. If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill script, between() returns true but the tskey is actually not acked. e.g. try between(3, 2, 1). The fix is to replace between() with one before() and one !before(). By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be removed. A packetdrill script is used to reproduce the dup ack scenario. Due to the lacking cmsg support in packetdrill (may be I cannot find it), a BPF prog is used to kprobe to sock_queue_err_skb() and print out the value of serr->ee.ee_data. Both the packetdrill and the bcc BPF script is attached at the end of this commit message. BPF Output Before Fix: ~~~~~~ <...>-2056 [001] d.s. 433.927987: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 433.929563: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 433.930765: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 434.028177: : ee_data:1459 packetdrill-2056 [001] d.s. 434.029686: : ee_data:14599 BPF Output After Fix: ~~~~~~ <...>-2049 [000] d.s. 113.517039: : ee_data:1459 <...>-2049 [000] d.s. 113.517253: : ee_data:14599 BCC BPF Script: ~~~~~~ #!/usr/bin/env python from __future__ import print_function from bcc import BPF bpf_text = """ #include <uapi/linux/ptrace.h> #include <net/sock.h> #include <bcc/proto.h> #include <linux/errqueue.h> #ifdef memset #undef memset #endif int trace_err_skb(struct pt_regs *ctx) { struct sk_buff *skb = (struct sk_buff *)ctx->si; struct sock *sk = (struct sock *)ctx->di; struct sock_exterr_skb *serr; u32 ee_data = 0; if (!sk || !skb) return 0; serr = SKB_EXT_ERR(skb); bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data); bpf_trace_printk("ee_data:%u\\n", ee_data); return 0; }; """ b = BPF(text=bpf_text) b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb") print("Attached to kprobe") b.trace_print() Packetdrill Script: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 1460) = 1460 0.200 write(4, ..., 13140) = 13140 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 0.400 close(4) = 0 0.400 > F. 14601:14601(0) ack 1 0.500 < F. 1:1(0) ack 14602 win 257 0.500 > . 14602:14602(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net: dsa: remove tag_protocol from dsa_switchVivien Didelot
Having the tag protocol in dsa_switch_driver for setup time and in dsa_switch_tree for runtime is enough. Remove dsa_switch's one. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21openvswitch: Orphan skbs before IPv6 defragJoe Stringer
This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()"). Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be cloned (implicitly orphaning) prior to queueing for reassembly. As such, when the IPv6 message is eventually reassembled, the skb->sk for all fragments would be NULL. After that commit was introduced, rather than cloning, the original skbs were queued directly without orphaning. The end result is that all frags except for the first and last may have a socket attached. This commit explicitly orphans such skbs during nf_ct_frag6_gather() to prevent BUG_ON(skb->sk) during a later call to ip6_fragment(). kernel BUG at net/ipv6/ip6_output.c:631! [...] Call Trace: <IRQ> [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch] [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70 [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch] [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50 [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20 [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80 [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch] [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch] [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch] [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch] [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch] [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch] [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch] [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch] [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch] [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch] [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50 [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch] [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch] [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660 [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0 [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950 [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff81690990>] dev_queue_xmit+0x10/0x20 [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220 [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0 [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0 [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50 [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0 [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500 [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580 [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690 [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690 [<ffffffff817567a0>] ip6_input+0x30/0xa0 [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0 [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0 [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0 [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0 [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0 [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230 [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70 [<ffffffff8168c088>] process_backlog+0x78/0x230 [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230 [<ffffffff8168db43>] net_rx_action+0x203/0x480 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff817c156e>] __do_softirq+0xde/0x49f [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0 [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40 [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0 [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0 [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50 [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30 [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0 [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0 [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0 [<ffffffff8166d078>] sock_sendmsg+0x38/0x50 [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170 [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d [<ffffffff8166e38e>] SyS_sendto+0xe/0x10 [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8 Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8# RIP [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50 RSP <ffff880072803120> Fixes: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone operations") Reported-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20rtnetlink: add new RTM_GETSTATS message to dump link statsRoopa Prabhu
This patch adds a new RTM_GETSTATS message to query link stats via netlink from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK returns a lot more than just stats and is expensive in some cases when frequent polling for stats from userspace is a common operation. RTM_GETSTATS is an attempt to provide a light weight netlink message to explicity query only link stats from the kernel on an interface. The idea is to also keep it extensible so that new kinds of stats can be added to it in the future. This patch adds the following attribute for NETDEV stats: struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = { [IFLA_STATS_LINK_64] = { .len = sizeof(struct rtnl_link_stats64) }, }; Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of a single interface or all interfaces with NLM_F_DUMP. Future possible new types of stat attributes: link af stats: - IFLA_STATS_LINK_IPV6 (nested. for ipv6 stats) - IFLA_STATS_LINK_MPLS (nested. for mpls/mdev stats) extended stats: - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge, vlan, vxlan etc) - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are available via ethtool today) This patch also declares a filter mask for all stat attributes. User has to provide a mask of stats attributes to query. filter mask can be specified in the new hdr 'struct if_stats_msg' for stats messages. Other important field in the header is the ifindex. This api can also include attributes for global stats (eg tcp) in the future. When global stats are included in a stats msg, the ifindex in the header must be zero. A single stats message cannot contain both global and netdev specific stats. To easily distinguish them, netdev specific stat attributes name are prefixed with IFLA_STATS_LINK_ Without any attributes in the filter_mask, no stats will be returned. This patch has been tested with mofified iproute2 ifstat. Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20ipvs: don't alter conntrack in OPS modeMarco Angaroni
When using OPS mode in conjunction with SIP persistent-engine, packets originating from the same ip-address/port could be balanced to different real servers, and (to properly handle SIP responses) OPS connections are created in the in-out direction too, where ip_vs_update_conntrack() is called to modify the reply tuple. As a result, there can be collision of conntrack tuples, causing random packet drops, as explained below: conntrack1: orig=CIP->VIP, reply=RIP1->CIP conntrack2: orig=RIP2->CIP, reply=CIP->VIP Tuple CIP->VIP is both in orig of conntrack1 and reply of conntrack2. The collision triggers packet drop inside nf_conntrack processing. In addition, the current implementation deletes the conntrack object at every expire of an OPS connection (once every forwarded packet), to have it recreated from scratch at next packet traversing IPVS. Since in OPS mode, by definition, we don't expect any associated response, the choices implemented in this patch are: a) don't call nf_conntrack_alter_reply() for OPS connections inside ip_vs_update_conntrack(). b) don't delete the conntrack object at OPS connection expire. The result is that created conntrack objects for each tuple CIP->VIP, RIP-N->CIP, etc. are left in UNREPLIED state and not modified by IPVS OPS connection management. This eliminates packet drops and leaves a single conntrack object for each tuple packets are sent from. Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2016-04-20ipvs: optimize release of connections in OPS modeMarco Angaroni
One-packet-scheduling is the most expensive mode in IPVS from performance point of view: for each packet to be processed a new connection data structure is created and, after packet is sent, deleted by starting a new timer set to expire immediately. SIP persistent-engine needs OPS mode to have Call-ID based load balancing, so OPS mode performance has negative impact in SIP protocol load balancing. This patch aims to improve performance of OPS mode by means of the following changes in the release mechanism of OPS connections: a) call expire callback ip_vs_conn_expire() directly instead of starting a timer programmed to fire immediately. b) avoid call_rcu() overhead inside expire callback, since OPS connection are not inserted in the hash-table and last just the time to process the packet, hence there is no concurrent access to such data structures. Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2016-04-20ipvs: handle connections started by real-serversMarco Angaroni
When using LVS-NAT and SIP persistence-egine over UDP, the following limitations are present with current implementation: 1) To actually have load-balancing based on Call-ID header, you need to use one-packet-scheduling mode. But with one-packet-scheduling the connection is deleted just after packet is forwarded, so SIP responses coming from real-servers do not match any connection and SNAT is not applied. 2) If you do not use "-o" option, IPVS behaves as normal UDP load balancer, so different SIP calls (each one identified by a different Call-ID) coming from the same ip-address/port go to the same real-server. So basically you don’t have load-balancing based on Call-ID as intended. 3) Call-ID is not learned when a new SIP call is started by a real-server (inside-to-outside direction), but only in the outside-to-inside direction. This would be a general problem for all SIP servers acting as Back2BackUserAgent. This patch aims to solve problems 1) and 3) while keeping OPS mode mandatory for SIP-UDP, so that 2) is not a problem anymore. The basic mechanism implemented is to make packets, that do not match any existent connection but come from real-servers, create new connections instead of let them pass without any effect. When such packets pass through ip_vs_out(), if their source ip address and source port match a configured real-server, a new connection is automatically created in the same way as it would have happened if the packet had come from outside-to-inside direction. A new connection template is created too if the virtual-service is persistent and there is no matching connection template found. The new connection automatically created, if the service had "-o" option, is an OPS connection that lasts only the time to forward the packet, just like it happens on the ingress side. The main part of this mechanism is implemented inside a persistent-engine specific callback (at the moment only SIP persistent engine exists) and is triggered only for UDP packets, since connection oriented protocols, by using different set of ports (typically ephemeral ports) to open new outgoing connections, should not need this feature. The following requisites are needed for automatic connection creation; if any is missing the packet simply goes the same way as before. a) virtual-service is not fwmark based (this is because fwmark services do not store address and port of the virtual-service, required to build the connection data). b) virtual-service and real-servers must not have been configured with omitted port (this is again to have all data to create the connection). Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2016-04-19VSOCK: Only check error on skb_recv_datagram when skb is NULLJorgen Hansen
If skb_recv_datagram returns an skb, we should ignore the err value returned. Otherwise, datagram receives will return EAGAIN when they have to wait for a datagram. Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19net: dsa: kill circular reference with slave privVivien Didelot
The dsa_slave_priv structure does not need a pointer to its net_device. Kill it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19bpf: add event output helper for notifications/sampling/loggingDaniel Borkmann
This patch adds a new helper for cls/act programs that can push events to user space applications. For networking, this can be f.e. for sampling, debugging, logging purposes or pushing of arbitrary wake-up events. The idea is similar to a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") and 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example"). The eBPF program utilizes a perf event array map that user space populates with fds from perf_event_open(), the eBPF program calls into the helper f.e. as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU, raw, sizeof(raw)) so that the raw data is pushed into the fd f.e. at the map index of the current CPU. User space can poll/mmap/etc on this and has a data channel for receiving events that can be post-processed. The nice thing is that since the eBPF program and user space application making use of it are tightly coupled, they can define their own arbitrary raw data format and what/when they want to push. While f.e. packet headers could be one part of the meta data that is being pushed, this is not a substitute for things like packet sockets as whole packet is not being pushed and push is only done in a single direction. Intention is more of a generically usable, efficient event pipe to applications. Workflow is that tc can pin the map and applications can attach themselves e.g. after cls/act setup to one or multiple map slots, demuxing is done by the eBPF program. Adding this facility is with minimal effort, it reuses the helper introduced in a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") and we get its functionality for free by overloading its BPF_FUNC_ identifier for cls/act programs, ctx is currently unused, but will be made use of in future. Example will be added to iproute2's BPF example files. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19net/ipv6/addrconf: fix sysctl table indentationKonstantin Khlebnikov
Separated from previous patch for readability. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19net/ipv6/addrconf: simplify sysctl registrationKonstantin Khlebnikov
Struct ctl_table_header holds pointer to sysctl table which could be used for freeing it after unregistration. IPv4 sysctls already use that. Remove redundant NULL assignment: ndev allocated using kzalloc. This also saves some bytes: sysctl table could be shorter than DEVCONF_MAX+1 if some options are disable in config. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19net: Add helpers for 64-bit aligning netlink attributes.David S. Miller
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19net: Align IFLA_STATS64 attributes properly on architectures that need it.David S. Miller
Since the nlattr header is 4 bytes in size, it can cause the netlink attribute payload to not be 8-byte aligned. This is particularly troublesome for IFLA_STATS64 which contains 64-bit statistic values. Solve this by creating a dummy IFLA_PAD attribute which has a payload which is zero bytes in size. When HAVE_EFFICIENT_UNALIGNED_ACCESS is false, we insert an IFLA_PAD attribute into the netlink response when necessary such that the IFLA_STATS64 payload will be properly aligned. With help and suggestions from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19netfilter: conntrack: don't acquire lock during seq_printfFlorian Westphal
read access doesn't need any lock here. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-18netfilter: ctnetlink: restore inlining for netlink message size calculationPablo Neira Ayuso
Calm down gcc warnings: net/netfilter/nf_conntrack_netlink.c:529:15: warning: 'ctnetlink_proto_size' defined but not used [-Wunused-function] static size_t ctnetlink_proto_size(const struct nf_conn *ct) ^ net/netfilter/nf_conntrack_netlink.c:546:15: warning: 'ctnetlink_acct_size' defined but not used [-Wunused-function] static size_t ctnetlink_acct_size(const struct nf_conn *ct) ^ net/netfilter/nf_conntrack_netlink.c:556:12: warning: 'ctnetlink_secctx_size' defined but not used [-Wunused-function] static int ctnetlink_secctx_size(const struct nf_conn *ct) ^ net/netfilter/nf_conntrack_netlink.c:572:15: warning: 'ctnetlink_timestamp_size' defined but not used [-Wunused-function] static size_t ctnetlink_timestamp_size(const struct nf_conn *ct) ^ So gcc compiles them out when CONFIG_NF_CONNTRACK_EVENTS and CONFIG_NETFILTER_NETLINK_GLUE_CT are not set. Fixes: 4054ff45454a9a4 ("netfilter: ctnetlink: remove unnecessary inlining") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Arnd Bergmann <arnd@arndb.de>
2016-04-18net: ethtool: export conversion function between u32 and link modePhilippe Reynes
The function convert_legacy_u32_to_link_mode and convert_link_mode_to_legacy_u32 may be used outside of ethtool.c. We rename them to ethtool_convert_... and export them, so we could use them in others drivers and modules. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18netfilter: connlabels: change nf_connlabels_get bit arg to 'highest used'Florian Westphal
nf_connlabel_set() takes the bit number that we would like to set. nf_connlabels_get() however took the number of bits that we want to support. So e.g. nf_connlabels_get(32) support bits 0 to 31, but not 32. This changes nf_connlabels_get() to take the highest bit that we want to set. Callers then don't have to cope with a potential integer wrap when using nf_connlabels_get(bit + 1) anymore. Current callers are fine, this change is only to make folloup nft ct label set support simpler. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-18netfilter: labels: don't emit ct event if labels were not changedFlorian Westphal
make the replace function only send a ctnetlink event if the contents of the new set is different. Otherwise 'ct label set ct label | bar' will cause netlink event storm since we "replace" labels for each packet. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-18netfilter: connlabels: move helpers to xt_connlabelFlorian Westphal
Currently labels can only be set either by iptables connlabel match or via ctnetlink. Before adding nftables set support, clean up the clabel core and move helpers that nft will not need after all to the xtables module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-18udp: fix if statement in SIOCINQ ioctlDan Carpenter
We deleted a line of code and accidentally made the "return put_user()" part of the if statement when it's supposed to be unconditional. Fixes: 9f9a45beaa96 ('udp: do not expect udp headers on ioctl SIOCINQ') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copyRoopa Prabhu
This patch passes netlink attr data ptr directly to dev_get_stats thus elimiating a stats copy. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18treewide: Fix typos in printkMasanari Iida
This patch fix spelling typos found in printk within various part of the kernel sources. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-17net: dsa: constify probed nameVivien Didelot
Change the dsa_switch_driver.probe function to return a const char *. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16openvswitch: Convert to using IFF_NO_QUEUEPhil Sutter
Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16ip6gre: Add support for GSOAlexander Duyck
This patch adds code borrowed from bits and pieces of other protocols to the IPv6 GRE path so that we can support GSO over IPv6 based GRE tunnels. By adding this support we are able to significantly improve the throughput for GRE tunnels as we are able to make use of GSO. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16GRE: Add support for GRO/GSO of IPv6 GRE trafficAlexander Duyck
Since GRE doesn't really care about L3 protocol we can support IPv4 and IPv6 using the same offloads. With that being the case we can add a call to register the offloads for IPv6 as a part of our GRE offload initialization. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16ip6gre: Add support for basic offloads offloads excluding GSOAlexander Duyck
This patch adds support for the basic offloads we support on most devices. Specifically with this patch set we can support checksum offload, basic scatter-gather, and highdma. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16ip6gretap: Fix MTU to allow for Ethernet headerAlexander Duyck
When we were creating an ip6gretap interface the MTU was about 6 bytes short of what was needed. It turns out we were not taking the Ethernet header into account and as a result we were eating into the 8 bytes reserved for the encap limit. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skbAlexander Duyck
This patch updates the IP tunnel core function iptunnel_handle_offloads so that we return an int and do not free the skb inside the function. This actually allows us to clean up several paths in several tunnels so that we can free the skb at one point in the path without having to have a secondary path if we are supporting tunnel offloads. In addition it should resolve some double-free issues I have found in the tunnels paths as I believe it is possible for us to end up triggering such an event in the case of fou or gue. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16RDS: Fix the atomicity for congestion map updatesantosh.shilimkar@oracle.com
Two different threads with different rds sockets may be in rds_recv_rcvbuf_delta() via receive path. If their ports both map to the same word in the congestion map, then using non-atomic ops to update it could cause the map to be incorrect. Lets use atomics to avoid such an issue. Full credit to Wengang <wen.gang.wang@oracle.com> for finding the issue, analysing it and also pointing out to offending code with spin lock based fix. Reviewed-by: Leon Romanovsky <leon@leon.nu> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16RDS: fix endianness for dp_ack_seqQing Huang
dp->dp_ack_seq is used in big endian format. We need to do the big endianness conversion when we assign a value in host format to it. Signed-off-by: Qing Huang <qing.huang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15vlan: pull on __vlan_insert_tag error path and fix csum correctionDaniel Borkmann
When __vlan_insert_tag() fails from skb_vlan_push() path due to the skb_cow_head(), we need to undo the __skb_push() in the error path as well that was done earlier to move skb->data pointer to mac header. Moreover, I noticed that when in the non-error path the __skb_pull() is done and the original offset to mac header was non-zero, we fixup from a wrong skb->data offset in the checksum complete processing. So the skb_postpush_rcsum() really needs to be done before __skb_pull() where skb->data still points to the mac header start and thus operates under the same conditions as in __vlan_insert_tag(). Fixes: 93515d53b133 ("net: move vlan pop/push functions into common code") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>