summaryrefslogtreecommitdiff
path: root/drivers/net/ipvlan
AgeCommit message (Collapse)Author
2017-06-07net: Fix inconsistent teardown and release of private netdev state.David S. Miller
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25ipvlan: use pernet operations and restrict l3s hooks to master netnsFlorian Westphal
commit 4fbae7d83c98c30efc ("ipvlan: Introduce l3s mode") added registration of netfilter hooks via nf_register_hooks(). This API provides the illusion of 'global' netfilter hooks by placing the hooks in all current and future network namespaces. In case of ipvlan the hook appears to be only needed in the namespace that contains the ipvlan master device (i.e., usually init_net), so placing them in all namespaces is not needed. This switches ipvlan driver to pernet operations, and then only registers hooks in namespaces where a ipvlan master device is set to l3s mode. Extra care has to be taken when the master device is moved to another namespace, as we might have to 'move' the netfilter hooks too. This is done by storing the namespace the ipvlan port was created in. On REGISTER event, do (un)register operations in the old/new namespaces. This will also allow removal of the nf_register_hooks() in a future patch. Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11ipvtap: IP-VLAN based tap driverSainath Grandhi
This patch adds a tap character device driver that is based on the IP-VLAN network interface, called ipvtap. An ipvtap device can be created in the same way as an ipvlan device, using 'type ipvtap', and then accessed using the tap user space interface. Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20ipvlan: use netdev_is_rx_handler_busy instead of checking specific typeMahesh Bandewar
IPvlan checks if the master device is already used by checking a specific device (here it's macvlan device). This is technically not sufficient and it should just ensure the rx_handler is busy or not. This would be a super check that includes macvlan and any other that has already registered rx-handler. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16ipvlan: fix dev_id creation corner case.Mahesh Bandewar
In the last patch da36e13cf65 ("ipvlan: improvise dev_id generation logic in IPvlan") I missed some part of Dave's suggestion and because of that the dev_id creation could fail in a corner case scenario. This would happen when more or less 64k devices have been already created and several have been deleted. If the devices that are still sticking around are the last n bits from the bitmap. So in this scenario even if lower bits are available, the dev_id search is so narrow that it always fails. Fixes: da36e13cf65 ("ipvlan: improvise dev_id generation logic in IPvlan") CC: David Miller <davem@davemloft.org> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10ipvlan: improvise dev_id generation logic in IPvlanMahesh Bandewar
The patch 009146d117b ("ipvlan: assign unique dev-id for each slave device.") used ida_simple_get() to generate dev_ids assigned to the slave devices. However (Eric has pointed out that) there is a shortcoming with that approach as it always uses the first available ID. This becomes a problem when a slave gets deleted and a new slave gets added. The ID gets reassigned causing the new slave to get the same link-local address. This side-effect is undesirable. This patch adds a per-port variable that keeps track of the IDs assigned and used as the stat-base for the IDR api. This base will be wrapped around when it reaches the MAX (0xFFFE) value possibly on a busy system where slaves are added and deleted routinely. Fixes: 009146d117b ("ipvlan: assign unique dev-id for each slave device.") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> CC: David Miller <davem@davemloft.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08net: make ndo_get_stats64 a void functionstephen hemminger
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04ipvlan: assign unique dev-id for each slave device.Mahesh Bandewar
IPvlan setup uses one mac-address (of master). The IPv6 link-local addresses are derived using the mac-address on the link. Lack of dev-ids makes these link-local addresses same for all slaves including that of master device. dev-ids are necessary to add differentiation when L2 address is shared. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28driver: ipvlan: Remove unnecessary ipvlan NULL check in ipvlan_count_rxGao Feng
There are three functions which would invoke the ipvlan_count_rx. They are ipvlan_process_multicast, ipvlan_rcv_frame, and ipvlan_nf_input. The former two functions already use the ipvlan directly before ipvlan_count_rx, and ipvlan_nf_input gets the ipvlan from ipvl_addr->master, it is not possible to be NULL too. So the ipvlan pointer check is unnecessary in ipvlan_count_rx. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28driver: ipvlan: Define common functions to decrease duplicated codes used to ↵Gao Feng
add or del IP address There are some duplicated codes in ipvlan_add_addr6/4 and ipvlan_del_addr6/4. Now define two common functions ipvlan_add_addr and ipvlan_del_addr to decrease the duplicated codes. It could be helful to maintain the codes. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23ipvlan: fix multicast processingMahesh Bandewar
In an IPvlan setup when master is set in loopback mode e.g. ethtool -K eth0 set loopback on where eth0 is master device for IPvlan setup. The failure is caused by the faulty logic that determines if the packet is from TX-path vs. RX-path by just looking at the mac- addresses on the packet while processing multicast packets. In the loopback-mode where this crash was happening, the packets that are sent out are reflected by the NIC and are processed on the RX path, but mac-address check tricks into thinking this packet is from TX path and falsely uses dev_forward_skb() to pass packets to the slave (virtual) devices. This patch records the path while queueing packets and eliminates logic of looking at mac-addresses for the same decision. ------------[ cut here ]------------ kernel BUG at include/linux/skbuff.h:1737! Call Trace: [<ffffffff921fbbc2>] dev_forward_skb+0x92/0xd0 [<ffffffffc031ac65>] ipvlan_process_multicast+0x395/0x4c0 [ipvlan] [<ffffffffc031a9a7>] ? ipvlan_process_multicast+0xd7/0x4c0 [ipvlan] [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660 [<ffffffff91cdff09>] process_one_work+0x1a9/0x660 [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660 [<ffffffff91ce086d>] worker_thread+0x11d/0x360 [<ffffffff91ce0750>] ? rescuer_thread+0x350/0x350 [<ffffffff91ce960b>] kthread+0xdb/0xe0 [<ffffffff91c05c70>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0 [<ffffffff92348b7a>] ret_from_fork+0x9a/0xd0 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0 Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23ipvlan: fix various issues in ipvlan_process_multicast()Eric Dumazet
1) netif_rx() / dev_forward_skb() should not be called from process context. 2) ipvlan_count_rx() should be called with preemption disabled. 3) We should check if ipvlan->dev is up before feeding packets to netif_rx() 4) We need to prevent device from disappearing if some packets are in the multicast backlog. 5) One kfree_skb() should be a consume_skb() eventually Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-12-08driver: ipvlan: Unlink the upper dev when ipvlan_link_new failedGao Feng
When netdev_upper_dev_unlink failed in ipvlan_link_new, need to unlink the ipvlan dev with upper dev. Signed-off-by: Gao Feng <fgao@ikuai8.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07driver: ipvlan: Free ipvl_port directly with kfree instead of kfree_rcuGao Feng
There are two functions which would free the ipvl_port now. The first is ipvlan_port_create. It frees the ipvl_port in the error handler, so it could kfree it directly. The second is ipvlan_port_destroy. It invokes netdev_rx_handler_unregister which enforces one grace period by synchronize_net firstly, so it also could kfree the ipvl_port directly and safely. So it is unnecessary to use kfree_rcu to free ipvl_port. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30driver: ipvlan: Remove useless member mtu_adj of struct ipvl_devGao Feng
The mtu_adj is initialized to zero when alloc mem, there is no any assignment to mtu_adj. It is only used in ipvlan_adjust_mtu as one right value. So it is useless member of struct ipvl_dev, then remove it. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27driver: ipvlan: Fix one possible memleak in ipvlan_link_newGao Feng
When ipvlan_link_new fails and creates one ipvlan port, it does not destroy the ipvlan port created. It causes mem leak and the physical device contains invalid ipvlan data. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-15ipvlan: constify l3mdev_ops structureJulia Lawall
This l3mdev_ops structure is only stored in the l3mdev_ops field of a net_device structure. This field is declared const, so the l3mdev_ops structure can be declared as const also. Additionally drop the __read_mostly annotation. The semantic patch that adds const is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r disable optional_qualifier@ identifier i; position p; @@ static struct l3mdev_ops i@p = { ... }; @ok@ identifier r.i; struct net_device *e; position p; @@ e->l3mdev_ops = &i@p; @bad@ position p != {r.p,ok.p}; identifier r.i; struct l3mdev_ops e; @@ e@i@p @depends on !bad disable optional_qualifier@ identifier r.i; @@ static +const struct l3mdev_ops i = { ... }; // </smpl> The effect on the layout of the .o file is shown by the following output of the size command, first before then after the transformation: text data bss dec hex filename 7364 466 52 7882 1eca drivers/net/ipvlan/ipvlan_main.o 7412 434 52 7898 1eda drivers/net/ipvlan/ipvlan_main.o Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19ipvlan: Introduce l3s modeMahesh Bandewar
In a typical IPvlan L3 setup where master is in default-ns and each slave is into different (slave) ns. In this setup egress packet processing for traffic originating from slave-ns will hit all NF_HOOKs in slave-ns as well as default-ns. However same is not true for ingress processing. All these NF_HOOKs are hit only in the slave-ns skipping them in the default-ns. IPvlan in L3 mode is restrictive and if admins want to deploy iptables rules in default-ns, this asymmetric data path makes it impossible to do so. This patch makes use of the l3_rcv() (added as part of l3mdev enhancements) to perform input route lookup on RX packets without changing the skb->dev and then uses nf_hook at NF_INET_LOCAL_IN to change the skb->dev just before handing over skb to L4. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25ipvlan: Scrub skb before crossing the namespace boundryMahesh Bandewar
The earlier patch c3aaa06d5a63 (ipvlan: scrub skb before routing in L3 mode.) did this but only for TX path in L3 mode. This patch extends it for both the modes for TX/RX path. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net: ipvlan: call netdev_lockdep_set_classes()Eric Dumazet
In case a qdisc is used on a ipvlan device, we need to use different lockdep classes to avoid false positives. Use the new netdev_lockdep_set_classes() generic helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28ipvlan: Fix failure path in dev registration during link creationMahesh Bandewar
When newlink creation fails at device-registration, the port->count is decremented twice. Francesco Ruggeri (fruggeri@arista.com) found this issue in Macvlan and the same exists in IPvlan driver too. While fixing this issue I noticed another issue of missing unregister in case of failure, so adding it to the fix which is similar to the macvlan fix by Francesco in commit 308379607548 ("macvlan: fix failure during registration v3") Reported-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-17vlan: propagate gso_max_segsEric Dumazet
vlan drivers lack proper propagation of gso_max_segs from lower device. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25net: ipvlan: use __ethtool_get_ksettingsDavid Decotigny
Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21ipvlan: misc changesMahesh Bandewar
1. scope correction for few functions that are used in single file. 2. Adjust variables that are used in fast-path to fit into single cacheline 3. Update rcv_frame() to skip shared check for frames coming over wire Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21ipvlan: mode is u16Mahesh Bandewar
The mode argument was erronusly defined as u32 but it has always been u16. Also use ipvlan_set_mode() helper to set the mode instead of assigning directly. This should avoid future erronus assignments / updates. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21ipvlan: scrub skb before routing in L3 mode.Mahesh Bandewar
Scrub skb before hitting the iptable hooks to ensure packets hit these hooks. Set the xnet param only when the packet is crossing the ns boundry so if the IPvlan slave and master belong to the same ns, the param will be set to false. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-04ipvlan: inherit MTU from master deviceMahesh Bandewar
When we create IPvlan slave; we use ether_setup() and that sets up default MTU to 1500 while the master device may have lower / different MTU. Any subsequent changes to the masters' MTU are reflected into the slaves' MTU setting. However if those don't happen (most likely scenario), the slaves' MTU stays at 1500 which could be bad. This change adds code to inherit MTU from the master device instead of using the default value during the link initialization phase. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Tim Hockins <thockins@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-15net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASKTom Herbert
The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the set of features for offloading all checksums. This is a mask of the checksum offload related features bits. It is incorrect to set both NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for features of a device. This patch: - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where NETIF_F_ALL_CSUM is being used as a mask). - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17ipvlan: fix use after free of skbSabrina Dubroca
ipvlan_handle_frame is a rx_handler, and when it returns a value other than RX_HANDLER_CONSUMED (here, NET_RX_DROP aka RX_HANDLER_ANOTHER), __netif_receive_skb_core expects that the skb still exists and will process it further, but we just freed it. Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17ipvlan: fix leak in ipvlan_rcv_frameSabrina Dubroca
Pass a **skb to ipvlan_rcv_frame so that if skb_share_check returns a new skb, we actually use it during further processing. It's safe to ignore the new skb in the ipvlan_xmit_* functions, because they call ipvlan_rcv_frame with local == true, so that dev_forward_skb is called and always takes ownership of the skb. Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22ipvlan: read direct ifindex instead of iflinkBrenden Blanco
In the ipv4 outbound path of an ipvlan device in l3 mode, the ifindex is being grabbed from dev_get_iflink. This works for the physical device case, since as the documentation of that function notes: "Physical interfaces have the same 'ifindex' and 'iflink' values.". However, if the master device is a veth, and the pairs are in separate net namespaces, the route lookup will fail with -ENODEV due to outer veth pair being in a separate namespace from the ipvlan master/routing namespace. ns0 | ns1 | ns2 veth0a--|--veth0b--|--ipvl0 In ipvlan_process_v4_outbound(), a packet sent from ipvl0 in the above configuration will pass fl.flowi4_oif == veth0a to ip_route_output_flow(), but *net == ns1. Notice also that ipv6 processing is not using iflink. Since there is a discrepancy in usage, fixup both v4 and v6 case to use local dev variable. Tested this with l3 ipvlan on top of veth, as well as with single physical interface in the top namespace. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv4, ipv6: Pass net into ip_local_out and ip6_local_outEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipvlan: Cache net in ipvlan_process_v4_outbound and ipvlan_process_v6_outboundEric W. Biederman
Compute net once in ipvlan_process_v4_outbound and ipvlan_process_v6_outbound and store it in a variable so that net does not need to be recomputed next time it is used. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv6: Merge ip6_local_out and ip6_local_out_skEric W. Biederman
Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv4: Merge ip_local_out and ip_local_out_skEric W. Biederman
It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18net: ipvlan: convert to using IFF_NO_QUEUEPhil Sutter
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15ipvlan: ignore addresses from ipv6 autoconfigurationKonstantin Khlebnikov
Inet6addr notifier is atomic and runs in bh context without RTNL when ipv6 receives router advertisement packet and performs autoconfiguration. Proper fix still in discussion. Let's at least plug the bug. v1: http://lkml.kernel.org/r/20150514135618.14062.1969.stgit@buzz v2: http://lkml.kernel.org/r/20150703125840.24121.91556.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15ipvlan: use rcu_deference_bh() in ipvlan_queue_xmit()WANG Cong
In tx path rcu_read_lock_bh() is held, so we need rcu_deference_bh(). This fixes the following warning: =============================== [ INFO: suspicious RCU usage. ] 4.1.0-rc1+ #1007 Not tainted ------------------------------- drivers/net/ipvlan/ipvlan.h:106 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by dhclient/1076: #0: (rcu_read_lock_bh){......}, at: [<ffffffff817e8d84>] rcu_lock_acquire+0x0/0x26 stack backtrace: CPU: 2 PID: 1076 Comm: dhclient Not tainted 4.1.0-rc1+ #1007 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000001 ffff8800d381bac8 ffffffff81a4154f 000000003c1a3c19 ffff8800d4d0a690 ffff8800d381baf8 ffffffff810b849f ffff880117d41148 ffff880117d40000 ffff880117d40068 0000000000000156 ffff8800d381bb18 Call Trace: [<ffffffff81a4154f>] dump_stack+0x4c/0x65 [<ffffffff810b849f>] lockdep_rcu_suspicious+0x107/0x110 [<ffffffff8165a522>] ipvlan_port_get_rcu+0x47/0x4e [<ffffffff8165ad14>] ipvlan_queue_xmit+0x35/0x450 [<ffffffff817ea45d>] ? rcu_read_unlock+0x3e/0x5f [<ffffffff810a20bf>] ? local_clock+0x19/0x22 [<ffffffff810b4781>] ? __lock_is_held+0x39/0x52 [<ffffffff8165b64c>] ipvlan_start_xmit+0x1b/0x44 [<ffffffff817edf7f>] dev_hard_start_xmit+0x2ae/0x467 [<ffffffff817ee642>] __dev_queue_xmit+0x50a/0x60c [<ffffffff817ee7a7>] dev_queue_xmit_sk+0x13/0x15 [<ffffffff81997596>] dev_queue_xmit+0x10/0x12 [<ffffffff8199b41c>] packet_sendmsg+0xb6b/0xbdf [<ffffffff810b5ea7>] ? mark_lock+0x2e/0x226 [<ffffffff810a1fcc>] ? sched_clock_cpu+0x9e/0xb7 [<ffffffff817d56f9>] sock_sendmsg_nosec+0x12/0x1d [<ffffffff817d7257>] sock_sendmsg+0x29/0x2e [<ffffffff817d72cc>] sock_write_iter+0x70/0x91 [<ffffffff81199563>] __vfs_write+0x7e/0xa7 [<ffffffff811996bc>] vfs_write+0x92/0xe8 [<ffffffff811997d7>] SyS_write+0x47/0x7e [<ffffffff81a4d517>] system_call_fastpath+0x12/0x6f Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15ipvlan: unhash addresses without synchronize_rcuKonstantin Khlebnikov
All structures used in traffic forwarding are rcu-protected: ipvl_addr, ipvl_dev and ipvl_port. Thus we can unhash addresses without synchronization. We'll anyway hash it back into the same bucket: in worst case lockless lookup will scan hash once again. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15ipvlan: plug memory leak in ipvlan_link_deleteKonstantin Khlebnikov
Add missing kfree_rcu(addr, rcu); Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15ipvlan: remove counters of ipv4 and ipv6 addressesKonstantin Khlebnikov
They are unused after commit f631c44bbe15 ("ipvlan: Always set broadcast bit in multicast filter"). Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05ipvlan: Always set broadcast bit in multicast filterMahesh Bandewar
Earlier tricks of setting broadcast bit only when IPv4 address is added onto interface are not good enough especially when autoconf comes in play. Setting them on always is performance drag but now that multicast / broadcast is not processed in fast-path; enabling broadcast will let autoconf work correctly without affecting performance characteristics of the device. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05ipvlan: Defer multicast / broadcast processing to a work-queueMahesh Bandewar
Processing multicast / broadcast in fast path is performance draining and having more links means more cloning and bringing performance down further. Broadcast; in particular, need to be given to all the virtual links. Earlier tricks of enabling broadcast bit for IPv4 only interfaces are not really working since it fails autoconf. Which means enabling broadcast for all the links if protocol specific hacks do not have to be added into the driver. This patch defers all (incoming as well as outgoing) multicast traffic to a work-queue leaving only the unicast traffic in the fast-path. Now if we need to apply any additional tricks to further reduce the impact of this (multicast / broadcast) type of traffic, it can be implemented while processing this work without affecting the fast-path. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/asix_common.c drivers/net/usb/sr9800.c drivers/net/usb/usbnet.c include/linux/usb/usbnet.h net/ipv4/tcp_ipv4.c net/ipv6/tcp_ipv6.c The TCP conflicts were overlapping changes. In 'net' we added a READ_ONCE() to the socket cached RX route read, whilst in 'net-next' Eric Dumazet touched the surrounding code dealing with how mini sockets are handled. With USB, it's a case of the same bug fix first going into net-next and then I cherry picked it back into net. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02ipvlan: implement ndo_get_iflinkNicolas Dichtel
Don't use dev->iflink anymore. CC: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02dev: introduce dev_get_iflink()Nicolas Dichtel
The goal of this patch is to prepare the removal of the iflink field. It introduces a new ndo function, which will be implemented by virtual interfaces. There is no functional change into this patch. All readers of iflink field now call dev_get_iflink(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31ipvlan: fix check for IP addresses in control pathJiri Benc
When an ipvlan interface is down, its addresses are not on the hash list. Fix checks for existence of addresses not to depend on the hash list, walk through all interface addresses instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31ipvlan: do not use rcu operations for address listJiri Benc
All accesses to ipvlan->addrs are under rtnl. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>