path: root/drivers/net/ipvlan
Age Commit message Author
2017-12-15ipvlan: remove excessive packet scrubbingMahesh Bandewar
IPvlan currently scrubs packets at every location where packets may be crossing a namespace boundary. Though this is desirable, IPvlan currently does it more than necessary. For example, packets that are going to take the dev_forward_skb() path will get scrubbed there, so there is no point in scrubbing them before forwarding. Another side-effect of scrubbing is that pkt-type gets set to PACKET_HOST, which overrides what had already been set by the earlier path and causes erroneous delivery of the packets. Also, scrubbing packets just before calling dev_queue_xmit() has detrimental effects, since packets lose skb->sk and because of that miss prio updates, get incorrect socket back-pressure, and can even break TSQ. Fixes: b93dd49c1a35 ('ipvlan: Scrub skb before crossing the namespace boundary') Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Revert "ipvlan: add L2 check for packets arriving via virtual devices"Mahesh Bandewar
This reverts commit 92ff42645028fa6f9b8aa767718457b9264316b4. Even though the check added is not that taxing, it's not really needed. First, it is a per-packet cost; second, eth_type_trans() already does this correctly. The excessive scrubbing in IPvlan was changing the pkt-type skb metadata of the packet, which made it necessary to re-check the mac. The subsequent patch in this series removes the faulty packet-scrub. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11ipvlan: add L2 check for packets arriving via virtual devicesMahesh Bandewar
Packets that don't have dest mac as the mac of the master device should not be entertained by the IPvlan rx-handler. This is mostly true as the packet path mostly takes care of that, except when the master device is a virtual device, as demonstrated in the following case:

    ip netns add ns1
    ip link add ve1 type veth peer name ve2
    ip link add link ve2 name iv1 type ipvlan mode l2
    ip link set dev iv1 netns ns1
    ip link set ve1 up
    ip link set ve2 up
    ip -n ns1 link set iv1 up
    ip addr add 192.168.10.1/24 dev ve1
    ip -n ns1 addr add 192.168.10.2/24 dev iv1
    ping -c2 192.168.10.2
    <Works!>
    ip neigh show dev ve1
    ip neigh show 192.168.10.2 lladdr <random> dev ve1
    ping -c2 192.168.10.2
    <Still works! Wrong!!>

This patch adds that missing check in the IPvlan rx-handler.

Reported-by: Amit Sikka <amit.sikka@ericsson.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-06ipvlan: Eliminate duplicated codes with existing functionGao Feng
The recv flow of ipvlan l2 mode performs the same as l3 mode for non-multicast packets, so use the existing function ipvlan_handle_mode_l3 instead of the duplicated statements in the non-multicast case. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Small overlapping change conflict ('net' changed a line, 'net-next' added a line right afterwards) in flexcan.c Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03ipvlan: Add new func ipvlan_is_valid_dev instead of duplicated codesGao Feng
There are multiple duplicated condition checks in the current code, so add the new function ipvlan_is_valid_dev to replace them and check whether the netdev is a real ipvlan dev. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03ipvlan: Add the skb->mark as flow4's member to lookup routeGao Feng
The current code doesn't use skb->mark to set flowi4_mark, so policy routing rules that match on fwmark don't work as expected. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
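To illustrate why the mark has to reach the flow key, here is a minimal Python model of a policy-routing lookup. It is a userspace sketch, not kernel code: the rule list, the table names and the lookup_table helper are hypothetical. Only the idea that the flow must carry skb->mark for an fwmark rule to match comes from the message above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flow:
    daddr: str
    mark: int = 0  # stands in for flowi4_mark


@dataclass
class Rule:
    fwmark: Optional[int]  # None means "match any mark"
    table: str


# Hypothetical rule set: marked traffic goes to "vpn", the rest to "main".
RULES = [Rule(fwmark=0x10, table="vpn"), Rule(fwmark=None, table="main")]


def lookup_table(flow, rules=RULES):
    """Return the routing table selected by the first matching rule."""
    for rule in rules:
        if rule.fwmark is None or rule.fwmark == flow.mark:
            return rule.table
    return None


skb_mark = 0x10
# Mark not copied into the flow: the fwmark rule is silently skipped.
table_without_mark = lookup_table(Flow("10.0.0.1"))
# Mark copied into the flow: the fwmark rule matches.
table_with_mark = lookup_table(Flow("10.0.0.1", mark=skb_mark))
```

In this model the unmarked flow falls through to the catch-all rule, which is the symptom the commit describes.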
2017-11-24ipvlan: Fix insufficient skb linear check for ipv6 icmpGao Feng
In the function ipvlan_get_L3_hdr, the current code uses pskb_may_pull to make sure the skb header has enough linear room for the ipv6 header. But when the packet is icmp, it uses the memory that follows directly, without a linear check, so it may still access unexpected memory in ipvlan_addr_lookup. Now invoke pskb_may_pull again if it is ipv6 icmp. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24ipvlan: Fix insufficient skb linear check for arpGao Feng
In the function ipvlan_get_L3_hdr, the current code uses pskb_may_pull to make sure the skb header has enough linear room for the arp header. But ipvlan_addr_lookup then accesses the arp payload, so it may still access unexpected memory. Now use arp_hdr_len(port->dev) instead of the size of the arp header as the parameter. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
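The difference between checking the fixed ARP header and checking the full ARP packet can be sketched with a small Python model. This is not kernel code: the Skb class and its may_pull method are hypothetical stand-ins for an skb with a linear head plus paged data, and the length formula assumes the usual fixed 8-byte header followed by two hardware/protocol address pairs (28 bytes for Ethernet and IPv4).

```python
ARPHDR_FIXED_LEN = 8  # hrd(2) + pro(2) + hln(1) + pln(1) + op(2)


def arp_hdr_len(hw_addr_len, proto_addr_len=4):
    """Fixed header plus sender and target (hardware, protocol) addresses."""
    return ARPHDR_FIXED_LEN + 2 * (hw_addr_len + proto_addr_len)


class Skb:
    """Toy packet buffer: only `linear` may be read directly."""

    def __init__(self, linear, paged=b""):
        self.linear = bytearray(linear)
        self.paged = bytearray(paged)

    def may_pull(self, need):
        """Make the first `need` bytes linear; False if the packet is too short."""
        if need <= len(self.linear):
            return True
        if need > len(self.linear) + len(self.paged):
            return False
        move = need - len(self.linear)
        self.linear += self.paged[:move]
        del self.paged[:move]
        return True


def arp_target_ip(skb, hw_addr_len=6):
    """Read the target IP only after the whole ARP packet is linear."""
    if not skb.may_pull(arp_hdr_len(hw_addr_len)):
        return None
    # Skip fixed header, sender hw addr, sender IP, target hw addr.
    off = ARPHDR_FIXED_LEN + hw_addr_len + 4 + hw_addr_len
    return bytes(skb.linear[off:off + 4])
```

Pulling only ARPHDR_FIXED_LEN bytes would leave the target IP at offset 24 outside the linear area in this model, which corresponds to the out-of-bounds read the commit guards against.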
2017-11-18ipvlan: NULL pointer dereference panic in ipvlan_port_destroyGirish Moodalbail
When the call to register_netdevice() (called from ipvlan_link_new()) fails, we call ipvlan_uninit() (through ndo_uninit()) to destroy the ipvlan port. After returning unsuccessfully from register_netdevice() we go ahead and call ipvlan_port_destroy() again, which causes a NULL pointer dereference panic. Fix the issue by making the ipvlan_init() and ipvlan_uninit() calls symmetric. The ipvlan port will now be created inside ipvlan_init() and will be destroyed in ipvlan_uninit(). Fixes: 2ad7bf363841 (ipvlan: Initial check-in of the IPVLAN driver) Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11ipvlan: fix ipv6 outbound deviceKeefe Liu
When processing an outbound ipv6 packet, we should assign the master device as the output device rather than as the input device. Signed-off-by: Keefe Liu <liuqifa@huawei.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several conflicts here. NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to nfp_fl_output() needed some adjustments because the code block is in an else block now. Parallel additions to net/pkt_cls.h and net/sch_generic.h A bug fix in __tcp_retransmit_skb() conflicted with some of the rbtree changes in net-next. The tc action RCU callback fixes in 'net' had some overlap with some of the recent tcf_block reworking. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29ipvlan: implement VEPA modeMahesh Bandewar
This is very similar to the Macvlan VEPA mode; however, there are some differences. IPvlan uses the mac-address of the lower device, so the VEPA mode has implications of ICMP-redirects for packets destined for its immediate neighbors sharing the same master, since the packets will have the same source and dest mac. The external switch/router will send a redirect msg. Having said that, this will be a useful tool in terms of debugging, since IPvlan will not switch packets within its slaves and will rely completely on the external entity, as intended in 802.1Qbg. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29ipvlan: introduce 'private' attribute for all existing modes.Mahesh Bandewar
IPvlan has always operated in bridge mode. However, there are scenarios where each slave should be able to talk through the master device but not necessarily to the other slaves. Think of an environment where each namespace is a private and independent customer. In this scenario the machine hosting these namespaces does not want to reveal who their neighbors are, nor do the individual namespaces care to talk to a neighbor over a short-circuited network path. This patch implements a mode that is very similar to the 'private' mode in macvlan, where individual slaves can send and receive traffic through the master device but cannot talk among themselves. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
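As a rough illustration of how the bridge, private and VEPA behaviours differ on transmit, the decision can be modelled in Python. This is a simplified sketch, not the driver's code: the Mode enum, the xmit_decision function and the returned strings are hypothetical names chosen for the example.

```python
from enum import Enum


class Mode(Enum):
    BRIDGE = "bridge"
    PRIVATE = "private"
    VEPA = "vepa"


def xmit_decision(mode, dst_is_sibling_slave):
    """Decide what happens to a packet sent by one slave.

    dst_is_sibling_slave: True when the destination address belongs to
    another slave on the same master.
    """
    if mode is Mode.VEPA:
        # Never switch locally; rely on the external switch/router.
        return "send_to_master"
    if dst_is_sibling_slave:
        # Private slaves may not talk to each other; bridge slaves may.
        return "drop" if mode is Mode.PRIVATE else "deliver_locally"
    return "send_to_master"
```

Traffic to anything other than a sibling slave leaves through the master in all three modes, so the modes differ only in how slave-to-slave traffic is treated.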
2017-10-28tap: reference to KVA of an unloaded module causes kernel panicGirish Moodalbail
The commit 9a393b5d5988 ("tap: tap as an independent module") created a separate tap module that implements tap functionality and exports interfaces that will be used by the macvtap and ipvtap modules to create their respective tap devices.

However, that patch introduced a regression wherein the modules macvtap and ipvtap can be removed (through modprobe -r) while there are applications using the respective /dev/tapX devices. These applications cause the kernel to hold a reference to /dev/tapX through 'struct cdev macvtap_cdev' and 'struct cdev ipvtap_dev' defined in the macvtap and ipvtap modules respectively. So, when the application is later closed, the kernel panics because we are referencing KVA that is present in the unloaded modules.

----------8<------- Example ----------8<----------
$ sudo ip li add name mv0 link enp7s0 type macvtap
$ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
14:mv0@enp7s0:
$ cat /dev/tap14 &
$ lsmod |egrep -i 'tap|vlan'
macvtap 16384 0
macvlan 24576 1 macvtap
tap 24576 3 macvtap
$ sudo modprobe -r macvtap
$ fg
cat /dev/tap14
^C
<...system panics...>
BUG: unable to handle kernel paging request at ffffffffa038c500
IP: cdev_put+0xf/0x30
----------8<-----------------8<----------

The fix is to set cdev.owner to the module that creates the tap device (either macvtap or ipvtap). With this set, the operations (in fs/char_dev.c) on the char device hold and release the module through cdev_get() and cdev_put() and will not allow the module to unload prematurely.

Fixes: 9a393b5d5988ea4e (tap: tap as an independent module)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
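The effect of setting the owner can be sketched with a toy reference-counting model in Python. Nothing here is kernel API: Module, CharDev and their methods are hypothetical, and the model only captures the idea that an open char device pins its owning module.

```python
class Module:
    def __init__(self, name):
        self.name = name
        self.refcnt = 0
        self.loaded = True

    def unload(self):
        """Refuse to unload while somebody holds a reference."""
        if self.refcnt > 0:
            return False
        self.loaded = False
        return True


class CharDev:
    def __init__(self, owner=None):
        self.owner = owner  # None models the missing cdev.owner

    def open(self):
        if self.owner is not None:
            self.owner.refcnt += 1

    def close(self):
        if self.owner is not None:
            self.owner.refcnt -= 1


# Without an owner, the module can vanish while the device is open.
buggy_mod = Module("macvtap")
buggy_dev = CharDev(owner=None)
buggy_dev.open()
unloaded_while_open = buggy_mod.unload()

# With the owner set, unload is refused until the device is closed.
fixed_mod = Module("macvtap")
fixed_dev = CharDev(owner=fixed_mod)
fixed_dev.open()
refused_while_open = not fixed_mod.unload()
fixed_dev.close()
unloaded_after_close = fixed_mod.unload()
```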
2017-10-20net: Add extack to validator_info structs used for address notifierDavid Ahern
Add extack to in_validator_info and in6_validator_info. Update the one user of each, ipvlan, to return an error message for failures. Only manual configuration of an address is plumbed in the IPv6 code path. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20net: ipv6: Make inet6addr_validator a blocking notifierDavid Ahern
The inet6addr_validator chain was added by commit 3ad7d2468f79f ("Ipvlan should return an error when an address is already in use") to allow address validation before changes are committed and to be able to fail the address change with an error back to the user. The address validation is not done for addresses received from router advertisements. Handling RAs in softirq context is the only reason for the notifier chain to be atomic versus blocking. Since the only current user of the validator chain, ipvlan, ignores softirq context, the notifier can be made blocking and simply not invoked for the softirq path. The blocking option is needed by spectrum, for example, to validate resources for adding an address to an interface. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12ipvlan: always use the current L2 addr of the masterMahesh Bandewar
If the underlying master ever changes its L2 (e.g. bonding device), then make sure that the IPvlan slaves always emit packets with the current L2 of the master instead of the stale mac addr which was copied during the device creation. The problem can be seen with the following script:

    #!/bin/bash
    # Create a vEth pair
    ip link add dev veth0 type veth peer name veth1
    ip link set veth0 up
    ip link set veth1 up
    ip link show veth0
    ip link show veth1
    # Create an IPvlan device on one end of this vEth pair.
    ip link add link veth0 dev ipvl0 type ipvlan mode l2
    ip link show ipvl0
    # Change the mac-address of the vEth master.
    ip link set veth0 address 02:11:22:33:44:55

Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04net: Add extack to upper device linkingDavid Ahern
Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree. Basically, updates to the conntrack core, enhancements for nf_tables, conversion of netfilter hooks from linked list to array to improve memory locality and assorted improvements for the Netfilter codebase. More specifically, they are:

1) Add expectation to hashes after timer initialization to prevent access from another CPU that walks on the hashes and calls del_timer(), from Florian Westphal.
2) Don't update nf_tables chain counters from hot path, this is only used by the x_tables compatibility layer.
3) Get rid of nested rcu_read_lock() calls from netfilter hook path. Hooks are always guaranteed to run from rcu read side, so remove nested rcu_read_lock() where possible. Patch from Taehee Yoo.
4) nf_tables new ruleset generation notifications include PID and name of the process that has updated the ruleset, from Phil Sutter.
5) Use skb_header_pointer() from nft_fib, so we can reuse this code from the nf_family netdev family. Patch from Pablo M. Bermudo.
6) Add support for nft_fib in nf_tables netdev family, also from Pablo.
7) Use deferrable workqueue for conntrack garbage collection, to reduce power consumption, patch from Subash Abhinov Kasiviswanathan.
8) Add nf_ct_expect_iterate_net() helper and use it. From Florian Westphal.
9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian.
10) Drop references on conntrack removal path when skbuffs have escaped via nfqueue, from Florian.
11) Don't queue packets to nfqueue with dying conntrack, from Florian.
12) Constify nf_hook_ops structure, from Florian.
13) Remove needless branch in nf_tables trace code, from Phil Sutter.
14) Add nla_strdup(), from Phil Sutter.
15) Raise nf_tables objects name size up to 255 chars, people want to use DNS names, so increase this according to what RFC 1035 specifies. Patch series from Phil Sutter.
16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook registration on demand, suggested by Eric Dumazet, patch from Florian.
17) Remove unused variables in compat_copy_entry_from_user both in ip_tables and arp_tables code. Patch from Taehee Yoo.
18) Constify struct nf_conntrack_l4proto, from Julia Lawall.
19) Constify nf_loginfo structure, also from Julia.
20) Use a single rb root in connlimit, from Taehee Yoo.
21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo.
22) Use audit_log() instead of open-coding it, from Geliang Tang.
23) Allow to mangle tcp options via nft_exthdr, from Florian.
24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes a fix for a miscalculation of the minimal length.
25) Simplify branch logic in h323 helper, from Nick Desaulniers.
26) Calculate netlink attribute size for conntrack tuple at compile time, from Florian.
27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure. From Florian.
28) Remove holes in nf_conntrack_l4proto structure, so it becomes smaller. From Florian.
29) Get rid of print_tuple() indirection for /proc conntrack listing. Place all the code in net/netfilter/nf_conntrack_standalone.c. Patch from Florian.
30) Do not build in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is off. From Florian.
31) Constify most nf_conntrack_{l3,l4}proto helper functions, from Florian.
32) Fix broken indentation in ebtables extensions, from Colin Ian King.
33) Fix several harmless sparse warnings, from Florian.
34) Convert netfilter hook infrastructure to use array for better memory locality, joint work done by Florian and Aaron Conole. Moreover, add some instrumentation to debug this.
35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once per batch, from Florian.
36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian.
37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao.
38) Remove unused code in the generic protocol tracker, from Davide Caratti.

I think I will have material for a second Netfilter batch in my queue if time allows to make it fit in this merge window.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The UDP offload conflict is dealt with by simply taking what is in net-next where we have removed all of the UFO handling code entirely. The TCP conflict was a case of local variables in a function being removed from both net and net-next. In netvsc we had an assignment right next to where a missing set of u64 stats sync object inits were added. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01ipvlan: Fix 64-bit statistics seqcount initializationFlorian Fainelli
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized. Fix that by using the proper helper function: netdev_alloc_pcpu_stats(). Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31netfilter: nf_hook_ops structs can be constFlorian Westphal
We no longer place these on a list so they can be const. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-17ipvlan: Stop advertising NETIF_F_UFO support.David S. Miller
It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.validateMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.changelinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.newlinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09Ipvlan should return an error when an address is already in use.Krister Johansen
The ipvlan code already knows how to detect when a duplicate address is about to be assigned to an ipvlan device. However, that failure is not propagated outward and leads to a silent failure. Introduce a validation step at ip address creation time and allow device drivers to register to validate the incoming ip addresses. The ipvlan code is the first consumer. If it detects an address in use, we can return an error to the user before beginning to commit the new ifa in the networking code. This can be especially useful if it is necessary to provision many ipvlans in containers. The provisioning software (or operator) can use this to detect situations where an ip address is unexpectedly in use. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
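The validate-before-commit idea can be sketched in Python as a list of validator callbacks that run before an address is stored. The Port class, the validators list and the AddressInUse exception are hypothetical names for this illustration and do not mirror the kernel's notifier API.

```python
class AddressInUse(Exception):
    pass


class Port:
    """Toy model of one master device and the addresses on its slaves."""

    def __init__(self):
        self.addrs = set()
        # Drivers would register their checks here; the duplicate check
        # plays the role of the ipvlan consumer.
        self.validators = [self._reject_duplicates]

    def _reject_duplicates(self, addr):
        return "address already in use" if addr in self.addrs else None

    def add_address(self, addr):
        # Validation step: any error aborts before the address is committed.
        for validate in self.validators:
            err = validate(addr)
            if err:
                raise AddressInUse(err)
        self.addrs.add(addr)
```

A caller that provisions many devices can catch AddressInUse and report the conflict instead of ending up with a silently unconfigured interface.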
2017-06-07net: Fix inconsistent teardown and release of private netdev state.David S. Miller
Network devices can allocate resources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places: either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally also does a free_netdev(). This creates a problem for the logic in register_netdevice(), because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev().
netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
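The failure path described above can be traced with a small Python model. It is only a sketch of the ordering: the NetDev class, the events list and the create_device caller are hypothetical, and real registration involves many more steps.

```python
class NetDev:
    def __init__(self, fail_after_init=False):
        self.fail_after_init = fail_after_init
        self.events = []  # records which teardown steps ran, in order


def register_netdevice(dev):
    """Model: on failure after ndo_init, undo private state but do not free."""
    dev.events.append("ndo_init")
    if dev.fail_after_init:
        dev.events.append("ndo_uninit")
        dev.events.append("priv_destructor")  # frees private resources only
        return -1
    dev.events.append("registered")
    return 0


def create_device(dev):
    """Model of a caller: it owns the netdev and frees it exactly once."""
    err = register_netdevice(dev)
    if err:
        dev.events.append("free_netdev")
    return err
```

Because the private destructor no longer frees the netdev itself, the caller's free_netdev() is the only free in the trace and no private resources are leaked.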
2017-04-25ipvlan: use pernet operations and restrict l3s hooks to master netnsFlorian Westphal
commit 4fbae7d83c98c30efc ("ipvlan: Introduce l3s mode") added registration of netfilter hooks via nf_register_hooks(). This API provides the illusion of 'global' netfilter hooks by placing the hooks in all current and future network namespaces. In case of ipvlan the hook appears to be only needed in the namespace that contains the ipvlan master device (i.e., usually init_net), so placing them in all namespaces is not needed. This switches the ipvlan driver to pernet operations, and then only registers hooks in namespaces where an ipvlan master device is set to l3s mode. Extra care has to be taken when the master device is moved to another namespace, as we might have to 'move' the netfilter hooks too. This is done by storing the namespace the ipvlan port was created in. On REGISTER event, do (un)register operations in the old/new namespaces. This will also allow removal of the nf_register_hooks() in a future patch. Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11ipvtap: IP-VLAN based tap driverSainath Grandhi
This patch adds a tap character device driver that is based on the IP-VLAN network interface, called ipvtap. An ipvtap device can be created in the same way as an ipvlan device, using 'type ipvtap', and then accessed using the tap user space interface. Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20ipvlan: use netdev_is_rx_handler_busy instead of checking specific typeMahesh Bandewar
IPvlan checks if the master device is already used by checking a specific device (here it's macvlan device). This is technically not sufficient and it should just ensure the rx_handler is busy or not. This would be a super check that includes macvlan and any other that has already registered rx-handler. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
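The difference between a type-specific check and a generic "is the rx-handler slot taken" check can be shown with a toy Python model. NetDevice, rx_handler_busy and rx_handler_register are hypothetical names for this sketch; the only assumption taken from the text is that a device has a single rx-handler that any upper driver may already have claimed.

```python
class NetDevice:
    def __init__(self, name):
        self.name = name
        self.rx_handler = None  # at most one handler per device


def rx_handler_busy(dev):
    """Generic check: true no matter which driver registered the handler."""
    return dev.rx_handler is not None


def rx_handler_register(dev, handler):
    if rx_handler_busy(dev):
        return False
    dev.rx_handler = handler
    return True


eth0 = NetDevice("eth0")
first = rx_handler_register(eth0, "bridge")   # some other upper driver
second = rx_handler_register(eth0, "ipvlan")  # must be refused
```

A check that only asks "is this a macvlan master?" would miss the first registration in this example, whereas the generic check catches it.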
2017-01-16ipvlan: fix dev_id creation corner case.Mahesh Bandewar
In the last patch da36e13cf65 ("ipvlan: improvise dev_id generation logic in IPvlan") I missed some part of Dave's suggestion and because of that the dev_id creation could fail in a corner case scenario. This would happen when more or less 64k devices have already been created, several have been deleted, and the devices that are still sticking around occupy the last n bits of the bitmap. In this scenario, even if lower bits are available, the dev_id search is so narrow that it always fails. Fixes: da36e13cf65 ("ipvlan: improvise dev_id generation logic in IPvlan") CC: David Miller <davem@davemloft.org> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
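The corner case is easier to see in a small Python model of a cyclic ID allocator. This is a sketch under assumptions, not the driver's code: DevIdAllocator and its fields are hypothetical, the upper bound is treated as exclusive, and the search first runs from the moving start up to the bound and then wraps around to the low IDs.

```python
class DevIdAllocator:
    def __init__(self, max_id=0xFFFE):
        self.max_id = max_id  # exclusive upper bound for slave IDs
        self.start = 1        # where the next search begins
        self.used = set()

    def _first_free(self, lo, hi):
        for i in range(lo, hi):
            if i not in self.used:
                return i
        return None

    def alloc(self):
        dev_id = self._first_free(self.start, self.max_id)
        if dev_id is None:
            # Wrap around: without this second search, free low IDs are
            # never found once the start has moved near the top.
            dev_id = self._first_free(1, self.start)
        if dev_id is None:
            raise RuntimeError("no free dev_id")
        self.used.add(dev_id)
        self.start = dev_id + 1
        return dev_id

    def free(self, dev_id):
        self.used.discard(dev_id)
```

With a tiny bound such as max_id=5, allocating 1 to 4, freeing 2 and allocating again returns 2 through the wrap-around search. A freshly freed ID is still not reused immediately while higher IDs remain available.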
2017-01-10ipvlan: improvise dev_id generation logic in IPvlanMahesh Bandewar
The patch 009146d117b ("ipvlan: assign unique dev-id for each slave device.") used ida_simple_get() to generate dev_ids assigned to the slave devices. However (Eric has pointed out that) there is a shortcoming with that approach as it always uses the first available ID. This becomes a problem when a slave gets deleted and a new slave gets added. The ID gets reassigned, causing the new slave to get the same link-local address. This side-effect is undesirable. This patch adds a per-port variable that keeps track of the IDs assigned and is used as the start-base for the IDR api. This base will be wrapped around when it reaches the MAX (0xFFFE) value, possibly on a busy system where slaves are added and deleted routinely. Fixes: 009146d117b ("ipvlan: assign unique dev-id for each slave device.") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> CC: David Miller <davem@davemloft.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08net: make ndo_get_stats64 a void functionstephen hemminger
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04ipvlan: assign unique dev-id for each slave device.Mahesh Bandewar
IPvlan setup uses one mac-address (of master). The IPv6 link-local addresses are derived using the mac-address on the link. Lack of dev-ids makes these link-local addresses same for all slaves including that of master device. dev-ids are necessary to add differentiation when L2 address is shared. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
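Why a shared MAC yields identical link-local addresses, and how a dev-id can break the tie, can be sketched in Python. The byte layout here is an assumption modelled loosely on the usual EUI-48 to interface-identifier expansion and should be checked against the kernel source. The interface_id helper is hypothetical.

```python
def interface_id(mac, dev_id=0):
    """Build an 8-byte IPv6 interface identifier from a 6-byte MAC.

    Assumption: with no dev_id the filler bytes FF:FE are inserted and the
    universal/local bit is flipped; with a dev_id the two filler bytes
    carry the dev_id instead.
    """
    if len(mac) != 6:
        raise ValueError("expected a 6-byte MAC address")
    eui = bytearray(mac[:3] + b"\xff\xfe" + mac[3:])
    if dev_id:
        eui[3] = (dev_id >> 8) & 0xFF
        eui[4] = dev_id & 0xFF
    else:
        eui[0] ^= 0x02
    return bytes(eui)


shared_mac = bytes.fromhex("021122334455")
master_id = interface_id(shared_mac)
slave1_id = interface_id(shared_mac, dev_id=1)
slave2_id = interface_id(shared_mac, dev_id=2)
```

Under this layout every device on the shared MAC gets a distinct identifier as long as the dev-ids differ.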
2016-12-28driver: ipvlan: Remove unnecessary ipvlan NULL check in ipvlan_count_rxGao Feng
There are three functions which would invoke the ipvlan_count_rx. They are ipvlan_process_multicast, ipvlan_rcv_frame, and ipvlan_nf_input. The former two functions already use the ipvlan directly before ipvlan_count_rx, and ipvlan_nf_input gets the ipvlan from ipvl_addr->master, it is not possible to be NULL too. So the ipvlan pointer check is unnecessary in ipvlan_count_rx. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP addressGao Feng
There is some duplicated code in ipvlan_add_addr6/4 and ipvlan_del_addr6/4. Now define two common functions, ipvlan_add_addr and ipvlan_del_addr, to reduce the duplication. This should make the code easier to maintain. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23ipvlan: fix multicast processingMahesh Bandewar
In an IPvlan setup when master is set in loopback mode, e.g.

    ethtool -K eth0 set loopback on

where eth0 is the master device for the IPvlan setup, the kernel hits the BUG shown in the trace at the end of this message.

The failure is caused by the faulty logic that determines if the packet is from the TX-path vs. the RX-path by just looking at the mac-addresses on the packet while processing multicast packets. In the loopback-mode where this crash was happening, the packets that are sent out are reflected by the NIC and are processed on the RX path, but the mac-address check tricks the code into thinking this packet is from the TX path and it falsely uses dev_forward_skb() to pass packets to the slave (virtual) devices.

This patch records the path while queueing packets and eliminates the logic of looking at mac-addresses for the same decision.

------------[ cut here ]------------
kernel BUG at include/linux/skbuff.h:1737!
Call Trace:
 [<ffffffff921fbbc2>] dev_forward_skb+0x92/0xd0
 [<ffffffffc031ac65>] ipvlan_process_multicast+0x395/0x4c0 [ipvlan]
 [<ffffffffc031a9a7>] ? ipvlan_process_multicast+0xd7/0x4c0 [ipvlan]
 [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660
 [<ffffffff91cdff09>] process_one_work+0x1a9/0x660
 [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660
 [<ffffffff91ce086d>] worker_thread+0x11d/0x360
 [<ffffffff91ce0750>] ? rescuer_thread+0x350/0x350
 [<ffffffff91ce960b>] kthread+0xdb/0xe0
 [<ffffffff91c05c70>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0
 [<ffffffff92348b7a>] ret_from_fork+0x9a/0xd0
 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0

Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
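The difference between guessing the path from the MAC address and recording it at enqueue time can be sketched in Python. The QueuedPkt class, the tx_pkt field name and both helper functions are hypothetical. The model only shows why a reflected packet defeats the MAC-based guess.

```python
from collections import deque
from dataclasses import dataclass

MASTER_MAC = "02:11:22:33:44:55"


@dataclass
class QueuedPkt:
    src_mac: str
    tx_pkt: bool  # recorded by whoever queued the packet


def guess_tx_by_mac(pkt, master_mac=MASTER_MAC):
    """Old heuristic: our own source MAC means 'this came from the TX path'."""
    return pkt.src_mac == master_mac


backlog = deque()


def enqueue(src_mac, from_tx_path):
    """New scheme: the caller knows the path, so store it with the packet."""
    backlog.append(QueuedPkt(src_mac, tx_pkt=from_tx_path))


# A packet we sent is reflected by the NIC in loopback mode and comes back
# on the RX path, still carrying the master's MAC as its source.
enqueue(MASTER_MAC, from_tx_path=False)
reflected = backlog.popleft()
heuristic_says_tx = guess_tx_by_mac(reflected)
recorded_says_tx = reflected.tx_pkt
```

In this model the heuristic misclassifies the reflected packet as TX traffic, while the recorded flag keeps it on the RX handling path.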
2016-12-23ipvlan: fix various issues in ipvlan_process_multicast()Eric Dumazet
1) netif_rx() / dev_forward_skb() should not be called from process context.
2) ipvlan_count_rx() should be called with preemption disabled.
3) We should check if ipvlan->dev is up before feeding packets to netif_rx()
4) We need to prevent device from disappearing if some packets are in the multicast backlog.
5) One kfree_skb() should be a consume_skb() eventually

Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-12-08driver: ipvlan: Unlink the upper dev when ipvlan_link_new failedGao Feng
When netdev_upper_dev_unlink failed in ipvlan_link_new, need to unlink the ipvlan dev with upper dev. Signed-off-by: Gao Feng <fgao@ikuai8.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07driver: ipvlan: Free ipvl_port directly with kfree instead of kfree_rcuGao Feng
There are two functions which free the ipvl_port now. The first is ipvlan_port_create. It frees the ipvl_port in the error handler, so it can kfree it directly. The second is ipvlan_port_destroy. It first invokes netdev_rx_handler_unregister, which enforces one grace period via synchronize_net, so it can also kfree the ipvl_port directly and safely. So it is unnecessary to use kfree_rcu to free ipvl_port. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller
A couple of conflicts resolved here:
1) In the MACB driver, a bug fix to properly initialize the RX tail pointer overlapped with some changes to support variable sized rings.
2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip.
3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'.
4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkmann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw().
5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'.
6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30  driver: ipvlan: Remove useless member mtu_adj of struct ipvl_dev  Gao Feng
mtu_adj is initialized to zero when the memory is allocated, and nothing ever assigns to it afterwards. It is only read, as an rvalue, in ipvlan_adjust_mtu. It is therefore a useless member of struct ipvl_dev, so remove it. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27  driver: ipvlan: Fix one possible memleak in ipvlan_link_new  Gao Feng
When ipvlan_link_new fails after having created an ipvlan port, it does not destroy the port it created. This leaks memory and leaves the physical device holding invalid ipvlan data. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
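The leak described above is an instance of a common error-path pattern: a function that may create a resource on the way in has to undo exactly what it created if a later step fails. The userspace sketch below models that pattern only; `struct port`, `struct phy_dev`, `port_create()`, `port_destroy()` and `link_new()` are hypothetical names, and the `fail_later` flag stands in for whatever later step might fail in the real function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-master port and the physical device. */
struct port { int users; };
struct phy_dev { struct port *port; };

static int port_create(struct phy_dev *phy)
{
    struct port *p = calloc(1, sizeof(*p));

    if (!p)
        return -ENOMEM;
    phy->port = p;
    return 0;
}

static void port_destroy(struct phy_dev *phy)
{
    free(phy->port);
    phy->port = NULL; /* do not leave a dangling pointer on the device */
}

/* fail_later simulates a failure in a step that runs after port creation. */
static int link_new(struct phy_dev *phy, bool fail_later)
{
    bool created = false;
    int err;

    assert(phy != NULL);
    if (!phy->port) {
        err = port_create(phy);
        if (err)
            return err;
        created = true; /* remember that this call owns the new port */
    }

    err = fail_later ? -EINVAL : 0;
    if (err)
        goto destroy_port;

    phy->port->users++;
    return 0;

destroy_port:
    /* Only undo what this call created; a pre-existing port is left alone. */
    if (created)
        port_destroy(phy);
    return err;
}
```

Tracking `created` separately matters because a port that already existed before the call is still in use by other links and must survive the failure.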
2016-10-15  ipvlan: constify l3mdev_ops structure  Julia Lawall
This l3mdev_ops structure is only stored in the l3mdev_ops field of a net_device structure. This field is declared const, so the l3mdev_ops structure can be declared as const also. Additionally drop the __read_mostly annotation.

The semantic patch that adds const is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct l3mdev_ops i@p = { ... };

@ok@
identifier r.i;
struct net_device *e;
position p;
@@
e->l3mdev_ops = &i@p;

@bad@
position p != {r.p,ok.p};
identifier r.i;
struct l3mdev_ops e;
@@
e@i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct l3mdev_ops i = { ... };
// </smpl>

The effect on the layout of the .o file is shown by the following output of the size command, first before then after the transformation:

   text    data     bss     dec     hex filename
   7364     466      52    7882    1eca drivers/net/ipvlan/ipvlan_main.o
   7412     434      52    7898    1eda drivers/net/ipvlan/ipvlan_main.o

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
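The reasoning behind the constification can be shown with a small standalone C sketch: when an ops table is only ever reached through a pointer-to-const field, the table itself can be declared const without changing any caller. The names `struct demo_ops`, `struct demo_dev`, `demo_get_index()`, `demo_l3_ops` and `demo_call()` are hypothetical and are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table, standing in for a structure of callbacks. */
struct demo_ops {
    int (*get_index)(int ifindex);
};

/* The device only holds a pointer to const ops, so it never writes to them. */
struct demo_dev {
    const struct demo_ops *ops;
    int ifindex;
};

static int demo_get_index(int ifindex)
{
    return ifindex + 1;
}

/* Because every user goes through a const pointer, the table can be const. */
static const struct demo_ops demo_l3_ops = {
    .get_index = demo_get_index,
};

static int demo_call(const struct demo_dev *dev)
{
    assert(dev != NULL && dev->ops != NULL);
    return dev->ops->get_index(dev->ifindex);
}
```

A device initialised with `.ops = &demo_l3_ops` calls through the table exactly as before; the only difference is that the compiler can now place the table in read-only storage and reject accidental writes to it.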
2016-09-19  ipvlan: Introduce l3s mode  Mahesh Bandewar
In a typical IPvlan L3 setup, the master is in the default-ns and each slave is in a different (slave) ns. In this setup, egress packet processing for traffic originating from a slave-ns hits all NF_HOOKs in the slave-ns as well as in the default-ns. However, the same is not true for ingress processing: all these NF_HOOKs are hit only in the slave-ns and are skipped in the default-ns. IPvlan in L3 mode is restrictive, and if admins want to deploy iptables rules in the default-ns, this asymmetric data path makes it impossible to do so. This patch makes use of l3_rcv() (added as part of the l3mdev enhancements) to perform the input route lookup on RX packets without changing skb->dev, and then uses nf_hook at NF_INET_LOCAL_IN to change skb->dev just before handing the skb over to L4. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25  ipvlan: Scrub skb before crossing the namespace boundary  Mahesh Bandewar
The earlier patch c3aaa06d5a63 (ipvlan: scrub skb before routing in L3 mode.) did this, but only for the TX path in L3 mode. This patch extends it to both modes and to both the TX and RX paths. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>