summaryrefslogtreecommitdiff
path: root/drivers/net/bonding
AgeCommit message (Collapse)Author
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-12net: bonding: fix tlb_dynamic_lb default valueNikolay Aleksandrov
Commit 8b426dc54cf4 ("bonding: remove hardcoded value") changed the default value for tlb_dynamic_lb which lead to either broken ALB mode (since tlb_dynamic_lb can be changed only in TLB) or setting TLB mode with tlb_dynamic_lb equal to 0. The first issue was recently fixed by setting tlb_dynamic_lb to 1 always when switching to ALB mode, but the default value is still wrong and we'll enter TLB mode with tlb_dynamic_lb equal to 0 if the mode is changed via netlink or sysfs. In order to restore the previous behaviour and default value simply remove the mode check around the default param initialization for tlb_dynamic_lb which will always set it to 1 as before. Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11net: bonding: Fix transmit load balancing in balance-alb mode if specified ↵Kosuke Tatsukawa
by sysfs Commit cbf5ecb30560 ("net: bonding: Fix transmit load balancing in balance-alb mode") tried to fix transmit dynamic load balancing in balance-alb mode, which wasn't working after commit 8b426dc54cf4 ("bonding: remove hardcoded value"). It turned out that my previous patch only fixed the case when balance-alb was specified as bonding module parameter, and not when balance-alb mode was set using /sys/class/net/*/bonding/mode (the most common usage). In the latter case, tlb_dynamic_lb was set up according to the default mode of the bonding interface, which happens to be balance-rr. This additional patch addresses this issue by setting up tlb_dynamic_lb to 1 if "mode" is set to balance-alb through the sysfs interface. I didn't add code to change tlb_balance_lb back to the default value for other modes, because "mode" is usually set up only once during initialization, and it's not worthwhile to change the static variable bonding_defaults in bond_main.c to a global variable just for this purpose. Commit 8b426dc54cf4 also changes the value of tlb_dynamic_lb for balance-tlb mode if it is set up using the sysfs interface. I didn't change that behavior, because the value of tlb_balance_lb can be changed using the sysfs interface for balance-tlb, and I didn't like changing the default value back and forth for balance-tlb. As for balance-alb, /sys/class/net/*/bonding/tlb_balance_lb cannot be written to. However, I think balance-alb with tlb_dynamic_lb set to 0 is not an intended usage, so there is little use making it writable at this moment. Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value") Reported-by: Reinis Rozitis <r@roze.lv> Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Cc: stable@vger.kernel.org # v4.12+ Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-08-13bonding: ratelimit failed speed/duplex update warningAndreas Born
bond_miimon_commit() handles the UP transition for each slave of a bond in the case of MII. It is triggered 10 times per second for the default MII Polling interval of 100ms. For device drivers that do not implement __ethtool_get_link_ksettings() the call to bond_update_speed_duplex() fails persistently while the MII status could remain UP. That is, in this and other cases where the speed/duplex update keeps failing over a longer period of time while the MII state is UP, a warning is printed every MII polling interval. To address these excessive warnings net_ratelimit() should be used. Printing a warning once would not be sufficient since the call to bond_update_speed_duplex() could recover to succeed and fail again later. In that case there would be no new indication what went wrong. Fixes: b5bf0f5b16b9c (bonding: correctly update link status during mii-commit phase) Signed-off-by: Andreas Born <futur.andy@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11bonding: require speed/duplex only for 802.3ad, alb and tlbAndreas Born
The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") puts the link state to down if bond_update_speed_duplex() cannot retrieve speed and duplex settings. Assumably the patch was written with 802.3ad mode in mind which relies on link speed/duplex settings. For other modes like active-backup these settings are not required. Thus, only for these other modes, this patch reintroduces support for slaves that do not support reporting speed or duplex such as wireless devices. This fixes the regression reported in bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547). Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Andreas Born <futur.andy@googlemail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two minor conflicts in virtio_net driver (bug fix overlapping addition of a helper) and MAINTAINERS (new driver edit overlapping revamp of PHY entry). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-26bonding: commit link status change after proposeWANG Cong
Commit de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") moves link status commitment into bond_mii_monitor(), but it still relies on the return value of bond_miimon_inspect() as the hint. We need to return non-zero as long as we propose a link status change. Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Reported-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Tested-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-07-20net: bonding: Fix transmit load balancing in balance-alb modeKosuke Tatsukawa
balance-alb mode used to have transmit dynamic load balancing feature enabled by default. However, transmit dynamic load balancing no longer works in balance-alb after commit 8b426dc54cf4 ("bonding: remove hardcoded value"). Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to send packets. This function uses the parameter tlb_dynamic_lb. tlb_dynamic_lb used to have the default value of 1 for balance-alb, but now the value is set to 0 except in balance-tlb. Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb for balance-alb similar to balance-tlb. Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value") Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18net: bonding: constify attribute_group structures.Arvind Yadav
attribute_group are not supposed to change at runtime. All functions working with attribute_group provided by <linux/netdevice.h> work with const attribute_group. So mark the non-const structs as const. File size before: text data bss dec hex filename 4512 1472 0 5984 1760 drivers/net/bonding/bond_sysfs.o File size After adding 'const': text data bss dec hex filename 4576 1408 0 5984 1760 drivers/net/bonding/bond_sysfs.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-08bonding: avoid NETDEV_CHANGEMTU event when unregistering slaveWANG Cong
As Hongjun/Nicolas summarized in their original patch: " When a device changes from one netns to another, it's first unregistered, then the netns reference is updated and the dev is registered in the new netns. Thus, when a slave moves to another netns, it is first unregistered. This triggers a NETDEV_UNREGISTER event which is caught by the bonding driver. The driver calls bond_release(), which calls dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in the old netns). " This is a very special case, because the device is being unregistered no one should still care about the NETDEV_CHANGEMTU event triggered at this point, we can avoid broadcasting this event on this path, and avoid touching inetdev_event()/addrconf_notify() path. It requires to export __dev_set_mtu() to bonding driver. Reported-by: Hongjun Li <hongjun.li@6wind.com> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29Bonding: Convert multiple netdev_info messages to netdev_dbgMichael Dilmore
The bond_options.c file contains multiple netdev_info statements that clutter kernel output. This patch replaces all netdev_info with netdev_dbg and adds a netdev_dbg statement for the packets per slave parameter. Also fixes misalignment at line 467. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Michael J Dilmore <michael.j.dilmore@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.slave_changelinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.validateMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.changelinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.newlinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20net: manual clean code which call skb_put_[data:zero]yuan linyu
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_put & friends return void pointersJohannes Berg
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions (skb_put, __skb_put and pskb_put) return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_put, __skb_put }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_put, __skb_put }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) which actually doesn't cover pskb_put since there are only three users overall. A handful of stragglers were converted manually, notably a macro in drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many instances in net/bluetooth/hci_sock.c. In the former file, I also had to fix one whitespace problem spatch introduced. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: introduce and use skb_put_data()Johannes Berg
A common pattern with skb_put() is to just want to memcpy() some data into the new space, introduce skb_put_data() for this. An spatch similar to the one for skb_put_zero() converts many of the places using it: @@ identifier p, p2; expression len, skb, data; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_data(skb, data, len); | -p = (t)skb_put(skb, len); +p = skb_put_data(skb, data, len); ) ( p2 = (t2)p; -memcpy(p2, data, len); | -memcpy(p, data, len); ) @@ type t, t2; identifier p, p2; expression skb, data; @@ t *p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); | -p = (t *)skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); ) ( p2 = (t2)p; -memcpy(p2, data, sizeof(*p)); | -memcpy(p, data, sizeof(*p)); ) @@ expression skb, len, data; @@ -memcpy(skb_put(skb, len), data, len); +skb_put_data(skb, data, len); (again, manually post-processed to retain some comments) Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-10bonding: warn user when 802.3ad speed is unknownNicolas Dichtel
Goal is to advertise the user when ethtool speeds and 802.3ad speeds are desynchronized. When this case happens, the kernel needs to be patched. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08bonding: fix 802.3ad support for 14G speedNicolas Dichtel
This patch adds 14 Gbps enum definition, and fixes aggregated bandwidth calculation based on above slave links. Fixes: 0d7e2d2166f6 ("IB/ipoib: add get_link_ksettings in ethtool") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08bonding: fix 802.3ad support for 5G and 50G speedsThibaut Collet
This patch adds [5|50] Gbps enum definition, and fixes aggregated bandwidth calculation based on above slave links. Fixes: c9a70d43461d ("net-next: ethtool: Added port speed macros.") Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: Remove support for bridge bypass ndos from stacked devicesArkadi Sharshevsky
Remove support for bridge bypass ndos from stacked devices. At this point no driver which supports stack device behavior offload supports operation with SELF flag. The case for upper device is already taken care of in both of the following cases: 1. FDB add/del - driver should check at the notification cb if the stacked device contains his ports. 2. Port attribute - calls switchdev code directly which checks for case of stack device. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07net: Fix inconsistent teardown and release of private netdev state.David S. Miller
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-27bonding: Prevent duplicate userspace notificationVlad Yasevich
Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA notificatin is generated which results in a rtnelink message to be sent. While runnig 'ip monitor', we can actually see 2 messages, one a result of the event, and the other a result of state change that is generated bo netdev_state_change(). However, this is not always the case. If bonding changes were done via sysfs or ifenslave (old ioctl interface), then only 1 message is seen. This patch removes duplicate messages in the case of using netlink to configure bonding. It introduceds a separte function that triggers a netdev event and uses that function in the syfs and ioctl cases. This was discovered while auditing all the different envents and continues the effort of cleaning up duplicated netlink messages. CC: David Ahern <dsa@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25bonding: Don't update slave->link until ready to commitNithin Sujir
In the loadbalance arp monitoring scheme, when a slave link change is detected, the slave->link is immediately updated and slave_state_changed is set. Later down the function, the rtnl_lock is acquired and the changes are committed, updating the bond link state. However, the acquisition of the rtnl_lock can fail. The next time the monitor runs, since slave->link is already updated, it determines that link is unchanged. This results in the bond link state permanently out of sync with the slave link. This patch modifies bond_loadbalance_arp_mon() to handle link changes identical to bond_ab_arp_{inspect/commit}(). The new link state is maintained in slave->new_link until we're ready to commit at which point it's copied into slave->link. NOTE: miimon_{inspect/commit}() has a more complex state machine requiring the use of the bond_{propose,commit}_link_state() functions which maintains the intermediate state in slave->link_new_state. The arp monitors don't require that. Testing: This bug is very easy to reproduce with the following steps. 1. In a loop, toggle a slave link of a bond slave interface. 2. In a separate loop, do ifconfig up/down of an unrelated interface to create contention for rtnl_lock. Within a few iterations, the bond link goes out of sync with the slave link. Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-22bonding: fix randomly populated arp target arrayJarod Wilson
In commit dc9c4d0fe023, the arp_target array moved from a static global to a local variable. By the nature of static globals, the array used to be initialized to all 0. At present, it's full of random data, which that gets interpreted as arp_target values, when none have actually been specified. Systems end up booting with spew along these lines: [ 32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready [ 32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready [ 32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0 [ 32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready [ 32.204892] lacp0: Setting MII monitoring interval to 100 [ 32.211071] lacp0: Removing ARP target 216.124.228.17 [ 32.216824] lacp0: Removing ARP target 218.160.255.255 [ 32.222646] lacp0: Removing ARP target 185.170.136.184 [ 32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal [ 32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255) [ 32.243987] lacp0: Removing ARP target 56.125.228.17 [ 32.249625] lacp0: Removing ARP target 218.160.255.255 [ 32.255432] lacp0: Removing ARP target 15.157.233.184 [ 32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal [ 32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255) [ 32.276632] lacp0: Removing ARP target 16.0.0.0 [ 32.281755] lacp0: Removing ARP target 218.160.255.255 [ 32.287567] lacp0: Removing ARP target 72.125.228.17 [ 32.293165] lacp0: Removing ARP target 218.160.255.255 [ 32.298970] lacp0: Removing ARP target 8.125.228.17 [ 32.304458] lacp0: Removing ARP target 218.160.255.255 None of these were actually specified as ARP targets, and the driver does seem to clean up the mess okay, but it's rather noisy and confusing, leaks values to userspace, and the 255.255.255.255 spew shows up even when debug prints are disabled. The fix: just zero out arp_target at init time. While we're in here, init arp_all_targets_value in the right place. Fixes: dc9c4d0fe023 ("bonding: reduce scope of some global variables") CC: Mahesh Bandewar <maheshb@google.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org CC: stable@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-22bonding: fix accounting of active ports in 3adJarod Wilson
As of 7bb11dc9f59d and 0622cab0341c, bond slaves in a 3ad bond are not removed from the aggregator when they are down, and the active slave count is NOT equal to number of ports in the aggregator, but rather the number of ports in the aggregator that are still enabled. The sysfs spew for bonding_show_ad_num_ports() has a comment that says "Show number of active 802.3ad ports.", but it's currently showing total number of ports, both active and inactive. Remedy it by using the same logic introduced in 0622cab0341c in __bond_3ad_get_active_agg_info(), so sysfs, procfs and netlink all report the number of active ports. Note that this means that IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of NUM_PORTS, and thus perhaps should be renamed for clarity. Lightly tested on a dual i40e lacp bond, simulating link downs with an ip link set dev <slave2> down, was able to produce the state where I could see both in the same aggregator, but a number of ports count of 1. MII Status: up Active Aggregator Info: Aggregator ID: 1 Number of ports: 2 <--- Slave Interface: ens10 MII Status: up <--- Aggregator ID: 1 Slave Interface: ens11 MII Status: up Aggregator ID: 1 MII Status: up Active Aggregator Info: Aggregator ID: 1 Number of ports: 1 <--- Slave Interface: ens10 MII Status: down <--- Aggregator ID: 1 Slave Interface: ens11 MII Status: up Aggregator ID: 1 CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08bonding: check nla_put_be32 return valueHangbin Liu
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Millar: "Here are some highlights from the 2065 networking commits that happened this development cycle: 1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri) 2) Add a generic XDP driver, so that anyone can test XDP even if they lack a networking device whose driver has explicit XDP support (me). 3) Sparc64 now has an eBPF JIT too (me) 4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei Starovoitov) 5) Make netfitler network namespace teardown less expensive (Florian Westphal) 6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana) 7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger) 8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky) 9) Multiqueue support in stmmac driver (Joao Pinto) 10) Remove TCP timewait recycling, it never really could possibly work well in the real world and timestamp randomization really zaps any hint of usability this feature had (Soheil Hassas Yeganeh) 11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay Aleksandrov) 12) Add socket busy poll support to epoll (Sridhar Samudrala) 13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso, and several others) 14) IPSEC hw offload infrastructure (Steffen Klassert)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits) tipc: refactor function tipc_sk_recv_stream() tipc: refactor function tipc_sk_recvmsg() net: thunderx: Optimize page recycling for XDP net: thunderx: Support for XDP header adjustment net: thunderx: Add support for XDP_TX net: thunderx: Add support for XDP_DROP net: thunderx: Add basic XDP support net: thunderx: Cleanup receive buffer allocation net: thunderx: Optimize CQE_TX handling net: thunderx: Optimize RBDR descriptor handling net: thunderx: Support for page recycling ipx: call ipxitf_put() in ioctl error path net: sched: add helpers to handle extended actions qed*: Fix issues in the ptp filter config implementation. qede: Fix concurrency issue in PTP Tx path processing. stmmac: Add support for SIMATIC IOT2000 platform net: hns: fix ethtool_get_strings overflow in hns driver tcp: fix wraparound issue in tcp_lp bpf, arm64: fix jit branch offset related to ldimm64 bpf, arm64: implement jiting of BPF_XADD ...
2017-04-28bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removalPaolo Abeni
On slave list updates, the bonding driver computes its hard_header_len as the maximum of all enslaved devices's hard_header_len. If the slave list is empty, e.g. on last enslaved device removal, ETH_HLEN is used. Since the bonding header_ops are set only when the first enslaved device is attached, the above can lead to header_ops->create() being called with the wrong skb headroom in place. If bond0 is configured on top of ipoib devices, with the following commands: ifup bond0 for slave in $BOND_SLAVES_LIST; do ip link set dev $slave nomaster done ping -c 1 <ip on bond0 subnet> we will obtain a skb_under_panic() with a similar call trace: skb_push+0x3d/0x40 push_pseudo_header+0x17/0x30 [ib_ipoib] ipoib_hard_header+0x4e/0x80 [ib_ipoib] arp_create+0x12f/0x220 arp_send_dst.part.19+0x28/0x50 arp_solicit+0x115/0x290 neigh_probe+0x4d/0x70 __neigh_event_send+0xa7/0x230 neigh_resolve_output+0x12e/0x1c0 ip_finish_output2+0x14b/0x390 ip_finish_output+0x136/0x1e0 ip_output+0x76/0xe0 ip_local_out+0x35/0x40 ip_send_skb+0x19/0x40 ip_push_pending_frames+0x33/0x40 raw_sendmsg+0x7d3/0xb50 inet_sendmsg+0x31/0xb0 sock_sendmsg+0x38/0x50 SYSC_sendto+0x102/0x190 SyS_sendto+0xe/0x10 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 This change addresses the issue avoiding updating the bonding device hard_header_len when the slaves list become empty, forbidding to shrink it below the value used by header_ops->create(). The bug is there since commit 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len") but the panic can be triggered only since commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard header"). Reported-by: Norbert P <noe@physik.uzh.ch> Fixes: 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len") Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21bonding: fix wq initialization for links created via netlinkMahesh Bandewar
Earlier patch 4493b81bea ("bonding: initialize work-queues during creation of bond") moved the work-queue initialization from bond_open() to bond_create(). However this caused the link those are created using netlink 'create bond option' (ip link add bondX type bond); create the new trunk without initializing work-queues. Prior to the above mentioned change, ndo_open was in both paths and things worked correctly. The consequence is visible in the report shared by Joe Stringer - I've noticed that this patch breaks bonding within namespaces if you're not careful to perform device cleanup correctly. Here's my repro script, you can run on any net-next with this patch and you'll start seeing some weird behaviour: ip netns add foo ip li add veth0 type veth peer name veth0+ netns foo ip li add veth1 type veth peer name veth1+ netns foo ip netns exec foo ip li add bond0 type bond ip netns exec foo ip li set dev veth0+ master bond0 ip netns exec foo ip li set dev veth1+ master bond0 ip netns exec foo ip addr add dev bond0 192.168.0.1/24 ip netns exec foo ip li set dev bond0 up ip li del dev veth0 ip li del dev veth1 The second to last command segfaults, last command hangs. rtnl is now permanently locked. It's not a problem if you take bond0 down before deleting veths, or delete bond0 before deleting veths. If you delete either end of the veth pair as per above, either inside or outside the namespace, it hits this problem. Here's some kernel logs: [ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link [ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link [ 1281.193863] bond0: Releasing backup interface veth0+ [ 1281.193866] bond0: the permanent HWaddr of veth0+ - 16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of veth0+ to a different address to avoid conflicts [ 1281.193867] ------------[ cut here ]------------ [ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511 __queue_delayed_work+0x13f/0x150 [ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid hid mptspi mptscsih e1000 mptbase ahci libahci [ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G W 4.10.0-bisect-bond-v0.14 #37 [ 1281.193906] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014 [ 1281.193906] Call Trace: [ 1281.193912] dump_stack+0x63/0x89 [ 1281.193915] __warn+0xd1/0xf0 [ 1281.193917] warn_slowpath_null+0x1d/0x20 [ 1281.193918] __queue_delayed_work+0x13f/0x150 [ 1281.193920] queue_delayed_work_on+0x27/0x40 [ 1281.193929] bond_change_active_slave+0x25b/0x670 [bonding] [ 1281.193932] ? synchronize_rcu_expedited+0x27/0x30 [ 1281.193935] __bond_release_one+0x489/0x510 [bonding] [ 1281.193939] ? addrconf_notify+0x1b7/0xab0 [ 1281.193942] bond_netdev_event+0x2c5/0x2e0 [bonding] [ 1281.193944] ? netconsole_netdev_event+0x124/0x190 [netconsole] [ 1281.193947] notifier_call_chain+0x49/0x70 [ 1281.193948] raw_notifier_call_chain+0x16/0x20 [ 1281.193950] call_netdevice_notifiers_info+0x35/0x60 [ 1281.193951] rollback_registered_many+0x23b/0x3e0 [ 1281.193953] unregister_netdevice_many+0x24/0xd0 [ 1281.193955] rtnl_delete_link+0x3c/0x50 [ 1281.193956] rtnl_dellink+0x8d/0x1b0 [ 1281.193960] rtnetlink_rcv_msg+0x95/0x220 [ 1281.193962] ? __kmalloc_node_track_caller+0x35/0x280 [ 1281.193964] ? __netlink_lookup+0xf1/0x110 [ 1281.193966] ? rtnl_newlink+0x830/0x830 [ 1281.193967] netlink_rcv_skb+0xa7/0xc0 [ 1281.193969] rtnetlink_rcv+0x28/0x30 [ 1281.193970] netlink_unicast+0x15b/0x210 [ 1281.193971] netlink_sendmsg+0x319/0x390 [ 1281.193974] sock_sendmsg+0x38/0x50 [ 1281.193975] ___sys_sendmsg+0x25c/0x270 [ 1281.193978] ? mem_cgroup_commit_charge+0x76/0xf0 [ 1281.193981] ? page_add_new_anon_rmap+0x89/0xc0 [ 1281.193984] ? lru_cache_add_active_or_unevictable+0x35/0xb0 [ 1281.193985] ? __handle_mm_fault+0x4e9/0x1170 [ 1281.193987] __sys_sendmsg+0x45/0x80 [ 1281.193989] SyS_sendmsg+0x12/0x20 [ 1281.193991] do_syscall_64+0x6e/0x180 [ 1281.193993] entry_SYSCALL64_slow_path+0x25/0x25 [ 1281.193995] RIP: 0033:0x7f6ec122f5a0 [ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0 [ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003 [ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003 [ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f [ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450 [ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]--- Fixes: 4493b81bea ("bonding: initialize work-queues during creation of bond") Reported-by: Joe Stringer <joe@ovn.org> Tested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17bonding: deliver link-local packets with skb->dev set to link that packets ↵Chonggang Li
arrived on Bonding driver changes the skb->dev to the bonding-master before passing the packet to stack for further processing. This, however does not make sense for the link-local packets and it loses "the link info" once its skb->dev is changed to bonding-master. This patch changes this behavior for link-local packets by not changing the skb->dev to the bonding-master and maintaining it as it is, i.e. the link on which the packet arrived. Signed-off-by: Chonggang Li <chonggangli@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13bonding: handle link transition from FAIL to UP correctlyMahesh Bandewar
When link transitions from LINK_FAIL to LINK_UP, the commit phase is not called. This leads to an erroneous state causing slave-link state to get stuck in "going down" state while its speed and duplex are perfectly fine. This issue is a side-effect of splitting link-set into propose and commit phases introduced by de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") This patch fixes these issues by calling commit phase whenever link state change is proposed. Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05bonding: attempt to better support longer hw addressesJarod Wilson
People are using bonding over Infiniband IPoIB connections, and who knows what else. Infiniband has a hardware address length of 20 octets (INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32. Various places in the bonding code are currently hard-wired to 6 octets (ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides, only alb is currently possible on Infiniband links right now anyway, due to commit 1533e7731522, so the alb code is where most of the changes are. One major component of this change is the addition of a bond_hw_addr_copy function that takes a length argument, instead of using ether_addr_copy everywhere that hardware addresses need to be copied about. The other major component of this change is converting the bonding code from using struct sockaddr for address storage to struct sockaddr_storage, as the former has an address storage space of only 14, while the latter is 128 minus a few, which is necessary to support bonding over device with up to MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes up some memory corruption issues with the current code, where it's possible to write an infiniband hardware address into a sockaddr declared on the stack. Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet hardware address now: $ cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active) Primary Slave: mlx4_ib0 (primary_reselect always) Currently Active Slave: mlx4_ib0 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 100 Down Delay (ms): 100 Slave Interface: mlx4_ib0 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01 Slave queue ID: 0 Slave Interface: mlx4_ib1 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02 Slave queue ID: 0 Also tested with a standard 1Gbps NIC bonding setup (with a mix of e1000 and e1000e cards), running LNST's bonding tests. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05bonding: fix active-backup transitionMahesh Bandewar
Earlier patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") made an attempt to keep slave state consistent with speed and duplex settings. Unfortunately link-state transition is used to change the active link especially when used in conjunction with mii-mon. The above mentioned patch broke that logic. Also when speed and duplex settings for a link are updated during a link-event, the link-status should not be changed to invoke correct transition logic. This patch fixes this issue by moving the link-state update outside of the bond_update_speed_duplex() fn and to the places where this fn is called and update link-state selectively. Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30bonding: refine bond_fold_stats() wrap detectionEric Dumazet
Some device drivers reset their stats at down/up events, possibly fooling bonding stats, since they operate with relative deltas. It is nearly not possible to fix drivers, since some of them compute the tx/rx counters based on per rx/tx queue stats, and the queues can be reconfigured (ethtool -L) between the down/up sequence. Lets avoid accumulating 'negative' values that render bonding stats useless. It is better to lose small deltas, assuming the bonding stats are fetched at a reasonable frequency. Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27bonding: avoid printing while holding a spinlockMahesh Bandewar
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27bonding: correctly update link status during mii-commit phaseMahesh Bandewar
bond_miimon_commit() marks the link UP after attempting to get the speed and duplex settings for the link. There is a possibility that bond_update_speed_duplex() could fail. This is another place where it could result into an inconsistent bonding link state. With this patch the link will be marked UP only if the speed and duplex values retrieved have sane values and processed further. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27bonding: make speed, duplex setting consistent with link stateMahesh Bandewar
bond_update_speed_duplex() retrieves speed and duplex settings. There is a possibility of failure in retrieving these values but caller has to assume it's always successful. This leads to having inconsistent slave link settings. If these (speed, duplex) values cannot be retrieved, then keeping the link UP causes problems. The updated bond_update_speed_duplex() returns 0 on success if it retrieves sane values for speed and duplex. On failure it returns 1 and marks the link down. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27bonding: improve link-status update in mii-monitoringMahesh Bandewar
The primary issue is that mii-inspect phase updates link-state and expects changes to be committed during the mii-commit phase. After the inspect phase if it fails to acquire rtnl-mutex, the commit phase (bond_mii_commit) doesn't get to run. This partially updated state stays and makes the internal-state inconsistent. e.g. setup bond0 => slaves: eth1, eth2 eth1 goes DOWN -> UP mii_monitor() mii-inspect() bond_set_slave_link_state(eth1, UP, DontNotify) rtnl_trylock() <- fails! Next mii-monitor round eth1: No change mii_monitor() mii-inspect() eth1->link == current-status (ethtool_ops->get_link) no-change-detected End result: eth1: Link = BOND_LINK_UP Speed = 0xfffff [SpeedUnknown] Duplex = 0xff [DuplexUnknown] This doesn't always happen but for some unlucky machines in a large set of machines it creates problems. The fix for this is to avoid making changes during inspect phase and postpone them until acquiring the rtnl-mutex / invoking commit phase. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16bonding: add 802.3ad support for 25G speedsJarod Wilson
Cut-n-paste enablement of 802.3ad bonding on 25G NICs, which currently report 0 as their bandwidth. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09bonding: reduce scope of some global variablesMahesh Bandewar
Many of the bond param variables are declared global while it's not really necessary for these variables to be global. So moving them to the location these are used. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09bonding: remove "port-moved" state that was never implementedMahesh Bandewar
LACP state-machine defines "port-moved" state when the same ActorSystemID and Port are seen in a LACPDU received on different port. The state is never set since it's not implemented. However the state-machine attempts to clear that state occasionally. LACP state machine is already complicated and since this state is not implemented, removing it's checks makes the state-machine little simpler. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09bonding: remove hardcoded valueMahesh Bandewar
Eliminate hard-coded value and use the default that is set. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09bonding: initialize work-queues during creation of bondMahesh Bandewar
Initializing work-queues every time ifup operation performed is unnecessary and can be performed only once when the port is created. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09bonding: restructure arp-monitorMahesh Bandewar
In preparation to move the work-queue initialization to port creation from current port_open phase. Work-queue initialization does not make sense every time we do 'ifup/ifdown'. So moving to port creation phase. Arp monitoring work depends on the bonding mode and that is not tied to the port creation and can change anytime during the life after port creation. So this restructuring allows us to move the initialization at creation without losing the ability to arm the correct work for the mode user has selected. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix double-free in batman-adv, from Sven Eckelmann. 2) Fix packet stats for fast-RX path, from Joannes Berg. 3) Netfilter's ip_route_me_harder() doesn't handle request sockets properly, fix from Florian Westphal. 4) Fix sendmsg deadlock in rxrpc, from David Howells. 5) Add missing RCU locking to transport hashtable scan, from Xin Long. 6) Fix potential packet loss in mlxsw driver, from Ido Schimmel. 7) Fix race in NAPI handling between poll handlers and busy polling, from Eric Dumazet. 8) TX path in vxlan and geneve need proper RCU locking, from Jakub Kicinski. 9) SYN processing in DCCP and TCP need to disable BH, from Eric Dumazet. 10) Properly handle net_enable_timestamp() being invoked from IRQ context, also from Eric Dumazet. 11) Fix crash on device-tree systems in xgene driver, from Alban Bedel. 12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de Melo. 13) Fix use-after-free in netvsc driver, from Dexuan Cui. 14) Fix max MTU setting in bonding driver, from WANG Cong. 15) xen-netback hash table can be allocated from softirq context, so use GFP_ATOMIC. From Anoob Soman. 16) Fix MAC address change bug in bgmac driver, from Hari Vyas. 17) strparser needs to destroy strp_wq on module exit, from WANG Cong. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) strparser: destroy workqueue on module exit sfc: fix IPID endianness in TSOv2 sfc: avoid max() in array size rds: remove unnecessary returned value check rxrpc: Fix potential NULL-pointer exception nfp: correct DMA direction in XDP DMA sync nfp: don't tell FW about the reserved buffer space net: ethernet: bgmac: mac address change bug net: ethernet: bgmac: init sequence bug xen-netback: don't vfree() queues under spinlock xen-netback: keep a local pointer for vif in backend_disconnect() netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups netfilter: nf_conntrack_sip: fix wrong memory initialisation can: flexcan: fix typo in comment can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer can: gs_usb: fix coding style can: gs_usb: Don't use stack memory for USB transfers ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines ixgbe: update the rss key on h/w, when ethtool ask for it ...