summaryrefslogtreecommitdiff
path: root/drivers/net/bonding
AgeCommit message (Collapse)Author
2020-09-28net: core: introduce struct netdev_nested_priv for nested interface ↵Taehee Yoo
infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25bonding: set dev->needed_headroom in bond_setup_by_slave()Eric Dumazet
syzbot managed to crash a host by creating a bond with a GRE device. For non Ethernet device, bonding calls bond_setup_by_slave() instead of ether_setup(), and unfortunately dev->needed_headroom was not copied from the new added member. [ 171.243095] skbuff: skb_under_panic: text:ffffffffa184b9ea len:116 put:20 head:ffff883f84012dc0 data:ffff883f84012dbc tail:0x70 end:0xd00 dev:bond0 [ 171.243111] ------------[ cut here ]------------ [ 171.243112] kernel BUG at net/core/skbuff.c:112! [ 171.243117] invalid opcode: 0000 [#1] SMP KASAN PTI [ 171.243469] gsmi: Log Shutdown Reason 0x03 [ 171.243505] Call Trace: [ 171.243506] <IRQ> [ 171.243512] [<ffffffffa171be59>] skb_push+0x49/0x50 [ 171.243516] [<ffffffffa184b9ea>] ipgre_header+0x2a/0xf0 [ 171.243520] [<ffffffffa17452d7>] neigh_connected_output+0xb7/0x100 [ 171.243524] [<ffffffffa186f1d3>] ip6_finish_output2+0x383/0x490 [ 171.243528] [<ffffffffa186ede2>] __ip6_finish_output+0xa2/0x110 [ 171.243531] [<ffffffffa186acbc>] ip6_finish_output+0x2c/0xa0 [ 171.243534] [<ffffffffa186abe9>] ip6_output+0x69/0x110 [ 171.243537] [<ffffffffa186ac90>] ? ip6_output+0x110/0x110 [ 171.243541] [<ffffffffa189d952>] mld_sendpack+0x1b2/0x2d0 [ 171.243544] [<ffffffffa189d290>] ? mld_send_report+0xf0/0xf0 [ 171.243548] [<ffffffffa189c797>] mld_ifc_timer_expire+0x2d7/0x3b0 [ 171.243551] [<ffffffffa189c4c0>] ? mld_gq_timer_expire+0x50/0x50 [ 171.243556] [<ffffffffa0fea270>] call_timer_fn+0x30/0x130 [ 171.243559] [<ffffffffa0fea17c>] expire_timers+0x4c/0x110 [ 171.243563] [<ffffffffa0fea0e3>] __run_timers+0x213/0x260 [ 171.243566] [<ffffffffa0fecb7d>] ? ktime_get+0x3d/0xa0 [ 171.243570] [<ffffffffa0ff9c4e>] ? clockevents_program_event+0x7e/0xe0 [ 171.243574] [<ffffffffa0f7e5d5>] ? sched_clock_cpu+0x15/0x190 [ 171.243577] [<ffffffffa0fe973d>] run_timer_softirq+0x1d/0x40 [ 171.243581] [<ffffffffa1c00152>] __do_softirq+0x152/0x2f0 [ 171.243585] [<ffffffffa0f44e1f>] irq_exit+0x9f/0xb0 [ 171.243588] [<ffffffffa1a02e1d>] smp_apic_timer_interrupt+0xfd/0x1a0 [ 171.243591] [<ffffffffa1a01ea6>] apic_timer_interrupt+0x86/0x90 Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-18bonding: fix active-backup failover for current ARP slaveJiri Wiesner
When the ARP monitor is used for link detection, ARP replies are validated for all slaves (arp_validate=3) and fail_over_mac is set to active, two slaves of an active-backup bond may get stuck in a state where both of them are active and pass packets that they receive to the bond. This state makes IPv6 duplicate address detection fail. The state is reached thus: 1. The current active slave goes down because the ARP target is not reachable. 2. The current ARP slave is chosen and made active. 3. A new slave is enslaved. This new slave becomes the current active slave and can reach the ARP target. As a result, the current ARP slave stays active after the enslave action has finished and the log is littered with "PROBE BAD" messages: > bond0: PROBE: c_arp ens10 && cas ens11 BAD The workaround is to remove the slave with "going back" status from the bond and re-enslave it. This issue was encountered when DPDK PMD interfaces were being enslaved to an active-backup bond. I would be possible to fix the issue in bond_enslave() or bond_change_active_slave() but the ARP monitor was fixed instead to keep most of the actions changing the current ARP slave in the ARP monitor code. The current ARP slave is set as inactive and backup during the commit phase. A new state, BOND_LINK_FAIL, has been introduced for slaves in the context of the ARP monitor. This allows administrators to see how slaves are rotated for sending ARP requests and attempts are made to find a new active slave. Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor") Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16bonding: fix a potential double-unregisterCong Wang
When we tear down a network namespace, we unregister all the netdevices within it. So we may queue a slave device and a bonding device together in the same unregister queue. If the only slave device is non-ethernet, it would automatically unregister the bonding device as well. Thus, we may end up unregistering the bonding device twice. Workaround this special case by checking reg_state. Fixes: 9b5e383c11b0 ("net: Introduce unregister_netdevice_many()") Reported-by: syzbot+af23e7f3e0a7e10c8b67@syzkaller.appspotmail.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: bonding: bond_alb: Describe alb_handle_addr_collision_on_attach()'s ↵Lee Jones
'bond' and 'addr' params Fixes the following W=1 kernel build warning(s): drivers/net/bonding/bond_alb.c:1222: warning: Function parameter or member 'bond' not described in 'alb_set_mac_address' Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: bonding: bond_main: Document 'proto' and rename 'new_active' parametersLee Jones
Fixes the following W=1 kernel build warning(s): drivers/net/bonding/bond_main.c:329: warning: Function parameter or member 'proto' not described in 'bond_vlan_rx_add_vid' drivers/net/bonding/bond_main.c:362: warning: Function parameter or member 'proto' not described in 'bond_vlan_rx_kill_vid' drivers/net/bonding/bond_main.c:964: warning: Function parameter or member 'new_active' not described in 'bond_change_active_slave' drivers/net/bonding/bond_main.c:964: warning: Excess function parameter 'new' description in 'bond_change_active_slave' Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14net: bonding: bond_3ad: Fix a bunch of kerneldoc parameter issuesLee Jones
Renames and missing descriptions. Fixes the following W=1 kernel build warning(s): drivers/net/bonding/bond_3ad.c:140: warning: Function parameter or member 'port' not described in '__get_first_agg' drivers/net/bonding/bond_3ad.c:140: warning: Excess function parameter 'bond' description in '__get_first_agg' drivers/net/bonding/bond_3ad.c:1655: warning: Function parameter or member 'agg' not described in 'ad_agg_selection_logic' drivers/net/bonding/bond_3ad.c:1655: warning: Excess function parameter 'aggregator' description in 'ad_agg_selection_logic' drivers/net/bonding/bond_3ad.c:1817: warning: Function parameter or member 'port' not described in 'ad_initialize_port' drivers/net/bonding/bond_3ad.c:1817: warning: Excess function parameter 'aggregator' description in 'ad_initialize_port' drivers/net/bonding/bond_3ad.c:1976: warning: Function parameter or member 'timeout' not described in 'bond_3ad_initiate_agg_selection' drivers/net/bonding/bond_3ad.c:2274: warning: Function parameter or member 'work' not described in 'bond_3ad_state_machine_handler' drivers/net/bonding/bond_3ad.c:2274: warning: Excess function parameter 'bond' description in 'bond_3ad_state_machine_handler' drivers/net/bonding/bond_3ad.c:2508: warning: Function parameter or member 'link' not described in 'bond_3ad_handle_link_change' drivers/net/bonding/bond_3ad.c:2508: warning: Excess function parameter 'status' description in 'bond_3ad_handle_link_change' drivers/net/bonding/bond_3ad.c:2566: warning: Function parameter or member 'bond' not described in 'bond_3ad_set_carrier' drivers/net/bonding/bond_3ad.c:2677: warning: Function parameter or member 'bond' not described in 'bond_3ad_update_lacp_rate' drivers/net/bonding/bond_3ad.c:1655: warning: Function parameter or member 'agg' not described in 'ad_agg_selection_logic' drivers/net/bonding/bond_3ad.c:1655: warning: Excess function parameter 'aggregator' description in 'ad_agg_selection_logic' drivers/net/bonding/bond_3ad.c:1817: warning: Function parameter or member 'port' not described in 'ad_initialize_port' drivers/net/bonding/bond_3ad.c:1817: warning: Excess function parameter 'aggregator' description in 'ad_initialize_port' drivers/net/bonding/bond_3ad.c:1976: warning: Function parameter or member 'timeout' not described in 'bond_3ad_initiate_agg_selection' drivers/net/bonding/bond_3ad.c:2274: warning: Function parameter or member 'work' not described in 'bond_3ad_state_machine_handler' drivers/net/bonding/bond_3ad.c:2274: warning: Excess function parameter 'bond' description in 'bond_3ad_state_machine_handler' drivers/net/bonding/bond_3ad.c:2508: warning: Function parameter or member 'link' not described in 'bond_3ad_handle_link_change' drivers/net/bonding/bond_3ad.c:2508: warning: Excess function parameter 'status' description in 'bond_3ad_handle_link_change' drivers/net/bonding/bond_3ad.c:2566: warning: Function parameter or member 'bond' not described in 'bond_3ad_set_carrier' drivers/net/bonding/bond_3ad.c:2677: warning: Function parameter or member 'bond' not described in 'bond_3ad_update_lacp_rate' Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14bonding: show saner speed for broadcast modeJarod Wilson
Broadcast mode bonds transmit a copy of all traffic simultaneously out of all interfaces, so the "speed" of the bond isn't really the aggregate of all interfaces, but rather, the speed of the slowest active interface. Also, the type of the speed field is u32, not unsigned long, so adjust that accordingly, as required to make min() function here without complaining about mismatching types. Fixes: bb5b052f751b ("bond: add support to read speed and duplex via ethtool") CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23bonding: check return value of register_netdevice() in bond_newlink()Cong Wang
Very similar to commit 544f287b8495 ("bonding: check error value of register_netdevice() immediately"), we should immediately check the return value of register_netdevice() before doing anything else. Fixes: 005db31d5f5f ("bonding: set carrier off for devices created through netlink") Reported-and-tested-by: syzbot+bbc3a11c4da63c1b74d6@syzkaller.appspotmail.com Cc: Beniamino Galvani <bgalvani@redhat.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19bonding: check error value of register_netdevice() immediatelyTaehee Yoo
If register_netdevice() is failed, net_device should not be used because variables are uninitialized or freed. So, the routine should be stopped immediately. But, bond_create() doesn't check return value of register_netdevice() immediately. That will result in a panic because of using uninitialized or freed memory. Test commands: modprobe netdev-notifier-error-inject echo -22 > /sys/kernel/debug/notifier-error-inject/netdev/\ actions/NETDEV_REGISTER/error modprobe bonding max_bonds=3 Splat looks like: [ 375.028492][ T193] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [ 375.033207][ T193] CPU: 2 PID: 193 Comm: kworker/2:2 Not tainted 5.8.0-rc4+ #645 [ 375.036068][ T193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 375.039673][ T193] Workqueue: events linkwatch_event [ 375.041557][ T193] RIP: 0010:dev_activate+0x4a/0x340 [ 375.043381][ T193] Code: 40 a8 04 0f 85 db 00 00 00 8b 83 08 04 00 00 85 c0 0f 84 0d 01 00 00 31 d2 89 d0 48 8d 04 40 48 c1 e0 07 48 03 83 00 04 00 00 <48> 8b 48 10 f6 41 10 01 75 08 f0 80 a1 a0 01 00 00 fd 48 89 48 08 [ 375.050267][ T193] RSP: 0018:ffff9f8facfcfdd8 EFLAGS: 00010202 [ 375.052410][ T193] RAX: 6b6b6b6b6b6b6b6b RBX: ffff9f8fae6ea000 RCX: 0000000000000006 [ 375.055178][ T193] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f8fae6ea000 [ 375.057762][ T193] RBP: ffff9f8fae6ea000 R08: 0000000000000000 R09: 0000000000000000 [ 375.059810][ T193] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9f8facfcfe08 [ 375.061892][ T193] R13: ffffffff883587e0 R14: 0000000000000000 R15: ffff9f8fae6ea580 [ 375.063931][ T193] FS: 0000000000000000(0000) GS:ffff9f8fbae00000(0000) knlGS:0000000000000000 [ 375.066239][ T193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 375.067841][ T193] CR2: 00007f2f542167a0 CR3: 000000012cee6002 CR4: 00000000003606e0 [ 375.069657][ T193] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 375.071471][ T193] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 375.073269][ T193] Call Trace: [ 375.074005][ T193] linkwatch_do_dev+0x4d/0x50 [ 375.075052][ T193] __linkwatch_run_queue+0x10b/0x200 [ 375.076244][ T193] linkwatch_event+0x21/0x30 [ 375.077274][ T193] process_one_work+0x252/0x600 [ 375.078379][ T193] ? process_one_work+0x600/0x600 [ 375.079518][ T193] worker_thread+0x3c/0x380 [ 375.080534][ T193] ? process_one_work+0x600/0x600 [ 375.081668][ T193] kthread+0x139/0x150 [ 375.082567][ T193] ? kthread_park+0x90/0x90 [ 375.083567][ T193] ret_from_fork+0x22/0x30 Fixes: e826eafa65c6 ("bonding: Call netif_carrier_off after register_netdevice") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08bonding: don't need RTNL for ipsec helpersJarod Wilson
The bond_ipsec_* helpers don't need RTNL, and can potentially get called without it being held, so switch from rtnl_dereference() to rcu_dereference() to access bond struct data. Lightly tested with xfrm bonding, no problems found, should address the syzkaller bug referenced below. Reported-by: syzbot+582c98032903dcc04816@syzkaller.appspotmail.com CC: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08bonding: deal with xfrm state in all modes and add more error-checkingJarod Wilson
It's possible that device removal happens when the bond is in non-AB mode, and addition happens in AB mode, so bond_ipsec_del_sa() never gets called, which leaves security associations in an odd state if bond_ipsec_add_sa() then gets called after switching the bond into AB. Just call add and delete universally for all modes to keep things consistent. However, it's also possible that this code gets called when the system is shutting down, and the xfrm subsystem has already been disconnected from the bond device, so we need to do some error-checking and bail, lest we hit a null ptr deref. Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") CC: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01bonding: allow xfrm offload setup post-module-loadJarod Wilson
At the moment, bonding xfrm crypto offload can only be set up if the bonding module is loaded with active-backup mode already set. We need to be able to make this work with bonds set to AB after the bonding driver has already been loaded. So what's done here is: 1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used by both bond_main.c and bond_options.c 2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than only when loading in AB mode 3) wire up xfrmdev_ops universally too 4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB 5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a performance hit from traversing into the underlying drivers 5) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call netdev_change_features() from bond_option_mode_set() In my local testing, I can change bonding modes back and forth on the fly, have hardware offload work when I'm in AB, and see no performance penalty to non-AB software encryption, despite having xfrm bits all wired up for all modes now. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Reported-by: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-26bonding: Remove extraneous parentheses in bond_setupNathan Chancellor
Clang warns: drivers/net/bonding/bond_main.c:4657:23: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)) ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/bonding/bond_main.c:4681:23: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)) ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ This warning occurs when a comparision has two sets of parentheses, which is usually the convention for doing an assignment within an if statement. Since equality comparisons do not need a second set of parentheses, remove them to fix the warning. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Link: https://github.com/ClangBuiltLinux/linux/issues/1066 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kernelci.org bot <bot@kernelci.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23bonding/xfrm: use real_dev instead of slave_devJarod Wilson
Rather than requiring every hw crypto capable NIC driver to do a check for slave_dev being set, set real_dev in the xfrm layer and xso init time, and then override it in the bonding driver as needed. Then NIC drivers can always use real_dev, and at the same time, we eliminate the use of a variable name that probably shouldn't have been used in the first place, particularly given recent current events. CC: Boris Pismenny <borisp@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Leon Romanovsky <leon@kernel.org> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org Suggested-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22bonding: support hardware encryption offload to slavesJarod Wilson
Currently, this support is limited to active-backup mode, as I'm not sure about the feasilibity of mapping an xfrm_state's offload handle to multiple hardware devices simultaneously, and we rely on being able to pass some hints to both the xfrm and NIC driver about whether or not they're operating on a slave device. I've tested this atop an Intel x520 device (ixgbe) using libreswan in transport mode, succesfully achieving ~4.3Gbps throughput with netperf (more or less identical to throughput on a bare NIC in this system), as well as successful failover and recovery mid-netperf. v2: just use CONFIG_XFRM_OFFLOAD for wrapping, isolate more code with it CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-09net: change addr_list_lock back to static keyCong Wang
The dynamic key update for addr_list_lock still causes troubles, for example the following race condition still exists: CPU 0: CPU 1: (RCU read lock) (RTNL lock) dev_mc_seq_show() netdev_update_lockdep_key() -> lockdep_unregister_key() -> netif_addr_lock_bh() because lockdep doesn't provide an API to update it atomically. Therefore, we have to move it back to static keys and use subclass for nest locking like before. In commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key changes"), I already reverted most parts of commit ab92d68fc22f ("net: core: add generic lockdep keys"). This patch reverts the rest and also part of commit f3b0a18bb6cb ("net: remove unnecessary variables and callback"). After this patch, addr_list_lock changes back to using static keys and subclasses to satisfy lockdep. Thanks to dev->lower_level, we do not have to change back to ->ndo_get_lock_subclass(). And hopefully this reduces some syzbot lockdep noises too. Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com Cc: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28bonding: Fix reference count leak in bond_sysfs_slave_add.Qiushi Wu
kobject_init_and_add() takes reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Previous commit "b8eb718348b8" fixed a similar problem. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-09Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This merge includes updates to bonding driver needed for the rdma stack, to avoid conflicts with the RDMA branch. Maor Gottlieb Says: ==================== Bonding: Add support to get xmit slave The following series adds support to get the LAG master xmit slave by introducing new .ndo - ndo_get_xmit_slave. Every LAG module can implement it and it first implemented in the bond driver. This is follow-up to the RFC discussion [1]. The main motivation for doing this is for drivers that offload part of the LAG functionality. For example, Mellanox Connect-X hardware implements RoCE LAG which selects the TX affinity when the resources are created and port is remapped when it goes down. The first part of this patchset introduces the new .ndo and add the support to the bonding module. The second part adds support to get the RoCE LAG xmit slave by building skb of the RoCE packet based on the AH attributes and call to the new .ndo. The third part change the mlx5 driver driver to set the QP's affinity port according to the slave which found by the .ndo. ==================== Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-07bonding: propagate transmit statusEric Dumazet
Currently, bonding always returns NETDEV_TX_OK to its caller. It is worth trying to be more accurate : TCP for instance can have different recovery strategies if it can have more precise status, if packet was dropped by slave qdisc. This is especially important when host is under stress. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04bonding: remove useless stats_lock_keyCong Wang
After commit b3e80d44f5b1 ("bonding: fix lockdep warning in bond_get_stats()") the dynamic key is no longer necessary, as we compute nest level at run-time. So, we can just remove it to save some lockdep key entries. Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond0 master bond1 ip link set bond0 nomaster ip link set bond1 master bond0 Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: partially revert dynamic lockdep key changesCong Wang
This patch reverts the folowing commits: commit 064ff66e2bef84f1153087612032b5b9eab005bd "bonding: add missing netdev_update_lockdep_key()" commit 53d374979ef147ab51f5d632dfe20b14aebeccd0 "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()" commit 1f26c0d3d24125992ab0026b0dab16c08df947c7 "net: fix kernel-doc warning in <linux/netdevice.h>" commit ab92d68fc22f9afab480153bd82a20f6e2533769 "net: core: add generic lockdep keys" but keeps the addr_list_lock_key because we still lock addr_list_lock nestedly on stack devices, unlikely xmit_lock this is safe because we don't take addr_list_lock on any fast path. Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01bonding: Implement ndo_get_xmit_slaveMaor Gottlieb
Add implementation of ndo_get_xmit_slave. Find the slave by using the helper function according to the bond mode. If the caller set all_slaves to true, then it assumes that all slaves are available to transmit. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-01bonding: Add array of all slavesMaor Gottlieb
Keep all slaves in array so it could be used to get the xmit slave assume all the slaves are active. The logic to add slave to the array is like the usable slaves, except that we also add slaves that currently can't transmit - not up or active. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-01bonding: Add function to get the xmit slave in active-backup modeMaor Gottlieb
Add helper function to get the xmit slave in active-backup mode. It's only one line function that return the curr_active_slave, but it will used both in the xmit flow and by the new .ndo to get the xmit slave. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-01bonding: Add helper function to get the xmit slave in rr modeMaor Gottlieb
Add helper function to get the xmit slave when bond is in round robin mode. Change bond_xmit_slave_id to bond_get_slave_by_id, then the logic for find the next slave for transmit could be used both by the xmit flow and the .ndo to get the xmit slave. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-01bonding: Add helper function to get the xmit slave based on hashMaor Gottlieb
Both xor and 802.3ad modes use bond_xmit_hash to get the xmit slave. Export the logic to helper function so it could be used in the following patches by the .ndo to get the xmit slave. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-01bonding/alb: Add helper functions to get the xmit slaveMaor Gottlieb
Add two helper functions to get the xmit slave of bond in alb or tlb mode. Extract the logic of find the xmit slave from the xmit flow to function. Xmit flow will xmit through this slave and in the following patches the new .ndo will call to the helper function to return the xmit slave. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-01bonding: Rename slave_arr to usable_slavesMaor Gottlieb
Rename slave_arr to usable_slaves, since we will have two arrays, one for the usable slaves and the other to all slaves. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-01bonding: Export skip slave logic to functionMaor Gottlieb
As a preparation for following change that add array of all slaves, extract code that skip slave to function. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-21drivers: Remove inclusion of vermagic headerLeon Romanovsky
Get rid of linux/vermagic.h includes, so that MODULE_ARCH_VERMAGIC from the arch header arch/x86/include/asm/module.h won't be redefined. In file included from ./include/linux/module.h:30, from drivers/net/ethernet/3com/3c515.c:56: ./arch/x86/include/asm/module.h:73: warning: "MODULE_ARCH_VERMAGIC" redefined 73 | # define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY | In file included from drivers/net/ethernet/3com/3c515.c:25: ./include/linux/vermagic.h:28: note: this is the location of the previous definition 28 | #define MODULE_ARCH_VERMAGIC "" | Fixes: 6bba2e89a88c ("net/3com: Delete driver and module versions from 3com drivers") Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Shannon Nelson <snelson@pensando.io> # ionic Acked-by: Sebastian Reichel <sre@kernel.org> # power Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor overlapping changes, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-06bonding/alb: make sure arp header is pulled before accessing itEric Dumazet
Similar to commit 38f88c454042 ("bonding/alb: properly access headers in bond_alb_xmit()"), we need to make sure arp header was pulled in skb->head before blindly accessing it in rlb_arp_xmit(). Remove arp_pkt() private helper, since it is more readable/obvious to have the following construct back to back : if (!pskb_network_may_pull(skb, sizeof(*arp))) return NULL; arp = (struct arp_pkt *)skb_network_header(skb); syzbot reported : BUG: KMSAN: uninit-value in bond_slave_has_mac_rx include/net/bonding.h:704 [inline] BUG: KMSAN: uninit-value in rlb_arp_xmit drivers/net/bonding/bond_alb.c:662 [inline] BUG: KMSAN: uninit-value in bond_alb_xmit+0x575/0x25e0 drivers/net/bonding/bond_alb.c:1477 CPU: 0 PID: 12743 Comm: syz-executor.4 Not tainted 5.6.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 bond_slave_has_mac_rx include/net/bonding.h:704 [inline] rlb_arp_xmit drivers/net/bonding/bond_alb.c:662 [inline] bond_alb_xmit+0x575/0x25e0 drivers/net/bonding/bond_alb.c:1477 __bond_start_xmit drivers/net/bonding/bond_main.c:4257 [inline] bond_start_xmit+0x85d/0x2f70 drivers/net/bonding/bond_main.c:4282 __netdev_start_xmit include/linux/netdevice.h:4524 [inline] netdev_start_xmit include/linux/netdevice.h:4538 [inline] xmit_one net/core/dev.c:3470 [inline] dev_hard_start_xmit+0x531/0xab0 net/core/dev.c:3486 __dev_queue_xmit+0x37de/0x4220 net/core/dev.c:4063 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4096 packet_snd net/packet/af_packet.c:2967 [inline] packet_sendmsg+0x8347/0x93b0 net/packet/af_packet.c:2992 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] __sys_sendto+0xc1b/0xc50 net/socket.c:1998 __do_sys_sendto net/socket.c:2010 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:2006 __x64_sys_sendto+0x6e/0x90 net/socket.c:2006 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45c479 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fc77ffbbc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fc77ffbc6d4 RCX: 000000000045c479 RDX: 000000000000000e RSI: 00000000200004c0 RDI: 0000000000000003 RBP: 000000000076bf20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000000a04 R14: 00000000004cc7b0 R15: 000000000076bf2c Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2793 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1051 [inline] alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766 sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242 packet_alloc_skb net/packet/af_packet.c:2815 [inline] packet_snd net/packet/af_packet.c:2910 [inline] packet_sendmsg+0x66a0/0x93b0 net/packet/af_packet.c:2992 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] __sys_sendto+0xc1b/0xc50 net/socket.c:1998 __do_sys_sendto net/socket.c:2010 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:2006 __x64_sys_sendto+0x6e/0x90 net/socket.c:2006 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net/bond: Delete driver and module versionsLeon Romanovsky
The in-kernel code has already unique version, which is based on Linus's tag, update the bond driver to be consistent with that version. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Conflict resolution of ice_virtchnl_pf.c based upon work by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: use netif_is_bridge_port() to check for IFF_BRIDGE_PORTJulian Wiedmann
Trivial cleanup, so that all bridge port-specific code can be found in one go. CC: Johannes Berg <johannes@sipsolutions.net> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16bonding: fix lockdep warning in bond_get_stats()Taehee Yoo
In the "struct bonding", there is stats_lock. This lock protects "bond_stats" in the "struct bonding". bond_stats is updated in the bond_get_stats() and this function would be executed concurrently. So, the lock is needed. Bonding interfaces would be nested. So, either stats_lock should use dynamic lockdep class key or stats_lock should be used by spin_lock_nested(). In the current code, stats_lock is using a dynamic lockdep class key. But there is no updating stats_lock_key routine So, lockdep warning will occur. Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond0 master bond1 ip link set bond0 nomaster ip link set bond1 master bond0 Splat looks like: [ 38.420603][ T957] 5.5.0+ #394 Not tainted [ 38.421074][ T957] ------------------------------------------------------ [ 38.421837][ T957] ip/957 is trying to acquire lock: [ 38.422399][ T957] ffff888063262cd8 (&bond->stats_lock_key#2){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.423528][ T957] [ 38.423528][ T957] but task is already holding lock: [ 38.424526][ T957] ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.426075][ T957] [ 38.426075][ T957] which lock already depends on the new lock. [ 38.426075][ T957] [ 38.428536][ T957] [ 38.428536][ T957] the existing dependency chain (in reverse order) is: [ 38.429475][ T957] [ 38.429475][ T957] -> #1 (&bond->stats_lock_key){+.+.}: [ 38.430273][ T957] _raw_spin_lock+0x30/0x70 [ 38.430812][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ 38.431451][ T957] dev_get_stats+0x1ec/0x270 [ 38.432088][ T957] bond_get_stats+0x1a5/0x4d0 [bonding] [ 38.432767][ T957] dev_get_stats+0x1ec/0x270 [ 38.433322][ T957] rtnl_fill_stats+0x44/0xbe0 [ 38.433866][ T957] rtnl_fill_ifinfo+0xeb2/0x3720 [ 38.434474][ T957] rtmsg_ifinfo_build_skb+0xca/0x170 [ 38.435081][ T957] rtmsg_ifinfo_event.part.33+0x1b/0xb0 [ 38.436848][ T957] rtnetlink_event+0xcd/0x120 [ 38.437455][ T957] notifier_call_chain+0x90/0x160 [ 38.438067][ T957] netdev_change_features+0x74/0xa0 [ 38.438708][ T957] bond_compute_features.isra.45+0x4e6/0x6f0 [bonding] [ 38.439522][ T957] bond_enslave+0x3639/0x47b0 [bonding] [ 38.440225][ T957] do_setlink+0xaab/0x2ef0 [ 38.440786][ T957] __rtnl_newlink+0x9c5/0x1270 [ 38.441463][ T957] rtnl_newlink+0x65/0x90 [ 38.442075][ T957] rtnetlink_rcv_msg+0x4a8/0x890 [ 38.442774][ T957] netlink_rcv_skb+0x121/0x350 [ 38.443451][ T957] netlink_unicast+0x42e/0x610 [ 38.444282][ T957] netlink_sendmsg+0x65a/0xb90 [ 38.444992][ T957] ____sys_sendmsg+0x5ce/0x7a0 [ 38.445679][ T957] ___sys_sendmsg+0x10f/0x1b0 [ 38.446365][ T957] __sys_sendmsg+0xc6/0x150 [ 38.447007][ T957] do_syscall_64+0x99/0x4f0 [ 38.447668][ T957] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 38.448538][ T957] [ 38.448538][ T957] -> #0 (&bond->stats_lock_key#2){+.+.}: [ 38.449554][ T957] __lock_acquire+0x2d8d/0x3de0 [ 38.450148][ T957] lock_acquire+0x164/0x3b0 [ 38.450711][ T957] _raw_spin_lock+0x30/0x70 [ 38.451292][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ 38.451950][ T957] dev_get_stats+0x1ec/0x270 [ 38.452425][ T957] bond_get_stats+0x1a5/0x4d0 [bonding] [ 38.453362][ T957] dev_get_stats+0x1ec/0x270 [ 38.453825][ T957] rtnl_fill_stats+0x44/0xbe0 [ 38.454390][ T957] rtnl_fill_ifinfo+0xeb2/0x3720 [ 38.456257][ T957] rtmsg_ifinfo_build_skb+0xca/0x170 [ 38.456998][ T957] rtmsg_ifinfo_event.part.33+0x1b/0xb0 [ 38.459351][ T957] rtnetlink_event+0xcd/0x120 [ 38.460086][ T957] notifier_call_chain+0x90/0x160 [ 38.460829][ T957] netdev_change_features+0x74/0xa0 [ 38.461752][ T957] bond_compute_features.isra.45+0x4e6/0x6f0 [bonding] [ 38.462705][ T957] bond_enslave+0x3639/0x47b0 [bonding] [ 38.463476][ T957] do_setlink+0xaab/0x2ef0 [ 38.464141][ T957] __rtnl_newlink+0x9c5/0x1270 [ 38.464897][ T957] rtnl_newlink+0x65/0x90 [ 38.465522][ T957] rtnetlink_rcv_msg+0x4a8/0x890 [ 38.466215][ T957] netlink_rcv_skb+0x121/0x350 [ 38.466895][ T957] netlink_unicast+0x42e/0x610 [ 38.467583][ T957] netlink_sendmsg+0x65a/0xb90 [ 38.468285][ T957] ____sys_sendmsg+0x5ce/0x7a0 [ 38.469202][ T957] ___sys_sendmsg+0x10f/0x1b0 [ 38.469884][ T957] __sys_sendmsg+0xc6/0x150 [ 38.470587][ T957] do_syscall_64+0x99/0x4f0 [ 38.471245][ T957] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 38.472093][ T957] [ 38.472093][ T957] other info that might help us debug this: [ 38.472093][ T957] [ 38.473438][ T957] Possible unsafe locking scenario: [ 38.473438][ T957] [ 38.474898][ T957] CPU0 CPU1 [ 38.476234][ T957] ---- ---- [ 38.480171][ T957] lock(&bond->stats_lock_key); [ 38.480808][ T957] lock(&bond->stats_lock_key#2); [ 38.481791][ T957] lock(&bond->stats_lock_key); [ 38.482754][ T957] lock(&bond->stats_lock_key#2); [ 38.483416][ T957] [ 38.483416][ T957] *** DEADLOCK *** [ 38.483416][ T957] [ 38.484505][ T957] 3 locks held by ip/957: [ 38.485048][ T957] #0: ffffffffbccf6230 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890 [ 38.486198][ T957] #1: ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.487625][ T957] #2: ffffffffbc9254c0 (rcu_read_lock){....}, at: bond_get_stats+0x5/0x4d0 [bonding] [ 38.488897][ T957] [ 38.488897][ T957] stack backtrace: [ 38.489646][ T957] CPU: 1 PID: 957 Comm: ip Not tainted 5.5.0+ #394 [ 38.490497][ T957] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 38.492810][ T957] Call Trace: [ 38.493219][ T957] dump_stack+0x96/0xdb [ 38.493709][ T957] check_noncircular+0x371/0x450 [ 38.494344][ T957] ? lookup_address+0x60/0x60 [ 38.494923][ T957] ? print_circular_bug.isra.35+0x310/0x310 [ 38.495699][ T957] ? hlock_class+0x130/0x130 [ 38.496334][ T957] ? __lock_acquire+0x2d8d/0x3de0 [ 38.496979][ T957] __lock_acquire+0x2d8d/0x3de0 [ 38.497607][ T957] ? register_lock_class+0x14d0/0x14d0 [ 38.498333][ T957] ? check_chain_key+0x236/0x5d0 [ 38.499003][ T957] lock_acquire+0x164/0x3b0 [ 38.499800][ T957] ? bond_get_stats+0x90/0x4d0 [bonding] [ 38.500706][ T957] _raw_spin_lock+0x30/0x70 [ 38.501435][ T957] ? bond_get_stats+0x90/0x4d0 [bonding] [ 38.502311][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ ... ] But, there is another problem. The dynamic lockdep class key is protected by RTNL, but bond_get_stats() would be called outside of RTNL. So, it would use an invalid dynamic lockdep class key. In order to fix this issue, stats_lock uses spin_lock_nested() instead of a dynamic lockdep key. The bond_get_stats() calls bond_get_lowest_level_rcu() to get the correct nest level value, which will be used by spin_lock_nested(). The "dev->lower_level" indicates lower nest level value, but this value is invalid outside of RTNL. So, bond_get_lowest_level_rcu() returns valid lower nest level value in the RCU critical section. bond_get_lowest_level_rcu() will be work only when LOCKDEP is enabled. Fixes: 089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16bonding: add missing netdev_update_lockdep_key()Taehee Yoo
After bond_release(), netdev_update_lockdep_key() should be called. But both ioctl path and attribute path don't call netdev_update_lockdep_key(). This patch adds missing netdev_update_lockdep_key(). Test commands: ip link add bond0 type bond ip link add bond1 type bond ifenslave bond0 bond1 ifenslave -d bond0 bond1 ifenslave bond1 bond0 Splat looks like: [ 29.501182][ T1046] WARNING: possible circular locking dependency detected [ 29.501945][ T1039] hardirqs last disabled at (1962): [<ffffffffac6c807f>] handle_mm_fault+0x13f/0x700 [ 29.503442][ T1046] 5.5.0+ #322 Not tainted [ 29.503447][ T1046] ------------------------------------------------------ [ 29.504277][ T1039] softirqs last enabled at (1180): [<ffffffffade00678>] __do_softirq+0x678/0x981 [ 29.505443][ T1046] ifenslave/1046 is trying to acquire lock: [ 29.505886][ T1039] softirqs last disabled at (1169): [<ffffffffac19c18a>] irq_exit+0x17a/0x1a0 [ 29.509997][ T1046] ffff88805d5da280 (&dev->addr_list_lock_key#3){+...}, at: dev_mc_sync_multiple+0x95/0x120 [ 29.511243][ T1046] [ 29.511243][ T1046] but task is already holding lock: [ 29.512192][ T1046] ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding] [ 29.514124][ T1046] [ 29.514124][ T1046] which lock already depends on the new lock. [ 29.514124][ T1046] [ 29.517297][ T1046] [ 29.517297][ T1046] the existing dependency chain (in reverse order) is: [ 29.518231][ T1046] [ 29.518231][ T1046] -> #1 (&dev->addr_list_lock_key#4){+...}: [ 29.519076][ T1046] _raw_spin_lock+0x30/0x70 [ 29.519588][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.520208][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.520862][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ 29.521640][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 29.522438][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding] [ 29.523251][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding] [ 29.524082][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding] [ 29.524959][ T1046] kernfs_fop_write+0x276/0x410 [ 29.525620][ T1046] vfs_write+0x197/0x4a0 [ 29.526218][ T1046] ksys_write+0x141/0x1d0 [ 29.526818][ T1046] do_syscall_64+0x99/0x4f0 [ 29.527430][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.528265][ T1046] [ 29.528265][ T1046] -> #0 (&dev->addr_list_lock_key#3){+...}: [ 29.529272][ T1046] __lock_acquire+0x2d8d/0x3de0 [ 29.529935][ T1046] lock_acquire+0x164/0x3b0 [ 29.530638][ T1046] _raw_spin_lock+0x30/0x70 [ 29.531187][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.531790][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.532451][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ 29.533163][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 29.533789][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding] [ 29.534595][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding] [ 29.535500][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding] [ 29.536379][ T1046] kernfs_fop_write+0x276/0x410 [ 29.537057][ T1046] vfs_write+0x197/0x4a0 [ 29.537640][ T1046] ksys_write+0x141/0x1d0 [ 29.538251][ T1046] do_syscall_64+0x99/0x4f0 [ 29.538870][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.539659][ T1046] [ 29.539659][ T1046] other info that might help us debug this: [ 29.539659][ T1046] [ 29.540953][ T1046] Possible unsafe locking scenario: [ 29.540953][ T1046] [ 29.541883][ T1046] CPU0 CPU1 [ 29.542540][ T1046] ---- ---- [ 29.543209][ T1046] lock(&dev->addr_list_lock_key#4); [ 29.543880][ T1046] lock(&dev->addr_list_lock_key#3); [ 29.544873][ T1046] lock(&dev->addr_list_lock_key#4); [ 29.545863][ T1046] lock(&dev->addr_list_lock_key#3); [ 29.546525][ T1046] [ 29.546525][ T1046] *** DEADLOCK *** [ 29.546525][ T1046] [ 29.547542][ T1046] 5 locks held by ifenslave/1046: [ 29.548196][ T1046] #0: ffff88806044c478 (sb_writers#5){.+.+}, at: vfs_write+0x3bb/0x4a0 [ 29.549248][ T1046] #1: ffff88805af00890 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cf/0x410 [ 29.550343][ T1046] #2: ffff88805b8b54b0 (kn->count#157){.+.+}, at: kernfs_fop_write+0x1f2/0x410 [ 29.551575][ T1046] #3: ffffffffaecf4cf0 (rtnl_mutex){+.+.}, at: bond_opt_tryset_rtnl+0x5f/0xf0 [bonding] [ 29.552819][ T1046] #4: ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding] [ 29.554175][ T1046] [ 29.554175][ T1046] stack backtrace: [ 29.554907][ T1046] CPU: 0 PID: 1046 Comm: ifenslave Not tainted 5.5.0+ #322 [ 29.555854][ T1046] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 29.557064][ T1046] Call Trace: [ 29.557504][ T1046] dump_stack+0x96/0xdb [ 29.558054][ T1046] check_noncircular+0x371/0x450 [ 29.558723][ T1046] ? print_circular_bug.isra.35+0x310/0x310 [ 29.559486][ T1046] ? hlock_class+0x130/0x130 [ 29.560100][ T1046] ? __lock_acquire+0x2d8d/0x3de0 [ 29.560761][ T1046] __lock_acquire+0x2d8d/0x3de0 [ 29.561366][ T1046] ? register_lock_class+0x14d0/0x14d0 [ 29.562045][ T1046] ? find_held_lock+0x39/0x1d0 [ 29.562641][ T1046] lock_acquire+0x164/0x3b0 [ 29.563199][ T1046] ? dev_mc_sync_multiple+0x95/0x120 [ 29.563872][ T1046] _raw_spin_lock+0x30/0x70 [ 29.564464][ T1046] ? dev_mc_sync_multiple+0x95/0x120 [ 29.565146][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.565793][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.566487][ T1046] ? bond_update_slave_arr+0x940/0x940 [bonding] [ 29.567279][ T1046] ? bstr_printf+0xc20/0xc20 [ 29.567857][ T1046] ? stack_trace_consume_entry+0x160/0x160 [ 29.568614][ T1046] ? deactivate_slab.isra.77+0x2c5/0x800 [ 29.569320][ T1046] ? check_chain_key+0x236/0x5d0 [ 29.569939][ T1046] ? sscanf+0x93/0xc0 [ 29.570442][ T1046] ? vsscanf+0x1e20/0x1e20 [ 29.571003][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ ... ] Fixes: ab92d68fc22f ("net: core: add generic lockdep keys") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05bonding/alb: properly access headers in bond_alb_xmit()Eric Dumazet
syzbot managed to send an IPX packet through bond_alb_xmit() and af_packet and triggered a use-after-free. First, bond_alb_xmit() was using ipx_hdr() helper to reach the IPX header, but ipx_hdr() was using the transport offset instead of the network offset. In the particular syzbot report transport offset was 0xFFFF This patch removes ipx_hdr() since it was only (mis)used from bonding. Then we need to make sure IPv4/IPv6/IPX headers are pulled in skb->head before dereferencing anything. BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452 Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108 (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline] [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53 [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282 [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline] [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline] [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422 [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469 [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452 [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline] [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224 [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline] [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline] [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline] [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627 [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238 [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278 [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline] [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252 [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline] [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684 [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996 [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline] [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-26bonding: rename AD_STATE_* to LACP_STATE_*Andy Roulin
As the LACP actor/partner state is now part of the uapi, rename the 3ad state defines with LACP prefix. The LACP prefix is preferred over BOND_3AD as the LACP standard moved to 802.1AX. Fixes: 826f66b30c2e3 ("bonding: move 802.3ad port state flags to uapi") Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Mere overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-14bonding: fix active-backup transition after link failureMahesh Bandewar
After the recent fix in commit 1899bb325149 ("bonding: fix state transition issue in link monitoring"), the active-backup mode with miimon initially come-up fine but after a link-failure, both members transition into backup state. Following steps to reproduce the scenario (eth1 and eth2 are the slaves of the bond): ip link set eth1 up ip link set eth2 down sleep 1 ip link set eth2 up ip link set eth1 down cat /sys/class/net/eth1/bonding_slave/state cat /sys/class/net/eth2/bonding_slave/state Fixes: 1899bb325149 ("bonding: fix state transition issue in link monitoring") CC: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-12-14bonding: move 802.3ad port state flags to uapiAndy Roulin
The bond slave actor/partner operating state is exported as bitfield to userspace, which lacks a way to interpret it, e.g., iproute2 only prints the state as a number: ad_actor_oper_port_state 15 For userspace to interpret the bitfield, the bitfield definitions should be part of the uapi. The bitfield itself is defined in the 802.3ad standard. This commit moves the 802.3ad bitfield definitions to uapi. Related iproute2 patches, soon to be posted upstream, use the new uapi headers to pretty-print bond slave state, e.g., with ip -d link show ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync> Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-12-09bonding: fix bond_neigh_init()Eric Dumazet
1) syzbot reported an uninit-value in bond_neigh_setup() [1] bond_neigh_setup() uses a temporary on-stack 'struct neigh_parms parms', but only clears parms.neigh_setup field. A stacked bonding device would then enter bond_neigh_setup() and read garbage from parms->dev. If we get really unlucky and garbage is matching @dev, then we could recurse and eventually crash. Let's make sure the whole structure is cleared to avoid surprises. 2) bond_neigh_setup() can be called while another cpu manipulates the master device, removing or adding a slave. We need at least rcu protection to prevent use-after-free. Note: Prior code does not support a stack of bonding devices, this patch does not attempt to fix this, and leave a comment instead. [1] BUG: KMSAN: uninit-value in bond_neigh_setup+0xa4/0x110 drivers/net/bonding/bond_main.c:3655 CPU: 0 PID: 11256 Comm: syz-executor.0 Not tainted 5.4.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108 __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245 bond_neigh_setup+0xa4/0x110 drivers/net/bonding/bond_main.c:3655 bond_neigh_init+0x216/0x4b0 drivers/net/bonding/bond_main.c:3626 ___neigh_create+0x169e/0x2c40 net/core/neighbour.c:613 __neigh_create+0xbd/0xd0 net/core/neighbour.c:674 ip6_finish_output2+0x149a/0x2670 net/ipv6/ip6_output.c:113 __ip6_finish_output+0x83d/0x8f0 net/ipv6/ip6_output.c:142 ip6_finish_output+0x2db/0x420 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip6_output+0x5d3/0x720 net/ipv6/ip6_output.c:175 dst_output include/net/dst.h:436 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] mld_sendpack+0xebd/0x13d0 net/ipv6/mcast.c:1682 mld_send_cr net/ipv6/mcast.c:1978 [inline] mld_ifc_timer_expire+0x116b/0x1680 net/ipv6/mcast.c:2477 call_timer_fn+0x232/0x530 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers+0xd60/0x1270 kernel/time/timer.c:1773 run_timer_softirq+0x2d/0x50 kernel/time/timer.c:1786 __do_softirq+0x4a1/0x83a kernel/softirq.c:293 invoke_softirq kernel/softirq.c:375 [inline] irq_exit+0x230/0x280 kernel/softirq.c:416 exiting_irq+0xe/0x10 arch/x86/include/asm/apic.h:536 smp_apic_timer_interrupt+0x48/0x70 arch/x86/kernel/apic/apic.c:1138 apic_timer_interrupt+0x2e/0x40 arch/x86/entry/entry_64.S:835 </IRQ> RIP: 0010:kmsan_free_page+0x18d/0x1c0 mm/kmsan/kmsan_shadow.c:439 Code: 4c 89 ff 44 89 f6 e8 82 0d ee ff 65 ff 0d 9f 26 3b 60 65 8b 05 98 26 3b 60 85 c0 75 24 e8 5b f6 35 ff 4c 89 6d d0 ff 75 d0 9d <48> 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 0f 0b 0f 0b 0f RSP: 0018:ffffb328034af818 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: 0000000000000000 RBX: ffffe2d7471f8360 RCX: 0000000000000000 RDX: ffffffffadea7000 RSI: 0000000000000004 RDI: ffff93496fcda104 RBP: ffffb328034af850 R08: ffff934a47e86d00 R09: ffff93496fc41900 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000246 R14: 0000000000000000 R15: ffffe2d7472225c0 free_pages_prepare mm/page_alloc.c:1138 [inline] free_pcp_prepare mm/page_alloc.c:1230 [inline] free_unref_page_prepare+0x1d9/0x770 mm/page_alloc.c:3025 free_unref_page mm/page_alloc.c:3074 [inline] free_the_page mm/page_alloc.c:4832 [inline] __free_pages+0x154/0x230 mm/page_alloc.c:4840 __vunmap+0xdac/0xf20 mm/vmalloc.c:2277 __vfree mm/vmalloc.c:2325 [inline] vfree+0x7c/0x170 mm/vmalloc.c:2355 copy_entries_to_user net/ipv6/netfilter/ip6_tables.c:883 [inline] get_entries net/ipv6/netfilter/ip6_tables.c:1041 [inline] do_ip6t_get_ctl+0xfa4/0x1030 net/ipv6/netfilter/ip6_tables.c:1709 nf_sockopt net/netfilter/nf_sockopt.c:104 [inline] nf_getsockopt+0x481/0x4e0 net/netfilter/nf_sockopt.c:122 ipv6_getsockopt+0x264/0x510 net/ipv6/ipv6_sockglue.c:1400 tcp_getsockopt+0x1c6/0x1f0 net/ipv4/tcp.c:3688 sock_common_getsockopt+0x13f/0x180 net/core/sock.c:3110 __sys_getsockopt+0x533/0x7b0 net/socket.c:2129 __do_sys_getsockopt net/socket.c:2144 [inline] __se_sys_getsockopt+0xe1/0x100 net/socket.c:2141 __x64_sys_getsockopt+0x62/0x80 net/socket.c:2141 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45d20a Code: b8 34 01 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 8d 8b fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 6a 8b fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:0000000000a6f618 EFLAGS: 00000212 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 0000000000a6f640 RCX: 000000000045d20a RDX: 0000000000000041 RSI: 0000000000000029 RDI: 0000000000000003 RBP: 0000000000717cc0 R08: 0000000000a6f63c R09: 0000000000004000 R10: 0000000000a6f740 R11: 0000000000000212 R12: 0000000000000003 R13: 0000000000000000 R14: 0000000000000029 R15: 0000000000715b00 Local variable description: ----parms@bond_neigh_init Variable was created at: bond_neigh_init+0x8c/0x4b0 drivers/net/bonding/bond_main.c:3617 bond_neigh_init+0x8c/0x4b0 drivers/net/bonding/bond_main.c:3617 Fixes: 9918d5bf329d ("bonding: modify only neigh_parms owned by us") Fixes: 234bcf8a499e ("net/bonding: correctly proxy slave neigh param setup ndo function") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09neighbour: remove neigh_cleanup() methodEric Dumazet
neigh_cleanup() has not been used for seven years, and was a wrong design. Messing with shared pointer in bond_neigh_init() without proper memory barriers would at least trigger syzbot complains eventually. It is time to remove this stuff. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16bonding: symmetric ICMP transmitMatteo Croce
A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports to balance packets between slaves. With some network errors, we receive an ICMP error packet by the remote host or a router. If sent by a router, the source IP can differ from the remote host one. Additionally the ICMP protocol has no port numbers, so a layer3+4 bonding will get a different hash than the previous one. These two conditions could let the packet go through a different interface than the other packets of the same flow: # tcpdump -qltnni veth0 |sed 's/^/0: /' & # tcpdump -qltnni veth1 |sed 's/^/1: /' & # hping3 -2 192.168.0.2 -p 9 0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 An ICMP error packet contains the header of the packet which caused the network error, so inspect it and match the flow against it, so we can send the ICMP via the same interface of the previous packet in the flow. Move the IP and port dissect code into a generic function bond_flow_ip() and if we are dissecting an ICMP error packet, call it again with the adjusted offset. # hping3 -2 192.168.0.2 -p 9 1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0 0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0 0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>