summaryrefslogtreecommitdiff
path: root/drivers/net/bonding/bond_main.c
AgeCommit message (Collapse)Author
2021-10-24net: bonding: constify and use dev_addr_set()Jakub Kicinski
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Make sure local references to netdev->dev_addr are constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09net: use dev_addr_set()Jakub Kicinski
Use dev_addr_set() instead of writing directly to netdev->dev_addr in various misc and old drivers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06bonding: Fix negative jump label count on nested bondingJussi Maki
With nested bonding devices the nested bond device's ndo_bpf was called without a program causing it to decrement the static key without a prior increment leading to negative count. Fix the issue by 1) only calling slave's ndo_bpf when there's a program to be loaded and 2) only decrement the count when a program is unloaded. Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") Reported-by: syzbot+30622fb04ddd72a4d167@syzkaller.appspotmail.com Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05bonding: complain about missing route only once for A/B ARP probesDavid Decotigny
On configs where there is no confirgured direct route to the target of the ARP probes, these probes are still sent and may be replied to properly, so no need to repeatedly complain about the missing route. Signed-off-by: David Decotigny <ddecotig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16bonding: improve nl error msg when device can't be enslaved because of ↵Antoine Tenart
IFF_MASTER Use a more user friendly netlink error message when a device can't be enslaved because it has IFF_MASTER, by not referring directly to a kernel internal flag. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-13net, bonding: Disallow vlan+srcmac with XDPJussi Maki
The new vlan+srcmac xmit policy is not implementable with XDP since in many cases the 802.1Q payload is not present in the packet. This can be for example due to hardware offload or in the case of veth due to use of skbuffs internally. This also fixes the NULL deref with the vlan+srcmac xmit policy reported by Jonathan Toppins by additionally checking the skb pointer. Fixes: a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff") Reported-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20210812145241.12449-1-joamaki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11bonding: combine netlink and console error messagesJonathan Toppins
There seems to be no reason to have different error messages between netlink and printk. It also cleans up the function slightly. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== bpf-next 2021-08-10 We've added 31 non-merge commits during the last 8 day(s) which contain a total of 28 files changed, 3644 insertions(+), 519 deletions(-). 1) Native XDP support for bonding driver & related BPF selftests, from Jussi Maki. 2) Large batch of new BPF JIT tests for test_bpf.ko that came out as a result from 32-bit MIPS JIT development, from Johan Almbladh. 3) Rewrite of netcnt BPF selftest and merge into test_progs, from Stanislav Fomichev. 4) Fix XDP bpf_prog_test_run infra after net to net-next merge, from Andrii Nakryiko. 5) Follow-up fix in unix_bpf_update_proto() to enforce socket type, from Cong Wang. 6) Fix bpf-iter-tcp4 selftest to print the correct dest IP, from Jose Blanquicet. 7) Various misc BPF XDP sample improvements, from Niklas Söderlund, Matthew Cover, and Muhammad Falak R Wani. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits) bpf, tests: Add tail call test suite bpf, tests: Add tests for BPF_CMPXCHG bpf, tests: Add tests for atomic operations bpf, tests: Add test for 32-bit context pointer argument passing bpf, tests: Add branch conversion JIT test bpf, tests: Add word-order tests for load/store of double words bpf, tests: Add tests for ALU operations implemented with function calls bpf, tests: Add more ALU64 BPF_MUL tests bpf, tests: Add more BPF_LSH/RSH/ARSH tests for ALU64 bpf, tests: Add more ALU32 tests for BPF_LSH/RSH/ARSH bpf, tests: Add more tests of ALU32 and ALU64 bitwise operations bpf, tests: Fix typos in test case descriptions bpf, tests: Add BPF_MOV tests for zero and sign extension bpf, tests: Add BPF_JMP32 test cases samples, bpf: Add an explict comment to handle nested vlan tagging. selftests/bpf: Add tests for XDP bonding selftests/bpf: Fix xdp_tx.c prog section name net, core: Allow netdev_lower_get_next_private_rcu in bh context bpf, devmap: Exclude XDP broadcast to master device net, bonding: Add XDP support to the bonding driver ... ==================== Link: https://lore.kernel.org/r/20210810130038.16927-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-09net, bonding: Add XDP support to the bonding driverJussi Maki
XDP is implemented in the bonding driver by transparently delegating the XDP program loading, removal and xmit operations to the bonding slave devices. The overall goal of this work is that XDP programs can be attached to a bond device *without* any further changes (or awareness) necessary to the program itself, meaning the same XDP program can be attached to a native device but also a bonding device. Semantics of XDP_TX when attached to a bond are equivalent in such setting to the case when a tc/BPF program would be attached to the bond, meaning transmitting the packet out of the bond itself using one of the bond's configured xmit methods to select a slave device (rather than XDP_TX on the slave itself). Handling of XDP_TX to transmit using the configured bonding mechanism is therefore implemented by rewriting the BPF program return value in bpf_prog_run_xdp. To avoid performance impact this check is guarded by a static key, which is incremented when a XDP program is loaded onto a bond device. This approach was chosen to avoid changes to drivers implementing XDP. If the slave device does not match the receive device, then XDP_REDIRECT is transparently used to perform the redirection in order to have the network driver release the packet from its RX ring. The bonding driver hashing functions have been refactored to allow reuse with xdp_buff's to avoid code duplication. The motivation for this change is to enable use of bonding (and 802.3ad) in hairpinning L4 load-balancers such as [1] implemented with XDP and also to transparently support bond devices for projects that use XDP given most modern NICs have dual port adapters. An alternative to this approach would be to implement 802.3ad in user-space and implement the bonding load-balancing in the XDP program itself, but is rather a cumbersome endeavor in terms of slave device management (e.g. by watching netlink) and requires separate programs for native vs bond cases for the orchestrator. A native in-kernel implementation overcomes these issues and provides more flexibility. Below are benchmark results done on two machines with 100Gbit Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and 16-core 3950X on receiving machine. 64 byte packets were sent with pktgen-dpdk at full rate. Two issues [2, 3] were identified with the ice driver, so the tests were performed with iommu=off and patch [2] applied. Additionally the bonding round robin algorithm was modified to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate of cache misses were caused by the shared rr_tx_counter (see patch 2/3). The statistics were collected using "sar -n dev -u 1 10". On top of that, for ice, further work is in progress on improving the XDP_TX numbers [4]. -----------------------| CPU |--| rxpck/s |--| txpck/s |---- without patch (1 dev): XDP_DROP: 3.15% 48.6Mpps XDP_TX: 3.12% 18.3Mpps 18.3Mpps XDP_DROP (RSS): 9.47% 116.5Mpps XDP_TX (RSS): 9.67% 25.3Mpps 24.2Mpps ----------------------- with patch, bond (1 dev): XDP_DROP: 3.14% 46.7Mpps XDP_TX: 3.15% 13.9Mpps 13.9Mpps XDP_DROP (RSS): 10.33% 117.2Mpps XDP_TX (RSS): 10.64% 25.1Mpps 24.0Mpps ----------------------- with patch, bond (2 devs): XDP_DROP: 6.27% 92.7Mpps XDP_TX: 6.26% 17.6Mpps 17.5Mpps XDP_DROP (RSS): 11.38% 117.2Mpps XDP_TX (RSS): 14.30% 28.7Mpps 27.4Mpps -------------------------------------------------------------- RSS: Receive Side Scaling, e.g. the packets were sent to a range of destination IPs. [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/ [4]: https://lore.kernel.org/bpf/20210805230046.28715-1-maciej.fijalkowski@intel.com/T/#t Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20210731055738.16820-4-joamaki@gmail.com
2021-08-09net, bonding: Refactor bond_xmit_hash for use with xdp_buffJussi Maki
In preparation for adding XDP support to the bonding driver refactor the packet hashing functions to be able to work with any linear data buffer without an skb. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Link: https://lore.kernel.org/bpf/20210731055738.16820-2-joamaki@gmail.com
2021-08-03bonding: add new option lacp_activeHangbin Liu
Add an option lacp_active, which is similar with team's runner.active. This option specifies whether to send LACPDU frames periodically. If set on, the LACPDU frames are sent along with the configured lacp_rate setting. If set off, the LACPDU frames acts as "speak when spoken to". Note, the LACPDU state frames still will be sent when init or unbind port. v2: remove module parameter Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02bonding: 3ad: fix the concurrency between __bond_release_one() and ↵Yufeng Mo
bond_3ad_state_machine_handler() Some time ago, I reported a calltrace issue "did not find a suitable aggregator", please see[1]. After a period of analysis and reproduction, I find that this problem is caused by concurrency. Before the problem occurs, the bond structure is like follows: bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1 \ port0 \ slaver1(eth1) - agg1.lag_ports -> NULL \ port1 If we run 'ifenslave bond0 -d eth1', the process is like below: excuting __bond_release_one() | bond_upper_dev_unlink()[step1] | | | | | bond_3ad_lacpdu_recv() | | ->bond_3ad_rx_indication() | | spin_lock_bh() | | ->ad_rx_machine() | | ->__record_pdu()[step2] | | spin_unlock_bh() | | | | bond_3ad_state_machine_handler() | spin_lock_bh() | ->ad_port_selection_logic() | ->try to find free aggregator[step3] | ->try to find suitable aggregator[step4] | ->did not find a suitable aggregator[step5] | spin_unlock_bh() | | | | bond_3ad_unbind_slave() | spin_lock_bh() spin_unlock_bh() step1: already removed slaver1(eth1) from list, but port1 remains step2: receive a lacpdu and update port0 step3: port0 will be removed from agg0.lag_ports. The struct is "agg0.lag_ports -> port1" now, and agg0 is not free. At the same time, slaver1/agg1 has been removed from the list by step1. So we can't find a free aggregator now. step4: can't find suitable aggregator because of step2 step5: cause a calltrace since port->aggregator is NULL To solve this concurrency problem, put bond_upper_dev_unlink() after bond_3ad_unbind_slave(). In this way, we can invalid the port first and skip this port in bond_3ad_state_machine_handler(). This eliminates the situation that the slaver has been removed from the list but the port is still valid. [1]https://lore.kernel.org/netdev/10374.1611947473@famine/ Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: bonding: move ioctl handling to private ndo operationArnd Bergmann
All other user triggered operations are gone from ndo_ioctl, so move the SIOCBOND family into a custom operation as well. The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now, but there are still a few definitions in obsolete wireless drivers as well as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR helpers from inside the kernel. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27dev_ioctl: split out ndo_eth_ioctlArnd Bergmann
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27bonding: use siocdevprivateArnd Bergmann
The bonding driver supports two command codes for each operation: one in the SIOCDEVPRIVATE range and another one with the same definition but a unique command code. Only the second set currently works in compat mode, as the ifr_data expansion overwrites part of the ifr_slave field. Move the private ones into ndo_siocdevprivate and change the implementation to call the other function. This makes both version work correctly. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-16bonding: fix build issueMahesh Bandewar
The commit 9a5605505d9c (" bonding: Add struct bond_ipesc to manage SA") is causing following build error when XFRM is not selected in kernel config. lld: error: undefined symbol: xfrm_dev_state_flush >>> referenced by bond_main.c:3453 (drivers/net/bonding/bond_main.c:3453) >>> net/bonding/bond_main.o:(bond_netdev_event) in archive drivers/built-in.a Fixes: 9a5605505d9c (" bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Taehee Yoo <ap420073@gmail.com> CC: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06bonding: fix incorrect return value of bond_ipsec_offload_ok()Taehee Yoo
bond_ipsec_offload_ok() is called to check whether the interface supports ipsec offload or not. bonding interface support ipsec offload only in active-backup mode. So, if a bond interface is not in active-backup mode, it should return false but it returns true. Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()Taehee Yoo
To dereference bond->curr_active_slave, it uses rcu_dereference(). But it and the caller doesn't acquire RCU so a warning occurs. So add rcu_read_lock(). Splat looks like: WARNING: suspicious RCU usage 5.13.0-rc6+ #1179 Not tainted drivers/net/bonding/bond_main.c:571 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ping/974: #0: ffff888109e7db70 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x1303/0x2cb0 stack backtrace: CPU: 2 PID: 974 Comm: ping Not tainted 5.13.0-rc6+ #1179 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_offload_ok+0x1f4/0x260 [bonding] xfrm_output+0x179/0x890 xfrm4_output+0xfa/0x410 ? __xfrm4_output+0x4b0/0x4b0 ? __ip_make_skb+0xecc/0x2030 ? xfrm4_udp_encap_rcv+0x800/0x800 ? ip_local_out+0x21/0x3a0 ip_send_skb+0x37/0xa0 raw_sendmsg+0x1bfd/0x2cb0 Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06bonding: Add struct bond_ipesc to manage SATaehee Yoo
bonding has been supporting ipsec offload. When SA is added, bonding just passes SA to its own active real interface. But it doesn't manage SA. So, when events(add/del real interface, active real interface change, etc) occur, bonding can't handle that well because It doesn't manage SA. So some problems(panic, UAF, refcnt leak)occur. In order to make it stable, it should manage SA. That's the reason why struct bond_ipsec is added. When a new SA is added to bonding interface, it is stored in the bond_ipsec list. And the SA is passed to a current active real interface. If events occur, it uses bond_ipsec data to handle these events. bond->ipsec_list is protected by bond->ipsec_lock. If a current active real interface is changed, the following logic works. 1. delete all SAs from old active real interface 2. Add all SAs to the new active real interface. 3. If a new active real interface doesn't support ipsec offload or SA's option, it sets real_dev to NULL. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06bonding: disallow setting nested bonding + ipsec offloadTaehee Yoo
bonding interface can be nested and it supports ipsec offload. So, it allows setting the nested bonding + ipsec scenario. But code does not support this scenario. So, it should be disallowed. interface graph: bond2 | bond1 | eth0 The nested bonding + ipsec offload may not a real usecase. So, disallowing this scenario is fine. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06bonding: fix suspicious RCU usage in bond_ipsec_del_sa()Taehee Yoo
To dereference bond->curr_active_slave, it uses rcu_dereference(). But it and the caller doesn't acquire RCU so a warning occurs. So add rcu_read_lock(). Test commands: ip netns add A ip netns exec A bash modprobe netdevsim echo "1 1" > /sys/bus/netdevsim/new_device ip link add bond0 type bond ip link set eth0 master bond0 ip link set eth0 up ip link set bond0 up ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \ transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \ 0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \ dst 14.0.0.70/24 proto tcp offload dev bond0 dir in ip x s f Splat looks like: ============================= WARNING: suspicious RCU usage 5.13.0-rc3+ #1168 Not tainted ----------------------------- drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ip/705: #0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3}, at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user] #1: ffff8880075b0098 (&x->lock){+.-.}-{2:2}, at: xfrm_state_delete+0x16/0x30 stack backtrace: CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_del_sa+0x16a/0x1c0 [bonding] __xfrm_state_delete+0x51f/0x730 xfrm_state_delete+0x1e/0x30 xfrm_state_flush+0x22f/0x390 xfrm_flush_sa+0xd8/0x260 [xfrm_user] ? xfrm_flush_policy+0x290/0x290 [xfrm_user] xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 [ ... ] Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06bonding: fix null dereference in bond_ipsec_add_sa()Taehee Yoo
If bond doesn't have real device, bond->curr_active_slave is null. But bond_ipsec_add_sa() dereferences bond->curr_active_slave without null checking. So, null-ptr-deref would occur. Test commands: ip link add bond0 type bond ip link set bond0 up ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi \ 0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \ 0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \ dst 14.0.0.70/24 proto tcp offload dev bond0 dir in Splat looks like: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 4 PID: 680 Comm: ip Not tainted 5.13.0-rc3+ #1168 RIP: 0010:bond_ipsec_add_sa+0xc4/0x2e0 [bonding] Code: 85 21 02 00 00 4d 8b a6 48 0c 00 00 e8 75 58 44 ce 85 c0 0f 85 14 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 fc 01 00 00 48 8d bb e0 02 00 00 4d 8b 2c 24 48 RSP: 0018:ffff88810946f508 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88810b4e8040 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffff8fe34280 RDI: ffff888115abe100 RBP: ffff88810946f528 R08: 0000000000000003 R09: fffffbfff2287e11 R10: 0000000000000001 R11: ffff888115abe0c8 R12: 0000000000000000 R13: ffffffffc0aea9a0 R14: ffff88800d7d2000 R15: ffff88810b4e8330 FS: 00007efc5552e680(0000) GS:ffff888119c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c2530dbf40 CR3: 0000000103056004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: xfrm_dev_state_add+0x2a9/0x770 ? memcpy+0x38/0x60 xfrm_add_sa+0x2278/0x3b10 [xfrm_user] ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user] ? register_lock_class+0x1750/0x1750 xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? netlink_ack+0x9d0/0x9d0 ? netlink_deliver_tap+0x17c/0xa50 xfrm_netlink_rcv+0x68/0x80 [xfrm_user] netlink_unicast+0x41c/0x610 ? netlink_attachskb+0x710/0x710 netlink_sendmsg+0x6b9/0xb70 [ ...] Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06bonding: fix suspicious RCU usage in bond_ipsec_add_sa()Taehee Yoo
To dereference bond->curr_active_slave, it uses rcu_dereference(). But it and the caller doesn't acquire RCU so a warning occurs. So add rcu_read_lock(). Test commands: ip link add dummy0 type dummy ip link add bond0 type bond ip link set dummy0 master bond0 ip link set dummy0 up ip link set bond0 up ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \ mode transport \ reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \ 0x44434241343332312423222114131211f4f3f2f1 128 sel \ src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \ dev bond0 dir in Splat looks like: ============================= WARNING: suspicious RCU usage 5.13.0-rc3+ #1168 Not tainted ----------------------------- drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/684: #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3}, at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user] 55.191733][ T684] stack backtrace: CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_add_sa+0x18c/0x1f0 [bonding] xfrm_dev_state_add+0x2a9/0x770 ? memcpy+0x38/0x60 xfrm_add_sa+0x2278/0x3b10 [xfrm_user] ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user] ? register_lock_class+0x1750/0x1750 xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? netlink_ack+0x9d0/0x9d0 ? netlink_deliver_tap+0x17c/0xa50 xfrm_netlink_rcv+0x68/0x80 [xfrm_user] netlink_unicast+0x41c/0x610 ? netlink_attachskb+0x710/0x710 netlink_sendmsg+0x6b9/0xb70 [ ... ] Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-23bonding: allow nesting of bonding deviceDi Zhu
The commit 3c9ef511b9fa ("bonding: avoid adding slave device with IFF_MASTER flag") fix a crash when add slave device with IFF_MASTER, but it rejects the scenario of nested bonding device. As Eric Dumazet described: since there indeed is a usage scenario about nesting bonding, we should not break it. So we add a new judgment condition to allow nesting of bonding device. Fixes: 3c9ef511b9fa ("bonding: avoid adding slave device with IFF_MASTER flag") Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22bonding: avoid adding slave device with IFF_MASTER flagDi Zhu
The following steps will definitely cause the kernel to crash: ip link add vrf1 type vrf table 1 modprobe bonding.ko max_bonds=1 echo "+vrf1" >/sys/class/net/bond0/bonding/slaves rmmod bonding The root cause is that: When the VRF is added to the slave device, it will fail, and some cleaning work will be done. because VRF device has IFF_MASTER flag, cleanup process will not clear the IFF_BONDING flag. Then, when we unload the bonding module, unregister_netdevice_notifier() will treat the VRF device as a bond master device and treat netdev_priv() as struct bonding{} which actually is struct net_vrf{}. By analyzing the processing logic of bond_enslave(), it seems that it is not allowed to add the slave device with the IFF_MASTER flag, so we need to add a code check for this situation. Signed-off-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15net: bonding: Use per-cpu rr_tx_counterJussi Maki
The round-robin rr_tx_counter was shared across CPUs leading to significant cache thrashing at high packet rates. This patch switches the round-robin packet counter to use a per-cpu variable to decide the destination slave. On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh (-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after this patch. "perf top -e cache_misses" before: 12.31% [bonding] [k] bond_xmit_roundrobin_slave_get 10.59% [sch_fq_codel] [k] fq_codel_dequeue 9.34% [kernel] [k] skb_release_data after: 15.42% [sch_fq_codel] [k] fq_codel_dequeue 10.06% [kernel] [k] __memset 9.12% [kernel] [k] skb_release_data Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03net: bonding: Use strscpy_pad() instead of manually-truncated strncpy()Kees Cook
Silence this warning by using strscpy_pad() directly: drivers/net/bonding/bond_main.c:4877:3: warning: 'strncpy' specified bound 16 equals destination size [-Wstringop-truncation] 4877 | strncpy(params->primary, primary, IFNAMSIZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Additionally replace other strncpy() uses, as it is considered deprecated: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202102150705.fdR6obB0-lkp@intel.com Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-20net: bonding: remove unnecessary bracesYufeng Mo
Braces {} are not necessary for single statement blocks, so remove these braces {}. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-20net: bonding: add some required blank linesYufeng Mo
Add some blank lines after declarations as required. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-17bonding: init notify_work earlier to avoid uninitialized useJohannes Berg
If bond_kobj_init() or later kzalloc() in bond_alloc_slave() fail, then we call kobject_put() on the slave->kobj. This in turn calls the release function slave_kobj_release() which will always try to cancel_delayed_work_sync(&slave->notify_work), which shouldn't be done on an uninitialized work struct. Always initialize the work struct earlier to avoid problems here. Syzbot bisected this down to a completely pointless commit, some fault injection may have been at work here that caused the alloc failure in the first place, which may interact badly with bisect. Reported-by: syzbot+bfda097c12a00c8cae67@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2021-04-21bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state ↵jinyiting
machine The bond works in mode 4, and performs down/up operations on the bond that is normally negotiated. The probability of bond-> slave_arr is NULL Test commands: ifconfig bond1 down ifconfig bond1 up The conflict occurs in the following process: __dev_open (CPU A) --bond_open --queue_delayed_work(bond->wq,&bond->ad_work,0); --bond_update_slave_arr --bond_3ad_get_active_agg_info ad_work(CPU B) --bond_3ad_state_machine_handler --ad_agg_selection_logic ad_work runs on cpu B. In the function ad_agg_selection_logic, all agg->is_active will be cleared. Before the new active aggregator is selected on CPU B, bond_3ad_get_active_agg_info failed on CPU A, bond->slave_arr will be set to NULL. The best aggregator in ad_agg_selection_logic has not changed, no need to update slave arr. The conflict occurred in that ad_agg_selection_logic clears agg->is_active under mode_lock, but bond_open -> bond_update_slave_arr is inspecting agg->is_active outside the lock. Also, bond_update_slave_arr is normal for potential sleep when allocating memory, so replace the WARN_ON with a call to might_sleep. Signed-off-by: jinyiting <jinyiting@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-29net: bonding: Correct function name bond_change_active_slave() in commentYang Yingliang
Fix the following make W=1 kernel build warning: drivers/net/bonding/bond_main.c:982: warning: expecting prototype for change_active_interface(). Prototype was for bond_change_active_slave() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12Revert "net: bonding: fix error return code of bond_neigh_init()"David S. Miller
This reverts commit 2055a99da8a253a357bdfd359b3338ef3375a26c. This change rejects legitimate configurations. A slave doesn't need to exist nor implement ndo_slave_setup. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08net: bonding: fix error return code of bond_neigh_init()Jia-Ju Bai
When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error return code of bond_neigh_init() is assigned. To fix this bug, ret is assigned with -EINVAL in these cases. Fixes: 9e99bfefdbce ("bonding: fix bond_neigh_init()") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-01-19bonding: add a vlan+srcmac tx hashing optionJarod Wilson
This comes from an end-user request, where they're running multiple VMs on hosts with bonded interfaces connected to some interest switch topologies, where 802.3ad isn't an option. They're currently running a proprietary solution that effectively achieves load-balancing of VMs and bandwidth utilization improvements with a similar form of transmission algorithm. Basically, each VM has it's own vlan, so it always sends its traffic out the same interface, unless that interface fails. Traffic gets split between the interfaces, maintaining a consistent path, with failover still available if an interface goes down. Unlike bond_eth_hash(), this hash function is using the full source MAC address instead of just the last byte, as there are so few components to the hash, and in the no-vlan case, we would be returning just the last byte of the source MAC as the hash value. It's entirely possible to have two NICs in a bond with the same last byte of their MAC, but not the same MAC, so this adjustment should guarantee distinct hashes in all cases. This has been rudimetarily tested to provide similar results to the proprietary solution it is aiming to replace. A patch for iproute2 is also posted, to properly support the new mode there as well. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Thomas Davis <tadavis@lbl.gov> Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20210119010927.1191922-1-jarod@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/bonding: Implement TLS TX device offloadTariq Toukan
Implement TLS TX device offload for bonding interfaces. This allows kTLS sockets running on a bond to benefit from the device offload on capable lower devices. To allow a simple and fast maintenance of the TLS context in SW and lower devices, we bind the TLS socket to a specific lower dev. To achieve a behavior similar to SW kTLS, we support only balance-xor and 802.3ad modes, with xmit_hash_policy=layer3+4. This is enforced in bond_sk_check(), done in a previous patch. For the above configuration, the SW implementation keeps picking the same exact lower dev for all the socket's SKBs. The device offload behaves similarly, making the decision once at the connection creation. Per socket, the TLS module should work directly with the lowest netdev in chain, to call the tls_dev_ops operations. As the bond interface is being bypassed by the TLS module, interacting directly against the lower devs, there is no way for the bond interface to disable its device offload capabilities, as long as the mode/policy config allows it. Hence, the feature flag is not directly controllable, but just reflects the current offload status based on the logic under bond_sk_check(). Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/bonding: Implement ndo_sk_get_lower_devTariq Toukan
Add ndo_sk_get_lower_dev() implementation for bond interfaces. Support only for the cases where the socket's and SKBs' hash yields identical value for the whole connection lifetime. Here we restrict it to L3+4 sockets only, with xmit_hash_policy==LAYER34 and bond modes xor/802.3ad. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/bonding: Take IP hash logic into a helperTariq Toukan
Hash logic on L3 will be used in a downstream patch for one more use case. Take it to a function for a better code reuse. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: bonding: Notify ports about their initial stateTobias Waldekranz
When creating a static bond (e.g. balance-xor), all ports will always be enabled. This is set, and the corresponding notification is sent out, before the port is linked to the bond upper. In the offloaded case, this ordering is hard to deal with. The lower will first see a notification that it can not associate with any bond. Then the bond is joined. After that point no more notifications are sent, so all ports remain disabled. This change simply sends an extra notification once the port has been linked to the upper to synchronize the initial state. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07bonding: set xfrm feature flags more sanelyJarod Wilson
We can remove one of the ifdef blocks here, and instead of setting both the xfrm hw_features and features flags, then unsetting the feature flags if not in AB, wait to set the features flags if we're actually in AB mode. Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20201205174003.578267-1-jarod@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Trivial conflict in CAN, keep the net-next + the byteswap wrapper. Conflicts: drivers/net/can/usb/gs_usb.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21bonding: wait for sysfs kobject destruction before freeing struct slaveJamie Iles
syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a struct slave device could result in the following splat: kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000) bond0 (unregistering): (slave bond_slave_1): Releasing backup interface ------------[ cut here ]------------ ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600 WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S 5.9.0-rc8+ #96 Hardware name: linux,dummy-virt (DT) Workqueue: netns cleanup_net Call trace: dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239 show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x174/0x1f8 lib/dump_stack.c:118 panic+0x360/0x7a0 kernel/panic.c:231 __warn+0x244/0x2ec kernel/panic.c:600 report_bug+0x240/0x398 lib/bug.c:198 bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974 call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322 brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329 do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864 el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65 el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93 el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594 debug_print_object+0x180/0x240 lib/debugobjects.c:485 __debug_check_no_obj_freed lib/debugobjects.c:967 [inline] debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998 slab_free_hook mm/slub.c:1536 [inline] slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577 slab_free mm/slub.c:3138 [inline] kfree+0x13c/0x460 mm/slub.c:4119 bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492 __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190 bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline] bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420 notifier_call_chain+0xf0/0x200 kernel/notifier.c:83 __raw_notifier_call_chain kernel/notifier.c:361 [inline] raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368 call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347 unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509 unregister_netdevice_many net/core/dev.c:10508 [inline] default_device_exit_batch+0x294/0x338 net/core/dev.c:10992 ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189 cleanup_net+0x44c/0x888 net/core/net_namespace.c:603 process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269 worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415 kthread+0x390/0x498 kernel/kthread.c:292 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925 This is a potential use-after-free if the sysfs nodes are being accessed whilst removing the struct slave, so wait for the object destruction to complete before freeing the struct slave itself. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Fixes: a068aab42258 ("bonding: Fix reference count leak in bond_sysfs_slave_add.") Cc: Qiushi Wu <wu000273@umn.edu> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20201120142827.879226-1-jamie@nuviainc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-03net: bonding, dummy, ifb, team: advertise NETIF_F_GSO_SOFTWAREAlexander Lobakin
Virtual netdevs should use NETIF_F_GSO_SOFTWARE to forward GSO skbs as-is and let the final drivers deal with them when supported. Also remove NETIF_F_GSO_UDP_L4 from bonding and team drivers as it's now included in the "software" list. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-28net: core: introduce struct netdev_nested_priv for nested interface ↵Taehee Yoo
infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25bonding: set dev->needed_headroom in bond_setup_by_slave()Eric Dumazet
syzbot managed to crash a host by creating a bond with a GRE device. For non Ethernet device, bonding calls bond_setup_by_slave() instead of ether_setup(), and unfortunately dev->needed_headroom was not copied from the new added member. [ 171.243095] skbuff: skb_under_panic: text:ffffffffa184b9ea len:116 put:20 head:ffff883f84012dc0 data:ffff883f84012dbc tail:0x70 end:0xd00 dev:bond0 [ 171.243111] ------------[ cut here ]------------ [ 171.243112] kernel BUG at net/core/skbuff.c:112! [ 171.243117] invalid opcode: 0000 [#1] SMP KASAN PTI [ 171.243469] gsmi: Log Shutdown Reason 0x03 [ 171.243505] Call Trace: [ 171.243506] <IRQ> [ 171.243512] [<ffffffffa171be59>] skb_push+0x49/0x50 [ 171.243516] [<ffffffffa184b9ea>] ipgre_header+0x2a/0xf0 [ 171.243520] [<ffffffffa17452d7>] neigh_connected_output+0xb7/0x100 [ 171.243524] [<ffffffffa186f1d3>] ip6_finish_output2+0x383/0x490 [ 171.243528] [<ffffffffa186ede2>] __ip6_finish_output+0xa2/0x110 [ 171.243531] [<ffffffffa186acbc>] ip6_finish_output+0x2c/0xa0 [ 171.243534] [<ffffffffa186abe9>] ip6_output+0x69/0x110 [ 171.243537] [<ffffffffa186ac90>] ? ip6_output+0x110/0x110 [ 171.243541] [<ffffffffa189d952>] mld_sendpack+0x1b2/0x2d0 [ 171.243544] [<ffffffffa189d290>] ? mld_send_report+0xf0/0xf0 [ 171.243548] [<ffffffffa189c797>] mld_ifc_timer_expire+0x2d7/0x3b0 [ 171.243551] [<ffffffffa189c4c0>] ? mld_gq_timer_expire+0x50/0x50 [ 171.243556] [<ffffffffa0fea270>] call_timer_fn+0x30/0x130 [ 171.243559] [<ffffffffa0fea17c>] expire_timers+0x4c/0x110 [ 171.243563] [<ffffffffa0fea0e3>] __run_timers+0x213/0x260 [ 171.243566] [<ffffffffa0fecb7d>] ? ktime_get+0x3d/0xa0 [ 171.243570] [<ffffffffa0ff9c4e>] ? clockevents_program_event+0x7e/0xe0 [ 171.243574] [<ffffffffa0f7e5d5>] ? sched_clock_cpu+0x15/0x190 [ 171.243577] [<ffffffffa0fe973d>] run_timer_softirq+0x1d/0x40 [ 171.243581] [<ffffffffa1c00152>] __do_softirq+0x152/0x2f0 [ 171.243585] [<ffffffffa0f44e1f>] irq_exit+0x9f/0xb0 [ 171.243588] [<ffffffffa1a02e1d>] smp_apic_timer_interrupt+0xfd/0x1a0 [ 171.243591] [<ffffffffa1a01ea6>] apic_timer_interrupt+0x86/0x90 Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-18bonding: fix active-backup failover for current ARP slaveJiri Wiesner
When the ARP monitor is used for link detection, ARP replies are validated for all slaves (arp_validate=3) and fail_over_mac is set to active, two slaves of an active-backup bond may get stuck in a state where both of them are active and pass packets that they receive to the bond. This state makes IPv6 duplicate address detection fail. The state is reached thus: 1. The current active slave goes down because the ARP target is not reachable. 2. The current ARP slave is chosen and made active. 3. A new slave is enslaved. This new slave becomes the current active slave and can reach the ARP target. As a result, the current ARP slave stays active after the enslave action has finished and the log is littered with "PROBE BAD" messages: > bond0: PROBE: c_arp ens10 && cas ens11 BAD The workaround is to remove the slave with "going back" status from the bond and re-enslave it. This issue was encountered when DPDK PMD interfaces were being enslaved to an active-backup bond. I would be possible to fix the issue in bond_enslave() or bond_change_active_slave() but the ARP monitor was fixed instead to keep most of the actions changing the current ARP slave in the ARP monitor code. The current ARP slave is set as inactive and backup during the commit phase. A new state, BOND_LINK_FAIL, has been introduced for slaves in the context of the ARP monitor. This allows administrators to see how slaves are rotated for sending ARP requests and attempts are made to find a new active slave. Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor") Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>