summaryrefslogtreecommitdiff
path: root/drivers/net/bonding
AgeCommit message (Collapse)Author
2025-03-11bonding: fix incorrect MAC address setting to receive NS messagesHangbin Liu
When validation on the backup slave is enabled, we need to validate the Neighbor Solicitation (NS) messages received on the backup slave. To receive these messages, the correct destination MAC address must be added to the slave. However, the target in bonding is a unicast address, which we cannot use directly. Instead, we should first convert it to a Solicited-Node Multicast Address and then derive the corresponding MAC address. Fix the incorrect MAC address setting on both slave_set_ns_maddr() and slave_set_ns_maddrs(). Since the two function names are similar. Add some description for the functions. Also only use one mac_addr variable in slave_set_ns_maddr() to save some code and logic. Fixes: 8eb36164d1a6 ("bonding: add ns target multicast address to slave device") Acked-by: Jay Vosburgh <jv@jvosburgh.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250306023923.38777-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30Merge tag 'net-6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from IPSec, netfilter and Bluetooth. Nothing really stands out, but as usual there's a slight concentration of fixes for issues added in the last two weeks before the merge window, and driver bugs from 6.13 which tend to get discovered upon wider distribution. Current release - regressions: - net: revert RTNL changes in unregister_netdevice_many_notify() - Bluetooth: fix possible infinite recursion of btusb_reset - eth: adjust locking in some old drivers which protect their state with spinlocks to avoid sleeping in atomic; core protects netdev state with a mutex now Previous releases - regressions: - eth: - mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node() - bgmac: reduce max frame size to support just 1500 bytes; the jumbo frame support would previously cause OOB writes, but now fails outright - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid false detection of MPTCP blackholing Previous releases - always broken: - mptcp: handle fastopen disconnect correctly - xfrm: - make sure skb->sk is a full sock before accessing its fields - fix taking a lock with preempt disabled for RT kernels - usb: ipheth: improve safety of packet metadata parsing; prevent potential OOB accesses - eth: renesas: fix missing rtnl lock in suspend/resume path" * tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) MAINTAINERS: add Neal to TCP maintainers net: revert RTNL changes in unregister_netdevice_many_notify() net: hsr: fix fill_frame_info() regression vs VLAN packets doc: mptcp: sysctl: blackhole_timeout is per-netns mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted netfilter: nf_tables: reject mismatching sum of field_len with set key length net: sh_eth: Fix missing rtnl lock in suspend/resume path net: ravb: Fix missing rtnl lock in suspend/resume path selftests/net: Add test for loading devbound XDP program in generic mode net: xdp: Disallow attaching device-bound programs in generic mode tcp: correct handling of extreme memory squeeze bgmac: reduce max frame size to support just MTU 1500 vsock/test: Add test for connect() retries vsock/test: Add test for UAF due to socket unbinding vsock/test: Introduce vsock_connect_fd() vsock/test: Introduce vsock_bind() vsock: Allow retrying on connect() failure vsock: Keep the binding until socket destruction Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming ...
2025-01-28bonding: Correctly support GSO ESP offloadCosmin Ratiu
The referenced fix is incomplete. It correctly computes bond_dev->gso_partial_features across slaves, but unfortunately netdev_fix_features discards gso_partial_features from the feature set if NETIF_F_GSO_PARTIAL isn't set in bond_dev->features. This is visible with ethtool -k bond0 | grep esp: tx-esp-segmentation: off [requested on] esp-hw-offload: on esp-tx-csum-hw-offload: on This patch reworks the bonding GSO offload support by: - making aggregating gso_partial_features across slaves similar to the other feature sets (this part is a no-op). - advertising the default partial gso features on empty bond devs, same as with other feature sets (also a no-op). - adding NETIF_F_GSO_PARTIAL to hw_enc_features filtered across slaves. - adding NETIF_F_GSO_PARTIAL to features in bond_setup() With all of these, 'ethtool -k bond0 | grep esp' now reports: tx-esp-segmentation: on esp-hw-offload: on esp-tx-csum-hw-offload: on Fixes: 4861333b4217 ("bonding: add ESP offload features when slaves support") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20250127104147.759658-1-cratiu@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-15saner replacement for debugfs_rename()Al Viro
Existing primitive has several problems: 1) calling conventions are clumsy - it returns a dentry reference that is either identical to its second argument or is an ERR_PTR(-E...); in both cases no refcount changes happen. Inconvenient for users and bug-prone; it would be better to have it return 0 on success and -E... on failure. 2) it allows cross-directory moves; however, no such caller have ever materialized and considering the way debugfs is used, it's unlikely to happen in the future. What's more, any such caller would have fun issues to deal with wrt interplay with recursive removal. It also makes the calling conventions clumsier... 3) tautological rename fails; the callers have no race-free way to deal with that. 4) new name must have been formed by the caller; quite a few callers have it done by sprintf/kasprintf/etc., ending up with considerable boilerplate. Proposed replacement: int debugfs_change_name(dentry, fmt, ...). All callers convert to that easily, and it's simpler internally. IMO debugfs_rename() should go; if we ever get a real-world use case for cross-directory moves in debugfs, we can always look into the right way to handle that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20250112080705.141166-21-viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-12bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALLDaniel Borkmann
Drivers like mlx5 expose NIC's vlan_features such as NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM which are later not propagated when the underlying devices are bonded and a vlan device created on top of the bond. Right now, the more cumbersome workaround for this is to create the vlan on top of the mlx5 and then enslave the vlan devices to a bond. To fix this, add NETIF_F_GSO_ENCAP_ALL to BOND_VLAN_FEATURES such that bond_compute_features() can probe and propagate the vlan_features from the slave devices up to the vlan device. Given the following bond: # ethtool -i enp2s0f{0,1}np{0,1} driver: mlx5_core [...] # ethtool -k enp2s0f0np0 | grep udp tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-udp-segmentation: on rx-udp_tunnel-port-offload: on rx-udp-gro-forwarding: off # ethtool -k enp2s0f1np1 | grep udp tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-udp-segmentation: on rx-udp_tunnel-port-offload: on rx-udp-gro-forwarding: off # ethtool -k bond0 | grep udp tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-udp-segmentation: on rx-udp_tunnel-port-offload: off [fixed] rx-udp-gro-forwarding: off Before: # ethtool -k bond0.100 | grep udp tx-udp_tnl-segmentation: off [requested on] tx-udp_tnl-csum-segmentation: off [requested on] tx-udp-segmentation: on rx-udp_tunnel-port-offload: off [fixed] rx-udp-gro-forwarding: off After: # ethtool -k bond0.100 | grep udp tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-udp-segmentation: on rx-udp_tunnel-port-offload: off [fixed] rx-udp-gro-forwarding: off Various users have run into this reporting performance issues when configuring Cilium in vxlan tunneling mode and having the combination of bond & vlan for the core devices connecting the Kubernetes cluster to the outside world. Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241210141245.327886-3-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12bonding: Fix initial {vlan,mpls}_feature set in bond_compute_featuresDaniel Borkmann
If a bonding device has slave devices, then the current logic to derive the feature set for the master bond device is limited in that flags which are fully supported by the underlying slave devices cannot be propagated up to vlan devices which sit on top of bond devices. Instead, these get blindly masked out via current NETIF_F_ALL_FOR_ALL logic. vlan_features and mpls_features should reuse netdev_base_features() in order derive the set in the same way as ndo_fix_features before iterating through the slave devices to refine the feature set. Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing") Fixes: 2e770b507ccd ("net: bonding: Inherit MPLS features from slave devices") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241210141245.327886-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12net, team, bonding: Add netdev_base_features helperDaniel Borkmann
Both bonding and team driver have logic to derive the base feature flags before iterating over their slave devices to refine the set via netdev_increment_features(). Add a small helper netdev_base_features() so this can be reused instead of having it open-coded multiple times. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241210141245.327886-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc8). Conflicts: tools/testing/selftests/net/.gitignore 252e01e68241 ("selftests: net: add netlink-dumps to .gitignore") be43a6b23829 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw") https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/ drivers/net/phy/phylink.c 671154f174e0 ("net: phylink: ensure PHY momentary link-fails are handled") 7530ea26c810 ("net: phylink: remove "using_mac_select_pcs"") Adjacent changes: drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c 5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") e96321fad3ad ("net: ethernet: Switch back to struct platform_driver::remove()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14bonding: add ns target multicast address to slave deviceHangbin Liu
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves") tried to resolve the issue where backup slaves couldn't be brought up when receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only worked for drivers that receive all multicast messages, such as the veth interface. For standard drivers, the NS multicast message is silently dropped because the slave device is not a member of the NS target multicast group. To address this, we need to make the slave device join the NS target multicast group, ensuring it can receive these IPv6 NS messages to validate the slave’s status properly. There are three policies before joining the multicast group: 1. All settings must be under active-backup mode (alb and tlb do not support arp_validate), with backup slaves and slaves supporting multicast. 2. We can add or remove multicast groups when arp_validate changes. 3. Other operations, such as enslaving, releasing, or setting NS targets, need to be guarded by arp_validate. Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07bonding: add ESP offload features when slaves supportJianbo Liu
Add NETIF_F_GSO_ESP bit to bond's gso_partial_features if all slaves support it, such that ESP segmentation is handled by hardware if possible. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241105192721.584822-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28bonding: return detailed error when loading native XDP failsHangbin Liu
Bonding only supports native XDP for specific modes, which can lead to confusion for users regarding why XDP loads successfully at times and fails at others. This patch enhances error handling by returning detailed error messages, providing users with clearer insights into the specific reasons for the failure when loading native XDP. Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241021031211.814-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-23netpoll: remove ndo_netpoll_setup() second argumentEric Dumazet
npinfo is not used in any of the ndo_netpoll_setup() methods. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241018052108.2610827-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-24bonding: Fix unnecessary warnings and logs from bond_xdp_get_xmit_slave()Jiwon Kim
syzbot reported a WARNING in bond_xdp_get_xmit_slave. To reproduce this[1], one bond device (bond1) has xdpdrv, which increases bpf_master_redirect_enabled_key. Another bond device (bond0) which is unsupported by XDP but its slave (veth3) has xdpgeneric that returns XDP_TX. This triggers WARN_ON_ONCE() from the xdp_master_redirect(). To reduce unnecessary warnings and improve log management, we need to delete the WARN_ON_ONCE() and add ratelimit to the netdev_err(). [1] Steps to reproduce: # Needs tx_xdp with return XDP_TX; ip l add veth0 type veth peer veth1 ip l add veth3 type veth peer veth4 ip l add bond0 type bond mode 6 # BOND_MODE_ALB, unsupported by XDP ip l add bond1 type bond # BOND_MODE_ROUNDROBIN by default ip l set veth0 master bond1 ip l set bond1 up # Increases bpf_master_redirect_enabled_key ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx ip l set veth3 master bond0 ip l set bond0 up ip l set veth4 up # Triggers WARN_ON_ONCE() from the xdp_master_redirect() ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257 Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20240918140602.18644-1-jiwonaid0@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-09bonding: Remove setting of RX software timestampGal Pressman
The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240906144632.404651-4-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05bonding: support xfrm state updateHangbin Liu
The patch add xfrm statistics update for bonding IPsec offload. Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-05bonding: Add ESN support to IPSec HW offloadHangbin Liu
Currently, users can see that bonding supports IPSec HW offload via ethtool. However, this functionality does not work with NICs like Mellanox cards when ESN (Extended Sequence Numbers) is enabled, as ESN functions are not yet supported. This patch adds ESN support to the bonding IPSec device offload, ensuring proper functionality with NICs that support ESN. Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-05bonding: add common function to check ipsec deviceHangbin Liu
This patch adds a common function to check the status of IPSec devices. This function will be useful for future implementations, such as IPSec ESN and state offload callbacks. Suggested-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-03netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_localAlexander Lobakin
"Interface can't change network namespaces" is rather an attribute, not a feature, and it can't be changed via Ethtool. Make it a "cold" private flag instead of a netdev_feature and free one more bit. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-03netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/faraday/ftgmac100.c 4186c8d9e6af ("net: ftgmac100: Ensure tx descriptor updates are visible") e24a6c874601 ("net: ftgmac100: Get link speed and duplex for NC-SI") https://lore.kernel.org/0b851ec5-f91d-4dd3-99da-e81b98c9ed28@kernel.org net/ipv4/tcp.c bac76cf89816 ("tcp: fix forever orphan socket caused by tcp_abort") edefba66d929 ("tcp: rstreason: introduce SK_RST_REASON_TCP_STATE for active reset") https://lore.kernel.org/20240828112207.5c199d41@canb.auug.org.au No adjacent changes. Link: https://patch.msgid.link/20240829130829.39148-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27bonding: change ipsec_lock from spin lock to mutexJianbo Liu
In the cited commit, bond->ipsec_lock is added to protect ipsec_list, hence xdo_dev_state_add and xdo_dev_state_delete are called inside this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep, "scheduling while atomic" will be triggered when changing bond's active slave. [ 101.055189] BUG: scheduling while atomic: bash/902/0x00000200 [ 101.055726] Modules linked in: [ 101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1 [ 101.058760] Hardware name: [ 101.059434] Call Trace: [ 101.059436] <TASK> [ 101.060873] dump_stack_lvl+0x51/0x60 [ 101.061275] __schedule_bug+0x4e/0x60 [ 101.061682] __schedule+0x612/0x7c0 [ 101.062078] ? __mod_timer+0x25c/0x370 [ 101.062486] schedule+0x25/0xd0 [ 101.062845] schedule_timeout+0x77/0xf0 [ 101.063265] ? asm_common_interrupt+0x22/0x40 [ 101.063724] ? __bpf_trace_itimer_state+0x10/0x10 [ 101.064215] __wait_for_common+0x87/0x190 [ 101.064648] ? usleep_range_state+0x90/0x90 [ 101.065091] cmd_exec+0x437/0xb20 [mlx5_core] [ 101.065569] mlx5_cmd_do+0x1e/0x40 [mlx5_core] [ 101.066051] mlx5_cmd_exec+0x18/0x30 [mlx5_core] [ 101.066552] mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core] [ 101.067163] ? bonding_sysfs_store_option+0x4d/0x80 [bonding] [ 101.067738] ? kmalloc_trace+0x4d/0x350 [ 101.068156] mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core] [ 101.068747] mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core] [ 101.069312] bond_change_active_slave+0x392/0x900 [bonding] [ 101.069868] bond_option_active_slave_set+0x1c2/0x240 [bonding] [ 101.070454] __bond_opt_set+0xa6/0x430 [bonding] [ 101.070935] __bond_opt_set_notify+0x2f/0x90 [bonding] [ 101.071453] bond_opt_tryset_rtnl+0x72/0xb0 [bonding] [ 101.071965] bonding_sysfs_store_option+0x4d/0x80 [bonding] [ 101.072567] kernfs_fop_write_iter+0x10c/0x1a0 [ 101.073033] vfs_write+0x2d8/0x400 [ 101.073416] ? alloc_fd+0x48/0x180 [ 101.073798] ksys_write+0x5f/0xe0 [ 101.074175] do_syscall_64+0x52/0x110 [ 101.074576] entry_SYSCALL_64_after_hwframe+0x4b/0x53 As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called from bond_change_active_slave, which requires holding the RTNL lock. And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user context. So ipsec_lock doesn't have to be spin lock, change it to mutex, and thus the above issue can be resolved. Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240823031056.110999-4-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27bonding: extract the use of real_device into local variableJianbo Liu
Add a local variable for slave->dev, to prepare for the lock change in the next patch. There is no functionality change. Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240823031056.110999-3-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27bonding: implement xdo_dev_state_free and call it after deletionJianbo Liu
Add this implementation for bonding, so hardware resources can be freed from the active slave after xfrm state is deleted. The netdev used to invoke xdo_dev_state_free callback, is saved in the xfrm state (xs->xso.real_dev), which is also the bond's active slave. To prevent it from being freed, acquire netdev reference before leaving RCU read-side critical section, and release it after callback is done. And call it when deleting all SAs from old active real interface while switching current active slave. Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240823031056.110999-2-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-24net: refactor ->ndo_bpf calls into dev_xdp_propagateMina Almasry
When net devices propagate xdp configurations to slave devices, we will need to perform a memory provider check to ensure we're not binding xdp to a device using unreadable netmem. Currently the ->ndo_bpf calls in a few places. Adding checks to all these places would not be ideal. Refactor all the ->ndo_bpf calls into one place where we can add this check in the future. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mina Almasry <almasrymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.h c948c0973df5 ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops") f2878cdeb754 ("bnxt_en: Add support to call FW to update a VNIC") Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20bonding: fix xfrm state handling when clearing active slaveNikolay Aleksandrov
If the active slave is cleared manually the xfrm state is not flushed. This leads to xfrm add/del imbalance and adding the same state multiple times. For example when the device cannot handle anymore states we get: [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA because it's filled with the same state after multiple active slave clearings. This change also has a few nice side effects: user-space gets a notification for the change, the old device gets its mac address and promisc/mcast adjusted properly. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20bonding: fix xfrm real_dev null pointer dereferenceNikolay Aleksandrov
We shouldn't set real_dev to NULL because packets can be in transit and xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume real_dev is set. Example trace: kernel: BUG: unable to handle page fault for address: 0000000000001030 kernel: bond0: (slave eni0np1): making interface the new active one kernel: #PF: supervisor write access in kernel mode kernel: #PF: error_code(0x0002) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0002 [#1] PREEMPT SMP kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim] kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f kernel: bond0: (slave eni0np1): making interface the new active one kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60 kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00 kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014 kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000 kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000 kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0 kernel: bond0: (slave eni0np1): making interface the new active one kernel: Call Trace: kernel: <TASK> kernel: ? __die+0x1f/0x60 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ? page_fault_oops+0x142/0x4c0 kernel: ? do_user_addr_fault+0x65/0x670 kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50 kernel: bond0: (slave eni0np1): making interface the new active one kernel: ? exc_page_fault+0x7b/0x180 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim] kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim] kernel: bond0: (slave eni0np1): making interface the new active one kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding] kernel: xfrm_output+0x61/0x3b0 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ip_push_pending_frames+0x56/0x80 Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20bonding: fix null pointer deref in bond_ipsec_offload_okNikolay Aleksandrov
We must check if there is an active slave before dereferencing the pointer. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20bonding: fix bond_ipsec_offload_ok return typeNikolay Aleksandrov
Fix the return type which should be bool. Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-07bonding: Pass string literal as format argument of alloc_ordered_workqueue()Simon Horman
Recently I noticed that both gcc-14 and clang-18 report that passing a non-string literal as the format argument of alloc_ordered_workqueue is potentially insecure. F.e. clang-18 says: .../bond_main.c:6384:37: warning: format string is not a string literal (potentially insecure) [-Wformat-security] 6384 | bond->wq = alloc_ordered_workqueue(bond_dev->name, WQ_MEM_RECLAIM); | ^~~~~~~~~~~~~~ .../workqueue.h:524:18: note: expanded from macro 'alloc_ordered_workqueue' 524 | alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args) | ^~~ .../bond_main.c:6384:37: note: treat the string as an argument to avoid this 6384 | bond->wq = alloc_ordered_workqueue(bond_dev->name, WQ_MEM_RECLAIM); | ^ | "%s", ..../workqueue.h:524:18: note: expanded from macro 'alloc_ordered_workqueue' 524 | alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args) | ^ Perhaps it is always the case where the contents of bond_dev->name is safe to pass as the format argument. That is, in my understanding, it never contains any format escape sequences. But, it seems better to be safe than sorry. And, as a bonus, compiler output becomes less verbose by addressing this issue as suggested by clang-18. Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240806-bonding-fmt-v1-1-e75027e45775@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-23net: bonding: correctly annotate RCU in bond_should_notify_peers()Johannes Berg
RCU use in bond_should_notify_peers() looks wrong, since it does rcu_dereference(), leaves the critical section, and uses the pointer after that. Luckily, it's called either inside a nested RCU critical section or with the RTNL held. Annotate it with rcu_dereference_rtnl() instead, and remove the inner RCU critical section. Fixes: 4cb4f97b7e36 ("bonding: rebuild the lock use for bond_mii_monitor()") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240719094119.35c62455087d.I68eb9c0f02545b364b79a59f2110f2cf5682a8e2@changeid Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-15net: Add struct kernel_ethtool_ts_infoKory Maincent
In prevision to add new UAPI for hwtstamp we will be limited to the struct ethtool_ts_info that is currently passed in fixed binary format through the ETHTOOL_GET_TS_INFO ethtool ioctl. It would be good if new kernel code already started operating on an extensible kernel variant of that structure, similar in concept to struct kernel_hwtstamp_config vs struct hwtstamp_config. Since struct ethtool_ts_info is in include/uapi/linux/ethtool.h, here we introduce the kernel-only structure in include/linux/ethtool.h. The manual copy is then made in the function called by ETHTOOL_GET_TS_INFO. Acked-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-6-b5317f50df2a@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()Sam Sun
In function bond_option_arp_ip_targets_set(), if newval->string is an empty string, newval->string+1 will point to the byte after the string, causing an out-of-bound read. BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418 Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107 CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0xc1/0x5e0 mm/kasan/report.c:475 kasan_report+0xbe/0xf0 mm/kasan/report.c:588 strlen+0x7d/0xa0 lib/string.c:418 __fortify_strlen include/linux/fortify-string.h:210 [inline] in4_pton+0xa3/0x3f0 net/core/utils.c:130 bond_option_arp_ip_targets_set+0xc2/0x910 drivers/net/bonding/bond_options.c:1201 __bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767 __bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792 bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817 bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156 dev_attr_store+0x54/0x80 drivers/base/core.c:2366 sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136 kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x96a/0xd80 fs/read_write.c:584 ksys_write+0x122/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b ---[ end trace ]--- Fix it by adding a check of string length before using it. Fixes: f9de11a16594 ("bonding: add ip checks when store ip target") Signed-off-by: Yue Sun <samsun1006219@gmail.com> Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20240702-bond-oob-v6-1-2dfdba195c19@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-21bonding: fix incorrect software timestamping reportHangbin Liu
The __ethtool_get_ts_info function returns directly if the device has a get_ts_info() method. For bonding with an active slave, this works correctly as we simply return the real device's timestamping information. However, when there is no active slave, we only check the slave's TX software timestamp information. We still need to set the phc index and RX timestamp information manually. Otherwise, the result will be look like: Time stamping parameters for bond0: Capabilities: software-transmit PTP Hardware Clock: 0 Hardware Transmit Timestamp Modes: none Hardware Receive Filter Modes: none This issue does not affect VLAN or MACVLAN devices, as they only have one downlink and can directly use the downlink's timestamping information. Fixes: b8768dc40777 ("net: ethtool: Refactor identical get_ts_info implementations.") Reported-by: Liang Li <liali@redhat.com> Closes: https://issues.redhat.com/browse/RHEL-42409 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-16bonding: fix oops during rmmodTony Battersby
"rmmod bonding" causes an oops ever since commit cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function"). Here are the relevant functions being called: bonding_exit() bond_destroy_debugfs() debugfs_remove_recursive(bonding_debug_root); bonding_debug_root = NULL; <--------- SET TO NULL HERE bond_netlink_fini() rtnl_link_unregister() __rtnl_link_unregister() unregister_netdevice_many_notify() bond_uninit() bond_debug_unregister() (commit removed check for bonding_debug_root == NULL) debugfs_remove() simple_recursive_removal() down_write() -> OOPS However, reverting the bad commit does not solve the problem completely because the original code contains a race that could cause the same oops, although it was much less likely to be triggered unintentionally: CPU1 rmmod bonding bonding_exit() bond_destroy_debugfs() debugfs_remove_recursive(bonding_debug_root); CPU2 echo -bond0 > /sys/class/net/bonding_masters bond_uninit() bond_debug_unregister() if (!bonding_debug_root) CPU1 bonding_debug_root = NULL; So do NOT revert the bad commit (since the removed checks were racy anyway), and instead change the order of actions taken during module removal. The same oops can also happen if there is an error during module init, so apply the same fix there. Fixes: cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function") Cc: stable@vger.kernel.org Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/641f914f-3216-4eeb-87dd-91b78aa97773@cybernetics.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09bonding: no longer use RTNL in bonding_show_queue_id()Eric Dumazet
Annotate lockless reads of slave->queue_id. Annotate writes of slave->queue_id. Switch bonding_show_queue_id() to rcu_read_lock() and bond_for_each_slave_rcu(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09bonding: no longer use RTNL in bonding_show_slaves()Eric Dumazet
Slave devices are already RCU protected, simply switch to bond_for_each_slave_rcu(), Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09bonding: no longer use RTNL in bonding_show_bonds()Eric Dumazet
netdev structures are already RCU protected. Change bond_init() and bond_uninit() to use RCU enabled list_add_tail_rcu() and list_del_rcu(). Then bonding_show_bonds() can use rcu_read_lock() while iterating through bn->dev_list. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-08ipv4: Set scope explicitly in ip_route_output().Guillaume Nault
Add a "scope" parameter to ip_route_output() so that callers don't have to override the tos parameter with the RTO_ONLINK flag if they want a local scope. This will allow converting flowi4_tos to dscp_t in the future, thus allowing static analysers to flag invalid interactions between "tos" (the DSCP bits) and ECN. Only three users ask for local scope (bonding, arp and atm). The others continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn users about the limitations of ip_route_output(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05xdp, bonding: Fix feature flags when there are no slave devs anymoreDaniel Borkmann
Commit 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") changed the driver from reporting everything as supported before a device was bonded into having the driver report that no XDP feature is supported until a real device is bonded as it seems to be more truthful given eventually real underlying devices decide what XDP features are supported. The change however did not take into account when all slave devices get removed from the bond device. In this case after 9b0ed890ac2a, the driver keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK & ~NETDEV_XDP_ACT_XSK_ZEROCOPY whereas it should have reported a feature mask of 0. Fix it by resetting XDP feature flags in the same way as if no XDP program is attached to the bond device. This was uncovered by the XDP bond selftest which let BPF CI fail. After adjusting the starting masks on the latter to 0 instead of NETDEV_XDP_ACT_MASK the test passes again together with this fix. Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Prashant Batra <prbatra.mail@gmail.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Message-ID: <20240305090829.17131-1-daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-27bonding: 802.3ad replace MAC_ADDRESS_EQUAL with __agg_has_partnerJones Syue 薛懷宗
Replace macro MAC_ADDRESS_EQUAL() for null_mac_addr checking with inline function__agg_has_partner(). When MAC_ADDRESS_EQUAL() is verifiying aggregator's partner mac addr with null_mac_addr, means that seeing if aggregator has a valid partner or not. Using __agg_has_partner() makes it more clear to understand. In ad_port_selection_logic(), since aggregator->partner_system and port->partner_oper.system has been compared first as a prerequisite, it is safe to replace the upcoming MAC_ADDRESS_EQUAL() for null_mac_addr checking with __agg_has_partner(). Delete null_mac_addr, which is not required anymore in bond_3ad.c, since all references to it are gone. Signed-off-by: Jones Syue <jonessyue@qnap.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/SI2PR04MB5097BCA8FF2A2F03D9A5A3EEDC5A2@SI2PR04MB5097.apcprd04.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22bonding: rate-limit bonding driver inspect messagesPraveen Kumar Kannoju
Through the routine bond_mii_monitor(), bonding driver inspects and commits the slave state changes. During the times when slave state change and failure in aqcuiring rtnl lock happen at the same time, the routine bond_mii_monitor() reschedules itself to come around after 1 msec to commit the new state. During this, it executes the routine bond_miimon_inspect() to re-inspect the state chane and prints the corresponding slave state on to the console. Hence we do see a message at every 1 msec till the rtnl lock is acquired and state chage is committed. This patch doesn't change how bond functions. It only simply limits this kind of log flood. Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240221082752.4660-1-praveen.kannoju@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/dev.c 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()") 723de3ebef03 ("net: free altname using an RCU callback") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") drivers/net/ethernet/renesas/ravb_main.c ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path" ) c2da9408579d ("ravb: Add Rx checksum offload support for GbEth") net/mptcp/protocol.c bdd70eb68913 ("mptcp: drop the push_pending field") 28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPYMagnus Karlsson
Do not report the XDP capability NETDEV_XDP_ACT_XSK_ZEROCOPY as the bonding driver does not support XDP and AF_XDP in zero-copy mode even if the real NIC drivers do. Note that the driver used to report everything as supported before a device was bonded. Instead of just masking out the zero-copy support from this, have the driver report that no XDP feature is supported until a real device is bonded. This seems to be more truthful as it is the real drivers that decide what XDP features are supported. Fixes: cb9e6e584d58 ("bonding: add xdp_features support") Reported-by: Prashant Batra <prbatra.mail@gmail.com> Link: https://lore.kernel.org/all/CAJ8uoz2ieZCopgqTvQ9ZY6xQgTbujmC6XkMTamhp68O-h_-rLg@mail.gmail.com/T/ Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20240207084737.20890-1-magnus.karlsson@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07bonding: use exit_batch_rtnl() methodEric Dumazet
exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list. This saves one rtnl_lock()/rtnl_unlock() pair, and one unregister_netdevice_many() call. v2: Added bond_net_pre_exit() method to make sure bond_destroy_sysfs() is called before we unregister the devices in bond_net_exit_batch_rtnl (Antoine Tenart : https://lore.kernel.org/netdev/170688415193.5216.10499830272732622816@kwain/) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20240206144313.2050392-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-06bonding: Add independent control state machineAahil Awatramani
Add support for the independent control state machine per IEEE 802.1AX-2008 5.4.15 in addition to the existing implementation of the coupled control state machine. Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in the LACP MUX state machine for separated handling of an initial Collecting state before the Collecting and Distributing state. This enables a port to be in a state where it can receive incoming packets while not still distributing. This is useful for reducing packet loss when a port begins distributing before its partner is able to collect. Added new functions such as bond_set_slave_tx_disabled_flags and bond_set_slave_rx_enabled_flags to precisely manage the port's collecting and distributing states. Previously, there was no dedicated method to disable TX while keeping RX enabled, which this patch addresses. Note that the regular flow process in the kernel's bonding driver remains unaffected by this patch. The extension requires explicit opt-in by the user (in order to ensure no disruptions for existing setups) via netlink support using the new bonding parameter coupled_control. The default value for coupled_control is set to 1 so as to preserve existing behaviour. Signed-off-by: Aahil Awatramani <aahila@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-24bonding: remove print in bond_verify_device_pathZhengchao Shao
As suggested by Paolo in link[1], if the memory allocation fails, the mm layer will emit a lot warning comprising the backtrace, so remove the print. [1] https://lore.kernel.org/all/20231118081653.1481260-1-shaozhengchao@huawei.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-21bonding: return -ENOMEM instead of BUG in alb_upper_dev_walkZhengchao Shao
If failed to allocate "tags" or could not find the final upper device from start_dev's upper list in bond_verify_device_path(), only the loopback detection of the current upper device should be affected, and the system is no need to be panic. So return -ENOMEM in alb_upper_dev_walk to stop walking, print some warn information when failed to allocate memory for vlan tags in bond_verify_device_path. I also think that the following function calls netdev_walk_all_upper_dev_rcu ---->>>alb_upper_dev_walk ---------->>>bond_verify_device_path From this way, "end device" can eventually be obtained from "start device" in bond_verify_device_path, IS_ERR(tags) could be instead of IS_ERR_OR_NULL(tags) in alb_upper_dev_walk. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20231118081653.1481260-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>