summaryrefslogtreecommitdiff
path: root/net/dsa
AgeCommit message (Collapse)Author
12 daysnet: dsa: add support for phylink managed EEERussell King (Oracle)
Add support to allow DSA drivers to use phylink managed EEE, with only needing support for controlling the LPI state. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
12 daysnet: dsa: Remove adjust_link pathsnet-nextnet-mergedFlorian Fainelli
Now that we no longer any drivers using PHYLIB's adjust_link callback, remove all paths that made use of adjust_link as well as the associated functions. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240430164816.2400606-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-12net: dsa: convert dsa_user_phylink_fixed_state() to use dsa_phylink_to_port()Russell King (Oracle)
Convert dsa_user_phylink_fixed_state() to use the newly introduced dsa_phylink_to_port() helper. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-04-10net: dsa: allow DSA switch drivers to provide their own phylink mac opsRussell King (Oracle)
Rather than having a shim for each and every phylink MAC operation, allow DSA switch drivers to provide their own ops structure. When a DSA driver provides the phylink MAC operations, the shimmed ops must not be provided, so fail an attempt to register a switch with both the phylink_mac_ops in struct dsa_switch and the phylink_mac_* operations populated in dsa_switch_ops populated. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-04-10net: dsa: introduce dsa_phylink_to_port()Russell King (Oracle)
We convert from a phylink_config struct to a dsa_port struct in many places, let's provide a helper for this. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-01-11net: dsa: fix netdev_priv() dereference before check on non-DSA netdevice eventsVladimir Oltean
After the blamed commit, we started doing this dereference for every NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER event in the system. static inline struct dsa_port *dsa_user_to_port(const struct net_device *dev) { struct dsa_user_priv *p = netdev_priv(dev); return p->dp; } Which is obviously bogus, because not all net_devices have a netdev_priv() of type struct dsa_user_priv. But struct dsa_user_priv is fairly small, and p->dp means dereferencing 8 bytes starting with offset 16. Most drivers allocate that much private memory anyway, making our access not fault, and we discard the bogus data quickly afterwards, so this wasn't caught. But the dummy interface is somewhat special in that it calls alloc_netdev() with a priv size of 0. So every netdev_priv() dereference is invalid, and we get this when we emit a NETDEV_PRECHANGEUPPER event with a VLAN as its new upper: $ ip link add dummy1 type dummy $ ip link add link dummy1 name dummy1.100 type vlan id 100 [ 43.309174] ================================================================== [ 43.316456] BUG: KASAN: slab-out-of-bounds in dsa_user_prechangeupper+0x30/0xe8 [ 43.323835] Read of size 8 at addr ffff3f86481d2990 by task ip/374 [ 43.330058] [ 43.342436] Call trace: [ 43.366542] dsa_user_prechangeupper+0x30/0xe8 [ 43.371024] dsa_user_netdevice_event+0xb38/0xee8 [ 43.375768] notifier_call_chain+0xa4/0x210 [ 43.379985] raw_notifier_call_chain+0x24/0x38 [ 43.384464] __netdev_upper_dev_link+0x3ec/0x5d8 [ 43.389120] netdev_upper_dev_link+0x70/0xa8 [ 43.393424] register_vlan_dev+0x1bc/0x310 [ 43.397554] vlan_newlink+0x210/0x248 [ 43.401247] rtnl_newlink+0x9fc/0xe30 [ 43.404942] rtnetlink_rcv_msg+0x378/0x580 Avoid the kernel oops by dereferencing after the type check, as customary. Fixes: 4c3f80d22b2e ("net: dsa: walk through all changeupper notifier functions") Reported-and-tested-by: syzbot+d81bcd883824180500c8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/0000000000001d4255060e87545c@google.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240110003354.2796778-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: fill in MODULE_DESCRIPTION()s for DSA tagsJakub Kicinski
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to all the DSA tag modules. The descriptions are copy/pasted Kconfig names, with s/^Tag/DSA tag/. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20240104143759.1318137-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-27net: rename dsa_realloc_skb to skb_ensure_writable_head_tailRadu Pirea (NXP OSS)
Rename dsa_realloc_skb to skb_ensure_writable_head_tail and move it to skbuff.c to use it as helper. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-14net: dsa: tag_rtl4_a: Use existing ETH_P_REALTEK constantFlorian Fainelli
No functional change, uses the existing ETH_P_REALTEK constant already defined in if_ether.h. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20231113165030.2440083-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: dsa: Rename IFLA_DSA_MASTER to IFLA_DSA_CONDUITFlorian Fainelli
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI and creates an alias named IFLA_DSA_CONDUIT. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: dsa: Use conduit and user termsFlorian Fainelli
Use more inclusive terms throughout the DSA subsystem by moving away from "master" which is replaced by "conduit" and "slave" which is replaced by "user". No functional changes. Acked-by: Rob Herring <robh@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11net: dsa: remove dsa_port_phylink_validate()Russell King (Oracle)
As all drivers now provide phylink capabilities (including MAC), the if() condition in dsa_port_phylink_validate() will always be true. We will always use the generic validator, which phylink will call itself if the .validate method isn't populated. Thus, there is now no need to implement the .validate method, so this implementation can be removed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-03net: dsa: tag_ksz: Extend ksz9477_xmit() for HSR frame duplicationLukasz Majewski
The KSZ9477 has support for HSR (High-Availability Seamless Redundancy). One of its offloading (i.e. performed in the switch IC hardware) features is to duplicate received frame to both HSR aware switch ports. To achieve this goal - the tail TAG needs to be modified. To be more specific, both ports must be marked as destination (egress) ones. The NETIF_F_HW_HSR_DUP flag indicates that the device supports HSR and assures (in HSR core code) that frame is sent only once from HOST to switch with tail tag indicating both ports. Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-03net: dsa: notify drivers of MAC address changes on user portsVladimir Oltean
In some cases, drivers may need to veto the changing of a MAC address on a user port. Such is the case with KSZ9477 when it offloads a HSR device, because it programs the MAC address of multiple ports to a shared hardware register. Those ports need to have equal MAC addresses for the lifetime of the HSR offload. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-03net: dsa: propagate extack to ds->ops->port_hsr_join()Vladimir Oltean
Drivers can provide meaningful error messages which state a reason why they can't perform an offload, and dsa_slave_changeupper() already has the infrastructure to propagate these over netlink rather than printing to the kernel log. So pass the extack argument and modify the xrs700x driver's port_hsr_join() prototype. Also take the opportunity and use the extack for the 2 -EOPNOTSUPP cases from xrs700x_hsr_join(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-09net: dsa: mark parsed interface mode for legacy switch driversRussell King (Oracle)
If we successfully parsed an interface mode with a legacy switch driver, populate that mode into phylink's supported interfaces rather than defaulting to the internal and gmii interfaces. This hasn't caused an issue so far, because when the interface doesn't match a supported one, phylink_validate() doesn't clear the supported mask, but instead returns -EINVAL. phylink_parse_fixedlink() doesn't check this return value, and merely relies on the supported ethtool link modes mask being cleared. Therefore, the fixed link settings end up being allowed despite validation failing. Before this causes a problem, arrange for DSA to more accurately populate phylink's supported interfaces mask so validation can correctly succeed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/E1qTKdM-003Cpx-Eh@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/dsa/port.c 9945c1fb03a3 ("net: dsa: fix older DSA drivers using phylink") a88dd7538461 ("net: dsa: remove legacy_pre_march2020 detection") https://lore.kernel.org/all/20230731102254.2c9868ca@canb.auug.org.au/ net/xdp/xsk.c 3c5b4d69c358 ("net: annotate data-races around sk->sk_mark") b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path") https://lore.kernel.org/all/20230731102631.39988412@canb.auug.org.au/ drivers/net/ethernet/broadcom/bnxt/bnxt.c 37b61cda9c16 ("bnxt: don't handle XDP in netpoll") 2b56b3d99241 ("eth: bnxt: handle invalid Tx completions more gracefully") https://lore.kernel.org/all/20230801101708.1dc7faac@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c 62da08331f1a ("net/mlx5e: Set proper IPsec source port in L4 selector") fbd517549c32 ("net/mlx5e: Add function to get IPsec offload namespace") drivers/net/ethernet/sfc/selftest.c 55c1528f9b97 ("sfc: fix field-spanning memcpy in selftest") ae9d445cd41f ("sfc: Miscellaneous comment removals") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-01net: dsa: tag_qca: return early if dev is not foundChristian Marangi
Currently checksum is recalculated and dsa tag stripped even if we later don't find the dev. To improve code, exit early if we don't find the dev and skip additional operation on the skb since it will be freed anyway. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20230730074113.21889-1-ansuelsmth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27net: dsa: fix older DSA drivers using phylinkRussell King (Oracle)
Older DSA drivers that do not provide an dsa_ops adjust_link method end up using phylink. Unfortunately, a recent phylink change that requires its supported_interfaces bitmap to be filled breaks these drivers because the bitmap remains empty. Rather than fixing each driver individually, fix it in the core code so we have a sensible set of defaults. Reported-by: Sergei Antonov <saproj@gmail.com> Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-23net: dsa: remove deprecated strncpyjustinstitt@google.com
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. Even call sites utilizing length-bounded destination buffers should switch over to using `strtomem` or `strtomem_pad`. In this case, however, the compiler is unable to determine the size of the `data` buffer which renders `strtomem` unusable. Due to this, `strscpy` should be used. It should be noted that most call sites already zero-initialize the destination buffer. However, I've opted to use `strscpy_pad` to maintain the same exact behavior that `strncpy` produced (zero-padded tail up to `len`). Also see [3]. [1]: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [2]: elixir.bootlin.com/linux/v6.3/source/net/ethtool/ioctl.c#L1944 [3]: manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html Link: https://github.com/KSPP/linux/issues/90 Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-18net: dsa: remove legacy_pre_march2020 detectionRussell King (Oracle)
All drivers are now updated for the March 2020 changes, and no longer make use of the mac_pcs_get_state() or mac_an_restart() operations, which are now NULL across all DSA drivers. All DSA drivers don't look at speed, duplex, pause or advertisement in their phylink_mac_config() method either. Remove support for these operations from DSA, and stop marking DSA as a legacy driver by default. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-04net: dsa: sja1105: always enable the send_meta optionsVladimir Oltean
incl_srcpt has the limitation, mentioned in commit b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"), that frames with a MAC DA of 01:80:c2:xx:yy:zz will be received as 01:80:c2:00:00:zz unless PTP RX timestamping is enabled. The incl_srcpt option was initially unconditionally enabled, then that changed with commit 42824463d38d ("net: dsa: sja1105: Limit use of incl_srcpt to bridge+vlan mode"), then again with b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"). Bottom line is that it now needs to be always enabled, otherwise the driver does not have a reliable source of information regarding source_port and switch_id for link-local traffic (tag_8021q VLANs may be imprecise since now they identify an entire bridging domain when ports are not standalone). If we accept that PTP RX timestamping (and therefore, meta frame generation) is always enabled in hardware, then that limitation could be avoided and packets with any MAC DA can be properly received, because meta frames do contain the original bytes from the MAC DA of their associated link-local packet. This change enables meta frame generation unconditionally, which also has the nice side effects of simplifying the switch control path (a switch reset is no longer required on hwtstamping settings change) and the tagger data path (it no longer needs to be informed whether to expect meta frames or not - it always does). Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-04net: dsa: tag_sja1105: fix MAC DA patching from meta framesVladimir Oltean
The SJA1105 manual says that at offset 4 into the meta frame payload we have "MAC destination byte 2" and at offset 5 we have "MAC destination byte 1". These are counted from the LSB, so byte 1 is h_dest[ETH_HLEN-2] aka h_dest[4] and byte 2 is h_dest[ETH_HLEN-3] aka h_dest[3]. The sja1105_meta_unpack() function decodes these the other way around, so a frame with MAC DA 01:80:c2:11:22:33 is received by the network stack as having 01:80:c2:22:11:33. Fixes: e53e18a6fe4d ("net: dsa: sja1105: Receive and decode meta frames") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-03net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge modeVladimir Oltean
There was a regression introduced by the blamed commit, where pinging to a VLAN-unaware bridge would fail with the repeated message "Couldn't decode source port" coming from the tagging protocol driver. When receiving packets with a bridge_vid as determined by dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode: - source_port = 0 (which isn't really valid, more like "don't know") - switch_id = 0 (which isn't really valid, more like "don't know") - vbid = value in range 1-7 Since the blamed patch has reversed the order of the checks, we are now going to believe that source_port != -1 and switch_id != -1, so they're valid, but they aren't. The minimal solution to the problem is to only populate source_port and switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero, i.e. the source port information is trustworthy. Fixes: c1ae02d87689 ("net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-29net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPTVladimir Oltean
Currently the sja1105 tagging protocol prefers using the source port information from the VLAN header if that is available, falling back to the INCL_SRCPT option if it isn't. The VLAN header is available for all frames except for META frames initiated by the switch (containing RX timestamps), and thus, the "if (is_link_local)" branch is practically dead. The tag_8021q source port identification has become more loose ("imprecise") and will report a plausible rather than exact bridge port, when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local traffic always needs to know the precise source port. With incorrect source port reporting, for example PTP traffic over 2 bridged ports will all be seen on sockets opened on the first such port, which is incorrect. Now that the tagging protocol has been changed to make link-local frames always contain source port information, we can reverse the order of the checks so that we always give precedence to that information (which is always precise) in lieu of the tag_8021q VID which is only precise for a standalone port. Fixes: d7f9787a763f ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID") Fixes: 91495f21fcec ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.5 net-next PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addressesVladimir Oltean
When using the felix driver (the only one which supports UC filtering and MC filtering) as a DSA master for a random other DSA switch, one can see the following stack trace when the downstream switch ports join a VLAN-aware bridge: ============================= WARNING: suspicious RCU usage ----------------------------- net/8021q/vlan_core.c:238 suspicious rcu_dereference_protected() usage! stack backtrace: Workqueue: dsa_ordered dsa_slave_switchdev_event_work Call trace: lockdep_rcu_suspicious+0x170/0x210 vlan_for_each+0x8c/0x188 dsa_slave_sync_uc+0x128/0x178 __hw_addr_sync_dev+0x138/0x158 dsa_slave_set_rx_mode+0x58/0x70 __dev_set_rx_mode+0x88/0xa8 dev_uc_add+0x74/0xa0 dsa_port_bridge_host_fdb_add+0xec/0x180 dsa_slave_switchdev_event_work+0x7c/0x1c8 process_one_work+0x290/0x568 What it's saying is that vlan_for_each() expects rtnl_lock() context and it's not getting it, when it's called from the DSA master's ndo_set_rx_mode(). The caller of that - dsa_slave_set_rx_mode() - is the slave DSA interface's dsa_port_bridge_host_fdb_add() which comes from the deferred dsa_slave_switchdev_event_work(). We went to great lengths to avoid the rtnl_lock() context in that call path in commit 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work"), and calling rtnl_lock() is simply not an option due to the possibility of deadlocking when calling dsa_flush_workqueue() from the call paths that do hold rtnl_lock() - basically all of them. So, when the DSA master calls vlan_for_each() from its ndo_set_rx_mode(), the state of the 8021q driver on this device is really not protected from concurrent access by anything. Looking at net/8021q/, I don't think that vlan_info->vid_list was particularly designed with RCU traversal in mind, so introducing an RCU read-side form of vlan_for_each() - vlan_for_each_rcu() - won't be so easy, and it also wouldn't be exactly what we need anyway. In general I believe that the solution isn't in net/8021q/ anyway; vlan_for_each() is not cut out for this task. DSA doesn't need rtnl_lock() to be held per se - since it's not a netdev state change that we're blocking, but rather, just concurrent additions/removals to a VLAN list. We don't even need sleepable context - the callback of vlan_for_each() just schedules deferred work. The proposed escape is to remove the dependency on vlan_for_each() and to open-code a non-sleepable, rtnl-free alternative to that, based on copies of the VLAN list modified from .ndo_vlan_rx_add_vid() and .ndo_vlan_rx_kill_vid(). Fixes: 64fdc5f341db ("net: dsa: sync unicast and multicast addresses for VLAN filters too") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230626154402.3154454-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: tools/testing/selftests/net/fcnal-test.sh d7a2fc1437f7 ("selftests: net: fcnal-test: check if FIPS mode is enabled") dd017c72dde6 ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.") https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/ https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20net: dsa: introduce preferred_default_local_cpu_port and use on MT7530Vladimir Oltean
Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen. The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth. The MT7530 driver developers had 3 options: - to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port - to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port - to declare just port 6 in the device tree as a CPU port Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports. Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below. Without preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender [ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver With preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender [ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates. As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described. Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-26net: dsa: add support for mac_prepare() and mac_finish() callsRussell King (Oracle)
Add DSA support for the phylink mac_prepare() and mac_finish() calls. These were introduced as part of the PCS support to allow MACs to perform preparatory steps prior to configuration, and finalisation steps after the MAC and PCS has been configured. Introducing phylink_pcs support to the mv88e6xxx DSA driver needs some code moved out of its mac_config() stage into the mac_prepare() and mac_finish() stages, and this commit facilitates such code in DSA drivers. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_ocelot: call only the relevant portion of __skb_vlan_pop() on TXVladimir Oltean
ocelot_xmit_get_vlan_info() calls __skb_vlan_pop() as the most appropriate helper I could find which strips away a VLAN header. That's all I need it to do, but __skb_vlan_pop() has more logic, which will become incompatible with the future revert of commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). Namely, it performs a sanity check on skb_mac_header(), which will stop being set after the above revert, so it will return an error instead of removing the VLAN tag. ocelot_xmit_get_vlan_info() gets called in 2 circumstances: (1) the port is under a VLAN-aware bridge and the bridge sends VLAN-tagged packets (2) the port is under a VLAN-aware bridge and somebody else (an 8021q upper) sends VLAN-tagged packets (using a VID that isn't in the bridge vlan tables) In case (1), there is actually no bug to defend against, because br_dev_xmit() calls skb_reset_mac_header() and things continue to work. However, in case (2), illustrated using the commands below, it can be seen that our intervention is needed, since __skb_vlan_pop() complains: $ ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up $ ip link set $eth master br0 && ip link set $eth up $ ip link add link $eth name $eth.100 type vlan id 100 && ip link set $eth.100 up $ ip addr add 192.168.100.1/24 dev $eth.100 I could fend off the checks in __skb_vlan_pop() with some skb_mac_header_was_set() calls, but seeing how few callers of __skb_vlan_pop() there are from TX paths, that seems rather unproductive. As an alternative solution, extract the bare minimum logic to strip a VLAN header, and move it to a new helper named vlan_remove_tag(), close to the definition of vlan_insert_tag(). Document it appropriately and make ocelot_xmit_get_vlan_info() call this smaller helper instead. Seeing that it doesn't appear illegal to test skb->protocol in the TX path, I guess it would be a good for vlan_remove_tag() to also absorb the vlan_set_encap_proto() function call. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: update TX path comments to not mention skb_mac_header()Vladimir Oltean
Once commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") will be reverted, it will no longer be true that skb->data points at skb_mac_header(skb) - since the skb->mac_header will not be set - so stop saying that, and just say that it points to the MAC header. I've reviewed vlan_insert_tag() and it does not *actually* depend on skb_mac_header(), so reword that to avoid the confusion. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_sja1105: replace skb_mac_header() with vlan_eth_hdr()Vladimir Oltean
This is a cosmetic patch which consolidates the code to use the helper function offered by if_vlan.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_sja1105: don't rely on skb_mac_header() in TX pathsVladimir Oltean
skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_ksz: do not rely on skb_mac_header() in TX pathsVladimir Oltean
skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_eth_hdr() to get to the Ethernet header's MAC DA instead, helper which assumes this header is located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_ocelot: do not rely on skb_mac_header() for VLAN xmitVladimir Oltean
skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: dsa: add trace points for VLAN operationsVladimir Oltean
These are not as critical as the FDB/MDB trace points (I'm not aware of outstanding VLAN related bugs), but maybe they are useful to somebody, either debugging something or simply trying to learn more. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: dsa: add trace points for FDB/MDB operationsVladimir Oltean
DSA performs non-trivial housekeeping of unicast and multicast addresses on shared (CPU and DSA) ports, and puts a bit of pressure on higher layers, requiring them to behave correctly (remove these addresses exactly as many times as they were added). Otherwise, either addresses linger around forever, or DSA returns -ENOENT complaining that entries that were already deleted must be deleted again. To aid debugging, introduce some trace points specifically for FDB and MDB - that's where some of the bugs still are right now. Some bugs I have seen were also due to race conditions, see: - 630fd4822af2 ("net: dsa: flush switchdev workqueue on bridge join error path") - a2614140dc0f ("net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN") so it would be good to not disturb the timing too much, hence the choice to use trace points vs regular dev_dbg(). I've had these for some time on my computer in a less polished form, and they've proven useful. What I found most useful was to enable CONFIG_BOOTTIME_TRACING, add "trace_event=dsa" to the kernel cmdline, and run "cat /sys/kernel/debug/tracing/trace". This is to debug more complex environments with network managers started by the init system, things like that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-09net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stubVladimir Oltean
There was a sort of rush surrounding commit 88c0a6b503b7 ("net: create a netdev notifier for DSA to reject PTP on DSA master"), due to a desire to convert DSA's attempt to deny TX timestamping on a DSA master to something that doesn't block the kernel-wide API conversion from ndo_eth_ioctl() to ndo_hwtstamp_set(). What was required was a mechanism that did not depend on ndo_eth_ioctl(), and what was provided was a mechanism that did not depend on ndo_eth_ioctl(), while at the same time introducing something that wasn't absolutely necessary - a new netdev notifier. There have been objections from Jakub Kicinski that using notifiers in general when they are not absolutely necessary creates complications to the control flow and difficulties to maintainers who look at the code. So there is a desire to not use notifiers. In addition to that, the notifier chain gets called even if there is no DSA in the system and no one is interested in applying any restriction. Take the model of udp_tunnel_nic_ops and introduce a stub mechanism, through which net/core/dev_ioctl.c can call into DSA even when CONFIG_NET_DSA=m. Compared to the code that existed prior to the notifier conversion, aka what was added in commits: - 4cfab3566710 ("net: dsa: Add wrappers for overloaded ndo_ops") - 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers") this is different because we are not overloading any struct net_device_ops of the DSA master anymore, but rather, we are exposing a rather specific functionality which is orthogonal to which API is used to enable it - ndo_eth_ioctl() or ndo_hwtstamp_set(). Also, what is similar is that both approaches use function pointers to get from built-in code to DSA. There is no point in replicating the function pointers towards __dsa_master_hwtstamp_validate() once for every CPU port (dev->dsa_ptr). Instead, it is sufficient to introduce a singleton struct dsa_stubs, built into the kernel, which contains a single function pointer to __dsa_master_hwtstamp_validate(). I find this approach preferable to what we had originally, because dev->dsa_ptr->netdev_ops->ndo_do_ioctl() used to require going through struct dsa_port (dev->dsa_ptr), and so, this was incompatible with any attempts to add any data encapsulation and hide DSA data structures from the outside world. Link: https://lore.kernel.org/netdev/20230403083019.120b72fd@kernel.org/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-03net: create a netdev notifier for DSA to reject PTP on DSA masterVladimir Oltean
The fact that PTP 2-step TX timestamping is broken on DSA switches if the master also timestamps the same packets is documented by commit f685e609a301 ("net: dsa: Deny PTP on master if switch supports it"). We attempt to help the users avoid shooting themselves in the foot by making DSA reject the timestamping ioctls on an interface that is a DSA master, and the switch tree beneath it contains switches which are aware of PTP. The only problem is that there isn't an established way of intercepting ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network stack by creating a struct dsa_netdevice_ops with overlaid function pointers that are manually checked from the relevant call sites. There used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only one left. There is an ongoing effort to migrate driver-visible hardware timestamping control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set() model, but DSA actively prevents that migration, since dsa_master_ioctl() is currently coded to manually call the master's legacy ndo_eth_ioctl(), and so, whenever a network device driver would be converted to the new API, DSA's restrictions would be circumvented, because any device could be used as a DSA master. The established way for unrelated modules to react on a net device event is via netdevice notifiers. So we create a new notifier which gets called whenever there is an attempt to change hardware timestamping settings on a device. Finally, there is another reason why a netdev notifier will be a good idea, besides strictly DSA, and this has to do with PHY timestamping. With ndo_eth_ioctl(), all MAC drivers must manually call phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP, otherwise they must pass this ioctl to the PHY driver via phy_mii_ioctl(). With the new ndo_hwtstamp_set() API, it will be desirable to simply not make any calls into the MAC device driver when timestamping should be performed at the PHY level. But there exist drivers, such as the lan966x switch, which need to install packet traps for PTP regardless of whether they are the layer that provides the hardware timestamps, or the PHY is. That would be impossible to support with the new API. The proposal there, too, is to introduce a netdev notifier which acts as a better cue for switching drivers to add or remove PTP packet traps, than ndo_hwtstamp_set(). The one introduced here "almost" works there as well, except for the fact that packet traps should only be installed if the PHY driver succeeded to enable hardware timestamping, whereas here, we need to deny hardware timestamping on the DSA master before it actually gets enabled. This is why this notifier is called "PRE_", and the notifier that would get used for PHY timestamping and packet traps would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing. In expectation of future netlink UAPI, we also pass a non-NULL extack pointer to the netdev notifier, and we make DSA populate it with an informative reason for the rejection. To avoid making it go to waste, we make the ioctl-based dev_set_hwtstamp() create a fake extack and print the message to the kernel log. Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/ Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-03net: dsa: make dsa_port_supports_hwtstamp() construct a fake ifreqVladimir Oltean
dsa_master_ioctl() is in the process of getting converted to a different API, where we won't have access to a struct ifreq * anymore, but rather, to a struct kernel_hwtstamp_config. Since ds->ops->port_hwtstamp_get() still uses struct ifreq *, this creates a difficult situation where we have to make up such a dummy pointer. The conversion is a bit messy, because it forces a "good" implementation of ds->ops->port_hwtstamp_get() to return -EFAULT in copy_to_user() because of the NULL ifr->ifr_data pointer. However, it works, and it is only a transient step until ds->ops->port_hwtstamp_get() gets converted to the new API which passes struct kernel_hwtstamp_config and does not call copy_to_user(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-30net: dsa: fix db type confusion in host fdb/mdb add/delVladimir Oltean
We have the following code paths: Host FDB (unicast RX filtering): dsa_port_standalone_host_fdb_add() dsa_port_bridge_host_fdb_add() | | +--------------+ +------------+ | | v v dsa_port_host_fdb_add() dsa_port_standalone_host_fdb_del() dsa_port_bridge_host_fdb_del() | | +--------------+ +------------+ | | v v dsa_port_host_fdb_del() Host MDB (multicast RX filtering): dsa_port_standalone_host_mdb_add() dsa_port_bridge_host_mdb_add() | | +--------------+ +------------+ | | v v dsa_port_host_mdb_add() dsa_port_standalone_host_mdb_del() dsa_port_bridge_host_mdb_del() | | +--------------+ +------------+ | | v v dsa_port_host_mdb_del() The logic added by commit 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") zeroes out db.bridge.num if the switch doesn't support ds->fdb_isolation (the majority doesn't). This is done for a reason explained in commit c26933639b54 ("net: dsa: request drivers to perform FDB isolation"). Taking a single code path as example - dsa_port_host_fdb_add() - the others are similar - the problem is that this function handles: - DSA_DB_PORT databases, when called from dsa_port_standalone_host_fdb_add() - DSA_DB_BRIDGE databases, when called from dsa_port_bridge_host_fdb_add() So, if dsa_port_host_fdb_add() were to make any change on the "bridge.num" attribute of the database, this would only be correct for a DSA_DB_BRIDGE, and a type confusion for a DSA_DB_PORT bridge. However, this bug is without consequences, for 2 reasons: - dsa_port_standalone_host_fdb_add() is only called from code which is (in)directly guarded by dsa_switch_supports_uc_filtering(ds), and that function only returns true if ds->fdb_isolation is set. So, the code only executed for DSA_DB_BRIDGE databases. - Even if the code was not dead for DSA_DB_PORT, we have the following memory layout: struct dsa_bridge { struct net_device *dev; unsigned int num; bool tx_fwd_offload; refcount_t refcount; }; struct dsa_db { enum dsa_db_type type; union { const struct dsa_port *dp; // DSA_DB_PORT struct dsa_lag lag; struct dsa_bridge bridge; // DSA_DB_BRIDGE }; }; So, the zeroization of dsa_db :: bridge :: num on a dsa_db structure of type DSA_DB_PORT would access memory which is unused, because we only use dsa_db :: dp for DSA_DB_PORT, and this is mapped at the same address with dsa_db :: dev for DSA_DB_BRIDGE, thanks to the union definition. It is correct to fix up dsa_db :: bridge :: num only from code paths that come from the bridge / switchdev, so move these there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230329133819.697642-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30net: dsa: sync unicast and multicast addresses for VLAN filters tooVladimir Oltean
If certain conditions are met, DSA can install all necessary MAC addresses on the CPU ports as FDB entries and disable flooding towards the CPU (we call this RX filtering). There is one corner case where this does not work. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up ip link set swp0 master br0 && ip link set swp0 up ip link add link swp0 name swp0.100 type vlan id 100 ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100 Traffic through swp0.100 is broken, because the bridge turns on VLAN filtering in the swp0 port (causing RX packets to be classified to the FDB database corresponding to the VID from their 802.1Q header), and although the 8021q module does call dev_uc_add() towards the real device, that API is VLAN-unaware, so it only contains the MAC address, not the VID; and DSA's current implementation of ndo_set_rx_mode() is only for VID 0 (corresponding to FDB entries which are installed in an FDB database which is only hit when the port is VLAN-unaware). It's interesting to understand why the bridge does not turn on IFF_PROMISC for its swp0 bridge port, and it may appear at first glance that this is a regression caused by the logic in commit 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode."). After all, a bridge port needs to have IFF_PROMISC by its very nature - it needs to receive and forward frames with a MAC DA different from the bridge ports' MAC addresses. While that may be true, when the bridge is VLAN-aware *and* it has a single port, there is no real reason to enable promiscuity even if that is an automatic port, with flooding and learning (there is nowhere for packets to go except to the BR_FDB_LOCAL entries), and this is how the corner case appears. Adding a second automatic interface to the bridge would make swp0 promisc as well, and would mask the corner case. Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't pass a VLAN ID), the only way to address that problem is to install host FDB entries for the cartesian product of RX filtering MAC addresses and VLAN RX filters. Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: dsa: tag_brcm: legacy: fix daisy-chained switchesÁlvaro Fernández Rojas
When BCM63xx internal switches are connected to switches with a 4-byte Broadcom tag, it does not identify the packet as VLAN tagged, so it adds one based on its PVID (which is likely 0). Right now, the packet is received by the BCM63xx internal switch and the 6-byte tag is properly processed. The next step would to decode the corresponding 4-byte tag. However, the internal switch adds an invalid VLAN tag after the 6-byte tag and the 4-byte tag handling fails. In order to fix this we need to remove the invalid VLAN tag after the 6-byte tag before passing it to the 4-byte tag decoding. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230319095540.239064-1-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: dsa: report rx_bytes unadjusted for ETH_HLENVladimir Oltean
We collect the software statistics counters for RX bytes (reported to /proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when skb->len has already been adjusted by the eth_type_trans() -> skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header. This means that when connecting 2 DSA interfaces back to back and sending 1 packet with length 100, the sending interface will report tx_bytes as incrementing by 100, and the receiving interface will report rx_bytes as incrementing by 86. Since accounting for that in scripts is quirky and is something that would be DSA-specific behavior (requiring users to know that they are running on a DSA interface in the first place), the proposal is that we treat it as a bug and fix it. This design bug has always existed in DSA, according to my analysis: commit 91da11f870f0 ("net: Distributed Switch Architecture protocol support") also updates skb->dev->stats.rx_bytes += skb->len after the eth_type_trans() call. Technically, prior to Florian's commit a86d8becc3f0 ("net: dsa: Factor bottom tag receive functions"), each and every vendor-specific tagging protocol driver open-coded the same bug, until the buggy code was consolidated into something resembling what can be seen now. So each and every driver should have its own Fixes: tag, because of their different histories until the convergence point. I'm not going to do that, for the sake of simplicity, but just blame the oldest appearance of buggy code. There are 2 ways to fix the problem. One is the obvious way, and the other is how I ended up doing it. Obvious would have been to move dev_sw_netstats_rx_add() one line above eth_type_trans(), and below skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that. We count the bytes after removing everything DSA-related from the packet, to emulate what the packet's length was, on the wire, when the user port received it. When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet, so in case the switch driver requests this behavior - commit 412a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") has the details - the obvious variant of the fix wouldn't have worked, because the positioning there would have also counted the not-yet-stripped VLAN header length, something which is absent from the packet as seen on the wire (there it may be untagged, whereas software will see it as PVID-tagged). Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu()Vladimir Oltean
Currently, when dsa_slave_change_mtu() is called on a user port where dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code will stumble upon this check: if (new_master_mtu > mtu_limit) return -ERANGE; because new_master_mtu is adjusted for the tagger overhead but mtu_limit is not. But it would be good if the logic went through, for example if the DSA master really depends on an MTU adjustment to accept DSA-tagged frames. To make the code pass through the check, we need to adjust mtu_limit for the overhead as well, if the minimum restriction was caused by the DSA user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU are always offset by the protocol overhead. Currently no drivers return 1500 .port_max_mtu(), but this is only temporary and a bug in itself - mv88e6xxx should have done that, but since commit b9c587fed61c ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports") it no longer does. This is a preparation for fixing that. Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-03net: dsa: use NL_SET_ERR_MSG_WEAK_MOD() more consistentlyVladimir Oltean
Now that commit 028fb19c6ba7 ("netlink: provide an ability to set default extack message") provides a weak function that doesn't override an existing extack message provided by the driver, it makes sense to use it also for LAG and HSR offloading, not just for bridge offloading. Also consistently put the message string on a separate line, to reduce line length from 92 to 84 characters. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230202140354.3158129-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02net: dsa: Use sysfs_emit() to instead of sprintf()Bo Liu
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20230201081438.3151-1-liubo03@inspur.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-01netlink: provide an ability to set default extack messageLeon Romanovsky
In netdev common pattern, extack pointer is forwarded to the drivers to be filled with error message. However, the caller can easily overwrite the filled message. Instead of adding multiple "if (!extack->_msg)" checks before any NL_SET_ERR_MSG() call, which appears after call to the driver, let's add new macro to common code. [1] https://lore.kernel.org/all/Y9Irgrgf3uxOjwUm@unreal Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/6993fac557a40a1973dfa0095107c3d03d40bec1.1675171790.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23net: dsa: microchip: enable port queues for tc mqprioArun Ramadoss
LAN937x family of switches has 8 queues per port where the KSZ switches has 4 queues per port. By default, only one queue per port is enabled. The queues are configurable in 2, 4 or 8. This patch add 8 number of queues for LAN937x and 4 for other switches. In the tag_ksz.c file, prioirty of the packet is queried using the skb buffer and the corresponding value is updated in the tag. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>