From 636e8adf7878eab3614250234341bde45537f47a Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 14 Mar 2023 20:24:04 +0200 Subject: net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu() Currently, when dsa_slave_change_mtu() is called on a user port where dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code will stumble upon this check: if (new_master_mtu > mtu_limit) return -ERANGE; because new_master_mtu is adjusted for the tagger overhead but mtu_limit is not. But it would be good if the logic went through, for example if the DSA master really depends on an MTU adjustment to accept DSA-tagged frames. To make the code pass through the check, we need to adjust mtu_limit for the overhead as well, if the minimum restriction was caused by the DSA user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU are always offset by the protocol overhead. Currently no drivers return 1500 .port_max_mtu(), but this is only temporary and a bug in itself - mv88e6xxx should have done that, but since commit b9c587fed61c ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports") it no longer does. This is a preparation for fixing that. Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/slave.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 6957971c2db2..cac17183589f 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -1933,6 +1933,7 @@ int dsa_slave_change_mtu(struct net_device *dev, int new_mtu) int new_master_mtu; int old_master_mtu; int mtu_limit; + int overhead; int cpu_mtu; int err; @@ -1961,9 +1962,10 @@ int dsa_slave_change_mtu(struct net_device *dev, int new_mtu) largest_mtu = slave_mtu; } - mtu_limit = min_t(int, master->max_mtu, dev->max_mtu); + overhead = dsa_tag_protocol_overhead(cpu_dp->tag_ops); + mtu_limit = min_t(int, master->max_mtu, dev->max_mtu + overhead); old_master_mtu = master->mtu; - new_master_mtu = largest_mtu + dsa_tag_protocol_overhead(cpu_dp->tag_ops); + new_master_mtu = largest_mtu + overhead; if (new_master_mtu > mtu_limit) return -ERANGE; @@ -1998,8 +2000,7 @@ int dsa_slave_change_mtu(struct net_device *dev, int new_mtu) out_port_failed: if (new_master_mtu != old_master_mtu) - dsa_port_mtu_change(cpu_dp, old_master_mtu - - dsa_tag_protocol_overhead(cpu_dp->tag_ops)); + dsa_port_mtu_change(cpu_dp, old_master_mtu - overhead); out_cpu_failed: if (new_master_mtu != old_master_mtu) dev_set_mtu(master, old_master_mtu); -- cgit From a8eff03545d4cef12ae66a1905627c1818a0f81a Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 18 Mar 2023 01:19:00 +0200 Subject: net: dsa: report rx_bytes unadjusted for ETH_HLEN We collect the software statistics counters for RX bytes (reported to /proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when skb->len has already been adjusted by the eth_type_trans() -> skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header. This means that when connecting 2 DSA interfaces back to back and sending 1 packet with length 100, the sending interface will report tx_bytes as incrementing by 100, and the receiving interface will report rx_bytes as incrementing by 86. Since accounting for that in scripts is quirky and is something that would be DSA-specific behavior (requiring users to know that they are running on a DSA interface in the first place), the proposal is that we treat it as a bug and fix it. This design bug has always existed in DSA, according to my analysis: commit 91da11f870f0 ("net: Distributed Switch Architecture protocol support") also updates skb->dev->stats.rx_bytes += skb->len after the eth_type_trans() call. Technically, prior to Florian's commit a86d8becc3f0 ("net: dsa: Factor bottom tag receive functions"), each and every vendor-specific tagging protocol driver open-coded the same bug, until the buggy code was consolidated into something resembling what can be seen now. So each and every driver should have its own Fixes: tag, because of their different histories until the convergence point. I'm not going to do that, for the sake of simplicity, but just blame the oldest appearance of buggy code. There are 2 ways to fix the problem. One is the obvious way, and the other is how I ended up doing it. Obvious would have been to move dev_sw_netstats_rx_add() one line above eth_type_trans(), and below skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that. We count the bytes after removing everything DSA-related from the packet, to emulate what the packet's length was, on the wire, when the user port received it. When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet, so in case the switch driver requests this behavior - commit 412a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") has the details - the obvious variant of the fix wouldn't have worked, because the positioning there would have also counted the not-yet-stripped VLAN header length, something which is absent from the packet as seen on the wire (there it may be untagged, whereas software will see it as PVID-tagged). Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/dsa/tag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dsa') diff --git a/net/dsa/tag.c b/net/dsa/tag.c index b2fba1a003ce..5105a5ff58fa 100644 --- a/net/dsa/tag.c +++ b/net/dsa/tag.c @@ -114,7 +114,7 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev, skb = nskb; } - dev_sw_netstats_rx_add(skb->dev, skb->len); + dev_sw_netstats_rx_add(skb->dev, skb->len + ETH_HLEN); if (dsa_skb_defer_rx_timestamp(p, skb)) return 0; -- cgit From 032a954061afd4b7426c3eb6bfd2952ef1e9a384 Mon Sep 17 00:00:00 2001 From: Álvaro Fernández Rojas Date: Sun, 19 Mar 2023 10:55:40 +0100 Subject: net: dsa: tag_brcm: legacy: fix daisy-chained switches MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When BCM63xx internal switches are connected to switches with a 4-byte Broadcom tag, it does not identify the packet as VLAN tagged, so it adds one based on its PVID (which is likely 0). Right now, the packet is received by the BCM63xx internal switch and the 6-byte tag is properly processed. The next step would to decode the corresponding 4-byte tag. However, the internal switch adds an invalid VLAN tag after the 6-byte tag and the 4-byte tag handling fails. In order to fix this we need to remove the invalid VLAN tag after the 6-byte tag before passing it to the 4-byte tag decoding. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Signed-off-by: Álvaro Fernández Rojas Reviewed-by: Michal Swiatkowski Reviewed-by: Florian Fainelli Link: https://lore.kernel.org/r/20230319095540.239064-1-noltari@gmail.com Signed-off-by: Jakub Kicinski --- net/dsa/tag_brcm.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c index 10239daa5745..cacdafb41200 100644 --- a/net/dsa/tag_brcm.c +++ b/net/dsa/tag_brcm.c @@ -7,6 +7,7 @@ #include #include +#include #include #include @@ -252,6 +253,7 @@ static struct sk_buff *brcm_leg_tag_xmit(struct sk_buff *skb, static struct sk_buff *brcm_leg_tag_rcv(struct sk_buff *skb, struct net_device *dev) { + int len = BRCM_LEG_TAG_LEN; int source_port; u8 *brcm_tag; @@ -266,12 +268,16 @@ static struct sk_buff *brcm_leg_tag_rcv(struct sk_buff *skb, if (!skb->dev) return NULL; + /* VLAN tag is added by BCM63xx internal switch */ + if (netdev_uses_dsa(skb->dev)) + len += VLAN_HLEN; + /* Remove Broadcom tag and update checksum */ - skb_pull_rcsum(skb, BRCM_LEG_TAG_LEN); + skb_pull_rcsum(skb, len); dsa_default_offload_fwd_mark(skb); - dsa_strip_etype_header(skb, BRCM_LEG_TAG_LEN); + dsa_strip_etype_header(skb, len); return skb; } -- cgit From 64fdc5f341db01200e33105265d4b8450122a82e Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Wed, 29 Mar 2023 18:18:21 +0300 Subject: net: dsa: sync unicast and multicast addresses for VLAN filters too If certain conditions are met, DSA can install all necessary MAC addresses on the CPU ports as FDB entries and disable flooding towards the CPU (we call this RX filtering). There is one corner case where this does not work. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up ip link set swp0 master br0 && ip link set swp0 up ip link add link swp0 name swp0.100 type vlan id 100 ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100 Traffic through swp0.100 is broken, because the bridge turns on VLAN filtering in the swp0 port (causing RX packets to be classified to the FDB database corresponding to the VID from their 802.1Q header), and although the 8021q module does call dev_uc_add() towards the real device, that API is VLAN-unaware, so it only contains the MAC address, not the VID; and DSA's current implementation of ndo_set_rx_mode() is only for VID 0 (corresponding to FDB entries which are installed in an FDB database which is only hit when the port is VLAN-unaware). It's interesting to understand why the bridge does not turn on IFF_PROMISC for its swp0 bridge port, and it may appear at first glance that this is a regression caused by the logic in commit 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode."). After all, a bridge port needs to have IFF_PROMISC by its very nature - it needs to receive and forward frames with a MAC DA different from the bridge ports' MAC addresses. While that may be true, when the bridge is VLAN-aware *and* it has a single port, there is no real reason to enable promiscuity even if that is an automatic port, with flooding and learning (there is nowhere for packets to go except to the BR_FDB_LOCAL entries), and this is how the corner case appears. Adding a second automatic interface to the bridge would make swp0 promisc as well, and would mask the corner case. Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't pass a VLAN ID), the only way to address that problem is to install host FDB entries for the cartesian product of RX filtering MAC addresses and VLAN RX filters. Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- net/dsa/slave.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 116 insertions(+), 5 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/slave.c b/net/dsa/slave.c index cac17183589f..165bb2cb8431 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -57,6 +57,12 @@ struct dsa_standalone_event_work { u16 vid; }; +struct dsa_host_vlan_rx_filtering_ctx { + struct net_device *dev; + const unsigned char *addr; + enum dsa_standalone_event event; +}; + static bool dsa_switch_supports_uc_filtering(struct dsa_switch *ds) { return ds->ops->port_fdb_add && ds->ops->port_fdb_del && @@ -155,18 +161,37 @@ static int dsa_slave_schedule_standalone_work(struct net_device *dev, return 0; } +static int dsa_slave_host_vlan_rx_filtering(struct net_device *vdev, int vid, + void *arg) +{ + struct dsa_host_vlan_rx_filtering_ctx *ctx = arg; + + return dsa_slave_schedule_standalone_work(ctx->dev, ctx->event, + ctx->addr, vid); +} + static int dsa_slave_sync_uc(struct net_device *dev, const unsigned char *addr) { struct net_device *master = dsa_slave_to_master(dev); struct dsa_port *dp = dsa_slave_to_port(dev); + struct dsa_host_vlan_rx_filtering_ctx ctx = { + .dev = dev, + .addr = addr, + .event = DSA_UC_ADD, + }; + int err; dev_uc_add(master, addr); if (!dsa_switch_supports_uc_filtering(dp->ds)) return 0; - return dsa_slave_schedule_standalone_work(dev, DSA_UC_ADD, addr, 0); + err = dsa_slave_schedule_standalone_work(dev, DSA_UC_ADD, addr, 0); + if (err) + return err; + + return vlan_for_each(dev, dsa_slave_host_vlan_rx_filtering, &ctx); } static int dsa_slave_unsync_uc(struct net_device *dev, @@ -174,13 +199,23 @@ static int dsa_slave_unsync_uc(struct net_device *dev, { struct net_device *master = dsa_slave_to_master(dev); struct dsa_port *dp = dsa_slave_to_port(dev); + struct dsa_host_vlan_rx_filtering_ctx ctx = { + .dev = dev, + .addr = addr, + .event = DSA_UC_DEL, + }; + int err; dev_uc_del(master, addr); if (!dsa_switch_supports_uc_filtering(dp->ds)) return 0; - return dsa_slave_schedule_standalone_work(dev, DSA_UC_DEL, addr, 0); + err = dsa_slave_schedule_standalone_work(dev, DSA_UC_DEL, addr, 0); + if (err) + return err; + + return vlan_for_each(dev, dsa_slave_host_vlan_rx_filtering, &ctx); } static int dsa_slave_sync_mc(struct net_device *dev, @@ -188,13 +223,23 @@ static int dsa_slave_sync_mc(struct net_device *dev, { struct net_device *master = dsa_slave_to_master(dev); struct dsa_port *dp = dsa_slave_to_port(dev); + struct dsa_host_vlan_rx_filtering_ctx ctx = { + .dev = dev, + .addr = addr, + .event = DSA_MC_ADD, + }; + int err; dev_mc_add(master, addr); if (!dsa_switch_supports_mc_filtering(dp->ds)) return 0; - return dsa_slave_schedule_standalone_work(dev, DSA_MC_ADD, addr, 0); + err = dsa_slave_schedule_standalone_work(dev, DSA_MC_ADD, addr, 0); + if (err) + return err; + + return vlan_for_each(dev, dsa_slave_host_vlan_rx_filtering, &ctx); } static int dsa_slave_unsync_mc(struct net_device *dev, @@ -202,13 +247,23 @@ static int dsa_slave_unsync_mc(struct net_device *dev, { struct net_device *master = dsa_slave_to_master(dev); struct dsa_port *dp = dsa_slave_to_port(dev); + struct dsa_host_vlan_rx_filtering_ctx ctx = { + .dev = dev, + .addr = addr, + .event = DSA_MC_DEL, + }; + int err; dev_mc_del(master, addr); if (!dsa_switch_supports_mc_filtering(dp->ds)) return 0; - return dsa_slave_schedule_standalone_work(dev, DSA_MC_DEL, addr, 0); + err = dsa_slave_schedule_standalone_work(dev, DSA_MC_DEL, addr, 0); + if (err) + return err; + + return vlan_for_each(dev, dsa_slave_host_vlan_rx_filtering, &ctx); } void dsa_slave_sync_ha(struct net_device *dev) @@ -1702,6 +1757,8 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto, .flags = 0, }; struct netlink_ext_ack extack = {0}; + struct dsa_switch *ds = dp->ds; + struct netdev_hw_addr *ha; int ret; /* User port... */ @@ -1721,6 +1778,30 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto, return ret; } + if (!dsa_switch_supports_uc_filtering(ds) && + !dsa_switch_supports_mc_filtering(ds)) + return 0; + + netif_addr_lock_bh(dev); + + if (dsa_switch_supports_mc_filtering(ds)) { + netdev_for_each_synced_mc_addr(ha, dev) { + dsa_slave_schedule_standalone_work(dev, DSA_MC_ADD, + ha->addr, vid); + } + } + + if (dsa_switch_supports_uc_filtering(ds)) { + netdev_for_each_synced_uc_addr(ha, dev) { + dsa_slave_schedule_standalone_work(dev, DSA_UC_ADD, + ha->addr, vid); + } + } + + netif_addr_unlock_bh(dev); + + dsa_flush_workqueue(); + return 0; } @@ -1733,13 +1814,43 @@ static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, /* This API only allows programming tagged, non-PVID VIDs */ .flags = 0, }; + struct dsa_switch *ds = dp->ds; + struct netdev_hw_addr *ha; int err; err = dsa_port_vlan_del(dp, &vlan); if (err) return err; - return dsa_port_host_vlan_del(dp, &vlan); + err = dsa_port_host_vlan_del(dp, &vlan); + if (err) + return err; + + if (!dsa_switch_supports_uc_filtering(ds) && + !dsa_switch_supports_mc_filtering(ds)) + return 0; + + netif_addr_lock_bh(dev); + + if (dsa_switch_supports_mc_filtering(ds)) { + netdev_for_each_synced_mc_addr(ha, dev) { + dsa_slave_schedule_standalone_work(dev, DSA_MC_DEL, + ha->addr, vid); + } + } + + if (dsa_switch_supports_uc_filtering(ds)) { + netdev_for_each_synced_uc_addr(ha, dev) { + dsa_slave_schedule_standalone_work(dev, DSA_UC_DEL, + ha->addr, vid); + } + } + + netif_addr_unlock_bh(dev); + + dsa_flush_workqueue(); + + return 0; } static int dsa_slave_restore_vlan(struct net_device *vdev, int vid, void *arg) -- cgit From eb1ab7650d358b553cc946035fd7c7bdda1856e3 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Wed, 29 Mar 2023 16:38:19 +0300 Subject: net: dsa: fix db type confusion in host fdb/mdb add/del We have the following code paths: Host FDB (unicast RX filtering): dsa_port_standalone_host_fdb_add() dsa_port_bridge_host_fdb_add() | | +--------------+ +------------+ | | v v dsa_port_host_fdb_add() dsa_port_standalone_host_fdb_del() dsa_port_bridge_host_fdb_del() | | +--------------+ +------------+ | | v v dsa_port_host_fdb_del() Host MDB (multicast RX filtering): dsa_port_standalone_host_mdb_add() dsa_port_bridge_host_mdb_add() | | +--------------+ +------------+ | | v v dsa_port_host_mdb_add() dsa_port_standalone_host_mdb_del() dsa_port_bridge_host_mdb_del() | | +--------------+ +------------+ | | v v dsa_port_host_mdb_del() The logic added by commit 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") zeroes out db.bridge.num if the switch doesn't support ds->fdb_isolation (the majority doesn't). This is done for a reason explained in commit c26933639b54 ("net: dsa: request drivers to perform FDB isolation"). Taking a single code path as example - dsa_port_host_fdb_add() - the others are similar - the problem is that this function handles: - DSA_DB_PORT databases, when called from dsa_port_standalone_host_fdb_add() - DSA_DB_BRIDGE databases, when called from dsa_port_bridge_host_fdb_add() So, if dsa_port_host_fdb_add() were to make any change on the "bridge.num" attribute of the database, this would only be correct for a DSA_DB_BRIDGE, and a type confusion for a DSA_DB_PORT bridge. However, this bug is without consequences, for 2 reasons: - dsa_port_standalone_host_fdb_add() is only called from code which is (in)directly guarded by dsa_switch_supports_uc_filtering(ds), and that function only returns true if ds->fdb_isolation is set. So, the code only executed for DSA_DB_BRIDGE databases. - Even if the code was not dead for DSA_DB_PORT, we have the following memory layout: struct dsa_bridge { struct net_device *dev; unsigned int num; bool tx_fwd_offload; refcount_t refcount; }; struct dsa_db { enum dsa_db_type type; union { const struct dsa_port *dp; // DSA_DB_PORT struct dsa_lag lag; struct dsa_bridge bridge; // DSA_DB_BRIDGE }; }; So, the zeroization of dsa_db :: bridge :: num on a dsa_db structure of type DSA_DB_PORT would access memory which is unused, because we only use dsa_db :: dp for DSA_DB_PORT, and this is mapped at the same address with dsa_db :: dev for DSA_DB_BRIDGE, thanks to the union definition. It is correct to fix up dsa_db :: bridge :: num only from code paths that come from the bridge / switchdev, so move these there. Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Link: https://lore.kernel.org/r/20230329133819.697642-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- net/dsa/port.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/port.c b/net/dsa/port.c index 67ad1adec2a2..15cee17769e9 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -1028,9 +1028,6 @@ static int dsa_port_host_fdb_add(struct dsa_port *dp, .db = db, }; - if (!dp->ds->fdb_isolation) - info.db.bridge.num = 0; - return dsa_port_notify(dp, DSA_NOTIFIER_HOST_FDB_ADD, &info); } @@ -1055,6 +1052,9 @@ int dsa_port_bridge_host_fdb_add(struct dsa_port *dp, }; int err; + if (!dp->ds->fdb_isolation) + db.bridge.num = 0; + /* Avoid a call to __dev_set_promiscuity() on the master, which * requires rtnl_lock(), since we can't guarantee that is held here, * and we can't take it either. @@ -1079,9 +1079,6 @@ static int dsa_port_host_fdb_del(struct dsa_port *dp, .db = db, }; - if (!dp->ds->fdb_isolation) - info.db.bridge.num = 0; - return dsa_port_notify(dp, DSA_NOTIFIER_HOST_FDB_DEL, &info); } @@ -1106,6 +1103,9 @@ int dsa_port_bridge_host_fdb_del(struct dsa_port *dp, }; int err; + if (!dp->ds->fdb_isolation) + db.bridge.num = 0; + if (master->priv_flags & IFF_UNICAST_FLT) { err = dev_uc_del(master, addr); if (err) @@ -1210,9 +1210,6 @@ static int dsa_port_host_mdb_add(const struct dsa_port *dp, .db = db, }; - if (!dp->ds->fdb_isolation) - info.db.bridge.num = 0; - return dsa_port_notify(dp, DSA_NOTIFIER_HOST_MDB_ADD, &info); } @@ -1237,6 +1234,9 @@ int dsa_port_bridge_host_mdb_add(const struct dsa_port *dp, }; int err; + if (!dp->ds->fdb_isolation) + db.bridge.num = 0; + err = dev_mc_add(master, mdb->addr); if (err) return err; @@ -1254,9 +1254,6 @@ static int dsa_port_host_mdb_del(const struct dsa_port *dp, .db = db, }; - if (!dp->ds->fdb_isolation) - info.db.bridge.num = 0; - return dsa_port_notify(dp, DSA_NOTIFIER_HOST_MDB_DEL, &info); } @@ -1281,6 +1278,9 @@ int dsa_port_bridge_host_mdb_del(const struct dsa_port *dp, }; int err; + if (!dp->ds->fdb_isolation) + db.bridge.num = 0; + err = dev_mc_del(master, mdb->addr); if (err) return err; -- cgit From ff6ac4d013e680a5e7a38ee83ca59ffe1846915d Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sun, 2 Apr 2023 15:37:54 +0300 Subject: net: dsa: make dsa_port_supports_hwtstamp() construct a fake ifreq dsa_master_ioctl() is in the process of getting converted to a different API, where we won't have access to a struct ifreq * anymore, but rather, to a struct kernel_hwtstamp_config. Since ds->ops->port_hwtstamp_get() still uses struct ifreq *, this creates a difficult situation where we have to make up such a dummy pointer. The conversion is a bit messy, because it forces a "good" implementation of ds->ops->port_hwtstamp_get() to return -EFAULT in copy_to_user() because of the NULL ifr->ifr_data pointer. However, it works, and it is only a transient step until ds->ops->port_hwtstamp_get() gets converted to the new API which passes struct kernel_hwtstamp_config and does not call copy_to_user(). Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/master.c | 2 +- net/dsa/port.c | 10 ++++++---- net/dsa/port.h | 2 +- 3 files changed, 8 insertions(+), 6 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/master.c b/net/dsa/master.c index 22d3f16b0e6d..e397641382ca 100644 --- a/net/dsa/master.c +++ b/net/dsa/master.c @@ -212,7 +212,7 @@ static int dsa_master_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) * switch in the tree that is PTP capable. */ list_for_each_entry(dp, &dst->ports, list) - if (dsa_port_supports_hwtstamp(dp, ifr)) + if (dsa_port_supports_hwtstamp(dp)) return -EBUSY; break; } diff --git a/net/dsa/port.c b/net/dsa/port.c index 15cee17769e9..71ba30538411 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -114,19 +114,21 @@ static bool dsa_port_can_configure_learning(struct dsa_port *dp) return !err; } -bool dsa_port_supports_hwtstamp(struct dsa_port *dp, struct ifreq *ifr) +bool dsa_port_supports_hwtstamp(struct dsa_port *dp) { struct dsa_switch *ds = dp->ds; + struct ifreq ifr = {}; int err; if (!ds->ops->port_hwtstamp_get || !ds->ops->port_hwtstamp_set) return false; /* "See through" shim implementations of the "get" method. - * This will clobber the ifreq structure, but we will either return an - * error, or the master will overwrite it with proper values. + * Since we can't cook up a complete ioctl request structure, this will + * fail in copy_to_user() with -EFAULT, which hopefully is enough to + * detect a valid implementation. */ - err = ds->ops->port_hwtstamp_get(ds, dp->index, ifr); + err = ds->ops->port_hwtstamp_get(ds, dp->index, &ifr); return err != -EOPNOTSUPP; } diff --git a/net/dsa/port.h b/net/dsa/port.h index 9c218660d223..dc812512fd0e 100644 --- a/net/dsa/port.h +++ b/net/dsa/port.h @@ -15,7 +15,7 @@ struct switchdev_obj_port_mdb; struct switchdev_vlan_msti; struct phy_device; -bool dsa_port_supports_hwtstamp(struct dsa_port *dp, struct ifreq *ifr); +bool dsa_port_supports_hwtstamp(struct dsa_port *dp); void dsa_port_set_tag_protocol(struct dsa_port *cpu_dp, const struct dsa_device_ops *tag_ops); int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age); -- cgit From 88c0a6b503b7f4fffb68a8d49c3987870c5b1d6b Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sun, 2 Apr 2023 15:37:55 +0300 Subject: net: create a netdev notifier for DSA to reject PTP on DSA master The fact that PTP 2-step TX timestamping is broken on DSA switches if the master also timestamps the same packets is documented by commit f685e609a301 ("net: dsa: Deny PTP on master if switch supports it"). We attempt to help the users avoid shooting themselves in the foot by making DSA reject the timestamping ioctls on an interface that is a DSA master, and the switch tree beneath it contains switches which are aware of PTP. The only problem is that there isn't an established way of intercepting ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network stack by creating a struct dsa_netdevice_ops with overlaid function pointers that are manually checked from the relevant call sites. There used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only one left. There is an ongoing effort to migrate driver-visible hardware timestamping control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set() model, but DSA actively prevents that migration, since dsa_master_ioctl() is currently coded to manually call the master's legacy ndo_eth_ioctl(), and so, whenever a network device driver would be converted to the new API, DSA's restrictions would be circumvented, because any device could be used as a DSA master. The established way for unrelated modules to react on a net device event is via netdevice notifiers. So we create a new notifier which gets called whenever there is an attempt to change hardware timestamping settings on a device. Finally, there is another reason why a netdev notifier will be a good idea, besides strictly DSA, and this has to do with PHY timestamping. With ndo_eth_ioctl(), all MAC drivers must manually call phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP, otherwise they must pass this ioctl to the PHY driver via phy_mii_ioctl(). With the new ndo_hwtstamp_set() API, it will be desirable to simply not make any calls into the MAC device driver when timestamping should be performed at the PHY level. But there exist drivers, such as the lan966x switch, which need to install packet traps for PTP regardless of whether they are the layer that provides the hardware timestamps, or the PHY is. That would be impossible to support with the new API. The proposal there, too, is to introduce a netdev notifier which acts as a better cue for switching drivers to add or remove PTP packet traps, than ndo_hwtstamp_set(). The one introduced here "almost" works there as well, except for the fact that packet traps should only be installed if the PHY driver succeeded to enable hardware timestamping, whereas here, we need to deny hardware timestamping on the DSA master before it actually gets enabled. This is why this notifier is called "PRE_", and the notifier that would get used for PHY timestamping and packet traps would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing. In expectation of future netlink UAPI, we also pass a non-NULL extack pointer to the netdev notifier, and we make DSA populate it with an informative reason for the rejection. To avoid making it go to waste, we make the ioctl-based dev_set_hwtstamp() create a fake extack and print the message to the kernel log. Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/ Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/ Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/master.c | 50 +++++++++++++++----------------------------------- net/dsa/master.h | 3 +++ net/dsa/slave.c | 11 +++++++++++ 3 files changed, 29 insertions(+), 35 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/master.c b/net/dsa/master.c index e397641382ca..c2cabe6248b1 100644 --- a/net/dsa/master.c +++ b/net/dsa/master.c @@ -195,38 +195,31 @@ static void dsa_master_get_strings(struct net_device *dev, uint32_t stringset, } } -static int dsa_master_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) +/* Deny PTP operations on master if there is at least one switch in the tree + * that is PTP capable. + */ +int dsa_master_pre_change_hwtstamp(struct net_device *dev, + const struct kernel_hwtstamp_config *config, + struct netlink_ext_ack *extack) { struct dsa_port *cpu_dp = dev->dsa_ptr; struct dsa_switch *ds = cpu_dp->ds; struct dsa_switch_tree *dst; - int err = -EOPNOTSUPP; struct dsa_port *dp; dst = ds->dst; - switch (cmd) { - case SIOCGHWTSTAMP: - case SIOCSHWTSTAMP: - /* Deny PTP operations on master if there is at least one - * switch in the tree that is PTP capable. - */ - list_for_each_entry(dp, &dst->ports, list) - if (dsa_port_supports_hwtstamp(dp)) - return -EBUSY; - break; + list_for_each_entry(dp, &dst->ports, list) { + if (dsa_port_supports_hwtstamp(dp)) { + NL_SET_ERR_MSG(extack, + "HW timestamping not allowed on DSA master when switch supports the operation"); + return -EBUSY; + } } - if (dev->netdev_ops->ndo_eth_ioctl) - err = dev->netdev_ops->ndo_eth_ioctl(dev, ifr, cmd); - - return err; + return 0; } -static const struct dsa_netdevice_ops dsa_netdev_ops = { - .ndo_eth_ioctl = dsa_master_ioctl, -}; - static int dsa_master_ethtool_setup(struct net_device *dev) { struct dsa_port *cpu_dp = dev->dsa_ptr; @@ -267,15 +260,6 @@ static void dsa_master_ethtool_teardown(struct net_device *dev) cpu_dp->orig_ethtool_ops = NULL; } -static void dsa_netdev_ops_set(struct net_device *dev, - const struct dsa_netdevice_ops *ops) -{ - if (netif_is_lag_master(dev)) - return; - - dev->dsa_ptr->netdev_ops = ops; -} - /* Keep the master always promiscuous if the tagging protocol requires that * (garbles MAC DA) or if it doesn't support unicast filtering, case in which * it would revert to promiscuous mode as soon as we call dev_uc_add() on it @@ -414,16 +398,13 @@ int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp) if (ret) goto out_err_reset_promisc; - dsa_netdev_ops_set(dev, &dsa_netdev_ops); - ret = sysfs_create_group(&dev->dev.kobj, &dsa_group); if (ret) - goto out_err_ndo_teardown; + goto out_err_ethtool_teardown; return ret; -out_err_ndo_teardown: - dsa_netdev_ops_set(dev, NULL); +out_err_ethtool_teardown: dsa_master_ethtool_teardown(dev); out_err_reset_promisc: dsa_master_set_promiscuity(dev, -1); @@ -433,7 +414,6 @@ out_err_reset_promisc: void dsa_master_teardown(struct net_device *dev) { sysfs_remove_group(&dev->dev.kobj, &dsa_group); - dsa_netdev_ops_set(dev, NULL); dsa_master_ethtool_teardown(dev); dsa_master_reset_mtu(dev); dsa_master_set_promiscuity(dev, -1); diff --git a/net/dsa/master.h b/net/dsa/master.h index 3fc0e610b5b5..80842f4e27f7 100644 --- a/net/dsa/master.h +++ b/net/dsa/master.h @@ -15,5 +15,8 @@ int dsa_master_lag_setup(struct net_device *lag_dev, struct dsa_port *cpu_dp, struct netlink_ext_ack *extack); void dsa_master_lag_teardown(struct net_device *lag_dev, struct dsa_port *cpu_dp); +int dsa_master_pre_change_hwtstamp(struct net_device *dev, + const struct kernel_hwtstamp_config *config, + struct netlink_ext_ack *extack); #endif diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 165bb2cb8431..8abc1658ac47 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -3289,6 +3289,7 @@ static int dsa_master_changeupper(struct net_device *dev, static int dsa_slave_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr) { + struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr); struct net_device *dev = netdev_notifier_info_to_dev(ptr); switch (event) { @@ -3418,6 +3419,16 @@ static int dsa_slave_netdevice_event(struct notifier_block *nb, return NOTIFY_OK; } + case NETDEV_PRE_CHANGE_HWTSTAMP: { + struct netdev_notifier_hwtstamp_info *info = ptr; + int err; + + if (!netdev_uses_dsa(dev)) + return NOTIFY_DONE; + + err = dsa_master_pre_change_hwtstamp(dev, info->config, extack); + return notifier_from_errno(err); + } default: break; } -- cgit From 5a17818682cf43ad0fdd6035945f3b7a8c9dc5e9 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 6 Apr 2023 14:42:46 +0300 Subject: net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub There was a sort of rush surrounding commit 88c0a6b503b7 ("net: create a netdev notifier for DSA to reject PTP on DSA master"), due to a desire to convert DSA's attempt to deny TX timestamping on a DSA master to something that doesn't block the kernel-wide API conversion from ndo_eth_ioctl() to ndo_hwtstamp_set(). What was required was a mechanism that did not depend on ndo_eth_ioctl(), and what was provided was a mechanism that did not depend on ndo_eth_ioctl(), while at the same time introducing something that wasn't absolutely necessary - a new netdev notifier. There have been objections from Jakub Kicinski that using notifiers in general when they are not absolutely necessary creates complications to the control flow and difficulties to maintainers who look at the code. So there is a desire to not use notifiers. In addition to that, the notifier chain gets called even if there is no DSA in the system and no one is interested in applying any restriction. Take the model of udp_tunnel_nic_ops and introduce a stub mechanism, through which net/core/dev_ioctl.c can call into DSA even when CONFIG_NET_DSA=m. Compared to the code that existed prior to the notifier conversion, aka what was added in commits: - 4cfab3566710 ("net: dsa: Add wrappers for overloaded ndo_ops") - 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers") this is different because we are not overloading any struct net_device_ops of the DSA master anymore, but rather, we are exposing a rather specific functionality which is orthogonal to which API is used to enable it - ndo_eth_ioctl() or ndo_hwtstamp_set(). Also, what is similar is that both approaches use function pointers to get from built-in code to DSA. There is no point in replicating the function pointers towards __dsa_master_hwtstamp_validate() once for every CPU port (dev->dsa_ptr). Instead, it is sufficient to introduce a singleton struct dsa_stubs, built into the kernel, which contains a single function pointer to __dsa_master_hwtstamp_validate(). I find this approach preferable to what we had originally, because dev->dsa_ptr->netdev_ops->ndo_do_ioctl() used to require going through struct dsa_port (dev->dsa_ptr), and so, this was incompatible with any attempts to add any data encapsulation and hide DSA data structures from the outside world. Link: https://lore.kernel.org/netdev/20230403083019.120b72fd@kernel.org/ Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/Makefile | 6 ++++++ net/dsa/dsa.c | 19 +++++++++++++++++++ net/dsa/master.c | 2 +- net/dsa/master.h | 2 +- net/dsa/slave.c | 11 ----------- net/dsa/stubs.c | 10 ++++++++++ 6 files changed, 37 insertions(+), 13 deletions(-) create mode 100644 net/dsa/stubs.c (limited to 'net/dsa') diff --git a/net/dsa/Makefile b/net/dsa/Makefile index cc7e93a562fe..3835de286116 100644 --- a/net/dsa/Makefile +++ b/net/dsa/Makefile @@ -1,4 +1,10 @@ # SPDX-License-Identifier: GPL-2.0 + +# the stubs are built-in whenever DSA is built-in or module +ifdef CONFIG_NET_DSA +obj-y := stubs.o +endif + # the core obj-$(CONFIG_NET_DSA) += dsa_core.o dsa_core-y += \ diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index e5f156940c67..ab1afe67fd18 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include "devlink.h" @@ -1702,6 +1703,20 @@ bool dsa_mdb_present_in_other_db(struct dsa_switch *ds, int port, } EXPORT_SYMBOL_GPL(dsa_mdb_present_in_other_db); +static const struct dsa_stubs __dsa_stubs = { + .master_hwtstamp_validate = __dsa_master_hwtstamp_validate, +}; + +static void dsa_register_stubs(void) +{ + dsa_stubs = &__dsa_stubs; +} + +static void dsa_unregister_stubs(void) +{ + dsa_stubs = NULL; +} + static int __init dsa_init_module(void) { int rc; @@ -1721,6 +1736,8 @@ static int __init dsa_init_module(void) if (rc) goto netlink_register_fail; + dsa_register_stubs(); + return 0; netlink_register_fail: @@ -1735,6 +1752,8 @@ module_init(dsa_init_module); static void __exit dsa_cleanup_module(void) { + dsa_unregister_stubs(); + rtnl_link_unregister(&dsa_link_ops); dsa_slave_unregister_notifier(); diff --git a/net/dsa/master.c b/net/dsa/master.c index c2cabe6248b1..6be89ab0cc01 100644 --- a/net/dsa/master.c +++ b/net/dsa/master.c @@ -198,7 +198,7 @@ static void dsa_master_get_strings(struct net_device *dev, uint32_t stringset, /* Deny PTP operations on master if there is at least one switch in the tree * that is PTP capable. */ -int dsa_master_pre_change_hwtstamp(struct net_device *dev, +int __dsa_master_hwtstamp_validate(struct net_device *dev, const struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack) { diff --git a/net/dsa/master.h b/net/dsa/master.h index 80842f4e27f7..76e39d3ec909 100644 --- a/net/dsa/master.h +++ b/net/dsa/master.h @@ -15,7 +15,7 @@ int dsa_master_lag_setup(struct net_device *lag_dev, struct dsa_port *cpu_dp, struct netlink_ext_ack *extack); void dsa_master_lag_teardown(struct net_device *lag_dev, struct dsa_port *cpu_dp); -int dsa_master_pre_change_hwtstamp(struct net_device *dev, +int __dsa_master_hwtstamp_validate(struct net_device *dev, const struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack); diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 8abc1658ac47..165bb2cb8431 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -3289,7 +3289,6 @@ static int dsa_master_changeupper(struct net_device *dev, static int dsa_slave_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr) { - struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr); struct net_device *dev = netdev_notifier_info_to_dev(ptr); switch (event) { @@ -3419,16 +3418,6 @@ static int dsa_slave_netdevice_event(struct notifier_block *nb, return NOTIFY_OK; } - case NETDEV_PRE_CHANGE_HWTSTAMP: { - struct netdev_notifier_hwtstamp_info *info = ptr; - int err; - - if (!netdev_uses_dsa(dev)) - return NOTIFY_DONE; - - err = dsa_master_pre_change_hwtstamp(dev, info->config, extack); - return notifier_from_errno(err); - } default: break; } diff --git a/net/dsa/stubs.c b/net/dsa/stubs.c new file mode 100644 index 000000000000..2ed8a6c85fbf --- /dev/null +++ b/net/dsa/stubs.c @@ -0,0 +1,10 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Stubs for DSA functionality called by the core network stack. + * These are necessary because CONFIG_NET_DSA can be a module, and built-in + * code cannot directly call symbols exported by modules. + */ +#include + +const struct dsa_stubs *dsa_stubs; +EXPORT_SYMBOL_GPL(dsa_stubs); -- cgit From 9538ebce88ffa074202d592d468521995cb1e714 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 7 Apr 2023 17:14:50 +0300 Subject: net: dsa: add trace points for FDB/MDB operations DSA performs non-trivial housekeeping of unicast and multicast addresses on shared (CPU and DSA) ports, and puts a bit of pressure on higher layers, requiring them to behave correctly (remove these addresses exactly as many times as they were added). Otherwise, either addresses linger around forever, or DSA returns -ENOENT complaining that entries that were already deleted must be deleted again. To aid debugging, introduce some trace points specifically for FDB and MDB - that's where some of the bugs still are right now. Some bugs I have seen were also due to race conditions, see: - 630fd4822af2 ("net: dsa: flush switchdev workqueue on bridge join error path") - a2614140dc0f ("net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN") so it would be good to not disturb the timing too much, hence the choice to use trace points vs regular dev_dbg(). I've had these for some time on my computer in a less polished form, and they've proven useful. What I found most useful was to enable CONFIG_BOOTTIME_TRACING, add "trace_event=dsa" to the kernel cmdline, and run "cat /sys/kernel/debug/tracing/trace". This is to debug more complex environments with network managers started by the init system, things like that. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/dsa/Makefile | 6 +- net/dsa/switch.c | 61 +++++++++-- net/dsa/trace.c | 39 +++++++ net/dsa/trace.h | 329 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 423 insertions(+), 12 deletions(-) create mode 100644 net/dsa/trace.c create mode 100644 net/dsa/trace.h (limited to 'net/dsa') diff --git a/net/dsa/Makefile b/net/dsa/Makefile index 3835de286116..12e305824a96 100644 --- a/net/dsa/Makefile +++ b/net/dsa/Makefile @@ -16,7 +16,8 @@ dsa_core-y += \ slave.o \ switch.o \ tag.o \ - tag_8021q.o + tag_8021q.o \ + trace.o # tagging formats obj-$(CONFIG_NET_DSA_TAG_AR9331) += tag_ar9331.o @@ -37,3 +38,6 @@ obj-$(CONFIG_NET_DSA_TAG_RZN1_A5PSW) += tag_rzn1_a5psw.o obj-$(CONFIG_NET_DSA_TAG_SJA1105) += tag_sja1105.o obj-$(CONFIG_NET_DSA_TAG_TRAILER) += tag_trailer.o obj-$(CONFIG_NET_DSA_TAG_XRS700X) += tag_xrs700x.o + +# for tracing framework to find trace.h +CFLAGS_trace.o := -I$(src) diff --git a/net/dsa/switch.c b/net/dsa/switch.c index d5bc4bb7310d..ff1b5d980e37 100644 --- a/net/dsa/switch.c +++ b/net/dsa/switch.c @@ -18,6 +18,7 @@ #include "slave.h" #include "switch.h" #include "tag_8021q.h" +#include "trace.h" static unsigned int dsa_switch_fastest_ageing_time(struct dsa_switch *ds, unsigned int ageing_time) @@ -164,14 +165,20 @@ static int dsa_port_do_mdb_add(struct dsa_port *dp, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_mdb_add(ds, port, mdb, db); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_mdb_add(ds, port, mdb, db); + trace_dsa_mdb_add_hw(dp, mdb->addr, mdb->vid, &db, err); + + return err; + } mutex_lock(&dp->addr_lists_lock); a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid, db); if (a) { refcount_inc(&a->refcount); + trace_dsa_mdb_add_bump(dp, mdb->addr, mdb->vid, &db, + &a->refcount); goto out; } @@ -182,6 +189,7 @@ static int dsa_port_do_mdb_add(struct dsa_port *dp, } err = ds->ops->port_mdb_add(ds, port, mdb, db); + trace_dsa_mdb_add_hw(dp, mdb->addr, mdb->vid, &db, err); if (err) { kfree(a); goto out; @@ -209,21 +217,30 @@ static int dsa_port_do_mdb_del(struct dsa_port *dp, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_mdb_del(ds, port, mdb, db); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_mdb_del(ds, port, mdb, db); + trace_dsa_mdb_del_hw(dp, mdb->addr, mdb->vid, &db, err); + + return err; + } mutex_lock(&dp->addr_lists_lock); a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid, db); if (!a) { + trace_dsa_mdb_del_not_found(dp, mdb->addr, mdb->vid, &db); err = -ENOENT; goto out; } - if (!refcount_dec_and_test(&a->refcount)) + if (!refcount_dec_and_test(&a->refcount)) { + trace_dsa_mdb_del_drop(dp, mdb->addr, mdb->vid, &db, + &a->refcount); goto out; + } err = ds->ops->port_mdb_del(ds, port, mdb, db); + trace_dsa_mdb_del_hw(dp, mdb->addr, mdb->vid, &db, err); if (err) { refcount_set(&a->refcount, 1); goto out; @@ -247,14 +264,19 @@ static int dsa_port_do_fdb_add(struct dsa_port *dp, const unsigned char *addr, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_fdb_add(ds, port, addr, vid, db); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_fdb_add(ds, port, addr, vid, db); + trace_dsa_fdb_add_hw(dp, addr, vid, &db, err); + + return err; + } mutex_lock(&dp->addr_lists_lock); a = dsa_mac_addr_find(&dp->fdbs, addr, vid, db); if (a) { refcount_inc(&a->refcount); + trace_dsa_fdb_add_bump(dp, addr, vid, &db, &a->refcount); goto out; } @@ -265,6 +287,7 @@ static int dsa_port_do_fdb_add(struct dsa_port *dp, const unsigned char *addr, } err = ds->ops->port_fdb_add(ds, port, addr, vid, db); + trace_dsa_fdb_add_hw(dp, addr, vid, &db, err); if (err) { kfree(a); goto out; @@ -291,21 +314,29 @@ static int dsa_port_do_fdb_del(struct dsa_port *dp, const unsigned char *addr, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_fdb_del(ds, port, addr, vid, db); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_fdb_del(ds, port, addr, vid, db); + trace_dsa_fdb_del_hw(dp, addr, vid, &db, err); + + return err; + } mutex_lock(&dp->addr_lists_lock); a = dsa_mac_addr_find(&dp->fdbs, addr, vid, db); if (!a) { + trace_dsa_fdb_del_not_found(dp, addr, vid, &db); err = -ENOENT; goto out; } - if (!refcount_dec_and_test(&a->refcount)) + if (!refcount_dec_and_test(&a->refcount)) { + trace_dsa_fdb_del_drop(dp, addr, vid, &db, &a->refcount); goto out; + } err = ds->ops->port_fdb_del(ds, port, addr, vid, db); + trace_dsa_fdb_del_hw(dp, addr, vid, &db, err); if (err) { refcount_set(&a->refcount, 1); goto out; @@ -332,6 +363,8 @@ static int dsa_switch_do_lag_fdb_add(struct dsa_switch *ds, struct dsa_lag *lag, a = dsa_mac_addr_find(&lag->fdbs, addr, vid, db); if (a) { refcount_inc(&a->refcount); + trace_dsa_lag_fdb_add_bump(lag->dev, addr, vid, &db, + &a->refcount); goto out; } @@ -342,6 +375,7 @@ static int dsa_switch_do_lag_fdb_add(struct dsa_switch *ds, struct dsa_lag *lag, } err = ds->ops->lag_fdb_add(ds, *lag, addr, vid, db); + trace_dsa_lag_fdb_add_hw(lag->dev, addr, vid, &db, err); if (err) { kfree(a); goto out; @@ -370,14 +404,19 @@ static int dsa_switch_do_lag_fdb_del(struct dsa_switch *ds, struct dsa_lag *lag, a = dsa_mac_addr_find(&lag->fdbs, addr, vid, db); if (!a) { + trace_dsa_lag_fdb_del_not_found(lag->dev, addr, vid, &db); err = -ENOENT; goto out; } - if (!refcount_dec_and_test(&a->refcount)) + if (!refcount_dec_and_test(&a->refcount)) { + trace_dsa_lag_fdb_del_drop(lag->dev, addr, vid, &db, + &a->refcount); goto out; + } err = ds->ops->lag_fdb_del(ds, *lag, addr, vid, db); + trace_dsa_lag_fdb_del_hw(lag->dev, addr, vid, &db, err); if (err) { refcount_set(&a->refcount, 1); goto out; diff --git a/net/dsa/trace.c b/net/dsa/trace.c new file mode 100644 index 000000000000..1b107165d331 --- /dev/null +++ b/net/dsa/trace.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Copyright 2022-2023 NXP + */ + +#define CREATE_TRACE_POINTS +#include "trace.h" + +void dsa_db_print(const struct dsa_db *db, char buf[DSA_DB_BUFSIZ]) +{ + switch (db->type) { + case DSA_DB_PORT: + sprintf(buf, "port %s", db->dp->name); + break; + case DSA_DB_LAG: + sprintf(buf, "lag %s id %d", db->lag.dev->name, db->lag.id); + break; + case DSA_DB_BRIDGE: + sprintf(buf, "bridge %s num %d", db->bridge.dev->name, + db->bridge.num); + break; + default: + sprintf(buf, "unknown"); + break; + } +} + +const char *dsa_port_kind(const struct dsa_port *dp) +{ + switch (dp->type) { + case DSA_PORT_TYPE_USER: + return "user"; + case DSA_PORT_TYPE_CPU: + return "cpu"; + case DSA_PORT_TYPE_DSA: + return "dsa"; + default: + return "unused"; + } +} diff --git a/net/dsa/trace.h b/net/dsa/trace.h new file mode 100644 index 000000000000..42c8bbc7d472 --- /dev/null +++ b/net/dsa/trace.h @@ -0,0 +1,329 @@ +/* SPDX-License-Identifier: GPL-2.0 + * Copyright 2022-2023 NXP + */ + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM dsa + +#if !defined(_NET_DSA_TRACE_H) || defined(TRACE_HEADER_MULTI_READ) +#define _NET_DSA_TRACE_H + +#include +#include +#include +#include + +/* Enough to fit "bridge %s num %d" where num has 3 digits */ +#define DSA_DB_BUFSIZ (IFNAMSIZ + 16) + +void dsa_db_print(const struct dsa_db *db, char buf[DSA_DB_BUFSIZ]); +const char *dsa_port_kind(const struct dsa_port *dp); + +DECLARE_EVENT_CLASS(dsa_port_addr_op_hw, + + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, u16 vid, + const struct dsa_db *db, int err), + + TP_ARGS(dp, addr, vid, db, err), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(int, err) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->err = err; + ), + + TP_printk("%s %s port %d addr %pM vid %u db \"%s\" err %d", + __get_str(dev), __get_str(kind), __entry->port, __entry->addr, + __entry->vid, __entry->db_buf, __entry->err) +); + +/* Add unicast/multicast address to hardware, either on user ports + * (where no refcounting is kept), or on shared ports when the entry + * is first seen and its refcount is 1. + */ +DEFINE_EVENT(dsa_port_addr_op_hw, dsa_fdb_add_hw, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + TP_ARGS(dp, addr, vid, db, err)); + +DEFINE_EVENT(dsa_port_addr_op_hw, dsa_mdb_add_hw, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + TP_ARGS(dp, addr, vid, db, err)); + +/* Delete unicast/multicast address from hardware, either on user ports or + * when the refcount on shared ports reaches 0 + */ +DEFINE_EVENT(dsa_port_addr_op_hw, dsa_fdb_del_hw, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + TP_ARGS(dp, addr, vid, db, err)); + +DEFINE_EVENT(dsa_port_addr_op_hw, dsa_mdb_del_hw, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + TP_ARGS(dp, addr, vid, db, err)); + +DECLARE_EVENT_CLASS(dsa_port_addr_op_refcount, + + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, u16 vid, + const struct dsa_db *db, const refcount_t *refcount), + + TP_ARGS(dp, addr, vid, db, refcount), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(unsigned int, refcount) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->refcount = refcount_read(refcount); + ), + + TP_printk("%s %s port %d addr %pM vid %u db \"%s\" refcount %u", + __get_str(dev), __get_str(kind), __entry->port, __entry->addr, + __entry->vid, __entry->db_buf, __entry->refcount) +); + +/* Bump the refcount of an existing unicast/multicast address on shared ports */ +DEFINE_EVENT(dsa_port_addr_op_refcount, dsa_fdb_add_bump, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, + const refcount_t *refcount), + TP_ARGS(dp, addr, vid, db, refcount)); + +DEFINE_EVENT(dsa_port_addr_op_refcount, dsa_mdb_add_bump, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, + const refcount_t *refcount), + TP_ARGS(dp, addr, vid, db, refcount)); + +/* Drop the refcount of a multicast address that we still keep on + * shared ports + */ +DEFINE_EVENT(dsa_port_addr_op_refcount, dsa_fdb_del_drop, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, + const refcount_t *refcount), + TP_ARGS(dp, addr, vid, db, refcount)); + +DEFINE_EVENT(dsa_port_addr_op_refcount, dsa_mdb_del_drop, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db, + const refcount_t *refcount), + TP_ARGS(dp, addr, vid, db, refcount)); + +DECLARE_EVENT_CLASS(dsa_port_addr_del_not_found, + + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, u16 vid, + const struct dsa_db *db), + + TP_ARGS(dp, addr, vid, db), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + ), + + TP_printk("%s %s port %d addr %pM vid %u db \"%s\"", + __get_str(dev), __get_str(kind), __entry->port, + __entry->addr, __entry->vid, __entry->db_buf) +); + +/* Attempt to delete a unicast/multicast address on shared ports for which + * the delete operation was called more times than the addition + */ +DEFINE_EVENT(dsa_port_addr_del_not_found, dsa_fdb_del_not_found, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db), + TP_ARGS(dp, addr, vid, db)); + +DEFINE_EVENT(dsa_port_addr_del_not_found, dsa_mdb_del_not_found, + TP_PROTO(const struct dsa_port *dp, const unsigned char *addr, + u16 vid, const struct dsa_db *db), + TP_ARGS(dp, addr, vid, db)); + +TRACE_EVENT(dsa_lag_fdb_add_hw, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + + TP_ARGS(lag_dev, addr, vid, db, err), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(int, err) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->err = err; + ), + + TP_printk("%s addr %pM vid %u db \"%s\" err %d", + __get_str(dev), __entry->addr, __entry->vid, + __entry->db_buf, __entry->err) +); + +TRACE_EVENT(dsa_lag_fdb_add_bump, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db, const refcount_t *refcount), + + TP_ARGS(lag_dev, addr, vid, db, refcount), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(unsigned int, refcount) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->refcount = refcount_read(refcount); + ), + + TP_printk("%s addr %pM vid %u db \"%s\" refcount %u", + __get_str(dev), __entry->addr, __entry->vid, + __entry->db_buf, __entry->refcount) +); + +TRACE_EVENT(dsa_lag_fdb_del_hw, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db, int err), + + TP_ARGS(lag_dev, addr, vid, db, err), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(int, err) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->err = err; + ), + + TP_printk("%s addr %pM vid %u db \"%s\" err %d", + __get_str(dev), __entry->addr, __entry->vid, + __entry->db_buf, __entry->err) +); + +TRACE_EVENT(dsa_lag_fdb_del_drop, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db, const refcount_t *refcount), + + TP_ARGS(lag_dev, addr, vid, db, refcount), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + __field(unsigned int, refcount) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + __entry->refcount = refcount_read(refcount); + ), + + TP_printk("%s addr %pM vid %u db \"%s\" refcount %u", + __get_str(dev), __entry->addr, __entry->vid, + __entry->db_buf, __entry->refcount) +); + +TRACE_EVENT(dsa_lag_fdb_del_not_found, + + TP_PROTO(const struct net_device *lag_dev, const unsigned char *addr, + u16 vid, const struct dsa_db *db), + + TP_ARGS(lag_dev, addr, vid, db), + + TP_STRUCT__entry( + __string(dev, lag_dev->name) + __array(unsigned char, addr, ETH_ALEN) + __field(u16, vid) + __array(char, db_buf, DSA_DB_BUFSIZ) + ), + + TP_fast_assign( + __assign_str(dev, lag_dev->name); + ether_addr_copy(__entry->addr, addr); + __entry->vid = vid; + dsa_db_print(db, __entry->db_buf); + ), + + TP_printk("%s addr %pM vid %u db \"%s\"", + __get_str(dev), __entry->addr, __entry->vid, __entry->db_buf) +); + +#endif /* _NET_DSA_TRACE_H */ + +/* We don't want to use include/trace/events */ +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . +#undef TRACE_INCLUDE_FILE +#define TRACE_INCLUDE_FILE trace +/* This part must be outside protection */ +#include -- cgit From 02020bd70fa6abcb1c2a8525ce7c1500dd4f44a8 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 7 Apr 2023 17:14:51 +0300 Subject: net: dsa: add trace points for VLAN operations These are not as critical as the FDB/MDB trace points (I'm not aware of outstanding VLAN related bugs), but maybe they are useful to somebody, either debugging something or simply trying to learn more. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/dsa/switch.c | 24 ++++++++--- net/dsa/trace.h | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 137 insertions(+), 5 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/switch.c b/net/dsa/switch.c index ff1b5d980e37..8c9a9f94b756 100644 --- a/net/dsa/switch.c +++ b/net/dsa/switch.c @@ -695,8 +695,12 @@ static int dsa_port_do_vlan_add(struct dsa_port *dp, int err = 0; /* No need to bother with refcounting for user ports. */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_vlan_add(ds, port, vlan, extack); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_vlan_add(ds, port, vlan, extack); + trace_dsa_vlan_add_hw(dp, vlan, err); + + return err; + } /* No need to propagate on shared ports the existing VLANs that were * re-notified after just the flags have changed. This would cause a @@ -711,6 +715,7 @@ static int dsa_port_do_vlan_add(struct dsa_port *dp, v = dsa_vlan_find(&dp->vlans, vlan); if (v) { refcount_inc(&v->refcount); + trace_dsa_vlan_add_bump(dp, vlan, &v->refcount); goto out; } @@ -721,6 +726,7 @@ static int dsa_port_do_vlan_add(struct dsa_port *dp, } err = ds->ops->port_vlan_add(ds, port, vlan, extack); + trace_dsa_vlan_add_hw(dp, vlan, err); if (err) { kfree(v); goto out; @@ -745,21 +751,29 @@ static int dsa_port_do_vlan_del(struct dsa_port *dp, int err = 0; /* No need to bother with refcounting for user ports */ - if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) - return ds->ops->port_vlan_del(ds, port, vlan); + if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp))) { + err = ds->ops->port_vlan_del(ds, port, vlan); + trace_dsa_vlan_del_hw(dp, vlan, err); + + return err; + } mutex_lock(&dp->vlans_lock); v = dsa_vlan_find(&dp->vlans, vlan); if (!v) { + trace_dsa_vlan_del_not_found(dp, vlan); err = -ENOENT; goto out; } - if (!refcount_dec_and_test(&v->refcount)) + if (!refcount_dec_and_test(&v->refcount)) { + trace_dsa_vlan_del_drop(dp, vlan, &v->refcount); goto out; + } err = ds->ops->port_vlan_del(ds, port, vlan); + trace_dsa_vlan_del_hw(dp, vlan, err); if (err) { refcount_set(&v->refcount, 1); goto out; diff --git a/net/dsa/trace.h b/net/dsa/trace.h index 42c8bbc7d472..567f29a39707 100644 --- a/net/dsa/trace.h +++ b/net/dsa/trace.h @@ -9,7 +9,9 @@ #define _NET_DSA_TRACE_H #include +#include #include +#include #include #include @@ -318,6 +320,122 @@ TRACE_EVENT(dsa_lag_fdb_del_not_found, __get_str(dev), __entry->addr, __entry->vid, __entry->db_buf) ); +DECLARE_EVENT_CLASS(dsa_vlan_op_hw, + + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, int err), + + TP_ARGS(dp, vlan, err), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __field(u16, vid) + __field(u16, flags) + __field(bool, changed) + __field(int, err) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + __entry->vid = vlan->vid; + __entry->flags = vlan->flags; + __entry->changed = vlan->changed; + __entry->err = err; + ), + + TP_printk("%s %s port %d vid %u%s%s%s", + __get_str(dev), __get_str(kind), __entry->port, __entry->vid, + __entry->flags & BRIDGE_VLAN_INFO_PVID ? " pvid" : "", + __entry->flags & BRIDGE_VLAN_INFO_UNTAGGED ? " untagged" : "", + __entry->changed ? " (changed)" : "") +); + +DEFINE_EVENT(dsa_vlan_op_hw, dsa_vlan_add_hw, + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, int err), + TP_ARGS(dp, vlan, err)); + +DEFINE_EVENT(dsa_vlan_op_hw, dsa_vlan_del_hw, + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, int err), + TP_ARGS(dp, vlan, err)); + +DECLARE_EVENT_CLASS(dsa_vlan_op_refcount, + + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, + const refcount_t *refcount), + + TP_ARGS(dp, vlan, refcount), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __field(u16, vid) + __field(u16, flags) + __field(bool, changed) + __field(unsigned int, refcount) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + __entry->vid = vlan->vid; + __entry->flags = vlan->flags; + __entry->changed = vlan->changed; + __entry->refcount = refcount_read(refcount); + ), + + TP_printk("%s %s port %d vid %u%s%s%s refcount %u", + __get_str(dev), __get_str(kind), __entry->port, __entry->vid, + __entry->flags & BRIDGE_VLAN_INFO_PVID ? " pvid" : "", + __entry->flags & BRIDGE_VLAN_INFO_UNTAGGED ? " untagged" : "", + __entry->changed ? " (changed)" : "", __entry->refcount) +); + +DEFINE_EVENT(dsa_vlan_op_refcount, dsa_vlan_add_bump, + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, + const refcount_t *refcount), + TP_ARGS(dp, vlan, refcount)); + +DEFINE_EVENT(dsa_vlan_op_refcount, dsa_vlan_del_drop, + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan, + const refcount_t *refcount), + TP_ARGS(dp, vlan, refcount)); + +TRACE_EVENT(dsa_vlan_del_not_found, + + TP_PROTO(const struct dsa_port *dp, + const struct switchdev_obj_port_vlan *vlan), + + TP_ARGS(dp, vlan), + + TP_STRUCT__entry( + __string(dev, dev_name(dp->ds->dev)) + __string(kind, dsa_port_kind(dp)) + __field(int, port) + __field(u16, vid) + ), + + TP_fast_assign( + __assign_str(dev, dev_name(dp->ds->dev)); + __assign_str(kind, dsa_port_kind(dp)); + __entry->port = dp->index; + __entry->vid = vlan->vid; + ), + + TP_printk("%s %s port %d vid %u", + __get_str(dev), __get_str(kind), __entry->port, __entry->vid) +); + #endif /* _NET_DSA_TRACE_H */ /* We don't want to use include/trace/events */ -- cgit From eabb1494c9f20362ae53a9991481a1523be4f4b7 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 21 Apr 2023 01:55:56 +0300 Subject: net: dsa: tag_ocelot: do not rely on skb_mac_header() for VLAN xmit skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean Reviewed-by: Eric Dumazet Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/tag_ocelot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dsa') diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c index 28ebecafdd24..73ee09de1a3a 100644 --- a/net/dsa/tag_ocelot.c +++ b/net/dsa/tag_ocelot.c @@ -26,7 +26,7 @@ static void ocelot_xmit_get_vlan_info(struct sk_buff *skb, struct dsa_port *dp, return; } - hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + hdr = skb_vlan_eth_hdr(skb); br_vlan_get_proto(br, &proto); if (ntohs(hdr->h_vlan_proto) == proto) { -- cgit From 499b2491d550677b824b828ff431e4ef4d1d3b9d Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 21 Apr 2023 01:55:57 +0300 Subject: net: dsa: tag_ksz: do not rely on skb_mac_header() in TX paths skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_eth_hdr() to get to the Ethernet header's MAC DA instead, helper which assumes this header is located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean Reviewed-by: Eric Dumazet Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/tag_ksz.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 0eb1c7784c3d..ea100bd25939 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -120,18 +120,18 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb, static struct sk_buff *ksz8795_xmit(struct sk_buff *skb, struct net_device *dev) { struct dsa_port *dp = dsa_slave_to_port(dev); + struct ethhdr *hdr; u8 *tag; - u8 *addr; if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) return NULL; /* Tag encoding */ tag = skb_put(skb, KSZ_INGRESS_TAG_LEN); - addr = skb_mac_header(skb); + hdr = skb_eth_hdr(skb); *tag = 1 << dp->index; - if (is_link_local_ether_addr(addr)) + if (is_link_local_ether_addr(hdr->h_dest)) *tag |= KSZ8795_TAIL_TAG_OVERRIDE; return skb; @@ -273,8 +273,8 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, u16 queue_mapping = skb_get_queue_mapping(skb); u8 prio = netdev_txq_to_tc(dev, queue_mapping); struct dsa_port *dp = dsa_slave_to_port(dev); + struct ethhdr *hdr; __be16 *tag; - u8 *addr; u16 val; if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) @@ -284,13 +284,13 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, ksz_xmit_timestamp(dp, skb); tag = skb_put(skb, KSZ9477_INGRESS_TAG_LEN); - addr = skb_mac_header(skb); + hdr = skb_eth_hdr(skb); val = BIT(dp->index); val |= FIELD_PREP(KSZ9477_TAIL_TAG_PRIO, prio); - if (is_link_local_ether_addr(addr)) + if (is_link_local_ether_addr(hdr->h_dest)) val |= KSZ9477_TAIL_TAG_OVERRIDE; *tag = cpu_to_be16(val); @@ -337,7 +337,7 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, u16 queue_mapping = skb_get_queue_mapping(skb); u8 prio = netdev_txq_to_tc(dev, queue_mapping); struct dsa_port *dp = dsa_slave_to_port(dev); - u8 *addr; + struct ethhdr *hdr; u8 *tag; if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) @@ -347,13 +347,13 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, ksz_xmit_timestamp(dp, skb); tag = skb_put(skb, KSZ_INGRESS_TAG_LEN); - addr = skb_mac_header(skb); + hdr = skb_eth_hdr(skb); *tag = BIT(dp->index); *tag |= FIELD_PREP(KSZ9893_TAIL_TAG_PRIO, prio); - if (is_link_local_ether_addr(addr)) + if (is_link_local_ether_addr(hdr->h_dest)) *tag |= KSZ9893_TAIL_TAG_OVERRIDE; return ksz_defer_xmit(dp, skb); -- cgit From f9346f00b5af0b32503dcd74072392d9592163ef Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 21 Apr 2023 01:55:58 +0300 Subject: net: dsa: tag_sja1105: don't rely on skb_mac_header() in TX paths skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean Reviewed-by: Eric Dumazet Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/tag_sja1105.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dsa') diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c index 1c2ceba4771b..a7ca97b7ac9e 100644 --- a/net/dsa/tag_sja1105.c +++ b/net/dsa/tag_sja1105.c @@ -256,7 +256,7 @@ static struct sk_buff *sja1105_pvid_tag_control_pkt(struct dsa_port *dp, return NULL; } - hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + hdr = skb_vlan_eth_hdr(skb); /* If skb is already VLAN-tagged, leave that VLAN ID in place */ if (hdr->h_vlan_proto == xmit_tpid) -- cgit From b5653b157e55a25ec76506e8cd31353e0ed4944c Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 21 Apr 2023 01:55:59 +0300 Subject: net: dsa: tag_sja1105: replace skb_mac_header() with vlan_eth_hdr() This is a cosmetic patch which consolidates the code to use the helper function offered by if_vlan.h. Signed-off-by: Vladimir Oltean Reviewed-by: Eric Dumazet Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/tag_sja1105.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dsa') diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c index a7ca97b7ac9e..a5f3b73da417 100644 --- a/net/dsa/tag_sja1105.c +++ b/net/dsa/tag_sja1105.c @@ -516,7 +516,7 @@ static bool sja1110_skb_has_inband_control_extension(const struct sk_buff *skb) static void sja1105_vlan_rcv(struct sk_buff *skb, int *source_port, int *switch_id, int *vbid, u16 *vid) { - struct vlan_ethhdr *hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + struct vlan_ethhdr *hdr = vlan_eth_hdr(skb); u16 vlan_tci; if (skb_vlan_tag_present(skb)) -- cgit From f0a9d563064c3040a2d7b32317b37a6bcb1099b2 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 21 Apr 2023 01:56:00 +0300 Subject: net: dsa: update TX path comments to not mention skb_mac_header() Once commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") will be reverted, it will no longer be true that skb->data points at skb_mac_header(skb) - since the skb->mac_header will not be set - so stop saying that, and just say that it points to the MAC header. I've reviewed vlan_insert_tag() and it does not *actually* depend on skb_mac_header(), so reword that to avoid the confusion. Signed-off-by: Vladimir Oltean Reviewed-by: Eric Dumazet Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/tag.h | 2 +- net/dsa/tag_8021q.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'net/dsa') diff --git a/net/dsa/tag.h b/net/dsa/tag.h index 7cfbca824f1c..32d12f4a9d73 100644 --- a/net/dsa/tag.h +++ b/net/dsa/tag.h @@ -229,7 +229,7 @@ static inline void *dsa_etype_header_pos_rx(struct sk_buff *skb) return skb->data - 2; } -/* On TX, skb->data points to skb_mac_header(skb), which means that EtherType +/* On TX, skb->data points to the MAC header, which means that EtherType * header taggers start exactly where the EtherType is (the EtherType is * treated as part of the DSA header). */ diff --git a/net/dsa/tag_8021q.c b/net/dsa/tag_8021q.c index 5ee9ef00954e..cbdfc392f7e0 100644 --- a/net/dsa/tag_8021q.c +++ b/net/dsa/tag_8021q.c @@ -461,8 +461,8 @@ EXPORT_SYMBOL_GPL(dsa_tag_8021q_unregister); struct sk_buff *dsa_8021q_xmit(struct sk_buff *skb, struct net_device *netdev, u16 tpid, u16 tci) { - /* skb->data points at skb_mac_header, which - * is fine for vlan_insert_tag. + /* skb->data points at the MAC header, which is fine + * for vlan_insert_tag(). */ return vlan_insert_tag(skb, htons(tpid), tci); } -- cgit From 0bcf2e4aca6c29a07555b713f2fb461dc38d5977 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 21 Apr 2023 01:56:01 +0300 Subject: net: dsa: tag_ocelot: call only the relevant portion of __skb_vlan_pop() on TX ocelot_xmit_get_vlan_info() calls __skb_vlan_pop() as the most appropriate helper I could find which strips away a VLAN header. That's all I need it to do, but __skb_vlan_pop() has more logic, which will become incompatible with the future revert of commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). Namely, it performs a sanity check on skb_mac_header(), which will stop being set after the above revert, so it will return an error instead of removing the VLAN tag. ocelot_xmit_get_vlan_info() gets called in 2 circumstances: (1) the port is under a VLAN-aware bridge and the bridge sends VLAN-tagged packets (2) the port is under a VLAN-aware bridge and somebody else (an 8021q upper) sends VLAN-tagged packets (using a VID that isn't in the bridge vlan tables) In case (1), there is actually no bug to defend against, because br_dev_xmit() calls skb_reset_mac_header() and things continue to work. However, in case (2), illustrated using the commands below, it can be seen that our intervention is needed, since __skb_vlan_pop() complains: $ ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up $ ip link set $eth master br0 && ip link set $eth up $ ip link add link $eth name $eth.100 type vlan id 100 && ip link set $eth.100 up $ ip addr add 192.168.100.1/24 dev $eth.100 I could fend off the checks in __skb_vlan_pop() with some skb_mac_header_was_set() calls, but seeing how few callers of __skb_vlan_pop() there are from TX paths, that seems rather unproductive. As an alternative solution, extract the bare minimum logic to strip a VLAN header, and move it to a new helper named vlan_remove_tag(), close to the definition of vlan_insert_tag(). Document it appropriately and make ocelot_xmit_get_vlan_info() call this smaller helper instead. Seeing that it doesn't appear illegal to test skb->protocol in the TX path, I guess it would be a good for vlan_remove_tag() to also absorb the vlan_set_encap_proto() function call. Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller --- net/dsa/tag_ocelot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dsa') diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c index 73ee09de1a3a..20bf7074d5a6 100644 --- a/net/dsa/tag_ocelot.c +++ b/net/dsa/tag_ocelot.c @@ -30,7 +30,7 @@ static void ocelot_xmit_get_vlan_info(struct sk_buff *skb, struct dsa_port *dp, br_vlan_get_proto(br, &proto); if (ntohs(hdr->h_vlan_proto) == proto) { - __skb_vlan_pop(skb, &tci); + vlan_remove_tag(skb, &tci); *vlan_tci = tci; } else { rcu_read_lock(); -- cgit From b79d7c14f48083abb3fb061370c0c64a569edf4c Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 17 Jun 2023 09:26:48 +0300 Subject: net: dsa: introduce preferred_default_local_cpu_port and use on MT7530 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen. The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth. The MT7530 driver developers had 3 options: - to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port - to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port - to declare just port 6 in the device tree as a CPU port Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports. Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below. Without preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender [ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver With preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender [ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates. As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described. Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Vladimir Oltean Signed-off-by: Arınç ÜNAL Reviewed-by: Russell King (Oracle) Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- net/dsa/dsa.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) (limited to 'net/dsa') diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index ab1afe67fd18..1afed89e03c0 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -403,6 +403,24 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst) return 0; } +static struct dsa_port * +dsa_switch_preferred_default_local_cpu_port(struct dsa_switch *ds) +{ + struct dsa_port *cpu_dp; + + if (!ds->ops->preferred_default_local_cpu_port) + return NULL; + + cpu_dp = ds->ops->preferred_default_local_cpu_port(ds); + if (!cpu_dp) + return NULL; + + if (WARN_ON(!dsa_port_is_cpu(cpu_dp) || cpu_dp->ds != ds)) + return NULL; + + return cpu_dp; +} + /* Perform initial assignment of CPU ports to user ports and DSA links in the * fabric, giving preference to CPU ports local to each switch. Default to * using the first CPU port in the switch tree if the port does not have a CPU @@ -410,12 +428,16 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst) */ static int dsa_tree_setup_cpu_ports(struct dsa_switch_tree *dst) { - struct dsa_port *cpu_dp, *dp; + struct dsa_port *preferred_cpu_dp, *cpu_dp, *dp; list_for_each_entry(cpu_dp, &dst->ports, list) { if (!dsa_port_is_cpu(cpu_dp)) continue; + preferred_cpu_dp = dsa_switch_preferred_default_local_cpu_port(cpu_dp->ds); + if (preferred_cpu_dp && preferred_cpu_dp != cpu_dp) + continue; + /* Prefer a local CPU port */ dsa_switch_for_each_port(dp, cpu_dp->ds) { /* Prefer the first local CPU port found */ -- cgit