summaryrefslogtreecommitdiff
path: root/net/dsa/slave.c
AgeCommit message (Collapse)Author
2022-03-14net: dsa: report and change port default priority using dcbnlVladimir Oltean
The port-based default QoS class is assigned to packets that lack a VLAN PCP (or the port is configured to not trust the VLAN PCP), an IP DSCP (or the port is configured to not trust IP DSCP), and packets on which no tc-skbedit action has matched. Similar to other drivers, this can be exposed to user space using the DCB Application Priority Table. IEEE 802.1Q-2018 specifies in Table D-8 - Sel field values that when the Selector is 1, the Protocol ID value of 0 denotes the "Default application priority. For use when application priority is not otherwise specified." The way in which the dcbnl integration in DSA has been designed has to do with its requirements. Andrew Lunn explains that SOHO switches are expected to come with some sort of pre-configured QoS profile, and that it is desirable for this to come pre-loaded into the DSA slave interfaces' DCB application priority table. In the dcbnl design, this is possible because calls to dcb_ieee_setapp() can be initiated by anyone including being self-initiated by this device driver. However, what makes this challenging to implement in DSA is that the DSA core manages the net_devices (effectively hiding them from drivers), while drivers manage the hardware. The DSA core has no knowledge of what individual drivers' QoS policies are. DSA could export to drivers a wrapper over dcb_ieee_setapp() and these could call that function to pre-populate the app priority table, however drivers don't have a good moment in time to do this. The dsa_switch_ops :: setup() method gets called before the net_devices are created (dsa_slave_create), and so is dsa_switch_ops :: port_setup(). What remains is dsa_switch_ops :: port_enable(), but this gets called upon each ndo_open. If we add app table entries on every open, we'd need to remove them on close, to avoid duplicate entry errors. But if we delete app priority entries on close, what we delete may not be the initial, driver pre-populated entries, but rather user-added entries. So it is clear that letting drivers choose the timing of the dcb_ieee_setapp() call is inappropriate. The alternative which was chosen is to introduce hardware-specific ops in dsa_switch_ops, and effectively hide dcbnl details from drivers as well. For pre-populating the application table, dsa_slave_dcbnl_init() will call ds->ops->port_get_default_prio() which is supposed to read from hardware. If the operation succeeds, DSA creates a default-prio app table entry. The method is called as soon as the slave_dev is registered, but before we release the rtnl_mutex. This is done such that user space sees the app table entries as soon as it sees the interface being registered. The fact that we populate slave_dev->dcbnl_ops with a non-NULL pointer changes behavior in dcb_doit() from net/dcb/dcbnl.c, which used to return -EOPNOTSUPP for any dcbnl operation where netdev->dcbnl_ops is NULL. Because there are still dcbnl-unaware DSA drivers even if they have dcbnl_ops populated, the way to restore the behavior is to make all dcbnl_ops return -EOPNOTSUPP on absence of the hardware-specific dsa_switch_ops method. The dcbnl framework absurdly allows there to be more than one app table entry for the same selector and protocol (in other words, more than one port-based default priority). In the iproute2 dcb program, there is a "replace" syntactical sugar command which performs an "add" and a "del" to hide this away. But we choose the largest configured priority when we call ds->ops->port_set_default_prio(), using __fls(). When there is no default-prio app table entry left, the port-default priority is restored to 0. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210113154139.1803705-2-olteanv@gmail.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: dsa: be mostly no-op in dsa_slave_set_mac_address when downVladimir Oltean
Since the slave unicast address is synced to hardware and to the DSA master during dsa_slave_open(), this means that a call to dsa_slave_set_mac_address() while the slave interface is down will result to a call to dsa_port_standalone_host_fdb_del() and to dev_uc_del() for the MAC address while there was no previous dsa_port_standalone_host_fdb_add() or dev_uc_add(). This is a partial revert of the blamed commit below, which was too aggressive. Fixes: 35aae5ab9121 ("net: dsa: remove workarounds for changing master promisc/allmulti only while up") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: dsa: manage flooding on the CPU portsVladimir Oltean
DSA can treat IFF_PROMISC and IFF_ALLMULTI on standalone user ports as signifying whether packets with an unknown MAC DA will be received or not. Since known MAC DAs are handled by FDB/MDB entries, this means that promiscuity is analogous to including/excluding the CPU port from the flood domain of those packets. There are two ways to signal CPU flooding to drivers. The first (chosen here) is to synthesize a call to ds->ops->port_bridge_flags() for the CPU port, with a mask of BR_FLOOD | BR_MCAST_FLOOD. This has the effect of turning on egress flooding on the CPU port regardless of source. The alternative would be to create a new ds->ops->port_host_flood() which is called per user port. Some switches (sja1105) have a flood domain that is managed per {ingress port, egress port} pair, so it would make more sense for this kind of switch to not flood the CPU from port A if just port B requires it. Nonetheless, the sja1105 has other quirks that prevent it from making use of unicast filtering, and without a concrete user making use of this feature, I chose not to implement it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: dsa: install the primary unicast MAC address as standalone port host FDBVladimir Oltean
To be able to safely turn off CPU flooding for standalone ports, we need to ensure that the dev_addr of each DSA slave interface is installed as a standalone host FDB entry for compatible switches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: dsa: install secondary unicast and multicast addresses as host FDB/MDBVladimir Oltean
In preparation of disabling flooding towards the CPU in standalone ports mode, identify the addresses requested by upper interfaces and use the new API for DSA FDB isolation to request the hardware driver to offload these as FDB or MDB objects. The objects belong to the user port's database, and are installed pointing towards the CPU port. Because dev_uc_add()/dev_mc_add() is VLAN-unaware, we offload to the port standalone database addresses with VID 0 (also VLAN-unaware). So this excludes switches with global VLAN filtering from supporting unicast filtering, because there, it is possible for a port of a switch to join a VLAN-aware bridge, and this changes the VLAN awareness of standalone ports, requiring VLAN-aware standalone host FDB entries. For the same reason, hellcreek, which requires VLAN awareness in standalone mode, is also exempted from unicast filtering. We create "standalone" variants of dsa_port_host_fdb_add() and dsa_port_host_mdb_add() (and the _del coresponding functions). We also create a separate work item type for handling deferred standalone host FDB/MDB entries compared to the switchdev one. This is done for the purpose of clarity - the procedure for offloading a bridge FDB entry is different than offloading a standalone one, and the switchdev event work handles only FDBs anyway, not MDBs. Deferral is needed for standalone entries because ndo_set_rx_mode runs in atomic context. We could probably optimize things a little by first queuing up all entries that need to be offloaded, and scheduling the work item just once, but the data structures that we can pass through __dev_uc_sync() and __dev_mc_sync() are limiting (there is nothing like a void *priv), so we'd have to keep the list of queued events somewhere in struct dsa_switch, and possibly a lock for it. Too complicated for now. Adding the address to the master is handled by dev_uc_sync(), adding it to the hardware is handled by __dev_uc_sync(). So this is the reason why dsa_port_standalone_host_fdb_add() does not call dev_uc_add(). Not that it had the rtnl_mutex anyway - ndo_set_rx_mode has it, but is atomic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: dsa: rename the host FDB and MDB methods to contain the "bridge" namespaceVladimir Oltean
We are preparing to add API in port.c that adds FDB and MDB entries that correspond to the port's standalone database. Rename the existing methods to make it clear that the FDB and MDB entries offloaded come from the bridge database. Since the function names lengthen in dsa_slave_switchdev_event_work(), we place "addr" and "vid" in temporary variables, to shorten those. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: dsa: remove workarounds for changing master promisc/allmulti only while upVladimir Oltean
Lennert Buytenhek explains in commit df02c6ff2e39 ("dsa: fix master interface allmulti/promisc handling"), dated Nov 2008, that changing the promiscuity of interfaces that are down (here the master) is broken. This fact regarding promisc/allmulti has changed since commit b6c40d68ff64 ("net: only invoke dev->change_rx_flags when device is UP") by Vlad Yasevich, dated Nov 2013. Therefore, DSA now has unnecessary complexity to handle master state transitions from down to up. In fact, syncing the unicast and multicast addresses can happen completely asynchronously to the administrative state changes. This change reduces that complexity by effectively fully reverting commit df02c6ff2e39 ("dsa: fix master interface allmulti/promisc handling"). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-24net: dsa: support FDB events on offloaded LAG interfacesVladimir Oltean
This change introduces support for installing static FDB entries towards a bridge port that is a LAG of multiple DSA switch ports, as well as support for filtering towards the CPU local FDB entries emitted for LAG interfaces that are bridge ports. Conceptually, host addresses on LAG ports are identical to what we do for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply be replicated towards all member ports like we do for multicast, or VLAN. Instead we need new driver API. Hardware usually considers a LAG to be a "logical port", and sets the entire LAG as the forwarding destination. The physical egress port selection within the LAG is made by hashing policy, as usual. To represent the logical port corresponding to the LAG, we pass by value a copy of the dsa_lag structure to all switches in the tree that have at least one port in that LAG. To illustrate why a refcounted list of FDB entries is needed in struct dsa_lag, it is enough to say that: - a LAG may be a bridge port and may therefore receive FDB events even while it isn't yet offloaded by any DSA interface - DSA interfaces may be removed from a LAG while that is a bridge port; we don't want FDB entries lingering around, but we don't want to remove entries that are still in use, either For all the cases below to work, the idea is to always keep an FDB entry on a LAG with a reference count equal to the DSA member ports. So: - if a port joins a LAG, it requests the bridge to replay the FDB, and the FDB entries get created, or their refcount gets bumped by one - if a port leaves a LAG, the FDB replay deletes or decrements refcount by one - if an FDB is installed towards a LAG with ports already present, that entry is created (if it doesn't exist) and its refcount is bumped by the amount of ports already present in the LAG echo "Adding FDB entry to bond with existing ports" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link del br0 ip link del bond0 echo "Adding FDB entry to empty bond" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link del bond0 echo "Adding FDB entry to empty bond, then removing ports one by one" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link set swp1 nomaster ip link set swp2 nomaster ip link del br0 ip link del bond0 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net: dsa: call SWITCHDEV_FDB_OFFLOADED for the orig_devVladimir Oltean
When switchdev_handle_fdb_event_to_device() replicates a FDB event emitted for the bridge or for a LAG port and DSA offloads that, we should notify back to switchdev that the FDB entry on the original device is what was offloaded, not on the DSA slave devices that the event is replicated on. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net: dsa: remove "ds" and "port" from struct dsa_switchdev_event_workVladimir Oltean
By construction, the struct net_device *dev passed to dsa_slave_switchdev_event_work() via struct dsa_switchdev_event_work is always a DSA slave device. Therefore, it is redundant to pass struct dsa_switch and int port information in the deferred work structure. This can be retrieved at all times from the provided struct net_device via dsa_slave_to_port(). For the same reason, we can drop the dsa_is_user_port() check in dsa_fdb_offload_notify(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_deviceVladimir Oltean
When the switchdev_handle_fdb_event_to_device() event replication helper was created, my original thought was that FDB events on LAG interfaces should most likely be special-cased, not just replicated towards all switchdev ports beneath that LAG. So this replication helper currently does not recurse through switchdev lower interfaces of LAG bridge ports, but rather calls the lag_mod_cb() if that was provided. No switchdev driver uses this helper for FDB events on LAG interfaces yet, so that was an assumption which was yet to be tested. It is certainly usable for that purpose, as my RFC series shows: https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/ however this approach is slightly convoluted because: - the switchdev driver gets a "dev" that isn't its own net device, but rather the LAG net device. It must call switchdev_lower_dev_find(dev) in order to get a handle of any of its own net devices (the ones that pass check_cb). - in order for FDB entries on LAG ports to be correctly refcounted per the number of switchdev ports beneath that LAG, we haven't escaped the need to iterate through the LAG's lower interfaces. Except that is now the responsibility of the switchdev driver, because the replication helper just stopped half-way. So, even though yes, FDB events on LAG bridge ports must be special-cased, in the end it's simpler to let switchdev_handle_fdb_* just iterate through the LAG port's switchdev lowers, and let the switchdev driver figure out that those physical ports are under a LAG. The switchdev_handle_fdb_event_to_device() helper takes a "foreign_dev_check" callback so it can figure out whether @dev can autonomously forward to @foreign_dev. DSA fills this method properly: if the LAG is offloaded by another port in the same tree as @dev, then it isn't foreign. If it is a software LAG, it is foreign - forwarding happens in software. Whether an interface is foreign or not decides whether the replication helper will go through the LAG's switchdev lowers or not. Since the lan966x doesn't properly fill this out, FDB events on software LAG uppers will get called. By changing lan966x_foreign_dev_check(), we can suppress them. Whereas DSA will now start receiving FDB events for its offloaded LAG uppers, so we need to return -EOPNOTSUPP, since we currently don't do the right thing for them. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net: dsa: create a dsa_lag structureVladimir Oltean
The main purpose of this change is to create a data structure for a LAG as seen by DSA. This is similar to what we have for bridging - we pass a copy of this structure by value to ->port_lag_join and ->port_lag_leave. For now we keep the lag_dev, id and a reference count in it. Future patches will add a list of FDB entries for the LAG (these also need to be refcounted to work properly). The LAG structure is created using dsa_port_lag_create() and destroyed using dsa_port_lag_destroy(), just like we have for bridging. Because now, the dsa_lag itself is refcounted, we can simplify dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in the dst->lags array only as long as at least one port uses it. The refcounting logic inside those functions can be removed now - they are called only when we should perform the operation. dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag structure instead of the lag_dev net_device. dsa_lag_foreach_port() now takes the dsa_lag structure as argument. dst->lags holds an array of dsa_lag structures. dsa_lag_map() now also saves the dsa_lag->id value, so that linear walking of dst->lags in drivers using dsa_lag_id() is no longer necessary. They can just look at lag.id. dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(), which can be used by drivers to get the LAG ID assigned by DSA to a given port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16net: dsa: offload bridge port VLANs on foreign interfacesVladimir Oltean
DSA now explicitly handles VLANs installed with the 'self' flag on the bridge as host VLANs, instead of just replicating every bridge port VLAN also on the CPU port and never deleting it, which is what it did before. However, this leaves a corner case uncovered, as explained by Tobias Waldekranz: https://patchwork.kernel.org/project/netdevbpf/patch/20220209213044.2353153-6-vladimir.oltean@nxp.com/#24735260 Forwarding towards a bridge port VLAN installed on a bridge port foreign to DSA (separate NIC, Wi-Fi AP) used to work by virtue of the fact that DSA itself needed to have at least one port in that VLAN (therefore, it also had the CPU port in said VLAN). However, now that the CPU port may not be member of all VLANs that user ports are members of, we need to ensure this isn't the case if software forwarding to a foreign interface is required. The solution is to treat bridge port VLANs on standalone interfaces in the exact same way as host VLANs. From DSA's perspective, there is no difference between local termination and software forwarding; packets in that VLAN must reach the CPU in both cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16net: dsa: add explicit support for host bridge VLANsVladimir Oltean
Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it does so on user ports. This is good for basic functionality but has several limitations: - the VLAN group which must reach the CPU may be radically different from the VLAN group that must be autonomously forwarded by the switch. In other words, the admin may want to isolate noisy stations and avoid traffic from them going to the control processor of the switch, where it would just waste useless cycles. The bridge already supports independent control of VLAN groups on bridge ports and on the bridge itself, and when VLAN-aware, it will drop packets in software anyway if their VID isn't added as a 'self' entry towards the bridge device. - Replaying host FDB entries may depend, for some drivers like mv88e6xxx, on replaying the host VLANs as well. The 2 VLAN groups are approximately the same in most regular cases, but there are corner cases when timing matters, and DSA's approximation of replicating VLANs on shared ports simply does not work. - If a user makes the bridge (implicitly the CPU port) join a VLAN by accident, there is no way for the CPU port to isolate itself from that noisy VLAN except by rebooting the system. This is because for each VLAN added on a user port, DSA will add it on shared ports too, but for each VLAN deletion on a user port, it will remain installed on shared ports, since DSA has no good indication of whether the VLAN is still in use or not. Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN addition and removal events, DSA has a simple and straightforward task of separating the bridge port VLANs (these have an orig_dev which is a DSA slave interface, or a LAG interface) from the host VLANs (these have an orig_dev which is a bridge interface), and to keep a simple reference count of each VID on each shared port. Forwarding VLANs must be installed on the bridge ports and on all DSA ports interconnecting them. We don't have a good view of the exact topology, so we simply install forwarding VLANs on all DSA ports, which is what has been done until now. Host VLANs must be installed primarily on the dedicated CPU port of each bridge port. More subtly, they must also be installed on upstream-facing and downstream-facing DSA ports that are connecting the bridge ports and the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB entry may be absent from VTU) is still addressed even if that switch is in a cross-chip setup, and it has no local CPU port. Therefore: - user ports contain only bridge port (forwarding) VLANs, and no refcounting is necessary - DSA ports contain both forwarding and host VLANs. Refcounting is necessary among these 2 types. - CPU ports contain only host VLANs. Refcounting is also necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net: dsa: remove lockdep class for DSA slave address listVladimir Oltean
Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA slave interfaces to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net: dsa: remove ndo_get_phys_port_name and ndo_get_port_parent_idVladimir Oltean
There are no legacy ports, DSA registers a devlink instance with ports unconditionally for all switch drivers. Therefore, delete the old-style ndo operations used for determining bridge forwarding domains. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-02net: dsa: provide switch operations for tracking the master stateVladimir Oltean
Certain drivers may need to send management traffic to the switch for things like register access, FDB dump, etc, to accelerate what their slow bus (SPI, I2C, MDIO) can already do. Ethernet is faster (especially in bulk transactions) but is also more unreliable, since the user may decide to bring the DSA master down (or not bring it up), therefore severing the link between the host and the attached switch. Drivers needing Ethernet-based register access already should have fallback logic to the slow bus if the Ethernet method fails, but that fallback may be based on a timeout, and the I/O to the switch may slow down to a halt if the master is down, because every Ethernet packet will have to time out. The driver also doesn't have the option to turn off Ethernet-based I/O momentarily, because it wouldn't know when to turn it back on. Which is where this change comes in. By tracking NETDEV_CHANGE, NETDEV_UP and NETDEV_GOING_DOWN events on the DSA master, we should know the exact interval of time during which this interface is reliably available for traffic. Provide this information to switches so they can use it as they wish. An helper is added dsa_port_master_is_operational() to check if a master port is operational. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-06net: dsa: merge rtnl_lock sections in dsa_slave_createVladimir Oltean
Currently dsa_slave_create() has two sequences of rtnl_lock/rtnl_unlock in a row. Remove the rtnl_unlock() and rtnl_lock() in between, such that the operation can execute slighly faster. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-06net: dsa: reorder PHY initialization with MTU setup in slave.cVladimir Oltean
In dsa_slave_create() there are 2 sections that take rtnl_lock(): MTU change and netdev registration. They are separated by PHY initialization. There isn't any strict ordering requirement except for the fact that netdev registration should be last. Therefore, we can perform the MTU change a bit later, after the PHY setup. A future change will then be able to merge the two rtnl_lock sections into one. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-08net: dsa: keep the bridge_dev and bridge_num as part of the same structureVladimir Oltean
The main desire behind this is to provide coherent bridge information to the fast path without locking. For example, right now we set dp->bridge_dev and dp->bridge_num from separate code paths, it is theoretically possible for a packet transmission to read these two port properties consecutively and find a bridge number which does not correspond with the bridge device. Another desire is to start passing more complex bridge information to dsa_switch_ops functions. For example, with FDB isolation, it is expected that drivers will need to be passed the bridge which requested an FDB/MDB entry to be offloaded, and along with that bridge_dev, the associated bridge_num should be passed too, in case the driver might want to implement an isolation scheme based on that number. We already pass the {bridge_dev, bridge_num} pair to the TX forwarding offload switch API, however we'd like to remove that and squash it into the basic bridge join/leave API. So that means we need to pass this pair to the bridge join/leave API. During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we call the driver's .port_bridge_leave with what used to be our dp->bridge_dev, but provided as an argument. When bridge_dev and bridge_num get folded into a single structure, we need to preserve this behavior in dsa_port_bridge_leave: we need a copy of what used to be in dp->bridge. Switch drivers check bridge membership by comparing dp->bridge_dev with the provided bridge_dev, but now, if we provide the struct dsa_bridge as a pointer, they cannot keep comparing dp->bridge to the provided pointer, since this only points to an on-stack copy. To make this obvious and prevent driver writers from forgetting and doing stupid things, in this new API, the struct dsa_bridge is provided as a full structure (not very large, contains an int and a pointer) instead of a pointer. An explicit comparison function needs to be used to determine bridge membership: dsa_port_offloads_bridge(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: rename dsa_port_offloads_bridge to dsa_port_offloads_bridge_devVladimir Oltean
Currently the majority of dsa_port_bridge_dev_get() calls in drivers is just to check whether a port is under the bridge device provided as argument by the DSA API. We'd like to change that DSA API so that a more complex structure is provided as argument. To keep things more generic, and considering that the new complex structure will be provided by value and not by reference, direct comparisons between dp->bridge and the provided bridge will be broken. The generic way to do the checking would simply be to do something like dsa_port_offloads_bridge(dp, &bridge). But there's a problem, we already have a function named that way, which actually takes a bridge_dev net_device as argument. Rename it so that we can use dsa_port_offloads_bridge for something else. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: hide dp->bridge_dev and dp->bridge_num in the core behind helpersVladimir Oltean
The location of the bridge device pointer and number is going to change. It is not going to be kept individually per port, but in a common structure allocated dynamically and which will have lockdep validation. Create helpers to access these elements so that we have a migration path to the new organization. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-01net: dsa: consolidate phylink creationRussell King (Oracle)
The code in port.c and slave.c creating the phylink instance is very similar - let's consolidate this into a single function. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01net: dsa: populate supported_interfaces memberMarek Behún
Add a new DSA switch operation, phylink_get_interfaces, which should fill in which PHY_INTERFACE_MODE_* are supported by given port. Use this before phylink_create() to fill phylinks supported_interfaces member, allowing phylink to determine which PHY_INTERFACE_MODEs are supported. Signed-off-by: Marek Behún <kabel@kernel.org> [tweaked patch and description to add more complete support -- rmk] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: switchdev: merge switchdev_handle_fdb_{add,del}_to_deviceVladimir Oltean
To reduce code churn, the same patch makes multiple changes, since they all touch the same lines: 1. The implementations for these two are identical, just with different function pointers. Reduce duplications and name the function pointers "mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument. 2. Drop the "const" attribute from "orig_dev". If the driver needs to check whether orig_dev belongs to itself and then call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it can't, because call_switchdev_notifiers takes a non-const struct net_device *. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: dsa: stop calling dev_hold in dsa_slave_fdb_eventVladimir Oltean
Now that we guarantee that SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events have finished executing by the time we leave our bridge upper interface, we've established a stronger boundary condition for how long the dsa_slave_switchdev_event_work() might run. As such, it is no longer possible for DSA slave interfaces to become unregistered, since they are still bridge ports. So delete the unnecessary dev_hold() and dev_put(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_workVladimir Oltean
After talking with Ido Schimmel, it became clear that rtnl_lock is not actually required for anything that is done inside the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE deferred work handlers. The reason why it was probably added by Arkadi Sharshevsky in commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification") was to offer the same locking/serialization guarantees as .ndo_fdb_{add,del} and avoid reworking any drivers. DSA has implemented .ndo_fdb_add and .ndo_fdb_del until commit b117e1e8a86d ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del") - that is to say, until fairly recently. But those methods have been deleted, so now we are free to drop the rtnl_lock as well. Note that exposing DSA switch drivers to an unlocked method which was previously serialized by the rtnl_mutex is a potentially dangerous affair. Driver writers couldn't ensure that their internal locking scheme does the right thing even if they wanted. We could err on the side of paranoia and introduce a switch-wide lock inside the DSA framework, but that seems way overreaching. Instead, we could check as many drivers for regressions as we can, fix those first, then let this change go in once it is assumed to be fairly safe. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25Revert "Merge branch 'dsa-rtnl'"David S. Miller
This reverts commit 965e6b262f48257dbdb51b565ecfd84877a0ab5f, reversing changes made to 4d98bb0d7ec2d0b417df6207b0bafe1868bad9f8.
2021-10-24net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_workVladimir Oltean
After talking with Ido Schimmel, it became clear that rtnl_lock is not actually required for anything that is done inside the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE deferred work handlers. The reason why it was probably added by Arkadi Sharshevsky in commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification") was to offer the same locking/serialization guarantees as .ndo_fdb_{add,del} and avoid reworking any drivers. DSA has implemented .ndo_fdb_add and .ndo_fdb_del until commit b117e1e8a86d ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del") - that is to say, until fairly recently. But those methods have been deleted, so now we are free to drop the rtnl_lock as well. Note that exposing DSA switch drivers to an unlocked method which was previously serialized by the rtnl_mutex is a potentially dangerous affair. Driver writers couldn't ensure that their internal locking scheme does the right thing even if they wanted. We could err on the side of paranoia and introduce a switch-wide lock inside the DSA framework, but that seems way overreaching. Instead, we could check as many drivers for regressions as we can, fix those first, then let this change go in once it is assumed to be fairly safe. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: remove the "dsa_to_port in a loop" antipattern from the coreVladimir Oltean
Ever since Vivien's conversion of the ds->ports array into a dst->ports list, and the introduction of dsa_to_port, iterations through the ports of a switch became quadratic whenever dsa_to_port was needed. dsa_to_port can either be called directly, or indirectly through the dsa_is_{user,cpu,dsa,unused}_port helpers. Use the newly introduced dsa_switch_for_each_port() iteration macro that works with the iterator variable being a struct dsa_port *dp directly, and not an int i. It is an expensive variable to go from i to dp, but cheap to go from dp to i. This macro iterates through the entire ds->dst->ports list and filters by the ports belonging just to the switch provided as argument. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: dsa: allow reporting of standard ethtool stats for slave devicesAlvin Šipraga
Jakub pointed out that we have a new ethtool API for reporting device statistics in a standardized way, via .get_eth_{phy,mac,ctrl}_stats. Add a small amount of plumbing to allow DSA drivers to take advantage of this when exposing statistics. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02net: use eth_hw_addr_set() instead of ether_addr_copy()Jakub Kicinski
Convert from ether_addr_copy() to eth_hw_addr_set(): @@ expression dev, np; @@ - ether_addr_copy(dev->dev_addr, np) + eth_hw_addr_set(dev, np) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setupVladimir Oltean
DSA supports connecting to a phy-handle, and has a fallback to a non-OF based method of connecting to an internal PHY on the switch's own MDIO bus, if no phy-handle and no fixed-link nodes were present. The -ENODEV error code from the first attempt (phylink_of_phy_connect) is what triggers the second attempt (phylink_connect_phy). However, when the first attempt returns a different error code than -ENODEV, this results in an unbalance of calls to phylink_create and phylink_destroy by the time we exit the function. The phylink instance has leaked. There are many other error codes that can be returned by phylink_of_phy_connect. For example, phylink_validate returns -EINVAL. So this is a practical issue too. Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20210914134331.2303380-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24net: dsa: let drivers state that they need VLAN filtering while standaloneVladimir Oltean
As explained in commit e358bef7c392 ("net: dsa: Give drivers the chance to veto certain upper devices"), the hellcreek driver uses some tricks to comply with the network stack expectations: it enforces port separation in standalone mode using VLANs. For untagged traffic, bridging between ports is prevented by using different PVIDs, and for VLAN-tagged traffic, it never accepts 8021q uppers with the same VID on two ports, so packets with one VLAN cannot leak from one port to another. That is almost fine*, and has worked because hellcreek relied on an implicit behavior of the DSA core that was changed by the previous patch: the standalone ports declare the 'rx-vlan-filter' feature as 'on [fixed]'. Since most of the DSA drivers are actually VLAN-unaware in standalone mode, that feature was actually incorrectly reflecting the hardware/driver state, so there was a desire to fix it. This leaves the hellcreek driver in a situation where it has to explicitly request this behavior from the DSA framework. We configure the ports as follows: - Standalone: 'rx-vlan-filter' is on. An 8021q upper on top of a standalone hellcreek port will go through dsa_slave_vlan_rx_add_vid and will add a VLAN to the hardware tables, giving the driver the opportunity to refuse it through .port_prechangeupper. - Bridged with vlan_filtering=0: 'rx-vlan-filter' is off. An 8021q upper on top of a bridged hellcreek port will not go through dsa_slave_vlan_rx_add_vid, because there will not be any attempt to offload this VLAN. The driver already disables VLAN awareness, so that upper should receive the traffic it needs. - Bridged with vlan_filtering=1: 'rx-vlan-filter' is on. An 8021q upper on top of a bridged hellcreek port will call dsa_slave_vlan_rx_add_vid, and can again be vetoed through .port_prechangeupper. *It is not actually completely fine, because if I follow through correctly, we can have the following situation: ip link add br0 type bridge vlan_filtering 0 ip link set lan0 master br0 # lan0 now becomes VLAN-unaware ip link set lan0 nomaster # lan0 fails to become VLAN-aware again, therefore breaking isolation This patch fixes that corner case by extending the DSA core logic, based on this requested attribute, to change the VLAN awareness state of the switch (port) when it leaves the bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24net: dsa: don't advertise 'rx-vlan-filter' when not neededVladimir Oltean
There have been multiple independent reports about dsa_slave_vlan_rx_add_vid being called (and consequently calling the drivers' .port_vlan_add) when it isn't needed, and sometimes (not always) causing problems in the process. Case 1: mv88e6xxx_port_vlan_prepare is stubborn and only accepts VLANs on bridged ports. That is understandably so, because standalone mv88e6xxx ports are VLAN-unaware, and VTU entries are said to be a scarce resource. Otherwise said, the following fails lamentably on mv88e6xxx: ip link add br0 type bridge vlan_filtering 1 ip link set lan3 master br0 ip link add link lan10 name lan10.1 type vlan id 1 [485256.724147] mv88e6085 d0032004.mdio-mii:12: p10: hw VLAN 1 already used by port 3 in br0 RTNETLINK answers: Operation not supported This has become a worse issue since commit 9b236d2a69da ("net: dsa: Advertise the VLAN offload netdev ability only if switch supports it"). Up to that point, the driver was returning -EOPNOTSUPP and DSA was reconverting that error to 0, making the 8021q upper think all is ok (but obviously the error message was there even prior to this change). After that change the -EOPNOTSUPP is propagated to vlan_vid_add, and it is a hard error. Case 2: Ports that don't offload the Linux bridge (have a dp->bridge_dev = NULL because they don't implement .port_bridge_{join,leave}). Understandably, a standalone port should not offload VLANs either, it should remain VLAN unaware and any VLAN should be a software VLAN (as long as the hardware is not quirky, that is). In fact, dsa_slave_port_obj_add does do the right thing and rejects switchdev VLAN objects coming from the bridge when that bridge is not offloaded: case SWITCHDEV_OBJ_ID_PORT_VLAN: if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev)) return -EOPNOTSUPP; err = dsa_slave_vlan_add(dev, obj, extack); But it seems that the bridge is able to trick us. The __vlan_vid_add from br_vlan.c has: /* Try switchdev op first. In case it is not supported, fallback to * 8021q add. */ err = br_switchdev_port_vlan_add(dev, v->vid, flags, extack); if (err == -EOPNOTSUPP) return vlan_vid_add(dev, br->vlan_proto, v->vid); So it says "no, no, you need this VLAN in your life!". And we, naive as we are, say "oh, this comes from the vlan_vid_add code path, it must be an 8021q upper, sure, I'll take that". And we end up with that bridge VLAN installed on our port anyway. But this time, it has the wrong flags: if the bridge was trying to install VLAN 1 as a pvid/untagged VLAN, failed via switchdev, retried via vlan_vid_add, we have this comment: /* This API only allows programming tagged, non-PVID VIDs */ So what we do makes absolutely no sense. Backtracing a bit, we see the common pattern. We allow the network stack to think that our standalone ports are VLAN-aware, but they aren't, for the vast majority of switches. The quirky ones should not dictate the norm. The dsa_slave_vlan_rx_add_vid and dsa_slave_vlan_rx_kill_vid methods exist for drivers that need the 'rx-vlan-filter: on' feature in ethtool -k, which can be due to any of the following reasons: 1. vlan_filtering_is_global = true, and some ports are under a VLAN-aware bridge while others are standalone, and the standalone ports would otherwise drop VLAN-tagged traffic. This is described in commit 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation"). 2. the ports that are under a VLAN-aware bridge should also set this feature, for 8021q uppers having a VID not claimed by the bridge. In this case, the driver will essentially not even know that the VID is coming from the 8021q layer and not the bridge. 3. Hellcreek. This driver needs it because in standalone mode, it uses unique VLANs per port to ensure separation. For separation of untagged traffic, it uses different PVIDs for each port, and for separation of VLAN-tagged traffic, it never accepts 8021q uppers with the same vid on two ports. If a driver does not fall under any of the above 3 categories, there is no reason why it should advertise the 'rx-vlan-filter' feature, therefore no reason why it should offload the VLANs added through vlan_vid_add. This commit fixes the problem by removing the 'rx-vlan-filter' feature from the slave devices when they operate in standalone mode, and when they offload a VLAN-unaware bridge. The way it works is that vlan_vid_add will now stop its processing here: vlan_add_rx_filter_info: if (!vlan_hw_filter_capable(dev, proto)) return 0; So the VLAN will still be saved in the interface's VLAN RX filtering list, but because it does not declare VLAN filtering in its features, the 8021q module will return zero without committing that VLAN to hardware. This gives the drivers what they want, since it keeps the 8021q VLANs away from the VLAN table until VLAN awareness is enabled (point at which the ports are no longer standalone, hence in the mv88e6xxx case, the check in mv88e6xxx_port_vlan_prepare passes). Since the issue predates the existence of the hellcreek driver, case 3 will be dealt with in a separate patch. The main change that this patch makes is to no longer set NETIF_F_HW_VLAN_CTAG_FILTER unconditionally, but toggle it dynamically (for most switches, never). The second part of the patch addresses an issue that the first part introduces: because the 'rx-vlan-filter' feature is now dynamically toggled, and our .ndo_vlan_rx_add_vid does not get called when 'rx-vlan-filter' is off, we need to avoid bugs such as the following by replaying the VLANs from 8021q uppers every time we enable VLAN filtering: ip link add link lan0 name lan0.100 type vlan id 100 ip addr add 192.168.100.1/24 dev lan0.100 ping 192.168.100.2 # should work ip link add br0 type bridge vlan_filtering 0 ip link set lan0 master br0 ping 192.168.100.2 # should still work ip link set br0 type bridge vlan_filtering 1 ping 192.168.100.2 # should still work but doesn't As reported by Florian, some drivers look at ds->vlan_filtering in their .port_vlan_add() implementation. So this patch also makes sure that ds->vlan_filtering is committed before calling the driver. This is the reason why it is first committed, then restored on the failure path. Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24net: dsa: properly fall back to software bridgingVladimir Oltean
If the driver does not implement .port_bridge_{join,leave}, then we must fall back to standalone operation on that port, and trigger the error path of dsa_port_bridge_join. This sets dp->bridge_dev = NULL. In turn, having a non-NULL dp->bridge_dev when there is no offloading support makes the following things go wrong: - dsa_default_offload_fwd_mark make the wrong decision in setting skb->offload_fwd_mark. It should set skb->offload_fwd_mark = 0 for ports that don't offload the bridge, which should instruct the bridge to forward in software. But this does not happen, dp->bridge_dev is incorrectly set to point to the bridge, so the bridge is told that packets have been forwarded in hardware, which they haven't. - switchdev objects (MDBs, VLANs) should not be offloaded by ports that don't offload the bridge. Standalone ports should behave as packet-in, packet-out and the bridge should not be able to manipulate the pvid of the port, or tag stripping on egress, or ingress filtering. This should already work fine because dsa_slave_port_obj_add has: case SWITCHDEV_OBJ_ID_PORT_VLAN: if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev)) return -EOPNOTSUPP; err = dsa_slave_vlan_add(dev, obj, extack); but since dsa_port_offloads_bridge_port works based on dp->bridge_dev, this is again sabotaging us. All the above work in case the port has an unoffloaded LAG interface, so this is well exercised code, we should apply it for plain unoffloaded bridge ports too. Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted ↵Vladimir Oltean
by drivers towards the bridge The blamed commit added a new field to struct switchdev_notifier_fdb_info, but did not make sure that all call paths set it to something valid. For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE notifier, and since the 'is_local' flag is not set, it contains junk from the stack, so the bridge might interpret those notifications as being for local FDB entries when that was not intended. To avoid that now and in the future, zero-initialize all switchdev_notifier_fdb_info structures created by drivers such that all newly added fields to not need to touch drivers again. Fixes: 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-08net: dsa: don't fast age standalone portsVladimir Oltean
DSA drives the procedure to flush dynamic FDB entries from a port based on the change of STP state: whenever we go from a state where address learning is enabled (LEARNING, FORWARDING) to a state where it isn't (LISTENING, BLOCKING, DISABLED), we need to flush the existing dynamic entries. However, there are cases when this is not needed. Internally, when a DSA switch interface is not under a bridge, DSA still keeps it in the "FORWARDING" STP state. And when that interface joins a bridge, the bridge will meticulously iterate that port through all STP states, starting with BLOCKING and ending with FORWARDING. Because there is a state transition from the standalone version of FORWARDING into the temporary BLOCKING bridge port state, DSA calls the fast age procedure. Since commit 5e38c15856e9 ("net: dsa: configure better brport flags when ports leave the bridge"), DSA asks standalone ports to disable address learning. Therefore, there can be no dynamic FDB entries on a standalone port. Therefore, it does not make sense to flush dynamic FDB entries on one. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06net: dsa: don't disable multicast flooding to the CPU even without an IGMP ↵Vladimir Oltean
querier Commit 08cc83cc7fd8 ("net: dsa: add support for BRIDGE_MROUTER attribute") added an option for users to turn off multicast flooding towards the CPU if they turn off the IGMP querier on a bridge which already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router). And commit a8b659e7ff75 ("net: dsa: act as passthrough for bridge port flags") simply papered over that issue, because it moved the decision to flood the CPU with multicast (or not) from the DSA core down to individual drivers, instead of taking a more radical position then. The truth is that disabling multicast flooding to the CPU is simply something we are not prepared to do now, if at all. Some reasons: - ICMP6 neighbor solicitation messages are unregistered multicast packets as far as the bridge is concerned. So if we stop flooding multicast, the outside world cannot ping the bridge device's IPv6 link-local address. - There might be foreign interfaces bridged with our DSA switch ports (sending a packet towards the host does not necessarily equal termination, but maybe software forwarding). So if there is no one interested in that multicast traffic in the local network stack, that doesn't mean nobody is. - PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as the bridge is concerned. This should reach the CPU port. - The switch driver might not do FDB partitioning. And since we don't even bother to do more fine-grained flood disabling (such as "disable flooding _from_port_N_ towards the CPU port" as opposed to "disable flooding _from_any_port_ towards the CPU port"), this breaks standalone ports, or even multiple bridges where one has an IGMP querier and one doesn't. Reverting the logic makes all of the above work. Fixes: a8b659e7ff75 ("net: dsa: act as passthrough for bridge port flags") Fixes: 08cc83cc7fd8 ("net: dsa: add support for BRIDGE_MROUTER attribute") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27dev_ioctl: split out ndo_eth_ioctlArnd Bergmann
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Conflicts are simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-22net: bridge: move the switchdev object replay helpers to "push" modeVladimir Oltean
Starting with commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"), DSA has introduced some bridge helpers that replay switchdev events (FDB/MDB/VLAN additions and deletions) that can be lost by the switchdev drivers in a variety of circumstances: - an IP multicast group was host-joined on the bridge itself before any switchdev port joined the bridge, leading to the host MDB entries missing in the hardware database. - during the bridge creation process, the MAC address of the bridge was added to the FDB as an entry pointing towards the bridge device itself, but with no switchdev ports being part of the bridge yet, this local FDB entry would remain unknown to the switchdev hardware database. - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface, before any switchdev port joined that LAG, leading to the hardware database missing those entries. - a switchdev port left a LAG that is a bridge port, while the LAG remained part of the bridge, and all FDB/MDB/VLAN entries remained installed in the hardware database of the switchdev port. Also, since commit 0d2cfbd41c4a ("net: bridge: ignore switchdev events for LAG ports which didn't request replay"), DSA introduced a method, based on a const void *ctx, to ensure that two switchdev ports under the same LAG that is a bridge port do not see the same MDB/VLAN entry being replayed twice by the bridge, once for every bridge port that joins the LAG. With so many ordering corner cases being possible, it seems unreasonable to expect a switchdev driver writer to get it right from the first try. Therefore, now that DSA has experimented with the bridge replay helpers for a little bit, we can move the code to the bridge driver where it is more readily available to all switchdev drivers. To convert the switchdev object replay helpers from "pull mode" (where the driver asks for them) to a "push mode" (where the bridge offers them automatically), the biggest problem is that the bridge needs to be aware when a switchdev port joins and leaves, even when the switchdev is only indirectly a bridge port (for example when the bridge port is a LAG upper of the switchdev). Luckily, we already have a hook for that, in the form of the newly introduced switchdev_bridge_port_offload() and switchdev_bridge_port_unoffload() calls. These offer a natural place for hooking the object addition and deletion replays. Extend the above 2 functions with: - pointers to the switchdev atomic notifier (for FDB replays) and the blocking notifier (for MDB and VLAN replays). - the "const void *ctx" argument required for drivers to be able to disambiguate between which port is targeted, when multiple ports are lowers of the same LAG that is a bridge port. Most of the drivers pass NULL to this argument, except the ones that support LAG offload and have the proper context check already in place in the switchdev blocking notifier handler. Also unexport the replay helpers, since nobody except the bridge calls them directly now. Note that: (a) we abuse the terminology slightly, because FDB entries are not "switchdev objects", but we count them as objects nonetheless. With no direct way to prove it, I think they are not modeled as switchdev objects because those can only be installed by the bridge to the hardware (as opposed to FDB entries which can be propagated in the other direction too). This is merely an abuse of terms, FDB entries are replayed too, despite not being objects. (b) the bridge does not attempt to sync port attributes to newly joined ports, just the countable stuff (the objects). The reason for this is simple: no universal and symmetric way to sync and unsync them is known. For example, VLAN filtering: what to do on unsync, disable or leave it enabled? Similarly, STP state, ageing timer, etc etc. What a switchdev port does when it becomes standalone again is not really up to the bridge's competence, and the driver should deal with it. On the other hand, replaying deletions of switchdev objects can be seen a matter of cleanup and therefore be treated by the bridge, hence this patch. We make the replay helpers opt-in for drivers, because they might not bring immediate benefits for them: - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(), so br_vlan_replay() should not do anything for the new drivers on which we call it. The existing drivers where there was even a slight possibility for there to exist a VLAN on a bridge port before they join it are already guarded against this: mlxsw and prestera deny joining LAG interfaces that are members of a bridge. - br_fdb_replay() should now notify of local FDB entries, but I patched all drivers except DSA to ignore these new entries in commit 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications"). Driver authors can lift this restriction as they wish, and when they do, they can also opt into the FDB replay functionality. - br_mdb_replay() should fix a real issue which is described in commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"). However most drivers do not offload the SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw offload this switchdev object, and I don't completely understand the way in which they offload this switchdev object anyway. So I'll leave it up to these drivers' respective maintainers to opt into br_mdb_replay(). So most of the drivers pass NULL notifier blocks for the replay helpers, except: - dpaa2-switch which was already acked/regression-tested with the helpers enabled (and there isn't much of a downside in having them) - ocelot which already had replay logic in "pull" mode - DSA which already had replay logic in "pull" mode An important observation is that the drivers which don't currently request bridge event replays don't even have the switchdev_bridge_port_{offload,unoffload} calls placed in proper places right now. This was done to avoid unnecessary rework for drivers which might never even add support for this. For driver writers who wish to add replay support, this can be used as a tentative placement guide: https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/ Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21net: dsa: ensure linearized SKBs in case of tail taggersLino Sanfilippo
The function skb_put() that is used by tail taggers to make room for the DSA tag must only be called for linearized SKBS. However in case that the slave device inherited features like NETIF_F_HW_SG or NETIF_F_FRAGLIST the SKB passed to the slaves transmit function may not be linearized. Avoid those SKBs by clearing the NETIF_F_HW_SG and NETIF_F_FRAGLIST flags for tail taggers. Furthermore since the tagging protocol can be changed at runtime move the code for setting up the slaves features into dsa_slave_setup_tagger(). Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: dsa: use switchdev_handle_fdb_{add,del}_to_deviceVladimir Oltean
Using the new fan-out helper for FDB entries installed on the software bridge, we can install host addresses with the proper refcount on the CPU port, such that this case: ip link set swp0 master br0 ip link set swp1 master br0 ip link set swp2 master br0 ip link set swp3 master br0 ip link set br0 address 00:01:02:03:04:05 ip link set swp3 nomaster works properly and the br0 address remains installed as a host entry with refcount 3 instead of getting deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: switchdev: introduce helper for checking dynamically learned FDB entriesVladimir Oltean
It is a bit difficult to understand what DSA checks when it tries to avoid installing dynamically learned addresses on foreign interfaces as local host addresses, so create a generic switchdev helper that can be reused and is generally more readable. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are ↵Vladimir Oltean
on the same dev When (a) "dev" is a bridge port which the DSA switch tree offloads, but is otherwise not a dsa slave (such as a LAG netdev), or (b) "dev" is the bridge net device itself then strange things happen to the dev_hold/dev_put pair: dsa_schedule_work() will still be called with a DSA port that offloads that netdev, but dev_hold() will be called on the non-DSA netdev. Then the "if" condition in dsa_slave_switchdev_event_work() does not pass, because "dev" is not a DSA netdev, so dev_put() is not called. This results in the simple fact that we have a reference counting mismatch on the "dev" net device. This can be seen when we add support for host addresses installed on the bridge net device. ip link add br1 type bridge ip link set br1 address 00:01:02:03:04:05 ip link set swp0 master br1 ip link del br1 [ 968.512278] unregister_netdevice: waiting for br1 to become free. Usage count = 5 It seems foolish to do penny pinching and not add the net_device pointer in the dsa_switchdev_event_work structure, so let's finally do that. As an added bonus, when we start offloading local entries pointing towards the bridge, these will now properly appear as 'offloaded' in 'bridge fdb' (this was not possible before, because 'dev' was assumed to only be a DSA net device): 00:01:02:03:04:05 dev br0 vlan 1 offload master br0 permanent 00:01:02:03:04:05 dev br0 offload master br0 permanent Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: dsa: include fdb entries pointing to bridge in the host fdb listVladimir Oltean
The bridge supports a legacy way of adding local (non-forwarded) FDB entries, which works on an individual port basis: bridge fdb add dev swp0 00:01:02:03:04:05 master local As well as a new way, added by Roopa Prabhu in commit 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device"): bridge fdb add dev br0 00:01:02:03:04:05 self local The two commands are functionally equivalent, except that the first one produces an entry with fdb->dst == swp0, and the other an entry with fdb->dst == NULL. The confusing part, though, is that even if fdb->dst is swp0 for the 'local on port' entry, that destination is not used. Nonetheless, the idea is that the bridge has reference counting for local entries, and local entries pointing towards the bridge are still 'as local' as local entries for a port. The bridge adds the MAC addresses of the interfaces automatically as FDB entries with is_local=1. For the MAC address of the ports, fdb->dst will be equal to the port, and for the MAC address of the bridge, fdb->dst will point towards the bridge (i.e. be NULL). Therefore, if the MAC address of the bridge is not inherited from either of the physical ports, then we must explicitly catch local FDB entries emitted towards the br0, otherwise we'll miss the MAC address of the bridge (and, of course, any entry with 'bridge add dev br0 ... self local'). Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: dsa: include bridge addresses which are local in the host fdb listTobias Waldekranz
The bridge automatically creates local (not forwarded) fdb entries pointing towards physical ports with their interface MAC addresses. For switchdev, the significance of these fdb entries is the exact opposite of that of non-local entries: instead of sending these frame outwards, we must send them inwards (towards the host). NOTE: The bridge's own MAC address is also "local". If that address is not shared with any port, the bridge's MAC is not be added by this functionality - but the following commit takes care of that case. NOTE 2: We mark these addresses as host-filtered regardless of the value of ds->assisted_learning_on_cpu_port. This is because, as opposed to the speculative logic done for dynamic address learning on foreign interfaces, the local FDB entries are rather fixed, so there isn't any risk of them migrating from one bridge port to another. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: dsa: sync static FDB entries on foreign interfaces to hardwareVladimir Oltean
DSA is able to install FDB entries towards the CPU port for addresses which were dynamically learnt by the software bridge on foreign interfaces that are in the same bridge with a DSA switch interface. Since this behavior is opportunistic, it is guarded by the "assisted_learning_on_cpu_port" property which can be enabled by drivers and is not done automatically (since certain switches may support address learning of packets coming from the CPU port). But if those FDB entries added on the foreign interfaces are static (added by the user) instead of dynamically learnt, currently DSA does not do anything (and arguably it should). Because static FDB entries are not supposed to move on their own, there is no downside in reusing the "assisted_learning_on_cpu_port" logic to sync static FDB entries to the DSA CPU port unconditionally, even if assisted_learning_on_cpu_port is not requested by the driver. For example, this situation: br0 / \ swp0 dummy0 $ bridge fdb add 02:00:de:ad:00:01 dev dummy0 vlan 1 master static Results in DSA adding an entry in the hardware FDB, pointing this address towards the CPU port. The same is true for entries added to the bridge itself, e.g: $ bridge fdb add 02:00:de:ad:00:01 dev br0 vlan 1 self local (except that right now, DSA still ignores 'local' FDB entries, this will be changed in a later patch) Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>