path: root/drivers/net/ethernet/mellanox/mlxsw
Age | Commit message | Author
2023-07-21 | mlxsw: spectrum_router: Replay IP NETDEV_UP on device deslavement | Petr Machata
When a netdevice is removed from a bridge or a LAG, and it has an IP address, it should join the router and gain a RIF. Do that by replaying address addition event on the netdevice. When handling deslavement of LAG or its upper from a bridge device, the replay should be done after all the lowers of the LAG have left the bridge. Thus these scenarios are handled by passing replay_deslavement of false, and by invoking, after the lowers have been processed, a new helper, mlxsw_sp_netdevice_post_lag_event(), which does the per-LAG / -upper handling, and in particular invokes the replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
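The deferred replay described above can be sketched as a toy model. This is a simplified illustration in Python, not driver code: `ToyDriver`, `port_left_bridge()`, `lag_left_bridge()` and `post_lag_event()` are hypothetical stand-ins for the handlers named in the commit message, and the "replay" is reduced to a log entry.

```python
class ToyDriver:
    """Toy model of deferring the address replay until all LAG lowers are done."""

    def __init__(self):
        self.log = []

    def replay_addresses(self, dev):
        # Stand-in for replaying the address addition event on a netdevice.
        self.log.append(("replay", dev))

    def port_left_bridge(self, dev, replay_deslavement):
        self.log.append(("leave", dev))
        # A sole device may replay right away; a LAG lower must not.
        if replay_deslavement:
            self.replay_addresses(dev)

    def post_lag_event(self, lag):
        # Runs once, after every lower of the LAG has been processed.
        self.replay_addresses(lag)

    def lag_left_bridge(self, lag, lowers):
        for lower in lowers:
            self.port_left_bridge(lower, replay_deslavement=False)
        self.post_lag_event(lag)
```

In this model, calling `lag_left_bridge("bond0", ["swp1", "swp2"])` records both lowers leaving before the single replay for `bond0`, which is the ordering the commit message asks for.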
2023-07-21 | mlxsw: spectrum_router: Replay IP NETDEV_UP on device enslavement | Petr Machata
Enslaving of front panel ports (and their uppers) to netdevices that already have uppers is currently forbidden. When this is permitted, any uppers with IP addresses need to have the NETDEV_UP inetaddr event replayed, so that any RIFs are created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 | mlxsw: spectrum_router: Replay neighbours when RIF is made | Petr Machata
As neighbours are created, mlxsw is involved through the netevent notifications. When at the time there is no RIF for a given neighbour, the notification is not acted upon. When the RIF is later created, these outstanding neighbours are left unoffloaded and cause traffic to go through the SW datapath. In order to fix this issue, as a RIF is created, walk the ARP and ND tables and find neighbours for the netdevice that represents the RIF. Then schedule neighbour work for them, allowing them to be offloaded. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
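A minimal sketch of the neighbour replay, written as a toy Python model rather than kernel code. The names `ToyNeigh` and `ToyRouter`, the two-table dictionary, and the simple work queue are assumptions made for illustration; the real driver walks the kernel's ARP and ND tables.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNeigh:
    dev: str            # netdevice the neighbour entry belongs to
    ip: str
    offloaded: bool = False


@dataclass
class ToyRouter:
    rifs: set = field(default_factory=set)
    tables: dict = field(default_factory=lambda: {"arp": [], "nd": []})
    work: list = field(default_factory=list)

    def neigh_event(self, table, neigh):
        # Notification at neighbour creation: ignored if there is no RIF yet.
        self.tables[table].append(neigh)
        if neigh.dev in self.rifs:
            self.work.append(neigh)

    def rif_create(self, dev):
        self.rifs.add(dev)
        # Replay: walk both tables and schedule work for this netdevice.
        for table in self.tables.values():
            for neigh in table:
                if neigh.dev == dev and not neigh.offloaded:
                    self.work.append(neigh)

    def run_work(self):
        for neigh in self.work:
            neigh.offloaded = True
        self.work.clear()
```

Neighbours that were learned before the RIF existed stay unoffloaded until `rif_create()` schedules them, while neighbours on unrelated netdevices are left alone.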
2023-07-21 | mlxsw: spectrum_router: Replay MACVLANs when RIF is made | Petr Machata
If IP address is added to a MACVLAN netdevice, the effect is of configuring VRRP on the RIF for the netdevice linked to the MACVLAN. Because the MACVLAN offload is tied to existence of a RIF at the linked netdevice, adding a MACVLAN is currently not allowed until a RIF is present. If this requirement stays, it will never be possible to attach a first port into a topology that involves a MACVLAN. Thus topologies would need to be built in a certain order, which is impractical. Additionally, IP address removal, which leads to disappearance of the RIF that the MACVLAN depends on, cannot be vetoed. Thus even as things stand now it is possible to get to a state where a MACVLAN netdevice exists without a RIF, despite having mlxsw lowers. And once the MACVLAN is un-offloaded due to RIF getting destroyed, recreating the RIF does not bring it back. In this patch, accept that MACVLAN can be created out of order and support that use case. One option would seem to be to simply recognize MACVLAN netdevices as "interesting", and let the existing replay mechanisms take care of the offload. However, that does not address the necessity to reoffload MACVLAN once a RIF is created. Thus add a new replay hook, symmetrical to mlxsw_sp_rif_macvlan_flush(), called mlxsw_sp_rif_macvlan_replay(), which instead of unwinding the existing offloads, applies the configuration as if the netdevice were created just now. Additionally, remove all vetoes and warning messages that checked for presence of a RIF at the linked device. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 | mlxsw: spectrum_router: Offload ethernet nexthops when RIF is made | Petr Machata
As RIF is created, refresh each nexthop group tracked at the CRIF for which the RIF was created. Note that nothing needs to be done for IPIP nexthops. The RIF for these is either available from the get-go, or will never be available, so no after-the-fact offloading needs to be done. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
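The refresh step can be illustrated with a toy Python model. All names here (`ToyCrif`, `ToyNexthop`, `ToyGroup`, `rif_made()`) are hypothetical, and refreshing each group only once per RIF creation is an assumption of this sketch, not something the commit message states.

```python
class ToyGroup:
    def __init__(self):
        self.nexthops = []
        self.refreshes = 0
        self.offloaded = 0

    def refresh(self):
        # Recount which next hops are now backed by a RIF.
        self.refreshes += 1
        self.offloaded = sum(1 for nh in self.nexthops if nh.crif.rif is not None)


class ToyCrif:
    def __init__(self, dev):
        self.dev = dev
        self.rif = None       # no RIF until one is made
        self.nexthops = []    # next hops are tracked here, RIF or not


class ToyNexthop:
    def __init__(self, crif, group):
        self.crif = crif
        self.group = group
        crif.nexthops.append(self)
        group.nexthops.append(self)


def rif_made(crif, rif):
    crif.rif = rif
    seen = []
    for nh in crif.nexthops:
        if not any(nh.group is g for g in seen):
            seen.append(nh.group)
            nh.group.refresh()
```

Because the next hops hang off the CRIF, they can exist before the RIF does, and creating the RIF is enough to find and refresh every group that needs it.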
2023-07-21 | mlxsw: spectrum_router: Join RIFs of LAG upper VLANs | Petr Machata
In the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join not only RIF for the immediate LAG, as is currently the case, but also RIFs for VLAN netdevices upper to the LAG. In this patch, extend mlxsw_sp_netdevice_router_join_lag() to walk the uppers of a LAG being joined, and also join any VLAN ones. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 | mlxsw: spectrum_switchdev: Replay switchdev objects on port join | Petr Machata
Currently it never happens that a netdevice that is already a bridge slave would suddenly become mlxsw upper. The only case where this might be possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG has any upper (e.g. is enslaved), enslaving mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and a mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. At that point there are no bridge objects (such as port VLANs) to replay. Those are added afterwards, and notified as they are created. This holds even for the PVID. However in the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to replay the existing bridge objects. Without this replay, e.g. the mlxsw bridge_port_vlan objects are not instantiated, which causes issues later, as a lot of code relies on their presence. To that end, add a new notifier block whose sole role is to filter out events related to the one relevant upper, and forward those to the existing switchdev notifier block. Pass the new notifier block to switchdev_bridge_port_offload() when the bridge port is created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
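The filter-and-forward notifier can be sketched as follows. This is a toy Python model under stated assumptions: `ToyBridge`, its `replay()` method and `make_filtering_notifier()` are hypothetical names, and events are plain dictionaries rather than switchdev notifier payloads.

```python
class ToyBridge:
    """Holds objects (here: port VLANs) that already exist on the bridge."""

    def __init__(self):
        self.port_vlans = []  # list of (bridge port name, VID)

    def replay(self, notifier):
        # Re-deliver every existing object as if it had just been added.
        for dev, vid in self.port_vlans:
            notifier({"dev": dev, "vid": vid})


def make_filtering_notifier(relevant_upper, forward):
    """Return a notifier that forwards only events for one upper device."""

    def notifier(event):
        if event["dev"] != relevant_upper:
            return False  # unrelated bridge port: ignore the event
        forward(event)
        return True

    return notifier
```

The existing handler (`forward` here) does not need to know that a replay is in progress; the wrapper only narrows the stream of replayed events to the one upper being joined.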
2023-07-21 | mlxsw: spectrum: On port enslavement to a LAG, join upper's bridges | Petr Machata
Currently it never happens that a netdevice that is already a bridge slave would suddenly become mlxsw upper. The only case where this might be possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG already has an upper, enslaving mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and a mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. However in the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join bridges of LAG uppers. Without this replay, the mlxsw bridge_port objects are not instantiated, which causes issues later, as a lot of code relies on their presence. Therefore in this patch, when the first mlxsw physical netdevice is enslaved to a LAG, consider bridges upper to the LAG (both the direct master, if any, and any bridge masters of VLAN uppers), and have the relevant netdevices join their bridges. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 | mlxsw: spectrum: Add a replay_deslavement argument to event handlers | Petr Machata
When handling deslavement of LAG or its upper from a bridge device, when the deslaved netdevice has an IP address, it should join the router. This should be done after all the lowers of the LAG have left the bridge. The replay intended to cause the device to join the router therefore cannot be invoked unconditionally in the event handlers themselves. It can be done right away if the handler is invoked for a sole device, but when it is invoked repeatedly for each LAG lower, the replay needs to be postponed until after this processing is done. To that end, add a boolean parameter, replay_deslavement, to mlxsw_sp_netdevice_port_upper_event(), mlxsw_sp_netdevice_port_vlan_event() and one helper on the call path. Have the invocations that are done for sole netdevices pass true, and those done for LAG lowers pass false. Nothing depends on this flag at this point, but it removes some noise from the patch that introduces the replay itself. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 | mlxsw: spectrum: Allow event handlers to check unowned bridges | Petr Machata
Currently the bridge-related handlers bail out when the event is related to a netdevice that is not an upper of one of the front-panel ports. In order to allow enslavement of front-panel ports to bridges that already have uppers, it will be necessary to replay CHANGEUPPER events to validate that the configuration is offloadable. In order for the replay to be effective, it must be possible to ignore unsupported configuration in the context of an actual notifier event, but to still "veto" these configurations when the validation is performed. To that end, introduce two parameters to a number of handlers: mlxsw_sp, because it will not be possible to deduce that from the netdevice lowers; and process_foreign to indicate whether netdevices that are not front panel uppers should be validated. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 | mlxsw: spectrum: Split a helper out of mlxsw_sp_netdevice_event() | Petr Machata
Move the meat of mlxsw_sp_netdevice_event() to a separate function that does just the validation. This separate helper will be possible to call later for recursive ascent when validating attachment of a front panel port to a bridge with uppers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 | mlxsw: spectrum_router: Extract a helper to schedule neighbour work | Petr Machata
This will come in handy for neighbour replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 | mlxsw: spectrum_router: Allow address handlers to run on bridge ports | Petr Machata
Currently the IP address event handlers bail out when the event is related to a netdevice that is a bridge port or a member of a LAG. In order to create a RIF when a bridged or LAG'd port is unenslaved, these event handlers will be replayed. However, at the point in time when the NETDEV_CHANGEUPPER event is delivered, informing of the loss of enslavement, the port is still formally enslaved. In order for the operation to have any effect, these handlers need an extra parameter to indicate that the check for bridge or LAG membership should not be done. In this patch, add an argument "nomaster" to several event handlers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14 | mlxsw: spectrum_switchdev: Manage RIFs on PVID change | Petr Machata
Currently, mlxsw has several shortcomings with regards to RIF handling due to PVID changes: - In order to cause RIF for a bridge device to be created, the user is expected first to set PVID, then to add an IP address. The reverse ordering is disallowed, which is not very user-friendly. - When such bridge gets a VLAN upper whose VID was the same as the existing PVID, and this VLAN netdevice gets an IP address, a RIF is created for this netdevice. The new RIF is then assigned to the 802.1Q FID for the given VID. This results in a working configuration. However, then, when the VLAN netdevice is removed again, the RIF for the bridge itself is never reassociated to the VLAN. - PVID cannot be changed once the bridge has uppers. Presumably this is because the driver does not manage RIFs properly in face of PVID changes. However, as the previous point shows, it is still possible to get into invalid configurations. In this patch, add the logic necessary for creation of a RIF as a result of PVID change. Moreover, when a VLAN upper is created whose VID matches lower PVID, do not create RIF for this netdevice. These changes obviate the need for ordering of IP address additions and PVID configuration, so stop forbidding addition of an IP address to a PVID-less bridge. Instead, bail out quietly. Also stop preventing PVID changes when the bridge has uppers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14 | mlxsw: spectrum_router: mlxsw_sp_inetaddr_bridge_event: Add an argument | Petr Machata
For purposes of replay, mlxsw_sp_inetaddr_bridge_event() will need to make decisions based on the proposed value of PVID. Querying PVID reveals the current settings, not the in-flight values that the user requested and that the notifiers are acting upon. Add a parameter, lower_pvid, which carries the proposed PVID of the lower bridge, or -1 if the lower is not a bridge. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14 | mlxsw: spectrum_router: Adjust mlxsw_sp_inetaddr_vlan_event() coding style | Petr Machata
The bridge branch of the dispatch in this function is going to get more code and will need curly braces. Per the doctrine, that means the whole if-else chain should get them. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14 | mlxsw: spectrum_router: Take VID for VLAN FIDs from RIF params | Petr Machata
Currently, when an IP address is added to a bridge that has no PVID, the operation is rejected. An IP address addition is interpreted as a request to create a RIF for the bridge device, but without a PVID there is no VLAN for which the RIF should be created. Thus the correct way to create a RIF for a bridge as a user is to first add a PVID, and then add the IP address. Ideally this ordering requirement would not exist. RIF would be created either because an IP address is added, or because a PVID is added, depending on which comes last. For that, the switchdev code (which notices the PVID change request) must be able to request that a RIF is created with a given VLAN ID, because at the time that the PVID notification is distributed, the PVID setting is not yet visible for querying. Therefore when creating a VLAN-based RIF, use mlxsw_sp_rif_params.vid to communicate the VID, and do not determine it ad-hoc in the fid_get callback. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
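The "whichever comes last" behaviour can be illustrated with a toy Python model. `ToyBridgeRif` and its methods are hypothetical; the point of the sketch is only that the PVID handler passes the proposed VID explicitly, because the new setting is not yet visible for querying while the notification is being processed. Changing the PVID of a bridge that already has a RIF is not modelled.

```python
class ToyBridgeRif:
    """Toy model: a RIF appears once both an IP address and a PVID exist."""

    def __init__(self):
        self.has_ip = False
        self.pvid = None      # what a query would return
        self.rif_vid = None   # VID the RIF was created with, if any

    def _maybe_create_rif(self, vid):
        # The VID comes in as a parameter instead of being looked up ad hoc.
        if self.has_ip and vid is not None and self.rif_vid is None:
            self.rif_vid = vid

    def add_ip(self):
        self.has_ip = True
        self._maybe_create_rif(self.pvid)  # bails out quietly without a PVID

    def set_pvid(self, vid):
        # During the notification self.pvid still holds the old value,
        # so the proposed VID has to be handed over explicitly.
        self._maybe_create_rif(vid)
        self.pvid = vid
```

Both orderings, address first or PVID first, end with the same RIF in this model, which is the property the commit message is working towards.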
2023-07-14 | mlxsw: spectrum_router: Pass struct mlxsw_sp_rif_params to fid_get | Petr Machata
The fid_get callback is called to allocate a FID for the newly-created RIF. In a following patch, the fid_get implementation for VLANs will be modified to take the VLAN ID from the parameters instead of deducing it from the netdevice. To that end, propagate the RIF parameters to the fid_get callback. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14 | mlxsw: spectrum_switchdev: Pass extack to mlxsw_sp_br_ban_rif_pvid_change() | Petr Machata
Currently the reason for rejection of PVID manipulation is dumped to syslog, and a generic -EBUSY is returned to the userspace. But switchdev_handle_port_obj_add(), through which we get to mlxsw_sp_port_vlans_add(), handles extack just fine, and we can pass the message this way. This improves visibility into reasons why the request to change PVID was rejected. Before the change: # bridge vlan add dev br vid 2 self pvid untagged RTNETLINK answers: Device or resource busy (plus a syslog line) After the change: # bridge vlan add dev br vid 2 self pvid untagged Error: mlxsw_spectrum: Can't change PVID, it's used by router interface. Note that this particular error message is going away in the following patches. However the ability to pass error messages through extack will be useful more broadly for communicating in particular reasons why a RIF failed to be created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-12 | mlxsw: spectrum_flower: Add ability to match on port ranges | Ido Schimmel
Add the ability to match on port ranges by utilizing the previously added port range registers and the port range key element. Up to two port range registers can be used for each filter, one for source port and another for destination port. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/df4385a9592917e9a22ebff339e0463e4a8dfa82.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 | mlxsw: spectrum_acl: Pass main driver structure to mlxsw_sp_acl_rulei_destroy() | Ido Schimmel
The main driver structure will be needed in this function by a subsequent patch, so pass it. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/24d96a4e21310e5de2951ace58263db35e44a0df.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 | mlxsw: spectrum_acl: Add port range key element | Ido Schimmel
Add the port range key element to supported key blocks so that it could be used to match on the output of the port range registers. Each bit in the element can be used to match on the output of the port range register with the corresponding index. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/f0423f6ee9e36c6b0a426bc9995f42223c48f2db.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
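The bit-per-register idea can be sketched in a few lines of Python. This is an illustration only: `port_range_key()` and `matches()` are hypothetical helpers, and the key/mask comparison is an assumed masked-match model rather than a description of the hardware lookup.

```python
def port_range_key(reg_indexes):
    """Build a (key, mask) pair with one bit per port range register index."""
    key = 0
    for index in reg_indexes:
        key |= 1 << index
    # Only the bits of the registers this filter cares about are compared.
    mask = key
    return key, mask


def matches(register_outputs, key, mask):
    """register_outputs has bit i set if register i matched the packet."""
    return (register_outputs & mask) == key
```

A filter that needs registers 0 and 3 gets the key `0b1001`; a packet matches only if both of those registers report a hit, regardless of what the other registers report.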
2023-07-12 | mlxsw: spectrum_port_range: Add devlink resource support | Ido Schimmel
Expose via devlink-resource the maximum number of port range registers and their current occupancy. Besides the observability benefits, this resource will be used by subsequent patches for scale and occupancy tests. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/7945e0c715dc5efb1617f45f7560c1f1bd0bcf8a.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 | mlxsw: spectrum_port_range: Add port range core | Ido Schimmel
The Spectrum ASICs have a fixed number of port range registers, each of which maintains the following parameters: * Minimum and maximum port. * Apply port range for source port, destination port or both. * Apply port range for TCP, UDP or both. * Apply port range for IPv4, IPv6 or both. Implement a port range core which takes care of the allocation and configuration of these registers and exposes an API that allows in-driver consumers (e.g., the ACL code) to request matching on a range of either source or destination port. These registers are going to be used for port range matching in the flower classifier that already matches on EtherType being IPv4 / IPv6 and IP protocol being TCP / UDP. As such, there is no need to limit these registers to a specific EtherType or IP protocol, which will increase the likelihood of a register being shared by multiple flower filters. It is unlikely that a filter will match on the same range of both source and destination ports, which is why each register is only configured to match on either source or destination port. If a filter requires matching on a range of both source and destination ports, it will utilize two port range registers and match on the output of both. For efficient lookup and traversal, use XArray to store the allocated port range registers. The XArray uses RCU and an internal spinlock to synchronise access, so there is no need for a dedicated lock. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/674f00539a0072d455847663b5feb504db51a259.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
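The allocation and sharing behaviour can be modelled roughly as below. This is a toy Python sketch, not the driver's API: `ToyPortRangeCore`, `get()`, `put()` and `occupancy()` are hypothetical names, a plain dictionary stands in for the XArray, and reference counting is an assumed way of implementing the sharing the commit message mentions.

```python
class ToyPortRangeCore:
    """Toy allocator for a fixed pool of port range registers."""

    def __init__(self, max_regs):
        self.max_regs = max_regs
        self.regs = {}  # register index -> [range key, reference count]

    def get(self, min_port, max_port, source):
        key = (min_port, max_port, source)
        # Share an existing register configured with the same range.
        for index, entry in self.regs.items():
            if entry[0] == key:
                entry[1] += 1
                return index
        # Otherwise take the first free register, if any is left.
        for index in range(self.max_regs):
            if index not in self.regs:
                self.regs[index] = [key, 1]
                return index
        raise RuntimeError("no free port range register")

    def put(self, index):
        entry = self.regs[index]
        entry[1] -= 1
        if entry[1] == 0:
            del self.regs[index]  # last user gone: free the register

    def occupancy(self):
        return len(self.regs)
```

In this model two filters asking for the same source port range end up on one register, while a filter matching on both a source and a destination range calls `get()` twice and holds two registers.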
2023-07-12 | mlxsw: resource: Add resource identifier for port range registers | Ido Schimmel
Add a resource identifier for maximum number of layer 4 port range registers so that it could be later used to query the information from firmware. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/59a8fec353d5ad9fbfb7612e4a7ff61eaedad445.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 | mlxsw: reg: Add Policy-Engine Port Range Register | Ido Schimmel
Add the Policy-Engine Port Range Register that is used for configuring port range identification. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/d1a1f53d758f7452cf5abfe006b23496076ec3e6.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-04 | mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check | Dan Carpenter
The mlxsw_sp_crif_alloc() function returns NULL on error. It doesn't return error pointers. Fix the check. Fixes: 78126cfd5dc9 ("mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-29 | mlxsw: minimal: fix potential memory leak in mlxsw_m_linecards_init | Zhengchao Shao
The line cards array is not freed in the error path of mlxsw_m_linecards_init(), which can lead to a memory leak. Fix by freeing the array in the error path, thereby making the error path identical to mlxsw_m_linecards_fini(). Fixes: 01328e23a476 ("mlxsw: minimal: Extend module to port mapping with slot index") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230630012647.1078002-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23 | mlxsw: spectrum_router: Track next hops at CRIFs | Petr Machata
Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The reason is that eventually, next hops for mlxsw uppers should be offloaded and unoffloaded on demand as a netdevice becomes an upper, or stops being one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a netdevice is not an mlxsw upper. CRIFs are kept track of throughout the netdevice lifetime. Correspondingly, track at each next hop not its RIF, but its CRIF (from which a RIF can always be deduced). Note that now that next hops are tracked at a CRIF, it is not necessary to move each over to a new RIF when it is necessary to edit a RIF. Therefore drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy() call mlxsw_sp_nexthop_rif_update() directly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23 | mlxsw: spectrum_router: Split nexthop finalization to two stages | Petr Machata
Nexthop finalization consists of two steps: the part where the offload is removed, because the backing RIF is now gone; and the part where the association to the RIF is severed. Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later be called independently. Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini() vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a historical happenstance than a conscious decision. The two cleanups do not depend on each other, and this change should have no observable effects. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23 | mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_index | Petr Machata
A previous patch added a pointer to loopback CRIF to the router data structure. That makes the loopback RIF index redundant, as everything necessary can be derived from the CRIF. Drop the field and adjust the code accordingly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23 | mlxsw: spectrum_router: Link CRIFs to RIFs | Petr Machata
When a RIF is about to be created, the registration of the netdevice that it should be associated with must have been seen in the past, and a CRIF created. Therefore make this a hard requirement by looking up the CRIF during RIF creation, and complaining loudly when there isn't one. This then makes it possible to keep a link between a RIF and its corresponding CRIF (and back, as the relationship is one-to-at-most-one), which this patch does. The CRIF will later be useful as the objects tracked there will be offloaded lazily as a result of RIF creation. CRIFs are created when an "interesting" netdevice is registered, and destroyed after such device is unregistered. CRIFs are supposed to already exist when a RIF creation request arises, and exist at least as long as that RIF exists. This makes for a simple invariant: it is always safe to dereference CRIF pointer from "its" RIF. To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER event is delivered. The reason is that if a RIF's netdevice has an IPv6 address, removal of this address is notified in an atomic block. To remove the RIF, the IPv6 removal handler schedules a work item. It must be safe for this work item to access the associated CRIF as well. Thus when a netdevice that backs the CRIF is removed, if it still has a RIF, do not actually free the CRIF, only toggle its can_destroy flag, which this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
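The deferred destruction of a CRIF can be sketched with a toy Python model. `ToyCrifTable` and its methods are hypothetical names, RIFs are plain dictionaries, and a `KeyError` stands in for "complaining loudly"; only the `can_destroy` flag is taken from the commit message.

```python
class ToyCrif:
    def __init__(self, dev):
        self.dev = dev
        self.rif = None
        self.can_destroy = False


class ToyCrifTable:
    def __init__(self):
        self.crifs = {}  # keyed by netdevice

    def netdev_register(self, dev):
        self.crifs[dev] = ToyCrif(dev)

    def rif_create(self, dev):
        crif = self.crifs[dev]  # KeyError here models the hard requirement
        crif.rif = {"dev": dev, "crif": crif}
        return crif.rif

    def netdev_unregister(self, dev):
        crif = self.crifs[dev]
        if crif.rif is not None:
            # A RIF still points at this CRIF: only mark it for collection.
            crif.can_destroy = True
        else:
            del self.crifs[dev]

    def rif_destroy(self, rif):
        crif = rif["crif"]
        crif.rif = None
        if crif.can_destroy:
            del self.crifs[crif.dev]  # collect the CRIF once the RIF is gone
```

The invariant from the commit message holds in the model: as long as a RIF exists, `rif["crif"]` refers to a CRIF that is still in the table.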
2023-06-23 | mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF | Petr Machata
CRIFs are generally not maintained for loopback RIFs. However, the RIF for the default VRF is used for offloading of blackhole nexthops. Nexthops expect to have a valid CRIF. Therefore in this patch, add code to maintain CRIF for the loopback RIF as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23 | mlxsw: spectrum_router: Maintain a hash table of CRIFs | Petr Machata
CRIFs are objects that mlxsw maintains for netdevices that may not have an associated RIF (i.e. they may not have been instantiated in the ASIC), but if indeed they do not, it is quite possible they will in the future. These netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are created include e.g. bridges, LAGs, or front panel ports. The idea is that next hops would be kept at CRIFs, not RIFs, and thus it would be easier to offload and unoffload the entities that have been added before the RIF was created. In this patch, add the code for low-level CRIF maintenance: create and destroy, and keep in a table keyed by the netdevice pointer for easy recall. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23 | mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIF | Petr Machata
The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the function mentioned in the subject. As such it forms an external interface of the router code. In future patches we will want to maintain connection between RIFs and the CRIFs (introduced in the next patch) that back them. That will not hold for the VRF-based loopback netdevices, so the whole CRIF business can be kept hidden from the rest of mlxsw. But for the main VRF loopback RIF we do want to keep the RIF-CRIF connection, because that RIF is used for blackhole next hops, and the next hop code can be kept simpler for assuming rif->crif is valid. Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback RIF. This being an internal function will take the CRIF argument anyway. Furthermore, the function does not lock, which is not necessary at this point in code yet. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23 | mlxsw: spectrum_router: Add extack argument to mlxsw_sp_lb_rif_init() | Petr Machata
The extack will be handy in later patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e87ba300121010d580b80a281877573a7b1377ca.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14 | mlxsw: spectrum_router: Move IPIP init up | Petr Machata
mlxsw will need to keep track of certain devices that are not related to any of its front panel ports. This includes IPIP netdevices. To be able to query the list of supported IPIP types, router->ipip_ops_arr needs to be initialized. To that end, move the IPIP initialization up (and finalization correspondingly down). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14 | mlxsw: spectrum_router: Extract a helper for RIF migration | Petr Machata
RIF configuration contains a number of parameters that cannot be changed after the RIF is created. For the IPIP loopbacks, this is currently worked around by creating a new RIF with the desired configuration changes applied, and updating next hops to the new RIF, and then destroying the old RIF. This operation will be useful as a reusable atom, so extract a helper to that effect. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Add a helper to check if netdev has addressesPetr Machata
This function will be useful later as the driver will need to retroactively create RIFs for new uppers with addresses. Add another helper that assumes RCU lock, and restructure the code to skip the IPv6 branch not through conditioning on the addr_list_empty variable, but by directly returning the result value. This makes the skip more obvious than it previously was. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Extract a helper to free a RIFPetr Machata
Right now freeing the object that mlxsw uses to keep track of a RIF is as simple as calling a kfree. But later on as CRIF abstraction is brought in, it will involve severing the link between CRIF and its RIF as well. Better to have the logic encapsulated in a helper. Since a helper is being introduced, make it a full-fledged destructor and have it validate that the objects tracked at the RIF have been released. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
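A minimal user-space sketch of the "full-fledged destructor" idea described above: the free helper checks that nothing is still tracked at the object before releasing it. The names `struct rif_obj`, `rif_obj_alloc()` and `rif_obj_free()` are hypothetical, and the two counters are an assumption about what might be tracked at a RIF. The driver's actual helper is not shown in this log, and in kernel code the check would more likely emit a warning than return a value.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object that tracks a RIF. */
struct rif_obj {
	size_t nexthop_count;	/* assumed: next hops still linked to the RIF */
	size_t neigh_count;	/* assumed: neighbours still linked to the RIF */
};

/* Allocate a zeroed object, so a fresh RIF tracks nothing. */
struct rif_obj *rif_obj_alloc(void)
{
	return calloc(1, sizeof(struct rif_obj));
}

/*
 * Destructor: validate that everything tracked at the RIF has been
 * released, then free the object. Returns true if the RIF was clean.
 */
bool rif_obj_free(struct rif_obj *rif)
{
	bool clean = rif->nexthop_count == 0 && rif->neigh_count == 0;

	if (!clean)
		fprintf(stderr, "rif_obj_free: RIF still has tracked objects\n");
	free(rif);
	return clean;
}
```

Keeping the validation inside the destructor means every call site gets the check for free, and later steps (such as severing a CRIF link) have one place to live.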
2023-06-14mlxsw: spectrum_router: Access nhgi->rif through a helperPetr Machata
To abstract away deduction of RIF from the corresponding next hop group info (NHGI), mlxsw currently uses a macro. In its current form, that macro is impossible to extend to more general computation. Therefore introduce a helper, mlxsw_sp_nhgi_rif(), and use it throughout. This will make it possible to change the deduction path easily later on. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Access nh->rif->dev through a helperPetr Machata
In order to abstract away deduction of netdevice from the corresponding next hop, introduce a helper, mlxsw_sp_nexthop_dev(), and use it throughout. This will make it possible to change the deduction path easily later on. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Access rif->dev from params in mlxsw_sp_rif_create()Petr Machata
The previous patch added a helper to access a netdevice given a RIF. Using this helper in mlxsw_sp_rif_create() is unreasonable: the netdevice was given in RIF creation parameters. Just take it there. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Access rif->dev through a helperPetr Machata
In order to abstract away deduction of netdevice from the corresponding RIF, introduce a helper, mlxsw_sp_rif_dev(), and use it throughout. This will make it possible to change the deduction path easily later on. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
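A minimal sketch of the accessor pattern this commit describes, using stand-in types rather than the driver's real structures. `struct net_device_stub`, `struct rif_stub` and `rif_stub_dev()` are hypothetical names for illustration only. The commit names mlxsw_sp_rif_dev(), whose real body is not shown here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a netdevice. */
struct net_device_stub {
	const char *name;
};

/* Hypothetical stand-in for a RIF that points directly at its netdevice. */
struct rif_stub {
	struct net_device_stub *dev;
};

/*
 * Single point through which callers obtain the netdevice of a RIF.
 * If the netdevice is later reached through another object, only this
 * function body has to change, not every caller.
 */
struct net_device_stub *rif_stub_dev(const struct rif_stub *rif)
{
	return rif->dev;
}
```

Callers would then write `rif_stub_dev(rif)` instead of `rif->dev`, which is what makes the deduction path easy to change later.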
2023-06-14mlxsw: spectrum_router: Add a helper specifically for joining a LAGPetr Machata
Currently, joining a LAG very simply means that the LAG RIF should be joined by the subport representing untagged traffic. If the RIF does not exist, it does not have to be created: if the user wants there to be a RIF for the LAG device, they are supposed to add an IP address, and they are supposed to do it after the LAG becomes an mlxsw upper. We can also assume that the LAG has no uppers, otherwise the enslavement is not allowed. In the future, these ordering dependencies should be removed. That means that joining a LAG will be a more complex operation, possibly involving a lazy RIF creation, and possibly joining / lazily creating RIFs for VLAN uppers of the LAG. It will be handy to have a dedicated function that handles all this. The new function mlxsw_sp_router_port_join_lag() is that. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Extract a helper from mlxsw_sp_port_vlan_router_join()Petr Machata
Split out of mlxsw_sp_port_vlan_router_join() the part that checks for RIF and dispatches to __mlxsw_sp_port_vlan_router_join(), leaving it as a wrapper that just manages the router lock. The new function, mlxsw_sp_port_vlan_router_join_existing(), will be useful as an atom in later patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-12net: mlxsw: i2c: Switch back to use struct i2c_driver's .probe()Uwe Kleine-König
After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new() call-back type"), all drivers were converted to .probe_new(). Now that commit 03c835f498b5 ("i2c: Switch .probe() to not take an id parameter") has followed, convert back to (the new) .probe() to be able to eventually drop .probe_new() from struct i2c_driver. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Privatize mlxsw_sp_rif_dev()Petr Machata
Now that the external users of mlxsw_sp_rif_dev() have been converted in the preceding patches, make the function static. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: Convert does-RIF-have-this-netdev queries to a dedicated helperPetr Machata
In a number of places, a netdevice underlying a RIF is obtained only to compare it to another pointer. In order to clean up the interface between the router and the other modules, add a new helper to specifically answer this question, and convert the relevant uses to this new interface. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: Convert RIF-has-netdevice queries to a dedicated helperPetr Machata
In a number of places, a netdevice underlying a RIF is obtained only to check if it is a NULL pointer. In order to clean up the interface between the router and the other modules, add a new helper to specifically answer this question, and convert the relevant uses to this new interface. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
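The two query helpers described in this commit and the one listed just above it can be sketched roughly as follows. This is an illustration on stand-in types, not the driver's code: `struct netdev_sketch`, `struct rif_sketch`, `rif_sketch_has_dev()` and `rif_sketch_dev_is()` are hypothetical names chosen for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a netdevice and a RIF. */
struct netdev_sketch {
	int ifindex;
};

struct rif_sketch {
	const struct netdev_sketch *dev;
};

/* "Does this RIF have a netdevice at all?" replaces open-coded NULL checks. */
bool rif_sketch_has_dev(const struct rif_sketch *rif)
{
	return rif->dev != NULL;
}

/* "Is this RIF backed by this particular netdevice?" replaces pointer compares. */
bool rif_sketch_dev_is(const struct rif_sketch *rif,
		       const struct netdev_sketch *dev)
{
	return rif->dev == dev;
}
```

With helpers of this shape, other modules never need the netdevice pointer itself for these checks, which is what allows the raw accessor to be made private afterwards.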