summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
AgeCommit message (Collapse)Author
2024-03-11mlxsw: spectrum_router: Share nexthop counters in resilient groupsPetr Machata
For resilient groups, we can reuse the same counter for all the buckets that share the same nexthop. Keep a reference count per counter, and keep all these counters in a per-next hop group xarray, which serves as a NHID->counter cache. If a counter is already present for a given NHID, just take a reference and use the same counter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/cdd00084533fc83ac5917562f54642f008205bf3.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Support nexthop group hardware statisticsPetr Machata
When hw_stats is set on a group, install nexthop counters on members of a group. Counter allocation request is moved from nexthop object initialization to the update code. The previous placement made sense: when the counters are enabled by dpipe, the counters are installed to all existing nexthops and all nexthops created from then on get them. For the finer-grained nexthop group statistics, this is unsuitable. The existing placement was kept for the IPv4 and IPv6 nexthops. Resilient group replacement emits a pre_replace notification, and then any bucket_replace notifications if there were any replacements at all. If the group is balanced and the nexthop composition of the replaced group didn't change, there will be no such notifiers. Therefore hook to the pre_replace notifier and mark all buckets for update, to un/install the counters. When reporting deltas for resilient groups, use the nexthop ID that we stored in a previous patch to look up to which nexthop a bucket contributes. Co-developed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/87495a72f187df2e5d491d02729c550d235fcc85.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Track NH ID's of group membersPetr Machata
The core interfaces for collecting per-NH statistics are built around nexthops even for resilient groups. Because mlxsw models each bucket as a nexthop, the core next hop that a given bucket contributes to needs to be looked up. In order to be able to match the two up, we need to track nexthop ID for members of group nexthop objects. For simplicity, do it for all nexthop objects, not just group members. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/184ceb6b154e08f5bcf116a705b0fcb01c31895c.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Add helpers for nexthop countersPetr Machata
The next patch will add the ability to share nexthop counters among mlxsw nexthops backed by the same core nexthop. To have a place to store reference count, the counter should be kept in a dedicated structure. In this patch, introduce the structure together with the related helpers, sans the refcount, which comes in the next patch. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/61f23fa4f8c5d7879f68dacd793d8ab7425f33c0.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Avoid allocating NH counters twicePetr Machata
mlxsw_sp_nexthop_counter_disable() decays to a nop when called on a disabled counter, but mlxsw_sp_nexthop_counter_enable() can't similarly be called on an enabled counter. This would be useful in the following patches. Add the missing condition. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/0cc9050e196366c1387ab5ee47f1cee8ecde9c86.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum: Allow fetch-and-clear of flow countersPetr Machata
For the report_delta-like interface like a previous patch has added for collection of NH group statistics, it's easiest to read the counter and have the HW clear it right away. Thus, change mlxsw_sp_flow_counter_get() to take a bool indicating whether this should be done. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/6a096ede8ee92d5041e3832242c3bbc137198aba.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Have mlxsw_sp_nexthop_counter_enable() return intPetr Machata
In order to be able to diagnose failures in counter allocation, have the function mlxsw_sp_nexthop_counter_enable() return an error code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/e0bb5c0cc6234ade2ade1e92abac991359c3f446.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Rename two functionsPetr Machata
The function mlxsw_sp_nexthop_counter_alloc() doesn't directly allocate anything, and mlxsw_sp_nexthop_counter_free() doesn't directly free. For the following patches, we will need names for functions that actually do those things. Therefore rename to mlxsw_sp_nexthop_counter_enable() and mlxsw_sp_nexthop_counter_disable() to free up the namespace. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/f59272958697a718f090f59f892d32beabcd8972.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-30mlxsw: Use refcount_t for reference countingAmit Cohen
mlxsw driver uses 'unsigned int' for reference counters in several structures. Instead, use refcount_t type which allows us to catch overflow and underflow issues. Change the type of the counters and use the appropriate API. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-18mlxsw: spectrum_router: Register netdevice notifier before nexthopPetr Machata
If there are IPIP nexthops at the time when the driver is loaded (or the devlink instance reloaded), the driver looks up the corresponding IPIP entry. But IPIP entries are only created as a result of netdevice notifications. Since the netdevice notifier is registered after the nexthop notifier, mlxsw_sp_nexthop_type_init() never finds the IPIP entry, registers the nexthop MLXSW_SP_NEXTHOP_TYPE_ETH, and fails to assign a CRIF to the nexthop. Later on when the CRIF is necessary, the WARN_ON in mlxsw_sp_nexthop_rif() triggers, causing the splat [1]. In order to fix the issue, reorder the netdevice notifier to be registered before the nexthop one. [1] (edited for clarity): WARNING: CPU: 1 PID: 1364 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3245 mlxsw_sp_nexthop_rif (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3246 (discriminator 1)) mlxsw_spectrum Hardware name: Mellanox Technologies Ltd. MSN4410/VMOD0010, BIOS 5.11 01/06/2019 Call Trace: ? mlxsw_sp_nexthop_rif (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3246 (discriminator 1)) mlxsw_spectrum __mlxsw_sp_nexthop_eth_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3637) mlxsw_spectrum mlxsw_sp_nexthop_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3679 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3727) mlxsw_spectrum mlxsw_sp_nexthop_group_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3757) mlxsw_spectrum mlxsw_sp_nexthop_group_refresh (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4112) mlxsw_spectrum mlxsw_sp_nexthop_obj_event (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5118 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5191 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5315 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5500) mlxsw_spectrum nexthops_dump (net/ipv4/nexthop.c:217 net/ipv4/nexthop.c:440 net/ipv4/nexthop.c:3609) register_nexthop_notifier (net/ipv4/nexthop.c:3624) mlxsw_sp_router_init (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:11486) mlxsw_spectrum mlxsw_sp_init (drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3267) mlxsw_spectrum __mlxsw_core_bus_device_register (drivers/net/ethernet/mellanox/mlxsw/core.c:2202) mlxsw_core mlxsw_devlink_core_bus_device_reload_up (drivers/net/ethernet/mellanox/mlxsw/core.c:2265 drivers/net/ethernet/mellanox/mlxsw/core.c:1603) mlxsw_core devlink_reload (net/devlink/dev.c:314 net/devlink/dev.c:475) [...] Fixes: 9464a3d68ea9 ("mlxsw: spectrum_router: Track next hops at CRIFs") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/74edb8d45d004e8d8f5318eede6ccc3d786d8ba9.1705502064.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21mlxsw: spectrum_router: Call RIF setup before obtaining FIDPetr Machata
For subport RIFs, the setup initializes, among other things, RIF port and LAG numbers. Those are important to determine where in the PGT the RIF FID will be stored. Therefore, call the RIF setup before fid_get. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/f24d8cad7e4748b8e8e0e16894ca6a20704dea32.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21mlxsw: spectrum_router: Add a helper to get subport number from a RIFPetr Machata
In the CFF flood mode, responsibility for management of the PGT entries for rFIDs is moved from FW to the driver. All rFIDs are based off either a front panel port, or a LAG port. The flood vectors for port-based rFIDs enable just the port itself, the ones for LAG-based rFIDs enable all member ports of the LAG in question. Since all rFIDs based off the same port have the same flood vector, and similarly for LAG-based rFIDs, the flood entries are shared. The PGT address of the flood vector is therefore determined based on the port (or LAG) number of the RIF connected with the rFID. Add a helper to determine subport number given a RIF, to be used in these calculations. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/d7ab43cf5b021f785f363f236e4b6780d10eea93.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02mlxsw: spectrum_router: Annotate struct mlxsw_sp_nexthop_group_info with ↵Kees Cook
__counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct mlxsw_sp_nexthop_group_info. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Petr Machata <petrm@nvidia.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230929180746.3005922-4-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28mlxsw: spectrum_router: IPv6 events: Use tracker helpers to hold & put ↵Petr Machata
netdevices Using the tracking helpers makes it easier to debug netdevice refcount imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled. Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the router code that deals with IPv6 address events. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/f0af6ad4722b4ca6e598fd4fda8311a3041651ec.1690471775.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28mlxsw: spectrum_router: RIF: Use tracker helpers to hold & put netdevicesPetr Machata
Using the tracking helpers makes it easier to debug netdevice refcount imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled. Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the router code that deals with RIF allocation. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/8b7701a7b439ac268e4be4040eff99d01e27ae47.1690471775.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28mlxsw: spectrum_router: hw_stats: Use tracker helpers to hold & put netdevicesPetr Machata
Using the tracking helpers makes it easier to debug netdevice refcount imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled. Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the router code that deals with hw_stats events. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/b972314cfef4f4c24e66e60d13cffa5d606d1bf3.1690471774.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28mlxsw: spectrum_router: FIB: Use tracker helpers to hold & put netdevicesPetr Machata
Using the tracking helpers makes it easier to debug netdevice refcount imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled. Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the router code that deals with FIB events. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/5221a92e751c40447c55959f622267ccc999ed04.1690471774.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-21mlxsw: spectrum_router: Replay IP NETDEV_UP on device deslavementPetr Machata
When a netdevice is removed from a bridge or a LAG, and it has an IP address, it should join the router and gain a RIF. Do that by replaying address addition event on the netdevice. When handling deslavement of LAG or its upper from a bridge device, the replay should be done after all the lowers of the LAG have left the bridge. Thus these scenarios are handled by passing replay_deslavement of false, and by invoking, after the lowers have been processed, a new helper, mlxsw_sp_netdevice_post_lag_event(), which does the per-LAG / -upper handling, and in particular invokes the replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Replay IP NETDEV_UP on device enslavementPetr Machata
Enslaving of front panel ports (and their uppers) to netdevices that already have uppers is currently forbidden. When this is permitted, any uppers with IP addresses need to have the NETDEV_UP inetaddr event replayed, so that any RIFs are created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Replay neighbours when RIF is madePetr Machata
As neighbours are created, mlxsw is involved through the netevent notifications. When at the time there is no RIF for a given neighbour, the notification is not acted upon. When the RIF is later created, these outstanding neighbours are left unoffloaded and cause traffic to go through the SW datapath. In order to fix this issue, as a RIF is created, walk the ARP and ND tables and find neighbours for the netdevice that represents the RIF. Then schedule neighbour work for them, allowing them to be offloaded. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Replay MACVLANs when RIF is madePetr Machata
If IP address is added to a MACVLAN netdevice, the effect is of configuring VRRP on the RIF for the netdevice linked to the MACVLAN. Because the MACVLAN offload is tied to existence of a RIF at the linked netdevice, adding a MACVLAN is currently not allowed until a RIF is present. If this requirement stays, it will never be possible to attach a first port into a topology that involves a MACVLAN. Thus topologies would need to be built in a certain order, which is impractical. Additionally, IP address removal, which leads to disappearance of the RIF that the MACVLAN depends on, cannot be vetoed. Thus even as things stand now it is possible to get to a state where a MACVLAN netdevice exists without a RIF, despite having mlxsw lowers. And once the MACVLAN is un-offloaded due to RIF getting destroyed, recreating the RIF does not bring it back. In this patch, accept that MACVLAN can be created out of order and support that use case. One option would seem to be to simply recognize MACVLAN netdevices as "interesting", and let the existing replay mechanisms take care of the offload. However, that does not address the necessity to reoffload MACVLAN once a RIF is created. Thus add a new replay hook, symmetrical to mlxsw_sp_rif_macvlan_flush(), called mlxsw_sp_rif_macvlan_replay(), which instead of unwinding the existing offloads, applies the configuration as if the netdevice were created just now. Additionally, remove all vetoes and warning messages that checked for presence of a RIF at the linked device. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Offload ethernet nexthops when RIF is madePetr Machata
As RIF is created, refresh each netxhop group tracked at the CRIF for which the RIF was created. Note that nothing needs to be done for IPIP nexthops. The RIF for these is either available from the get-go, or will never be available, so no after the fact offloading needs to be done. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Join RIFs of LAG upper VLANsPetr Machata
In the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join not only RIF for the immediate LAG, as is currently the case, but also RIFs for VLAN netdevices upper to the LAG. In this patch, extend mlxsw_sp_netdevice_router_join_lag() to walk the uppers of a LAG being joined, and also join any VLAN ones. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Extract a helper to schedule neighbour workPetr Machata
This will come in handy for neighbour replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Allow address handlers to run on bridge portsPetr Machata
Currently the IP address event handlers bail out when the event is related to a netdevice that is a bridge port or a member of a LAG. In order to create a RIF when a bridged or LAG'd port is unenslaved, these event handlers will be replayed. However, at the point in time when the NETDEV_CHANGEUPPER event is delivered, informing of the loss of enslavement, the port is still formally enslaved. In order for the operation to have any effect, these handlers need an extra parameter to indicate that the check for bridge or LAG membership should not be done. In this patch, add an argument "nomaster" to several event handlers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14mlxsw: spectrum_switchdev: Manage RIFs on PVID changePetr Machata
Currently, mlxsw has several shortcomings with regards to RIF handling due to PVID changes: - In order to cause RIF for a bridge device to be created, the user is expected first to set PVID, then to add an IP address. The reverse ordering is disallowed, which is not very user-friendly. - When such bridge gets a VLAN upper whose VID was the same as the existing PVID, and this VLAN netdevice gets an IP address, a RIF is created for this netdevice. The new RIF is then assigned to the 802.1Q FID for the given VID. This results in a working configuration. However, then, when the VLAN netdevice is removed again, the RIF for the bridge itself is never reassociated to the VLAN. - PVID cannot be changed once the bridge has uppers. Presumably this is because the driver does not manage RIFs properly in face of PVID changes. However, as the previous point shows, it is still possible to get into invalid configurations. In this patch, add the logic necessary for creation of a RIF as a result of PVID change. Moreover, when a VLAN upper is created whose VID matches lower PVID, do not create RIF for this netdevice. These changes obviate the need for ordering of IP address additions and PVID configuration, so stop forbidding addition of an IP address to a PVID-less bridge. Instead, bail out quietly. Also stop preventing PVID changes when the bridge has uppers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14mlxsw: spectrum_router: mlxsw_sp_inetaddr_bridge_event: Add an argumentPetr Machata
For purposes of replay, mlxsw_sp_inetaddr_bridge_event() will need to make decisions based on the proposed value of PVID. Querying PVID reveals the current settings, not the in-flight values that the user requested and that the notifiers are acting upon. Add a parameter, lower_pvid, which carries the proposed PVID of the lower bridge, or -1 if the lower is not a bridge. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14mlxsw: spectrum_router: Adjust mlxsw_sp_inetaddr_vlan_event() coding stylePetr Machata
The bridge branch of the dispatch in this function is going to get more code and will need curly braces. Per the doctrine, that means the whole if-else chain should get them. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14mlxsw: spectrum_router: Take VID for VLAN FIDs from RIF paramsPetr Machata
Currently, when an IP address is added to a bridge that has no PVID, the operation is rejected. An IP address addition is interpreted as a request to create a RIF for the bridge device, but without a PVID there is no VLAN for which the RIF should be created. Thus the correct way to create a RIF for a bridge as a user is to first add a PVID, and then add the IP address. Ideally this ordering requirement would not exist. RIF would be created either because an IP address is added, or because a PVID is added, depending on which comes last. For that, the switchdev code (which notices the PVID change request) must be able to request that a RIF is created with a given VLAN ID, because at the time that the PVID notification is distributed, the PVID setting is not yet visible for querying. Therefore when creating a VLAN-based RIF, use mlxsw_sp_rif_params.vid to communicate the VID, and do not determine it ad-hoc in the fid_get callback. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14mlxsw: spectrum_router: Pass struct mlxsw_sp_rif_params to fid_getPetr Machata
The fid_get callback is called to allocate a FID for the newly-created RIF. In a following patch, the fid_get implementation for VLANs will be modified to take the VLAN ID from the parameters instead of deducing it from the netdevice. To that end, propagate the RIF parameters to the fid_get callback. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-04mlxsw: spectrum_router: Fix an IS_ERR() vs NULL checkDan Carpenter
The mlxsw_sp_crif_alloc() function returns NULL on error. It doesn't return error pointers. Fix the check. Fixes: 78126cfd5dc9 ("mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-23mlxsw: spectrum_router: Track next hops at CRIFsPetr Machata
Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The reason is that eventually, next hops for mlxsw uppers should be offloaded and unoffloaded on demand as a netdevice becomes an upper, or stops being one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a netdevice is not an mlxsw uppers. CRIFs are kept track of throughout the netdevice lifetime. Correspondingly, track at each next hop not its RIF, but its CRIF (from which a RIF can always be deduced). Note that now that next hops are tracked at a CRIF, it is not necessary to move each over to a new RIF when it is necessary to edit a RIF. Therefore drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy() call mlxsw_sp_nexthop_rif_update() directly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Split nexthop finalization to two stagesPetr Machata
Nexthop finalization consists of two steps: the part where the offload is removed, because the backing RIF is now gone; and the part where the association to the RIF is severed. Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later be called independently. Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini() vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a historical happenstance than a conscious decision. The two cleanups do not depend on each other, and this change should have no observable effects. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_indexPetr Machata
A previous patch added a pointer to loopback CRIF to the router data structure. That makes the loopback RIF index redundant, as everything necessary can be derived from the CRIF. Drop the field and adjust the code accordingly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Link CRIFs to RIFsPetr Machata
When a RIF is about to be created, the registration of the netdevice that it should be associated with must have been seen in the past, and a CRIF created. Therefore make this a hard requirement by looking up the CRIF during RIF creation, and complaining loudly when there isn't one. This then allows to keep a link between a RIF and its corresponding CRIF (and back, as the relationship is one-to-at-most-one), which do. The CRIF will later be useful as the objects tracked there will be offloaded lazily as a result of RIF creation. CRIFs are created when an "interesting" netdevice is registered, and destroyed after such device is unregistered. CRIFs are supposed to already exist when a RIF creation request arises, and exist at least as long as that RIF exists. This makes for a simple invariant: it is always safe to dereference CRIF pointer from "its" RIF. To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER event is delivered. The reason is that if a RIF's netdevices has an IPv6 address, removal of this address is notified in an atomic block. To remove the RIF, the IPv6 removal handler schedules a work item. It must be safe for this work item to access the associated CRIF as well. Thus when a netdevice that backs the CRIF is removed, if it still has a RIF, do not actually free the CRIF, only toggle its can_destroy flag, which this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIFPetr Machata
CRIFs are generally not maintained for loopback RIFs. However, the RIF for the default VRF is used for offloading of blackhole nexthops. Nexthops expect to have a valid CRIF. Therefore in this patch, add code to maintain CRIF for the loopback RIF as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Maintain a hash table of CRIFsPetr Machata
CRIFs are objects that mlxsw maintains for netdevices that may not have an associated RIF (i.e. they may not have been instantiated in the ASIC), but if indeed they do not, it is quite possible they will in the future. These netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are created include e.g. bridges, LAGs, or front panel ports. The idea is that next hops would be kept at CRIFs, not RIFs, and thus it would be easier to offload and unoffload the entities that have been added before the RIF was created. In this patch, add the code for low-level CRIF maintenance: create and destroy, and keep in a table keyed by the netdevice pointer for easy recall. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIFPetr Machata
The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the function mentioned in the subject. As such it forms an external interface of the router code. In future patches we will want to maintain connection between RIFs and the CRIFs (introduced in the next patch) that back them. That will not hold for the VRF-based loopback netdevices, so the whole CRIF business can be kept hidden from the rest of mlxsw. But for the main VRF loopback RIF we do want to keep the RIF-CRIF connection, because that RIF is used for blackhole next hops, and the next hop code can be kept simpler for assuming rif->crif is valid. Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback RIF. This being an internal function will take the CRIF argument anyway. Furthermore, the function does not lock, which is not necessary at this point in code yet. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-23mlxsw: spectrum_router: Add extack argument to mlxsw_sp_lb_rif_init()Petr Machata
The extack will be handy in later patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e87ba300121010d580b80a281877573a7b1377ca.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14mlxsw: spectrum_router: Move IPIP init upPetr Machata
mlxsw will need to keep track of certain devices that are not related to any of its front panel ports. This includes IPIP netdevices. To be able to query the list of supported IPIP types, router->ipip_ops_arr needs to be initialized. To that end, move the IPIP initialization up (and finalization correspondingly down). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Extract a helper for RIF migrationPetr Machata
RIF configuration contains a number of parameters that cannot be changed after the RIF is created. For the IPIP loopbacks, this is currently worked around by creating a new RIF with the desired configuration changes applied, and updating next hops to the new RIF, and then destroying the old RIF. This operation will be useful as a reusable atom, so extract a helper to that effect. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Add a helper to check if netdev has addressesPetr Machata
This function will be useful later as the driver will need to retroactively create RIFs for new uppers with addresses. Add another helper that assumes RCU lock, and restructure the code to skip the IPv6 branch not through conditioning on the addr_list_empty variable, but by directly returning the result value. This makes the skip more obvious than it previously was. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Extract a helper to free a RIFPetr Machata
Right now freeing the object that mlxsw uses to keep track of a RIF is as simple as calling a kfree. But later on as CRIF abstraction is brought in, it will involve severing the link between CRIF and its RIF as well. Better to have the logic encapsulated in a helper. Since a helper is being introduced, make it a full-fledged destructor and have it validate that the objects tracked at the RIF have been released. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Access nhgi->rif through a helperPetr Machata
To abstract away deduction of RIF from the corresponding next hop group info (NHGI), mlxsw currently uses a macro. In its current form, that macro is impossible to extend to more general computation. Therefore introduce a helper, mlxsw_sp_nhgi_rif(), and use it throughout. This will make it possible to change the deduction path easily later on. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Access nh->rif->dev through a helperPetr Machata
In order to abstract away deduction of netdevice from the corresponding next hop, introduce a helper, mlxsw_sp_nexthop_dev(), and use it throughout. This will make it possible to change the deduction path easily later on. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Access rif->dev from params in mlxsw_sp_rif_create()Petr Machata
The previous patch added a helper to access a netdevice given a RIF. Using this helper in mlxsw_sp_rif_create() is unreasonable: the netdevice was given in RIF creation parameters. Just take it there. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Access rif->dev through a helperPetr Machata
In order to abstract away deduction of netdevice from the corresponding RIF, introduce a helper, mlxsw_sp_rif_dev(), and use it throughout. This will make it possible to change the deduction path easily later on. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Add a helper specifically for joining a LAGPetr Machata
Currently, joining a LAG very simply means that the LAG RIF should be joined by the subport representing untagged traffic. If the RIF does not exist, it does not have to be created: if the user wants there to be RIF for the LAG device, they are supposed to add an IP address, and they are supposed to do it after tha LAG becomes mlxsw upper. We can also assume that the LAG has no uppers, otherwise the enslavement is not allowed. In the future, these ordering dependencies should be removed. That means that joining LAG will be more complex operation, possibly involving a lazy RIF creation, and possibly joining / lazily creating RIFs for VLAN uppers of the LAG. It will be handy to have a dedicated function that handles all this. The new function mlxsw_sp_router_port_join_lag() is that. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14mlxsw: spectrum_router: Extract a helper from mlxsw_sp_port_vlan_router_join()Petr Machata
Split out of mlxsw_sp_port_vlan_router_join() the part that checks for RIF and dispatches to __mlxsw_sp_port_vlan_router_join(), leaving it as wrapper that just manages the router lock. The new function, mlxsw_sp_port_vlan_router_join_existing(), will be useful as an atom in later patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-12mlxsw: spectrum_router: Privatize mlxsw_sp_rif_dev()Petr Machata
Now that the external users of mlxsw_sp_rif_dev() have been converted in the preceding patches, make the function static. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>