Age | Commit message (Collapse) | Author |
|
When a netdevice is removed from a bridge or a LAG, and it has an IP
address, it should join the router and gain a RIF. Do that by replaying
address addition event on the netdevice.
When handling deslavement of LAG or its upper from a bridge device, the
replay should be done after all the lowers of the LAG have left the bridge.
Thus these scenarios are handled by passing replay_deslavement of false,
and by invoking, after the lowers have been processed, a new helper,
mlxsw_sp_netdevice_post_lag_event(), which does the per-LAG / -upper
handling, and in particular invokes the replay.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enslaving of front panel ports (and their uppers) to netdevices that
already have uppers is currently forbidden. When this is permitted, any
uppers with IP addresses need to have the NETDEV_UP inetaddr event
replayed, so that any RIFs are created.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As neighbours are created, mlxsw is involved through the netevent
notifications. When at the time there is no RIF for a given neighbour, the
notification is not acted upon. When the RIF is later created, these
outstanding neighbours are left unoffloaded and cause traffic to go through
the SW datapath.
In order to fix this issue, as a RIF is created, walk the ARP and ND tables
and find neighbours for the netdevice that represents the RIF. Then
schedule neighbour work for them, allowing them to be offloaded.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If IP address is added to a MACVLAN netdevice, the effect is of configuring
VRRP on the RIF for the netdevice linked to the MACVLAN. Because the
MACVLAN offload is tied to existence of a RIF at the linked netdevice,
adding a MACVLAN is currently not allowed until a RIF is present.
If this requirement stays, it will never be possible to attach a first port
into a topology that involves a MACVLAN. Thus topologies would need to be
built in a certain order, which is impractical.
Additionally, IP address removal, which leads to disappearance of the RIF
that the MACVLAN depends on, cannot be vetoed. Thus even as things stand
now it is possible to get to a state where a MACVLAN netdevice exists
without a RIF, despite having mlxsw lowers. And once the MACVLAN is
un-offloaded due to RIF getting destroyed, recreating the RIF does not
bring it back.
In this patch, accept that MACVLAN can be created out of order and support
that use case.
One option would seem to be to simply recognize MACVLAN netdevices as
"interesting", and let the existing replay mechanisms take care of the
offload. However, that does not address the necessity to reoffload MACVLAN
once a RIF is created.
Thus add a new replay hook, symmetrical to mlxsw_sp_rif_macvlan_flush(),
called mlxsw_sp_rif_macvlan_replay(), which instead of unwinding the
existing offloads, applies the configuration as if the netdevice were
created just now.
Additionally, remove all vetoes and warning messages that checked for
presence of a RIF at the linked device.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As RIF is created, refresh each netxhop group tracked at the CRIF for which
the RIF was created.
Note that nothing needs to be done for IPIP nexthops. The RIF for these is
either available from the get-go, or will never be available, so no after
the fact offloading needs to be done.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the following patches, the requirement that ports be only enslaved to
masters without uppers, is going to be relaxed. It will therefore be
necessary to join not only RIF for the immediate LAG, as is currently the
case, but also RIFs for VLAN netdevices upper to the LAG.
In this patch, extend mlxsw_sp_netdevice_router_join_lag() to walk the
uppers of a LAG being joined, and also join any VLAN ones.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently it never happens that a netdevice that is already a bridge slave
would suddenly become mlxsw upper. The only case where this might be
possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG
has any upper (e.g. is enslaved), enlaving mlxsw port to that LAG is
forbidden. Thus the only way to install a LAG between a bridge and a mlxsw
port is by first enslaving the port to the LAG, and then enslaving that LAG
to a bridge. At that point there are no bridge objects (such as port VLANs)
to replay. Those are added afterwards, and notified as they are created.
This holds even for the PVID.
However in the following patches, the requirement that ports be only
enslaved to masters without uppers, is going to be relaxed. It will
therefore be necessary to replay the existing bridge objects. Without this
replay, e.g. the mlxsw bridge_port_vlan objects are not instantiated, which
causes issues later, as a lot of code relies on their presence.
To that end, add a new notifier block whose sole role is to filter out
events related to the one relevant upper, and forward those to the existing
switchdev notifier block. Pass the new notifier block to
switchdev_bridge_port_offload() when the bridge port is created.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently it never happens that a netdevice that is already a bridge slave
would suddenly become mlxsw upper. The only case where this might be
possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG
already has an upper, enslaving mlxsw port to that LAG is forbidden. Thus
the only way to install a LAG between a bridge and a mlxsw port is by first
enslaving the port to the LAG, and then enslaving that LAG to a bridge.
However in the following patches, the requirement that ports be only
enslaved to masters without uppers, is going to be relaxed. It will
therefore be necessary to join bridges of LAG uppers. Without this replay,
the mlxsw bridge_port objects are not instantiated, which causes issues
later, as a lot of code relies on their presence.
Therefore in this patch, when the first mlxsw physical netdevice is
enslaved to a LAG, consider bridges upper to the LAG (both the direct
master, if any, and any bridge masters of VLAN uppers), and have the
relevant netdevices join their bridges.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When handling deslavement of LAG or its upper from a bridge device, when
the deslaved netdevice has an IP address, it should join the router. This
should be done after all the lowers of the LAG have left the bridge. The
replay intended to cause the device to join the router therefore cannot be
invoked unconditionally in the event handlers themselves. It can be done
right away if the handler is invoked for a sole device, but when it is
invoked repeated for each LAG lower, the replay needs to be postponed
until after this processing is done.
To that end, add a boolean parameter, replay_deslavement, to
mlxsw_sp_netdevice_port_upper_event(), mlxsw_sp_netdevice_port_vlan_event()
and one helper on the call path. Have the invocations that are done for
sole netdevices pass true, and those done for LAG lowers pass false.
Nothing depends on this flag at this point, but it removes some noise from
the patch that introduces the replay itself.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the bridge-related handlers bail out when the event is related to
a netdevice that is not an upper of one of the front-panel ports. In order
to allow enslavement of front-panel ports to bridges that already have
uppers, it will be necessary to replay CHANGEUPPER events to validate that
the configuration is offloadable. In order for the replay to be effective,
it must be possible to ignore unsupported configuration in the context of
an actual notifier event, but to still "veto" these configurations when the
validation is performed.
To that end, introduce two parameters to a number of handlers: mlxsw_sp,
because it will not be possible to deduce that from the netdevice lowers;
and process_foreign to indicate whether netdevices that are not front panel
uppers should be validated.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the meat of mlxsw_sp_netdevice_event() to a separate function that
does just the validation. This separate helper will be possible to call
later for recursive ascent when validating attachment of a front panel port
to a bridge with uppers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This will come in handy for neighbour replay.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the IP address event handlers bail out when the event is related
to a netdevice that is a bridge port or a member of a LAG. In order to
create a RIF when a bridged or LAG'd port is unenslaved, these event
handlers will be replayed. However, at the point in time when the
NETDEV_CHANGEUPPER event is delivered, informing of the loss of
enslavement, the port is still formally enslaved.
In order for the operation to have any effect, these handlers need an extra
parameter to indicate that the check for bridge or LAG membership should
not be done. In this patch, add an argument "nomaster" to several event
handlers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, mlxsw has several shortcomings with regards to RIF handling due
to PVID changes:
- In order to cause RIF for a bridge device to be created, the user is
expected first to set PVID, then to add an IP address. The reverse
ordering is disallowed, which is not very user-friendly.
- When such bridge gets a VLAN upper whose VID was the same as the existing
PVID, and this VLAN netdevice gets an IP address, a RIF is created for
this netdevice. The new RIF is then assigned to the 802.1Q FID for the
given VID. This results in a working configuration. However, then, when
the VLAN netdevice is removed again, the RIF for the bridge itself is
never reassociated to the VLAN.
- PVID cannot be changed once the bridge has uppers. Presumably this is
because the driver does not manage RIFs properly in face of PVID changes.
However, as the previous point shows, it is still possible to get into
invalid configurations.
In this patch, add the logic necessary for creation of a RIF as a result of
PVID change. Moreover, when a VLAN upper is created whose VID matches lower
PVID, do not create RIF for this netdevice.
These changes obviate the need for ordering of IP address additions and
PVID configuration, so stop forbidding addition of an IP address to a
PVID-less bridge. Instead, bail out quietly. Also stop preventing PVID
changes when the bridge has uppers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For purposes of replay, mlxsw_sp_inetaddr_bridge_event() will need to make
decisions based on the proposed value of PVID. Querying PVID reveals the
current settings, not the in-flight values that the user requested and that
the notifiers are acting upon. Add a parameter, lower_pvid, which carries
the proposed PVID of the lower bridge, or -1 if the lower is not a bridge.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bridge branch of the dispatch in this function is going to get more
code and will need curly braces. Per the doctrine, that means the whole
if-else chain should get them.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when an IP address is added to a bridge that has no PVID, the
operation is rejected. An IP address addition is interpreted as a request
to create a RIF for the bridge device, but without a PVID there is no VLAN
for which the RIF should be created. Thus the correct way to create a RIF
for a bridge as a user is to first add a PVID, and then add the IP address.
Ideally this ordering requirement would not exist. RIF would be created
either because an IP address is added, or because a PVID is added,
depending on which comes last.
For that, the switchdev code (which notices the PVID change request) must
be able to request that a RIF is created with a given VLAN ID, because at
the time that the PVID notification is distributed, the PVID setting is not
yet visible for querying.
Therefore when creating a VLAN-based RIF, use mlxsw_sp_rif_params.vid to
communicate the VID, and do not determine it ad-hoc in the fid_get
callback.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The fid_get callback is called to allocate a FID for the newly-created RIF.
In a following patch, the fid_get implementation for VLANs will be modified
to take the VLAN ID from the parameters instead of deducing it from the
netdevice. To that end, propagate the RIF parameters to the fid_get
callback.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the reason for rejection of PVID manipulation is dumped to
syslog, and a generic -EBUSY is returned to the userspace. But
switchdev_handle_port_obj_add(), through which we get to
mlxsw_sp_port_vlans_add(), handles extack just fine, and we can pass the
message this way.
This improves visibility into reasons why the request to change PVID
was rejected. Before the change:
# bridge vlan add dev br vid 2 self pvid untagged
RTNETLINK answers: Device or resource busy
(plus a syslog line)
After the change:
# bridge vlan add dev br vid 2 self pvid untagged
Error: mlxsw_spectrum: Can't change PVID, it's used by router interface.
Note that this particular error message is going away in the following
patches. However the ability to pass error messages through extack will be
useful more broadly for communicating in particular reasons why a RIF
failed to be created.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the ability to match on port ranges by utilizing the previously
added port range registers and the port range key element. Up to two
port range registers can be used for each filter, one for source port
and another for destination port.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/df4385a9592917e9a22ebff339e0463e4a8dfa82.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The main driver structure will be needed in this function by a
subsequent patch, so pass it. No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/24d96a4e21310e5de2951ace58263db35e44a0df.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the port range key element to supported key blocks so that it could
be used to match on the output of the port range registers. Each bit in
the element can be used to match on the output of the port range
register with the corresponding index.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/f0423f6ee9e36c6b0a426bc9995f42223c48f2db.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Expose via devlink-resource the maximum number of port range registers
and their current occupancy. Besides the observability benefits, this
resource will be used by subsequent patches for scale and occupancy
tests.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/7945e0c715dc5efb1617f45f7560c1f1bd0bcf8a.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Spectrum ASICs have a fixed number of port range registers, each of
which maintains the following parameters:
* Minimum and maximum port.
* Apply port range for source port, destination port or both.
* Apply port range for TCP, UDP or both.
* Apply port range for IPv4, IPv6 or both.
Implement a port range core which takes care of the allocation and
configuration of these registers and exposes an API that allows
in-driver consumers (e.g., the ACL code) to request matching on a range
of either source or destination port.
These registers are going to be used for port range matching in the
flower classifier that already matches on EtherType being IPv4 / IPv6 and
IP protocol being TCP / UDP. As such, there is no need to limit these
registers to a specific EtherType or IP protocol, which will increase
the likelihood of a register being shared by multiple flower filters.
It is unlikely that a filter will match on the same range of both source
and destination ports, which is why each register is only configured to
match on either source or destination port. If a filter requires
matching on a range of both source and destination ports, it will
utilize two port range registers and match on the output of both.
For efficient lookup and traversal, use XArray to store the allocated
port range registers. The XArray uses RCU and an internal spinlock to
synchronise access, so there is no need for a dedicate lock.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/674f00539a0072d455847663b5feb504db51a259.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a resource identifier for maximum number of layer 4 port range
register so that it could be later used to query the information from
firmware.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/59a8fec353d5ad9fbfb7612e4a7ff61eaedad445.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the Policy-Engine Port Range Register that is used for configuring
port range identification.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/d1a1f53d758f7452cf5abfe006b23496076ec3e6.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mlxsw_sp_crif_alloc() function returns NULL on error. It doesn't
return error pointers. Fix the check.
Fixes: 78126cfd5dc9 ("mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The line cards array is not freed in the error path of
mlxsw_m_linecards_init(), which can lead to a memory leak. Fix by
freeing the array in the error path, thereby making the error path
identical to mlxsw_m_linecards_fini().
Fixes: 01328e23a476 ("mlxsw: minimal: Extend module to port mapping with slot index")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230630012647.1078002-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The
reason is that eventually, next hops for mlxsw uppers should be offloaded
and unoffloaded on demand as a netdevice becomes an upper, or stops being
one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a
netdevice is not an mlxsw uppers. CRIFs are kept track of throughout the
netdevice lifetime.
Correspondingly, track at each next hop not its RIF, but its CRIF (from
which a RIF can always be deduced).
Note that now that next hops are tracked at a CRIF, it is not necessary to
move each over to a new RIF when it is necessary to edit a RIF. Therefore
drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy()
call mlxsw_sp_nexthop_rif_update() directly.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Nexthop finalization consists of two steps: the part where the offload is
removed, because the backing RIF is now gone; and the part where the
association to the RIF is severed.
Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the
unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later
be called independently.
Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini()
vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a
historical happenstance than a conscious decision. The two cleanups do not
depend on each other, and this change should have no observable effects.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A previous patch added a pointer to loopback CRIF to the router data
structure. That makes the loopback RIF index redundant, as everything
necessary can be derived from the CRIF. Drop the field and adjust the code
accordingly.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a RIF is about to be created, the registration of the netdevice that
it should be associated with must have been seen in the past, and a CRIF
created. Therefore make this a hard requirement by looking up the CRIF
during RIF creation, and complaining loudly when there isn't one.
This then allows to keep a link between a RIF and its corresponding
CRIF (and back, as the relationship is one-to-at-most-one), which do.
The CRIF will later be useful as the objects tracked there will be
offloaded lazily as a result of RIF creation.
CRIFs are created when an "interesting" netdevice is registered, and
destroyed after such device is unregistered. CRIFs are supposed to already
exist when a RIF creation request arises, and exist at least as long as
that RIF exists. This makes for a simple invariant: it is always safe to
dereference CRIF pointer from "its" RIF.
To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER
event is delivered. The reason is that if a RIF's netdevices has an IPv6
address, removal of this address is notified in an atomic block. To remove
the RIF, the IPv6 removal handler schedules a work item. It must be safe
for this work item to access the associated CRIF as well.
Thus when a netdevice that backs the CRIF is removed, if it still has a
RIF, do not actually free the CRIF, only toggle its can_destroy flag, which
this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
CRIFs are generally not maintained for loopback RIFs. However, the RIF for
the default VRF is used for offloading of blackhole nexthops. Nexthops
expect to have a valid CRIF. Therefore in this patch, add code to maintain
CRIF for the loopback RIF as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
CRIFs are objects that mlxsw maintains for netdevices that may not have an
associated RIF (i.e. they may not have been instantiated in the ASIC), but
if indeed they do not, it is quite possible they will in the future. These
netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are
created include e.g. bridges, LAGs, or front panel ports. The idea is that
next hops would be kept at CRIFs, not RIFs, and thus it would be easier to
offload and unoffload the entities that have been added before the RIF was
created.
In this patch, add the code for low-level CRIF maintenance: create and
destroy, and keep in a table keyed by the netdevice pointer for easy
recall.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the
function mentioned in the subject. As such it forms an external interface
of the router code.
In future patches we will want to maintain connection between RIFs and the
CRIFs (introduced in the next patch) that back them. That will not hold
for the VRF-based loopback netdevices, so the whole CRIF business can be
kept hidden from the rest of mlxsw.
But for the main VRF loopback RIF we do want to keep the RIF-CRIF
connection, because that RIF is used for blackhole next hops, and the next
hop code can be kept simpler for assuming rif->crif is valid.
Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback
RIF. This being an internal function will take the CRIF argument anyway.
Furthermore, the function does not lock, which is not necessary at this
point in code yet.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The extack will be handy in later patches.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/e87ba300121010d580b80a281877573a7b1377ca.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlxsw will need to keep track of certain devices that are not related to
any of its front panel ports. This includes IPIP netdevices. To be able to
query the list of supported IPIP types, router->ipip_ops_arr needs to be
initialized.
To that end, move the IPIP initialization up (and finalization
correspondingly down).
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
RIF configuration contains a number of parameters that cannot be changed
after the RIF is created. For the IPIP loopbacks, this is currently worked
around by creating a new RIF with the desired configuration changes
applied, and updating next hops to the new RIF, and then destroying the old
RIF. This operation will be useful as a reusable atom, so extract a helper
to that effect.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This function will be useful later as the driver will need to retroactively
create RIFs for new uppers with addresses.
Add another helper that assumes RCU lock, and restructure the code to
skip the IPv6 branch not through conditioning on the addr_list_empty
variable, but by directly returning the result value. This makes the skip
more obvious than it previously was.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Right now freeing the object that mlxsw uses to keep track of a RIF is as
simple as calling a kfree. But later on as CRIF abstraction is brought in,
it will involve severing the link between CRIF and its RIF as well. Better
to have the logic encapsulated in a helper.
Since a helper is being introduced, make it a full-fledged destructor and
have it validate that the objects tracked at the RIF have been released.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
To abstract away deduction of RIF from the corresponding next hop group
info (NHGI), mlxsw currently uses a macro. In its current form, that macro
is impossible to extend to more general computation. Therefore introduce a
helper, mlxsw_sp_nhgi_rif(), and use it throughout. This will make it
possible to change the deduction path easily later on.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In order to abstract away deduction of netdevice from the corresponding
next hop, introduce a helper, mlxsw_sp_nexthop_dev(), and use it
throughout. This will make it possible to change the deduction path easily
later on.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The previous patch added a helper to access a netdevice given a RIF. Using
this helper in mlxsw_sp_rif_create() is unreasonable: the netdevice was
given in RIF creation parameters. Just take it there.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In order to abstract away deduction of netdevice from the corresponding
RIF, introduce a helper, mlxsw_sp_rif_dev(), and use it throughout. This
will make it possible to change the deduction path easily later on.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, joining a LAG very simply means that the LAG RIF should be
joined by the subport representing untagged traffic. If the RIF does not
exist, it does not have to be created: if the user wants there to be RIF
for the LAG device, they are supposed to add an IP address, and they are
supposed to do it after tha LAG becomes mlxsw upper.
We can also assume that the LAG has no uppers, otherwise the enslavement is
not allowed.
In the future, these ordering dependencies should be removed. That means
that joining LAG will be more complex operation, possibly involving a lazy
RIF creation, and possibly joining / lazily creating RIFs for VLAN uppers
of the LAG. It will be handy to have a dedicated function that handles all
this. The new function mlxsw_sp_router_port_join_lag() is that.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Split out of mlxsw_sp_port_vlan_router_join() the part that checks for RIF
and dispatches to __mlxsw_sp_port_vlan_router_join(), leaving it as wrapper
that just manages the router lock.
The new function, mlxsw_sp_port_vlan_router_join_existing(), will be useful
as an atom in later patches.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new()
call-back type"), all drivers being converted to .probe_new() and then
commit 03c835f498b5 ("i2c: Switch .probe() to not take an id parameter")
convert back to (the new) .probe() to be able to eventually drop
.probe_new() from struct i2c_driver.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that the external users of mlxsw_sp_rif_dev() have been converted in
the preceding patches, make the function static.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a number of places, a netdevice underlying a RIF is obtained only to
compare it to another pointer. In order to clean up the interface between
the router and the other modules, add a new helper to specifically answer
this question, and convert the relevant uses to this new interface.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a number of places, a netdevice underlying a RIF is obtained only to
check if it a NULL pointer. In order to clean up the interface between the
router and the other modules, add a new helper to specifically answer this
question, and convert the relevant uses to this new interface.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|