Age | Commit message (Collapse) | Author |
|
mlxsw used to hold an array of qdiscs indexed by the TC number. In the
previous patch, it was changed to allocate child qdiscs dynamically, and
they are now indexed by band number. Follow suit with the array of future
FIFOs.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of keeping qdiscs in globally-preallocated arrays, introduce a
per-qdisc-kind value num_classes, and then allocate the necessary child
qdiscs (if any) based on that value. Since now dynamic allocation is
involved, mlxsw_sp_qdisc_replace() gets messy enough that it is worth it to
split it to two cases: a new qdisc allocation and a change of existing
qdisc. (Note that the change also includes what TC formally calls replace,
if the qdisc kind is the same.)
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The FIFO handler currently guards accesses to the future FIFO tracking by
asserting RTNL. In the future, the changes to the qdisc state will be more
thorough, so other qdiscs will need this guarding is as well. In order
to not further the RTNL infestation, instead convert to a custom lock that
will guard accesses to the qdisc state.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mlxsw currently allows a two-level structure of qdiscs: the root and
possibly a number of children. In order to support offloading more general
qdisc trees, introduce to struct mlxsw_sp_qdisc a pointer to child qdiscs.
Refer to the child qdiscs through this pointer, instead of going through
the tclass_qdiscs in qdisc_state. Additionally introduce a field
num_classes, which holds number of given qdisc's children.
Also introduce a generic function for walking qdisc trees. Rewrite
mlxsw_sp_qdisc_find() and _find_by_handle() to use the generic walker.
For now, keep the qdisc_state.tclass_qdisc, and just point root_qdiscs's
children to this array. Following patches will make the allocation dynamic.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a qdisc is removed, it is necessary to update the backlog value at its
parent--unless the qdisc is at root position. RED, TBF and FIFO all do
that, each separately. Since all of them need to do this, just promote the
operation directly to mlxsw_sp_qdisc_destroy(), instead of deferring it to
individual destructors. Since FIFO dtor thus becomes trivial, remove it.
Add struct mlxsw_sp_qdisc.parent to point at the parent qdisc. This will be
handy later as deeper structures are offloaded. Use the parent qdisc to
find the chain of parents whose backlog value needs to be updated.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tclass_num is just a number, a value that would be ordinarily passed around
as an int. (Which is unlike a u8 prio_bitmap.) In several places,
tclass_num already is an int. Convert the remaining instances.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function mlxsw_sp_qdisc_compare() is invoked a couple lines above this
check, which will bounce any requests where this condition does not hold.
Therefore drop it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The purpose of this function is to filter out events that are related to
qdiscs that are not offloaded, or are not offloaded anymore. But the
function is unnecessarily thorough:
- mlxsw_sp_qdisc pointer is never NULL in the context where it is called
- Two qdiscs with the same handle will never have different types. Even
when replacing one qdisc with another in the same class, Linux will not
permit handle reuse unless the qdisc type also matches.
Simplify the function by omitting these two unnecessary conditions.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mlxsw_sp_qdisc argument is not used in any of the actual callbacks.
Drop it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mlxsw has nicely grouped stats, add support for standard uAPI.
I'm guessing the register access part. Compile tested only.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify
switchdev for local FDB addresses") as well as in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
the switchdev notifiers for FDB entries managed to have a zero-day bug,
which was that drivers would not know what to do with local FDB entries,
because they were not told that they are local. The bug fix was to
simply not notify them of those addresses.
Let us now add the 'is_local' bit to bridge FDB entries, and make all
drivers ignore these entries by their own choice.
Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some drivers clear the 'ethtool_link_ksettings' struct in their
get_link_ksettings() callback, before populating it with actual values.
Such drivers will set the new 'link_mode' field to zero, resulting in
user space receiving wrong link mode information given that zero is a
valid value for the field.
Another problem is that some drivers (notably tun) can report random
values in the 'link_mode' field. This can result in a general protection
fault when the field is used as an index to the 'link_mode_params' array
[1].
This happens because such drivers implement their set_link_ksettings()
callback by simply overwriting their private copy of
'ethtool_link_ksettings' struct with the one they get from the stack,
which is not always properly initialized.
Fix these problems by removing 'link_mode' from 'ethtool_link_ksettings'
and instead have drivers call ethtool_params_from_link_mode() with the
current link mode. The function will derive the link parameters (e.g.,
speed) from the link mode and fill them in the 'ethtool_link_ksettings'
struct.
v3:
* Remove link_mode parameter and derive the link parameters in
the driver instead of passing link_mode parameter to ethtool
and derive it there.
v2:
* Introduce 'cap_link_mode_supported' instead of adding a
validity field to 'ethtool_link_ksettings' struct.
[1]
general protection fault, probably for non-canonical address 0xdffffc00f14cc32c: 0000 [#1] PREEMPT SMP KASAN
KASAN: probably user-memory-access in range [0x000000078a661960-0x000000078a661967]
CPU: 0 PID: 8452 Comm: syz-executor360 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__ethtool_get_link_ksettings+0x1a3/0x3a0 net/ethtool/ioctl.c:446
Code: b7 3e fa 83 fd ff 0f 84 30 01 00 00 e8 16 b0 3e fa 48 8d 3c ed 60 d5 69 8a 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03
+38 d0 7c 08 84 d2 0f 85 b9
RSP: 0018:ffffc900019df7a0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888026136008 RCX: 0000000000000000
RDX: 00000000f14cc32c RSI: ffffffff873439ca RDI: 000000078a661960
RBP: 00000000ffff8880 R08: 00000000ffffffff R09: ffff88802613606f
R10: ffffffff873439bc R11: 0000000000000000 R12: 0000000000000000
R13: ffff88802613606c R14: ffff888011d0c210 R15: ffff888011d0c210
FS: 0000000000749300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004b60f0 CR3: 00000000185c2000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
linkinfo_prepare_data+0xfd/0x280 net/ethtool/linkinfo.c:37
ethnl_default_notify+0x1dc/0x630 net/ethtool/netlink.c:586
ethtool_notify+0xbd/0x1f0 net/ethtool/netlink.c:656
ethtool_set_link_ksettings+0x277/0x330 net/ethtool/ioctl.c:620
dev_ethtool+0x2b35/0x45d0 net/ethtool/ioctl.c:2842
dev_ioctl+0x463/0xb70 net/core/dev_ioctl.c:440
sock_do_ioctl+0x148/0x2d0 net/socket.c:1060
sock_ioctl+0x477/0x6a0 net/socket.c:1177
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: c8907043c6ac9 ("ethtool: Get link mode in use instead of speed and duplex parameters")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Disable software thermal protection by removing critical trip points
from all thermal zones.
The software thermal protection is redundant given there are two layers
of protection below it in firmware and hardware. The first layer is
performed by firmware, the second, in case firmware was not able to
perform protection, by hardware.
The temperature threshold set for hardware protection is always higher
than for firmware.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
requested
When cleared, the 'force' parameter in nexthop bucket replacement
notifications indicates that a driver should try to perform an atomic
replacement. Meaning, only update the contents of the bucket if it is
inactive.
Since mlxsw only queries buckets' activity once every second, there is
no point in trying an atomic replacement if the idle timer interval is
smaller than 1 second.
Currently, mlxsw ignores the original value of 'force' and will always
try an atomic replacement if the idle timer is not smaller than 1
second.
Fix this by taking the original value of 'force' into account and never
promoting a non-atomic replacement to an atomic one.
Fixes: 617a77f044ed ("mlxsw: spectrum_router: Add nexthop bucket replacement support")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The per-port sampling triggers (i.e., ingress / egress) cannot be
enabled twice. Meaning, the below configuration will not result in
packets being sampled twice:
# tc filter add dev swp1 ingress matchall skip_sw action sample rate 100 group 1
# tc filter add dev swp1 ingress matchall skip_sw action sample rate 100 group 1
Therefore, reject such configurations.
Fixes: 90f53c53ec4a ("mlxsw: spectrum: Start using sampling triggers hash table")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Perform the priority check earlier in the function instead of repeating
it for every action. This fixes a bug that allowed matchall rules with
sample action to be added in front of flower rules on egress.
Fixes: 54d0e963f683 ("mlxsw: spectrum_matchall: Add support for egress sampling")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previous patch moved the protocol check out of the action check, so
these if statements can now be converted to a switch statement. Perform
the conversion.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Perform the protocol check earlier in the function instead of repeating
it for every action. Example:
# tc filter add dev swp1 ingress proto ip matchall skip_sw action sample group 1 rate 100
Error: matchall rules only supported with 'all' protocol.
We have an error talking to the kernel
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cited commit changed the behavior of the software data path with regards
to the ECN marking of decapsulated packets. However, the commit did not
change other callers of __INET_ECN_decapsulate(), namely mlxsw. The
driver is using the function in order to ensure that the hardware and
software data paths act the same with regards to the ECN marking of
decapsulated packets.
The discrepancy was uncovered by commit 5aa3c334a449 ("selftests:
forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value") that
aligned the selftest to the new behavior. Without this patch the
selftest passes when used with veth pairs, but fails when used with
mlxsw netdevs.
Fix this by instructing the device to propagate the ECT(1) mark from the
outer header to the inner header when the inner header is ECT(0), for
both NVE and IP-in-IP tunnels.
A helper is added in order not to duplicate the code between both tunnel
types.
Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that mlxsw supports resilient nexthop groups, allow them to be
programmed after validating that their configuration conforms to the
device's limitations (e.g., number of buckets is within predefined
range).
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kernel periodically checks the idle time of nexthop buckets to
determine if they are idle and can be re-populated with a new nexthop.
When the resilient nexthop group is offloaded to hardware, the kernel
will not see activity on nexthop buckets unless it is reported from
hardware.
Therefore, periodically (every 1 second) query the hardware for activity
of adjacency entries used as part of a resilient nexthop group and
report it to the nexthop code.
The activity is only queried if resilient nexthop groups are in use. The
delayed work is canceled otherwise.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RATRAD register is used to dump and optionally clear activity bits
of router adjacency table entries. Will be used by the next patch to
query and clear the activity of nexthop buckets in a resilient nexthop
group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
So far, mlxsw only updated hardware flags ('offload' / 'trap') on
nexthop objects. For resilient nexthop groups, these flags need to be
updated on individual nexthop buckets as well.
Update these flags whenever updating the flags of the encapsulating
nexthop object and whenever a nexthop bucket is replaced.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace a single nexthop bucket upon receiving a
'NEXTHOP_EVENT_BUCKET_REPLACE' notification.
When the 'force' parameter is not set, instruct the device to only
overwrite an adjacency entry if its activity is cleared, so as not to
break existing flows using the adjacency entry. The device does not
provide feedback if the replacement was successful in this case, so the
contents of the adjacency entry after the replacement are compared with
the replacement request.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Have the caller pass a pointer to the payload of the RATR register to
the function updating a single nexthop / adjacency entry.
In a subsequent patch, this will allow the caller to make sure
replacement was successful by querying the state of the adjacency entry
after replacement and comparing with the initial request.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inactive
Allow the driver to instruct the device to only overwrite an adjacency
entry if its activity is cleared. Currently, adjacency entry is always
overwritten, regardless of activity.
This will be used by subsequent patches to prevent replacement of active
nexthop buckets.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Parse the configuration of resilient nexthop groups to existing mlxsw
data structures. Unlike non-resilient groups, nexthops without a valid
MAC or router interface (RIF) are programmed with a trap action instead
of not being programmed at all.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Spectrum-{2,3} support different adjacency group size ranges compared to
Spectrum-1. Add an array describing these ranges and change the common
code to use the array which was set during the per-ASIC initialization.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The device supports a fixed set of adjacency group sizes. Encode these
sizes in an array, so that the next patch will be able to split it
between Spectrum-1 and Spectrum-{2,3}, which support different size
ranges.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are several differences in the router module between Spectrum-1
and Spectrum-{2,3}. Currently, this is only apparent in the router
interface (RIF) operations that are split between these ASICs.
A subsequent patch is going to introduce another difference between
these ASICs.
Create per-ASIC router operations that will encapsulate all these
differences. For now, these operations are only used to set the per-ASIC
RIF operations in 'mlxsw_sp->router->rif_ops_arr'. Note that this fields
was unused since commit 1f5b23033937 ("mlxsw: spectrum: Set RIF ops per
ASIC type").
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid updating neighbour and adjacency entries in hardware when the
neighbour is already connected and its MAC address did not change. This
can happen, for example, when neighbour transitions between valid states
such as 'NUD_REACHABLE' and 'NUD_DELAY'.
This is especially important for resilient hashing as these updates will
result in adjacency entries being marked as active.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
function
The validation of a nexthop group entry is also necessary for resilient
nexthop groups, so break the validation to a separate function to allow
for code reuse in subsequent patches.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Encapsulate this functionality in a separate function, so that it could
be invoked by follow-up patches, when replacing a nexthop bucket that is
part of a resilient nexthop group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mlxsw_sp_nexthop_update() is used to update the configuration of
Ethernet-type nexthops, as opposed to mlxsw_sp_nexthop_ipip_update(),
which is used to update IPinIP-type nexthops.
Rename the function to mlxsw_sp_nexthop_eth_update(), so that it is
consistent with mlxsw_sp_nexthop_ipip_update().
It will allow us to introduce mlxsw_sp_nexthop_update() in a follow-up
patch, which calls either of above mentioned function based on the
nexthop's type.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, nexthops are programmed with either forward or discard action
(for blackhole nexthops). Nexthops that do not have a valid MAC address
(neighbour) or router interface (RIF) are simply not written to the
adjacency table.
In resilient nexthop groups, the size of the group must remain fixed and
the kernel is in complete control of the layout of the adjacency table.
A nexthop without a valid MAC or RIF will therefore be written with a
trap action, to trigger neighbour resolution.
Allow such nexthops to be programmed to the adjacency table to enable
above mentioned use case.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Nexthops that need to be programmed with a trap action might not have a
valid router interface (RIF) associated with them. Therefore, use the
loopback RIF created during initialization to program them to the
device.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the action associated with the nexthop is assumed to be
'forward' unless the 'discard' bit is set.
Instead, simplify this by introducing a dedicated field to represent the
action of the nexthop. This will allow us to more easily introduce more
actions, such as trap.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The comments assume that nexthops are simple Ethernet nexthops
that are programmed to forward packets to the associated neighbour. This
is no longer the case, as both IPinIP and blackhole nexthops are now
supported.
Adjust the comments to reflect these changes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The helper returns the MAC address associated with the nexthop. It is
only valid when the nexthop forwards packets and when it is an Ethernet
nexthop. Reflect this in the checks the helper is performing.
This is not an issue because the sole caller of the function only
invokes it for such nexthops.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The helper mlxsw_sp_nexthop_offload() is actually interested in finding
out if the nexthop is both written to the adjacency table and forwarding
packets (as opposed to discarding them).
Rename it to mlxsw_sp_nexthop_is_forward() and remove
mlxsw_sp_nexthop_is_discard().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the RTNL assertion in the nexthop notifier block. The assertion
is not needed given RTNL is never assumed to be taken.
This is a preparation for future patches where mlxsw will start handling
nexthop events that are not always sent with RTNL held.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently only one EtherType can be configured for pushing in tunnels
because EtherType is configured using SPVID.et_vlan for tunnel port.
This behavior is forbidden by comparing mlxsw_sp_nve_config struct for
each new tunnel, the struct contains 'ethertype' field which means that
only one EtherType is legal at any given time. Remove 'ethertype' field to
allow creating VxLAN devices with different bridges.
To allow using several types of VxLAN bridges at the same time, the
EtherType should be determined at the egress port. This behavior is
achieved by setting SPVID to decide which EtherType to push at egress and
for each local_port which is member in 802.1ad bridge, set SPEVET.et_vlan
to ether_type1 (i.e., 0x88A8).
Use switchdev_ops->init() to set different mlxsw_sp_bridge_ops for
different ASICs in order to be able to split the behavior when port joins /
leaves an 802.1ad bridge in different ASICs.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A subsequent patch will need to implement different set of operations
when a port joins / leaves an 802.1ad bridge, based on the ASIC type.
Prepare for this change by allowing to initialize the bridge module
based on the ASIC type via 'struct mlxsw_sp_switchdev_ops'.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A subsequent patch will cause decapsulated packets to have their EtherType
determined by the egress port. Add mlxsw_sp_port_egress_ethtype_set() which
will be called when a port joins an 802.1ad bridge, so that it will set an
802.1ad EtherType on decapsulated packets transmitted through it, instead
of the default 802.1q EtherType.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SPEVET configures which EtherType to push at egress for packets incoming
through a local port for which 'SPVID.egr_et_set' is set.
The next patches will use SPEVET to configure EtherType 0x88A8 and
0x8100 for local ports member in 802.1ad and 802.1q bridges,
respectively. This allows using dual VxLAN bridges (802.1d and 802.1ad at
the same time).
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SPVID.egr_et_set=1 means that when VLAN is pushed at ingress (for untagged
packets or for QinQ push mode) then the EtherType is decided at the egress
port.
The next patches will use this field for VxLAN devices (tunnel port) in
order to allow using dual VxLAN bridges (802.1d and 802.1ad at the same
time).
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement support for action sample when used with a flower classifier
by implementing the required sampler_add() / sampler_del() callbacks and
registering an Rx listener for the sampled packets.
The sampler_add() callback returns an error for Spectrum-1 as the
functionality is not supported. In Spectrum-{2,3} the callback creates a
mirroring agent towards the CPU. The agent's identifier is used by the
policy engine code to mirror towards the CPU with probability.
The Rx listener for the sampled packet is registered with the 'policy
engine' mirroring reason and passes trapped packets to the psample
module after looking up their parameters (e.g., sampling group).
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add core functionality required to support mirror sampler action in the
policy engine. The switch driver (e.g., 'mlxsw_spectrum') is required to
implement the sampler_add() / sampler_del() callbacks that perform the
necessary configuration before the sampler action can be installed. The
next patch will implement it for Spectrum-{2,3}, while Spectrum-1 will
return an error, given it is not supported.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow user space to install a matchall classifier with sample action on
egress. This is only supported on Spectrum-2 onwards, so Spectrum-1 will
continue to return an error.
Programming the hardware to sample on egress is identical to ingress
sampling with the sole change of using a different sampling trigger.
Upon receiving a sampled packet, the sampling trigger (ingress vs.
egress) will be encoded in the mirroring reason in the Completion Queue
Element (CQE). The mirroring reason is used to lookup the sampling
parameters (e.g., psample group) which are passed to the psample module.
Note that locally generated packets that are sampled are simply
consumed. This is done for several reasons.
First, such packets do not have an ingress netdev given that their Rx
local port is the CPU port. This breaks several basic assumptions.
Second, sampling using the same interface (tc), but with flower
classifier will not result in locally generated packets being sampled
given that such packets are not subject to the policy engine.
Third, realistically, this is not a big deal given that the vast
majority of the packets being transmitted through the port are not
locally generated packets.
Fourth, if such packets do need to be sampled, they can be sampled with
a 'skip_hw' filter and reported to the same sampling group as the data
path packets. The software sampling rate can also be adjusted to fit the
rate of the locally generated packets which is much lower than the rate
of the data path traffic.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|