Age | Commit message (Collapse) | Author |
|
Currently, mlxsw driver supports VxLAN with IPv4 underlay only.
Add support for IPv6 underlay.
The main differences are:
* Learning is not supported for IPv6 FDB entries, use static entries and
do not allow 'learning' flag for IPv6 VxLAN.
* IPv6 addresses for FDB entries should be saved as part of KVDL.
Use the new API to allocate and release entries for IPv6 addresses.
* Spectrum ASICs do not fill UDP checksum, while in software IPv6 UDP
packets with checksum zero are dropped.
Force the relevant flags which allow the VxLAN device to generate UDP
packets with zero checksum and also receive them.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FDB entries that perform VxLAN encapsulation with an IPv6 underlay hold
a reference on a resource. Namely, the KVDL entry where the IPv6
underlay destination IP is stored. When such an FDB entry is deleted, it
needs to drop the reference from the corresponding KVDL entry.
To that end, maintain a hash table that maps an FDB entry (i.e., {MAC,
FID}) to the IPv6 address used by it.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cited commit changed the behavior of the software data path with regards
to the ECN marking of decapsulated packets. However, the commit did not
change other callers of __INET_ECN_decapsulate(), namely mlxsw. The
driver is using the function in order to ensure that the hardware and
software data paths act the same with regards to the ECN marking of
decapsulated packets.
The discrepancy was uncovered by commit 5aa3c334a449 ("selftests:
forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value") that
aligned the selftest to the new behavior. Without this patch the
selftest passes when used with veth pairs, but fails when used with
mlxsw netdevs.
Fix this by instructing the device to propagate the ECT(1) mark from the
outer header to the inner header when the inner header is ECT(0), for
both NVE and IP-in-IP tunnels.
A helper is added in order not to duplicate the code between both tunnel
types.
Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implementation of Q-in-VNI is different between ASIC types, this set adds
support only for Spectrum-2.
Return an error when trying to create VxLAN device and enslave it to
802.1ad bridge in Spectrum-1.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add EtherType field to mlxsw_sp_nve_config struct.
Set EtherType according to mlxsw_sp_nve_params.ethertype.
Pass 'mlxsw_sp_nve_params' instead of 'mlxsw_sp_nve_params->dev' to the
function which initializes mlxsw_sp_nve_config struct to know which
EtherType to use.
This field is needed to configure which EtherType will be used when
VLAN is pushed at ingress of the tunnel port.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently SFN, TNUMT and TNPC registers use separate enums for
tunnel_port.
Create one enum with a neutral name and use it.
Remove the enums that are not currently required.
The next patches add two more registers that contain tunnel_port field,
the new enum can be used for them also.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a host route is added, the driver checks if the route needs to be
promoted to perform NVE decapsulation based on the current NVE
configuration. If so, the index of the decapsulation entry is retrieved
and associated with the route.
Currently, this information is stored in the NVE module which the router
module consults. Since the information is protected under RTNL and since
route insertion happens with RTNL held, there is no problem to retrieve
the information from the NVE module.
However, this is going to change and route insertion will no longer
happen under RTNL. Instead, a dedicated lock will be introduced for the
router module.
Therefore, store this information in the router module and change the
router module to consult this copy.
The validity of the information is set / cleared whenever an NVE tunnel
is initialized / de-initialized. When this happens the NVE module calls
into the router module to promote / demote the relevant host route.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The device supports a single VTEP whose configuration is shared between
all VXLAN tunnels.
While the shared configuration is cleared upon the destruction of the
last tunnel - in mlxsw_sp_nve_tunnel_fini() - it is set in
mlxsw_sp_nve_fid_enable(), after calling mlxsw_sp_nve_tunnel_init().
Make tunnel initialization and destruction symmetric and set the
configuration in mlxsw_sp_nve_tunnel_init().
This will later allow us to protect the shared configuration with a
lock.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Follow-up patch is going to allow to reload devlink instance into
different network namespace, so use devlink_net() helper instead
of init_net.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Spectrum systems have a configurable limit on how far into the packet they
parse. By default, the limit is 96 bytes.
An IPv6 PTP packet is layered as Ethernet/IPv6/UDP (14+40+8 bytes), and
sequence ID of a PTP event is only available 32 bytes into payload, for a
total of 94 bytes. When an additional 802.1q header is present as
well (such as when ptp4l is running on a VLAN port), the parsing limit is
exceeded. Such packets are not recognized as PTP, and are not timestamped.
Therefore generalize the current VXLAN-specific parsing depth setting to
allow reference-counted requests from other modules as well. Keep it in the
VXLAN module, because the MPRS register also configures UDP destination
port number used for VXLAN, and is thus closely tied to the VXLAN code
anyway.
Then invoke the new interfaces from both VXLAN (in obvious places), as well
as from PTP code, when the (global) timestamping configuration changes from
disabled to enabled or vice versa.
Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Annotate the rejections in mlxsw_sp_switchdev_vxlan_work_prepare() with
textual reasons.
Because this code ends up being invoked for FDB replay as well, drop the
default message from there, so that the more accurate error message is
not overwritten.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A follow-up patch will extend vxlan_fdb_replay() with an extack
argument. Extend the fdb_replay callback in mlxsw likewise so that the
argument is ready for the vxlan conversion.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding a VLAN on a port can trigger the offload of a VXLAN tunnel which
is already a member in the VLAN. In case the configuration of the VXLAN
is not supported, the driver would return -EOPNOTSUPP.
This is problematic since bridge code does not interpret this as error,
but rather that it should try to setup the VLAN using the 8021q driver
instead of switchdev.
Fixes: d70e42b22dd4 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.
Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The pointer was NULLed before freeing the memory, resulting in a memory
leak. Trace from kmemleak:
unreferenced object 0xffff88820ae36528 (size 512):
comm "devlink", pid 5374, jiffies 4295354033 (age 10829.296s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000a43f5195>] kmem_cache_alloc_trace+0x1be/0x330
[<00000000312f8140>] mlxsw_sp_nve_init+0xcb/0x1ae0
[<0000000009201d22>] mlxsw_sp_init+0x1382/0x2690
[<000000007227d877>] mlxsw_sp1_init+0x1b5/0x260
[<000000004a16feec>] __mlxsw_core_bus_device_register+0x776/0x1360
[<0000000070ab954c>] mlxsw_devlink_core_bus_device_reload+0x129/0x220
[<00000000432313d5>] devlink_nl_cmd_reload+0x119/0x1e0
[<000000003821a06b>] genl_family_rcv_msg+0x813/0x1150
[<00000000d54d04c0>] genl_rcv_msg+0xd1/0x180
[<0000000040543d12>] netlink_rcv_skb+0x152/0x3c0
[<00000000efc4eae8>] genl_rcv+0x2d/0x40
[<00000000ea645603>] netlink_unicast+0x52f/0x740
[<00000000641fca1a>] netlink_sendmsg+0x9c7/0xf50
[<00000000fed4a4b8>] sock_sendmsg+0xbe/0x120
[<00000000d85795a9>] __sys_sendto+0x397/0x620
[<00000000c5f84622>] __x64_sys_sendto+0xe6/0x1a0
Fixes: 6e6030bd5412 ("mlxsw: spectrum_nve: Implement common NVE core")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Several conflicts, seemingly all over the place.
I used Stephen Rothwell's sample resolutions for many of these, if not
just to double check my own work, so definitely the credit largely
goes to him.
The NFP conflict consisted of a bug fix (moving operations
past the rhashtable operation) while chaning the initial
argument in the function call in the moved code.
The net/dsa/master.c conflict had to do with a bug fix intermixing of
making dsa_master_set_mtu() static with the fixing of the tagging
attribute location.
cls_flower had a conflict because the dup reject fix from Or
overlapped with the addition of port range classifiction.
__set_phy_supported()'s conflict was relatively easy to resolve
because Andrew fixed it in both trees, so it was just a matter
of taking the net-next copy. Or at least I think it was :-)
Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
intermixed with changes on how the sdif and caller_net are calculated
in these code paths in net-next.
The remaining BPF conflicts were largely about the addition of the
__bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
to the relevant data structure where the MD pointer macros are used.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Any existing NVE FDB entries need to be offloaded when NVE is enabled
for a given FID. Recent patches have added fdb_replay op for this, so
just invoke it from mlxsw_sp_nve_fid_enable().
When NVE is disabled on a FID, any existing FDB offloaded marks need to
be cleared on NVE device as well as on its bridge master. An op to
handle this, fdb_clear_offload, has been added to FID ops and NVE ops in
previous patches. Add code to resolve the NVE device, NVE type, and
dispatch to both fdb_clear_offload ops.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A follow-up patch will add support for replay and for clearing of
offload marks. These are NVE type-sensitive operations, and to be able
to dispatch them properly, a FID needs to know what NVE type is attached
to it.
Therefore, track the NVE type at struct mlxsw_sp_fid. Extend
mlxsw_sp_fid_vni_set() to take it as an argument, and add
mlxsw_sp_fid_nve_type().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is possible to trigger a warning in mlxsw in case a flood entry which
mlxsw is not aware of is deleted from the VxLAN device. This is because
mlxsw expects to find a singly linked list where the flood entry is
present in.
Fix by removing these warnings for now.
Will re-add them in the next release after we teach mlxsw to ask for a
dump of FDB entries from the VxLAN device, once it is enslaved to a
bridge mlxsw cares about.
Fixes: 6e6030bd5412 ("mlxsw: spectrum_nve: Implement common NVE core")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FDB notifications for entries learned from an NVE tunnel contain the IP
address of the remote VTEP. In the case of IPv4 underlay, the IP address
is specified as-is. IPv6 addresses on the other hand, are specified as
handles which then need to be used to query the actual address from the
device.
Only IPv4 underlay is currently supported, so we cannot receive
notifications for IPv6 addresses and therefore an error is returned when
one tries to resolve such an address.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver periodically polls for new FDB entries learned by the device.
In the case of an FDB entry learned from a VxLAN tunnel, the
notification includes the IP of the remote VTEP, the filtering
identifier (FID) and the source MAC address of the overlay packet.
Assuming learning is enabled in the VxLAN and bridge drivers, the driver
needs to generate a notification and update them about the new FDB
entry.
Store the ifindex of the NVE device in the FID so that the driver will
be able to update the VxLAN and bridge drivers using it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Spectrum ASIC supports different types of NVE encapsulations (e.g.,
VxLAN, NVGRE) with more types to be supported by future ASICs.
Despite being different, all these encapsulations share some common
functionality such as the enablement of NVE encapsulation on a given
filtering identifier (FID) and the addition of remote VTEPs to the
linked-list of VTEPs that traffic should be flooded to.
Implement this common core and allow different ASICs to register
different operations for different encapsulation types.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|