summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlxsw
AgeCommit message (Collapse)Author
2022-07-20mlxsw: spectrum_router: Fix IPv4 nexthop gateway indicationIdo Schimmel
mlxsw needs to distinguish nexthops with a gateway from connected nexthops in order to write the former to the adjacency table of the device. The check used to rely on the fact that nexthops with a gateway have a 'link' scope whereas connected nexthops have a 'host' scope. This is no longer correct after commit 747c14307214 ("ip: fix dflt addr selection for connected nexthop"). Fix that by instead checking the address family of the gateway IP. This is a more direct way and also consistent with the IPv6 counterpart in mlxsw_sp_rt6_is_gateway(). Cc: stable@vger.kernel.org Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.Kuniyuki Iwashima
While reading sysctl_fib_multipath_hash_fields, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: ce5c9c20d364 ("ipv4: Add a sysctl to control multipath hash fields") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.Kuniyuki Iwashima
While reading sysctl_fib_multipath_hash_policy, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: bf4e0a3db97e ("net: ipv4: add support for ECMP hash policy choice") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15ip: Fix data-races around sysctl_ip_fwd_update_priority.Kuniyuki Iwashima
While reading sysctl_ip_fwd_update_priority, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 432e05d32892 ("net: ipv4: Control SKB reprioritization after forwarding") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-30mlxsw: spectrum_router: Fix rollback in tunnel next hop initPetr Machata
In mlxsw_sp_nexthop6_init(), a next hop is always added to the router linked list, and mlxsw_sp_nexthop_type_init() is invoked afterwards. When that function results in an error, the next hop will not have been removed from the linked list. As the error is propagated upwards and the caller frees the next hop object, the linked list ends up holding an invalid object. A similar issue comes up with mlxsw_sp_nexthop4_init(), where rollback block does exist, however does not include the linked list removal. Both IPv6 and IPv4 next hops have a similar issue with next-hop counter rollbacks. As these were introduced in the same patchset as the next hop linked list, include the cleanup in this patch. Fixes: dbe4598c1e92 ("mlxsw: spectrum_router: Keep nexthops in a linked list") Fixes: a5390278a5eb ("mlxsw: spectrum: Add support for setting counters on nexthops") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220629070205.803952-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-14mlxsw: spectrum_cnt: Reorder counter poolsPetr Machata
Both RIF and ACL flow counters use a 24-bit SW-managed counter address to communicate which counter they want to bind. In a number of Spectrum FW releases, binding a RIF counter is broken and slices the counter index to 16 bits. As a result, on Spectrum-2 and above, no more than about 410 RIF counters can be effectively used. This translates to 205 netdevices for which L3 HW stats can be enabled. (This does not happen on Spectrum-1, because there are fewer counters available overall and the counter index never exceeds 16 bits.) Binding counters to ACLs does not have this issue. Therefore reorder the counter allocation scheme so that RIF counters come first and therefore get lower indices that are below the 16-bit barrier. Fixes: 98e60dce4da1 ("Merge branch 'mlxsw-Introduce-initial-Spectrum-2-support'") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220613125017.2018162-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12mlxsw: Avoid warning during ip6gre device removalAmit Cohen
IPv6 addresses which are used for tunnels are stored in a hash table with reference counting. When a new GRE tunnel is configured, the driver is notified and configures it in hardware. Currently, any change in the tunnel is not applied in the driver. It means that if the remote address is changed, the driver is not aware of this change and the first address will be used. This behavior results in a warning [1] in scenarios such as the following: # ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit # ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit # ip link delete gre1 The change of the address is not applied in the driver. Currently, the driver uses the remote address which is stored in the 'parms' of the overlay device. When the tunnel is removed, the new IPv6 address is used, the driver tries to release it, but as it is not aware of the change, this address is not configured and it warns about releasing non existing IPv6 address. Fix it by using the IPv6 address which is cached in the IPIP entry, this address is the last one that the driver used, so even in cases such the above, the first address will be released, without any warning. [1]: WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84 Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021 RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... Call Trace: <TASK> mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum] mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum] mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum] raw_notifier_call_chain+0x3c/0x50 call_netdevice_notifiers_info+0x2f/0x80 unregister_netdevice_many+0x311/0x6d0 rtnl_dellink+0x136/0x360 rtnetlink_rcv_msg+0x12f/0x380 netlink_rcv_skb+0x49/0xf0 netlink_unicast+0x233/0x340 netlink_sendmsg+0x202/0x440 ____sys_sendmsg+0x1f3/0x220 ___sys_sendmsg+0x70/0xb0 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e846efe2737b ("mlxsw: spectrum: Add hash table for IPv6 address mapping") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-08mlxsw: spectrum_router: Take router lock in router notifier handlerPetr Machata
For notifications that the router needs to handle, router lock is taken. Further, at least to determine whether an event is related to a tunnel underlay, router lock also needs to be taken. Due to this, the router lock is always taken for each unhandled event, and also for some handled events, even if they are not related to underlay. Thus each event implies at least one router lock, sometimes two. Instead of deferring the locking to the leaf handlers, take the lock in the router notifier handler always. This simplifies thinking about the locking state, and in some cases saves one lock cycle. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Update a commentPetr Machata
The position of netdevice notifier registration no longer depends on the router initialization, because the event handler no longer dispatches to the router code. Update the comment at the registration to that effect. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Move handling of tunnel events to router codePetr Machata
The events related to IPIP tunnels are handled by the router code. Move the handling from the central dispatcher in spectrum.c to the new notifier handler in the router module. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Move handling of router events to router codePetr Machata
The events NETDEV_PRE_CHANGEADDR, NETDEV_CHANGEADDR and NETDEV_CHANGEMTU have implications for in-ASIC router interface objects, and as such are handled in the router module. Move the handling from the central dispatcher in spectrum.c to the new notifier handler in the router module. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Move handling of HW stats events to router codePetr Machata
L3 HW stats are implemented in mlxsw as RIF counters, and therefore the code resides in spectrum_router. Exclude the offload xstats events from the mlxsw_sp_netdevice_event_is_router() predicate, and instead recreate the glue code in the router module. Previously, the order of dispatch was that for events on tunnels, a dedicated handler was called, which however did not handle HW stats events. But there is nothing special about tunnel devices as far as HW stats: there is a RIF associated with the tunnel netdevice, and that RIF is where the counter should be installed. Therefore now, HW stats events are tested first, independent of netdevice type. The upshot is that as of this commit, mlxsw supports L3 HW stats work on GRE tunnels. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Move handling of VRF events to router codePetr Machata
Events involving VRF, as L3 concern, are handled in the router code, by the helper mlxsw_sp_netdevice_vrf_event(). The handler is currently invoked from the centralized dispatcher in spectrum.c. Instead, move the call to the newly-introduced router-specific notifier handler. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum_router: Add a dedicated notifier blockPetr Machata
Currently all netdevice events are handled in the centralized notifier handler maintained by spectrum.c. Since a number of events are involving router code, spectrum.c needs to dispatch them to spectrum_router.c. The spectrum module therefore needs to know more about the router code than it should have, and there is are several API points through which the two modules communicate. To simplify the notifier handlers, introduce a new notifier into the router module. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Tolerate enslaving of various devices to VRFPetr Machata
Enslaving netdevices to VRF is currently handled through an mlxsw_sp_is_vrf_event() conditional in mlxsw_sp_netdevice_event(). In the following patch sets, VRF enslavement will be handled purely in the router code. Therefore make handlers of NETDEV_PRECHANGEUPPER tolerant of enslaving to VRF, so that they do not bounce the change. For NETDEV_CHANGEUPPER, drop the WARN_ON(1) and bounce from mlxsw_sp_netdevice_port_vlan_event(). This is the only handler that warned and bounces even in the CHANGEUPPER code, other handler quietly do nothing when they encounter an unfamiliar upper. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-05Revert "Merge branch 'mlxsw-line-card-model'"Jakub Kicinski
This reverts commit 5e927a9f4b9f29d78a7c7d66ea717bb5c8bbad8e, reversing changes made to cfc1d91a7d78cf9de25b043d81efcc16966d55b3. The discussion is still ongoing so let's remove the uAPI until the discussion settles. Link: https://lore.kernel.org/all/20220425090021.32e9a98f@kernel.org/ Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220504154037.539442-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04mlxsw: spectrum_router: Only query neighbour activity when necessaryIdo Schimmel
The driver periodically queries the device for activity of neighbour entries in order to report it to the kernel's neighbour table. Avoid unnecessary activity query when no neighbours are installed. Use an atomic variable to track the number of neighbours, as it is read without any locks. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-04mlxsw: spectrum_switchdev: Only query FDB notifications when necessaryIdo Schimmel
The driver periodically queries the device for FDB notifications (e.g., learned, aged-out) in order to update the bridge driver. These notifications can only be generated when bridges are offloaded to the device. Avoid unnecessary queries by starting to query upon installation of the first bridge and stop querying upon removal of the last bridge. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-04mlxsw: spectrum_acl: Do not report activity for multicast routesIdo Schimmel
The driver periodically queries the device for activity of ACL rules in order to report it to tc upon 'FLOW_CLS_STATS'. In Spectrum-2 and later ASICs, multicast routes are programmed as ACL rules, but unlike rules installed by tc, their activity is of no interest. Avoid unnecessary activity query for such rules by always reporting them as inactive. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-04mlxsw: Treat LLDP packets as controlPetr Machata
When trapping packets for on-CPU processing, Spectrum machines differentiate between control and non-control traps. Traffic trapped through non-control traps is treated as data and kept in shared buffer in pools 0-4. Traffic trapped through control traps is kept in the dedicated control buffer 9. The advantage of marking traps as control is that pressure in the data plane does not prevent the control traffic to be processed. When the LLDP trap was introduced, it was marked as a control trap. But then in commit aed4b5721143 ("mlxsw: spectrum: PTP: Hook into packet receive path"), PTP traps were introduced. Because Ethernet-encapsulated PTP packets look to the Spectrum-1 ASIC as LLDP traffic and are trapped under the LLDP trap, this trap was reconfigured as non-control, in sync with the PTP traps. There is however no requirement that PTP traffic be handled as data. Besides, the usual encapsulation for PTP traffic is UDP, not bare Ethernet, and that is in deployments that even need PTP, which is far less common than LLDP. This is reflected by the default policer, which was not bumped up to the 19Kpps / 24Kpps that is the expected load of a PTP-enabled Spectrum-1 switch. Marking of LLDP trap as non-control was therefore probably misguided. In this patch, change it back to control. Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-04mlxsw: spectrum_dcb: Do not warn about priority changesPetr Machata
The idea behind the warnings is that the user would get warned in case when more than one priority is configured for a given DSCP value on a netdevice. The warning is currently wrong, because dcb_ieee_getapp_mask() returns the first matching entry, not all of them, and the warning will then claim that some priority is "current", when in fact it is not. But more importantly, the warning is misleading in general. Consider the following commands: # dcb app flush dev swp19 dscp-prio # dcb app add dev swp19 dscp-prio 24:3 # dcb app replace dev swp19 dscp-prio 24:2 The last command will issue the following warning: mlxsw_spectrum3 0000:07:00.0 swp19: Ignoring new priority 2 for DSCP 24 in favor of current value of 3 The reason is that the "replace" command works by first adding the new value, and then removing all old values. This is the only way to make the replacement without causing the traffic to be prioritized to whatever the chip defaults to. The warning is issued in response to adding the new priority, and then no warning is shown when the old priority is removed. The upshot is that the canonical way to change traffic prioritization always produces a warning about ignoring the new priority, but what gets configured is in fact what the user intended. An option to just emit warning every time that the prioritization changes just to make it clear that it happened is obviously unsatisfactory. Therefore, in this patch, remove the warnings. Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-03mlxsw: Configure descriptor buffersPetr Machata
Spectrum machines have two resources related to keeping packets in an internal buffer: bytes (allocated in cell-sized units) for packet payload, and descriptors, for keeping metadata. Currently, mlxsw only configures the bytes part of the resource management. Spectrum switches permit a full parallel configuration for the descriptor resources, including port-pool and port-TC-pool quotas. By default, these are all configured to use pool 14, with an infinite quota. The ingress pool 14 is then infinite in size. However, egress pool 14 has finite size by default. The size is chip dependent, but always much lower than what the chip actually permits. As a result, we can easily construct workloads that exhaust the configured descriptor limit. Fix the issue by configuring the egress descriptor pool to be infinite in size as well. This will maintain the configuration philosophy of the default configuration, but will unlock all chip resources to be usable. In the code, include both the configuration of ingress and ingress, mostly for clarity. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03mlxsw: reg: Add "desc" field to SBPRPetr Machata
SBPR, or Shared Buffer Pools Register, configures and retrieves the shared buffer pools and configuration. The desc field determines whether the configuration relates to the byte pool or the descriptor pool. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-25mlxsw: core_linecards: Expose device FW version over device infoJiri Pirko
Extend MDDQ to obtain FW version of line card device and implement device_info_get() op to fill up the info with that. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25mlxsw: reg: Extend MDDQ device_info by FW version fieldsJiri Pirko
Add FW version fields to MDDQ device_info. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25mlxsw: core_linecards: Expose HW revision and INI versionJiri Pirko
Implement info_get() to expose HW revision of a linecard and loaded INI version. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25mlxsw: core_linecards: Probe provisioned line cards for devices and attach themJiri Pirko
In case the line card is provisioned, go over all possible existing devices (gearboxes) on it and attach them, so devlink core is aware of them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25mlxsw: reg: Extend MDDQ by device_infoJiri Pirko
Extend existing MDDQ register by possibility to query information about devices residing on a line card. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-22mlxsw: core_linecards: Fix size of array element during ini_files allocationJiri Pirko
types_info->ini_files is an array of pointers to struct mlxsw_linecard_ini_file. Fix the kmalloc_array() argument to be of a size of a pointer. Addresses-Coverity: ("Incorrect expression (SIZEOF_MISMATCH)") Fixes: b217127e5e4e ("mlxsw: core_linecards: Add line card objects and implement provisioning") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220420142007.3041173-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
drivers/net/ethernet/microchip/lan966x/lan966x_main.c d08ed852560e ("net: lan966x: Make sure to release ptp interrupt") c8349639324a ("net: lan966x: Add FDMA functionality") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-20mlxsw: core_hwmon: Add interfaces for line card initialization and ↵Vadim Pasternak
de-initialization Add callback functions for line card 'hwmon' initialization and de-initialization. Each line card is associated with the relevant 'hwmon' device, which may contain thermal attributes for the cages and gearboxes found on this line card. The line card 'hwmon' initialization / de-initialization APIs are to be called when line card is set to active / inactive state by got_active() / got_inactive() callbacks from line card state machine. For example cage temperature for module #9 located at line card #7 will be exposed by utility 'sensors' like: linecard#07 front panel 009: +32.0C (crit = +70.0C, emerg = +80.0C) And temperature for gearbox #3 located at line card #5 will be exposed like: linecard#05 gearbox 003: +41.0C (highest = +41.0C) Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_thermal: Add interfaces for line card initialization and ↵Vadim Pasternak
de-initialization Add callback functions for line card thermal area initialization and de-initialization. Each line card is associated with the relevant thermal area, which may contain thermal zones for cages and gearboxes found on this line card. The line card thermal initialization / de-initialization APIs are to be called when line card is set to active / inactive state by got_active() / got_inactive() callbacks from line card state machine. For example thermal zone for module #9 located at line card #7 will have type: mlxsw-lc7-module9. And thermal zone for gearbox #2 located at line card #5 will have type: mlxsw-lc5-gearbox2. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_env: Add interfaces for line card initialization and ↵Vadim Pasternak
de-initialization Netdevs for ports found on line cards are registered upon provisioning. However, user space is not allowed to access the transceiver modules found on a line card until the line card becomes active. Therefore, register event operations with the line card core to get notifications whenever a line card becomes active or inactive. When user space tries to dump the EEPROM of a transceiver module or reset it and the corresponding line card is inactive, emit an error message: ethtool -m enp1s0nl7p9 netlink error: mlxsw_core: Cannot read EEPROM of module on an inactive line card netlink error: Input/output error When user space tries to set the power mode policy of such a transceiver, cache the configuration and apply it when the line card becomes active. This is consistent with other port configuration (e.g., MTU setting) that user space is able to perform while the line card is provisioned, but inactive. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_env: Split module power mode setting to a separate functionVadim Pasternak
Move the code that applies the module power mode to the device to a separate function. This function will be invoked by the next patch to set the power mode on transceiver modules found on a line card when the line card becomes active. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core: Add bus argument to environment init APIVadim Pasternak
Pass bus argument to mlxsw_env_init(). The purpose is to get access to device handle, which is to be provided to error message in case of line card activation failure. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_linecards: Introduce ops for linecards status change trackingJiri Pirko
Introduce an infrastructure allowing users to register a set of operations which are to be called whenever a line card gets active/inactive. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: spectrum: Add port to linecard mappingJiri Pirko
For each port get slot_index using PMLP register. For ports residing on a linecard, identify it with the linecard by setting mapping using devlink_port_linecard_set() helper. Use linecard slot index for PMTDB register queries. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: core: Extend driver ops by remove selected ports opJiri Pirko
In case of line card implementation, the core has to have a way to remove relevant ports manually. Extend the Spectrum driver ops by an op that implements port removal of selected ports upon request. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: core_linecards: Implement line card activation processJiri Pirko
Allow to process events generated upon line card getting "ready" and "active". When DSDSC event with "ready" bit set is delivered, that means the line card is powered up. Use MDDC register to push the line card to active state. Once FW is done with that, the DSDSC event with "active" bit set is delivered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: core_linecards: Add line card objects and implement provisioningJiri Pirko
Introduce objects for line cards and an infrastructure around that. Use devlink_linecard_create/destroy() to register the line card with devlink core. Implement provisioning ops with a list of supported line cards. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: reg: Add Management Binary Code Transfer RegisterJiri Pirko
The MBCT register allows to transfer binary INI codes from the host to the management FW by transferring it by chunks of maximum 1KB. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: reg: Add Management DownStream Device Control RegisterJiri Pirko
The MDDC register allows to control downstream devices and line cards. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: reg: Add Management DownStream Device Query RegisterJiri Pirko
The MDDQ register allows to query the DownStream device properties. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: spectrum: Introduce port mapping change event processingJiri Pirko
Register PMLPE trap and process the port mapping changes delivered by it by creating related ports. Note that this happens after provisioning. The INI of the linecard is processed and merged by FW. PMLPE is generated for each port. Process this mapping change. Layout of PMLPE is the same as layout of PMLP. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: Narrow the critical section of devl_lock during ports creation/removalJiri Pirko
No need to hold the lock for alloc and freecpu. So narrow the critical section. Follow-up patch is going to benefit from this by adding more code to the functions which will be out of the critical as well. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: reg: Add Ports Mapping Event Configuration RegisterJiri Pirko
The PMECR register is used to enable/disable event triggering in case of local port mapping change. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: spectrum: Allocate port mapping array of structs instead of pointersJiri Pirko
Instead of array of pointers to port mapping structures, allocate the array of structures directly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: spectrum: Allow lane to start from non-zero indexJiri Pirko
So far, the lane index always started from zero. That is not true for modular systems with gearbox-equipped linecards. Loose the check so the lanes can start from non-zero index. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15net: Handle l3mdev in ip_tunnel_init_flowDavid Ahern
Ido reported that the commit referenced in the Fixes tag broke a gre use case with dummy devices. Add a check to ip_tunnel_init_flow to see if the oif is an l3mdev port and if so set the oif to 0 to avoid the oif comparison in fib_lookup_good_nhc. Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>