Start using the previously introduced sampling triggers hash table to
store sampling parameters instead of storing them as attributes of the
sampled port.
This makes it easier to introduce new sampling triggers.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, mlxsw supports a single sampling trigger type (i.e., received
packet). When sampling is configured on an ingress port, the sampling
parameters (e.g., pointer to the psample group) are stored as an
attribute of the port, so that they could be passed to
psample_sample_packet() when a sampled packet is trapped to the CPU.
Subsequent patches are going to add more types of sampling triggers,
making it difficult to maintain the current scheme.
Instead, store all the active sampling triggers with their associated
parameters in a hash table. That way, more trigger types can be easily
added.
The next patch will flip mlxsw to use the hash table instead of the
current scheme.
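
A minimal userspace sketch of the idea, not the mlxsw implementation:
active triggers are keyed by (trigger type, local port) and looked up in
a small open-addressing table. All names below are hypothetical, the
table size is fixed, and removal is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; these are not the mlxsw definitions. */
enum trigger_type { TRIGGER_INGRESS, TRIGGER_EGRESS };

struct trigger_key {
	enum trigger_type type;
	uint16_t local_port;
};

struct trigger_params {
	uint32_t rate;       /* sample 1 in 'rate' packets */
	uint32_t trunc_size; /* bytes kept per sample, 0 = whole packet */
};

#define TABLE_SIZE 64 /* power of two */

struct trigger_node {
	int used;
	struct trigger_key key;
	struct trigger_params params;
};

struct trigger_table {
	struct trigger_node nodes[TABLE_SIZE];
};

static size_t trigger_hash(const struct trigger_key *key)
{
	return ((size_t)key->type * 31u + key->local_port) & (TABLE_SIZE - 1);
}

/* Linear probing: return the slot holding 'key', or the first free slot. */
static struct trigger_node *trigger_slot(struct trigger_table *t,
					 const struct trigger_key *key)
{
	size_t i, idx = trigger_hash(key);

	for (i = 0; i < TABLE_SIZE; i++) {
		struct trigger_node *n = &t->nodes[(idx + i) & (TABLE_SIZE - 1)];

		if (!n->used)
			return n;
		if (n->key.type == key->type &&
		    n->key.local_port == key->local_port)
			return n;
	}
	return NULL; /* table is full and 'key' is not in it */
}

/* Insert or update the parameters of a trigger. Returns 0 or -1 if full. */
int trigger_set(struct trigger_table *t, const struct trigger_key *key,
		const struct trigger_params *params)
{
	struct trigger_node *n = trigger_slot(t, key);

	if (!n)
		return -1;
	n->used = 1;
	n->key = *key;
	n->params = *params;
	return 0;
}

/* Return the parameters of an active trigger, or NULL if not configured. */
const struct trigger_params *trigger_lookup(struct trigger_table *t,
					    const struct trigger_key *key)
{
	struct trigger_node *n = trigger_slot(t, key);

	return (n && n->used) ? &n->params : NULL;
}
```

With this shape, a new trigger type is one more enum value rather than
one more per-port attribute.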
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The entry will be required by the next patches, so pass it. No
functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Push some sampling checks to the per-ASIC operations, as they are no
longer relevant for all ASICs.
The sampling rate validation against the MPSC maximum rate is only
relevant for Spectrum-1, as Spectrum-2 and later ASICs no longer use
MPSC register for sampling.
The ingress / egress validation is pushed down to the per-ASIC
operations since subsequent patches are going to remove it for
Spectrum-2 and later ASICs.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Due to the differences between Spectrum-1 and later ASICs, some of the
checks currently performed at the common code (where extack is
available) will need to be pushed to the per-ASIC operations.
As a preparation, propagate extack further to maintain proper error
reporting.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We want to have any kind of name for the cooling devices as we no
longer want to rely on auto-numbering. Let's replace the cooling
device's fixed array by a char pointer to be allocated dynamically
when registering the cooling device, so we don't limit the length of
the name.
Rework the error path at the same time as we have to roll back the
allocations in case of error.
Tested with a dummy device having the name:
"Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
A village on the island of Anglesey (Wales), known to have the longest
name in Europe.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20210314111333.16551-1-daniel.lezcano@linaro.org
|
|
Make use of the previously added metadata and report it to the psample
module. The metadata is read from the skb's control block, which was
initialized by the bus driver (i.e., 'mlxsw_pci') after decoding the
packet's Completion Queue Element (CQE).
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function resolves the psample sampling group from the Rx port
because this is the only form of sampling the driver currently supports.
Subsequent patches are going to add support for Tx-based and
policy-based sampling, in which case the sampling group would not be
resolved from the Rx port.
Therefore, move this code to the Rx-specific sampling listener.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 7d8e8f3433dc ("mlxsw: core: Increase scope of RCU read-side
critical section"), all Rx handlers are called from an RCU read-side
critical section.
Remove the unnecessary rcu_read_lock() / rcu_read_unlock().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Packets that are mirrored / sampled to the CPU have extra metadata
encoded in their corresponding Completion Queue Element (CQE). Retrieve
this metadata from the CQE and set it in the skb control block so that
it could be accessed by the switch driver (i.e., 'mlxsw_spectrum').
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Next patch will need to encode more Rx metadata in the skb control
block, so create a dedicated field for it and move the cookie index
there.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Completion Queue Element version 2 (CQEv2) includes various metadata
fields for packets that are mirrored / sampled to the CPU.
Add these fields so that they could be used by a later patch.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, callers of psample_sample_packet() pass three metadata
attributes: Ingress port, egress port and truncated size. Subsequent
patches are going to add more attributes (e.g., egress queue occupancy),
which also need an indication whether they are valid or not.
Encapsulate packet metadata in a struct in order to keep the number of
arguments reasonable.
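
As an illustration only, such a struct could pair each optional
attribute with a validity bit. The field and function names below are
assumptions made for this sketch, not a copy of the psample header.

```c
#include <stdint.h>

/* Hypothetical layout for illustration. */
struct sample_metadata {
	uint32_t trunc_size;   /* truncated size in bytes */
	int in_ifindex;        /* ingress port */
	int out_ifindex;       /* egress port */
	uint16_t out_tc;       /* egress traffic class */
	uint64_t out_tc_occ;   /* egress queue occupancy in bytes */
	unsigned int out_tc_valid:1;     /* out_tc carries a real value */
	unsigned int out_tc_occ_valid:1; /* out_tc_occ carries a real value */
};

/*
 * Stand-in for the reporting function: count the optional attributes
 * that would be emitted for this packet. Callers pass one pointer
 * instead of a growing list of arguments.
 */
int sample_optional_attr_count(const struct sample_metadata *md)
{
	return md->out_tc_valid + md->out_tc_occ_valid;
}
```

A caller that only knows the queue occupancy sets 'out_tc_occ' and
'out_tc_occ_valid' and leaves the rest zeroed.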
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
drivers
A follow-up patch will allow users to configure packet-per-second policing
in the software datapath. In preparation for this, teach all drivers that
support offload of the policer action to reject such configuration as
currently none of them support it.
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Spectrum-2 and later ASICs support sampling of packets by mirroring to
the CPU with probability. There are several advantages compared to the
legacy dedicated sampling mechanism:
* Extra metadata per-packet: Egress port, egress traffic class, traffic
class occupancy and end-to-end latency
* Ability to sample packets on egress / per-flow
Convert Spectrum-2 and later ASICs to perform sampling by mirroring to
the CPU with probability.
Subsequent patches will add support for egress / per-flow sampling and
expose the extra metadata.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sampling of ingress packets is supported using a dedicated sampling
mechanism on all Spectrum ASICs. However, Spectrum-2 and later ASICs
support more sophisticated sampling by mirroring packets to the CPU.
As a preparation for more advanced sampling configurations, split the trap
configuration used for sampled packets between Spectrum-1 and later ASICs.
This is needed since packets that are mirrored to the CPU are trapped
via a different trap identifier compared to packets that are sampled
using the dedicated sampling mechanism.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sampling of ingress packets is supported using a dedicated sampling
mechanism on all Spectrum ASICs. However, Spectrum-2 and later ASICs
support more sophisticated sampling by mirroring packets to the CPU.
As a preparation for more advanced sampling configurations, split the
sampling operations between Spectrum-1 and later ASICs.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, every packet that matches a mirroring trigger (e.g., received
packets, buffer dropped packets) is mirrored. Spectrum-2 and later ASICs
support mirroring with probability, where every 1 in N matched packets
is mirrored.
Extend the API that creates the binding between the trigger and the SPAN
agent with a probability rate parameter, which is an attribute of the
trigger. Set it to '1' to maintain existing behavior.
Subsequent patches will use it to perform more sophisticated sampling,
by mirroring packets to the CPU with probability.
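
The "1 in N" semantics can be modelled in software as below. This is a
sketch under stated assumptions: how the ASIC actually draws its random
value is not described here, 'should_mirror' is a hypothetical name, and
the caller is assumed to supply a uniformly distributed number.

```c
#include <stdint.h>

/*
 * Decide whether a matched packet is mirrored, given a probability
 * rate N and a uniformly distributed random value. On average 1 in N
 * matched packets is selected. A rate of 1 selects every packet, which
 * is the pre-existing behavior. A rate of 0 is treated as invalid.
 */
int should_mirror(uint32_t rate, uint32_t random_value)
{
	if (rate == 0)
		return 0;
	return random_value % rate == 0;
}
```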
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MPAR and MPAGR registers are used to configure the binding between
the mirroring trigger (e.g., received packet) and the SPAN agent. Add
probability rate field, which will allow us to support sampling by
mirroring to the CPU.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When packets are mirrored to the CPU, the trap identifier with which the
packets are trapped is determined according to the session identifier of
the SPAN agent performing the mirroring. Packets that are trapped for
the same logical reason (e.g., buffer drops) should use the same session
identifier.
Currently, a single session is implicitly supported (identifier 0) and
is used for packets that are mirrored to the CPU due to buffer drops
(e.g., early drop).
Subsequent patches are going to mirror packets to the CPU due to
sampling, which will require a different session identifier.
Prepare for that by making the session identifier an attribute of the
SPAN agent.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MFDE.irisc_id and MFDE.event_id were adjusted according to what is
actually implemented in firmware.
Adjust the shift and size of these fields in mlxsw as well.
Note that the displacement of the first field is not a regression.
It was always incorrect and therefore reported "0".
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the MFDE.log_ip field to devlink health reporter in order to ease
firmware debug. This field encodes the instruction pointer that triggered
the CR space timeout.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend MFDE (Monitoring FW Debug) register with new field specifying the
instruction pointer that triggered the CR space timeout.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The indicated version fixes the following two issues:
- MIRROR_SAMPLER_ACTION.mirror_probability_rate inverted. This has
implication for per-flow sampling.
- When adjacency is replaced-if-inactive (RATR.opcode=3), bad parameter
was reported when replacing an active entry. This breaks offload of
resilient next-hop groups.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The comment did not include the register name.
Add `pmaos` to align the comment with other comments.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'Uppers' is not clear enough for all users when referring to upper
devices.
Reword the error message so it will be clearer.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Routes are currently processed from a workqueue whereas nexthop objects
are processed in system call context. This can result in the driver not
finding a suitable nexthop group for a route and issuing a warning [1].
Fix this by ignoring such routes earlier in the process. The subsequent
deletion notification will be ignored as well.
[1]
WARNING: CPU: 2 PID: 7754 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4853 mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum]
[...]
CPU: 2 PID: 7754 Comm: kworker/u8:0 Not tainted 5.11.0-rc6-cq-20210207-1 #16
Hardware name: Mellanox Technologies Ltd. MSN2100/SA001390, BIOS 5.6.5 05/24/2018
Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib_event_work [mlxsw_spectrum]
RIP: 0010:mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum]
Fixes: cdd6cfc54c64 ("mlxsw: spectrum_router: Allow programming routes with nexthop objects")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Alex Veber <alexve@nvidia.com>
Tested-by: Alex Veber <alexve@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, only external bits are added to the PTYS register, whereas
there is one external bit that is wrongly marked as internal, and so was
recently removed from the register.
Add that bit to the PTYS register again, as this bit is no longer
internal.
Its removal resulted in '100000baseLR4_ER4/Full' link mode no longer
being supported, causing a regression on some setups.
Fixes: 5bf01b571cf4 ("mlxsw: spectrum_ethtool: Remove internal speeds from PTYS register")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reported-by: Eddie Shklaer <eddies@nvidia.com>
Tested-by: Eddie Shklaer <eddies@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.
This patch also changes drivers to make use of the "mask" field for edge
detection when possible.
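
A sketch of what a pre-check could look like once the driver sees both
"flags" and "mask" together. The flag values and the function name are
hypothetical, and the sketch assumes hardware that keeps unicast and
multicast flooding in the same state at all times.

```c
#include <errno.h>

/* Hypothetical bit values; the bridge defines its own. */
#define FLAG_UCAST_FLOOD (1UL << 0)
#define FLAG_MCAST_FLOOD (1UL << 1)

/*
 * 'mask' says which flags are being changed, 'flags' holds their new
 * values. Return 0 if the request can be offloaded, -EINVAL otherwise.
 */
int port_flags_check(unsigned long flags, unsigned long mask)
{
	const unsigned long both = FLAG_UCAST_FLOOD | FLAG_MCAST_FLOOD;

	/* Neither flooding flag is being changed: nothing to validate. */
	if (!(mask & both))
		return 0;

	/* Only one of the two changes: they would go out of sync. */
	if ((mask & both) != both)
		return -EINVAL;

	/* Both change: they must end up with the same value. */
	if (!(flags & FLAG_UCAST_FLOOD) != !(flags & FLAG_MCAST_FLOOD))
		return -EINVAL;

	return 0;
}
```

Without the mask, the second check is impossible: the final value alone
does not tell the driver which of the two flags was touched.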
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion
fails, FIB abort is triggered.
After aborting, set the appropriate hardware flag to make the kernel emit
RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.
The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.
To avoid such cases, a previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed. This behavior is controlled by a sysctl.
With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.
This patch adds an "offload_failed" indication to IPv6 routes, so that
users will have better visibility into the offload process.
'struct fib6_info' is extended with a new field that indicates if route
offload failed. Note that the new field is added using an unused bit and
therefore there is no need to increase the struct size.
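
The point about the unused bit can be shown with two simplified
structs. These are hypothetical stand-ins, not the kernel layout: the
new flag is carved out of the spare bits, so the size stays the same.

```c
/* Hypothetical stand-in for the flags before the change. */
struct route_flags_before {
	unsigned int offload:1;
	unsigned int trap:1;
	unsigned int unused:6;
};

/* Same flags with one spare bit turned into 'offload_failed'. */
struct route_flags_after {
	unsigned int offload:1;
	unsigned int trap:1;
	unsigned int offload_failed:1;
	unsigned int unused:5;
};

/* Both variants use 8 bits of the same storage unit. */
_Static_assert(sizeof(struct route_flags_before) ==
	       sizeof(struct route_flags_after),
	       "adding offload_failed must not grow the struct");

/* Record a failed offload: the route is neither offloaded nor trapping. */
void route_mark_offload_failed(struct route_flags_after *f)
{
	f->offload = 0;
	f->trap = 0;
	f->offload_failed = 1;
}
```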
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.
The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.
To avoid such cases, a previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed. This behavior is controlled by a sysctl.
With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.
This patch adds an "offload_failed" indication to IPv4 routes, so that
users will have better visibility into the offload process.
'struct fib_alias' and 'struct fib_rt_info' are extended with a new field
that indicates if route offload failed. Note that the new field is added
using an unused bit and therefore there is no need to increase the structs' size.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when user space queries the link's parameters, such as speed and
duplex, each parameter is passed from the driver to ethtool.
Instead, pass the link mode bit in use.
In Spectrum-1, simply pass the bit that is set to '1' from PTYS register.
In Spectrum-2, pass the first link mode bit in the mask of the used
link mode.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, when auto negotiation is set to off, the user can force a
specific speed or both speed and duplex. The user cannot influence the
number of lanes that will be forced.
Add support for setting speed along with lanes so one would be able
to choose how many lanes will be forced.
When the lanes parameter is passed from user space, choose the link mode
whose actual width equals it.
Otherwise, the default link mode will be the one that supports the width
of the port.
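
The selection rule can be sketched as a table lookup. The table below is
a small hypothetical subset and the function name is made up for this
example; a 'lanes' value of 0 stands for "not passed from user space".

```c
#include <stddef.h>
#include <stdint.h>

struct link_mode {
	const char *name;
	uint32_t speed; /* Mb/s */
	uint8_t lanes;  /* actual width of the link mode */
};

/* Illustrative subset, not the driver's table. */
static const struct link_mode link_modes[] = {
	{ "50000baseCR/Full",   50000,  1 },
	{ "50000baseCR2/Full",  50000,  2 },
	{ "100000baseCR2/Full", 100000, 2 },
	{ "100000baseCR4/Full", 100000, 4 },
};

/*
 * Pick the link mode to force. If the user passed a lane count, the
 * width must match it exactly. Otherwise fall back to the port width.
 * Returns NULL when no link mode fits.
 */
const struct link_mode *link_mode_pick(uint32_t speed, uint8_t lanes,
				       uint8_t port_width)
{
	uint8_t want = lanes ? lanes : port_width;
	size_t i;

	for (i = 0; i < sizeof(link_modes) / sizeof(link_modes[0]); i++) {
		if (link_modes[i].speed == speed &&
		    link_modes[i].lanes == want)
			return &link_modes[i];
	}
	return NULL;
}
```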
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, when a speed can be supported by different number of lanes,
the supported link modes bitmask contains only link modes with a single
number of lanes.
This was done in order to prevent auto negotiation on number of
lanes after 50G-1-lane and 100G-2-lanes link modes were introduced.
For example, if a port's max width is 4, only link modes with 4 lanes
will be presented as supported by that port, so 100G is always achieved by
4 lanes of 25G.
After the previous patches that allow selection of the number of lanes,
auto negotiation on number of lanes becomes practical.
Remove the filtering of supported link modes by the maximum number of lanes,
so that all the supported and advertised link modes are shown.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With the next patch, mlxsw and netdevsim will fail to compile if
CONFIG_IPV6 is disabled.
Do not call fib6_info_hw_flags_set() when IPv6 is disabled.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The next patch will emit notification when hardware flags are changed,
in case that fib_notify_on_flag_change sysctl is set to 1.
To know sysctl values, net struct is needed.
This change is consistent with the IPv4 version, which gets 'net' struct
as its first argument.
Currently, the only callers of this function are mlxsw and netdevsim.
Patch the callers to pass net.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently there are only two types of in-kernel nexthop notification.
The two are distinguished by the 'is_grp' boolean field in 'struct
nh_notifier_info'.
As more notification types are introduced for more next-hop group types, a
boolean is not an easily extensible interface. Instead, convert it to an
enum.
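
A sketch of why the enum extends more easily than the boolean. The enum
values and the helper are hypothetical and only mirror the two existing
notification types.

```c
/* Hypothetical names for the two existing notification types. */
enum nh_info_type {
	NH_INFO_SINGLE, /* a single nexthop */
	NH_INFO_GRP,    /* a nexthop group */
	/* a new group type becomes one more value here */
};

/*
 * Listener-side check: return 1 for types this driver understands and
 * 0 for anything else, so that a type added later is rejected instead
 * of being mistaken for "not a group".
 */
int nh_info_is_supported(enum nh_info_type type)
{
	switch (type) {
	case NH_INFO_SINGLE:
	case NH_INFO_GRP:
		return 1;
	default:
		return 0;
	}
}
```

With a boolean, a third type would silently fall into one of the two
existing branches.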
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
drivers/net/can/dev.c
b552766c872f ("can: dev: prevent potential information leak in can_fill_info()")
3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir")
0a042c6ec991 ("can: dev: move netlink related code into seperate file")
Code move.
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
57ac4a31c483 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
214baf22870c ("net/mlx5e: Support HTB offload")
Adjacent code changes
net/switchdev/switchdev.c
20776b465c0c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
ffb68fc58e96 ("net: switchdev: remove the transaction structure from port object notifiers")
bae33f2b5afe ("net: switchdev: remove the transaction structure from port attributes")
Transaction parameter gets dropped; otherwise keep the fix.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The purpose of the delayed work in the SPAN module is to potentially
update the destination port and various encapsulation parameters of SPAN
agents that point to a VLAN device or a GRE tap. The destination port
can change following the insertion of a new route, for example.
SPAN agents that point to a physical port or the CPU port are static and
never change throughout the lifetime of the SPAN agent. Therefore, skip
over them in the delayed work.
This fixes an issue where the delayed work overwrites the policer
that was set on a SPAN agent pointing to the CPU. Modifying the delayed
work to inherit the original policer configuration is error-prone, as
the same will be needed for any new parameter.
Fixes: 4039504e6a0c ("mlxsw: spectrum_span: Allow setting policer on a SPAN agent")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The switch ASIC has a limited capacity of physical ('flavour physical'
in devlink terminology) ports that it can support. While each system is
brought up with a different number of ports, this number can be
increased via splitting up to the ASIC's limit.
Expose physical ports as a devlink resource so that user space will have
visibility to the maximum number of ports that can be supported and the
current occupancy.
In addition, add a "Generic Resources" section in devlink-resource
documentation so the different drivers will be aligned by the same resource
name when exposing to user space.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'mlxsw_pci_queue_init()' and
'mlxsw_pci_fw_area_init()' GFP_KERNEL can be used because both functions
are already using this flag and no lock is acquired.
When memory is allocated in 'mlxsw_pci_mbox_alloc()' GFP_KERNEL can be used
because it is only called from the probe function and no lock is acquired
in between.
The call chain is:
--> mlxsw_pci_probe()
--> mlxsw_pci_cmd_init()
--> mlxsw_pci_mbox_alloc()
While at it, also replace the 'dma_set_mask/dma_set_coherent_mask' sequence
by a less verbose 'dma_set_mask_and_coherent()' call.
@@
@@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
@@
@@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@
@@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
@@
- PCI_DMA_NONE
+ DMA_NONE
@@
expression e1, e2, e3;
@@
- pci_alloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3;
@@
- pci_zalloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3, e4;
@@
- pci_free_consistent(e1, e2, e3, e4)
+ dma_free_coherent(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_sg(e1, e2, e3, e4)
+ dma_map_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_sg(e1, e2, e3, e4)
+ dma_unmap_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+ dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_device(e1, e2, e3, e4)
+ dma_sync_single_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+ dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+ dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20210114084757.490540-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As of commit 457e20d65924 ("mlxsw: spectrum_switchdev: Avoid returning
errors in commit phase"), the mlxsw driver performs the VLAN object
offloading during the prepare phase. So conversion just seems to be a
matter of removing the code that was running in the commit phase.
Ido Schimmel explains that the reason why mlxsw_sp_span_respin is called
unconditionally is because the bridge driver will ignore -EOPNOTSUPP and
actually add the VLAN on the bridge device - see commit 9c86ce2c1ae3
("net: bridge: Notify about bridge VLANs") and commit ea4721751977
("mlxsw: spectrum_switchdev: Ignore bridge VLAN events"). Since the VLAN
was successfully added on the bridge device, mlxsw_sp_span_respin_work()
should be able to resolve the egress port for a packet that is mirrored
to a gre tap and passes through the bridge device. Therefore keep the
logic as it is.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the introduction of the switchdev API, port attributes were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.
Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.
It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.
This patch removes the struct switchdev_trans member from switchdev port
attribute notifier structures, and converts drivers to not look at this
member.
In part, this patch contains a revert of my previous commit 2e554a7a5d8a
("net: dsa: propagate switchdev vlan_filtering prepare phase to
drivers").
For the most part, the conversion was trivial except for:
- Rocker's world implementation based on Broadcom OF-DPA had an odd
implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
done mechanically, by pasting the implementation twice, then only
keeping the code that would get executed during prepare phase on top,
then only keeping the code that gets executed during the commit phase
on bottom, then simplifying the resulting code until this was obtained.
- DSA's offloading of STP state, bridge flags, VLAN filtering and
multicast router could be converted right away. But the ageing time
could not, so a shim was introduced and this was left for a further
commit.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the introduction of the switchdev API, port objects were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.
Some classes of failures can never be avoided, like hardware access or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, because commit 91cf8eceffc1 ("switchdev: Remove unused
transaction item queue") removed the (unused) mechanism for passing
allocated memory from one phase to the other.
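To make the point about unavoidable failures concrete, the sketch below
shows a single-phase add in which both the allocation and the hardware
access can fail, and in which the allocation is freed on the error path
so nothing has to be carried between phases. Everything here is a
user-space assumption for illustration: struct demo_port, the hw_write
callback and the demo_hw_* helpers do not exist in any driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-port VLAN entry list; not taken from any driver. */
struct demo_vlan_entry {
	uint16_t vid;
	struct demo_vlan_entry *next;
};

struct demo_port {
	struct demo_vlan_entry *vlans;
	/* Stand-in for a hardware write; returns 0 or a negative errno. */
	int (*hw_write)(uint16_t vid);
};

/* Two fake hardware back ends, used only to exercise both outcomes. */
static int demo_hw_ok(uint16_t vid) { (void)vid; return 0; }
static int demo_hw_fail(uint16_t vid) { (void)vid; return -EIO; }

static int demo_port_vlan_add(struct demo_port *port, uint16_t vid)
{
	struct demo_vlan_entry *entry;
	int err;

	assert(port != NULL && port->hw_write != NULL);

	/* Allocation can fail no matter which phase it is placed in. */
	entry = malloc(sizeof(*entry));
	if (!entry)
		return -ENOMEM;

	/* So can the hardware access; undo the allocation if it does. */
	err = port->hw_write(vid);
	if (err) {
		free(entry);
		return err;
	}

	entry->vid = vid;
	entry->next = port->vlans;
	port->vlans = entry;
	return 0;
}
```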
It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.
This patch removes the struct switchdev_trans member from switchdev port
object notifier structures, and converts drivers to not look at this
member.
Where driver conversion is trivial (like in the case of the Marvell
Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
done in this patch.
Where driver conversion needs more attention (DSA, Mellanox Spectrum),
the conversion is left for subsequent patches and here we only fake the
prepare/commit phases at a lower level, just not in the switchdev
notifier itself.
Where the code has a natural structure that is best left alone as a
preparation and a commit phase (as in the case of the Ocelot switch),
that structure is left in place, just made to not depend upon the
switchdev transactional model.
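Where a driver keeps its own preparation and commit steps, the split can
live entirely inside the driver. The following is a minimal sketch under
stated assumptions: the demo_* functions, the membership array and the
"valid VIDs are 1..4094" rule are invented for illustration and are not
the Ocelot implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VLAN_N_VID 4096

struct demo_port {
	bool vlan_member[DEMO_VLAN_N_VID];
	uint16_t pvid;
};

/* Driver-internal check; it modifies no state, so failing is harmless. */
static int demo_vlan_prepare(const struct demo_port *port, uint16_t vid)
{
	(void)port;
	/* Assumed restriction for this sketch: VIDs 1..4094 only. */
	if (vid == 0 || vid >= DEMO_VLAN_N_VID - 1)
		return -EINVAL;
	return 0;
}

/* Driver-internal apply step; only reached after a successful check. */
static void demo_vlan_commit(struct demo_port *port, uint16_t vid, bool pvid)
{
	port->vlan_member[vid] = true;
	if (pvid)
		port->pvid = vid;
}

/* The one entry point a notifier handler would call, with no transaction. */
static int demo_port_obj_add_vlan(struct demo_port *port, uint16_t vid,
				  bool pvid)
{
	int err;

	assert(port != NULL);

	err = demo_vlan_prepare(port, vid);
	if (err)
		return err;

	demo_vlan_commit(port, vid, pvid);
	return 0;
}
```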
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The call path of a switchdev VLAN addition to the bridge looks something
like this today:
nbp_vlan_init              -> nbp_vlan_add
__br_vlan_set_default_pvid -> nbp_vlan_add, br_vlan_add
br_afspec                  -> br_process_vlan_info
br_process_vlan_info       -> br_vlan_info
br_vlan_info               -> nbp_vlan_add, br_vlan_add
nbp_vlan_add               -> __vlan_add
br_vlan_add                -> __vlan_add, br_vlan_add_existing
__vlan_add                 -> br_vlan_add, br_vlan_get_master,
                              __vlan_vid_add, br_switchdev_port_vlan_add
br_vlan_get_master         -> br_vlan_add
__vlan_vid_add             -> br_switchdev_port_vlan_add
br_vlan_add_existing       -> br_switchdev_port_vlan_add
The ranges UAPI was introduced to the bridge in commit bdced7ef7838
("bridge: support for multiple vlans and vlan ranges in setlink and
dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
have always been passed one by one, through struct bridge_vlan_info
tmp_vinfo, to br_vlan_info. So the range never propagated any deeper
into the call path than br_afspec.
Then Scott Feldman introduced the switchdev_port_bridge_setlink function
in commit 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink").
That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
full use of the range. But switchdev_port_bridge_setlink was called like
this:
br_setlink
-> br_afspec
-> switchdev_port_bridge_setlink
Basically, the switchdev and the bridge code were not tightly integrated.
Then commit 41c498b9359e ("bridge: restore br_setlink back to original")
came, and switchdev drivers were required to implement
.ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
In the meantime, commits such as 0944d6b5a2fa ("bridge: try switchdev op
first in __vlan_vid_add/del") finally made switchdev penetrate the
br_vlan_info() barrier and start to develop the call path we have today.
But remember, br_vlan_info() still receives VLANs one by one.
Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
29ab586c3d83 ("net: switchdev: Remove bridge bypass support from
switchdev") so that drivers would not implement .ndo_bridge_setlink any
longer. The switchdev_port_bridge_setlink also got deleted.
This refactoring removed the parallel bridge_setlink implementation from
switchdev, and left the only switchdev VLAN objects to be the ones
offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add
(the latter coming from commit 9c86ce2c1ae3 ("net: bridge: Notify about
bridge VLANs")).
That is to say, today the switchdev VLAN object ranges are not used in
the kernel. Refactoring the above call path to carry ranges would be
complicated, and the bridge VLAN call path is already complicated enough.
Let's go off and finish the job of commit 29ab586c3d83 by deleting the
bogus iteration through the VLAN ranges from the drivers. Some aspects
of this feature never made too much sense in the first place. For
example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
flag supposed to mean, when a port can obviously have a single pvid?
This particular configuration _is_ denied as of commit 6623c60dc28e
("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
perspective, the driver still has to play pretend, and only offload the
vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
modify the flags of another, completely unrelated, switchdev VLAN
object! (A VLAN that is PVID will invalidate the PVID flag from whatever
other VLAN had previously been offloaded with switchdev and had that
flag. Yet switchdev never notifies about that change; drivers are
supposed to guess.)
Nonetheless, having a VLAN range in the API makes error handling look
scarier than it really is (unwinding on errors and all of that), when in
reality no one calls this API with more than one VLAN.
It is all unnecessary complexity.
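The difference in error handling can be sketched as follows. This is a
user-space model, not driver code: struct demo_hw, its failure injection
field and the demo_* helpers are assumptions made up for the example.
The range variant has to unwind the VLANs it already programmed, while
the single-VLAN variant has nothing to unwind.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VLAN_N_VID 4096

/* Hypothetical hardware model with failure injection for one VID. */
struct demo_hw {
	bool member[DEMO_VLAN_N_VID];
	int fail_vid;	/* VID whose programming fails; -1 for none */
};

static int demo_hw_vlan_add(struct demo_hw *hw, uint16_t vid)
{
	if ((int)vid == hw->fail_vid)
		return -EIO;
	hw->member[vid] = true;
	return 0;
}

static void demo_hw_vlan_del(struct demo_hw *hw, uint16_t vid)
{
	hw->member[vid] = false;
}

/* Range-based add: every driver needs a loop plus an unwind path. */
static int demo_vlan_add_range(struct demo_hw *hw, uint16_t vid_begin,
			       uint16_t vid_end)
{
	uint16_t vid;
	int err;

	assert(hw != NULL);
	assert(vid_begin <= vid_end && vid_end < DEMO_VLAN_N_VID);

	for (vid = vid_begin; vid <= vid_end; vid++) {
		err = demo_hw_vlan_add(hw, vid);
		if (err)
			goto err_unwind;
	}
	return 0;

err_unwind:
	/* Remove the VIDs programmed before the failing one. */
	while (vid-- > vid_begin)
		demo_hw_vlan_del(hw, vid);
	return err;
}

/* Single-VLAN add: one call, no partial state to clean up. */
static int demo_vlan_add(struct demo_hw *hw, uint16_t vid)
{
	assert(hw != NULL && vid < DEMO_VLAN_N_VID);
	return demo_hw_vlan_add(hw, vid);
}
```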
And despite appearing pretentious (two-phase transactional model and
all), the switchdev API is really sloppy because the VLAN addition and
removal operations are not paired with one another (you can add a VLAN
100 times and delete it just once). The bridge notifies through
switchdev of a VLAN addition not only when the flags of an existing VLAN
change, but also when nothing changes. There are switchdev drivers out
there that don't like adding a VLAN that has already been added, and
those checks don't really belong at driver level. But the fact that the
API contains ranges is yet another factor that prevents this from being
addressed in the future.
Of the existing switchdev pieces of hardware, it appears that only
Mellanox Spectrum supports offloading more than one VLAN at a time,
through mlxsw_sp_port_vlan_set. I have kept that code internal to the
driver, because there is some more bookkeeping that makes use of it, but
I deleted it from the switchdev API. But since the switchdev support for
ranges has already been de facto deleted by a Mellanox employee and
nobody noticed for 4 years, I'm going to assume it's not a biggie.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Increase the critical threshold for the ASIC thermal zone from 110C to
140C, according to the system hardware requirements. All the supported
ASICs (Spectrum-1, Spectrum-2, Spectrum-3) can still operate with an ASIC
temperature below 140C. With the old critical threshold value, the system
can perform an unjustified shutdown.
All the systems equipped with the above ASICs implement a thermal
protection mechanism at the firmware level, and the firmware may decide
to perform a system thermal shutdown even when the temperature is below
140C. So with the new threshold the system will not melt down, while the
thermal operating range will be aligned with the hardware abilities.
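The effect of the change can be shown with a simplified model of a
critical trip check. The macro and function names below are illustrative
only, the values are assumed to be in millidegrees Celsius (the unit the
kernel thermal framework uses), and the comparison is a stand-in for
whatever the thermal core actually does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; temperatures in millidegrees Celsius. */
#define DEMO_ASIC_TEMP_CRIT_OLD	110000	/* 110C */
#define DEMO_ASIC_TEMP_CRIT_NEW	140000	/* 140C */

/* Simplified model: the critical trip fires at or above the threshold. */
static bool demo_crit_trip_reached(int temp_mc, int crit_mc)
{
	assert(crit_mc > 0);	/* thresholds are positive in this sketch */
	return temp_mc >= crit_mc;
}
```

With these values, an ASIC reading of 120C would trigger the critical
trip under the old threshold but not under the new one.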
Fixes: 41e760841d26 ("mlxsw: core: Replace thermal temperature trips with defines")
Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Validate the thresholds, so that a single bad readout caused by an
unreliable transceiver does not result in a failure. Ignore the last
readouts if the warning temperature is above the alarm temperature, since
such values can cause an unexpected thermal shutdown. Stay with the
previous values and refresh the thresholds in the next iteration.
This is a rare scenario, but it was observed at a customer site.
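The validation described above amounts to a consistency check before the
cached values are overwritten. A minimal sketch, assuming hypothetical
names (struct demo_module_thresholds, demo_thresholds_update) and
millidegree Celsius units, none of which are taken from the driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache of one module's thresholds, in millidegrees C. */
struct demo_module_thresholds {
	int warn_mc;
	int alarm_mc;
};

/*
 * Accept a fresh readout only if it is self-consistent. On an
 * inconsistent readout, keep the previous values and report that no
 * update happened, so the caller can retry on its next iteration.
 */
static bool demo_thresholds_update(struct demo_module_thresholds *cur,
				   int warn_mc, int alarm_mc)
{
	assert(cur != NULL);

	if (warn_mc > alarm_mc)
		return false;

	cur->warn_mc = warn_mc;
	cur->alarm_mc = alarm_mc;
	return true;
}
```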
Fixes: 6a79507cfe94 ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|