summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlxsw
AgeCommit message (Collapse)Author
2023-06-12mlxsw: spectrum_router: Reuse work neighbor initialization in work schedulerPetr Machata
After the struct mlxsw_sp_netevent_work.n field initialization is moved here, the body of code that handles NETEVENT_NEIGH_UPDATE is almost identical to the one in the helper function. Therefore defer to the helper instead of inlining the equivalent. Note that previously, the code took and put a reference of the netdevice. The new code defers to mlxsw_sp_dev_lower_is_port() to obviate the need for taking the reference. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Use the available router pointer for netevent handlingPetr Machata
This code handles NETEVENT_DELAY_PROBE_TIME_UPDATE, which is invoked every time the delay_probe_time changes. mlxsw router currently only maintains one timer, so the last delay_probe_time set wins. Currently, mlxsw uses mlxsw_sp_port_lower_dev_hold() to find a reference to the router. This is no longer necessary. But as a side effect, this makes sure that only updates to "interesting netdevices" (ones that have a physical netdevice lower) are projected. Retain that side effect by calling mlxsw_sp_port_dev_lower_find_rcu() and punting if there is none. Then just proceed using the router pointer that's already at hand in the helper. Note that previously, the code took and put a reference of the netdevice. Because the mlxsw_sp pointer is now obtained from the notifier block, the port pointer (non-) NULL-ness is all that's relevant, and the reference does not need to be taken anymore. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Pass router to mlxsw_sp_router_schedule_work() directlyPetr Machata
Instead of passing a notifier block and deducing the router pointer from that in the helper, do that in the caller, and pass the result. In the following patches, the pointer will also be made useful in the caller. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Move here inetaddr validator notifiersPetr Machata
The validation logic is already in the router code. Move there the notifier blocks themselves as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: mlxsw_sp_router_fini(): Extract a helper variablePetr Machata
Make mlxsw_sp_router_fini() more similar to the _init() function (and more concise) by extracting the `router' handle to a named variable and using that throughout. The availability of a dedicated `router' variable will come in handy in following patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-08mlxsw: spectrum_nve_vxlan: Fix unsupported flag regressionIdo Schimmel
The recently added 'VXLAN_F_LOCALBYPASS' flag is set by default on VXLAN devices and denotes a behavior that is irrelevant for the hardware data path. Add it to the lists of IPv4 and IPv6 supported flags to avoid rejecting offload of VXLAN devices which have this flag set. Fixes: 69474a8a5837 ("net: vxlan: Add nolocalbypass option to vxlan.") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/5533e63643bf719bbe286fef60f749c9cad35005.1686139716.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05mlxsw: spectrum_router: Do not query MAX_VRS on each iterationPetr Machata
MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module. Instead of making the call on every iteration, cache it up front, and use the value. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Do not query MAX_RIFS on each iterationPetr Machata
MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module. Instead of making the call on every iteration, cache it up front, and use the value. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Use extack in mlxsw_sp~_rif_ipip_lb_configure()Petr Machata
In commit 26029225d992 ("mlxsw: spectrum_router: Propagate extack further"), the mlxsw_sp_rif_ops.configure callback got a new argument, extack. However the callbacks that deal with tunnel configuration, mlxsw_sp1_rif_ipip_lb_configure() and mlxsw_sp2_rif_ipip_lb_configure(), were never updated to pass the parameter further. Do that now. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Clarify a commentPetr Machata
"Reserved for X" usually means that only X is supposed to use a given object. Here, it is used in the sense that X should consider the object "reserved", as in "restricted". Replace the comment simply by "X", with the implication that that's where the field is used. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-30mlxsw: spectrum_flower: Add ability to match on layer 2 missIdo Schimmel
Add the 'fdb_miss' key element to supported key blocks and make use of it to match on layer 2 miss. The key is only supported on Spectrum-{2,3,4}. An error is returned for Spectrum-1 since the key element is not present in any of its key blocks. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30mlxsw: spectrum_flower: Do not force matching on iifIdo Schimmel
Currently, mlxsw only supports the 'ingress_ifindex' field in the 'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add support for the 'l2_miss' field as well. It is valid to only match on 'l2_miss' without 'ingress_ifindex', so do not force matching on it. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30mlxsw: spectrum_flower: Split iif parsing to a separate functionIdo Schimmel
Currently, mlxsw only supports the 'ingress_ifindex' field in the 'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add support for the 'l2_miss' field as well. Split the parsing of the 'ingress_ifindex' field to a separate function to avoid nesting. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30flow_offload: Reject matching on layer 2 missIdo Schimmel
Adjust drivers that support the 'FLOW_DISSECTOR_KEY_META' key to reject filters that try to match on the newly added layer 2 miss field. Add an extack message to clearly communicate the failure reason to user space. The following users were not patched: 1. mtk_flow_offload_replace(): Only checks that the key is present, but does not do anything with it. 2. mlx5_tc_ct_set_tuple_match(): Used as part of netfilter offload, which does not make use of the new field, unlike tc. 3. get_netdev_from_rule() in nfp: Likewise. Example: # tc filter add dev swp1 egress pref 1 proto all flower skip_sw l2_miss true action drop Error: mlxsw_spectrum: Can't match on "l2_miss". We have an error talking to the kernel Acked-by: Elad Nachman <enachman@marvell.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30devlink: move port_split/unsplit() ops into devlink_port_opsJiri Pirko
Move port_split/unsplit() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30mlxsw_core: register devlink port with opsJiri Pirko
Use newly introduce devlink port registration function variant and register devlink port passing ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Tested-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-26Merge tag 'net-next-6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the default value allows for better BIG TCP performances - Reduce compound page head access for zero-copy data transfers - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible - Threaded NAPI improvements, adding defer skb free support and unneeded softirq avoidance - Address dst_entry reference count scalability issues, via false sharing avoidance and optimize refcount tracking - Add lockless accesses annotation to sk_err[_soft] - Optimize again the skb struct layout - Extends the skb drop reasons to make it usable by multiple subsystems - Better const qualifier awareness for socket casts BPF: - Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses - Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward - Add more precise memory usage reporting for all BPF map types - Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params - Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton - Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps - Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps - Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree - Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them - Add ARM32 USDT support to libbpf - Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations Protocols: - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value indicates the provenance of the IP address - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition - Add the handshake upcall mechanism, allowing the user-space to implement generic TLS handshake on kernel's behalf - Bridge: support per-{Port, VLAN} neighbor suppression, increasing resilience to nodes failures - SCTP: add support for Fair Capacity and Weighted Fair Queueing schedulers - MPTCP: delay first subflow allocation up to its first usage. This will allow for later better LSM interaction - xfrm: Remove inner/outer modes from input/output path. These are not needed anymore - WiFi: - reduced neighbor report (RNR) handling for AP mode - HW timestamping support - support for randomized auth/deauth TA for PASN privacy - per-link debugfs for multi-link - TC offload support for mac80211 drivers - mac80211 mesh fast-xmit and fast-rx support - enable Wi-Fi 7 (EHT) mesh support Netfilter: - Add nf_tables 'brouting' support, to force a packet to be routed instead of being bridged - Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header. This is needed for BIT TCP support - The iptables 32bit compat interface isn't compiled in by default anymore - Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used - Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device Driver API: - Remove redundant Device Control Error Reporting Enable, as PCI core has already error reporting enabled at enumeration time - Move Multicast DB netlink handlers to core, allowing devices other then bridge to use them - Allow the page_pool to directly recycle the pages from safely localized NAPI - Implement lockless TX queue stop/wake combo macros, allowing for further code de-duplication and sanitization - Add YNL support for user headers and struct attrs - Add partial YNL specification for devlink - Add partial YNL specification for ethtool - Add tc-mqprio and tc-taprio support for preemptible traffic classes - Add tx push buf len param to ethtool, specifies the maximum number of bytes of a transmitted packet a driver can push directly to the underlying device - Add basic LED support for switch/phy - Add NAPI documentation, stop relaying on external links - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory work to make the hardware timestamping layer selectable by user space - Add transceiver support and improve the error messages for CAN-FD controllers New hardware / drivers: - Ethernet: - AMD/Pensando core device support - MediaTek MT7981 SoC - MediaTek MT7988 SoC - Broadcom BCM53134 embedded switch - Texas Instruments CPSW9G ethernet switch - Qualcomm EMAC3 DWMAC ethernet - StarFive JH7110 SoC - NXP CBTX ethernet PHY - WiFi: - Apple M1 Pro/Max devices - RealTek rtl8710bu/rtl8188gu - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset - Bluetooth: - Realtek RTL8821CS, RTL8851B, RTL8852BS - Mediatek MT7663, MT7922 - NXP w8997 - Actions Semi ATS2851 - QTI WCN6855 - Marvell 88W8997 - Can: - STMicroelectronics bxcan stm32f429 Drivers: - Ethernet NICs: - Intel (1G, icg): - add tracking and reporting of QBV config errors - add support for configuring max SDU for each Tx queue - Intel (100G, ice): - refactor mailbox overflow detection to support Scalable IOV - GNSS interface optimization - Intel (i40e): - support XDP multi-buffer - nVidia/Mellanox: - add the support for linux bridge multicast offload - enable TC offload for egress and engress MACVLAN over bond - add support for VxLAN GBP encap/decap flows offload - extend packet offload to fully support libreswan - support tunnel mode in mlx5 IPsec packet offload - extend XDP multi-buffer support - support MACsec VLAN offload - add support for dynamic msix vectors allocation - drop RX page_cache and fully use page_pool - implement thermal zone to report NIC temperature - Netronome/Corigine: - add support for multi-zone conntrack offload - Solarflare/Xilinx: - support offloading TC VLAN push/pop actions to the MAE - support TC decap rules - support unicast PTP - Other NICs: - Broadcom (bnxt): enforce software based freq adjustments only on shared PHC NIC - RealTek (r8169): refactor to addess ASPM issues during NAPI poll - Micrel (lan8841): add support for PTP_PF_PEROUT - Cadence (macb): enable PTP unicast - Engleder (tsnep): add XDP socket zero-copy support - virtio-net: implement exact header length guest feature - veth: add page_pool support for page recycling - vxlan: add MDB data path support - gve: add XDP support for GQI-QPL format - geneve: accept every ethertype - macvlan: allow some packets to bypass broadcast queue - mana: add support for jumbo frame - Ethernet high-speed switches: - Microchip (sparx5): Add support for TC flower templates - Ethernet embedded switches: - Broadcom (b54): - configure 6318 and 63268 RGMII ports - Marvell (mv88e6xxx): - faster C45 bus scan - Microchip: - lan966x: - add support for IS1 VCAP - better TX/RX from/to CPU performances - ksz9477: add ETS Qdisc support - ksz8: enhance static MAC table operations and error handling - sama7g5: add PTP capability - NXP (ocelot): - add support for external ports - add support for preemptible traffic classes - Texas Instruments: - add CPSWxG SGMII support for J7200 and J721E - Intel WiFi (iwlwifi): - preparation for Wi-Fi 7 EHT and multi-link support - EHT (Wi-Fi 7) sniffer support - hardware timestamping support for some devices/firwmares - TX beacon protection on newer hardware - Qualcomm 802.11ax WiFi (ath11k): - MU-MIMO parameters support - ack signal support for management packets - RealTek WiFi (rtw88): - SDIO bus support - better support for some SDIO devices (e.g. MAC address from efuse) - RealTek WiFi (rtw89): - HW scan support for 8852b - better support for 6 GHz scanning - support for various newer firmware APIs - framework firmware backwards compatibility - MediaTek WiFi (mt76): - P2P support - mesh A-MSDU support - EHT (Wi-Fi 7) support - coredump support" * tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits) net: phy: hide the PHYLIB_LEDS knob net: phy: marvell-88x2222: remove unnecessary (void*) conversions tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. net: amd: Fix link leak when verifying config failed net: phy: marvell: Fix inconsistent indenting in led_blink_set lan966x: Don't use xdp_frame when action is XDP_TX tsnep: Add XDP socket zero-copy TX support tsnep: Add XDP socket zero-copy RX support tsnep: Move skb receive action to separate function tsnep: Add functions for queue enable/disable tsnep: Rework TX/RX queue initialization tsnep: Replace modulo operation with mask net: phy: dp83867: Add led_brightness_set support net: phy: Fix reading LED reg property drivers: nfc: nfcsim: remove return value check of `dev_dir` net: phy: dp83867: Remove unnecessary (void*) conversions net: ethtool: coalesce: try to make user settings stick twice net: mana: Check if netdev/napi_alloc_frag returns single page net: mana: Rename mana_refill_rxoob and remove some empty lines net: veth: add page_pool stats ...
2023-04-25Merge tag 'thermal-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These mostly continue to prepare the thermal control subsystem for using unified representation of trip points, which includes cleanups, code refactoring and similar and update several drivers (for other reasons), which includes new hardware support. Specifics: - Add a thermal zone 'devdata' accessor and modify several drivers to use it (Daniel Lezcano) - Prevent drivers from using the 'device' internal thermal zone structure field directly (Daniel Lezcano) - Clean up the hwmon thermal driver (Daniel Lezcano) - Add thermal zone id accessor and thermal zone type accessor and prevent drivers from using thermal zone fields directly (Daniel Lezcano) - Clean up the acerhdf and tegra thermal drivers (Daniel Lezcano) - Add lower bound check for sysfs input to the x86_pkg_temp_thermal Intel thermal driver (Zhang Rui) - Add more thermal zone device encapsulation: prevent setting structure field directly, access the sensor device instead the thermal zone's device for trace, relocate the traces in drivers/thermal (Daniel Lezcano) - Use the generic trip point for the i.MX and remove the get_trip_temp ops (Daniel Lezcano) - Use the devm_platform_ioremap_resource() in the Hisilicon driver (Yang Li) - Remove R-Car H3 ES1.* handling as public has only access to the ES2 version and the upstream support for the ES1 has been shutdown (Wolfram Sang) - Add a delay after initializing the bank in order to let the time to the hardware to initialze itself before reading the temperature (Amjad Ouled-Ameur) - Add MT8365 support (Amjad Ouled-Ameur) - Preparational cleanup and DT bindings for RK3588 support (Sebastian Reichel) - Add driver support for RK3588 (Finley Xiao) - Use devm_reset_control_array_get_exclusive() for the Rockchip driver (Ye Xingchen) - Detect power gated thermal zones and return -EAGAIN when reading the temperature (Mikko Perttunen) - Remove thermal_bind_params structure as it is unused (Zhang Rui) - Drop unneeded quotes in DT bindings allowing to run yamllint (Rob Herring) - Update the power allocator documentation according to the thermal trace relocation (Lukas Bulwahn) - Fix sensor 1 interrupt status bitmask for the Mediatek LVTS sensor (Chen-Yu Tsai) - Use the dev_err_probe() helper in the Amlogic driver (Ye Xingchen) - Add AP domain support to LVTS thermal controllers for mt8195 (Balsam CHIHI) - Remove buggy call to thermal_of_zone_unregister() (Daniel Lezcano) - Make thermal_of_zone_[un]register() private to the thermal OF code (Daniel Lezcano) - Create a private copy of the thermal zone device parameters structure when registering a thermal zone (Daniel Lezcano) - Fix a kernel NULL pointer dereference in thermal_hwmon (Zhang Rui) - Revert recent message adjustment in thermal_hwmon (Rafael Wysocki) - Use of_property_present() for testing DT property presence in thermal control code (Rob Herring) - Clean up thermal_list_lock locking in the thermal core (Rafael Wysocki) - Add DLVR support for RFIM control in the int340x Intel thermal driver (Srinivas Pandruvada)" * tag 'thermal-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (55 commits) thermal: intel: int340x: Add DLVR support for RFIM control thermal/core: Alloc-copy-free the thermal zone parameters structure thermal/of: Unexport unused OF functions thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister thermal/drivers/mediatek/lvts_thermal: Add AP domain for mt8195 dt-bindings: thermal: mediatek: Add AP domain to LVTS thermal controllers for mt8195 thermal: amlogic: Use dev_err_probe() thermal/drivers/mediatek/lvts_thermal: Fix sensor 1 interrupt status bitmask MAINTAINERS: adjust entry in THERMAL/POWER_ALLOCATOR after header movement dt-bindings: thermal: Drop unneeded quotes thermal/core: Remove thermal_bind_params structure thermal/drivers/tegra-bpmp: Handle offline zones thermal/drivers/rockchip: use devm_reset_control_array_get_exclusive() dt-bindings: rockchip-thermal: Support the RK3588 SoC compatible thermal/drivers/rockchip: Support RK3588 SoC in the thermal driver thermal/drivers/rockchip: Support dynamic sized sensor array thermal/drivers/rockchip: Simplify channel id logic thermal/drivers/rockchip: Use dev_err_probe thermal/drivers/rockchip: Simplify clock logic thermal/drivers/rockchip: Simplify getting match data ...
2023-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19mlxsw: pci: Fix possible crash during initializationIdo Schimmel
During initialization the driver issues a reset command via its command interface in order to remove previous configuration from the device. After issuing the reset, the driver waits for 200ms before polling on the "system_status" register using memory-mapped IO until the device reaches a ready state (0x5E). The wait is necessary because the reset command only triggers the reset, but the reset itself happens asynchronously. If the driver starts polling too soon, the read of the "system_status" register will never return and the system will crash [1]. The issue was discovered when the device was flashed with a development firmware version where the reset routine took longer to complete. The issue was fixed in the firmware, but it exposed the fact that the current wait time is borderline. Fix by increasing the wait time from 200ms to 400ms. With this patch and the buggy firmware version, the issue did not reproduce in 10 reboots whereas without the patch the issue is reproduced quite consistently. [1] mce: CPUs not responding to MCE broadcast (may include false positives): 0,4 mce: CPUs not responding to MCE broadcast (may include false positives): 0,4 Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler Shutting down cpus with NMI Kernel Offset: 0x12000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: ac004e84164e ("mlxsw: pci: Wait longer before accessing the device after reset") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-02mlxsw: core_thermal: Simplify transceiver module get_temp() callbackIdo Schimmel
The get_temp() callback of a thermal zone associated with a transceiver module no longer needs to read the temperature thresholds of the module. Therefore, simplify the callback by only reading the temperature. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-02mlxsw: core_thermal: Make mlxsw_thermal_module_init() voidIdo Schimmel
The function can no longer fail so make it void and remove the associated error path. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-02mlxsw: core_thermal: Use static trip points for transceiver modulesIdo Schimmel
The driver registers a thermal zone for each transceiver module and tries to set the trip point temperatures according to the thresholds read from the transceiver. If a threshold cannot be read or if a transceiver is unplugged, the trip point temperature is set to zero, which means that it is disabled as far as the thermal subsystem is concerned. A recent change in the thermal core made it so that such trip points are no longer marked as disabled, which lead the thermal subsystem to incorrectly set the associated cooling devices to the their maximum state [1]. A fix to restore this behavior was merged in commit f1b80a3878b2 ("thermal: core: Restore behavior regarding invalid trip points"). However, the thermal maintainer suggested to not rely on this behavior and instead always register a valid array of trip points [2]. Therefore, create a static array of trip points with sane defaults (suggested by Vadim) and register it with the thermal zone of each transceiver module. User space can choose to override these defaults using the thermal zone sysfs interface since these files are writeable. Before: $ cat /sys/class/thermal/thermal_zone11/type mlxsw-module11 $ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp 65000 75000 80000 After: $ cat /sys/class/thermal/thermal_zone11/type mlxsw-module11 $ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp 55000 65000 80000 Also tested by reverting commit f1b80a3878b2 ("thermal: core: Restore behavior regarding invalid trip points") and making sure that the associated cooling devices are not set to their maximum state. [1] https://lore.kernel.org/linux-pm/ZA3CFNhU4AbtsP4G@shredder/ [2] https://lore.kernel.org/linux-pm/f78e6b70-a963-c0ca-a4b2-0d4c6aeef1fb@linaro.org/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-31Merge back Intel thermal driver changes for 6.4-rc1.Rafael J. Wysocki
2023-03-27Merge back thermal control material for 6.4-rc1.Rafael J. Wysocki
2023-03-22mlxsw: spectrum_fid: Fix incorrect local port typeIdo Schimmel
Local port is a 10-bit number, but it was mistakenly stored in a u8, resulting in firmware errors when using a netdev corresponding to a local port higher than 255. Fix by storing the local port in u16, as is done in the rest of the code. Fixes: bf73904f5fba ("mlxsw: Add support for 802.1Q FID family") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/eace1f9d96545ab8a2775db857cb7e291a9b166b.1679398549.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-19mlxsw: core_thermal: Fix fan speed in maximum cooling stateIdo Schimmel
The cooling levels array is supposed to prevent the system fans from being configured below a 20% duty cycle as otherwise some of them get stuck at 0 RPM. Due to an off-by-one error, the last element in the array was not initialized, causing it to be set to zero, which in turn lead to fans being configured with a 0% duty cycle in maximum cooling state. Since commit 332fdf951df8 ("mlxsw: thermal: Fix out-of-bounds memory accesses") the contents of the array are static. Therefore, instead of fixing the initialization of the array, simply remove it and adjust thermal_cooling_device_ops::set_cur_state() so that the configured duty cycle is never set below 20%. Before: # cat /sys/class/thermal/thermal_zone0/cdev0/type mlxsw_fan # echo 10 > /sys/class/thermal/thermal_zone0/cdev0/cur_state # cat /sys/class/hwmon/hwmon0/name mlxsw # cat /sys/class/hwmon/hwmon0/pwm1 0 After: # cat /sys/class/thermal/thermal_zone0/cdev0/type mlxsw_fan # echo 10 > /sys/class/thermal/thermal_zone0/cdev0/cur_state # cat /sys/class/hwmon/hwmon0/name mlxsw # cat /sys/class/hwmon/hwmon0/pwm1 255 This bug was uncovered when the thermal subsystem repeatedly tried to configure the cooling devices to their maximum state due to another issue [1]. This resulted in the fans being stuck at 0 RPM, which eventually lead to the system undergoing thermal shutdown. [1] https://lore.kernel.org/netdev/ZA3CFNhU4AbtsP4G@shredder/ Fixes: a421ce088ac8 ("mlxsw: core: Extend cooling device with cooling levels") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-15mlxsw: spectrum: Fix incorrect parsing depth after reloadIdo Schimmel
Spectrum ASICs have a configurable limit on how deep into the packet they parse. By default, the limit is 96 bytes. There are several cases where this parsing depth is not enough and there is a need to increase it. For example, timestamping of PTP packets and a FIB multipath hash policy that requires hashing on inner fields. The driver therefore maintains a reference count that reflects the number of consumers that require an increased parsing depth. During reload_down() the parsing depth reference count does not necessarily drop to zero, but the parsing depth itself is restored to the default during reload_up() when the firmware is reset. It is therefore possible to end up in situations where the driver thinks that the parsing depth was increased (reference count is non-zero), when it is not. Fix by making sure that all the consumers that increase the parsing depth reference count also decrease it during reload_down(). Specifically, make sure that when the routing code is de-initialized it drops the reference count if it was increased because of a FIB multipath hash policy that requires hashing on inner fields. Add a warning if the reference count is not zero after the driver was de-initialized and explicitly reset it to zero during initialization for good measures. Fixes: 2d91f0803b84 ("mlxsw: spectrum: Add infrastructure for parsing configuration") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/9c35e1b3e6c1d8f319a2449d14e2b86373f3b3ba.1678727526.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-03thermal: Use thermal_zone_device_type() accessorDaniel Lezcano
Replace the accesses to 'tz->type' by its accessor version in order to self-encapsulate the thermal_zone_device structure. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> #mlxsw Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> #MediaTek LVTS Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-03thermal/core: Use the thermal zone 'devdata' accessor in remaining driversDaniel Lezcano
The thermal zone device structure is exposed to the different drivers and obviously they access the internals while that should be restricted to the core thermal code. In order to self-encapsulate the thermal core code, we need to prevent the drivers accessing directly the thermal zone structure and provide accessor functions to deal with. Use the devdata accessor introduced in the previous patch. No functional changes intended. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> #mlxsw Acked-by: Gregory Greenman <gregory.greenman@intel.com> #iwlwifi Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> #power_supply Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> #ahci Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-21Merge tag 'net-next-6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
2023-02-20net/sched: Rename user cookie and act cookiePaul Blakey
struct tc_action->act_cookie is a user defined cookie, and the related struct flow_action_entry->act_cookie is used as an handle similar to struct flow_cls_offload->cookie. Rename tc_action->act_cookie to user_cookie, and flow_action_entry->act_cookie to cookie so their names would better fit their usage. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: core: Register devlink instance before sub-objectsIdo Schimmel
Recent changes made it possible to register the devlink instance before its sub-objects and under the instance lock. Among other things, it allows us to avoid warnings such as this one [1]. The warning is generated because a buggy firmware is generating a health event during driver initialization, before the devlink instance is registered. Move the registration of the devlink instance to the beginning of the initialization flow to avoid such problems. A similar change was implemented in netdevsim in commit 82a3aef2e6af ("netdevsim: move devlink registration under the instance lock"). [1] WARNING: CPU: 3 PID: 49 at net/devlink/leftover.c:7509 devlink_recover_notify.constprop.0+0xaf/0xc0 [...] Call Trace: <TASK> devlink_health_report+0x45/0x1d0 mlxsw_core_health_event_work+0x24/0x30 [mlxsw_core] process_one_work+0x1db/0x390 worker_thread+0x49/0x3b0 kthread+0xe5/0x110 ret_from_fork+0x1f/0x30 </TASK> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum_acl_tcam: Move devlink param to TCAM codeIdo Schimmel
Cited commit added 'DEVLINK_CMD_PARAM_DEL' notifications whenever the network namespace of the devlink instance is changed. Specifically, the notifications are generated after calling reload_down(), but before calling reload_up(). At this stage, the data structures accessed while reading the value of the "acl_region_rehash_interval" devlink parameter are uninitialized, resulting in a use-after-free [1]. Fix by moving the registration and unregistration of the devlink parameter to the TCAM code where it is actually used. This means that the parameter is unregistered during reload_down() and then re-registered during reload_up(), avoiding the use-after-free between these two operations. Reproducer: # ip netns add test123 # devlink dev reload pci/0000:06:00.0 netns test123 [1] BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0 Read of size 4 at addr ffff888162fd37d8 by task devlink/1323 [...] Call Trace: <TASK> dump_stack_lvl+0x95/0xbd print_report+0x181/0x4a1 kasan_report+0xdb/0x200 mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0 mlxsw_sp_params_acl_region_rehash_intrvl_get+0x32/0x80 devlink_nl_param_fill.constprop.0+0x29a/0x11e0 devlink_param_notify.constprop.0+0xb9/0x250 devlink_notify_unregister+0xbc/0x470 devlink_reload+0x1aa/0x440 devlink_nl_cmd_reload+0x559/0x11b0 genl_family_rcv_msg_doit.isra.0+0x1f8/0x2e0 genl_rcv_msg+0x558/0x7f0 netlink_rcv_skb+0x170/0x440 genl_rcv+0x2d/0x40 netlink_unicast+0x53f/0x810 netlink_sendmsg+0x961/0xe80 __sys_sendto+0x2a4/0x420 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 7d7e9169a3ec ("devlink: move devlink reload notifications back in between _down() and _up() calls") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum_acl_tcam: Reorder functions to avoid forward declarationsIdo Schimmel
Move the initialization and de-initialization code further below in order to avoid forward declarations in the next patch. No functional changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum_acl_tcam: Make fini symmetric to initIdo Schimmel
Move mutex_destroy() to the end to make the function symmetric with mlxsw_sp_acl_tcam_init(). No functional changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum_acl_tcam: Add missing mutex_destroy()Ido Schimmel
Pair mutex_init() with a mutex_destroy() in the error path. Found during code review. No functional changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum: Remove pointless call to devlink_param_driverinit_value_set()Danielle Ratson
The "acl_region_rehash_interval" devlink parameter is a "runtime" parameter, making the call to devl_param_driverinit_value_set() pointless. Before cited commit the function simply returned an error (that was not checked), but now it emits a WARNING [1]. Fix by removing the function call. [1] WARNING: CPU: 0 PID: 7 at net/devlink/leftover.c:10974 devl_param_driverinit_value_set+0x8c/0x90 [...] Call Trace: <TASK> mlxsw_sp2_params_register+0x83/0xb0 [mlxsw_spectrum] __mlxsw_core_bus_device_register+0x5e5/0x990 [mlxsw_core] mlxsw_core_bus_device_register+0x42/0x60 [mlxsw_core] mlxsw_pci_probe+0x1f0/0x230 [mlxsw_pci] local_pci_probe+0x1a/0x40 work_for_cpu_fn+0xf/0x20 process_one_work+0x1db/0x390 worker_thread+0x1d5/0x3b0 kthread+0xe5/0x110 ret_from_fork+0x1f/0x30 </TASK> Fixes: 85fe0b324c83 ("devlink: make devlink_param_driverinit_value_set() return void") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-30devlink: remove devlink featuresJiri Pirko
Devlink features were introduced to disallow devlink reload calls of userspace before the devlink was fully initialized. The reason for this workaround was the fact that devlink reload was originally called without devlink instance lock held. However, with recent changes that converted devlink reload to be performed under devlink instance lock, this is redundant so remove devlink features entirely. Note that mlx5 used this to enable devlink reload conditionally only when device didn't act as multi port slave. Move the multi port check into mlx5_devlink_reload_down() callback alongside with the other checks preventing the device from reload in certain states. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27devlink: protect devlink param list by instance lockJiri Pirko
Commit 1d18bb1a4ddd ("devlink: allow registering parameters after the instance") as the subject implies introduced possibility to register devlink params even for already registered devlink instance. This is a bit problematic, as the consistency or params list was originally secured by the fact it is static during devlink lifetime. So in order to protect the params list, take devlink instance lock during the params operations. Introduce unlocked function variants and use them in drivers in locked context. Put lock assertions to appropriate places. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23Merge back thermal control material for 6.3.Rafael J. Wysocki
2023-01-20mlxsw: Add support of latency TLVAmit Cohen
The latency of each EMAD can be measured by firmware. The driver can get the measurement via latency TLV which can be added to each EMAD. This TLV is optional, when EMAD is sent with this TLV, the EMAD's response will include the TLV and the field 'latency_time' will contain the firmware measurement. This information can be processed using BPF program for example, to create a histogram and average of the latency per register. In addition, it is possible to measure the end-to-end latency, and then reduce firmware measurement, which will result in the latency of the software overhead. This information can be useful to improve the driver performance. Add support for latency TLV by default for all EMADs. First we planned to enable latency TLV per demand, using devlink-param. After some tests, we know that the usage of latency TLV does not impact the end-to-end latency, so it is OK to enable it by default. Note that similar to string TLV, the latency TLV is not supported in all firmware versions. Enable the usage of this TLV only after verifying it is supported by the current firmware version by querying the Management General Information Register (MGIR). Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20mlxsw: core: Define latency TLV fieldsAmit Cohen
The next patch will add support for latency TLV as part of EMAD (Ethernet Management Datagrams) packets. As preparation, add the relevant fields. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20mlxsw: emad: Add support for latency TLVAmit Cohen
The next patches will add support for latency TLV as part of EMAD (Ethernet Management Datagrams) packets. As preparation, add the relevant values. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20mlxsw: core: Do not worry about changing 'enable_string_tlv' while sending EMADsAmit Cohen
Till now, the field 'mlxsw_core->emad.enable_string_tlv' is set as part of mlxsw_sp_init(), this means that it can be changed during emad_reg_access(). To avoid such change, this field is read once in emad_reg_access() and the value is used all the way. The previous patch sets this value according to MGIR output, as part of mlxsw_emad_init(), so now it cannot be changed while sending EMADs. Do not save 'enable_string_tlv' and do not pass it to functions, just pass 'struct mlxsw_core' and use the value directly from it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20mlxsw: Enable string TLV usage according to MGIR outputAmit Cohen
String TLV is not supported by old firmware versions, therefore 'struct mlxsw_core' stores the field 'emad.enable_string_tlv', which is set to true only after firmware version check. Instead of assuming that firmware version check is enough to enable string TLV, a better solution is to query if this TLV is supported from MGIR register. Add such query and initialize 'emad.enable_string_tlv' accordingly. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20mlxsw: reg: Add TLV related fields to MGIR registerAmit Cohen
MGIR (Management General Information Register) allows software to query the hardware and firmware general information. As part of firmware information, the driver can query if string TLV and latency TLV are supported. These TLVs are part of EMAD's header and are used to provide information per EMAD packet to software. Currently, string TLV is already used by the driver, but it does not query if this TLV is supported from MGIR. The next patches will add support of latency TLV. Add the relevant fields to MGIR, so then the driver will query them to know if the TLVs are supported before using them. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: protect health reporter operation with instance lockJiri Pirko
Similar to other devlink objects, protect the reporters list by devlink instance lock. Alongside add unlocked versions of health reporter create/destroy functions and use them in drivers on call paths where the instance lock is held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: remove linecards lockJiri Pirko
Similar to other devlink objects, convert the linecards list to be protected by devlink instance lock. Alongside with that rename the create/destroy() functions to devl_* to indicate the devlink instance lock needs to be held while calling them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-09mlxsw: spectrum_router: Replace 0-length array with flexible arrayKees Cook
Zero-length arrays are deprecated[1]. Replace struct mlxsw_sp_nexthop_group_info's "nexthops" 0-length array with a flexible array. Detected with GCC 13, using -fstrict-flex-arrays=3: drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 'mlxsw_sp_nexthop_group_hash_obj': drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3278:38: warning: array subscript i is outside array bounds of 'struct mlxsw_sp_nexthop[0]' [-Warray-bounds=] 3278 | val ^= jhash(&nh->ifindex, sizeof(nh->ifindex), seed); | ^~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:2954:33: note: while referencing 'nexthops' 2954 | struct mlxsw_sp_nexthop nexthops[0]; | ^~~~~~~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Ido Schimmel <idosch@nvidia.com> Cc: Petr Machata <petrm@nvidia.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Tested-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>