summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2024-04-10net/mlx5: E-switch, store eswitch pointer before registering devlink_paramShay Drory
Next patch will move devlink register to be first. Therefore, whenever mlx5 will register a param, the user will be notified. In order to notify the user, devlink is using the get() callback of the param. Hence, resources that are being used by the get() callback must be set before the devlink param is registered. Therefore, store eswitch pointer inside mdev before registering the param. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10mlxsw: spectrum_ethtool: Add support for 100Gb/s per lane link modesIdo Schimmel
The Spectrum-4 ASIC supports 100Gb/s per lane link modes, but the only one currently supported by the driver is 800Gb/s over eight lanes. Add support for 100Gb/s over one lane, 200Gb/s over two lanes and 400Gb/s over four lanes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/1d77830f6abcc4f0d57a7f845e5a6d97a75a434b.1712667750.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-08mlx5/core: Support max_io_eqs for a functionParav Pandit
Implement get and set for the maximum IO event queues for SF and VF. This enables administrator on the hypervisor to control the maximum IO event queues which are typically used to derive the maximum and default number of net device channels or rdma device completion vectors. Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-05net/mlx5e: Implement ethtool hardware timestamping statisticsRahul Rameshbabu
Feed driver statistics counters related to hardware timestamping to standardized ethtool hardware timestamping statistics group. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-5-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05net/mlx5e: Introduce timestamps statistic counter for Tx DMA layerRahul Rameshbabu
Count number of transmitted packets that were hardware timestamped at the device DMA layer. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-4-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQRahul Rameshbabu
Track the number of times a CQE was expected to not be delivered on PTP Tx port timestamping CQ. A CQE is expected to not be delivered if a certain amount of time passes since the corresponding CQE containing the DMA timestamp information has arrived. Increment the late_cqe counter when such a CQE does manage to be delivered to the CQ. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-3-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05net/mlx5e: Un-expose functions in en.hTariq Toukan
Un-expose functions that are not used outside of their c file. Make them static. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20240404173357.123307-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05net/mlx5e: Support FEC settings for 100G/lane modesCosmin Ratiu
This consists of: 1. Expose the 100G/lane capability bit in the PCAM reg. 2. Expose the per link mode FEC capability masks in the PPLM reg. 3. Set the overrides according to ethtool parameters. FEC for new modes is set if and only if the PCAM 100G/lane capability is advertised and the capability mask for a given link mode reports that it can accept the requested FEC mode. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240404173357.123307-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05net/mlx5e: Extract checking of FEC support for a link modeCosmin Ratiu
The check of whether a given FEC mode is supported in a given link mode is about to get more complicated, so extract it in a separate function to avoid code duplication. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240404173357.123307-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c 17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head") 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps") https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/ Adjacent changes: net/ipv6/ip6_fib.c d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().") 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Store DQ pointer as part of CQ structureAmit Cohen
Currently, for each completion, we check the number of descriptor queue and take it via mlxsw_pci_{sdq,rdq}_get(). This is inefficient, the DQ should be the same for all the completions in CQ, as each CQ handles only one DQ - SDQ or RDQ. This mapping is handled as part of DQ initialization via mlxsw_cmd_mbox_sw2hw_dq_cq_set(). Instead, as part of DQ initialization, set DQ pointer in the appropriate CQ structure. When we handle completions, warn in case that the DQ number that we expect is different from the number we get in the CQE. Call WARN_ON_ONCE() only after checking the value, to avoid calling this method for each completion. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/a5b2559cd6d532c120f3194f89a1e257110318f1.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Remove mlxsw_pci_cq_count()Amit Cohen
Currently, for each interrupt we call mlxsw_pci_cq_count() to determine the number of CQs. This call makes additional two function's calls. This can be removed by storing this value as part of structure 'mlxsw_pci', as we already do for number of SDQs. Remove the function and __mlxsw_pci_queue_count() which is now not used and store the value instead. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f08ad113e8160678f3c8d401382a696c6c7f44c7.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Remove mlxsw_pci_sdq_count()Amit Cohen
The number of SDQs is stored as part of 'mlxsw_pci' structure. In some cases, the driver uses this value and in some cases it calls mlxsw_pci_sdq_count() to get the value. Align the code to use the stored value. This simplifies the code and makes it clearer that the value is always the same. Rename 'mlxsw_pci->num_sdq_cqs' to 'mlxsw_pci->num_sdqs' as now it is used not only in CQ context. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/0c8788506d9af35d589dbf64be35a508fd63d681.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Break mlxsw_pci_cq_tasklet() into tasklets per queue typeAmit Cohen
Completion queues are used for completions of RDQ or SDQ. Each completion queue is used for one DQ. The first CQs are used for SDQs and the rest are used for RDQs. Currently, for each CQE (completion queue element), we check 'sr' value (send/receive) to know if it is completion of RDQ or SDQ. Actually, we do not really have to check it, as according to the queue number we know if it handles completions of Rx or Tx. Break the tasklet into two - one for Rx (RDQ) and one for Tx (SDQ). Then, setup the appropriate tasklet for each queue as part of queue initialization. Use 'sr' value for unlikely case that we get completion with type that we do not expect. Call WARN_ON_ONCE() only after checking the value, to avoid calling this method for each completion. A next patch set will use NAPI to handle events, then we will have a separate poll method for Rx and Tx. This change is a preparation for NAPI usage. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/50fbc366f8de54cb5dc72a7c4f394333ef71f1d0.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Make style change in mlxsw_pci_cq_tasklet()Amit Cohen
This function will be broken into several functions later. As preparation, reorder variables to reverse xmas tree. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7170a8f4429ecb5a539b0374c621697778ff8363.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Remove unused wait queueAmit Cohen
The previous patch changed the code to do not handle command interface from event queue. With this change the wait queue is not used anymore. Remove it and 'wait_done' variable. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f3af6a5a9dabd97d2920cefe475c6aa57767f504.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Use only one event queueAmit Cohen
The device supports two event queues. EQ0 is used for command interface completion events. EQ1 is used for completion events of RDQ or SDQ. Currently, for each EQE (event queue element), we check the queue number and handle accordingly. More than that, for each interrupt we schedule tasklets for both EQs. This is really ineffective, especially because of the fact that EQ0 is used only as part of driver init/fini, when EMADs are not available. There is no point to schedule the tasklet for it and check each EQE. A previous patch changed the code to poll command interface for each use of it. It means that now there is no real reason to use EQ0, as we poll the command interface. Initialize only one event queue and use it as EQ1 (this is determined by queue number). Then, for each interrupt we can schedule the tasklet only for one queue and we do not have to check the queue number. This simplifies the code and should improve performance. Note that polling command interface is ok as we use it only as part of driver init/fini. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/23d764f5c032e4c363b98590b746a4b32d2bf900.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Rename MLXSW_PCI_EQS_COUNTAmit Cohen
Currently we use MLXSW_PCI_EQS_COUNT event queues. A next patch will change the driver to initialize only EQ1, as EQ0 is not required anymore when we poll command interface. Rename the macro to MLXSW_PCI_EQS_MAX as later we will not initialize the maximum supported EQs, this value represents the maximum and a new macro will be added to represent the actual used queues. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/b08df430b62f23ca1aa3aaa257896d2d95aa7691.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Poll command interface for each cmd_exec()Amit Cohen
Command interface is used for configuring and querying FW when EMADs are not available. During the time that the driver sets up the asynchronous queues, it polls the command interface for getting completions. Then, there is a short period when asynchronous queues work, but EMADs are not available (marked in the code as nopoll = true). During this time, we send commands via command interface, but we do not poll it, as we can get an interrupt for the completion. Completions of command interface are received from HW in EQ0 (event queue 0). The usage of EQ0 instead of polling is done only 4 times during initialization and one time during tear down, but it makes an overhead during lifetime of the driver. For each interrupt, we have to check if we get events in EQ0 or EQ1 and handle them. This is really ineffective, especially because of the fact that EQ0 is used only as part of driver init/fini. Instead, we can poll command interface for each call of cmd_exec(). It means that when we send a command via command interface (as EMADs are not available), we will poll it, regardless of availability of the asynchronous queues. This will allow us to configure later only EQ1 and simplify the flow. Remove 'nopoll' indication and change mlxsw_pci_cmd_exec() to poll till answer/timeout regardless of queues' state. For now, completions are handled also by EQ0, but it will be removed in next patch. Additional cleanups will be added in next patches. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/e674c70380ceda953e0e45a77334c5d22e69938f.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Make style changes in mlxsw_pci_eq_tasklet()Amit Cohen
This function will be used later only for EQ1. As preparation, reorder variables to reverse xmas tree and return earlier when it is possible, to simplify the code. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/2412d6c135b2a6aedb4484f5d8baab3aecd7b9ae.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Remove unused countersAmit Cohen
The structure 'mlxsw_pci_queue' stores several counters which were consumed via debugfs. Since commit 9a32562becd9 ("mlxsw: Remove debugfs interface"), these counters are not used. Remove them. This makes the 'union u' and 'struct eq' redundant. Maintain 'struct cq' as it will be extended later. Replace increasing 'q->u.eq.ev_other_count' with WARN_ON_ONCE(), as it is used in an unreasonable case of receiving event in EQ which is not EQ0 or EQ1. When the queues are initialized, we check number of event queues and fail with the print "Unsupported number of queues" in case that the driver tries to initialize more than two queues. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/ee9e658800aa0390e08342100bc27daff4c176c0.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Arm CQ doorbell regardless of number of completionsAmit Cohen
Currently, as part of mlxsw_pci_cq_tasklet(), we check if any item was handled, and only in such case we arm doorbell. This is unlikely case, as we schedule tasklet only for CQs that we get an event for them, which means that they contain completions to handle. Remove this check, which is supposed to be true always, and even if it is false, it is not a mistake to ring the doorbell. We can warn on such case, but it is not really worth to add a check which will be run for each CQ handling when we do not expect to reach it and it does not point to logic error that should be handled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f8efa481bfe7bebb9f93bb803f44ab7da77f53e6.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Do not setup tasklet from operationAmit Cohen
Currently, the structure 'mlxsw_pci_queue_ops' holds a pointer to the callback function of tasklet. This is used only for EQ and CQ. mlxsw driver will use NAPI in a following patch set, so CQ will not use tasklet anymore. As preparation, remove this pointer from the shared operation structure and setup the tasklet as part of queue initialization. For now, setup tasklet for EQ and CQ. Later, CQ code will be changed. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/a326cae5fc1ad085a1a063c004983de6fe389414.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Move mlxsw_pci_cq_{init, fini}()Amit Cohen
Move mlxsw_pci_cq_{init, fini}() after mlxsw_pci_cq_tasklet() as a next patch will setup the tasklet as part of initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/25196cb5baf5acf6ec1e956203790e018ba8e306.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03mlxsw: pci: Move mlxsw_pci_eq_{init, fini}()Amit Cohen
Move mlxsw_pci_eq_{init, fini}() after mlxsw_pci_eq_tasklet() as a next patch will setup the tasklet as part of initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7ae120a02e1c490084daae7e684a0d40b7cce4e7.1712062203.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5: Don't call give_pages() if request 0 pageJianbo Liu
Firmware will return 0 on query BOOT/INIT PAGES for non-page supplier functions (external host PF/VF/SF), so no page is needed to be allocated for them. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5: Skip pages EQ creation for non-page supplier functionJianbo Liu
Page events are not issued by device on the function if page_request_disable is set, so no need to create pages EQ. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5: Support matching on l4_type for ttc_tableJianbo Liu
Replace matching on TCP and UDP protocols with new l4_type field which is parsed by steering for ttc_table. It is enabled by the outer_l4_type or inner_l4_type bits in nic_rx or port_sel flow table capabilities and used only if pcc_ifa2 bit in HCA capabilities is set. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5e: Add support for 800Gbps link modesGal Pressman
Add support for 800Gbps speed, link modes of 100Gbps per lane. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5: Convert uintX_t to uXGal Pressman
In the kernel, the preferred types are uX. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5e: XDP, Fix an inconsistent commentCarolina Jubran
Starting from commit eb9b9fdcafe2 ("net/mlx5e: Introduce extended version for mlx5e_xmit_data") sinfo is no longer passed as an argument to mlx5e_xmit_xdp_frame(), the comment is inconsistent. check_result must be zero when the packet is fragmented. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5e: debugfs, Add reset option for command interface statsTariq Toukan
Resetting stats just before some test/debug case allows us to eliminate out the impact of previous commands. Useful in particular for the average latency calculation. The average_write() callback was unreachable, as "average" is a read-only file. Extend, rename, and use it for a newly exposed write-only "reset" file. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5e: Make stats group fill_stats callbacks consistent with the APIGal Pressman
The fill_strings() callbacks were changed to accept a **data pointer, and not rely on propagating the index value. Make a similar change to fill_stats() callbacks to keep the API consistent. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5e: Use ethtool_sprintf/puts() to fill stats stringsGal Pressman
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. Change the fill_strings callback to accept a **data pointer, and remove the index and return value. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5e: Use ethtool_sprintf/puts() to fill selftests stringsGal Pressman
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. The int return value in mlx5e_self_test_fill_strings() is not removed as it is still used to return the number of selftests. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/mlx5e: Use ethtool_sprintf/puts() to fill priv flags stringsGal Pressman
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01ip_tunnel: convert __be16 tunnel flags to bitmapsAlexander Lobakin
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_* occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made. Most noticeable bloat-o-meter difference (.text): vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [*] ip_tunnel.ko: 390/-900 (-510) [**] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [*] ip6_tunnel.ko: 118/-10 (108) [*] gre_flags_to_tnl_flags() grew, but still is inlined [**] ip_tunnel_find() got uninlined, hence such decrease The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_*() calls here into direct operations on scalars. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01ip_tunnel: use a separate struct to store tunnel params in the kernelAlexander Lobakin
Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure to store params inside the kernel, IPv4 tunnel code uses the same ip_tunnel_parm which is being used to talk with the userspace. This makes it difficult to alter or add any fields or use a different format for whatever data. Define struct ip_tunnel_parm_kern, a 1:1 copy of ip_tunnel_parm for now, and use it throughout the code. Define the pieces, where the copy user <-> kernel happens, as standalone functions, and copy the data there field-by-field, so that the kernel-side structure could be easily modified later on and the users wouldn't have to care about this. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29netlink: introduce type-checking attribute iterationJohannes Berg
There are, especially with multi-attr arrays, many cases of needing to iterate all attributes of a specific type in a netlink message or a nested attribute. Add specific macros to support that case. Also convert many instances using this spatch: @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } Although I had to undo one bad change this made, and I also adjusted some other code for whitespace and to use direct variable initialization now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29mlx5: stop warning for 64KB pagesArnd Bergmann
When building with 64KB pages, clang points out that xsk->chunk_size can never be PAGE_SIZE: drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c:19:22: error: result of comparison of constant 65536 with expression of type 'u16' (aka 'unsigned short') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (xsk->chunk_size > PAGE_SIZE || ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~ In older versions of this code, using PAGE_SIZE was the only possibility, so this would have never worked on 64KB page kernels, but the patch apparently did not address this case completely. As Maxim Mikityanskiy suggested, 64KB chunks are really not all that useful, so just shut up the warning by adding a cast. Fixes: 282c0c798f8e ("net/mlx5e: Allow XSK frames smaller than a page") Link: https://lore.kernel.org/netdev/20211013150232.2942146-1-arnd@kernel.org/ Link: https://lore.kernel.org/lkml/a7b27541-0ebb-4f2d-bd06-270a4d404613@app.fastmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240328143051.1069575-9-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29mlx5: avoid truncating error messageArnd Bergmann
clang warns that one error message is too long for its destination buffer: drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c:1876:4: error: 'snprintf' will always be truncated; specified size is 80, but format string expands to at least 94 [-Werror,-Wformat-truncation-non-kprintf] Reword it to be a bit shorter so it always fits. Fixes: 70f0302b3f20 ("net/mlx5: Bridge, implement mdb offload") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/20240326223825.4084412-5-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29mlxbf_gige: stop interface during shutdownDavid Thompson
The mlxbf_gige driver intermittantly encounters a NULL pointer exception while the system is shutting down via "reboot" command. The mlxbf_driver will experience an exception right after executing its shutdown() method. One example of this exception is: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000 [0000000000000070] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] SMP CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G S OE 5.15.0-bf.6.gef6992a #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.0.2.12669 Apr 21 2023 pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige] lr : mlxbf_gige_poll+0x54/0x160 [mlxbf_gige] sp : ffff8000080d3c10 x29: ffff8000080d3c10 x28: ffffcce72cbb7000 x27: ffff8000080d3d58 x26: ffff0000814e7340 x25: ffff331cd1a05000 x24: ffffcce72c4ea008 x23: ffff0000814e4b40 x22: ffff0000814e4d10 x21: ffff0000814e4128 x20: 0000000000000000 x19: ffff0000814e4a80 x18: ffffffffffffffff x17: 000000000000001c x16: ffffcce72b4553f4 x15: ffff80008805b8a7 x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101 x11: 7f7f7f7f7f7f7f7f x10: c2ac898b17576267 x9 : ffffcce720fa5404 x8 : ffff000080812138 x7 : 0000000000002e9a x6 : 0000000000000080 x5 : ffff00008de3b000 x4 : 0000000000000000 x3 : 0000000000000001 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige] mlxbf_gige_poll+0x54/0x160 [mlxbf_gige] __napi_poll+0x40/0x1c8 net_rx_action+0x314/0x3a0 __do_softirq+0x128/0x334 run_ksoftirqd+0x54/0x6c smpboot_thread_fn+0x14c/0x190 kthread+0x10c/0x110 ret_from_fork+0x10/0x20 Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002) ---[ end trace 7cc3941aa0d8e6a4 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x4ce722520000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x000005c1,a3330e5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- During system shutdown, the mlxbf_gige driver's shutdown() is always executed. However, the driver's stop() method will only execute if networking interface configuration logic within the Linux distribution has been setup to do so. If shutdown() executes but stop() does not execute, NAPI remains enabled and this can lead to an exception if NAPI is scheduled while the hardware interface has only been partially deinitialized. The networking interface managed by the mlxbf_gige driver must be properly stopped during system shutdown so that IFF_UP is cleared, the hardware interface is put into a clean state, and NAPI is fully deinitialized. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20240325210929.25362-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-26mlxbf_gige: call request_irq() after NAPI initializedDavid Thompson
The mlxbf_gige driver encounters a NULL pointer exception in mlxbf_gige_open() when kdump is enabled. The sequence to reproduce the exception is as follows: a) enable kdump b) trigger kdump via "echo c > /proc/sysrq-trigger" c) kdump kernel executes d) kdump kernel loads mlxbf_gige module e) the mlxbf_gige module runs its open() as the the "oob_net0" interface is brought up f) mlxbf_gige module will experience an exception during its open(), something like: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000086000004 EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000086000004 [#1] SMP CPU: 0 PID: 812 Comm: NetworkManager Tainted: G OE 5.15.0-1035-bluefield #37-Ubuntu Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024 pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : __napi_poll+0x40/0x230 sp : ffff800008003e00 x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8 x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000 x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000 x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0 x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398 x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2 x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238 Call trace: 0x0 net_rx_action+0x178/0x360 __do_softirq+0x15c/0x428 __irq_exit_rcu+0xac/0xec irq_exit+0x18/0x2c handle_domain_irq+0x6c/0xa0 gic_handle_irq+0xec/0x1b0 call_on_irq_stack+0x20/0x2c do_interrupt_handler+0x5c/0x70 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x18/0x2c el1h_64_irq+0x7c/0x80 __setup_irq+0x4c0/0x950 request_threaded_irq+0xf4/0x1bc mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige] mlxbf_gige_open+0x5c/0x170 [mlxbf_gige] __dev_open+0x100/0x220 __dev_change_flags+0x16c/0x1f0 dev_change_flags+0x2c/0x70 do_setlink+0x220/0xa40 __rtnl_newlink+0x56c/0x8a0 rtnl_newlink+0x58/0x84 rtnetlink_rcv_msg+0x138/0x3c4 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x2ec/0x360 netlink_sendmsg+0x278/0x490 __sock_sendmsg+0x5c/0x6c ____sys_sendmsg+0x290/0x2d4 ___sys_sendmsg+0x84/0xd0 __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0x54/0x184 do_el0_svc+0x30/0xac el0_svc+0x48/0x160 el0t_64_sync_handler+0xa4/0x12c el0t_64_sync+0x1a4/0x1a8 Code: bad PC value ---[ end trace 7d1c3f3bf9d81885 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x2870a7a00000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x0,000005c1,a3332a5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- The exception happens because there is a pending RX interrupt before the call to request_irq(RX IRQ) executes. Then, the RX IRQ handler fires immediately after this request_irq() completes. The RX IRQ handler runs "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()" and "napi_enable()", both which happen later in the open() logic. The logic in mlxbf_gige_open() must fully initialize NAPI before any calls to request_irq() execute. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Link: https://lore.kernel.org/r/20240325183627.7641-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25mlxbf_gige: stop PHY during open() error pathsDavid Thompson
The mlxbf_gige_open() routine starts the PHY as part of normal initialization. The mlxbf_gige_open() routine must stop the PHY during its error paths. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-13Merge tag 'thermal-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These mostly change the thermal core in a few ways allowing thermal drivers to be simplified, in particular in their removal and failing probe handling parts that are notoriously prone to errors, and propagate the changes to several drivers. Apart from that, support for a new platform is added (Intel Lunar Lake-M), some bugs are fixed and some code is cleaned up, as usual. Specifics: - Store zone trips table and zone operations directly in struct thermal_zone_device (Rafael Wysocki) - Fix up flex array initialization during thermal zone device registration (Nathan Chancellor) - Rework writable trip points handling in the thermal core and several drivers (Rafael Wysocki) - Thermal core code cleanups (Dan Carpenter, Flavio Suligoi) - Use thermal zone accessor functions in the int340x Intel thermal driver (Rafael Wysocki) - Add Lunar Lake-M PCI ID to the int340x Intel thermal driver (Srinivas Pandruvada) - Minor fixes for thermal governors (Rafael Wysocki, Di Shen) - Trip point handling fixes for the iwlwifi wireless driver (Rafael Wysocki) - Code cleanups (Rafael J. Wysocki, AngeloGioacchino Del Regno)" * tag 'thermal-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (29 commits) thermal: core: remove unnecessary check in trip_point_hyst_store() thermal: intel: int340x_thermal: Use thermal zone accessor functions thermal: core: Remove excess empty line from a comment thermal: int340x: processor_thermal: Add Lunar Lake-M PCI ID thermal: core: Eliminate writable trip points masks thermal: of: Set THERMAL_TRIP_FLAG_RW_TEMP directly thermal: imx: Set THERMAL_TRIP_FLAG_RW_TEMP directly wifi: iwlwifi: mvm: Set THERMAL_TRIP_FLAG_RW_TEMP directly mlxsw: core_thermal: Set THERMAL_TRIP_FLAG_RW_TEMP directly thermal: intel: Set THERMAL_TRIP_FLAG_RW_TEMP directly thermal: core: Drop the .set_trip_hyst() thermal zone operation thermal: core: Add flags to struct thermal_trip thermal: core: Move initial num_trips assignment before memcpy() thermal: Get rid of CONFIG_THERMAL_WRITABLE_TRIPS thermal: intel: Adjust ops handling during thermal zone registration thermal: ACPI: Constify acpi_thermal_zone_ops thermal: core: Store zone ops in struct thermal_zone_device thermal: intel: Discard trip tables after zone registration thermal: ACPI: Discard trips table after zone registration thermal: core: Store zone trips table in struct thermal_zone_device ...
2024-03-11mlxsw: spectrum_router: Share nexthop counters in resilient groupsPetr Machata
For resilient groups, we can reuse the same counter for all the buckets that share the same nexthop. Keep a reference count per counter, and keep all these counters in a per-next hop group xarray, which serves as a NHID->counter cache. If a counter is already present for a given NHID, just take a reference and use the same counter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/cdd00084533fc83ac5917562f54642f008205bf3.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Support nexthop group hardware statisticsPetr Machata
When hw_stats is set on a group, install nexthop counters on members of a group. Counter allocation request is moved from nexthop object initialization to the update code. The previous placement made sense: when the counters are enabled by dpipe, the counters are installed to all existing nexthops and all nexthops created from then on get them. For the finer-grained nexthop group statistics, this is unsuitable. The existing placement was kept for the IPv4 and IPv6 nexthops. Resilient group replacement emits a pre_replace notification, and then any bucket_replace notifications if there were any replacements at all. If the group is balanced and the nexthop composition of the replaced group didn't change, there will be no such notifiers. Therefore hook to the pre_replace notifier and mark all buckets for update, to un/install the counters. When reporting deltas for resilient groups, use the nexthop ID that we stored in a previous patch to look up to which nexthop a bucket contributes. Co-developed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/87495a72f187df2e5d491d02729c550d235fcc85.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Track NH ID's of group membersPetr Machata
The core interfaces for collecting per-NH statistics are built around nexthops even for resilient groups. Because mlxsw models each bucket as a nexthop, the core next hop that a given bucket contributes to needs to be looked up. In order to be able to match the two up, we need to track nexthop ID for members of group nexthop objects. For simplicity, do it for all nexthop objects, not just group members. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/184ceb6b154e08f5bcf116a705b0fcb01c31895c.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Add helpers for nexthop countersPetr Machata
The next patch will add the ability to share nexthop counters among mlxsw nexthops backed by the same core nexthop. To have a place to store reference count, the counter should be kept in a dedicated structure. In this patch, introduce the structure together with the related helpers, sans the refcount, which comes in the next patch. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/61f23fa4f8c5d7879f68dacd793d8ab7425f33c0.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11mlxsw: spectrum_router: Avoid allocating NH counters twicePetr Machata
mlxsw_sp_nexthop_counter_disable() decays to a nop when called on a disabled counter, but mlxsw_sp_nexthop_counter_enable() can't similarly be called on an enabled counter. This would be useful in the following patches. Add the missing condition. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/0cc9050e196366c1387ab5ee47f1cee8ecde9c86.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>