summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2023-11-19net: ethernet: mtk_wed: rely on __dev_alloc_page in mtk_wed_tx_buffer_allocLorenzo Bianconi
Simplify the code and use __dev_alloc_page() instead of __dev_alloc_pages() with order 0 in mtk_wed_tx_buffer_alloc routine Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-19net: ethernet: ti: am65-cpsw: Fix error handling in am65_cpsw_nuss_common_open()Roger Quadros
k3_udma_glue_enable_rx/tx_chn returns error code on failure. Bail out on error while enabling TX/RX channel. In the error path, clean up the RX descriptors and SKBs. Get rid of kmemleak_not_leak() as it seems unnecessary now. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-19net: ethernet: am65-cpsw: Set default TX channels to maximumRoger Quadros
am65-cpsw supports 8 TX hardware queues. Set this as default. The rationale is that some am65-cpsw devices can have up to 4 ethernet ports. If the number of TX channels have to be changed then all interfaces have to be brought down and up as the old default of 1 TX channel is too restrictive for any mqprio/taprio usage. Another reason for this change is to allow testing using kselftest:net/forwarding:ethtool_mm.sh out of the box. Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-19net: ethernet: ti: am65-cpsw: Re-arrange functions to avoid forward declarationRoger Quadros
Re-arrange am65_cpsw_nuss_rx_cleanup(), am65_cpsw_nuss_xmit_free() and am65_cpsw_nuss_tx_cleanup() to avoid forward declaration. No functional change. Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-19net: ethernet: am65-cpsw: Add standard Ethernet MAC stats to ethtoolRoger Quadros
Gets 'ethtool -S eth0 --groups eth-mac' command to work. Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-19net: wangxun: fix kernel panic due to null pointerJiawen Wu
When the device uses a custom subsystem vendor ID, the function wx_sw_init() returns before the memory of 'wx->mac_table' is allocated. The null pointer will causes the kernel panic. Fixes: 79625f45ca73 ("net: wangxun: Move MAC address handling to libwx") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: one by one port representors creation Michal Swiatkowski says: Currently ice supports creating port representors only for VFs. For that use case they can be created and removed in one step. This patchset is refactoring current flow to support port representor creation also for subfunctions and SIOV. In this case port representors need to be created and removed one by one. Also, they can be added and removed while other port representors are running. To achieve that we need to change the switchdev configuration flow. Three first patches are only cosmetic (renaming, removing not used code). Next few ones are preparation for new flow. The most important one is "add VF representor one by one". It fully implements new flow. New type of port representor (for subfunction) will be introduced in follow up patchset. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: reserve number of CP queues ice: adjust switchdev rebuild path ice: add VF representors one by one ice: realloc VSI stats arrays ice: set Tx topology every time new repr is added ice: allow changing SWITCHDEV_CTRL VSI queues ice: return pointer to representor ice: make representor code generic ice: remove VF pointer reference in eswitch code ice: track port representors in xarray ice: use repr instead of vf->repr ice: track q_id in representor ice: remove unused control VSI parameter ice: remove redundant max_vsi_num variable ice: rename switchdev to eswitch ==================== Link: https://lore.kernel.org/r/20231114181449.1290117-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-18Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== igc: Add support for physical + free-running timers Vinicius Costa Gomes says: The objective is to allow having functionality that depends on the physical timer (taprio and ETF offloads, for example) and vclocks operating together. The "big" missing piece is the implementation of the .getcyclesx64() function in igc, as i225/i226 have multiple timers, we use one of those timers (timer 1) as a free-running (non adjustable) timer. The complication is that only implementing .getcyclesx64() and nothing else will break synchronization when using vclocks, as reading the clock will retrieve the free-running value but timnestamps will come from the adjustable timer. The solution is to modify "in one go" the timestamping code to be able to retrieve the timestamp from the correct timer (if a socket is "phc_bound" to a vclock the timestamp will come from the free-running timer). I was debating whether or not to do the adjustments for the internal latencies for the free-running timestamps, decided to do the adjustments so the path delay when using vclocks is similar to the one when using the physical clock. One future improvement is to implement the .getcrosscycles() function. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Add support for PTP .getcyclesx64() igc: Simplify setting flags in the TX data descriptor ==================== Link: https://lore.kernel.org/r/20231114183640.1303163-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-18net: partial revert of the "Make timestamping selectable: seriesJakub Kicinski
Revert following commits: commit acec05fb78ab ("net_tstamp: Add TIMESTAMPING SOFTWARE and HARDWARE mask") commit 11d55be06df0 ("net: ethtool: Add a command to expose current time stamping layer") commit bb8645b00ced ("netlink: specs: Introduce new netlink command to get current timestamp") commit d905f9c75329 ("net: ethtool: Add a command to list available time stamping layers") commit aed5004ee7a0 ("netlink: specs: Introduce new netlink command to list available time stamping layers") commit 51bdf3165f01 ("net: Replace hwtstamp_source by timestamping layer") commit 0f7f463d4821 ("net: Change the API of PHY default timestamp to MAC") commit 091fab122869 ("net: ethtool: ts: Update GET_TS to reply the current selected timestamp") commit 152c75e1d002 ("net: ethtool: ts: Let the active time stamping layer be selectable") commit ee60ea6be0d3 ("netlink: specs: Introduce time stamping set command") They need more time for reviews. Link: https://lore.kernel.org/all/20231118183529.6e67100c@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-18r8169: improve RTL8411b phy-down fixupHeiner Kallweit
Mirsad proposed a patch to reduce the number of spinlock lock/unlock operations and the function code size. This can be further improved because the function sets a consecutive register block. Suggested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18Merge tag 'mlx5-updates-2023-11-13' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2023-11-13 1) Cleanup patches, leftovers from previous cycle 2) Allow sync reset flow when BF MGT interface device is present 3) Trivial ptp refactorings and improvements 4) Add local loopback counter to vport rep stats Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18mlxsw: pci: Implement PCI reset handlersIdo Schimmel
Implement reset_prepare() and reset_done() handlers that are invoked by the PCI core before and after issuing a PCI reset, respectively. Specifically, implement reset_prepare() by calling mlxsw_core_bus_device_unregister() and reset_done() by calling mlxsw_core_bus_device_register(). This is the same implementation as the reload_{down,up}() devlink operations with the following differences: 1. The devlink instance is unregistered and then registered again after the reset. 2. A reset via the device's command interface (using MRSR register) is not issued during reset_done() as PCI core already issued a PCI reset. Tested: # for i in $(seq 1 10); do echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset; done Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18mlxsw: pci: Add support for new reset flowIdo Schimmel
The driver resets the device during probe and during a devlink reload. The current reset method reloads the current firmware version or a pending one, if one was previously flashed using devlink. However, the current reset method does not result in a PCI hot reset, preventing the PCI firmware from being upgraded, unless the system is rebooted. To solve this problem, a new reset command (6) was implemented in the firmware. Unlike the current command (1), after issuing the new command the device will not start the reset immediately, but only after a PCI hot reset. Implement the new reset method by first verifying that it is supported by the current firmware version by querying the Management Capabilities Mask (MCAM) register. If supported, issue the new reset command (6) via MRSR register followed by a PCI reset by calling __pci_reset_function_locked(). Once the PCI firmware is operational, go back to the regular reset flow and wait for the entire device to become ready. That is, repeatedly read the "system_status" register from the BAR until a value of "FW_READY" (0x5E) appears. Tested: # for i in $(seq 1 10); do devlink dev reload pci/0000:01:00.0; done Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18mlxsw: pci: Move software reset code to a separate functionAmit Cohen
In general, the existing flow of software reset in the driver is: 1. Wait for system ready status. 2. Send MRSR command, to start the reset. 3. Wait for system ready status. This flow will be extended once a new reset command is supported. As a preparation, move step #2 to a separate function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18mlxsw: pci: Rename mlxsw_pci_sw_reset()Amit Cohen
In the next patches, mlxsw_pci_sw_reset() will be extended to support more reset types and will not necessarily issue a software reset. Rename the function to reflect that. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18mlxsw: Extend MRSR pack() function to support new commandsAmit Cohen
Currently mlxsw_reg_mrsr_pack() always sets 'command=1'. As preparation for support of new reset flow, pass the command as an argument to the function and add an enum for this field. For now, always pass 'command=1' to the pack() function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18net: Replace hwtstamp_source by timestamping layerKory Maincent
Replace hwtstamp_source which is only used by the kernel_hwtstamp_config structure by the more widely use timestamp_layer structure. This is done to prepare the support of selectable timestamping source. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18net: macb: Convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Kory Maincent
The hardware timestamping through ndo_eth_ioctl() is going away. Convert the macb driver to the new API before that can be removed. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-17gve: add gve_features_check()Eric Dumazet
It is suboptimal to attempt skb linearization from ndo_start_xmit() if a gso skb has pathological layout, or if host stack does not have access to the payload (TCP direct). Linearization of large skbs can also fail under memory pressure. We should instead have an ndo_features_check() so that we can fallback to GSO, which is supported even for TCP direct, and generally much more efficient (no payload copy). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Bailey Forrest <bcf@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jeroen de Borst <jeroendb@google.com> Cc: Praveen Kaligineedi <pkaligineedi@google.com> Cc: Shailend Chand <shailend@google.com> Cc: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16stmmac: dwmac-loongson: Add architecture dependencyJean Delvare
Only present the DWMAC_LOONGSON option on architectures where it can actually be used. This follows the same logic as the DWMAC_INTEL option. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Keguang Zhang <keguang.zhang@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16octeon_ep: remove atomic variable usage in Tx data pathShinas Rasheed
Replace atomic variable "instr_pending" which represents number of posted tx instructions pending completion, with host_write_idx and flush_index variables in the xmit and completion processing respectively. Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16octeon_ep: implement xmit_more in transmitShinas Rasheed
Add xmit_more handling in tx datapath for octeon_ep pf. Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16octeon_ep: remove dma sync in trasmit pathShinas Rasheed
Cleanup dma sync calls for scatter gather mappings, since they are coherent allocations and do not need explicit sync to be called. Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16octeon_ep: add padding for small packetsShinas Rasheed
Pad small packets to ETH_ZLEN before transmit, as hardware cannot pad and requires software padding to ensure minimum ethernet frame length. Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-16treewide, spi: Get rid of SPI_MASTER_HALF_DUPLEXAndy Shevchenko
The SPI_MASTER_HALF_DUPLEX is the legacy name of a definition for a half duplex flag. Since all others had been replaced with the respective SPI_CONTROLLER prefix get rid of the last one as well. There is no functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> # for input Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20231113111249.3982461-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-11-15net/mlx5e: Check return value of snprintf writing to fw_version buffer for ↵Rahul Rameshbabu
representors Treat the operation as an error case when the return value is equivalent to the size of the name buffer. Failed to write null terminator to the name buffer, making the string malformed and should not be used. Provide a string with only the firmware version when forming the string with the board id fails. This logic for representors is identical to normal flow with ethtool. Without check, will trigger -Wformat-truncation with W=1. drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_rep_get_drvinfo': drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:78:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=] 78 | "%d.%d.%04d (%.16s)", | ^~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:77:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32 77 | snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 78 | "%d.%d.%04d (%.16s)", | ~~~~~~~~~~~~~~~~~~~~~ 79 | fw_rev_maj(mdev), fw_rev_min(mdev), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 80 | fw_rev_sub(mdev), mdev->board_id); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: cf83c8fdcd47 ("net/mlx5e: Add missing ethtool driver info for representors") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-16-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Check return value of snprintf writing to fw_version bufferRahul Rameshbabu
Treat the operation as an error case when the return value is equivalent to the size of the name buffer. Failed to write null terminator to the name buffer, making the string malformed and should not be used. Provide a string with only the firmware version when forming the string with the board id fails. Without check, will trigger -Wformat-truncation with W=1. drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_ethtool_get_drvinfo': drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:49:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=] 49 | "%d.%d.%04d (%.16s)", | ^~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:48:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32 48 | snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 49 | "%d.%d.%04d (%.16s)", | ~~~~~~~~~~~~~~~~~~~~~ 50 | fw_rev_maj(mdev), fw_rev_min(mdev), fw_rev_sub(mdev), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 51 | mdev->board_id); | ~~~~~~~~~~~~~~~ Fixes: 84e11edb71de ("net/mlx5e: Show board id in ethtool driver information") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Reduce the size of icosq_strSaeed Mahameed
icosq_str size is unnecessarily too long, and it causes a build warning -Wformat-truncation with W=1. Looking closely, It doesn't need to be 255B, hence this patch reduces the size to 32B which should be more than enough to host the string: "ICOSQ: 0x%x, ". While here, add a missing space in the formatted string. This fixes the following build warning: $ KCFLAGS='-Wall -Werror' $ make O=/tmp/kbuild/linux W=1 -s -j12 drivers/net/ethernet/mellanox/mlx5/core/ drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c: In function 'mlx5e_reporter_rx_timeout': drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:718:56: error: ', CQ: 0x' directive output may be truncated writing 8 bytes into a region of size between 0 and 255 [-Werror=format-truncation=] 718 | "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x", | ^~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:717:9: note: 'snprintf' output between 43 and 322 bytes into a destination of size 288 717 | snprintf(err_str, sizeof(err_str), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 718 | "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x", | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 719 | rq->ix, icosq_str, rq->rqn, rq->cq.mcq.cqn); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 521f31af004a ("net/mlx5e: Allow RQ outside of channel context") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-14-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5: Increase size of irq name bufferRahul Rameshbabu
Without increased buffer size, will trigger -Wformat-truncation with W=1 for the snprintf operation writing to the buffer. drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c: In function 'mlx5_irq_alloc': drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:296:7: error: '@pci:' directive output may be truncated writing 5 bytes into a region of size between 1 and 32 [-Werror=format-truncation=] 296 | "%s@pci:%s", name, pci_name(dev->pdev)); | ^~~~~ drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:295:2: note: 'snprintf' output 6 or more bytes (assuming 37) into a destination of size 32 295 | snprintf(irq->name, MLX5_MAX_IRQ_NAME, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 296 | "%s@pci:%s", name, pci_name(dev->pdev)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: ada9f5d00797 ("IB/mlx5: Fix eq names to display nicely in /proc/interrupts") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-13-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Update doorbell for port timestamping CQ before the software counterRahul Rameshbabu
Previously, mlx5e_ptp_poll_ts_cq would update the device doorbell with the incremented consumer index after the relevant software counters in the kernel were updated. In the mlx5e_sq_xmit_wqe context, this would lead to either overrunning the device CQ or exceeding the expected software buffer size in the device CQ if the device CQ size was greater than the software buffer size. Update the relevant software counter only after updating the device CQ consumer index in the port timestamping napi_poll context. Log: mlx5_core 0000:08:00.0: cq_err_event_notifier:517:(pid 0): CQ error on CQN 0x487, syndrome 0x1 mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000487 event=0x04 Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-12-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Track xmit submission to PTP WQ after populating metadata mapRahul Rameshbabu
Ensure the skb is available in metadata mapping to skbs before tracking the metadata index for detecting undelivered CQEs. If the metadata index is put in the tracking list before putting the skb in the map, the metadata index might be used for detecting undelivered CQEs before the relevant skb is available in the map, which can lead to a null-ptr-deref. Log: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 0 PID: 1243 Comm: kworker/0:2 Not tainted 6.6.0-rc4+ #108 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mlx5e_rx_dim_work [mlx5_core] RIP: 0010:mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core] Code: 8c 24 38 cc ff ff 4c 8d 3c c1 4c 89 f9 48 c1 e9 03 42 80 3c 31 00 0f 85 97 0f 00 00 4d 8b 3f 49 8d 7f 28 48 89 f9 48 c1 e9 03 <42> 80 3c 31 00 0f 85 8b 0f 00 00 49 8b 47 28 48 85 c0 0f 84 05 07 RSP: 0018:ffff8884d3c09c88 EFLAGS: 00010206 RAX: 0000000000000069 RBX: ffff8881160349d8 RCX: 0000000000000005 RDX: ffffed10218f48cf RSI: 0000000000000004 RDI: 0000000000000028 RBP: ffff888122707700 R08: 0000000000000001 R09: ffffed109a781383 R10: 0000000000000003 R11: 0000000000000003 R12: ffff88810c7a7a40 R13: ffff888122707700 R14: dffffc0000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8884d3c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4f878dd6e0 CR3: 000000014d108002 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? die_addr+0x3c/0xa0 ? exc_general_protection+0x144/0x210 ? asm_exc_general_protection+0x22/0x30 ? mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core] ? mlx5e_ptp_napi_poll+0x8f6/0x2290 [mlx5_core] __napi_poll.constprop.0+0xa4/0x580 net_rx_action+0x460/0xb80 ? _raw_spin_unlock_irqrestore+0x32/0x60 ? __napi_poll.constprop.0+0x580/0x580 ? tasklet_action_common.isra.0+0x2ef/0x760 __do_softirq+0x26c/0x827 irq_exit_rcu+0xc2/0x100 common_interrupt+0x7f/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:__kmem_cache_alloc_node+0xb/0x330 Code: 41 5d 41 5e 41 5f c3 8b 44 24 14 8b 4c 24 10 09 c8 eb d5 e8 b7 43 ca 01 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 <41> 56 41 89 d6 41 55 41 89 f5 41 54 49 89 fc 53 48 83 e4 f0 48 83 RSP: 0018:ffff88812c4079c0 EFLAGS: 00000246 RAX: 1ffffffff083c7fe RBX: ffff888100042dc0 RCX: 0000000000000218 RDX: 00000000ffffffff RSI: 0000000000000dc0 RDI: ffff888100042dc0 RBP: ffff88812c4079c8 R08: ffffffffa0289f96 R09: ffffed1025880ea9 R10: ffff888138839f80 R11: 0000000000000002 R12: 0000000000000dc0 R13: 0000000000000100 R14: 000000000000008c R15: ffff8881271fc450 ? cmd_exec+0x796/0x2200 [mlx5_core] kmalloc_trace+0x26/0xc0 cmd_exec+0x796/0x2200 [mlx5_core] mlx5_cmd_do+0x22/0xc0 [mlx5_core] mlx5_cmd_exec+0x17/0x30 [mlx5_core] mlx5_core_modify_cq_moderation+0x139/0x1b0 [mlx5_core] ? mlx5_add_cq_to_tasklet+0x280/0x280 [mlx5_core] ? lockdep_set_lock_cmp_fn+0x190/0x190 ? process_one_work+0x659/0x1220 mlx5e_rx_dim_work+0x9d/0x100 [mlx5_core] process_one_work+0x730/0x1220 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? max_active_store+0xf0/0xf0 ? assign_work+0x168/0x240 worker_thread+0x70f/0x12d0 ? __kthread_parkme+0xd1/0x1d0 ? process_one_work+0x1220/0x1220 kthread+0x2d9/0x3b0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x2d/0x70 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse ---[ end trace 0000000000000000 ]--- Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Avoid referencing skb after free-ing in drop path of ↵Rahul Rameshbabu
mlx5e_sq_xmit_wqe When SQ is a port timestamping SQ for PTP, do not access tx flags of skb after free-ing the skb. Free the skb only after all references that depend on it have been handled in the dropped WQE path. Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-10-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offloadJianbo Liu
As IPSec packet offload in switchdev mode is not supported with LAG, it's unnecessary to modify those sent-to-vport rules to the peer eswitch. Fixes: c6c2bf5db4ea ("net/mlx5e: Support IPsec packet offload for TX in switchdev mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Fix pedit endiannessVlad Buslov
Referenced commit addressed endianness issue in mlx5 pedit implementation in ad hoc manner instead of systematically treating integer values according to their types which left pedit fields of sizes not equal to 4 and where the bytes being modified are not least significant ones broken on big endian machines since wrong bits will be consumed during parsing which leads to following example error when applying pedit to source and destination MAC addresses: [Wed Oct 18 12:52:42 2023] mlx5_core 0001:00:00.1 p1v3_r: attempt to offload an unsupported field (cmd 0) [Wed Oct 18 12:52:42 2023] mask: 00000000330c5b68: 00 00 00 00 ff ff 00 00 00 00 ff ff 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 0000000017d22fd9: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 000000008186d717: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 0000000029eb6149: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 000000007ed103e4: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 00000000db8101a6: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 00000000ec3c08a9: 00 00 00 00 00 00 00 00 00 00 00 00 ............ Treat masks and values of pedit and filter match as network byte order, refactor pointers to them to void pointers instead of confusing u32 pointers and only cast to pointer-to-integer when reading a value from them. Treat pedit mlx5_fields->field_mask as host byte order according to its type u32, change the constants in fields array accordingly. Fixes: 82198d8bcdef ("net/mlx5e: Fix endianness when calculating pedit mask first bit") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: fix double free of encap_header in update funcsGavin Li
Follow up to the previous patch to fix the same issue for mlx5e_tc_tun_update_header_ipv4{6} when mlx5_packet_reformat_alloc() fails. When mlx5_packet_reformat_alloc() fails, the encap_header allocated in mlx5e_tc_tun_update_header_ipv4{6} will be released within it. However, e->encap_header is already set to the previously freed encap_header before mlx5_packet_reformat_alloc(). As a result, the later mlx5e_encap_put() will free e->encap_header again, causing a double free issue. mlx5e_encap_put() --> mlx5e_encap_dealloc() --> kfree(e->encap_header) This patch fix it by not setting e->encap_header until mlx5_packet_reformat_alloc() success. Fixes: a54e20b4fcae ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads") Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: fix double free of encap_headerDust Li
When mlx5_packet_reformat_alloc() fails, the encap_header allocated in mlx5e_tc_tun_create_header_ipv4{6} will be released within it. However, e->encap_header is already set to the previously freed encap_header before mlx5_packet_reformat_alloc(). As a result, the later mlx5e_encap_put() will free e->encap_header again, causing a double free issue. mlx5e_encap_put() --> mlx5e_encap_dealloc() --> kfree(e->encap_header) This happens when cmd: MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT fail. This patch fix it by not setting e->encap_header until mlx5_packet_reformat_alloc() success. Fixes: d589e785baf5e ("net/mlx5e: Allow concurrent creation of encap entries") Reported-by: Cruz Zhao <cruzzhao@linux.alibaba.com> Reported-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5: Decouple PHC .adjtime and .adjphase implementationsRahul Rameshbabu
When running a phase adjustment operation, the free running clock should not be modified at all. The phase control keyword is intended to trigger an internal servo on the device that will converge to the provided delta. A free running counter cannot implement phase adjustment. Fixes: 8e11a68e2e8a ("net/mlx5: Add adjphase function to support hardware-only offset control") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5: DR, Allow old devices to use multi destination FTEErez Shitrit
The current check isn't aware of old devices that don't have the relevant FW capability. This patch allows multi destination FTE in old cards, as it was before this check. Fixes: f6f46e7173cb ("net/mlx5: DR, Add check for multi destination FTE") Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5: Free used cpus mask when an IRQ is releasedMaher Sanalla
Each EQ table maintains a cpumask of the already used CPUs that are mapped to IRQs to ensure that each IRQ gets mapped to a unique CPU. However, on IRQ release, the said cpumask is not updated by clearing the CPU from the mask to allow future IRQ request, causing the following error when a SF is reloaded after it has utilized all CPUs for its IRQs: mlx5_irq_affinity_request:135:(pid 306010): Didn't find a matching IRQ. err = -28 Thus, when releasing an IRQ, clear its mapped CPU from the used CPUs mask, to prevent the case described above. While at it, move the used cpumask update to the EQ layer as it is more fitting and preserves symmetricity of the IRQ request/release API. Fixes: a1772de78d73 ("net/mlx5: Refactor completion IRQ request/release API") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15Revert "net/mlx5: DR, Supporting inline WQE when possible"Itamar Gozlan
This reverts commit 95c337cce0e11d06a715da73e6796ade9216637f. The revert is required due to the suspicion it cause some tests fail and will be moved to further investigation. Fixes: 95c337cce0e1 ("net/mlx5: DR, Supporting inline WQE when possible") Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Remove early assignment to netdev->featuresTariq Toukan
The netdev->features is initialized to netdev->hw_features at a later point in the flow. Remove any redundant earlier assignment. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15net/mlx5e: Add local loopback counter to vport rep statsOr Har-Toov
Add counter for number of unicast, multicast and broadcast packets/ octets that were loopback. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15net/mlx5: Query maximum frequency adjustment of the PTP hardware clockRahul Rameshbabu
Some mlx5 devices do not support the default advertised maximum frequency adjustment value for the PTP hardware clock that is set by the driver. These devices need to be queried when initializing the clock functionality in order to get the maximum supported frequency adjustment value. This value can be greater than the minimum supported frequency adjustment across mlx5 devices (50 million ppb). Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15net/mlx5: Convert scaled ppm values outside the s32 range for PHC frequency ↵Rahul Rameshbabu
adjustments Represent scaled ppm as ppb to the device when the value in scaled ppm is not representable as a 32-bit signed integer. mlx5 devices only support a 32-bit field for the frequency adjustment value in units of either scaled ppm or ppb. Since mlx5 devices only support a 32-bit field for the frequency adjustment value independent of unit used, limit the maximum frequency adjustment to S32_MAX ppb. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15net/mlx5: Initialize clock->ptp_info inside mlx5_init_timer_clockRahul Rameshbabu
Configure the PHC inside mlx5_init_timer_clock for calling mlx5_ptp_settime later in the function. Would previously use mlx5_ptp_clock_info instance to invoke mlx5_ptp_settime to set the NIC real-time clock to be synchronized with the host system clock. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15net/mlx5: Refactor real time clock operation checks for PHCRahul Rameshbabu
Check if the MTUTC register of the NIC can be modified before attempting to execute a real-time clock operation. Previous implementation aborted the real-time clock operation pre-emptively when the MTUTC register used to control the real-time clock was not modifiable, indicating real-time clock mode was not enabled on the NIC. The original control flow was confusing since the noop-if-RTC-disabled branch looked similar to an error handling guard clause. The purpose of this patch is purely for improving readability and should lead to no functional change. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15net/mlx5e: Access array with enum values instead of magic numbersGal Pressman
Access the headers array using pedit_cmd enum values, and don't assume anything about their values. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15net/mlx5: simplify mlx5_set_driver_version string assignmentsJustin Stitt
In total, just assigning this version string takes: (1) strncpy()'s (5) strlen()'s (3) strncat()'s (1) snprintf()'s (4) max_t()'s Moreover, `strncpy` is deprecated [1] and `strncat` really shouldn't be used either [2]. With this in mind, let's simply use a single `snprintf`. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://elixir.bootlin.com/linux/v6.6-rc5/source/include/linux/fortify-string.h#L448 [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15net/mlx5: Annotate struct mlx5_flow_handle with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct mlx5_flow_handle. Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: linux-rdma@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>