summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2024-01-30mlxsw: spectrum: Search for free LAD ID onceAmit Cohen
Currently, the function mlxsw_sp_lag_index_get() is called twice - first as part of NETDEV_PRECHANGEUPPER event and later as part of NETDEV_CHANGEUPPER. This function will be changed in the next patch. To simplify the code, call it only once as part of NETDEV_CHANGEUPPER event and set an error message using 'extack' in case of failure. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30mlxsw: spectrum: Query max_lag onceAmit Cohen
The maximum number of LAGs is queried from core several times. It is used to allocate LAG array, and then to iterate over it. In addition, it is used for PGT initialization. To simplify the code, instead of querying it several times, store the value as part of 'mlxsw_sp' and use it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30mlxsw: spectrum: Remove mlxsw_sp_lag_get()Amit Cohen
A next patch will add mlxsw_sp_lag_{get,put}() functions to handle LAG reference counting and create/destroy it only for first user/last user. Remove mlxsw_sp_lag_get() function and access LAG array directly. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30mlxsw: spectrum: Change mlxsw_sp_upper to LAG structureAmit Cohen
The structure mlxsw_sp_upper is used only as LAG. Rename it to mlxsw_sp_lag and move it to spectrum.c file, as it is used only there. Move the function mlxsw_sp_lag_get() with the structure. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30net: stmmac: dwmac-imx: set TSO/TBS TX queues default settingsEsben Haabendal
TSO and TBS cannot coexist. For now we set i.MX Ethernet QOS controller to use the first TX queue with TSO and the rest for TBS. TX queues with TBS can support etf qdisc hw offload. Signed-off-by: Esben Haabendal <esben@geanix.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30net: stmmac: do not clear TBS enable bit on link up/downEsben Haabendal
With the dma conf being reallocated on each call to stmmac_open(), any information in there is lost, unless we specifically handle it. The STMMAC_TBS_EN bit is set when adding an etf qdisc, and the etf qdisc therefore would stop working when link was set down and then back up. Fixes: ba39b344e924 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open") Cc: stable@vger.kernel.org Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30ice: stop destroying and reinitalizing Tx tracker during resetJacob Keller
The ice driver currently attempts to destroy and re-initialize the Tx timestamp tracker during the reset flow. The release of the Tx tracker only happened during CORE reset or GLOBAL reset. The ice_ptp_rebuild() function always calls the ice_ptp_init_tx function which will allocate a new tracker data structure, resulting in memory leaks during PF reset. Certainly the driver should not be allocating a new tracker without removing the old tracker data, as this results in a memory leak. Additionally, there's no reason to remove the tracker memory during a reset. Remove this logic from the reset and rebuild flow. Instead of releasing the Tx tracker, flush outstanding timestamps just before we reset the PHY timestamp block in ice_ptp_cfg_phy_interrupt(). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30ice: factor out ice_ptp_rebuild_owner()Jacob Keller
The ice_ptp_reset() function uses a goto to skip past clock owner operations if performing a PF reset or if the device is not the clock owner. This is a bit confusing. Factor this out into ice_ptp_rebuild_owner() instead. The ice_ptp_reset() function is called by ice_rebuild() to restore PTP functionality after a device reset. Follow the convention set by the ice_main.c file and rename this function to ice_ptp_rebuild(), in the same way that we have ice_prepare_for_reset() and ice_ptp_prepare_for_reset(). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30ice: rename ice_ptp_tx_cfg_intrJacob Keller
The ice_ptp_tx_cfg_intr() function sends a control queue message to configure the PHY timestamp interrupt block. This is a very similar name to a function which is used to configure the MAC Other Interrupt Cause Enable register. Rename this function to ice_ptp_cfg_phy_interrupt in order to make it more obvious to the reader what action it performs, and distinguish it from other similarly named functions. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30ice: don't check has_ready_bitmap in E810 functionsJacob Keller
E810 hardware does not have a Tx timestamp ready bitmap. Don't check has_ready_bitmap in E810-specific functions. Add has_ready_bitmap check in ice_ptp_process_tx_tstamp() to stop relying on the fact that ice_get_phy_tx_tstamp_ready() returns all 1s. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30ice: rename verify_cached to has_ready_bitmapJacob Keller
The tx->verify_cached flag is used to inform the Tx timestamp tracking code whether it needs to verify the cached Tx timestamp value against a previous captured value. This is necessary on E810 hardware which does not have a Tx timestamp ready bitmap. In addition, we currently rely on the fact that the ice_get_phy_tx_tstamp_ready() function returns all 1s for E810 hardware. Instead of introducing a brand new flag, rename and verify_cached to has_ready_bitmap, inverting the relevant checks. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30ice: pass reset type to PTP reset functionsJacob Keller
The ice_ptp_prepare_for_reset() and ice_ptp_reset() functions currently check the pf->flags ICE_FLAG_PFR_REQ bit to determine if the current reset is a PF reset or not. This is problematic, because it is possible that a PF reset and a higher level reset (CORE reset, GLOBAL reset, EMP reset) are requested simultaneously. In that case, the driver performs the highest level reset requested. However, the ICE_FLAG_PFR_REQ flag will still be set. The main driver reset functions take an enum ice_reset_req indicating which reset is actually being performed. Pass this data into the PTP functions and rely on this instead of relying on the driver flags. This ensures that the PTP code performs the proper level of reset that the driver is actually undergoing. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30ice: introduce PTP state machineJacob Keller
Add PTP state machine so that the driver can correctly identify PTP state around resets. When the driver got information about ungraceful reset, PTP was not prepared for reset and it returned error. When this situation occurs, prepare PTP before rebuilding its structures. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-29ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()Christophe JAILLET
All error handling paths, except this one, go to 'out' where release_swfw_sync() is called. This call balances the acquire_swfw_sync() call done at the beginning of the function. Branch to the error handling path in order to correctly release some resources in case of error. Fixes: ae14a1d8e104 ("ixgbe: Fix IOSF SB access issues") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-29e1000e: correct maximum frequency adjustment valuesJacob Keller
The e1000e driver supports hardware with a variety of different clock speeds, and thus a variety of different increment values used for programming its PTP hardware clock. The values currently programmed in e1000e_ptp_init are incorrect. In particular, only two maximum adjustments are used: 24000000 - 1, and 600000000 - 1. These were originally intended to be used with the 96 MHz clock and the 25 MHz clock. Both of these values are actually slightly too high. For the 96 MHz clock, the actual maximum value that can safely be programmed is 23,999,938. For the 25 MHz clock, the maximum value is 599,999,904. Worse, several devices use a 24 MHz clock or a 38.4 MHz clock. These parts are incorrectly assigned one of either the 24million or 600million values. For the 24 MHz clock, this is not a significant issue: its current increment value can support an adjustment up to 7billion in the positive direction. However, the 38.4 KHz clock uses an increment value which can only support up to 230,769,157 before it starts overflowing. To understand where these values come from, consider that frequency adjustments have the form of: new_incval = base_incval + (base_incval * adjustment) / (unit of adjustment) The maximum adjustment is reported in terms of parts per billion: new_incval = base_incval + (base_incval * adjustment) / 1 billion The largest possible adjustment is thus given by the following: max_incval = base_incval + (base_incval * max_adj) / 1 billion Re-arranging to solve for max_adj: max_adj = (max_incval - base_incval) * 1 billion / base_incval We also need to ensure that negative adjustments cannot underflow. This can be achieved simply by ensuring max_adj is always less than 1 billion. Introduce new macros in e1000.h codifying the maximum adjustment in PPB for each frequency given its associated increment values. Also clarify where these values come from by commenting about the above equations. Replace the switch statement in e1000e_ptp_init with one which mirrors the increment value switch statement from e1000e_get_base_timinica. For each device, assign the appropriate maximum adjustment based on its frequency. Some parts can have one of two frequency modes as determined by E1000_TSYNCRXCTL_SYSCFI. Since the new flow directly matches the assignments in e1000e_get_base_timinca, and uses well defined macro names, it is much easier to verify that the resulting maximum adjustments are correct. It also avoids difficult to parse construction such as the "hw->mac.type < e1000_phc_lpt", and the use of fallthrough which was especially confusing when combined with a conditional block. Note that I believe the current increment value configuration used for 24MHz clocks is sub-par, as it leaves at least 3 extra bits available in the INCVALUE register. However, fixing that requires more careful review of the clock rate and associated values. Reported-by: Trey Harrison <harrisondigitalmedia@gmail.com> Fixes: 68fe1d5da548 ("e1000e: Add Support for 38.4MHZ frequency") Fixes: d89777bf0e42 ("e1000e: add support for IEEE-1588 PTP") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-29net: fill in MODULE_DESCRIPTION()s for ec_bhfBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Beckhoff CX5020 EtherCAT Ethernet driver. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-29net: fill in MODULE_DESCRIPTION()s for cpsw-commonBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the TI CPSW switch module. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-29net: fill in MODULE_DESCRIPTION()s for dwmac-socfpgaBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the STMicro DWMAC for Altera SOCs. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-29net: fill in MODULE_DESCRIPTION()s for Qualcom driversBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Qualcom rmnet and emac drivers. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-29net: fill in MODULE_DESCRIPTION()s for SMSC driversBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the SMSC 91x/911x/9420 Ethernet drivers. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-29net: fill in MODULE_DESCRIPTION()s for ocelotBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Ocelot SoCs (VSC7514) helpers driver. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-29net: fill in MODULE_DESCRIPTION()s for encx24j600Breno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Microchip ENCX24J600 helpers driver. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-29octeontx2-af: Add filter profiles in hardware to extract packet headersSuman Ghosh
This patch adds hardware profile supports for extracting packet headers. It makes sure that hardware is capabale of extracting ICMP, CPT, ERSPAN headers. Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-27treewide, serdev: change receive_buf() return type to size_tFrancesco Dolcini
receive_buf() is called from ttyport_receive_buf() that expects values ">= 0" from serdev_controller_receive_buf(), change its return type from ssize_t to size_t. The need for this clean-up was noticed while fixing a warning, see commit 94d053942544 ("Bluetooth: btnxpuart: fix recv_buf() return value"). Changing the callback prototype to return an unsigned seems the best way to document the API and ensure that is properly used. GNSS drivers implementation of serdev receive_buf() callback return directly the return value of gnss_insert_raw(). gnss_insert_raw() returns a signed int, however this is not an issue since the value returned is always positive, because of the kfifo_in() implementation. gnss_insert_raw() could be changed to return also an unsigned, however this is not implemented here as request by the GNSS maintainer Johan Hovold. Suggested-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/all/087be419-ec6b-47ad-851a-5e1e3ea5cfcc@kernel.org/ Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> #for-iio Reviewed-by: Johan Hovold <johan@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Alex Elder <elder@linaro.org> Acked-by: Maximilian Luz <luzmaximilian@gmail.com> # for platform/surface Acked-by: Lee Jones <lee@kernel.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240122180551.34429-1-francesco@dolcini.it Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-27net: txgbe: use irq_domain for interrupt controllerJiawen Wu
In the current interrupt controller, the MAC interrupt acts as the parent interrupt in the GPIO IRQ chip. But when the number of Rx/Tx ring changes, the PCI IRQ vector needs to be reallocated. Then this interrupt controller would be corrupted. So use irq_domain structure to avoid the above problem. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-27net: txgbe: move interrupt codes to a separate fileJiawen Wu
In order to change the interrupt response structure, there will be a lot of code added next. Move these interrupt codes to a new file, to make the codes cleaner. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-27net: lan966x: Fix port configuration when using SGMII interfaceHoratiu Vultur
In case the interface between the MAC and the PHY is SGMII, then the bit GIGA_MODE on the MAC side needs to be set regardless of the speed at which it is running. Fixes: d28d6d2e37d1 ("net: lan966x: add port module support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-26bnx2x: Fix firmware version string character countsKees Cook
A potential string truncation was reported in bnx2x_fill_fw_str(), when a long bp->fw_ver and a long phy_fw_ver might coexist, but seems unlikely with real-world hardware. Use scnprintf() to indicate the intent that truncations are tolerated. While reading this code, I found a collection of various buffer size counting issues. None looked like they might lead to a buffer overflow with current code (the small buffers are 20 bytes and might only ever consume 10 bytes twice with a trailing %NUL). However, early truncation (due to a %NUL in the middle of the string) might be happening under likely rare conditions. Regardless fix the formatters and related functions: - Switch from a separate strscpy() to just adding an additional "%s" to the format string that immediately follows it in bnx2x_fill_fw_str(). - Use sizeof() universally instead of using unbound defines. - Fix bnx2x_7101_format_ver() and bnx2x_null_format_ver() to report the number of characters written, not including the trailing %NUL (as already done with the other firmware formatting functions). - Require space for at least 1 byte in bnx2x_get_ext_phy_fw_version() for the trailing %NUL. - Correct the needed buffer size in bnx2x_3_seq_format_ver(). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401260858.jZN6vD1k-lkp@intel.com/ Cc: Ariel Elior <aelior@marvell.com> Cc: Sudarsana Kalluru <skalluru@marvell.com> Cc: Manish Chopra <manishc@marvell.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240126041044.work.220-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-26bnxt_en: Make PTP timestamp HWRM more silentBreno Leitao
commit 056bce63c469 ("bnxt_en: Make PTP TX timestamp HWRM query silent") changed a netdev_err() to netdev_WARN_ONCE(). netdev_WARN_ONCE() is it generates a kernel WARNING, which is bad, for the following reasons: * You do not a kernel warning if the firmware queries are late * In busy networks, timestamp query failures fairly regularly * A WARNING message doesn't bring much value, since the code path is clear. (This was discussed in-depth in [1]) Transform the netdev_WARN_ONCE() into a netdev_warn_once(), and print a more well-behaved message, instead of a full WARN(). bnxt_en 0000:67:00.0 eth0: TS query for TX timer failed rc = fffffff5 [1] Link: https://lore.kernel.org/all/ZbDj%2FFI4EJezcfd1@gmail.com/ Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Fixes: 056bce63c469 ("bnxt_en: Make PTP TX timestamp HWRM query silent") Link: https://lore.kernel.org/r/20240125134104.2045573-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25tsnep: Add link down PHY loopback supportGerhard Engleder
PHY loopback turns off link state change signalling. Therefore, the loopback only works if the link is already up before the PHY loopback is activated. Ensure that PHY loopback works even if the link is not already up during activation by calling netif_carrier_on() explicitly. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://lore.kernel.org/r/20240123200151.60848-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25net: ethernet: mtk_eth_soc: set DMA coherent mask to get PPE workingDaniel Golle
Set DMA coherent mask to 32-bit which makes PPE offloading engine start working on BPi-R4 which got 4 GiB of RAM. Fixes: 2d75891ebc09 ("net: ethernet: mtk_eth_soc: support 36-bit DMA addressing on MT7988") Suggested-by: Elad Yifee <eladwf@users.github.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/97e90925368b405f0974b9b15f1b7377c4a329ad.1706113251.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25gve: Modify rx_buf_alloc_fail counter centrally and closer to failureAnkit Garg
Previously, each caller of gve_rx_alloc_buffer had to increase counter and as a result one caller was not tracking those failure. Increasing counters at a common location now so callers don't have to duplicate code or miss counter management. Signed-off-by: Ankit Garg <nktgrg@google.com> Link: https://lore.kernel.org/r/20240124205435.1021490-1-nktgrg@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25nfp: flower: fix hardware offload for the transfer layer portHui Zhou
The nfp driver will merge the tp source port and tp destination port into one dword which the offset must be zero to do hardware offload. However, the mangle action for the tp source port and tp destination port is separated for tc ct action. Modify the mangle action for the FLOW_ACT_MANGLE_HDR_TYPE_TCP and FLOW_ACT_MANGLE_HDR_TYPE_UDP to satisfy the nfp driver offload check for the tp port. The mangle action provides a 4B value for source, and a 4B value for the destination, but only 2B of each contains the useful information. For offload the 2B of each is combined into a single 4B word. Since the incoming mask for the source is '0xFFFF<mask>' the shift-left will throw away the 0xFFFF part. When this gets combined together in the offload it will clear the destination field. Fix this by setting the lower bits back to 0xFFFF, effectively doing a rotate-left operation on the mask. Fixes: 5cee92c6f57a ("nfp: flower: support hw offload for ct nat action") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Hui Zhou <hui.zhou@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/r/20240124151909.31603-3-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25nfp: flower: add hardware offload check for post ct entryHui Zhou
The nfp offload flow pay will not allocate a mask id when the out port is openvswitch internal port. This is because these flows are used to configure the pre_tun table and are never actually send to the firmware as an add-flow message. When a tc rule which action contains ct and the post ct entry's out port is openvswitch internal port, the merge offload flow pay with the wrong mask id of 0 will be send to the firmware. Actually, the nfp can not support hardware offload for this situation, so return EOPNOTSUPP. Fixes: bd0fe7f96a3c ("nfp: flower-ct: add zone table entry when handling pre/post_ct flows") CC: stable@vger.kernel.org # 5.14+ Signed-off-by: Hui Zhou <hui.zhou@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/r/20240124151909.31603-2-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25gve: Fix skb truesize underestimationPraveen Kaligineedi
For a skb frag with a newly allocated copy page, the true size is incorrectly set to packet buffer size. It should be set to PAGE_SIZE instead. Fixes: 82fd151d38d9 ("gve: Reduce alloc and copy costs in the GQ rx path") Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Link: https://lore.kernel.org/r/20240124161025.1819836-1-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ringGerhard Engleder
The fill ring of the XDP socket may contain not enough buffers to completey fill the RX queue during socket creation. In this case the flag XDP_RING_NEED_WAKEUP is not set as this flag is only set if the RX queue is not completely filled during polling. Set XDP_RING_NEED_WAKEUP flag also if RX queue is not completely filled during XDP socket creation. Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25tsnep: Remove FCS for XDP data pathGerhard Engleder
The RX data buffer includes the FCS. The FCS is already stripped for the normal data path. But for the XDP data path the FCS is included and acts like additional/useless data. Remove the FCS from the RX data buffer also for XDP. Fixes: 65b28c810035 ("tsnep: Add XDP RX support") Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25Merge tag 'mlx5-fixes-2024-01-24' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-01-24 This series provides bug fixes to mlx5 driver. Please pull and let me know if there is any problem. * tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: fix a potential double-free in fs_any_create_groups net/mlx5e: fix a double-free in arfs_create_groups net/mlx5e: Ignore IPsec replay window values on sender side net/mlx5e: Allow software parsing when IPsec crypto is enabled net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO net/mlx5: DR, Can't go to uplink vport on RX rule net/mlx5: DR, Use the right GVMI number for drop action net/mlx5: Bridge, fix multicast packets sent to uplink net/mlx5: Fix a WARN upon a callback command failure net/mlx5e: Fix peer flow lists handling net/mlx5e: Fix inconsistent hairpin RQT sizes net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context net/mlx5: Fix query of sd_group field net/mlx5e: Use the correct lag ports number when creating TISes ==================== Link: https://lore.kernel.org/r/20240124081855.115410-1-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25Merge tag 'for-netdev' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-01-25 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 2 day(s) which contain a total of 13 files changed, 190 insertions(+), 91 deletions(-). The main changes are: 1) Fix bpf_xdp_adjust_tail() in context of XSK zero-copy drivers which support XDP multi-buffer. The former triggered a NULL pointer dereference upon shrinking, from Maciej Fijalkowski & Tirthendu Sarkar. 2) Fix a bug in riscv64 BPF JIT which emitted a wrong prologue and epilogue for struct_ops programs, from Pu Lehui. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue i40e: set xdp_rxq_info::frag_size xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers ice: remove redundant xdp_rxq_info registration i40e: handle multi-buffer packets that are shrunk by xdp prog ice: work on pre-XDP prog frag count xsk: fix usage of multi-buffer BPF helpers for ZC XDP xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags xsk: recycle buffer in case Rx queue was full riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops ==================== Link: https://lore.kernel.org/r/20240125084416.10876-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25net: fec: fix the unhandled context fault from smmuShenwei Wang
When repeatedly changing the interface link speed using the command below: ethtool -s eth0 speed 100 duplex full ethtool -s eth0 speed 1000 duplex full The following errors may sometimes be reported by the ARM SMMU driver: [ 5395.035364] fec 5b040000.ethernet eth0: Link is Down [ 5395.039255] arm-smmu 51400000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2 [ 5398.108460] fec 5b040000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off It is identified that the FEC driver does not properly stop the TX queue during the link speed transitions, and this results in the invalid virtual I/O address translations from the SMMU and causes the context faults. Fixes: dbc64a8ea231 ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()") Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Link: https://lore.kernel.org/r/20240123165141.2008104-1-shenwei.wang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-24i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queueMaciej Fijalkowski
Now that i40e driver correctly sets up frag_size in xdp_rxq_info, let us make it work for ZC multi-buffer as well. i40e_ring::rx_buf_len for ZC is being set via xsk_pool_get_rx_frame_size() and this needs to be propagated up to xdp_rxq_info. Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-12-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24i40e: set xdp_rxq_info::frag_sizeMaciej Fijalkowski
i40e support XDP multi-buffer so it is supposed to use __xdp_rxq_info_reg() instead of xdp_rxq_info_reg() and set the frag_size. It can not be simply converted at existing callsite because rx_buf_len could be un-initialized, so let us register xdp_rxq_info within i40e_configure_rx_ring(), which happen to be called with already initialized rx_buf_len value. Commit 5180ff1364bc ("i40e: use int for i40e_status") converted 'err' to int, so two variables to deal with return codes are not needed within i40e_configure_rx_ring(). Remove 'ret' and use 'err' to handle status from xdp_rxq_info registration. Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-11-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24ice: update xdp_rxq_info::frag_size for ZC enabled Rx queueMaciej Fijalkowski
Now that ice driver correctly sets up frag_size in xdp_rxq_info, let us make it work for ZC multi-buffer as well. ice_rx_ring::rx_buf_len for ZC is being set via xsk_pool_get_rx_frame_size() and this needs to be propagated up to xdp_rxq_info. Use a bigger hammer and instead of unregistering only xdp_rxq_info's memory model, unregister it altogether and register it again and have xdp_rxq_info with correct frag_size value. Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-9-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24intel: xsk: initialize skb_frag_t::bv_offset in ZC driversMaciej Fijalkowski
Ice and i40e ZC drivers currently set offset of a frag within skb_shared_info to 0, which is incorrect. xdp_buffs that come from xsk_buff_pool always have 256 bytes of a headroom, so they need to be taken into account to retrieve xdp_buff::data via skb_frag_address(). Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from xdp_buff::data_hard_start which would result in overwriting existing payload. Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-8-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24ice: remove redundant xdp_rxq_info registrationMaciej Fijalkowski
xdp_rxq_info struct can be registered by drivers via two functions - xdp_rxq_info_reg() and __xdp_rxq_info_reg(). The latter one allows drivers that support XDP multi-buffer to set up xdp_rxq_info::frag_size which in turn will make it possible to grow the packet via bpf_xdp_adjust_tail() BPF helper. Currently, ice registers xdp_rxq_info in two spots: 1) ice_setup_rx_ring() // via xdp_rxq_info_reg(), BUG 2) ice_vsi_cfg_rxq() // via __xdp_rxq_info_reg(), OK Cited commit under fixes tag took care of setting up frag_size and updated registration scheme in 2) but it did not help as 1) is called before 2) and as shown above it uses old registration function. This means that 2) sees that xdp_rxq_info is already registered and never calls __xdp_rxq_info_reg() which leaves us with xdp_rxq_info::frag_size being set to 0. To fix this misbehavior, simply remove xdp_rxq_info_reg() call from ice_setup_rx_ring(). Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-7-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24i40e: handle multi-buffer packets that are shrunk by xdp progTirthendu Sarkar
XDP programs can shrink packets by calling the bpf_xdp_adjust_tail() helper function. For multi-buffer packets this may lead to reduction of frag count stored in skb_shared_info area of the xdp_buff struct. This results in issues with the current handling of XDP_PASS and XDP_DROP cases. For XDP_PASS, currently skb is being built using frag count of xdp_buffer before it was processed by XDP prog and thus will result in an inconsistent skb when frag count gets reduced by XDP prog. To fix this, get correct frag count while building the skb instead of using pre-obtained frag count. For XDP_DROP, current page recycling logic will not reuse the page but instead will adjust the pagecnt_bias so that the page can be freed. This again results in inconsistent behavior as the page refcnt has already been changed by the helper while freeing the frag(s) as part of shrinking the packet. To fix this, only adjust pagecnt_bias for buffers that are stillpart of the packet post-xdp prog run. Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx") Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-6-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24ice: work on pre-XDP prog frag countMaciej Fijalkowski
Fix an OOM panic in XDP_DRV mode when a XDP program shrinks a multi-buffer packet by 4k bytes and then redirects it to an AF_XDP socket. Since support for handling multi-buffer frames was added to XDP, usage of bpf_xdp_adjust_tail() helper within XDP program can free the page that given fragment occupies and in turn decrease the fragment count within skb_shared_info that is embedded in xdp_buff struct. In current ice driver codebase, it can become problematic when page recycling logic decides not to reuse the page. In such case, __page_frag_cache_drain() is used with ice_rx_buf::pagecnt_bias that was not adjusted after refcount of page was changed by XDP prog which in turn does not drain the refcount to 0 and page is never freed. To address this, let us store the count of frags before the XDP program was executed on Rx ring struct. This will be used to compare with current frag count from skb_shared_info embedded in xdp_buff. A smaller value in the latter indicates that XDP prog freed frag(s). Then, for given delta decrement pagecnt_bias for XDP_DROP verdict. While at it, let us also handle the EOP frag within ice_set_rx_bufs_act() to make our life easier, so all of the adjustments needed to be applied against freed frags are performed in the single place. Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-5-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24xsk: make xsk_buff_pool responsible for clearing xdp_buff::flagsMaciej Fijalkowski
XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is used by drivers to notify data path whether xdp_buff contains fragments or not. Data path looks up mentioned flag on first buffer that occupies the linear part of xdp_buff, so drivers only modify it there. This is sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on stack or it resides within struct representing driver's queue and fragments are carried via skb_frag_t structs. IOW, we are dealing with only one xdp_buff. ZC mode though relies on list of xdp_buff structs that is carried via xsk_buff_pool::xskb_list, so ZC data path has to make sure that fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise, xsk_buff_free() could misbehave if it would be executed against xdp_buff that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can take place when within supplied XDP program bpf_xdp_adjust_tail() is used with negative offset that would in turn release the tail fragment from multi-buffer frame. Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would result in releasing all the nodes from xskb_list that were produced by driver before XDP program execution, which is not what is intended - only tail fragment should be deleted from xskb_list and then it should be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never make it up to user space, so from AF_XDP application POV there would be no traffic running, however due to free_list getting constantly new nodes, driver will be able to feed HW Rx queue with recycled buffers. Bottom line is that instead of traffic being redirected to user space, it would be continuously dropped. To fix this, let us clear the mentioned flag on xsk_buff_pool side during xdp_buff initialization, which is what should have been done right from the start of XSK multi-buffer support. Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24net: fill in MODULE_DESCRIPTION()s for rvu_mboxBreno Leitao
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Marvel RVU mbox driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-11-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>