linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message	Author
2024-10-08	i40e: Fix macvlan leak by synchronizing access to mac_filter_hash	Aleksandr Loktionov
	This patch addresses a macvlan leak issue in the i40e driver caused by concurrent access to vsi->mac_filter_hash. The leak occurs when multiple threads attempt to modify the mac_filter_hash simultaneously, leading to inconsistent state and potential memory leaks. To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock), ensuring atomic operations and preventing concurrent access. Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in i40e_add_mac_filter() to help catch similar issues in the future. Reproduction steps: 1. Spawn VFs and configure port vlan on them. 2. Trigger concurrent macvlan operations (e.g., adding and deleting portvlan and/or mac filters). 3. Observe the potential memory leak and inconsistent state in the mac_filter_hash. This synchronization ensures the integrity of the mac_filter_hash and prevents the described leak. Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM") Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
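The locking rule this fix establishes (every change to the filter hash happens under the hash lock, and the add path asserts it) can be modelled in a few lines of userspace C. This is a single-threaded sketch under stated assumptions: all `sketch_*` names are hypothetical, a boolean stands in for the spinlock, and `assert()` plays the role that lockdep_assert_held() plays in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Hypothetical stand-in for a VSI: a "lock" flag and a filter count. */
struct sketch_vsi {
	bool hash_lock_held;
	int nr_filters;
};

static inline void sketch_hash_lock(struct sketch_vsi *vsi)
{
	assert(!vsi->hash_lock_held);	/* no recursive locking in this model */
	vsi->hash_lock_held = true;
}

static inline void sketch_hash_unlock(struct sketch_vsi *vsi)
{
	assert(vsi->hash_lock_held);
	vsi->hash_lock_held = false;
}

/* Callers must hold the hash lock; the assert documents and checks that. */
void sketch_add_filter(struct sketch_vsi *vsi)
{
	assert(vsi->hash_lock_held);
	vsi->nr_filters++;
}

void sketch_del_filter(struct sketch_vsi *vsi)
{
	assert(vsi->hash_lock_held);
	if (vsi->nr_filters > 0)
		vsi->nr_filters--;
}

/* Delete the filter and clear the cached address as one locked section. */
void sketch_clear_vf_mac(struct sketch_vsi *vsi, unsigned char *addr)
{
	sketch_hash_lock(vsi);
	sketch_del_filter(vsi);
	memset(addr, 0, SKETCH_ETH_ALEN);
	sketch_hash_unlock(vsi);
}
```

Calling sketch_add_filter() without taking the lock first trips the assertion, which is the kind of early failure the added lockdep check is meant to provide.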
2024-10-08	ice: Use common error handling code in two functions	Markus Elfring
	Add jump targets so that a bit of exception handling can be better reused at the end of two function implementations. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
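The pattern referred to here, shared jump targets for cleanup at the end of a function, can be sketched in plain C as follows. This is a userspace illustration, not the ice code; `setup_two_buffers()` and its parameters are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical example: allocate two buffers and unwind through shared
 * labels at the end of the function when a step fails.
 */
int setup_two_buffers(size_t len_a, size_t len_b, int fail_late)
{
	char *a, *b;
	int err;

	a = malloc(len_a);
	if (!a)
		return -ENOMEM;

	b = malloc(len_b);
	if (!b) {
		err = -ENOMEM;
		goto free_a;
	}

	if (fail_late) {	/* stand-in for a later step failing */
		err = -EINVAL;
		goto free_b;
	}

	err = 0;	/* success; this sketch releases everything anyway */
free_b:
	free(b);
free_a:
	free(a);
	return err;
}
```

Each failure site sets the error code and jumps to the label that undoes exactly what has been set up so far, so the release calls are written once.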
2024-10-08	ice: Make use of assign_bit() API	Hongbo Li
	We have for some time the assign_bit() API to replace open coded if (foo) set_bit(n, bar); else clear_bit(n, bar); Use this API to clean the code. No functional change intended. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
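A minimal userspace sketch of the replacement described above. The `sketch_*` helpers are hypothetical, non-atomic stand-ins, and the argument order is an assumption modelled on the kernel helper; they only illustrate how one call replaces the open-coded if/else.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-ins for set_bit()/clear_bit()/test_bit(). */
static inline void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static inline void sketch_clear_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % SKETCH_BITS_PER_LONG));
}

static inline bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* One call instead of: if (value) set_bit(...); else clear_bit(...); */
static inline void sketch_assign_bit(unsigned int nr, unsigned long *addr, bool value)
{
	if (value)
		sketch_set_bit(nr, addr);
	else
		sketch_clear_bit(nr, addr);
}
```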
2024-10-08	ice: store max_frame and rx_buf_len only in ice_rx_ring	Jacob Keller
	The max_frame and rx_buf_len fields of the VSI set the maximum frame size for packets on the wire, and configure the size of the Rx buffer. In the hardware, these are per-queue configuration. Most VSI types use a simple method to determine the size of the buffers for all queues. However, VFs may potentially configure different values for each queue. While the Linux iAVF driver does not do this, it is allowed by the virtchnl interface. The current virtchnl code simply sets the per-VSI fields in between calls to ice_vsi_cfg_single_rxq(). This technically works, as these fields are only ever used when programming the Rx ring, and otherwise not checked again. However, it is confusing to maintain. The Rx ring also already has an rx_buf_len field in order to access the buffer length in the hotpath. It also has extra unused bytes in the ring structure which we can make use of to store the maximum frame size. Drop the VSI max_frame and rx_buf_len fields. Add max_frame to the Rx ring, and slightly re-order rx_buf_len to better fit into the gaps in the structure layout. Change the ice_vsi_cfg_frame_size function so that it writes to the ring fields. Call this function once per ring in ice_vsi_cfg_rxqs(). This is done over calling it inside the ice_vsi_cfg_rxq(), because ice_vsi_cfg_rxq() is called in the virtchnl flow where the max_frame and rx_buf_len have already been configured. Change the accesses for rx_buf_len and max_frame to all point to the ring structure. This has the added benefit that ice_vsi_cfg_rxq() no longer has the surprise side effect of updating ring->rx_buf_len based on the VSI field. Update the virtchnl ice_vc_cfg_qs_msg() function to set the ring values directly, and drop references to the removed VSI fields. This now makes the VF logic clear, as the ring fields are obviously per-queue. This reduces the required cognitive load when reasoning about this logic.
Note that removing the VSI fields does leave a 4 byte gap, but the ice_vsi structure has many gaps, and its layout is not as critical in the hot path. The structure may benefit from a more thorough repacking, but no attempt was made in this change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-08	ice: consistently use q_idx in ice_vc_cfg_qs_msg()	Jacob Keller
	The ice_vc_cfg_qs_msg() function is used to configure VF queues in response to a VIRTCHNL_OP_CONFIG_VSI_QUEUES command. The virtchnl command contains an array of queue pair data for configuring Tx and Rx queues. This data includes a queue ID. When configuring the queues, the driver generally uses this queue ID to determine which Tx and Rx ring to program. However, a handful of places use the index into the queue pair data from the VF. While most VF implementations appear to send this data in order, it is not mandated by the virtchnl and it is not verified that the queue pair data comes in order. Fix the driver to consistently use the q_idx field instead of the 'i' iterator value when accessing the rings. For the Rx case, introduce a local ring variable to keep lines short. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
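The difference between indexing by the queue ID carried in the message and indexing by the loop iterator can be shown with a small sketch. The structures and `sketch_cfg_queues()` below are hypothetical and much simpler than the virtchnl ones; only the indexing choice is the point.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_QUEUES 4

/* Hypothetical message entry and ring state. */
struct sketch_qpair {
	uint16_t queue_id;
	uint16_t ring_len;
};

struct sketch_ring {
	uint16_t count;
};

/* Program rings from a message whose entries may arrive in any order.
 * Returns 0 on success, -1 if a queue ID is out of range.
 */
int sketch_cfg_queues(struct sketch_ring *rings,
		      const struct sketch_qpair *qp, size_t num)
{
	for (size_t i = 0; i < num; i++) {
		uint16_t q_idx = qp[i].queue_id;

		if (q_idx >= SKETCH_MAX_QUEUES)
			return -1;

		/* Index by q_idx, not by the loop iterator i. */
		rings[q_idx].count = qp[i].ring_len;
	}
	return 0;
}
```

With entries sent as queue 2 followed by queue 0, indexing by `i` would have programmed rings 0 and 1 instead.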
2024-10-08	ice: add E830 HW VF mailbox message limit support	Paul Greenwalt
	E830 adds hardware support to prevent the VF from overflowing the PF mailbox with VIRTCHNL messages. E830 will use the hardware feature (ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf(). To prevent a VF from overflowing the PF, the PF sets the number of messages per VF that can be in the PF's mailbox queue (ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF, the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG register. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
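The accounting described above (a per-VF count of messages in the PF mailbox, bounded by a watermark and decremented as the PF processes each message) can be modelled in software as below. This is only a sketch: the names are hypothetical, the real limit is enforced by hardware, and the commit message does not say what happens to a message over the limit, so the model simply refuses it.

```c
#include <stdbool.h>

/* Hypothetical per-VF mailbox accounting. */
struct sketch_vf_mbx {
	unsigned int pending;	/* messages from this VF awaiting the PF */
	unsigned int watermark;	/* maximum allowed pending messages */
};

/* A VF message is accepted only while the VF is below its watermark. */
bool sketch_mbx_post(struct sketch_vf_mbx *mbx)
{
	if (mbx->pending >= mbx->watermark)
		return false;
	mbx->pending++;
	return true;
}

/* Called after the PF has processed one message from this VF. */
void sketch_mbx_processed(struct sketch_vf_mbx *mbx)
{
	if (mbx->pending)
		mbx->pending--;
}
```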
2024-10-08	ice: Implement ethtool reset support	Wojciech Drewek
	Enable ethtool reset support. Ethtool reset flags are mapped to the E810 reset type: PF reset: $ ethtool --reset <ethX> irq dma filter offload CORE reset: $ ethtool --reset <ethX> irq-shared dma-shared filter-shared \ offload-shared ram-shared GLOBAL reset: $ ethtool --reset <ethX> irq-shared dma-shared filter-shared \ offload-shared mac-shared phy-shared ram-shared Calling the same set of flags as in PF reset case on port representor triggers VF reset. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-08	ice: Fix increasing MSI-X on VF	Marcin Szycik
	Increasing MSI-X value on a VF leads to invalid memory operations. This is caused by not reallocating some arrays. Reproducer: modprobe ice echo 0 > /sys/bus/pci/devices/$PF_PCI/sriov_drivers_autoprobe echo 1 > /sys/bus/pci/devices/$PF_PCI/sriov_numvfs echo 17 > /sys/bus/pci/devices/$VF0_PCI/sriov_vf_msix_count Default MSI-X is 16, so 17 and above triggers this issue. KASAN reports: BUG: KASAN: slab-out-of-bounds in ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice] Read of size 8 at addr ffff8888b937d180 by task bash/28433 (...) Call Trace: (...) ? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice] kasan_report+0xed/0x120 ? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice] ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice] ice_vsi_cfg_def+0x3360/0x4770 [ice] ? mutex_unlock+0x83/0xd0 ? __pfx_ice_vsi_cfg_def+0x10/0x10 [ice] ? __pfx_ice_remove_vsi_lkup_fltr+0x10/0x10 [ice] ice_vsi_cfg+0x7f/0x3b0 [ice] ice_vf_reconfig_vsi+0x114/0x210 [ice] ice_sriov_set_msix_vec_count+0x3d0/0x960 [ice] sriov_vf_msix_count_store+0x21c/0x300 (...) Allocated by task 28201: (...) ice_vsi_cfg_def+0x1c8e/0x4770 [ice] ice_vsi_cfg+0x7f/0x3b0 [ice] ice_vsi_setup+0x179/0xa30 [ice] ice_sriov_configure+0xcaa/0x1520 [ice] sriov_numvfs_store+0x212/0x390 (...) To fix it, use ice_vsi_rebuild() instead of ice_vf_reconfig_vsi(). This causes the required arrays to be reallocated taking the new queue count into account (ice_vsi_realloc_stat_arrays()). Set req_txq and req_rxq before ice_vsi_rebuild(), so that realloc uses the newly set queue count. Additionally, ice_vsi_rebuild() does not remove VSI filters (ice_fltr_remove_all()), so ice_vf_init_host_cfg() is no longer necessary. 
Reported-by: Jacob Keller <jacob.e.keller@intel.com> Fixes: 2a2cb4c6c181 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-08	ice: Flush FDB entries before reset	Wojciech Drewek
	Triggering the reset while in switchdev mode causes errors[1]. Rules are already removed by this time because switch content is flushed in case of the reset. This means that rules were deleted from HW but SW still thinks they exist, so when we get a SWITCHDEV_FDB_DEL_TO_DEVICE notification we try to delete a nonexistent rule. We can avoid these errors by clearing the rules early in the reset flow before they are removed from HW. Switchdev API will get notified that the rule was removed so we won't get a SWITCHDEV_FDB_DEL_TO_DEVICE notification. Remove unnecessary ice_clear_sw_switch_recipes. [1] ice 0000:01:00.0: Failed to delete FDB forward rule, err: -2 ice 0000:01:00.0: Failed to delete FDB guard rule, err: -2 Fixes: 7c945a1a8e5f ("ice: Switchdev FDB events support") Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-08	ice: Fix netif_is_ice() in Safe Mode	Marcin Szycik
	netif_is_ice() works by checking the pointer to netdev ops. However, it only checks for the default ice_netdev_ops, not ice_netdev_safe_mode_ops, so in Safe Mode it always returns false, which is unintuitive. While it doesn't look like netif_is_ice() is currently being called anywhere in Safe Mode, this could change and potentially lead to unexpected behaviour. Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
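A minimal sketch of the ownership check described above, extended to recognise both ops tables. The types and names are hypothetical userspace stand-ins, not the driver's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device_ops and struct net_device. */
struct sketch_netdev_ops {
	int placeholder;
};

struct sketch_netdev {
	const struct sketch_netdev_ops *ops;
};

static const struct sketch_netdev_ops sketch_default_ops = { 0 };
static const struct sketch_netdev_ops sketch_safe_mode_ops = { 0 };

/* A device is "ours" if it uses either of the driver's ops tables. */
bool sketch_netif_is_ours(const struct sketch_netdev *dev)
{
	return dev && (dev->ops == &sketch_default_ops ||
		       dev->ops == &sketch_safe_mode_ops);
}
```

Comparing against only the default table is what makes such a check return false for a device running in Safe Mode.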
2024-10-08	ice: Fix entering Safe Mode	Marcin Szycik
	If DDP package is missing or corrupted, the driver should enter Safe Mode. Instead, an error is returned and probe fails. To fix this, don't exit init if ice_init_ddp_config() returns an error. Repro: * Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg) * Load ice Fixes: cc5776fe1832 ("ice: Enable switching default Tx scheduler topology") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-08	wifi: remove iw_public_data from struct net_device	Johannes Berg
	Given the previous patches, we no longer need the struct iw_public_data etc., it's only used by the old Intel drivers (and ps3_gelic creates it but then doesn't use it). Remove all of that, including the pointer in struct net_device. Link: https://patch.msgid.link/20241007213525.8b2d52b60531.I6a27aaf30bded9a0977f07f47fba2bd31a3b3330@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08	idpf: Don't hard code napi_struct size	Joe Damato
	The sizeof(struct napi_struct) can change. Don't hardcode the size to 400 bytes and instead use "sizeof(struct napi_struct)". Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20241004105407.73585-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08	net: fec: make PPS channel configurable	Francesco Dolcini
	Depending on the SoC into which the FEC is integrated, the PPS channel might be routed to different timer instances. Make this configurable from the devicetree. When the related DT property is not present, fall back to the previous default and use channel 0. Reviewed-by: Frank Li <Frank.Li@nxp.com> Tested-by: Rafael Beims <rafael.beims@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: fec: refactor PPS channel configuration	Francesco Dolcini
	Preparation patch to allow for PPS channel configuration, no functional change intended. Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: redefine internal ports and PGID's as offsets	Daniel Machon
	Internal ports and PGID's are both defined relative to the number of front ports on Sparx5. This will not work on lan969x. Instead make them offsets to the number of front ports and add two helpers to retrieve them. Use the helpers throughout. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
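The idea of storing internal ports as offsets and resolving them against the per-chip front port count can be sketched as follows. The enum values, struct and helper name are hypothetical; only the "front ports + offset" arithmetic comes from the description above.

```c
/* Internal ports expressed as offsets past the last front port. */
enum sketch_internal_port {
	SKETCH_PORT_CPU_0,
	SKETCH_PORT_CPU_1,
	SKETCH_PORT_VD0,
};

struct sketch_chip {
	unsigned int n_ports;	/* number of front ports on this chip */
};

/* Resolve an internal port offset to an absolute port number. */
unsigned int sketch_get_internal_port(const struct sketch_chip *chip,
				      enum sketch_internal_port port)
{
	return chip->n_ports + port;
}
```

Because the absolute number is computed at run time, the same code serves chips with different front port counts.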
2024-10-08	net: sparx5: add is_sparx5 macro and use it throughout	Daniel Machon
	We don't want to ops out each time a function needs to do some platform specifics. In particular we have a few places, where it would be convenient to just branch out on the platform type. Add the function is_sparx5() and, initially, use it for: - register writes that should only be done on Sparx5 (QSYS_CAL_CTRL, CLKGEN_LCPLL1_CORE_CLK). - function calls that should only be done on Sparx5 (ethtool_op_get_ts_info()) - register writes that are chip-exclusive (MASK_CFG1/2, PGID_CFG1/2, these are replicated for n_ports >32 on Sparx5). The is_sparx5() function simply checks the target chip type, to determine if this is a Sparx5 SKU or not. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: ops out function for DSM calendar calculation	Daniel Machon
	The DSM (Disassembler) calendar grants each port access to internal busses. The configuration of the calendar is done differently on Sparx5 and lan969x. Therefore ops out the function that calculates the calendar. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: ops out PTP IRQ handler	Daniel Machon
	The PTP registers are located in two different register targets on Sparx5 and lan969x. We can't handle this with the register macros, so ops out the handler. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: ops out function for setting the port mux	Daniel Machon
	Port muxing is configured based on the supported port modes. As these modes can differ on Sparx5 and lan969x we ops out the port muxing function. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: ops out functions for getting certain array values	Daniel Machon
	Add getters for getting values in arrays: sdlb_groups and sparx5_hsch_max_group_rate and ops out the getters, as these arrays will differ on lan969x. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: ops out chip port to device index/bit functions	Daniel Machon
	The chip port device index and mode bit can be obtained using the port number. However the mapping of port number to chip device index and mode bit differs on Sparx5 and lan969x. Therefore ops out the function. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: add ops to match data	Daniel Machon
	Add new struct sparx5_ops, containing functions that need to be different as the implementation differs on Sparx5 and lan969x. Initially we add functions for checking the port type (2g5, 5g, 10g or 25g) based on the port number. Update the code to use the ops instead of the platform specific functions. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
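"Opsing out" a function means calling it through a per-platform table of function pointers reached from the match data. A minimal sketch, assuming hypothetical names and invented port ranges (none of this is the driver's actual layout):

```c
#include <stdbool.h>

/* Per-platform operations, selected once through the match data. */
struct sketch_ops {
	bool (*is_port_10g)(unsigned int portno);
};

struct sketch_match_data {
	const struct sketch_ops *ops;
};

/* Hypothetical port layouts; the ranges are invented for the example. */
static bool chip_a_is_port_10g(unsigned int portno)
{
	return portno >= 12 && portno <= 15;
}

static bool chip_b_is_port_10g(unsigned int portno)
{
	return portno < 4;
}

static const struct sketch_ops chip_a_ops = { .is_port_10g = chip_a_is_port_10g };
static const struct sketch_ops chip_b_ops = { .is_port_10g = chip_b_is_port_10g };

const struct sketch_match_data sketch_chip_a = { .ops = &chip_a_ops };
const struct sketch_match_data sketch_chip_b = { .ops = &chip_b_ops };

/* Common code calls through the ops instead of a platform-specific function. */
bool sketch_port_is_10g(const struct sketch_match_data *md, unsigned int portno)
{
	return md->ops->is_port_10g(portno);
}
```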
2024-10-08	net: sparx5: use SPX5_CONST for constants which do not have a symbol	Daniel Machon
	Now that we have identified all the chip constants, update the use of them where a symbol is not defined for the constant. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: use SPX5_CONST for constants which already have a symbol	Daniel Machon
	Now that we have identified all the chip constants, update the use of them where a symbol is already defined for the constant. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: add constants to match data	Daniel Machon
	Add new struct sparx5_consts, containing all the chip constants that are known to be different for Sparx5 and lan969x. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: add *sparx5 argument to a few functions	Daniel Machon
	The *sparx5 context pointer is required in functions that need to access platform constants (which will be added in a subsequent patch). Prepare for this by updating the prototype and use of such functions. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: modify SPX5_PORTS_ALL macro	Daniel Machon
	In preparation for lan969x, we need to define the SPX5_PORTS_ALL macro as 70 (65 front ports + 5 internal ports). This is required as the SPX5_PORT_CPU will be redefined as an offset to the number of front ports, in a subsequent patch. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: sparx5: add indirection layer to register macros	Daniel Machon
	The register macros are used to read and write to the switch registers. The registers are largely the same on Sparx5 and lan969x, however in some cases they differ. The differences can be one or more of the following: target size, register address, register count, group address, group count, group size, field position, field size. In order to handle these differences, we introduce a new indirection layer, that defines and maps them to corresponding values, based on the platform. As the register macro arguments can now be non-constants, we also add non-constant variants of FIELD_GET and FIELD_PREP. Since the indirection layer contributes to longer macros, we have changed the formatting of them slightly, to adhere to a 80 character limit, and added a comment if a macro is platform-specific. With these additions, we can reuse all the existing macros for lan969x. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
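FIELD_GET and FIELD_PREP require a compile-time constant mask, which is why non-constant variants are needed once field positions depend on the platform. The sketch below shows one way such variants could work, deriving the shift from the mask at run time. The `sketch_*` names are hypothetical and this is not the code the patch adds.

```c
#include <stdint.h>

/* Shift of the lowest set bit in mask; returns 0 for an empty mask. */
static inline unsigned int sketch_mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	if (!mask)
		return 0;
	while (!(mask & 1u)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Place val into the field described by a run-time mask. */
static inline uint32_t sketch_field_prep(uint32_t mask, uint32_t val)
{
	return (val << sketch_mask_shift(mask)) & mask;
}

/* Extract the field described by a run-time mask from reg. */
static inline uint32_t sketch_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> sketch_mask_shift(mask);
}
```

The cost is a small run-time computation per access in exchange for being able to look the mask up per platform.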
2024-10-08	net: sparx5: add support for private match data	Daniel Machon
	In preparation for lan969x, add support for private match data. This will be needed for abstracting away differences between the Sparx5 and lan969x platforms. We initially add values for: iomap, iomap size and ioranges. Update the use of these throughout. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: ethernet: ti: am65-cpsw: avoid devm_alloc_etherdev, fix module removal	Nicolas Pitre
	Usage of devm_alloc_etherdev_mqs() conflicts with am65_cpsw_nuss_cleanup_ndev() as the same struct net_device instances get unregistered twice. Switch to alloc_etherdev_mqs() and make sure am65_cpsw_nuss_cleanup_ndev() unregisters and frees those net_device instances properly. With this, it is finally possible to rmmod the driver without oopsing the kernel. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Reviewed-by: Roger Quadros <roger@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: ethernet: ti: am65-cpsw: prevent WARN_ON upon module removal	Nicolas Pitre
	In am65_cpsw_nuss_remove(), move the call to am65_cpsw_unregister_devlink() after am65_cpsw_nuss_cleanup_ndev() to avoid triggering the WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET) in devl_port_unregister(). Makes it coherent with usage in am65_cpsw_nuss_register_ndevs()'s cleanup path. Fixes: 58356eb31d60 ("net: ti: am65-cpsw-nuss: Add devlink support") Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08	net: qcom/emac: Find sgmii_ops by device_for_each_child()	Zijun Hu
	To prepare for constifying the following old driver core API: struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); to new: struct device *device_find_child(struct device *dev, const void *data, int (*match)(struct device *dev, const void *data)); The new API does not allow its match function (*match)() to modify the caller's match data @data, but emac_sgmii_acpi_match(), as the old API's match function, does modify the relevant match data, so it is no longer suitable for the new API. Solve this by implementing the same sgmii_ops lookup with a corrected function that is passed to device_for_each_child() instead of device_find_child(). This commit does not change any existing logic. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://patch.msgid.link/20241003-qcom_emac_fix-v6-1-0658e3792ca4@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-07	net: airoha: Update tx cpu dma ring idx at the end of xmit loop	Lorenzo Bianconi
	Move the tx cpu dma ring index update out of transmit loop of airoha_dev_xmit routine in order to not start transmitting the packet before it is fully DMA mapped (e.g. fragmented skbs). Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC") Reported-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241004-airoha-eth-7581-mapping-fix-v1-1-8e4279ab1812@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
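The ordering this fix enforces (fill descriptors for every fragment first, then publish the ring index once) can be sketched with a small software model. The structure and names are hypothetical, and a plain field stands in for the index register that the hardware reads.

```c
#include <stdint.h>

#define SKETCH_RING_SIZE 8

struct sketch_txq {
	uint32_t desc[SKETCH_RING_SIZE];
	unsigned int head;		/* next free descriptor */
	unsigned int hw_cpu_idx;	/* models the index the hardware reads */
	unsigned int hw_idx_writes;	/* how often that index was written */
};

/* Queue one packet made of nr_frags fragments (nr_frags <= SKETCH_RING_SIZE). */
void sketch_xmit(struct sketch_txq *q, const uint32_t *frags,
		 unsigned int nr_frags)
{
	unsigned int idx = q->head;
	unsigned int i;

	for (i = 0; i < nr_frags; i++) {
		q->desc[idx] = frags[i];	/* stands in for DMA-mapping a fragment */
		idx = (idx + 1) % SKETCH_RING_SIZE;
	}

	/* Publish the new index once, after every fragment is in place. */
	q->head = idx;
	q->hw_cpu_idx = idx;
	q->hw_idx_writes++;
}
```

Updating the index inside the loop would let the hardware start on a packet whose later fragments are not yet mapped.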
2024-10-07	net: ethernet: adi: adin1110: Fix some error handling path in adin1110_read_fifo()	Christophe JAILLET
	If 'frame_size' is too small or if 'round_len' is an error code, it is likely that an error code should be returned to the caller. Actually, 'ret' is likely to be 0, so if one of these sanity checks fails, 'success' is returned. Return -EINVAL instead. Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/8ff73b40f50d8fa994a454911b66adebce8da266.1727981562.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
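The bug class described here, returning a stale `ret` of 0 from a failed sanity check, is easy to show in isolation. The function and the minimum size below are hypothetical and not taken from the driver.

```c
#include <errno.h>

#define SKETCH_MIN_FRAME 8	/* hypothetical minimum; not the driver's value */

int sketch_read_fifo(unsigned int frame_size, int round_len)
{
	int ret = 0;	/* an earlier step succeeded, so ret is still 0 here */

	/* Returning ret on these paths would report success by accident. */
	if (frame_size < SKETCH_MIN_FRAME)
		return -EINVAL;
	if (round_len < 0)
		return -EINVAL;

	return ret;
}
```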
2024-10-07	Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled"	Jakub Kicinski
	This reverts commit b514c47ebf41a6536551ed28a05758036e6eca7c. The commit describes that we don't have to sync the page when recycling, and it tries to optimize that case. But we do need to sync after allocation. Recycling side should be changed to pass the right sync size instead. Fixes: b514c47ebf41 ("net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled") Reported-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/20241004070846.2502e9ea@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20241004142115.910876-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07	mlxsw: spectrum_acl_flex_keys: Constify struct mlxsw_afk_element_inst	Christophe JAILLET
	'struct mlxsw_afk_element_inst' instances are not modified in these drivers. Constifying these structures moves some data to a read-only section, which increases overall security. Update a few functions and struct mlxsw_afk_block accordingly. On x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 4278 4032 0 8310 2076 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.o After: ===== text data bss dec hex filename 7934 352 0 8286 205e drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/8ccfc7bfb2365dcee5b03c81ebe061a927d6da2e.1727541677.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-06	sfc: add per-queue RX bytes stats	Edward Cree
	While this does add overhead to the fast path, it should be minimal as the cacheline should already be held for write from updating the queue's rx_packets stat. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-06	sfc: implement per-queue TSO (hw_gso) stats	Edward Cree
	Use our existing TSO stats, which count enqueued TSO TXes. Users may expect them to count completions, as tx-packets and tx-bytes do; however, these are the counters we have, and the qstats documentation doesn't actually specify. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-06	sfc: implement per-queue rx drop and overrun stats	Edward Cree
	Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-06	sfc: account XDP TXes in netdev base stats	Edward Cree
	When we handle a TX completion for an XDP packet, it is not counted in the per-TXQ netdev stats. Record it in new internal counters, and include those in the device-wide total in efx_get_base_stats(). Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-06	sfc: add n_rx_overlength to ethtool stats	Edward Cree
	The previous patch changed when we increment the RX queue's rx_packets counter, to match the semantics of netdev per-queue stats. The differences between the old and new counts are scatter errors (which produce a WARN_ON) and this counter, which is incremented by efx_rx_packet__check_len() when an RX packet (which was placed in a single buffer by SG, i.e. n_frags == 1) has a length (from the RX event) which is too long to fit in the RX buffer. If this occurs, we drop the packet and fire a ratelimited netif_err(). The counter previously was not reported anywhere; add it to ethtool -S output to ensure users still have this information. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-06	sfc: implement basic per-queue stats	Edward Cree
	Just RX and TX packet counts and TX bytes for now. We do not have per-queue RX byte counts, which causes us to fail stats.pkt_byte_sum selftest with "Drivers should always report basic keys" error. Per-queue counts are since the last time the queue was inited (typically by efx_start_datapath(), on ifup or reconfiguration); device-wide total (efx_get_base_stats()) is since driver probe. This is not the same lifetime as rtnl_link_stats64, which uses firmware stats which count since FW (re)booted; this can cause a "Qstats are lower" or "RTNL stats are lower" failure in stats.pkt_byte_sum selftest. Move the increment of rx_queue->rx_packets to match the semantics specified for netdev per-queue stats, i.e. just before handing the packet to XDP (if present) or the netstack (through GRO). This will affect the existing ethtool -S output which also reports these counters. XDP TX packets are not yet counted into base_stats. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-06	sfc: remove obsolete counters from struct efx_channel	Edward Cree
	The n_rx_tobe_disc and n_rx_mcast_mismatch counters are a legacy from farch, and are never written in EF10 or EF100 code. Remove them from the struct and from ethtool -S output, saving a bit of memory and avoiding user confusion. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-04	net: ethernet: Switch back to struct platform_driver::remove()	Uwe Kleine-König
	After commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/net/ethernet to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/18f7c585a1a8a8ac8b03a2fca7de19bd5c52ac2b.1727949050.git.u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue	Jakub Kicinski
	Tony Nguyen says:
	====================
	Intel Wired LAN Driver Updates 2024-10-01 (ice)
	This series contains updates to ice driver only.
	Karol cleans up current PTP GPIO pin handling, fixes minor bugs, refactors implementation for all products, introduces SDP (Software Definable Pins) for E825C and implements reading SDP section from NVM for E810 products.
	Sergey replaces multiple aux buses and devices used in the PTP support code with struct ice_adapter holding the necessary shared data.
	* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
	  ice: Drop auxbus use for PTP to finalize ice_adapter move
	  ice: Use ice_adapter for PTP shared data instead of auxdev
	  ice: Initial support for E825C hardware in ice_adapter
	  ice: Add ice_get_ctrl_ptp() wrapper to simplify the code
	  ice: Introduce ice_get_phy_model() wrapper
	  ice: Enable 1PPS out from CGU for E825C products
	  ice: Read SDP section from NVM for pin definitions
	  ice: Disable shared pin on E810 on setfunc
	  ice: Cache perout/extts requests and check flags
	  ice: Align E810T GPIO to other products
	  ice: Add SDPs support for E825C
	  ice: Implement ice_ptp_pin_desc
	====================
	Link: https://patch.msgid.link/20241001201702.3252954-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04	ibmvnic: Inspect header requirements before using scrq direct	Nick Child
	Previously, the TX header requirement for standard frames was ignored. This requirement is a bitstring sent from the VIOS which maps to the type of header information needed during TX. If no header information is needed, then send subcrq direct can be used (which can be more performant). This bitstring was previously ignored for standard packets (AKA non LSO, non CSO) due to the belief that the bitstring was over-cautionary. It turns out that there are some configurations where the backing device does need header information for transmission of standard packets. If the information is not supplied, this causes continuous "Adapter error" transport events. Therefore, this bitstring should be respected and observed before considering the use of send subcrq direct. Fixes: 74839f7a8268 ("ibmvnic: Introduce send sub-crq direct") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001163200.1802522-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
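The rule described in this message can be sketched as a small predicate: send subcrq direct is only eligible when the header requirement bitstring asks for nothing. This is a minimal sketch, not the driver's code. The bit names `HDR_REQ_L2`, `HDR_REQ_L3`, `HDR_REQ_L4` and the function `can_use_send_direct()` are hypothetical; the real bit layout is defined by the VIOS interface and is not given in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define HDR_REQ_L2   (1u << 0)
#define HDR_REQ_L3   (1u << 1)
#define HDR_REQ_L4   (1u << 2)
#define HDR_REQ_MASK (HDR_REQ_L2 | HDR_REQ_L3 | HDR_REQ_L4)

/*
 * Direct send is only eligible when the backing device requests no
 * header information at all; any set bit forces the slower path that
 * supplies the headers.
 */
static inline bool can_use_send_direct(uint8_t hdr_req_bits)
{
	return (hdr_req_bits & HDR_REQ_MASK) == 0;
}

/* Small self-check illustrating the intended behavior. */
static inline void send_direct_demo(void)
{
	assert(can_use_send_direct(0));
	assert(!can_use_send_direct(HDR_REQ_L3));
}
```

The point of the sketch is only the ordering: the bitstring is consulted first, and the direct path is considered afterwards.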
2024-10-04	net/mlx5: hw counters: Remove mlx5_fc_create_ex	Cosmin Ratiu
	It no longer serves any purpose and is identical to mlx5_fc_create, on which it was originally based. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04	net/mlx5: hw counters: Don't maintain a counter count	Cosmin Ratiu
	num_counters is only used for deciding whether to grow the bulk query buffer, which is done once more counters than a small initial threshold are present. After that, maintaining num_counters serves no purpose. This commit replaces that with an actual xarray traversal to count the counters. This appears expensive at first sight, but is only done when the number of counters is less than the initial threshold (8) and only once every sampling interval. Once the number of counters goes above the threshold, the bulk query buffer is grown to max size and the xarray traversal is never done again. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
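The approach described above, counting by traversal only until the buffer has been grown, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the mlx5 code: a plain array of flags stands in for the xarray, and `struct counter_set`, `struct fc_stats`, `count_counters()` and `sample_tick()` are hypothetical names. Only the threshold value 8 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INIT_THRESHOLD 8 /* initial threshold named in the commit message */

/* Hypothetical stand-in for the container of counters (an xarray in the driver). */
struct counter_set {
	const bool *present; /* present[i] is true if slot i holds a counter */
	size_t nslots;
};

struct fc_stats {
	bool bulk_buf_grown;      /* set once the query buffer was grown to max size */
	unsigned long traversals; /* number of full counting passes performed */
};

/* Walk the whole container and count the live entries. */
static inline size_t count_counters(const struct counter_set *set)
{
	size_t n = 0;

	assert(set != NULL);
	for (size_t i = 0; i < set->nslots; i++)
		if (set->present[i])
			n++;
	return n;
}

/* Called once per sampling interval. */
static inline void sample_tick(struct fc_stats *st, const struct counter_set *set)
{
	if (st->bulk_buf_grown)
		return; /* buffer already at max size: no traversal needed */

	st->traversals++;
	if (count_counters(set) > INIT_THRESHOLD)
		st->bulk_buf_grown = true; /* grow once, never count again */
}
```

Because the traversal stops for good once the threshold is crossed, its cost is bounded by small containers, which is the argument the message makes for dropping the maintained count.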
2024-10-04	net/mlx5: hw counters: Drop unneeded cacheline alignment	Cosmin Ratiu
	The mlx5_fc struct has a cache for values queried from hw, which is cacheline aligned. On x86_64, this results in:
	struct mlx5_fc {
		u32 id; /* 0 4 */
		bool aging; /* 4 1 */
		/* XXX 3 bytes hole, try to pack */
		struct mlx5_fc_bulk *bulk; /* 8 8 */
		/* XXX 48 bytes hole, try to pack */
		/* --- cacheline 1 boundary (64 bytes) --- */
		struct mlx5_fc_cache cache __attribute__((__aligned__(64))); /* 64 24 */
		u64 lastpackets; /* 88 8 */
		u64 lastbytes; /* 96 8 */
		/* size: 128, cachelines: 2, members: 6 */
		/* sum members: 53, holes: 2, sum holes: 51 */
		/* padding: 24 */
		/* forced aligns: 1, forced holes: 1, sum forced holes: 48 */
	} __attribute__((__aligned__(64)));
	(output from pahole). ...So a 48+24=72 byte waste. As far as I can determine, this serves no purpose other than maybe making sure that the values in the cache do not span two cachelines in the worst case scenario, but that's not a valid enough reason to waste 72 bytes per counter, especially since this code is not performance-critical. There could potentially be hundreds of thousands of counters (e.g. for connection-tracking), so this quickly adds up to multiple MB wasted. This commit removes the alignment, resulting in:
	struct mlx5_fc {
		[...]
		/* size: 56, cachelines: 1, members: 6 */
		/* sum members: 53, holes: 1, sum holes: 3 */
		/* last cacheline: 56 bytes */
	};
	Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
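The size difference reported by pahole can be reproduced in userspace with stand-in types. The sketch below assumes a typical 64-bit (LP64) target and a GCC- or Clang-compatible compiler. `struct fc_cache`, `struct fc_aligned`, `struct fc_natural` and `wasted_bytes_per_counter()` are hypothetical names, and the three fields of `struct fc_cache` are made up to fill the 24 bytes shown in the pahole output; only the member order and sizes are taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 24-byte stand-in for struct mlx5_fc_cache. */
struct fc_cache {
	uint64_t packets;
	uint64_t bytes;
	uint64_t lastuse;
};

/* Layout before the change: the cache member is forced onto a 64-byte boundary. */
struct fc_aligned {
	uint32_t id;
	bool aging;
	void *bulk; /* stand-in for struct mlx5_fc_bulk * */
	struct fc_cache cache __attribute__((__aligned__(64)));
	uint64_t lastpackets;
	uint64_t lastbytes;
};

/* Layout after the change: same members, natural alignment only. */
struct fc_natural {
	uint32_t id;
	bool aging;
	void *bulk;
	struct fc_cache cache;
	uint64_t lastpackets;
	uint64_t lastbytes;
};

/* Bytes lost per counter to the forced hole plus the tail padding. */
static inline size_t wasted_bytes_per_counter(void)
{
	assert(sizeof(struct fc_aligned) >= sizeof(struct fc_natural));
	return sizeof(struct fc_aligned) - sizeof(struct fc_natural);
}
```

On such a target the aligned variant occupies two cachelines and the natural one fits in a single cacheline, so the saving scales linearly with the number of counters allocated.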