summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2025-01-10net: mana: Cleanup "mana" debugfs dir after cleanup of all childrenShradha Gupta
In mana_driver_exit(), mana_debugfs_root gets cleanup before any of it's children (which happens later in the pci_unregister_driver()). Due to this, when mana driver is configured as a module and rmmod is invoked, following stack gets printed along with failure in rmmod command. [ 2399.317651] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ 2399.318657] #PF: supervisor write access in kernel mode [ 2399.319057] #PF: error_code(0x0002) - not-present page [ 2399.319528] PGD 10eb68067 P4D 0 [ 2399.319914] Oops: Oops: 0002 [#1] SMP NOPTI [ 2399.320308] CPU: 72 UID: 0 PID: 5815 Comm: rmmod Not tainted 6.13.0-rc5+ #89 [ 2399.320986] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024 [ 2399.321892] RIP: 0010:down_write+0x1a/0x50 [ 2399.322303] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 9d cd ff ff 31 c0 ba 01 00 00 00 <f0> 49 0f b1 14 24 75 17 65 48 8b 05 f6 84 dd 5f 49 89 44 24 08 4c [ 2399.323669] RSP: 0018:ff53859d6c663a70 EFLAGS: 00010246 [ 2399.324061] RAX: 0000000000000000 RBX: ff1d4eb505060180 RCX: ffffff8100000000 [ 2399.324620] RDX: 0000000000000001 RSI: 0000000000000064 RDI: 0000000000000098 [ 2399.325167] RBP: ff53859d6c663a78 R08: 00000000000009c4 R09: ff1d4eb4fac90000 [ 2399.325681] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000098 [ 2399.326185] R13: ff1d4e42e1a4a0c8 R14: ff1d4eb538ce0000 R15: 0000000000000098 [ 2399.326755] FS: 00007fe729570000(0000) GS:ff1d4eb2b7200000(0000) knlGS:0000000000000000 [ 2399.327269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2399.327690] CR2: 0000000000000098 CR3: 00000001c0584005 CR4: 0000000000373ef0 [ 2399.328166] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2399.328623] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 2399.329055] Call Trace: [ 2399.329243] <TASK> [ 2399.329379] ? show_regs+0x69/0x80 [ 2399.329602] ? __die+0x25/0x70 [ 2399.329856] ? page_fault_oops+0x271/0x550 [ 2399.330088] ? psi_group_change+0x217/0x470 [ 2399.330341] ? do_user_addr_fault+0x455/0x7b0 [ 2399.330667] ? finish_task_switch.isra.0+0x91/0x2f0 [ 2399.331004] ? exc_page_fault+0x73/0x160 [ 2399.331275] ? asm_exc_page_fault+0x27/0x30 [ 2399.343324] ? down_write+0x1a/0x50 [ 2399.343631] simple_recursive_removal+0x4d/0x2c0 [ 2399.343977] ? __pfx_remove_one+0x10/0x10 [ 2399.344251] debugfs_remove+0x45/0x70 [ 2399.344511] mana_destroy_rxq+0x44/0x400 [mana] [ 2399.344845] mana_destroy_vport+0x54/0x1c0 [mana] [ 2399.345229] mana_detach+0x2f1/0x4e0 [mana] [ 2399.345466] ? ida_free+0x150/0x160 [ 2399.345718] ? __cond_resched+0x1a/0x50 [ 2399.345987] mana_remove+0xf4/0x1a0 [mana] [ 2399.346243] mana_gd_remove+0x25/0x80 [mana] [ 2399.346605] pci_device_remove+0x41/0xb0 [ 2399.346878] device_remove+0x46/0x70 [ 2399.347150] device_release_driver_internal+0x1e3/0x250 [ 2399.347831] ? klist_remove+0x81/0xe0 [ 2399.348377] driver_detach+0x4b/0xa0 [ 2399.348906] bus_remove_driver+0x83/0x100 [ 2399.349435] driver_unregister+0x31/0x60 [ 2399.349919] pci_unregister_driver+0x40/0x90 [ 2399.350492] mana_driver_exit+0x1c/0xb50 [mana] [ 2399.351102] __do_sys_delete_module.constprop.0+0x184/0x320 [ 2399.351664] ? __fput+0x1a9/0x2d0 [ 2399.352200] __x64_sys_delete_module+0x12/0x20 [ 2399.352760] x64_sys_call+0x1e66/0x2140 [ 2399.353316] do_syscall_64+0x79/0x150 [ 2399.353813] ? syscall_exit_to_user_mode+0x49/0x230 [ 2399.354346] ? do_syscall_64+0x85/0x150 [ 2399.354816] ? irqentry_exit+0x1d/0x30 [ 2399.355287] ? exc_page_fault+0x7f/0x160 [ 2399.355756] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 2399.356302] RIP: 0033:0x7fe728d26aeb [ 2399.356776] Code: 73 01 c3 48 8b 0d 45 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 33 0f 00 f7 d8 64 89 01 48 [ 2399.358372] RSP: 002b:00007ffff954d6f8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 2399.359066] RAX: ffffffffffffffda RBX: 00005609156cc760 RCX: 00007fe728d26aeb [ 2399.359779] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005609156cc7c8 [ 2399.360535] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 2399.361261] R10: 00007fe728dbeac0 R11: 0000000000000206 R12: 00007ffff954d950 [ 2399.361952] R13: 00005609156cc2a0 R14: 00007ffff954ee5f R15: 00005609156cc760 [ 2399.362688] </TASK> Fixes: 6607c17c6c5e ("net: mana: Enable debugfs files for MANA device") Cc: stable@vger.kernel.org Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1736398991-764-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10eth: bnxt: always recalculate features after XDP clearing, fix null-derefJakub Kicinski
Recalculate features when XDP is detached. Before: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: off [requested on] After: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: on The fact that HW-GRO doesn't get re-enabled automatically is just a minor annoyance. The real issue is that the features will randomly come back during another reconfiguration which just happens to invoke netdev_update_features(). The driver doesn't handle reconfiguring two things at a time very robustly. Starting with commit 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") we only reconfigure the RSS hash table if the "effective" number of Rx rings has changed. If HW-GRO is enabled "effective" number of rings is 2x what user sees. So if we are in the bad state, with HW-GRO re-enablement "pending" after XDP off, and we lower the rings by / 2 - the HW-GRO rings doing 2x and the ethtool -L doing / 2 may cancel each other out, and the: if (old_rx_rings != bp->hw_resc.resv_rx_rings && condition in __bnxt_reserve_rings() will be false. The RSS map won't get updated, and we'll crash with: BUG: kernel NULL pointer dereference, address: 0000000000000168 RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0 bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180 __bnxt_setup_vnic_p5+0x58/0x110 bnxt_init_nic+0xb72/0xf50 __bnxt_open_nic+0x40d/0xab0 bnxt_open_nic+0x2b/0x60 ethtool_set_channels+0x18c/0x1d0 As we try to access a freed ring. The issue is present since XDP support was added, really, but prior to commit 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") it wasn't causing major issues. Fixes: 1054aee82321 ("bnxt_en: Use NETIF_F_GRO_HW.") Fixes: 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: remove stmmac_lpi_entry_timer_config()Russell King (Oracle)
Remove stmmac_lpi_entry_timer_config(), setting priv->eee_sw_timer_en at the original call sites, and calling the appropriate stmmac_xxx_hw_lpi_timer() function. No functional change. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEq-0002LQ-PC@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: split hardware LPI timer controlRussell King (Oracle)
Provide stmmac_disable_hw_lpi_timer() and stmmac_enable_hw_lpi_timer() to control the hardware transmit LPI timer. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEl-0002LK-LA@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: remove unnecessary EEE handling in stmmac_release()Russell King (Oracle)
phylink_stop() will cause phylink to call the mac_link_down() operation before phylink_stop() returns. As mac_link_down() will call stmmac_eee_init(false), this will set both priv->eee_active and priv->eee_enabled to be false, deleting the eee_ctrl_timer if priv->eee_enabled was previously set. As stmmac_release() calls phylink_stop() before checking whether priv->eee_enabled is true, this is a condition that can never be satisfied, and thus the code within this if() block will never be executed. Remove it. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEg-0002LE-HH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: move setup of eee_ctrl_timer to stmmac_dvr_probe()Russell King (Oracle)
Move the initialisation of the EEE software timer to the probe function as it is unnecessary to do this each time we enable software LPI. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEb-0002L8-DJ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: use boolean for eee_enabled and eee_activeRussell King (Oracle)
priv->eee_enabled and priv->eee_active are both assigned using boolean values. Type them as bool rather than int. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEW-0002L2-9w@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: move priv->eee_active into stmmac_eee_init()Russell King (Oracle)
Since all call sites of stmmac_eee_init() assign priv->eee_active immediately before, pass this state into stmmac_eee_init() and assign priv->eee_active within this function. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZER-0002Kv-5O@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: move priv->eee_enabled into stmmac_eee_init()Russell King (Oracle)
All call sites for stmmac_eee_init() assign the return code to priv->eee_enabled. Rather than having this coded at each call site, move the assignment inside stmmac_eee_init(). Since stmmac_init_eee() takes priv->lock before checking the state of priv->eee_enabled, move the assignment within the locked region. Also, stmmac_suspend() checks the state of this member under the lock. While two concurrent calls to stmmac_init_eee() aren't possible, there is a possibility that stmmac_suspend() may run concurrently with a change of priv->eee_enabled unless we modify it under the lock. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEM-0002Kq-2Z@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: remove priv->eee_tw_timerRussell King (Oracle)
priv->eee_tw_timer is only assigned during initialisation to a constant value (STMMAC_DEFAULT_TWT_LS) and then never changed. Remove priv->eee_tw_timer, and instead use STMMAC_DEFAULT_TWT_LS for both uses in stmmac_eee_init(). Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEG-0002Kk-VH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: convert to use phy_eee_rx_clock_stop()Russell King (Oracle)
Convert stmmac to use phy_eee_rx_clock_stop() to set the PHY receive clock stop in LPI setting, rather than calling the legacy phy_init_eee() function. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZEB-0002Ke-RZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: report EEE error statistics if EEE is supportedRussell King (Oracle)
Report the number of EEE error statistics in the xstats even when EEE is not enabled in hardware, but is supported. The PHY maintains this counter even when EEE is not enabled. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZE6-0002KY-Nx@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: remove priv->tx_lpi_enabledRussell King (Oracle)
Through using phylib's EEE state, priv->tx_lpi_enabled has become a write-only variable. Remove it. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZE1-0002KS-K1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: clean up stmmac_disable_eee_mode()Russell King (Oracle)
stmmac_disable_eee_mode() is now only called from stmmac_xmit() when both priv->tx_path_in_lpi_mode and priv->eee_sw_timer_en are true. Therefore: if (!priv->eee_sw_timer_en) in stmmac_disable_eee_mode() will never be true, so this is dead code. Remove it, and rename the function to indicate that it now only deals with software based EEE mode. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDw-0002KL-Gg@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: remove redundant code from ethtool EEE opsRussell King (Oracle)
Setting edata->tx_lpi_enabled in stmmac_ethtool_op_get_eee() gets overwritten by phylib, so there's no point setting this. In stmmac_ethtool_op_set_eee(), now that stmmac is using the result of phylib's evaluation of EEE, there is no need to handle anything in the ethtool EEE ops other than calling through to the appropriate phylink function, which will pass on to phylib the users request. As stmmac_disable_eee_mode() is now no longer called from outside stmmac_main.c, make it static. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDr-0002KF-Cv@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: make EEE depend on phy->enable_tx_lpiRussell King (Oracle)
Make stmmac EEE depend on phylib's evaluation of user settings and PHY negotiation, as indicated by phy->enable_tx_lpi. This will ensure when phylib has evaluated that the user has disabled LPI, phy_init_eee() will not be called, and priv->eee_active will be false, causing LPI/EEE to be disabled. This is an interim measure - phy_init_eee() will be removed in a later patch. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDm-0002K9-9w@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: use unsigned int for eee_timerRussell King (Oracle)
Since eee_timer is used to initialise priv->tx_lpi_timer, this also should be unsigned to avoid a negative number being interpreted as a very large positive number. Note that this makes the check for negative numbers passed in as a module parameter redundant, and passing a negative number will now produce a large delay rather than the default. Since the default is used without an argument, passing a negative number would be quite obscure. However, if users do, then this will need to be revisited. Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDh-0002K3-6y@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: use correct type for tx_lpi_timerRussell King (Oracle)
The ethtool interface uses u32 for tx_lpi_timer, and so does phylib. Use u32 to store this internally within stmmac rather than "int" which could misinterpret large values. Correct "value" in dwmac4_set_eee_lpi_entry_timer() to use u32 rather than int, which is derived from tx_lpi_timer. Even though this path won't be used with values larger than STMMAC_ET_MAX, this brings consistency of type usage to the stmmac code for this variable. We leave eee_timer unchanged for now, with the assumption that values up to INT_MAX will safely fit in a u32. Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDc-0002Jx-3b@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10net: stmmac: move tx_lpi_timer tracking to phylibRussell King (Oracle)
When stmmac_ethtool_op_get_eee() is called, stmmac sets the tx_lpi_timer and tx_lpi_enabled members, and then calls into phylink and thus phylib. phylib overwrites these members. phylib will also cause a link down/link up transition when settings that impact the MAC have been changed. Convert stmmac to use the tx_lpi_timer setting in struct phy_device, updating priv->tx_lpi_timer each time when the link comes up, rather than trying to maintain this user setting itself. We initialise the phylib tx_lpi_timer setting by doing a get_ee-modify-set_eee sequence with the last known priv->tx_lpi_timer value. In order for this to work correctly, we also need this member to be initialised earlier. As stmmac_eee_init() is no longer called outside of stmmac_main.c, make it static. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tVZDW-0002Jr-W3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10Merge tag 'ipsec-next-2025-01-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next-2025-01-09 1) Implement the AGGFRAG protocol and basic IP-TFS (RFC9347) functionality. From Christian Hopps. 2) Support ESN context update to hardware for TX. From Jianbo Liu. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-09net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()Sudheer Kumar Doredla
CPSW ALE has 75-bit ALE entries stored across three 32-bit words. The cpsw_ale_get_field() and cpsw_ale_set_field() functions support ALE field entries spanning up to two words at the most. The cpsw_ale_get_field() and cpsw_ale_set_field() functions work as expected when ALE field spanned across word1 and word2, but fails when ALE field spanned across word2 and word3. For example, while reading the ALE field spanned across word2 and word3 (i.e. bits 62 to 64), the word3 data shifted to an incorrect position due to the index becoming zero while flipping. The same issue occurred when setting an ALE entry. This issue has not been seen in practice but will be an issue in the future if the driver supports accessing ALE fields spanning word2 and word3 Fix the methods to handle getting/setting fields spanning up to two words. Fixes: b685f1a58956 ("net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()") Signed-off-by: Sudheer Kumar Doredla <s-doredla@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://patch.msgid.link/20250108172433.311694-1-s-doredla@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.13-rc7). Conflicts: a42d71e322a8 ("net_sched: sch_cake: Add drop reasons") 737d4d91d35b ("sched: sch_cake: add bounds checks to host bulk flow fairness counts") Adjacent changes: drivers/net/ethernet/meta/fbnic/fbnic.h 3a856ab34726 ("eth: fbnic: add IRQ reuse support") 95978931d55f ("eth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON interface"") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09enic: Fix typo in comment in table indexed by link speedJohn Daley
The RX adaptive interrupt moderation table is indexed by link speed range, where the last row of the table is the catch-all for all link speeds greater than 10Gbps. The comment said 10 - 40Gbps, but since there are now adapters with link speeds than 40Gbps, the comment is now wrong and should indicate it applies to all speeds greater than 10Gbps. Co-developed-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Co-developed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: John Daley <johndale@cisco.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107214159.18807-4-johndale@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09enic: Obtain the Link speed only after the link comes upJohn Daley
The link speed is obtained in the RX adaptive coalescing function. It was being called at probe time when the link may not be up. Change the call to run after the Link comes up. The impact of not getting the correct link speed was that the low end of the adaptive interrupt range was always being set to 0 which could have caused a slight increase in the number of RX interrupts. Co-developed-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Co-developed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: John Daley <johndale@cisco.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107214159.18807-3-johndale@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09enic: Move RX coalescing set functionJohn Daley
Move the function used for setting the RX coalescing range to before the function that checks the link status. It needs to be called from there instead of from the probe function. There is no functional change. Co-developed-by: Nelson Escobar <neescoba@cisco.com> Signed-off-by: Nelson Escobar <neescoba@cisco.com> Co-developed-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: Satish Kharat <satishkh@cisco.com> Signed-off-by: John Daley <johndale@cisco.com> Link: https://patch.msgid.link/20250107214159.18807-2-johndale@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09net/mlx5: Fix variable not being completed when function returnsChenguang Zhao
When cmd_alloc_index(), fails cmd_work_handler() needs to complete ent->slotted before returning early. Otherwise the task which issued the command may hang: mlx5_core 0000:01:00.0: cmd_work_handler:877:(pid 3880418): failed to allocate command entry INFO: task kworker/13:2:4055883 blocked for more than 120 seconds. Not tainted 4.19.90-25.44.v2101.ky10.aarch64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/13:2 D 0 4055883 2 0x00000228 Workqueue: events mlx5e_tx_dim_work [mlx5_core] Call trace: __switch_to+0xe8/0x150 __schedule+0x2a8/0x9b8 schedule+0x2c/0x88 schedule_timeout+0x204/0x478 wait_for_common+0x154/0x250 wait_for_completion+0x28/0x38 cmd_exec+0x7a0/0xa00 [mlx5_core] mlx5_cmd_exec+0x54/0x80 [mlx5_core] mlx5_core_modify_cq+0x6c/0x80 [mlx5_core] mlx5_core_modify_cq_moderation+0xa0/0xb8 [mlx5_core] mlx5e_tx_dim_work+0x54/0x68 [mlx5_core] process_one_work+0x1b0/0x448 worker_thread+0x54/0x468 kthread+0x134/0x138 ret_from_fork+0x10/0x18 Fixes: 485d65e13571 ("net/mlx5: Add a timeout to acquire the command queue semaphore") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250108030009.68520-1-zhaochenguang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09rtase: Fix a check for error in rtase_alloc_msix()Dan Carpenter
The pci_irq_vector() function never returns zero. It returns negative error codes or a positive non-zero IRQ number. Fix the error checking to test for negatives. Fixes: a36e9f5cfe9e ("rtase: Add support for a pci table in this module") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/f2ecc88d-af13-4651-9820-7cc665230019@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09net: stmmac: dwmac-tegra: Read iommu stream id from device treeParker Newman
Nvidia's Tegra MGBE controllers require the IOMMU "Stream ID" (SID) to be written to the MGBE_WRAP_AXI_ASID0_CTRL register. The current driver is hard coded to use MGBE0's SID for all controllers. This causes softirq time outs and kernel panics when using controllers other than MGBE0. Example dmesg errors when an ethernet cable is connected to MGBE1: [ 116.133290] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx [ 121.851283] tegra-mgbe 6910000.ethernet eth1: NETDEV WATCHDOG: CPU: 5: transmit queue 0 timed out 5690 ms [ 121.851782] tegra-mgbe 6910000.ethernet eth1: Reset adapter. [ 121.892464] tegra-mgbe 6910000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0 [ 121.905920] tegra-mgbe 6910000.ethernet eth1: PHY [stmmac-1:00] driver [Aquantia AQR113] (irq=171) [ 121.907356] tegra-mgbe 6910000.ethernet eth1: Enabling Safety Features [ 121.907578] tegra-mgbe 6910000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported [ 121.908399] tegra-mgbe 6910000.ethernet eth1: registered PTP clock [ 121.908582] tegra-mgbe 6910000.ethernet eth1: configuring for phy/10gbase-r link mode [ 125.961292] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx [ 181.921198] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 181.921404] rcu: 7-....: (1 GPs behind) idle=540c/1/0x4000000000000002 softirq=1748/1749 fqs=2337 [ 181.921684] rcu: (detected by 4, t=6002 jiffies, g=1357, q=1254 ncpus=8) [ 181.921878] Sending NMI from CPU 4 to CPUs 7: [ 181.921886] NMI backtrace for cpu 7 [ 181.922131] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6 [ 181.922390] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024 [ 181.922658] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 181.922847] pc : handle_softirqs+0x98/0x368 [ 181.922978] lr : __do_softirq+0x18/0x20 [ 181.923095] sp : ffff80008003bf50 [ 181.923189] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000 [ 181.923379] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0 [ 181.924486] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70 [ 181.925568] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000 [ 181.926655] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000 [ 181.931455] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d [ 181.938628] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160 [ 181.945804] x8 : ffff8000827b3160 x7 : f9157b241586f343 x6 : eeb6502a01c81c74 [ 181.953068] x5 : a4acfcdd2e8096bb x4 : ffffce78ea277340 x3 : 00000000ffffd1e1 [ 181.960329] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000 [ 181.967591] Call trace: [ 181.970043] handle_softirqs+0x98/0x368 (P) [ 181.974240] __do_softirq+0x18/0x20 [ 181.977743] ____do_softirq+0x14/0x28 [ 181.981415] call_on_irq_stack+0x24/0x30 [ 181.985180] do_softirq_own_stack+0x20/0x30 [ 181.989379] __irq_exit_rcu+0x114/0x140 [ 181.993142] irq_exit_rcu+0x14/0x28 [ 181.996816] el1_interrupt+0x44/0xb8 [ 182.000316] el1h_64_irq_handler+0x14/0x20 [ 182.004343] el1h_64_irq+0x80/0x88 [ 182.007755] cpuidle_enter_state+0xc4/0x4a8 (P) [ 182.012305] cpuidle_enter+0x3c/0x58 [ 182.015980] cpuidle_idle_call+0x128/0x1c0 [ 182.020005] do_idle+0xe0/0xf0 [ 182.023155] cpu_startup_entry+0x3c/0x48 [ 182.026917] secondary_start_kernel+0xdc/0x120 [ 182.031379] __secondary_switched+0x74/0x78 [ 212.971162] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6103 jiffies s: 417 root: 0x80/. [ 212.985935] rcu: blocking rcu_node structures (internal RCU debug): [ 212.992758] Sending NMI from CPU 0 to CPUs 7: [ 212.998539] NMI backtrace for cpu 7 [ 213.004304] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6 [ 213.016116] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024 [ 213.030817] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 213.040528] pc : handle_softirqs+0x98/0x368 [ 213.046563] lr : __do_softirq+0x18/0x20 [ 213.051293] sp : ffff80008003bf50 [ 213.055839] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000 [ 213.067304] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0 [ 213.077014] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70 [ 213.087339] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000 [ 213.097313] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000 [ 213.107201] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d [ 213.116651] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160 [ 213.127500] x8 : ffff8000827b3160 x7 : 0a37b344852820af x6 : 3f049caedd1ff608 [ 213.138002] x5 : cff7cfdbfaf31291 x4 : ffffce78ea277340 x3 : 00000000ffffde04 [ 213.150428] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000 [ 213.162063] Call trace: [ 213.165494] handle_softirqs+0x98/0x368 (P) [ 213.171256] __do_softirq+0x18/0x20 [ 213.177291] ____do_softirq+0x14/0x28 [ 213.182017] call_on_irq_stack+0x24/0x30 [ 213.186565] do_softirq_own_stack+0x20/0x30 [ 213.191815] __irq_exit_rcu+0x114/0x140 [ 213.196891] irq_exit_rcu+0x14/0x28 [ 213.202401] el1_interrupt+0x44/0xb8 [ 213.207741] el1h_64_irq_handler+0x14/0x20 [ 213.213519] el1h_64_irq+0x80/0x88 [ 213.217541] cpuidle_enter_state+0xc4/0x4a8 (P) [ 213.224364] cpuidle_enter+0x3c/0x58 [ 213.228653] cpuidle_idle_call+0x128/0x1c0 [ 213.233993] do_idle+0xe0/0xf0 [ 213.237928] cpu_startup_entry+0x3c/0x48 [ 213.243791] secondary_start_kernel+0xdc/0x120 [ 213.249830] __secondary_switched+0x74/0x78 This bug has existed since the dwmac-tegra driver was added in Dec 2022 (See Fixes tag below for commit hash). The Tegra234 SOC has 4 MGBE controllers, however Nvidia's Developer Kit only uses MGBE0 which is why the bug was not found previously. Connect Tech has many products that use 2 (or more) MGBE controllers. The solution is to read the controller's SID from the existing "iommus" device tree property. The 2nd field of the "iommus" device tree property is the controller's SID. Device tree snippet from tegra234.dtsi showing MGBE1's "iommus" property: smmu_niso0: iommu@12000000 { compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500"; ... } /* MGBE1 */ ethernet@6900000 { compatible = "nvidia,tegra234-mgbe"; ... iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>; ... } Nvidia's arm-smmu driver reads the "iommus" property and stores the SID in the MGBE device's "fwspec" struct. The dwmac-tegra driver can access the SID using the tegra_dev_iommu_get_stream_id() helper function found in linux/iommu.h. Calling tegra_dev_iommu_get_stream_id() should not fail unless the "iommus" property is removed from the device tree or the IOMMU is disabled. While the Tegra234 SOC technically supports bypassing the IOMMU, it is not supported by the current firmware, has not been tested and not recommended. More detailed discussion with Thierry Reding from Nvidia linked below. Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support") Link: https://lore.kernel.org/netdev/cover.1731685185.git.pnewman@connecttech.com Signed-off-by: Parker Newman <pnewman@connecttech.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/6fb97f32cf4accb4f7cf92846f6b60064ba0a3bd.1736284360.git.pnewman@connecttech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09net/mlx5: use do_aux_work for PHC overflow checksVadim Fedorenko
The overflow_work is using system wq to do overflow checks and updates for PHC device timecounter, which might be overhelmed by other tasks. But there is dedicated kthread in PTP subsystem designed for such things. This patch changes the work queue to proper align with PTP subsystem and to avoid overloading system work queue. The adjfine() function acts the same way as overflow check worker, we can postpone ptp aux worker till the next overflow period after adjfine() was called. Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250107104812.380225-1-vadfed@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09net: stmmac: Unexport stmmac_rx_offset() from stmmac.hFurong Xu
stmmac_rx_offset() is referenced in stmmac_main.c only, let's move it to stmmac_main.c. Drop the inline keyword by the way, it is better to let the compiler to decide. Compile tested only. No functional change intended. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250107075448.4039925-1-0x1207@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09r8169: add support for RTL8125BP rev.bChunHao Lin
Add support for RTL8125BP rev.b. Its XID is 0x689. This chip supports DASH and its dash type is "RTL_DASH_25_BP". Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20250107064355.104711-1-hau@realtek.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-08Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-01-07 (ice, igc) For ice: Arkadiusz corrects mask value being used to determine DPLL phase range. Przemyslaw corrects frequency value for E823 devices. For igc: En-Wei Wu adds a check and, early, return for failed register read. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: return early when failing to read EECD register ice: fix incorrect PHY settings for 100 GB/s ice: fix max values for dpll pin phase adjust ==================== Link: https://patch.msgid.link/20250107190150.1758577-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: hns3: fix kernel crash when 1588 is sent on HIP08 devicesJie Wang
Currently, HIP08 devices does not register the ptp devices, so the hdev->ptp is NULL. But the tx process would still try to set hardware time stamp info with SKBTX_HW_TSTAMP flag and cause a kernel crash. [ 128.087798] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018 ... [ 128.280251] pc : hclge_ptp_set_tx_info+0x2c/0x140 [hclge] [ 128.286600] lr : hclge_ptp_set_tx_info+0x20/0x140 [hclge] [ 128.292938] sp : ffff800059b93140 [ 128.297200] x29: ffff800059b93140 x28: 0000000000003280 [ 128.303455] x27: ffff800020d48280 x26: ffff0cb9dc814080 [ 128.309715] x25: ffff0cb9cde93fa0 x24: 0000000000000001 [ 128.315969] x23: 0000000000000000 x22: 0000000000000194 [ 128.322219] x21: ffff0cd94f986000 x20: 0000000000000000 [ 128.328462] x19: ffff0cb9d2a166c0 x18: 0000000000000000 [ 128.334698] x17: 0000000000000000 x16: ffffcf1fc523ed24 [ 128.340934] x15: 0000ffffd530a518 x14: 0000000000000000 [ 128.347162] x13: ffff0cd6bdb31310 x12: 0000000000000368 [ 128.353388] x11: ffff0cb9cfbc7070 x10: ffff2cf55dd11e02 [ 128.359606] x9 : ffffcf1f85a212b4 x8 : ffff0cd7cf27dab0 [ 128.365831] x7 : 0000000000000a20 x6 : ffff0cd7cf27d000 [ 128.372040] x5 : 0000000000000000 x4 : 000000000000ffff [ 128.378243] x3 : 0000000000000400 x2 : ffffcf1f85a21294 [ 128.384437] x1 : ffff0cb9db520080 x0 : ffff0cb9db500080 [ 128.390626] Call trace: [ 128.393964] hclge_ptp_set_tx_info+0x2c/0x140 [hclge] [ 128.399893] hns3_nic_net_xmit+0x39c/0x4c4 [hns3] [ 128.405468] xmit_one.constprop.0+0xc4/0x200 [ 128.410600] dev_hard_start_xmit+0x54/0xf0 [ 128.415556] sch_direct_xmit+0xe8/0x634 [ 128.420246] __dev_queue_xmit+0x224/0xc70 [ 128.425101] dev_queue_xmit+0x1c/0x40 [ 128.429608] ovs_vport_send+0xac/0x1a0 [openvswitch] [ 128.435409] do_output+0x60/0x17c [openvswitch] [ 128.440770] do_execute_actions+0x898/0x8c4 [openvswitch] [ 128.446993] ovs_execute_actions+0x64/0xf0 [openvswitch] [ 128.453129] ovs_dp_process_packet+0xa0/0x224 [openvswitch] [ 128.459530] ovs_vport_receive+0x7c/0xfc [openvswitch] [ 128.465497] internal_dev_xmit+0x34/0xb0 [openvswitch] [ 128.471460] xmit_one.constprop.0+0xc4/0x200 [ 128.476561] dev_hard_start_xmit+0x54/0xf0 [ 128.481489] __dev_queue_xmit+0x968/0xc70 [ 128.486330] dev_queue_xmit+0x1c/0x40 [ 128.490856] ip_finish_output2+0x250/0x570 [ 128.495810] __ip_finish_output+0x170/0x1e0 [ 128.500832] ip_finish_output+0x3c/0xf0 [ 128.505504] ip_output+0xbc/0x160 [ 128.509654] ip_send_skb+0x58/0xd4 [ 128.513892] udp_send_skb+0x12c/0x354 [ 128.518387] udp_sendmsg+0x7a8/0x9c0 [ 128.522793] inet_sendmsg+0x4c/0x8c [ 128.527116] __sock_sendmsg+0x48/0x80 [ 128.531609] __sys_sendto+0x124/0x164 [ 128.536099] __arm64_sys_sendto+0x30/0x5c [ 128.540935] invoke_syscall+0x50/0x130 [ 128.545508] el0_svc_common.constprop.0+0x10c/0x124 [ 128.551205] do_el0_svc+0x34/0xdc [ 128.555347] el0_svc+0x20/0x30 [ 128.559227] el0_sync_handler+0xb8/0xc0 [ 128.563883] el0_sync+0x160/0x180 Fixes: 0bf5eb788512 ("net: hns3: add support for PTP") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-8-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issueHao Lan
The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs 1024-1279 are in different BAR space addresses. However, hclge_fetch_pf_reg does not distinguish the tqp space information when reading the tqp space information. When the number of TQPs is greater than 1024, access bar space overwriting occurs. The problem of different segments has been considered during the initialization of tqp.io_base. Therefore, tqp.io_base is directly used when the queue is read in hclge_fetch_pf_reg. The error message: Unable to handle kernel paging request at virtual address ffff800037200000 pc : hclge_fetch_pf_reg+0x138/0x250 [hclge] lr : hclge_get_regs+0x84/0x1d0 [hclge] Call trace: hclge_fetch_pf_reg+0x138/0x250 [hclge] hclge_get_regs+0x84/0x1d0 [hclge] hns3_get_regs+0x2c/0x50 [hns3] ethtool_get_regs+0xf4/0x270 dev_ethtool+0x674/0x8a0 dev_ioctl+0x270/0x36c sock_do_ioctl+0x110/0x2a0 sock_ioctl+0x2ac/0x530 __arm64_sys_ioctl+0xa8/0x100 invoke_syscall+0x4c/0x124 el0_svc_common.constprop.0+0x140/0x15c do_el0_svc+0x30/0xd0 el0_svc+0x1c/0x2c el0_sync_handler+0xb0/0xb4 el0_sync+0x168/0x180 Fixes: 939ccd107ffc ("net: hns3: move dump regs function to a separate file") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-7-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: hns3: initialize reset_timer before hclgevf_misc_irq_init()Jian Shen
Currently the misc irq is initialized before reset_timer setup. But it will access the reset_timer in the irq handler. So initialize the reset_timer earlier. Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-6-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: hns3: don't auto enable misc vectorJian Shen
Currently, there is a time window between misc irq enabled and service task inited. If an interrupte is reported at this time, it will cause warning like below: [ 16.324639] Call trace: [ 16.324641] __queue_delayed_work+0xb8/0xe0 [ 16.324643] mod_delayed_work_on+0x78/0xd0 [ 16.324655] hclge_errhand_task_schedule+0x58/0x90 [hclge] [ 16.324662] hclge_misc_irq_handle+0x168/0x240 [hclge] [ 16.324666] __handle_irq_event_percpu+0x64/0x1e0 [ 16.324667] handle_irq_event+0x80/0x170 [ 16.324670] handle_fasteoi_edge_irq+0x110/0x2bc [ 16.324671] __handle_domain_irq+0x84/0xfc [ 16.324673] gic_handle_irq+0x88/0x2c0 [ 16.324674] el1_irq+0xb8/0x140 [ 16.324677] arch_cpu_idle+0x18/0x40 [ 16.324679] default_idle_call+0x5c/0x1bc [ 16.324682] cpuidle_idle_call+0x18c/0x1c4 [ 16.324684] do_idle+0x174/0x17c [ 16.324685] cpu_startup_entry+0x30/0x6c [ 16.324687] secondary_start_kernel+0x1a4/0x280 [ 16.324688] ---[ end trace 6aa0bff672a964aa ]--- So don't auto enable misc vector when request irq.. Fixes: 7be1b9f3e99f ("net: hns3: make hclge_service use delayed workqueue") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-5-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: hns3: Resolved the issue that the debugfs query result is inconsistent.Hao Lan
This patch modifies the implementation of debugfs: When the user process stops unexpectedly, not all data of the file system is read. In this case, the save_buf pointer is not released. When the user process is called next time, save_buf is used to copy the cached data to the user space. As a result, the queried data is stale. To solve this problem, this patch implements .open() and .release() handler for debugfs file_operations. moving allocation buffer and execution of the cmd to the .open() handler and freeing in to the .release() handler. Allocate separate buffer for each reader and associate the buffer with the file pointer. When different user read processes no longer share the buffer, the stale data problem is fixed. Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: hns3: fix missing features due to dev->features configuration too earlyHao Lan
Currently, the netdev->features is configured in hns3_nic_set_features. As a result, __netdev_update_features considers that there is no feature difference, and the procedures of the real features are missing. Fixes: 2a7556bb2b73 ("net: hns3: implement ndo_features_check ops for hns3 driver") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08net: hns3: fixed reset failure issues caused by the incorrect reset typeHao Lan
When a reset type that is not supported by the driver is input, a reset pending flag bit of the HNAE3_NONE_RESET type is generated in reset_pending. The driver does not have a mechanism to clear this type of error. As a result, the driver considers that the reset is not complete. This patch provides a mechanism to clear the HNAE3_NONE_RESET flag and the parameter of hnae3_ae_ops.set_default_reset_request is verified. The error message: hns3 0000:39:01.0: cmd failed -16 hns3 0000:39:01.0: hclge device re-init failed, VF is disabled! hns3 0000:39:01.0: failed to reset VF stack hns3 0000:39:01.0: failed to reset VF(4) hns3 0000:39:01.0: prepare reset(2) wait done hns3 0000:39:01.0 eth4: already uninitialized Use the crash tool to view struct hclgevf_dev: struct hclgevf_dev { ... default_reset_request = 0x20, reset_level = HNAE3_NONE_RESET, reset_pending = 0x100, reset_type = HNAE3_NONE_RESET, ... }; Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08treewide: Introduce kthread_run_worker[_on_cpu]()Frederic Weisbecker
kthread_create() creates a kthread without running it yet. kthread_run() creates a kthread and runs it. On the other hand, kthread_create_worker() creates a kthread worker and runs it. This difference in behaviours is confusing. Also there is no way to create a kthread worker and affine it using kthread_bind_mask() or kthread_affine_preferred() before starting it. Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]() that behaves just like kthread_run(). kthread_create_worker[_on_cpu]() will now only create a kthread worker without starting it. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
2025-01-07intel/fm10k: Remove unused fm10k_iov_msg_mac_vlan_pfDr. David Alan Gilbert
fm10k_iov_msg_mac_vlan_pf() has been unused since 2017's commit 1f5c27e52857 ("fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN requests") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-16-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07igc: Link queues to NAPI instancesJoe Damato
Link queues to NAPI instances via netdev-genl API so that users can query this information with netlink. Handle a few cases in the driver: 1. Link/unlink the NAPIs when XDP is enabled/disabled 2. Handle IGC_FLAG_QUEUE_PAIRS enabled and disabled Example output when IGC_FLAG_QUEUE_PAIRS is enabled: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}] Since IGC_FLAG_QUEUE_PAIRS is enabled, you'll note that the same NAPI ID is present for both rx and tx queues at the same index, for example index 0: {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}, To test IGC_FLAG_QUEUE_PAIRS disabled, a test system was booted using the grub command line option "maxcpus=2" to force igc_set_interrupt_capability to disable IGC_FLAG_QUEUE_PAIRS. Example output when IGC_FLAG_QUEUE_PAIRS is disabled: $ lscpu | grep "On-line CPU" On-line CPU(s) list: 0,2 $ ethtool -l enp86s0 | tail -5 Current hardware settings: RX: n/a TX: n/a Other: 1 Combined: 2 $ cat /proc/interrupts | grep enp 144: [...] enp86s0 145: [...] enp86s0-rx-0 146: [...] enp86s0-rx-1 147: [...] enp86s0-tx-0 148: [...] enp86s0-tx-1 1 "other" IRQ, and 2 IRQs for each of RX and Tx, so we expect netlink to report 4 IRQs with unique NAPI IDs: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8196, 'ifindex': 2, 'irq': 148}, {'id': 8195, 'ifindex': 2, 'irq': 147}, {'id': 8194, 'ifindex': 2, 'irq': 146}, {'id': 8193, 'ifindex': 2, 'irq': 145}] Now we examine which queues these NAPIs are associated with, expecting that since IGC_FLAG_QUEUE_PAIRS is disabled each RX and TX queue will have its own NAPI instance: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-15-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07igc: Link IRQs to NAPI instancesJoe Damato
Link IRQs to NAPI instances via netdev-genl API so that users can query this information with netlink. Compare the output of /proc/interrupts (noting that IRQ 128 is the "other" IRQ which does not appear to have a NAPI instance): $ cat /proc/interrupts | grep enp86s0 | cut --delimiter=":" -f1 128 129 130 131 132 The output from netlink shows the mapping of NAPI IDs to IRQs (again noting that 128 is absent as it is the "other" IRQ): $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8196, 'ifindex': 2, 'irq': 132}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8195, 'ifindex': 2, 'irq': 131}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8194, 'ifindex': 2, 'irq': 130}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8193, 'ifindex': 2, 'irq': 129}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-14-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07i40e: add ability to reset VF for Tx and Rx MDD eventsAleksandr Loktionov
Implement "mdd-auto-reset-vf" priv-flag to handle Tx and Rx MDD events for VFs. This flag is also used in other network adapters like ICE. Usage: - "on" - The problematic VF will be automatically reset if a malformed descriptor is detected. - "off" - The problematic VF will be disabled. In cases where a VF sends malformed packets classified as malicious, it can cause the Tx queue to freeze, rendering it unusable for several minutes. When an MDD event occurs, this new implementation allows for a graceful VF reset to quickly restore operational state. Currently, VF queues are disabled if an MDD event occurs. This patch adds the ability to reset the VF if a Tx or Rx MDD event occurs. It also includes MDD event logging throttling to avoid dmesg pollution and unifies the format of Tx and Rx MDD messages. Note: Standard message rate limiting functions like dev_info_ratelimited() do not meet our requirements. Custom rate limiting is implemented, please see the code for details. Co-developed-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Co-developed-by: Padraig J Connolly <padraig.j.connolly@intel.com> Signed-off-by: Padraig J Connolly <padraig.j.connolly@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-13-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07ixgbevf: Fix passing 0 to ERR_PTR in ixgbevf_run_xdp()Yue Haibing
ixgbevf_run_xdp() converts customed xdp action to a negative error code with the sk_buff pointer type which be checked with IS_ERR in ixgbevf_clean_rx_irq(). Remove this error pointer handing instead use plain int return value. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-12-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07ixgbe: Fix passing 0 to ERR_PTR in ixgbe_run_xdp()Yue Haibing
ixgbe_run_xdp() converts customed xdp action to a negative error code with the sk_buff pointer type which be checked with IS_ERR in ixgbe_clean_rx_irq(). Remove this error pointer handing instead use plain int return value. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-11-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07igb: Fix passing 0 to ERR_PTR in igb_run_xdp()Yue Haibing
igb_run_xdp() converts customed xdp action to a negative error code with the sk_buff pointer type which be checked with IS_ERR in igb_clean_rx_irq(). Remove this error pointer handing instead use plain int return value. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-10-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07igc: Fix passing 0 to ERR_PTR in igc_xdp_run_prog()Yue Haibing
igc_xdp_run_prog() converts customed xdp action to a negative error code with the sk_buff pointer type which be checked with IS_ERR in igc_clean_rx_irq(). Remove this error pointer handing instead use plain int return value to fix this smatch warnings: drivers/net/ethernet/intel/igc/igc_main.c:2533 igc_xdp_run_prog() warn: passing zero to 'ERR_PTR' Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-9-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07igc: Allow hot-swapping XDP programSong Yoong Siang
Currently, the driver would always close and reopen the network interface when setting/removing the XDP program, regardless of the presence of XDP resources. This could cause unnecessary disruptions. To avoid this, introduces a check to determine if there is a need to close and reopen the interface, allowing for seamless hot-swapping of XDP programs. Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-8-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07igb: Add AF_XDP zero-copy Tx supportSriram Yagnaraman
Add support for AF_XDP zero-copy transmit path. A new TX buffer type IGB_TYPE_XSK is introduced to indicate that the Tx frame was allocated from the xsk buff pool, so igb_clean_tx_ring() and igb_clean_tx_irq() can clean the buffers correctly based on type. igb_xmit_zc() performs the actual packet transmit when AF_XDP zero-copy is enabled. We share the TX ring between slow path, XDP and AF_XDP zero-copy, so we use the netdev queue lock to ensure mutual exclusion. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Set olinfo_status in igb_xmit_zc() so that frames are transmitted, Use READ_ONCE() for xsk_pool and check Tx disabled and carrier in igb_xmit_zc(), Add FIXME for RS bit] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-7-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>