summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2022-07-18i40e: Fix erroneous adapter reinitialization during recovery processDawid Lukwinski
Fix an issue when driver incorrectly detects state of recovery process and erroneously reinitializes interrupts, which results in a kernel error and call trace message. The issue was caused by a combination of two factors: 1. Assuming the EMP reset issued after completing firmware recovery means the whole recovery process is complete. 2. Erroneous reinitialization of interrupt vector after detecting the above mentioned EMP reset. Fixes (1) by changing how recovery state change is detected and (2) by adjusting the conditional expression to ensure using proper interrupt reinitialization method, depending on the situation. Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support") Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220715214542.2968762-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18net: ethernet: mtk_eth_soc: fix off by one check of ARRAY_SIZETom Rix
In mtk_wed_tx_ring_setup(.., int idx, ..), idx is used as an index here struct mtk_wed_ring *ring = &dev->tx_ring[idx]; The bounds of idx are checked here BUG_ON(idx > ARRAY_SIZE(dev->tx_ring)); If idx is the size of the array, it will pass this check and overflow. So change the check to >= . Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20220716214654.1540240-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18net: lan966x: Fix usage of lan966x->mac_lock when used by FDBHoratiu Vultur
When the SW bridge was trying to add/remove entries to/from HW, the access to HW was not protected by any lock. In this way, it was possible to have race conditions. Fix this by using the lan966x->mac_lock to protect parallel access to HW for this cases. Fixes: 25ee9561ec622 ("net: lan966x: More MAC table functionality") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18net: lan966x: Fix usage of lan966x->mac_lock inside lan966x_mac_irq_handlerHoratiu Vultur
The problem with this spin lock is that it was just protecting the list of the MAC entries in SW and not also the access to the MAC entries in HW. Because the access to HW is indirect, then it could happen to have race conditions. For example when SW introduced an entry in MAC table and the irq mac is trying to read something from the MAC. Update such that also the access to MAC entries in HW is protected by this lock. Fixes: 5ccd66e01cbef ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18net: lan966x: Fix usage of lan966x->mac_lock when entry is removedHoratiu Vultur
To remove an entry to the MAC table, it is required first to setup the entry and then issue a command for the MAC to forget the entry. So if it happens for two threads to remove simultaneously an entry in MAC table then it would be a race condition. Fix this by using lan966x->mac_lock to protect the HW access. Fixes: e18aba8941b40 ("net: lan966x: add mactable support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18net: lan966x: Fix usage of lan966x->mac_lock when entry is addedHoratiu Vultur
To add an entry to the MAC table, it is required first to setup the entry and then issue a command for the MAC to learn the entry. So if it happens for two threads to add simultaneously an entry in MAC table then it would be a race condition. Fix this by using lan966x->mac_lock to protect the HW access. Fixes: fc0c3fe7486f2 ("net: lan966x: Add function lan966x_mac_ip_learn()") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18net: lan966x: Fix taking rtnl_lock while holding spin_lockHoratiu Vultur
When the HW deletes an entry in MAC table then it generates an interrupt. The SW will go through it's own list of MAC entries and if it is not found then it would notify the listeners about this. The problem is that when the SW will go through it's own list it would take a spin lock(lan966x->mac_lock) and when it notifies that the entry is deleted. But to notify the listeners it taking the rtnl_lock which is illegal. This is fixed by instead of notifying right away that the entry is deleted, move the entry on a temp list and once, it checks all the entries then just notify that the entries from temp list are deleted. Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18net: prestera: acl: use proper mask for port selectorMaksym Glubokiy
Adjusted as per packet processor documentation. This allows to properly match 'indev' for clsact rules. Fixes: 47327e198d42 ("net: prestera: acl: migrate to new vTCAM api") Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18net: stmmac: fix dma queue left shift overflow issueJunxiao Chang
When queue number is > 4, left shift overflows due to 32 bits integer variable. Mask calculation is wrong for MTL_RXQ_DMA_MAP1. If CONFIG_UBSAN is enabled, kernel dumps below warning: [ 10.363842] ================================================================== [ 10.363882] UBSAN: shift-out-of-bounds in /build/linux-intel-iotg-5.15-8e6Tf4/ linux-intel-iotg-5.15-5.15.0/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:224:12 [ 10.363929] shift exponent 40 is too large for 32-bit type 'unsigned int' [ 10.363953] CPU: 1 PID: 599 Comm: NetworkManager Not tainted 5.15.0-1003-intel-iotg [ 10.363956] Hardware name: ADLINK Technology Inc. LEC-EL/LEC-EL, BIOS 0.15.11 12/22/2021 [ 10.363958] Call Trace: [ 10.363960] <TASK> [ 10.363963] dump_stack_lvl+0x4a/0x5f [ 10.363971] dump_stack+0x10/0x12 [ 10.363974] ubsan_epilogue+0x9/0x45 [ 10.363976] __ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e [ 10.363979] ? wake_up_klogd+0x4a/0x50 [ 10.363983] ? vprintk_emit+0x8f/0x240 [ 10.363986] dwmac4_map_mtl_dma.cold+0x42/0x91 [stmmac] [ 10.364001] stmmac_mtl_configuration+0x1ce/0x7a0 [stmmac] [ 10.364009] ? dwmac410_dma_init_channel+0x70/0x70 [stmmac] [ 10.364020] stmmac_hw_setup.cold+0xf/0xb14 [stmmac] [ 10.364030] ? page_pool_alloc_pages+0x4d/0x70 [ 10.364034] ? stmmac_clear_tx_descriptors+0x6e/0xe0 [stmmac] [ 10.364042] stmmac_open+0x39e/0x920 [stmmac] [ 10.364050] __dev_open+0xf0/0x1a0 [ 10.364054] __dev_change_flags+0x188/0x1f0 [ 10.364057] dev_change_flags+0x26/0x60 [ 10.364059] do_setlink+0x908/0xc40 [ 10.364062] ? do_setlink+0xb10/0xc40 [ 10.364064] ? __nla_validate_parse+0x4c/0x1a0 [ 10.364068] __rtnl_newlink+0x597/0xa10 [ 10.364072] ? __nla_reserve+0x41/0x50 [ 10.364074] ? __kmalloc_node_track_caller+0x1d0/0x4d0 [ 10.364079] ? pskb_expand_head+0x75/0x310 [ 10.364082] ? nla_reserve_64bit+0x21/0x40 [ 10.364086] ? skb_free_head+0x65/0x80 [ 10.364089] ? security_sock_rcv_skb+0x2c/0x50 [ 10.364094] ? __cond_resched+0x19/0x30 [ 10.364097] ? kmem_cache_alloc_trace+0x15a/0x420 [ 10.364100] rtnl_newlink+0x49/0x70 This change fixes MTL_RXQ_DMA_MAP1 mask issue and channel/queue mapping warning. Fixes: d43042f4da3e ("net: stmmac: mapping mtl rx to dma channel") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216195 Reported-by: Cedric Wassenaar <cedric@bytespeed.nl> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18net: stmmac: switch to use interrupt for hw crosstimestampingWong Vee Khee
Using current implementation of polling mode, there is high chances we will hit into timeout error when running phc2sys. Hence, update the implementation of hardware crosstimestamping to use the MAC interrupt service routine instead of polling for TSIS bit in the MAC Timestamp Interrupt Status register to be set. Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-07-14 This series contains updates to e1000e and igc drivers. Sasha re-enables GPT clock when exiting s0ix to prevent hardware unit hang and reverts a workaround for this issue on e1000e. Lennert Buytenhek restores checks for removed device while accessing registers to prevent NULL pointer dereferences for igc. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Reinstate IGC_REMOVED logic and implement it properly Revert "e1000e: Fix possible HW unit hang after an s0ix exit" e1000e: Enable GPT clock before sending message to CSME ==================== Link: https://lore.kernel.org/r/20220714175857.933537-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-15net: stmmac: fix unbalanced ptp clock issue in suspend/resume flowBiao Huang
Current stmmac driver will prepare/enable ptp_ref clock in stmmac_init_tstamp_counter(). The stmmac_pltfr_noirq_suspend will disable it once in suspend flow. But in resume flow, stmmac_pltfr_noirq_resume --> stmmac_init_tstamp_counter stmmac_resume --> stmmac_hw_setup --> stmmac_init_ptp --> stmmac_init_tstamp_counter ptp_ref clock reference counter increases twice, which leads to unbalance ptp clock when resume back. Move ptp_ref clock prepare/enable out of stmmac_init_tstamp_counter to fix it. Fixes: 0735e639f129d ("net: stmmac: skip only stmmac_ptp_register when resume from suspend") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15net: stmmac: fix pm runtime issue in stmmac_dvr_remove()Biao Huang
If netif is running when stmmac_dvr_remove is invoked, the unregister_netdev will call ndo_stop(stmmac_release) and vlan_kill_rx_filter(stmmac_vlan_rx_kill_vid). Currently, stmmac_dvr_remove() will disable pm runtime before unregister_netdev. When stmmac_vlan_rx_kill_vid is invoked, pm_runtime_resume_and_get in it returns EACCESS error number, and reports: dwmac-mediatek 11021000.ethernet eth0: stmmac_dvr_remove: removing driver dwmac-mediatek 11021000.ethernet eth0: FPE workqueue stop dwmac-mediatek 11021000.ethernet eth0: failed to kill vid 0081/0 Move the pm_runtime_disable to the end of stmmac_dvr_remove to fix this issue. Fixes: 6449520391dfc ("net: stmmac: properly handle with runtime pm in stmmac_dvr_remove()") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15stmmac: dwmac-mediatek: fix clock issueBiao Huang
The pm_runtime takes care of the clock handling in current stmmac drivers, and dwmac-mediatek implement the mediatek_dwmac_clks_config() as the callback for pm_runtime. Then, stripping duplicated clocks handling in old init()/exit() to fix clock issue in suspend/resume test. As to clocks in probe/remove, vendor need symmetric handling to ensure clocks balance. Test pass, including suspend/resume and ko insertion/remove. Fixes: 3186bdad97d5 ("stmmac: dwmac-mediatek: add platform level clocks management") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15ip: Fix data-races around sysctl_ip_fwd_update_priority.Kuniyuki Iwashima
While reading sysctl_ip_fwd_update_priority, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 432e05d32892 ("net: ipv4: Control SKB reprioritization after forwarding") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15ip: Fix data-races around sysctl_ip_default_ttl.Kuniyuki Iwashima
While reading sysctl_ip_default_ttl, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-14nfp: flower: configure tunnel neighbour on cmsg rxTianyu Yuan
nfp_tun_write_neigh() function will configure a tunnel neighbour when calling nfp_tun_neigh_event_handler() or nfp_flower_cmsg_process_one_rx() (with no tunnel neighbour type) from firmware. When configuring IP on physical port as a tunnel endpoint, no operation will be performed after receiving the cmsg mentioned above. Therefore, add a progress to configure tunnel neighbour in this case. v2: Correct format of fixes tag. Fixes: f1df7956c11f ("nfp: flower: rework tunnel neighbour configuration") Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220714081915.148378-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14igc: Reinstate IGC_REMOVED logic and implement it properlyLennert Buytenhek
The initially merged version of the igc driver code (via commit 146740f9abc4, "igc: Add support for PF") contained the following IGC_REMOVED checks in the igc_rd32/wr32() MMIO accessors: u32 igc_rd32(struct igc_hw *hw, u32 reg) { u8 __iomem *hw_addr = READ_ONCE(hw->hw_addr); u32 value = 0; if (IGC_REMOVED(hw_addr)) return ~value; value = readl(&hw_addr[reg]); /* reads should not return all F's */ if (!(~value) && (!reg || !(~readl(hw_addr)))) hw->hw_addr = NULL; return value; } And: #define wr32(reg, val) \ do { \ u8 __iomem *hw_addr = READ_ONCE((hw)->hw_addr); \ if (!IGC_REMOVED(hw_addr)) \ writel((val), &hw_addr[(reg)]); \ } while (0) E.g. igb has similar checks in its MMIO accessors, and has a similar macro E1000_REMOVED, which is implemented as follows: #define E1000_REMOVED(h) unlikely(!(h)) These checks serve to detect and take note of an 0xffffffff MMIO read return from the device, which can be caused by a PCIe link flap or some other kind of PCI bus error, and to avoid performing MMIO reads and writes from that point onwards. However, the IGC_REMOVED macro was not originally implemented: #ifndef IGC_REMOVED #define IGC_REMOVED(a) (0) #endif /* IGC_REMOVED */ This led to the IGC_REMOVED logic to be removed entirely in a subsequent commit (commit 3c215fb18e70, "igc: remove IGC_REMOVED function"), with the rationale that such checks matter only for virtualization and that igc does not support virtualization -- but a PCIe device can become detached even without virtualization being in use, and without proper checks, a PCIe bus error affecting an igc adapter will lead to various NULL pointer dereferences, as the first access after the error will set hw->hw_addr to NULL, and subsequent accesses will blindly dereference this now-NULL pointer. This patch reinstates the IGC_REMOVED checks in igc_rd32/wr32(), and implements IGC_REMOVED the way it is done for igb, by checking for the unlikely() case of hw_addr being NULL. This change prevents the oopses seen when a PCIe link flap occurs on an igc adapter. Fixes: 146740f9abc4 ("igc: Add support for PF") Signed-off-by: Lennert Buytenhek <buytenh@arista.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-14Revert "e1000e: Fix possible HW unit hang after an s0ix exit"Sasha Neftin
This reverts commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd. Commit 1866aa0d0d64 ("e1000e: Fix possible HW unit hang after an s0ix exit") was a workaround for CSME problem to handle messages comes via H2ME mailbox. This problem has been fixed by patch "e1000e: Enable the GPT clock before sending message to the CSME". Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821 Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-14e1000e: Enable GPT clock before sending message to CSMESasha Neftin
On corporate (CSME) ADL systems, the Ethernet Controller may stop working ("HW unit hang") after exiting from the s0ix state. The reason is that CSME misses the message sent by the host. Enabling the dynamic GPT clock solves this problem. This clock is cleared upon HW initialization. Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821 Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-14net: atlantic: remove aq_nic_deinit() when resumeChia-Lin Kao (AceLan)
aq_nic_deinit() has been called while suspending, so we don't have to call it again on resume. Actually, call it again leads to another hang issue when resuming from S3. Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992345] Call Trace: Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992346] <TASK> Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992348] aq_nic_deinit+0xb4/0xd0 [atlantic] Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992356] aq_pm_thaw+0x7f/0x100 [atlantic] Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992362] pci_pm_resume+0x5c/0x90 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992366] ? pci_pm_thaw+0x80/0x80 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992368] dpm_run_callback+0x4e/0x120 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992371] device_resume+0xad/0x200 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992373] async_resume+0x1e/0x40 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992374] async_run_entry_fn+0x33/0x120 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992377] process_one_work+0x220/0x3c0 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992380] worker_thread+0x4d/0x3f0 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992382] ? process_one_work+0x3c0/0x3c0 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992384] kthread+0x12a/0x150 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992386] ? set_kthread_struct+0x40/0x40 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992387] ret_from_fork+0x22/0x30 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992391] </TASK> Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992392] ---[ end trace 1ec8c79604ed5e0d ]--- Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992394] PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992397] atlantic 0000:02:00.0: PM: failed to resume async: error -110 Fixes: 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression") Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Link: https://lore.kernel.org/r/20220713111224.1535938-2-acelan.kao@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-14net: atlantic: remove deep parameter on suspend/resume functionsChia-Lin Kao (AceLan)
Below commit claims that atlantic NIC requires to reset the device on pm op, and had set the deep to true for all suspend/resume functions. commit 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression") So, we could remove deep parameter on suspend/resume functions without any functional change. Fixes: 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression") Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Link: https://lore.kernel.org/r/20220713111224.1535938-1-acelan.kao@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-14sfc: fix kernel panic when creating VFÍñigo Huguet
When creating VFs a kernel panic can happen when calling to efx_ef10_try_update_nic_stats_vf. When releasing a DMA coherent buffer, sometimes, I don't know in what specific circumstances, it has to unmap memory with vunmap. It is disallowed to do that in IRQ context or with BH disabled. Otherwise, we hit this line in vunmap, causing the crash: BUG_ON(in_interrupt()); This patch reenables BH to release the buffer. Log messages when the bug is hit: kernel BUG at mm/vmalloc.c:2727! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G I --------- --- 5.14.0-119.el9.x86_64 #1 Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020 RIP: 0010:vunmap+0x2e/0x30 ...skip... Call Trace: __iommu_dma_free+0x96/0x100 efx_nic_free_buffer+0x2b/0x40 [sfc] efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc] efx_ef10_update_stats_vf+0x18/0x40 [sfc] efx_start_all+0x15e/0x1d0 [sfc] efx_net_open+0x5a/0xe0 [sfc] __dev_open+0xe7/0x1a0 __dev_change_flags+0x1d7/0x240 dev_change_flags+0x21/0x60 ...skip... Fixes: d778819609a2 ("sfc: DMA the VF stats only when requested") Reported-by: Ma Yuying <yuma@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220713092116.21238-1-ihuguet@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-13Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-07-12 This series contains updates to ice driver only. Paul fixes detection of E822 devices for firmware update and changes NVM read for snapshot creation to be done in chunks as some systems cannot read the entire NVM in the allotted time. ==================== Link: https://lore.kernel.org/r/20220712164829.7275-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-13sfc: fix use after free when disabling sriovÍñigo Huguet
Use after free is detected by kfence when disabling sriov. What was read after being freed was vf->pci_dev: it was freed from pci_disable_sriov and later read in efx_ef10_sriov_free_vf_vports, called from efx_ef10_sriov_free_vf_vswitching. Set the pointer to NULL at release time to not trying to read it later. Reproducer and dmesg log (note that kfence doesn't detect it every time): $ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs $ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc] Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224): efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc] efx_ef10_pci_sriov_disable+0x38/0x70 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xfe/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae kfence-#224: 0x00000000edb8ef95-0x00000000671f5ce1, size=2792, cache=kmalloc-4k allocated by task 6771 on cpu 10 at 3137.860196s: pci_alloc_dev+0x21/0x60 pci_iov_add_virtfn+0x2a2/0x320 sriov_enable+0x212/0x3e0 efx_ef10_sriov_configure+0x67/0x80 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xba/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae freed by task 6771 on cpu 12 at 3170.991309s: device_release+0x34/0x90 kobject_cleanup+0x3a/0x130 pci_iov_remove_virtfn+0xd9/0x120 sriov_disable+0x30/0xe0 efx_ef10_pci_sriov_disable+0x57/0x70 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xfe/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 3c5eb87605e85 ("sfc: create vports for VFs and assign random MAC addresses") Reported-by: Yanghang Liu <yanghliu@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220712062642.6915-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-13net: sunhme: output link status with a single print.Nick Bowler
This driver currently prints the link status using four separate printk calls, which these days gets presented to the user as four distinct messages, not exactly ideal: [ 32.582778] eth0: Link is up using [ 32.582828] internal [ 32.582837] transceiver at [ 32.582888] 100Mb/s, Full Duplex. Restructure the display_link_mode function to use a single netdev_info call to present all this information as a single message, which is much nicer: [ 33.640143] hme 0000:00:01.1 eth0: Link is up using internal transceiver at 100Mb/s, Full Duplex. The display_forced_link_mode function has a similar structure, so adjust it in a similar fashion. Signed-off-by: Nick Bowler <nbowler@draconx.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13net: stmmac: fix leaks in probeDan Carpenter
These two error paths should clean up before returning. Fixes: 2bb4b98b60d7 ("net: stmmac: Add Ingenic SoCs MAC support.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13net: ftgmac100: Hold reference returned by of_get_child_by_name()Liang He
In ftgmac100_probe(), we should hold the refernece returned by of_get_child_by_name() and use it to call of_node_put() for reference balance. Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property") Signed-off-by: Liang He <windhl@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13tcp: Fix data-races around sysctl_tcp_ecn.Kuniyuki Iwashima
While reading sysctl_tcp_ecn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-12bnxt_en: Fix bnxt_refclk_read()Pavan Chebbi
The upper 32-bit PHC register is not latched when reading the lower 32-bit PHC register. Current code leaves a small window where we may not read correct higher order bits if the lower order bits are just about to wrap around. This patch fixes this by reading higher order bits twice and makes sure that final value is correctly paired with its lower 32 bits. Fixes: 30e96f487f64 ("bnxt_en: Do not read the PTP PHC during chip reset") Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-12bnxt_en: Fix and simplify XDP transmit pathMichael Chan
Fix the missing length hint in the TX BD for the XDP transmit path. The length hint is required on legacy chips. Also, simplify the code by eliminating the first_buf local variable. tx_buf contains the same value. The opaque value only needs to be set on the first BD. Fix this also for correctness. Fixes: a7559bc8c17c ("bnxt: support transmit and free of aggregation buffers") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-12bnxt_en: fix livepatch queryVikas Gupta
In the livepatch query fw_target BNXT_FW_SRT_PATCH is applicable for P5 chips only. Fixes: 3c4153394e2c ("bnxt_en: implement firmware live patching") Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-12bnxt_en: Fix bnxt_reinit_after_abort() code pathMichael Chan
bnxt_reinit_after_abort() is called during ifup when a previous FW reset sequence has aborted or a previous ifup has failed after detecting FW reset. In all cases, it is safe to assume that a previous FW reset has completed and the driver may not have fully reinitialized. Prior to this patch, it is assumed that the FUNC_DRV_IF_CHANGE_RESP_FLAGS_HOT_FW_RESET_DONE flag will always be set by the firmware in bnxt_hwrm_if_change(). This may not be true if the driver has already attempted to register with the firmware. The firmware may not set the RESET_DONE flag again after the driver has registered, assuming that the driver has seen the flag already. Fix it to always go through the FW reset initialization path if the BNXT_STATE_FW_RESET_DET flag is set. This flag is always set by the driver after successfully going through bnxt_reinit_after_abort(). Fixes: 6882c36cf82e ("bnxt_en: attempt to reinitialize after aborted reset") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-12bnxt_en: reclaim max resources if sriov enable failsKashyap Desai
If bnxt_sriov_enable() fails after some resources have been reserved for the VFs, the current code is not unwinding properly and the reserved resources become unavailable afterwards. Fix it by properly unwinding with a call to bnxt_hwrm_func_qcaps() to reset all maximum resources. Also, add the missing bnxt_ulp_sriov_cfg() call to let the RDMA driver know to abort. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-12ice: change devlink code to read NVM in blocksPaul M Stillwell Jr
When creating a snapshot of the NVM the driver needs to read the entire contents from the NVM and store it. The NVM reads are protected by a lock that is shared between the driver and the firmware. If the driver takes too long to read the entire NVM (which can happen on some systems) then the firmware could reclaim the lock and cause subsequent reads from the driver to fail. We could fix this by increasing the timeout that we pass to the firmware, but we could end up in the same situation again if the system is slow. Instead have the driver break the reading of the NVM into blocks that are small enough that we have confidence that the read will complete within the timeout time, but large enough not to cause significant AQ overhead. Fixes: dce730f17825 ("ice: add a devlink region for dumping NVM contents") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-12ice: handle E822 generic device ID in PLDM headerPaul M Stillwell Jr
The driver currently presumes that the record data in the PLDM header of the firmware image will match the device ID of the running device. This is true for E810 devices. It appears that for E822 devices that this is not guaranteed to be true. Fix this by adding a check for the generic E822 device. Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-12net: marvell: prestera: fix missed deinit sequenceYevhen Orlov
Add unregister_fib_notifier as rollback of register_fib_notifier. Fixes: 4394fbcb78cf ("net: marvell: prestera: handle fib notifications") Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Link: https://lore.kernel.org/r/20220710122021.7642-1-yevhen.orlov@plvision.eu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-09nfp: fix issue of skb segments exceeds descriptor limitationBaowen Zheng
TCP packets will be dropped if the segments number in the tx skb exceeds limitation when sending iperf3 traffic with --zerocopy option. we make the following changes: Get nr_frags in nfp_nfdk_tx_maybe_close_block instead of passing from outside because it will be changed after skb_linearize operation. Fill maximum dma_len in first tx descriptor to make sure the whole head is included in the first descriptor. Fixes: c10d12e3dce8 ("nfp: add support for NFDK data path") Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-07net: ocelot: fix wrong time_after usagePavel Skripkin
Accidentally noticed, that this driver is the only user of while (time_after(jiffies...)). It looks like typo, because likely this while loop will finish after 1st iteration, because time_after() returns true when 1st argument _is after_ 2nd one. There is one possible problem with this poll loop: the scheduler could put the thread to sleep, and it does not get woken up for OCELOT_FDMA_CH_SAFE_TIMEOUT_US. During that time, the hardware has done its thing, but you exit the while loop and return -ETIMEDOUT. Fix it by using sane poll API that avoids all problems described above Fixes: 753a026cfec1 ("net: ocelot: add FDMA support") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220706132845.27968-1-paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07Merge tag 'mlx5-fixes-2022-07-06' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-07-06 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2022-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Ring the TX doorbell on DMA errors net/mlx5e: Fix capability check for updating vnic env counters net/mlx5e: CT: Use own workqueue instead of mlx5e priv net/mlx5: Lag, correct get the port select mode str net/mlx5e: Fix enabling sriov while tc nic rules are offloaded net/mlx5e: kTLS, Fix build time constant test in RX net/mlx5e: kTLS, Fix build time constant test in TX net/mlx5: Lag, decouple FDB selection and shared FDB net/mlx5: TC, allow offload from uplink to other PF's VF ==================== Link: https://lore.kernel.org/r/20220706231309.38579-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07net: ethernet: ti: am65-cpsw: Fix devlink port register sequenceSiddharth Vadapalli
Renaming interfaces using udevd depends on the interface being registered before its netdev is registered. Otherwise, udevd reads an empty phys_port_name value, resulting in the interface not being renamed. Fix this by registering the interface before registering its netdev by invoking am65_cpsw_nuss_register_devlink() before invoking register_netdev() for the interface. Move the function call to devlink_port_type_eth_set(), invoking it after register_netdev() is invoked, to ensure that netlink notification for the port state change is generated after the netdev is completely initialized. Fixes: 58356eb31d60 ("net: ti: am65-cpsw-nuss: Add devlink support") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://lore.kernel.org/r/20220706070208.12207-1-s-vadapalli@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07net: stmmac: dwc-qos: Disable split header for Tegra194Jon Hunter
There is a long-standing issue with the Synopsys DWC Ethernet driver for Tegra194 where random system crashes have been observed [0]. The problem occurs when the split header feature is enabled in the stmmac driver. In the bad case, a larger than expected buffer length is received and causes the calculation of the total buffer length to overflow. This results in a very large buffer length that causes the kernel to crash. Why this larger buffer length is received is not clear, however, the feedback from the NVIDIA design team is that the split header feature is not supported for Tegra194. Therefore, disable split header support for Tegra194 to prevent these random crashes from occurring. [0] https://lore.kernel.org/linux-tegra/b0b17697-f23e-8fa5-3757-604a86f3a095@nvidia.com/ Fixes: 67afd6d1cfdf ("net: stmmac: Add Split Header support and enable it in XGMAC cores") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/r/20220706083913.13750-1-jonathanh@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06r8169: fix accessing unset transport headerHeiner Kallweit
66e4c8d95008 ("net: warn if transport header was not set") added a check that triggers a warning in r8169, see [0]. The commit referenced in the Fixes tag refers to the change from which the patch applies cleanly, there's nothing wrong with this commit. It seems the actual issue (not bug, because the warning is harmless here) was introduced with bdfa4ed68187 ("r8169: use Giant Send"). [0] https://bugzilla.kernel.org/show_bug.cgi?id=216157 Fixes: 8d520b4de3ed ("r8169: work around RTL8125 UDP hw bug") Reported-by: Erhard F. <erhard_f@mailbox.org> Tested-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/1b2c2b29-3dc0-f7b6-5694-97ec526d51a0@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06net/mlx5e: Ring the TX doorbell on DMA errorsMaxim Mikityanskiy
TX doorbells may be postponed, because sometimes the driver knows that another packet follows (for example, when xmit_more is true, or when a MPWQE session is closed before transmitting a packet). However, the DMA mapping may fail for the next packet, in which case a new WQE is not posted, the doorbell isn't updated either, and the transmission of the previous packet will be delayed indefinitely. This commit fixes the described rare error flow by posting a NOP and ringing the doorbell on errors to flush all the previous packets. The MPWQE session is closed before that. DMA mapping in the MPWQE flow is moved to the beginning of mlx5e_sq_xmit_mpwqe, because empty sessions are not allowed. Stop room always has enough space for a NOP, because the actual TX WQE is not posted. Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: Fix capability check for updating vnic env countersGal Pressman
The existing capability check for vnic env counters only checks for receive steering discards, although we need the counters update for the exposed internal queue oob counter as well. This could result in the latter counter not being updated correctly when the receive steering discards counter is not supported. Fix that by checking whether any counter is supported instead of only the steering counter capability. Fixes: 0cfafd4b4ddf ("net/mlx5e: Add device out of buffer counter") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: CT: Use own workqueue instead of mlx5e privRoi Dayan
Allocate a ct priv workqueue instead of using mlx5e priv one so flushing will only be of related CT entries. Also move flushing of the workqueue before rhashtable destroy otherwise entries won't be valid. Fixes: b069e14fff46 ("net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5: Lag, correct get the port select mode strLiu, Changcheng
mode & mode_flags is updated at the end of mlx5_activate_lag which may not reflect the actual mode as shown in below logic: mlx5_activate_lag(struct mlx5_lag *ldev, |-- unsigned long flags = 0; |-- err = mlx5_lag_set_flags(ldev, mode, tracker, shared_fdb, &flags); |-- err = mlx5_create_lag(ldev, tracker, mode, flags); |-- mlx5_get_str_port_sel_mode(ldev); |-- ldev->mode = mode; |-- ldev->mode_flags = flags; Use mode & flag as parameters to get port select mode info. Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode") Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: Fix enabling sriov while tc nic rules are offloadedPaul Blakey
There is a total of four 4M entries flow tables. In sriov disabled mode, ct, ct_nat and post_act take three of them. When adding the first tc nic rule in this mode, it will take another 4M table for the tc <chain,prio> table. If user then enables sriov, the legacy flow table tries to take another 4M and fails, and so enablement fails. To fix that, have legacy fdb take the next available maximum size from the fs ft pool. Fixes: 4a98544d1827 ("net/mlx5: Move chains ft pool to be used by all firmware steering") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: kTLS, Fix build time constant test in RXTariq Toukan
Use the correct constant (TLS_DRIVER_STATE_SIZE_RX) in the comparison against the size of the private RX TLS driver context. Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: kTLS, Fix build time constant test in TXTariq Toukan
Use the correct constant (TLS_DRIVER_STATE_SIZE_TX) in the comparison against the size of the private TX TLS driver context. Fixes: df8d866770f9 ("net/mlx5e: kTLS, Use kernel API to extract private offload context") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>