Age | Commit message (Collapse) | Author |
|
Fix an issue when driver incorrectly detects state
of recovery process and erroneously reinitializes interrupts,
which results in a kernel error and call trace message.
The issue was caused by a combination of two factors:
1. Assuming the EMP reset issued after completing
firmware recovery means the whole recovery process is complete.
2. Erroneous reinitialization of interrupt vector after detecting
the above mentioned EMP reset.
Fixes (1) by changing how recovery state change is detected
and (2) by adjusting the conditional expression to ensure using proper
interrupt reinitialization method, depending on the situation.
Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support")
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220715214542.2968762-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In mtk_wed_tx_ring_setup(.., int idx, ..), idx is used as an index here
struct mtk_wed_ring *ring = &dev->tx_ring[idx];
The bounds of idx are checked here
BUG_ON(idx > ARRAY_SIZE(dev->tx_ring));
If idx is the size of the array, it will pass this check and overflow.
So change the check to >= .
Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)")
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20220716214654.1540240-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the SW bridge was trying to add/remove entries to/from HW, the
access to HW was not protected by any lock. In this way, it was
possible to have race conditions.
Fix this by using the lan966x->mac_lock to protect parallel access to HW
for this cases.
Fixes: 25ee9561ec622 ("net: lan966x: More MAC table functionality")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The problem with this spin lock is that it was just protecting the list
of the MAC entries in SW and not also the access to the MAC entries in HW.
Because the access to HW is indirect, then it could happen to have race
conditions.
For example when SW introduced an entry in MAC table and the irq mac is
trying to read something from the MAC.
Update such that also the access to MAC entries in HW is protected by
this lock.
Fixes: 5ccd66e01cbef ("net: lan966x: add support for interrupts from analyzer")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To remove an entry to the MAC table, it is required first to setup the
entry and then issue a command for the MAC to forget the entry.
So if it happens for two threads to remove simultaneously an entry
in MAC table then it would be a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.
Fixes: e18aba8941b40 ("net: lan966x: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To add an entry to the MAC table, it is required first to setup the
entry and then issue a command for the MAC to learn the entry.
So if it happens for two threads to add simultaneously an entry in MAC
table then it would be a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.
Fixes: fc0c3fe7486f2 ("net: lan966x: Add function lan966x_mac_ip_learn()")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the HW deletes an entry in MAC table then it generates an
interrupt. The SW will go through it's own list of MAC entries and if it
is not found then it would notify the listeners about this. The problem
is that when the SW will go through it's own list it would take a spin
lock(lan966x->mac_lock) and when it notifies that the entry is deleted.
But to notify the listeners it taking the rtnl_lock which is illegal.
This is fixed by instead of notifying right away that the entry is
deleted, move the entry on a temp list and once, it checks all the
entries then just notify that the entries from temp list are deleted.
Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adjusted as per packet processor documentation.
This allows to properly match 'indev' for clsact rules.
Fixes: 47327e198d42 ("net: prestera: acl: migrate to new vTCAM api")
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When queue number is > 4, left shift overflows due to 32 bits
integer variable. Mask calculation is wrong for MTL_RXQ_DMA_MAP1.
If CONFIG_UBSAN is enabled, kernel dumps below warning:
[ 10.363842] ==================================================================
[ 10.363882] UBSAN: shift-out-of-bounds in /build/linux-intel-iotg-5.15-8e6Tf4/
linux-intel-iotg-5.15-5.15.0/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:224:12
[ 10.363929] shift exponent 40 is too large for 32-bit type 'unsigned int'
[ 10.363953] CPU: 1 PID: 599 Comm: NetworkManager Not tainted 5.15.0-1003-intel-iotg
[ 10.363956] Hardware name: ADLINK Technology Inc. LEC-EL/LEC-EL, BIOS 0.15.11 12/22/2021
[ 10.363958] Call Trace:
[ 10.363960] <TASK>
[ 10.363963] dump_stack_lvl+0x4a/0x5f
[ 10.363971] dump_stack+0x10/0x12
[ 10.363974] ubsan_epilogue+0x9/0x45
[ 10.363976] __ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
[ 10.363979] ? wake_up_klogd+0x4a/0x50
[ 10.363983] ? vprintk_emit+0x8f/0x240
[ 10.363986] dwmac4_map_mtl_dma.cold+0x42/0x91 [stmmac]
[ 10.364001] stmmac_mtl_configuration+0x1ce/0x7a0 [stmmac]
[ 10.364009] ? dwmac410_dma_init_channel+0x70/0x70 [stmmac]
[ 10.364020] stmmac_hw_setup.cold+0xf/0xb14 [stmmac]
[ 10.364030] ? page_pool_alloc_pages+0x4d/0x70
[ 10.364034] ? stmmac_clear_tx_descriptors+0x6e/0xe0 [stmmac]
[ 10.364042] stmmac_open+0x39e/0x920 [stmmac]
[ 10.364050] __dev_open+0xf0/0x1a0
[ 10.364054] __dev_change_flags+0x188/0x1f0
[ 10.364057] dev_change_flags+0x26/0x60
[ 10.364059] do_setlink+0x908/0xc40
[ 10.364062] ? do_setlink+0xb10/0xc40
[ 10.364064] ? __nla_validate_parse+0x4c/0x1a0
[ 10.364068] __rtnl_newlink+0x597/0xa10
[ 10.364072] ? __nla_reserve+0x41/0x50
[ 10.364074] ? __kmalloc_node_track_caller+0x1d0/0x4d0
[ 10.364079] ? pskb_expand_head+0x75/0x310
[ 10.364082] ? nla_reserve_64bit+0x21/0x40
[ 10.364086] ? skb_free_head+0x65/0x80
[ 10.364089] ? security_sock_rcv_skb+0x2c/0x50
[ 10.364094] ? __cond_resched+0x19/0x30
[ 10.364097] ? kmem_cache_alloc_trace+0x15a/0x420
[ 10.364100] rtnl_newlink+0x49/0x70
This change fixes MTL_RXQ_DMA_MAP1 mask issue and channel/queue
mapping warning.
Fixes: d43042f4da3e ("net: stmmac: mapping mtl rx to dma channel")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216195
Reported-by: Cedric Wassenaar <cedric@bytespeed.nl>
Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using current implementation of polling mode, there is high chances we
will hit into timeout error when running phc2sys. Hence, update the
implementation of hardware crosstimestamping to use the MAC interrupt
service routine instead of polling for TSIS bit in the MAC Timestamp
Interrupt Status register to be set.
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-07-14
This series contains updates to e1000e and igc drivers.
Sasha re-enables GPT clock when exiting s0ix to prevent hardware unit
hang and reverts a workaround for this issue on e1000e.
Lennert Buytenhek restores checks for removed device while accessing
registers to prevent NULL pointer dereferences for igc.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: Reinstate IGC_REMOVED logic and implement it properly
Revert "e1000e: Fix possible HW unit hang after an s0ix exit"
e1000e: Enable GPT clock before sending message to CSME
====================
Link: https://lore.kernel.org/r/20220714175857.933537-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current stmmac driver will prepare/enable ptp_ref clock in
stmmac_init_tstamp_counter().
The stmmac_pltfr_noirq_suspend will disable it once in suspend flow.
But in resume flow,
stmmac_pltfr_noirq_resume --> stmmac_init_tstamp_counter
stmmac_resume --> stmmac_hw_setup --> stmmac_init_ptp --> stmmac_init_tstamp_counter
ptp_ref clock reference counter increases twice, which leads to unbalance
ptp clock when resume back.
Move ptp_ref clock prepare/enable out of stmmac_init_tstamp_counter to fix it.
Fixes: 0735e639f129d ("net: stmmac: skip only stmmac_ptp_register when resume from suspend")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If netif is running when stmmac_dvr_remove is invoked,
the unregister_netdev will call ndo_stop(stmmac_release) and
vlan_kill_rx_filter(stmmac_vlan_rx_kill_vid).
Currently, stmmac_dvr_remove() will disable pm runtime before
unregister_netdev. When stmmac_vlan_rx_kill_vid is invoked,
pm_runtime_resume_and_get in it returns EACCESS error number,
and reports:
dwmac-mediatek 11021000.ethernet eth0: stmmac_dvr_remove: removing driver
dwmac-mediatek 11021000.ethernet eth0: FPE workqueue stop
dwmac-mediatek 11021000.ethernet eth0: failed to kill vid 0081/0
Move the pm_runtime_disable to the end of stmmac_dvr_remove
to fix this issue.
Fixes: 6449520391dfc ("net: stmmac: properly handle with runtime pm in stmmac_dvr_remove()")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The pm_runtime takes care of the clock handling in current
stmmac drivers, and dwmac-mediatek implement the
mediatek_dwmac_clks_config() as the callback for pm_runtime.
Then, stripping duplicated clocks handling in old init()/exit()
to fix clock issue in suspend/resume test.
As to clocks in probe/remove, vendor need symmetric handling to
ensure clocks balance.
Test pass, including suspend/resume and ko insertion/remove.
Fixes: 3186bdad97d5 ("stmmac: dwmac-mediatek: add platform level clocks management")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While reading sysctl_ip_fwd_update_priority, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 432e05d32892 ("net: ipv4: Control SKB reprioritization after forwarding")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While reading sysctl_ip_default_ttl, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
nfp_tun_write_neigh() function will configure a tunnel neighbour when
calling nfp_tun_neigh_event_handler() or nfp_flower_cmsg_process_one_rx()
(with no tunnel neighbour type) from firmware.
When configuring IP on physical port as a tunnel endpoint, no operation
will be performed after receiving the cmsg mentioned above.
Therefore, add a progress to configure tunnel neighbour in this case.
v2: Correct format of fixes tag.
Fixes: f1df7956c11f ("nfp: flower: rework tunnel neighbour configuration")
Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220714081915.148378-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The initially merged version of the igc driver code (via commit
146740f9abc4, "igc: Add support for PF") contained the following
IGC_REMOVED checks in the igc_rd32/wr32() MMIO accessors:
u32 igc_rd32(struct igc_hw *hw, u32 reg)
{
u8 __iomem *hw_addr = READ_ONCE(hw->hw_addr);
u32 value = 0;
if (IGC_REMOVED(hw_addr))
return ~value;
value = readl(&hw_addr[reg]);
/* reads should not return all F's */
if (!(~value) && (!reg || !(~readl(hw_addr))))
hw->hw_addr = NULL;
return value;
}
And:
#define wr32(reg, val) \
do { \
u8 __iomem *hw_addr = READ_ONCE((hw)->hw_addr); \
if (!IGC_REMOVED(hw_addr)) \
writel((val), &hw_addr[(reg)]); \
} while (0)
E.g. igb has similar checks in its MMIO accessors, and has a similar
macro E1000_REMOVED, which is implemented as follows:
#define E1000_REMOVED(h) unlikely(!(h))
These checks serve to detect and take note of an 0xffffffff MMIO read
return from the device, which can be caused by a PCIe link flap or some
other kind of PCI bus error, and to avoid performing MMIO reads and
writes from that point onwards.
However, the IGC_REMOVED macro was not originally implemented:
#ifndef IGC_REMOVED
#define IGC_REMOVED(a) (0)
#endif /* IGC_REMOVED */
This led to the IGC_REMOVED logic to be removed entirely in a
subsequent commit (commit 3c215fb18e70, "igc: remove IGC_REMOVED
function"), with the rationale that such checks matter only for
virtualization and that igc does not support virtualization -- but a
PCIe device can become detached even without virtualization being in
use, and without proper checks, a PCIe bus error affecting an igc
adapter will lead to various NULL pointer dereferences, as the first
access after the error will set hw->hw_addr to NULL, and subsequent
accesses will blindly dereference this now-NULL pointer.
This patch reinstates the IGC_REMOVED checks in igc_rd32/wr32(), and
implements IGC_REMOVED the way it is done for igb, by checking for the
unlikely() case of hw_addr being NULL. This change prevents the oopses
seen when a PCIe link flap occurs on an igc adapter.
Fixes: 146740f9abc4 ("igc: Add support for PF")
Signed-off-by: Lennert Buytenhek <buytenh@arista.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This reverts commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd.
Commit 1866aa0d0d64 ("e1000e: Fix possible HW unit hang after an s0ix
exit") was a workaround for CSME problem to handle messages comes via H2ME
mailbox. This problem has been fixed by patch "e1000e: Enable the GPT
clock before sending message to the CSME".
Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
On corporate (CSME) ADL systems, the Ethernet Controller may stop working
("HW unit hang") after exiting from the s0ix state. The reason is that
CSME misses the message sent by the host. Enabling the dynamic GPT clock
solves this problem. This clock is cleared upon HW initialization.
Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
aq_nic_deinit() has been called while suspending, so we don't have to call
it again on resume.
Actually, call it again leads to another hang issue when resuming from
S3.
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992345] Call Trace:
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992346] <TASK>
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992348] aq_nic_deinit+0xb4/0xd0 [atlantic]
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992356] aq_pm_thaw+0x7f/0x100 [atlantic]
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992362] pci_pm_resume+0x5c/0x90
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992366] ? pci_pm_thaw+0x80/0x80
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992368] dpm_run_callback+0x4e/0x120
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992371] device_resume+0xad/0x200
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992373] async_resume+0x1e/0x40
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992374] async_run_entry_fn+0x33/0x120
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992377] process_one_work+0x220/0x3c0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992380] worker_thread+0x4d/0x3f0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992382] ? process_one_work+0x3c0/0x3c0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992384] kthread+0x12a/0x150
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992386] ? set_kthread_struct+0x40/0x40
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992387] ret_from_fork+0x22/0x30
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992391] </TASK>
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992392] ---[ end trace 1ec8c79604ed5e0d ]---
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992394] PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992397] atlantic 0000:02:00.0: PM: failed to resume async: error -110
Fixes: 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Link: https://lore.kernel.org/r/20220713111224.1535938-2-acelan.kao@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Below commit claims that atlantic NIC requires to reset the device on pm
op, and had set the deep to true for all suspend/resume functions.
commit 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
So, we could remove deep parameter on suspend/resume functions without
any functional change.
Fixes: 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Link: https://lore.kernel.org/r/20220713111224.1535938-1-acelan.kao@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When creating VFs a kernel panic can happen when calling to
efx_ef10_try_update_nic_stats_vf.
When releasing a DMA coherent buffer, sometimes, I don't know in what
specific circumstances, it has to unmap memory with vunmap. It is
disallowed to do that in IRQ context or with BH disabled. Otherwise, we
hit this line in vunmap, causing the crash:
BUG_ON(in_interrupt());
This patch reenables BH to release the buffer.
Log messages when the bug is hit:
kernel BUG at mm/vmalloc.c:2727!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G I --------- --- 5.14.0-119.el9.x86_64 #1
Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
RIP: 0010:vunmap+0x2e/0x30
...skip...
Call Trace:
__iommu_dma_free+0x96/0x100
efx_nic_free_buffer+0x2b/0x40 [sfc]
efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc]
efx_ef10_update_stats_vf+0x18/0x40 [sfc]
efx_start_all+0x15e/0x1d0 [sfc]
efx_net_open+0x5a/0xe0 [sfc]
__dev_open+0xe7/0x1a0
__dev_change_flags+0x1d7/0x240
dev_change_flags+0x21/0x60
...skip...
Fixes: d778819609a2 ("sfc: DMA the VF stats only when requested")
Reported-by: Ma Yuying <yuma@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220713092116.21238-1-ihuguet@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-07-12
This series contains updates to ice driver only.
Paul fixes detection of E822 devices for firmware update and changes NVM
read for snapshot creation to be done in chunks as some systems cannot
read the entire NVM in the allotted time.
====================
Link: https://lore.kernel.org/r/20220712164829.7275-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use after free is detected by kfence when disabling sriov. What was read
after being freed was vf->pci_dev: it was freed from pci_disable_sriov
and later read in efx_ef10_sriov_free_vf_vports, called from
efx_ef10_sriov_free_vf_vswitching.
Set the pointer to NULL at release time to not trying to read it later.
Reproducer and dmesg log (note that kfence doesn't detect it every time):
$ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
$ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224):
efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
efx_ef10_pci_sriov_disable+0x38/0x70 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xfe/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
kfence-#224: 0x00000000edb8ef95-0x00000000671f5ce1, size=2792, cache=kmalloc-4k
allocated by task 6771 on cpu 10 at 3137.860196s:
pci_alloc_dev+0x21/0x60
pci_iov_add_virtfn+0x2a2/0x320
sriov_enable+0x212/0x3e0
efx_ef10_sriov_configure+0x67/0x80 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xba/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
freed by task 6771 on cpu 12 at 3170.991309s:
device_release+0x34/0x90
kobject_cleanup+0x3a/0x130
pci_iov_remove_virtfn+0xd9/0x120
sriov_disable+0x30/0xe0
efx_ef10_pci_sriov_disable+0x57/0x70 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xfe/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 3c5eb87605e85 ("sfc: create vports for VFs and assign random MAC addresses")
Reported-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220712062642.6915-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This driver currently prints the link status using four separate
printk calls, which these days gets presented to the user as four
distinct messages, not exactly ideal:
[ 32.582778] eth0: Link is up using
[ 32.582828] internal
[ 32.582837] transceiver at
[ 32.582888] 100Mb/s, Full Duplex.
Restructure the display_link_mode function to use a single netdev_info
call to present all this information as a single message, which is much
nicer:
[ 33.640143] hme 0000:00:01.1 eth0: Link is up using internal transceiver at 100Mb/s, Full Duplex.
The display_forced_link_mode function has a similar structure, so adjust
it in a similar fashion.
Signed-off-by: Nick Bowler <nbowler@draconx.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These two error paths should clean up before returning.
Fixes: 2bb4b98b60d7 ("net: stmmac: Add Ingenic SoCs MAC support.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In ftgmac100_probe(), we should hold the refernece returned by
of_get_child_by_name() and use it to call of_node_put() for
reference balance.
Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
Signed-off-by: Liang He <windhl@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While reading sysctl_tcp_ecn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The upper 32-bit PHC register is not latched when reading the lower
32-bit PHC register. Current code leaves a small window where we may
not read correct higher order bits if the lower order bits are just about
to wrap around.
This patch fixes this by reading higher order bits twice and makes
sure that final value is correctly paired with its lower 32 bits.
Fixes: 30e96f487f64 ("bnxt_en: Do not read the PTP PHC during chip reset")
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the missing length hint in the TX BD for the XDP transmit path. The
length hint is required on legacy chips.
Also, simplify the code by eliminating the first_buf local variable.
tx_buf contains the same value. The opaque value only needs to be set
on the first BD. Fix this also for correctness.
Fixes: a7559bc8c17c ("bnxt: support transmit and free of aggregation buffers")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the livepatch query fw_target BNXT_FW_SRT_PATCH is
applicable for P5 chips only.
Fixes: 3c4153394e2c ("bnxt_en: implement firmware live patching")
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bnxt_reinit_after_abort() is called during ifup when a previous
FW reset sequence has aborted or a previous ifup has failed after
detecting FW reset. In all cases, it is safe to assume that a
previous FW reset has completed and the driver may not have fully
reinitialized.
Prior to this patch, it is assumed that the
FUNC_DRV_IF_CHANGE_RESP_FLAGS_HOT_FW_RESET_DONE flag will always be
set by the firmware in bnxt_hwrm_if_change(). This may not be true if
the driver has already attempted to register with the firmware. The
firmware may not set the RESET_DONE flag again after the driver has
registered, assuming that the driver has seen the flag already.
Fix it to always go through the FW reset initialization path if
the BNXT_STATE_FW_RESET_DET flag is set. This flag is always set
by the driver after successfully going through bnxt_reinit_after_abort().
Fixes: 6882c36cf82e ("bnxt_en: attempt to reinitialize after aborted reset")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If bnxt_sriov_enable() fails after some resources have been reserved
for the VFs, the current code is not unwinding properly and the
reserved resources become unavailable afterwards. Fix it by
properly unwinding with a call to bnxt_hwrm_func_qcaps() to
reset all maximum resources.
Also, add the missing bnxt_ulp_sriov_cfg() call to let the RDMA
driver know to abort.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When creating a snapshot of the NVM the driver needs to read the entire
contents from the NVM and store it. The NVM reads are protected by a lock
that is shared between the driver and the firmware.
If the driver takes too long to read the entire NVM (which can happen on
some systems) then the firmware could reclaim the lock and cause subsequent
reads from the driver to fail.
We could fix this by increasing the timeout that we pass to the firmware,
but we could end up in the same situation again if the system is slow.
Instead have the driver break the reading of the NVM into blocks that are
small enough that we have confidence that the read will complete within the
timeout time, but large enough not to cause significant AQ overhead.
Fixes: dce730f17825 ("ice: add a devlink region for dumping NVM contents")
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The driver currently presumes that the record data in the PLDM header
of the firmware image will match the device ID of the running device.
This is true for E810 devices. It appears that for E822 devices that
this is not guaranteed to be true.
Fix this by adding a check for the generic E822 device.
Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink")
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add unregister_fib_notifier as rollback of register_fib_notifier.
Fixes: 4394fbcb78cf ("net: marvell: prestera: handle fib notifications")
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Link: https://lore.kernel.org/r/20220710122021.7642-1-yevhen.orlov@plvision.eu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
TCP packets will be dropped if the segments number in the tx skb
exceeds limitation when sending iperf3 traffic with --zerocopy option.
we make the following changes:
Get nr_frags in nfp_nfdk_tx_maybe_close_block instead of passing from
outside because it will be changed after skb_linearize operation.
Fill maximum dma_len in first tx descriptor to make sure the whole
head is included in the first descriptor.
Fixes: c10d12e3dce8 ("nfp: add support for NFDK data path")
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Accidentally noticed, that this driver is the only user of
while (time_after(jiffies...)).
It looks like typo, because likely this while loop will finish after 1st
iteration, because time_after() returns true when 1st argument _is after_
2nd one.
There is one possible problem with this poll loop: the scheduler could put
the thread to sleep, and it does not get woken up for
OCELOT_FDMA_CH_SAFE_TIMEOUT_US. During that time, the hardware has done
its thing, but you exit the while loop and return -ETIMEDOUT.
Fix it by using sane poll API that avoids all problems described above
Fixes: 753a026cfec1 ("net: ocelot: add FDMA support")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220706132845.27968-1-paskripkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2022-07-06
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2022-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Ring the TX doorbell on DMA errors
net/mlx5e: Fix capability check for updating vnic env counters
net/mlx5e: CT: Use own workqueue instead of mlx5e priv
net/mlx5: Lag, correct get the port select mode str
net/mlx5e: Fix enabling sriov while tc nic rules are offloaded
net/mlx5e: kTLS, Fix build time constant test in RX
net/mlx5e: kTLS, Fix build time constant test in TX
net/mlx5: Lag, decouple FDB selection and shared FDB
net/mlx5: TC, allow offload from uplink to other PF's VF
====================
Link: https://lore.kernel.org/r/20220706231309.38579-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Renaming interfaces using udevd depends on the interface being registered
before its netdev is registered. Otherwise, udevd reads an empty
phys_port_name value, resulting in the interface not being renamed.
Fix this by registering the interface before registering its netdev
by invoking am65_cpsw_nuss_register_devlink() before invoking
register_netdev() for the interface.
Move the function call to devlink_port_type_eth_set(), invoking it after
register_netdev() is invoked, to ensure that netlink notification for the
port state change is generated after the netdev is completely initialized.
Fixes: 58356eb31d60 ("net: ti: am65-cpsw-nuss: Add devlink support")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://lore.kernel.org/r/20220706070208.12207-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a long-standing issue with the Synopsys DWC Ethernet driver
for Tegra194 where random system crashes have been observed [0]. The
problem occurs when the split header feature is enabled in the stmmac
driver. In the bad case, a larger than expected buffer length is
received and causes the calculation of the total buffer length to
overflow. This results in a very large buffer length that causes the
kernel to crash. Why this larger buffer length is received is not clear,
however, the feedback from the NVIDIA design team is that the split
header feature is not supported for Tegra194. Therefore, disable split
header support for Tegra194 to prevent these random crashes from
occurring.
[0] https://lore.kernel.org/linux-tegra/b0b17697-f23e-8fa5-3757-604a86f3a095@nvidia.com/
Fixes: 67afd6d1cfdf ("net: stmmac: Add Split Header support and enable it in XGMAC cores")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/20220706083913.13750-1-jonathanh@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
66e4c8d95008 ("net: warn if transport header was not set") added
a check that triggers a warning in r8169, see [0].
The commit referenced in the Fixes tag refers to the change from
which the patch applies cleanly, there's nothing wrong with this
commit. It seems the actual issue (not bug, because the warning
is harmless here) was introduced with bdfa4ed68187
("r8169: use Giant Send").
[0] https://bugzilla.kernel.org/show_bug.cgi?id=216157
Fixes: 8d520b4de3ed ("r8169: work around RTL8125 UDP hw bug")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Tested-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/1b2c2b29-3dc0-f7b6-5694-97ec526d51a0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TX doorbells may be postponed, because sometimes the driver knows that
another packet follows (for example, when xmit_more is true, or when a
MPWQE session is closed before transmitting a packet).
However, the DMA mapping may fail for the next packet, in which case a
new WQE is not posted, the doorbell isn't updated either, and the
transmission of the previous packet will be delayed indefinitely.
This commit fixes the described rare error flow by posting a NOP and
ringing the doorbell on errors to flush all the previous packets. The
MPWQE session is closed before that. DMA mapping in the MPWQE flow is
moved to the beginning of mlx5e_sq_xmit_mpwqe, because empty sessions
are not allowed. Stop room always has enough space for a NOP, because
the actual TX WQE is not posted.
Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The existing capability check for vnic env counters only checks for
receive steering discards, although we need the counters update for the
exposed internal queue oob counter as well. This could result in the
latter counter not being updated correctly when the receive steering
discards counter is not supported.
Fix that by checking whether any counter is supported instead of only
the steering counter capability.
Fixes: 0cfafd4b4ddf ("net/mlx5e: Add device out of buffer counter")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Allocate a ct priv workqueue instead of using mlx5e priv one
so flushing will only be of related CT entries.
Also move flushing of the workqueue before rhashtable destroy
otherwise entries won't be valid.
Fixes: b069e14fff46 ("net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mode & mode_flags is updated at the end of mlx5_activate_lag which
may not reflect the actual mode as shown in below logic:
mlx5_activate_lag(struct mlx5_lag *ldev,
|-- unsigned long flags = 0;
|-- err = mlx5_lag_set_flags(ldev, mode, tracker, shared_fdb, &flags);
|-- err = mlx5_create_lag(ldev, tracker, mode, flags);
|-- mlx5_get_str_port_sel_mode(ldev);
|-- ldev->mode = mode;
|-- ldev->mode_flags = flags;
Use mode & flag as parameters to get port select mode info.
Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode")
Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
There is a total of four 4M entries flow tables. In sriov disabled
mode, ct, ct_nat and post_act take three of them. When adding the
first tc nic rule in this mode, it will take another 4M table
for the tc <chain,prio> table. If user then enables sriov, the legacy
flow table tries to take another 4M and fails, and so enablement fails.
To fix that, have legacy fdb take the next available maximum
size from the fs ft pool.
Fixes: 4a98544d1827 ("net/mlx5: Move chains ft pool to be used by all firmware steering")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use the correct constant (TLS_DRIVER_STATE_SIZE_RX) in the comparison
against the size of the private RX TLS driver context.
Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use the correct constant (TLS_DRIVER_STATE_SIZE_TX) in the comparison
against the size of the private TX TLS driver context.
Fixes: df8d866770f9 ("net/mlx5e: kTLS, Use kernel API to extract private offload context")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|