|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_prueth.c
net/mac80211/chan.c
89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link")
87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()")
https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/
net/unix/garbage.c
1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().")
4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
drivers/net/ethernet/ti/icssg/icssg_prueth.c
drivers/net/ethernet/ti/icssg/icssg_common.c
4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()")
e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers)
unconditionally, but b44::rx_buffers is only valid when the
device is up (they get allocated in b44_open(), and deallocated
again in b44_close()); at any other time they are just NULL pointers.
So if you try to change the pause params while the network interface
is disabled/administratively down (which is likely what netifd tries to
do), everything explodes.
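A minimal sketch of the guard, assuming the existing b44 ring helpers
(not the literal upstream diff): only reset the rings when the interface
is actually up.
  /* Sketch only: bp->rx_buffers/tx_buffers exist solely between
   * b44_open() and b44_close(), so skip the ring reset entirely while
   * the interface is down.
   */
  if (netif_running(dev)) {
          b44_halt(bp);
          b44_free_rings(bp);
          b44_init_rings(bp);
          b44_init_hw(bp, B44_FULL_RESET);
  }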
Link: https://github.com/openwrt/openwrt/issues/13789
Fixes: 1da177e4c3f4 (Linux-2.6.12-rc2)
Cc: stable@vger.kernel.org
Reported-by: Peter Münster <pm@a16n.net>
Suggested-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Vaclav Svoboda <svoboda@neng.cz>
Tested-by: Peter Münster <pm@a16n.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Peter Münster <pm@a16n.net>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/87y192oolj.fsf@a16n.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As interrupts are now requested from ravb_probe(), before calling
register_netdev(), ndev->name still contains the template "eth%d",
leading to funny names in /proc/interrupts. E.g. on R-Car E3:
89: 0 0 GICv2 93 Level eth%d:ch22:multi
90: 0 3 GICv2 95 Level eth%d:ch24:emac
91: 0 23484 GICv2 71 Level eth%d:ch0:rx_be
92: 0 0 GICv2 72 Level eth%d:ch1:rx_nc
93: 0 13735 GICv2 89 Level eth%d:ch18:tx_be
94: 0 0 GICv2 90 Level eth%d:ch19:tx_nc
Worse, on platforms with multiple RAVB instances (e.g. R-Car V4H), all
interrupts have similar names.
Fix this by using the device name instead, as is done in several other
drivers:
89: 0 0 GICv2 93 Level e6800000.ethernet:ch22:multi
90: 0 1 GICv2 95 Level e6800000.ethernet:ch24:emac
91: 0 28578 GICv2 71 Level e6800000.ethernet:ch0:rx_be
92: 0 0 GICv2 72 Level e6800000.ethernet:ch1:rx_nc
93: 0 14044 GICv2 89 Level e6800000.ethernet:ch18:tx_be
94: 0 0 GICv2 90 Level e6800000.ethernet:ch19:tx_nc
Rename the local variable dev_name, as it shadows the dev_name()
function, and pre-initialize it, to simplify the code.
Fixes: 32f012b8c01ca9fd ("net: ravb: Move getting/requesting IRQs in the probe() method")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> # on RZ/G3S
Link: https://lore.kernel.org/r/cde67b68adf115b3cf0b44c32334ae00b2fbb321.1713944647.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Clang static checker (scan-build) warning:
drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c:line 2184, column 2
Attempt to free released memory.
npc_mcam_rsrcs_deinit() has already released 'mcam->counters.bmap'.
Delete this redundant kfree() to fix the double-free problem.
Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240424022724.144587-1-suhui@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The CPTS, by design, captures the messageType (Sync, Delay_Req, etc.)
field from the second nibble of the PTP header which is defined in the
PTPv2 (1588-2008) specification. In the PTPv1 (1588-2002) specification
the first two bytes of the PTP header are defined as the versionType
which is always 0x0001. This means that any PTPv1 packets that are
tagged for TX timestamping by the CPTS will have their messageType set
to 0x0 which corresponds to a Sync message type. This causes issues
when a PTPv1 stack is expecting a Delay_Req (messageType: 0x1)
timestamp that never appears.
Fix this by checking if the ptp_class of the timestamped TX packet is
PTP_CLASS_V1 and then matching the PTP sequence ID to the stored
sequence ID in the skb->cb data structure. If the sequence IDs match
and the packet is of type PTPv1 then there is a chance that the
messageType has been incorrectly stored by the CPTS so overwrite the
messageType stored by the CPTS with the messageType from the skb->cb
data structure. This allows the PTPv1 stack to receive TX timestamps
for Delay_Req packets which are necessary to lock onto a PTP Leader.
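Roughly, the TX-timestamp matching becomes the following (sketch;
ptp_class, cpts_seqid, cb_seqid and cb_mtype are illustrative locals,
not the exact am65-cpts identifiers):
  /* For PTPv1 the CPTS-reported messageType always reads as Sync, so
   * once the sequence IDs match, trust the messageType that was stored
   * in skb->cb when the packet was queued for transmission.
   */
  if (ptp_class & PTP_CLASS_V1 && cpts_seqid == cb_seqid)
          mtype = cb_mtype;       /* overwrite the CPTS value */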
Signed-off-by: Jason Reeder <jreeder@ti.com>
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Tested-by: Ed Trexel <ed.trexel@hp.com>
Fixes: f6bd59526ca5 ("net: ethernet: ti: introduce am654 common platform time sync driver")
Link: https://lore.kernel.org/r/20240424071626.32558-1-r-gunasekaran@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over
aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf().
The commit placed this lock acquisition just prior to the acquisition of
the VF configuration lock.
If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK
flag, this could deadlock with ice_vc_cfg_qs_msg() because it always
acquires the locks in the order of the VF configuration lock and then the
LAG mutex.
Lockdep reports this violation almost immediately on creating and then
removing 2 VF:
======================================================
WARNING: possible circular locking dependency detected
6.8.0-rc6 #54 Tainted: G W O
------------------------------------------------------
kworker/60:3/6771 is trying to acquire lock:
ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice]
but task is already holding lock:
ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&pf->lag_mutex){+.+.}-{3:3}:
__lock_acquire+0x4f8/0xb40
lock_acquire+0xd4/0x2d0
__mutex_lock+0x9b/0xbf0
ice_vc_cfg_qs_msg+0x45/0x690 [ice]
ice_vc_process_vf_msg+0x4f5/0x870 [ice]
__ice_clean_ctrlq+0x2b5/0x600 [ice]
ice_service_task+0x2c9/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
kthread+0x104/0x140
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1b/0x30
-> #0 (&vf->cfg_lock){+.+.}-{3:3}:
check_prev_add+0xe2/0xc50
validate_chain+0x558/0x800
__lock_acquire+0x4f8/0xb40
lock_acquire+0xd4/0x2d0
__mutex_lock+0x9b/0xbf0
ice_reset_vf+0x22f/0x4d0 [ice]
ice_process_vflr_event+0x98/0xd0 [ice]
ice_service_task+0x1cc/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
kthread+0x104/0x140
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1b/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&pf->lag_mutex);
lock(&vf->cfg_lock);
lock(&pf->lag_mutex);
lock(&vf->cfg_lock);
*** DEADLOCK ***
4 locks held by kworker/60:3/6771:
#0: ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
#1: ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
#2: ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice]
#3: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]
stack backtrace:
CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G W O 6.8.0-rc6 #54
Hardware name:
Workqueue: ice ice_service_task [ice]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x80
check_noncircular+0x12d/0x150
check_prev_add+0xe2/0xc50
? save_trace+0x59/0x230
? add_chain_cache+0x109/0x450
validate_chain+0x558/0x800
__lock_acquire+0x4f8/0xb40
? lockdep_hardirqs_on+0x7d/0x100
lock_acquire+0xd4/0x2d0
? ice_reset_vf+0x22f/0x4d0 [ice]
? lock_is_held_type+0xc7/0x120
__mutex_lock+0x9b/0xbf0
? ice_reset_vf+0x22f/0x4d0 [ice]
? ice_reset_vf+0x22f/0x4d0 [ice]
? rcu_is_watching+0x11/0x50
? ice_reset_vf+0x22f/0x4d0 [ice]
ice_reset_vf+0x22f/0x4d0 [ice]
? process_one_work+0x176/0x4d0
ice_process_vflr_event+0x98/0xd0 [ice]
ice_service_task+0x1cc/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x104/0x140
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
To avoid deadlock, we must acquire the LAG mutex only after acquiring the
VF configuration lock. Fix ice_reset_vf() to acquire the LAG mutex only
after we either acquire or check that the VF configuration lock is held.
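Schematically, the ordering in ice_reset_vf() becomes (simplified
sketch, not the literal diff):
  /* Take (or verify) the VF configuration lock first, and only then the
   * LAG mutex, matching the order used by ice_vc_cfg_qs_msg().
   */
  if (flags & ICE_VF_RESET_LOCK)
          mutex_lock(&vf->cfg_lock);
  else
          lockdep_assert_held(&vf->cfg_lock);
  mutex_lock(&pf->lag_mutex);
  /* ... perform the reset ... */
  mutex_unlock(&pf->lag_mutex);
  if (flags & ICE_VF_RESET_LOCK)
          mutex_unlock(&vf->cfg_lock);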
Fixes: 9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over aggregate")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240423182723.740401-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The same number of TCs doesn't imply that the underlying TC configs are
the same. The configs could differ due to a difference in the number of
queues in each TC. Add a utility function to determine whether two TC
configs are the same.
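A hedged sketch of such a utility (the function name is illustrative;
the real helper compares the driver's stored mqprio configuration):
  /* Two TC configurations are equal only if both the number of TCs and
   * the per-TC queue counts/offsets match.
   */
  static bool tc_config_is_same(const struct tc_mqprio_qopt *a,
                                const struct tc_mqprio_qopt *b)
  {
          int i;
          if (a->num_tc != b->num_tc)
                  return false;
          for (i = 0; i < a->num_tc; i++)
                  if (a->count[i] != b->count[i] ||
                      a->offset[i] != b->offset[i])
                          return false;
          return true;
  }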
Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf")
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Mineri Bhange <minerix.bhange@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240423182723.740401-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the MFS is set below the default (0x2600), a warning message is
reported like the following:
MFS for port 1 has been set below the default: 600
This message is a bit confusing as the number shown here (600) is in
fact a hex number: 0x600 = 1536.
Without an explicit "0x" prefix, this message reads as if the MFS were
set to 600 bytes.
MFS, like MTU, is usually expressed in decimal.
This commit reports both the current and default MFS values in decimal
so it's less confusing for end-users.
A typical warning message now looks like the following:
MFS for port 1 (1536) has been set below the default (9728)
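I.e. the print becomes something along these lines (illustrative only;
variable names are approximate):
  dev_warn(&pf->pdev->dev,
           "MFS for port %u (%u) has been set below the default (%u)\n",
           port, mfs, default_mfs);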
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set")
Link: https://lore.kernel.org/r/20240423182723.740401-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Issue reported by a customer during SR-IOV testing (call trace below):
When both the i40e and the i40iw driver are loaded, a warning
in check_flush_dependency is triggered. This seems to be because
the i40e driver workqueue is allocated with the WQ_MEM_RECLAIM flag
and the i40iw one is not.
A similar error was encountered on ice and was fixed by removing the
flag. Do the same for i40e.
[Feb 9 09:08] ------------[ cut here ]------------
[ +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is
flushing !WQ_MEM_RECLAIM infiniband:0x0
[ +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966
check_flush_dependency+0x10b/0x120
[ +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq
snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4
nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr
rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma
intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif
isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal
intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core
iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore
ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich
intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad
xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe
drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel
libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror
dm_region_hash dm_log dm_mod fuse
[ +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not
tainted 6.8.0-rc2-Feb-net_dev-Qiueue-00279-gbd43c5687e05 #1
[ +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS
SE5C620.86B.02.01.0013.121520200651 12/15/2020
[ +0.000001] Workqueue: i40e i40e_service_task [i40e]
[ +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120
[ +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48
81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd
ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90
[ +0.000002] RSP: 0018:ffffbd294976bcf8 EFLAGS: 00010282
[ +0.000002] RAX: 0000000000000000 RBX: ffff94d4c483c000 RCX:
0000000000000027
[ +0.000001] RDX: ffff94d47f620bc8 RSI: 0000000000000001 RDI:
ffff94d47f620bc0
[ +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09:
00000000ffff7fff
[ +0.000001] R10: ffffbd294976bb98 R11: ffffffffa0be65e8 R12:
ffff94c5451ea180
[ +0.000001] R13: ffff94c5ab5e8000 R14: ffff94c5c20b6e05 R15:
ffff94c5f1330ab0
[ +0.000001] FS: 0000000000000000(0000) GS:ffff94d47f600000(0000)
knlGS:0000000000000000
[ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000001] CR2: 00007f9e6f1fca70 CR3: 0000000038e20004 CR4:
00000000007706f0
[ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ +0.000001] PKRU: 55555554
[ +0.000001] Call Trace:
[ +0.000001] <TASK>
[ +0.000002] ? __warn+0x80/0x130
[ +0.000003] ? check_flush_dependency+0x10b/0x120
[ +0.000002] ? report_bug+0x195/0x1a0
[ +0.000005] ? handle_bug+0x3c/0x70
[ +0.000003] ? exc_invalid_op+0x14/0x70
[ +0.000002] ? asm_exc_invalid_op+0x16/0x20
[ +0.000006] ? check_flush_dependency+0x10b/0x120
[ +0.000002] ? check_flush_dependency+0x10b/0x120
[ +0.000002] __flush_workqueue+0x126/0x3f0
[ +0.000015] ib_cache_cleanup_one+0x1c/0xe0 [ib_core]
[ +0.000056] __ib_unregister_device+0x6a/0xb0 [ib_core]
[ +0.000023] ib_unregister_device_and_put+0x34/0x50 [ib_core]
[ +0.000020] i40iw_close+0x4b/0x90 [irdma]
[ +0.000022] i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e]
[ +0.000035] i40e_service_task+0x126/0x190 [i40e]
[ +0.000024] process_one_work+0x174/0x340
[ +0.000003] worker_thread+0x27e/0x390
[ +0.000001] ? __pfx_worker_thread+0x10/0x10
[ +0.000002] kthread+0xdf/0x110
[ +0.000002] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork+0x2d/0x50
[ +0.000003] ? __pfx_kthread+0x10/0x10
[ +0.000001] ret_from_fork_asm+0x1b/0x30
[ +0.000004] </TASK>
[ +0.000001] ---[ end trace 0000000000000000 ]---
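The change boils down to dropping the flag at workqueue allocation time,
roughly (sketch; the ice fix did the same):
  /* Allocate the driver workqueue without WQ_MEM_RECLAIM so that
   * flushing non-reclaim workqueues (e.g. infiniband's) from it no
   * longer trips check_flush_dependency().
   */
  i40e_wq = alloc_workqueue("%s", 0, 0, i40e_driver_name);
  if (!i40e_wq)
          return -ENOMEM;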
Fixes: 4d5957cbdecd ("i40e: remove WQ_UNBOUND and the task limit of our workqueue")
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Robert Ganzynkowicz <robert.ganzynkowicz@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240423182723.740401-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5 Rx flow steering and CQE handling enable the driver to update an
skb's md_dst attribute as MACsec when MACsec traffic arrives while the
device is configured for offloading. Advertise this to the core stack so
it can take advantage of this capability.
Cc: stable@vger.kernel.org
Fixes: b7c9400cbc48 ("net/mlx5e: Implement MACsec Rx data path using MACsec skb_metadata_dst")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240423181319.115860-5-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The rx_chn->irq[] array is unsigned int but it should be signed for the
error handling to work. Also if k3_udma_glue_rx_get_irq() returns zero
then we should return -ENXIO instead of success.
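The resulting pattern is roughly (sketch; 'i', 'ret' and the goto label
are assumed locals):
  /* Store the IRQ in a signed int so errors survive, and treat a zero
   * return as -ENXIO rather than success.
   */
  ret = k3_udma_glue_rx_get_irq(rx_chn->rx_chn, i);
  if (ret <= 0) {
          if (!ret)
                  ret = -ENXIO;
          goto fail;
  }
  rx_chn->irq[i] = ret;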
Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://lore.kernel.org/r/05282415-e7f4-42f3-99f8-32fde8f30936@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the hardcoded values used in the calculations for
vnic descriptors and rings with defines. Minor code cleanup.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I added OOM and netpoll discard counters, naively assuming that
the cpr pointer is pointing to a common completion ring.
Turns out that is usually *a* completion ring but not *the*
completion ring which bnapi->cp_ring points to. bnapi->cp_ring
is where the stats are read from, so we end up reporting 0
thru ethtool -S and qstat even though the drop events have happened.
Make 100% sure we're recording statistics in the correct structure.
Fixes: 907fd4a294db ("bnxt: count discards due to memory allocation errors")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240424002148.3937059-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Roman reports a deadlock on unplug of a Thunderbolt docking station
containing an Intel I225 Ethernet adapter.
The root cause is that led_classdev's for LEDs on the adapter are
registered such that they're device-managed by the netdev. That
results in recursive acquisition of the rtnl_lock() mutex on unplug:
When the driver calls unregister_netdev(), it acquires rtnl_lock(),
then frees the device-managed resources. Upon unregistering the LEDs,
netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
which tries to acquire rtnl_lock() again.
Avoid by using non-device-managed LED registration.
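Conceptually, that means swapping the devres-managed registration for
the plain one and unregistering the LEDs explicitly before the netdev is
torn down (sketch; 'adapter' and 'led' are placeholder names):
  /* Register without devres ... */
  err = led_classdev_register(&adapter->pdev->dev, &led->cdev);
  /* ... and tear down explicitly, outside the devres path that runs
   * under rtnl_lock() during unregister_netdev().
   */
  led_classdev_unregister(&led->cdev);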
Stack trace for posterity:
schedule+0x6e/0xf0
schedule_preempt_disabled+0x15/0x20
__mutex_lock+0x2a0/0x750
unregister_netdevice_notifier+0x40/0x150
netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
led_trigger_set+0x102/0x330
led_classdev_unregister+0x4b/0x110
release_nodes+0x3d/0xb0
devres_release_all+0x8b/0xc0
device_del+0x34f/0x3c0
unregister_netdevice_many_notify+0x80b/0xaf0
unregister_netdev+0x7c/0xd0
igc_remove+0xd8/0x1e0 [igc]
pci_device_remove+0x3f/0xb0
Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Reported-by: Roman Lozko <lozko.roma@gmail.com>
Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
Reported-by: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240422204503.225448-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: Support 5 layer Tx scheduler topology
Mateusz Polchlopek says:
For performance reasons there is a need to have support for selectable
Tx scheduler topology. Currently firmware supports only the default
9-layer and 5-layer topology. This patch series enables switch from
default to 5-layer topology, if user decides to opt-in.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Document tx_scheduling_layers parameter
ice: Add tx_scheduling_layers devlink param
ice: Enable switching default Tx scheduler topology
ice: Adjust the VSI/Aggregator layers
ice: Support 5 layer topology
devlink: extend devlink_param *set pointer
====================
Link: https://lore.kernel.org/r/20240422203913.225151-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use flow_rule_is_supp_control_flags() to reject filters with
unsupported control flags.
In case any unsupported control flags are masked,
flow_rule_is_supp_control_flags() sets a NL extended
error message, and we return -EOPNOTSUPP.
Remove FLOW_DIS_FIRST_FRAG specific error message,
and treat it as any other unsupported control flag.
Only compile-tested.
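The check reduces to a single call at parse time, e.g. (sketch;
'supported' is the driver-specific mask of FLOW_DIS_* flags it can
offload):
  struct flow_match_control match;
  flow_rule_match_control(rule, &match);
  /* The helper fills in the extack message when an unsupported control
   * flag is masked.
   */
  if (!flow_rule_is_supp_control_flags(supported, match.mask->flags,
                                       extack))
          return -EOPNOTSUPP;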
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20240422152735.175693-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This driver currently doesn't support any control flags.
Use flow_rule_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Also propagate extack to hclge_get_cls_key_ip(), and convert it to
return error code.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20240422152717.175659-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This driver currently doesn't support any control flags.
Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240422152656.175627-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This driver currently doesn't support any control flags.
Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240422152643.175592-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This driver currently doesn't support any control flags.
Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Link: https://lore.kernel.org/r/20240422152626.175569-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TI's J784S4 SoC supports SGMII mode with CPSW9G instance of the CPSW
Ethernet Switch. Thus, enable it by adding SGMII mode to the
extra_modes member of the "j784s4_cpswxg_pdata" SoC data.
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Link: https://lore.kernel.org/r/20240422124515.887511-1-c-vankar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit e30cef001da259e8df354b813015d0e5acc08740.
Commit 99f4570cfba1 ("clkdev: Update clkdev id usage to allow
for longer names") fixes clk_name exceeding the MAX_DEV_ID limit,
so the reverted commit is no longer needed.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240422084109.3201-2-duanqiangwen@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit c644920ce9220d83e070f575a4df711741c07f07.
When registering the I2C device, txgbe shortened "i2c_designware" to
"i2c_dw", which prevents the I2C device from matching the platform
driver i2c_designware_platform.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240422084109.3201-1-duanqiangwen@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The rehash delayed work is rescheduled with a delay if the number of
credits at the end of the work is non-negative, as this supposedly means
that the migration ended. Otherwise, it is rescheduled immediately.
After "mlxsw: spectrum_acl_tcam: Fix possible use-after-free during
rehash" the above is no longer accurate as a non-negative number of
credits is no longer indicative of the migration being done. It can also
happen if the work encountered an error in which case the migration will
resume the next time the work is scheduled.
The significance of the above is that it is possible for the work to be
pending and associated with hints that were allocated when the migration
started. This leads to the hints being leaked [1] when the work is
canceled while pending as part of ACL region dismantle.
Fix by freeing the hints if they are associated with a work that was
canceled while pending.
Blame the original commit since the reliance on not having a pending
work associated with hints is fragile.
[1]
unreferenced object 0xffff88810e7c3000 (size 256):
comm "kworker/0:16", pid 176, jiffies 4295460353
hex dump (first 32 bytes):
00 30 95 11 81 88 ff ff 61 00 00 00 00 00 00 80 .0......a.......
00 00 61 00 40 00 00 00 00 00 00 00 04 00 00 00 ..a.@...........
backtrace (crc 2544ddb9):
[<00000000cf8cfab3>] kmalloc_trace+0x23f/0x2a0
[<000000004d9a1ad9>] objagg_hints_get+0x42/0x390
[<000000000b143cf3>] mlxsw_sp_acl_erp_rehash_hints_get+0xca/0x400
[<0000000059bdb60a>] mlxsw_sp_acl_tcam_vregion_rehash_work+0x868/0x1160
[<00000000e81fd734>] process_one_work+0x59c/0xf20
[<00000000ceee9e81>] worker_thread+0x799/0x12c0
[<00000000bda6fe39>] kthread+0x246/0x300
[<0000000070056d23>] ret_from_fork+0x34/0x70
[<00000000dea2b93e>] ret_from_fork_asm+0x1a/0x30
Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/0cc12ebb07c4d4c41a1265ee2c28b392ff997a86.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both the function that migrates all the chunks within a region and the
function that migrates all the entries within a chunk call
list_first_entry() on the respective lists without checking that the
lists are not empty. This is incorrect usage of the API, which leads to
the following warning [1].
Fix by returning if the lists are empty as there is nothing to migrate
in this case.
[1]
WARNING: CPU: 0 PID: 6437 at drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:1266 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0>
Modules linked in:
CPU: 0 PID: 6437 Comm: kworker/0:37 Not tainted 6.9.0-rc3-custom-00883-g94a65f079ef6 #39
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0x2c0
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x4a0
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
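The fix amounts to an early return before the list_first_entry() call,
roughly (sketch of the entry-list case; list and struct names as assumed
from the driver):
  /* Nothing to migrate if the list is empty, so bail out instead of
   * calling list_first_entry() on an empty list.
   */
  if (list_empty(&vchunk->ventry_list))
          return 0;
  ventry = list_first_entry(&vchunk->ventry_list,
                            struct mlxsw_sp_acl_tcam_ventry, list);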
Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/4628e9a22d1d84818e28310abbbc498e7bc31bc9.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As previously explained, the rehash delayed work migrates filters from
one region to another. This is done by iterating over all chunks (all
the filters with the same priority) in the region and in each chunk
iterating over all the filters.
When the work runs out of credits it stores the current chunk and entry
as markers in the per-work context so that it would know where to resume
the migration from the next time the work is scheduled.
Upon error, the chunk marker is reset to NULL, but the entry markers are
not reset, despite being relative to it. This can result in migration
being resumed from an entry that does not belong to the chunk being
migrated. In turn, this will eventually lead to a chunk being iterated
over as if it is an entry. Because of how the two structures happen to
be defined, this does not lead to KASAN splats, but to warnings such as
[1].
Fix by creating a helper that resets all the markers and calling it from
all the places that currently only reset the chunk marker. For good
measure, also call it when starting a completely new rehash. Add a
warning to avoid future cases.
[1]
WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0
Modules linked in:
CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G W 6.9.0-rc3-custom-00880-g29e61d91b77b #29
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_afk_encode+0x242/0x2f0
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0
mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290
mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
</TASK>
Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The rehash delayed work migrates filters from one region to another.
This is done by iterating over all chunks (all the filters with the same
priority) in the region and in each chunk iterating over all the
filters.
If the migration fails, the code tries to migrate the filters back to
the old region. However, the rollback itself can also fail in which case
another migration will be erroneously performed. Besides the fact that
this ping pong is not a very good idea, it also creates a problem.
Each virtual chunk references two chunks: The currently used one
('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the
first holds the chunk we want to migrate filters to and the second holds
the chunk we are migrating filters from.
The code currently assumes - but does not verify - that the backup chunk
does not exist (NULL) if the currently used chunk does not reference the
target region. This assumption breaks when we are trying to rollback a
rollback, resulting in the backup chunk being overwritten and leaked
[1].
Fix by not rolling back a failed rollback and add a warning to avoid
future cases.
[1]
WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20
Modules linked in:
CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G W 6.9.0-rc2-custom-00784-gc6a05c468a0b #14
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:parman_destroy+0x17/0x20
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_atcam_region_fini+0x19/0x60
mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0
mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Fixes: 843500518509 ("mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/d5edd4f4503934186ae5cfe268503b16345b4e0f.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the rare cases when the device resources are exhausted it is likely
that the rehash delayed work will fail. An error message will be printed
whenever this happens which can be overwhelming considering the fact
that the work is per-region and that there can be hundreds of regions.
Fix by rate limiting the error message.
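I.e. the failure print switches to the rate-limited variant, along the
lines of (sketch; the message text is illustrative):
  dev_err_ratelimited(mlxsw_sp->bus_info->dev,
                      "Failed to migrate ACL vregion\n");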
Fixes: e5e7962ee5c2 ("mlxsw: spectrum_acl: Implement region migration according to hints")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/c510763b2ebd25e7990d80183feff91cde593145.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The rehash delayed work migrates filters from one region to another
according to the number of available credits.
The migrated from region is destroyed at the end of the work if the
number of credits is non-negative as the assumption is that this is
indicative of migration being complete. This assumption is incorrect as
a non-negative number of credits can also be the result of a failed
migration.
The destruction of a region that still has filters referencing it can
result in a use-after-free [1].
Fix by not destroying the region if migration failed.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
Read of size 8 at addr ffff8881735319e8 by task kworker/0:31/3858
CPU: 0 PID: 3858 Comm: kworker/0:31 Tainted: G W 6.9.0-rc2-custom-00782-gf2275c2157d8 #5
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
Call Trace:
<TASK>
dump_stack_lvl+0xc6/0x120
print_report+0xce/0x670
kasan_report+0xd7/0x110
mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
mlxsw_sp_acl_ctcam_entry_del+0x2e/0x70
mlxsw_sp_acl_atcam_entry_del+0x81/0x210
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3cd/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 174:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc+0x19c/0x360
mlxsw_sp_acl_tcam_region_create+0xdf/0x9c0
mlxsw_sp_acl_tcam_vregion_rehash_work+0x954/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Freed by task 7:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
poison_slab_object+0x102/0x170
__kasan_slab_free+0x14/0x30
kfree+0xc1/0x290
mlxsw_sp_acl_tcam_region_destroy+0x272/0x310
mlxsw_sp_acl_tcam_vregion_rehash_work+0x731/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/3e412b5659ec2310c5c615760dfe5eac18dd7ebd.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The rule activity update delayed work periodically traverses the list of
configured rules and queries their activity from the device.
As part of this task it accesses the entry pointed by 'ventry->entry',
but this entry can be changed concurrently by the rehash delayed work,
leading to a use-after-free [1].
Fix by closing the race and performing the activity query under the
'vregion->lock' mutex.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
Read of size 8 at addr ffff8881054ed808 by task kworker/0:18/181
CPU: 0 PID: 181 Comm: kworker/0:18 Not tainted 6.9.0-rc2-custom-00781-gd5ab772d32f7 #2
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work
Call Trace:
<TASK>
dump_stack_lvl+0xc6/0x120
print_report+0xce/0x670
kasan_report+0xd7/0x110
mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
mlxsw_sp_acl_rule_activity_update_work+0x219/0x400
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 1039:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc+0x19c/0x360
mlxsw_sp_acl_tcam_entry_create+0x7b/0x1f0
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x30d/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Freed by task 1039:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
poison_slab_object+0x102/0x170
__kasan_slab_free+0x14/0x30
kfree+0xc1/0x290
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3d7/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1fcce0a60b231ebeb2515d91022284ba7b4ffe7a.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The purpose of the rehash delayed work is to reduce the number of masks
(eRPs) used by an ACL region as the eRP bank is a global and limited
resource.
This is done in three steps:
1. Creating a new set of masks and a new ACL region which will use the
new masks and to which the existing filters will be migrated to. The
new region is assigned to 'vregion->region' and the region from which
the filters are migrated from is assigned to 'vregion->region2'.
2. Migrating all the filters from the old region to the new region.
3. Destroying the old region and setting 'vregion->region2' to NULL.
Only the second step is performed under the 'vregion->lock' mutex,
although its comment says that, among other things, it "Protects
consistency of region, region2 pointers".
This is problematic as the first step can race with filter insertion
from user space that uses 'vregion->region', but under the mutex.
Fix by holding the mutex across the entirety of the delayed work and not
only during the second step.
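Schematically, the delayed work now looks like this (sketch; only the
locking structure is shown):
  static void mlxsw_sp_acl_tcam_vregion_rehash_work(struct work_struct *work)
  {
          struct mlxsw_sp_acl_tcam_vregion *vregion =
                  container_of(work, struct mlxsw_sp_acl_tcam_vregion,
                               rehash.dw.work);
          /* Hold the lock across all three steps (new-region creation,
           * migration, old-region destruction), not only the migration,
           * so concurrent filter insertion sees consistent
           * region/region2 pointers.
           */
          mutex_lock(&vregion->lock);
          /* ... create new region, migrate filters, destroy old region ... */
          mutex_unlock(&vregion->lock);
  }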
Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1ec1d54edf2bad0a369e6b4fa030aba64e1f124b.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Region identifiers can be allocated both when user space tries to insert
a new tc filter and when filters are migrated from one region to another
as part of the rehash delayed work.
There is no lock protecting the bitmap from which these identifiers are
allocated, which is racy and leads to bad parameter errors from the
device's firmware.
Fix by converting the bitmap to IDA which handles its own locking. For
consistency, do the same for the group identifiers that are part of the
same structure.
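With IDA, identifier allocation and release need no external lock, e.g.
(sketch; field names assumed):
  /* IDA does its own locking, unlike the previous bitmap. */
  id = ida_alloc_max(&tcam->used_regions, max_regions - 1, GFP_KERNEL);
  if (id < 0)
          return id;
  region->id = id;
  /* ... */
  ida_free(&tcam->used_regions, region->id);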
Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Reported-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/ce494b7940cadfe84f3e18da7785b51ef5f776e3.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the IAVF driver simply uses dev_alloc_page() + free_page() with
no custom recycling logics, it can easily be switched to using Page
Pool / libeth API instead.
This allows removing all the dancing around headroom, HW buffer
size, and page order. All DMA-for-device handling is now done in the PP
core, and the for-CPU handling in the libeth helper.
Use skb_mark_for_recycle() to bring back the recycling and restore the
performance. Speaking of performance: on par with the baseline and
faster with the PP optimization series applied. But the memory usage for
1500b MTU is now almost 2x lower (x86_64) thanks to allocating a page
every second descriptor.
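On the Rx hotpath this boils down to roughly the following (sketch; the
libeth/PP glue is omitted and local names are illustrative):
  /* The buffer comes from a Page Pool, so after building the skb it
   * only needs to be flagged for recycling; DMA mapping and syncing are
   * owned by the PP core and the libeth helpers.
   */
  skb = napi_build_skb(va, rx_buf->truesize);
  if (unlikely(!skb))
          return NULL;
  skb_mark_for_recycle(skb);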
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Before replacing the Rx buffer management with libie, clean up
&iavf_ring a bit.
There are several fields not used anywhere in the code -- simply remove
them. Move ::tail up to remove a hole. Replace ::arm_wb boolean with
1-bit flag in ::flags to free 1 more byte. Finally, move ::prev_pkt_ctr
out of &iavf_tx_queue_stats -- it doesn't belong there (used for Tx
stall detection). Place it next to the stats on the ring itself to fill
the 4-byte slot.
The result: no holes and all the hot fields fit into the first 64-byte
cacheline.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add a couple intuitive helpers to hide Rx buffer implementation details
in the library and not multiplicate it between drivers. The settings are
sorta optimized for 100G+ NICs, but nothing really HW-specific here.
Use the new page_pool_dev_alloc() to dynamically switch between
split-page and full-page modes depending on MTU, page size, required
headroom etc. For example, on x86_64 with the default driver settings
each page is shared between 2 buffers. Turning on XDP (not in this
series) -> increasing headroom requirement pushes truesize out of 2048
boundary, leading to that each buffer starts getting a full page.
The "ceiling" limit is %PAGE_SIZE, as only order-0 pages are used to
avoid compound overhead. For the above architecture, this means maximum
linear frame size of 3712 w/o XDP.
Note that &libeth_buf_queue is not a complete queue/ring structure for
now, rather a shim, but eventually the libeth-enabled drivers will move
to it, with iavf being the first one.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
As an intermediate step, remove all page splitting/recycling code. Just
always allocate a new page and don't touch its refcount, so that it gets
freed by the core stack later.
Same for the "in-place" recycling, i.e. when an unused buffer gets
assigned to a first needs-refilling descriptor. In some cases, this
was leading to moving up to 63 &iavf_rx_buf structures around the ring
on a per-field basis -- not something wanted on hotpath.
The change allows greatly simplifying certain parts of the code:
Function: add/remove: 0/2 grow/shrink: 0/7 up/down: 0/-744 (-744)
Although the array of &iavf_rx_buf is barely used now and could be
replaced with just page pointer array, don't touch it now to not
complicate replacing it with libie Rx buffer struct later on.
No surprise perf loses up to 30% here, but that regression will
go away once PP lands.
Note that iavf_rx_pg_*() definitions are left to reduce diffstat.
They will be removed with the conversion to Page Pool.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Ever since build_skb() became stable, the old way with allocating an skb
for storing the headers separately, which will be then copied manually,
was slower, less flexible, and thus obsolete.
* It had higher pressure on MM since it actually allocates new pages,
which then get split and refcount-biased (NAPI page cache);
* It implies memcpy() of packet headers (40+ bytes per each frame);
* the actual header length was calculated via eth_get_headlen(), which
invokes Flow Dissector and thus wastes a bunch of CPU cycles;
* XDP makes it even more weird since it requires headroom for long and
also tailroom for some time (since mbuf landed). Take a look at the
ice driver, which is built around work-arounds to make XDP work with
it.
Even on some quite low-end hardware (not a common case for 100G NICs) it
was performing worse.
The only advantage "legacy-rx" had is that it didn't require any
reserved headroom and tailroom. But iavf didn't use this, as it always
splits pages into two halves of 2k, while that save would only be useful
when striding. And again, XDP effectively removes that sole pro.
There's a train of features to land in IAVF soon: Page Pool, XDP, XSk,
multi-buffer etc. Each new one would require adding more and more Danse
Macabre for absolutely no reason, besides making hotpath less and less
effective.
Remove the "feature" with all the related code. This includes at least
one very hot branch (typically hit on each new frame), which was either
always-true or always-false at least for a complete NAPI bulk of 64
frames, the whole private flags cruft, and so on. Some stats:
Function: add/remove: 0/4 grow/shrink: 0/7 up/down: 0/-721 (-721)
RO Data: add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-40 (-40)
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Not a secret there's a ton of code duplication between two and more Intel
ethernet modules.
Before introducing new changes, which would need to be copied over again,
start decoupling the already existing duplicate functionality into a new
module, which will be shared between several Intel Ethernet drivers.
Add the lookup table which converts 8/10-bit hardware packet type into
a parsed bitfield structure for easy checking packet format parameters,
such as payload level, IP version, etc. This is currently used by i40e,
ice and iavf and it's all the same in all three drivers.
The only difference introduced in this implementation is that instead of
defining a 256 (or 1024 in case of ice) element array, add unlikely()
condition to limit the input to 154 (current maximum non-reserved packet
type). There's no reason to waste 600 (or even 3600) bytes only to not
hurt very unlikely exception packets.
The hash computation function now takes payload level directly as a
pkt_hash_type. There are a couple of cases where non-IP ptypes are marked as
L3 payload and in the previous versions their hash level would be 2, not
3. But skb_set_hash() only sees difference between L4 and non-L4, thus
this won't change anything at all.
The module is behind the hidden Kconfig symbol, which the drivers will
select when needed. The exports are behind 'LIBIE' namespace to limit
the scope of the functions.
Note that non-HW-specific symbols will live in yet another module,
libeth. This is done to easily distinguish pretty generic code ready
for reusing by any other vendor and/or for moving the layer up from
the code useful in Intel's 1-100G drivers only.
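The lookup itself is tiny; conceptually (names are illustrative, not
the exact libie API):
  /* Bound the input instead of carrying a 256/1024-entry table; 154 is
   * the highest non-reserved packet type mentioned above.
   */
  static inline struct libie_rx_ptype_parsed libie_parse_rx_ptype(u32 ptype)
  {
          if (unlikely(ptype >= 154))
                  ptype = 0;      /* undefined ptype -> "unparsed" entry */
          return libie_rx_ptype_lut[ptype];
  }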
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add support for tc matchall mirror stats. When a new matchall mirror
rule is added, the baseline stats for that port is saved.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the necessary tc glue to add and delete mirror rules through tc
matchall.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The hardware supports three independent mirroring probes. Each probe can
be configured to mirror rx or tx traffic (direction).
Using tc matchall, it is now possible to add a source port and a monitor
port to a mirror probe. Depending on the mirror direction, rx or tx
traffic from a source port will be mirrored to the monitor port.
A single source port can be a member of multiple mirror probes.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for new tc matchall rules, we add a bit of bookkeeping
code to keep track of them. The rules are identified by the cookie
passed from the tc stack.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for port mirroring support through tc matchall, add the
required register definitions.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].
Un-embed the net_device from the private struct by converting it
into a pointer. Then use the leverage the new alloc_netdev_dummy()
helper to allocate and initialize dummy devices.
[1] https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/
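The conversion pattern is essentially (sketch; 'priv' stands for the
driver's private struct):
  /* Allocate the dummy netdev dynamically instead of embedding
   * struct net_device in the private struct.
   */
  priv->napi_dev = alloc_netdev_dummy(0);
  if (!priv->napi_dev)
          return -ENOMEM;
  /* ... and on teardown: */
  free_netdev(priv->napi_dev);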
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].
Un-embed the net_device from the private struct by converting it
into a pointer. Then leverage the new alloc_netdev_dummy()
helper to allocate and initialize dummy devices.
[1] https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].
Un-embed the net_device from the private struct by converting it
into a pointer. Then leverage the new alloc_netdev_dummy()
helper to allocate and initialize dummy devices.
[1] https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Elad Nachman <enachman@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After a set of recent fixes the stmmac_phy_setup() and
stmmac_reinit_queues() methods have turned to having some duplicated code.
Let's get rid of the duplication by moving the MAC-capabilities
initialization to the PHYLINK MAC-capabilities getter. The getter is
called during each network device interface open/close cycle. So the
MAC-capabilities will be initialized in generic device open procedure and
in case of the Tx/Rx queues re-initialization as the original code
semantics implies.
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since recent commits the stmmac_ops::phylink_get_caps() callback has no
longer been responsible for the phylink MAC capabilities getting, but
merely updates the MAC capabilities in the mac_device_info::link::caps
field. Rename the callback to comply with what the method does now.
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
packets
In the current timestamping mechanism, the am65-cpsw-nuss driver
enables hardware timestamping for all received packets by setting
the TSTAMP_EN bit in the CPTS_CONTROL register, which directs the CPTS
module to timestamp all received packets, followed by passing the
timestamp via DMA descriptors. This mechanism causes the CPSW port to
lock up.
To prevent the port lock-up, don't enable RX packet timestamping by
setting the TSTAMP_EN bit in the CPTS_CONTROL register. The workaround for
timestamping received packets is to utilize the CPTS Event FIFO
that records timestamps corresponding to certain events. The CPTS
module is configured to generate timestamps for Multicast Ethernet,
UDP/IPv4 and UDP/IPv6 PTP packets.
Update supported hwtstamp_rx_filters values for CPSW's timestamping
capability.
Fixes: b1f66a5bee07 ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support")
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
CPTS FIFO
Add a new function "am65_cpts_rx_timestamp()" which checks for PTP
packets from header and timestamps them.
Add another function "am65_cpts_find_rx_ts()" which finds CPTS FIFO
Event to get the timestamp of received PTP packet.
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|