Age | Commit message (Collapse) | Author |
|
Cross-merge networking fixes after downstream PR (net-6.12-rc7).
Conflicts:
drivers/net/ethernet/freescale/enetc/enetc_pf.c
e15c5506dd39 ("net: enetc: allocate vf_state during PF probes")
3774409fd4c6 ("net: enetc: build enetc_pf_common.c as a separate module")
https://lore.kernel.org/20241105114100.118bd35e@canb.auug.org.au
Adjacent changes:
drivers/net/ethernet/ti/am65-cpsw-nuss.c
de794169cf17 ("net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7")
4a7b2ba94a59 ("net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a partial revert to commit 76a0a3f9cc2f ("e1000e: fix force smbus
during suspend flow"). That commit fixed a sporadic PHY access issue but
introduced a regression in runtime suspend flows.
The original issue on Meteor Lake systems was rare in terms of the
reproduction rate and the number of the systems affected.
After the integration of commit 0a6ad4d9e169 ("e1000e: avoid failing the
system during pm_suspend"), PHY access loss can no longer cause a
system-level suspend failure. As it only occurs when the LAN cable is
disconnected, and is recovered during system resume flow. Therefore, its
functional impact is low, and the priority is given to stabilizing
runtime suspend.
Fixes: 76a0a3f9cc2f ("e1000e: fix force smbus during suspend flow")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix a race condition in the i40e driver that leads to MAC/VLAN filters
becoming corrupted and leaking. Address the issue that occurs under
heavy load when multiple threads are concurrently modifying MAC/VLAN
filters by setting mac and port VLAN.
1. Thread T0 allocates a filter in i40e_add_filter() within
i40e_ndo_set_vf_port_vlan().
2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
i40e_ndo_set_vf_mac().
3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
refers to the already freed filter memory, causing corruption.
Reproduction steps:
1. Spawn multiple VFs.
2. Apply a concurrent heavy load by running parallel operations to change
MAC addresses on the VFs and change port VLANs on the host.
3. Observe errors in dmesg:
"Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
please set promiscuous on manually for VF XX".
Exact code for stable reproduction Intel can't open-source now.
The fix involves implementing a new intermediate filter state,
I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list.
These filters cannot be deleted from the hash list directly but
must be removed using the full process.
Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In an event where the platform running the device control plane
is rebooted, reset is detected on the driver. It releases
all the resources and waits for the reset to complete. Once the
reset is done, it tries to build the resources back. At this
time if the device control plane is not yet started, then
the driver timeouts on the virtchnl message and retries to
establish the mailbox again.
In the retry flow, mailbox is deinitialized but the mailbox
workqueue is still alive and polling for the mailbox message.
This results in accessing the released control queue leading to
null-ptr-deref. Fix it by unrolling the work queue cancellation
and mailbox deinitialization in the reverse order which they got
initialized.
Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request")
Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Cc: stable@vger.kernel.org # 6.9+
Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When the device control plane is removed or the platform
running device control plane is rebooted, a reset is detected
on the driver. On driver reset, it releases the resources and
waits for the reset to complete. If the reset fails, it takes
the error path and releases the vport lock. At this time if the
monitoring tools tries to access link settings, it call traces
for accessing released vport pointer.
To avoid it, move link_speed_mbps to netdev_priv structure
which removes the dependency on vport pointer and the vport lock
in idpf_get_link_ksettings. Also use netif_carrier_ok()
to check the link status and adjust the offsetof to use link_up
instead of link_speed_mbps.
Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks")
Cc: stable@vger.kernel.org # 6.7+
Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix Flow Director not allowing to re-map traffic to 0th queue when action
is configured to drop (and vice versa).
The current implementation of ethtool callback in the ice driver forbids
change Flow Director action from 0 to -1 and from -1 to 0 with an error,
e.g:
# ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action 0
# ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action -1
rmgr: Cannot insert RX class rule: Invalid argument
We set the value of `u16 q_index = 0` at the beginning of the function
ice_set_fdir_input_set(). In case of "drop traffic" action (which is
equal to -1 in ethtool) we store the 0 value. Later, when want to change
traffic rule to redirect to queue with index 0 it returns an error
caused by duplicate found.
Fix this behaviour by change of the type of field `q_index` from u16 to s16
in `struct ice_fdir_fltr`. This allows to store -1 in the field in case
of "drop traffic" action. What is more, change the variable type in the
function ice_set_fdir_input_set() and assign at the beginning the new
`#define ICE_FDIR_NO_QUEUE_IDX` which is -1. Later, if the action is set
to another value (point specific queue index) the variable value is
overwritten in the function.
Fixes: cac2a27cd9ab ("ice: Support IPv4 Flow Director filters")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Unloading the ice driver while switchdev port representors are added to
a bridge can lead to kernel panic. Reproducer:
modprobe ice
devlink dev eswitch set $PF1_PCI mode switchdev
ip link add $BR type bridge
ip link set $BR up
echo 2 > /sys/class/net/$PF1/device/sriov_numvfs
sleep 2
ip link set $PF1 master $BR
ip link set $VF1_PR master $BR
ip link set $VF2_PR master $BR
ip link set $PF1 up
ip link set $VF1_PR up
ip link set $VF2_PR up
ip link set $VF1 up
rmmod irdma ice
When unloading the driver, ice_eswitch_detach() is eventually called as
part of VF freeing. First, it removes a port representor from xarray,
then unregister_netdev() is called (via repr->ops.rem()), finally
representor is deallocated. The problem comes from the bridge doing its
own deinit at the same time. unregister_netdev() triggers a notifier
chain, resulting in ice_eswitch_br_port_deinit() being called. It should
set repr->br_port = NULL, but this does not happen since repr has
already been removed from xarray and is not found. Regardless, it
finishes up deallocating br_port. At this point, repr is still not freed
and an fdb event can happen, in which ice_eswitch_br_fdb_event_work()
takes repr->br_port and tries to use it, which causes a panic (use after
free).
Note that this only happens with 2 or more port representors added to
the bridge, since with only one representor port, the bridge deinit is
slightly different (ice_eswitch_br_port_deinit() is called via
ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()).
Trace:
Oops: general protection fault, probably for non-canonical address 0xf129010fd1a93284: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0x8948287e8d499420-0x8948287e8d499427]
(...)
Workqueue: ice_bridge_wq ice_eswitch_br_fdb_event_work [ice]
RIP: 0010:__rht_bucket_nested+0xb4/0x180
(...)
Call Trace:
(...)
ice_eswitch_br_fdb_find+0x3fa/0x550 [ice]
? __pfx_ice_eswitch_br_fdb_find+0x10/0x10 [ice]
ice_eswitch_br_fdb_event_work+0x2de/0x1e60 [ice]
? __schedule+0xf60/0x5210
? mutex_lock+0x91/0xe0
? __pfx_ice_eswitch_br_fdb_event_work+0x10/0x10 [ice]
? ice_eswitch_br_update_work+0x1f4/0x310 [ice]
(...)
A workaround is available: brctl setageing $BR 0, which stops the bridge
from adding fdb entries altogether.
Change the order of operations in ice_eswitch_detach(): move the call to
unregister_netdev() before removing repr from xarray. This way
repr->br_port will be correctly set to NULL in
ice_eswitch_br_port_deinit(), preventing a panic.
Fixes: fff292b47ac1 ("ice: add VF representors one by one")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either witing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.
In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
|4ead7: lea 0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
|4eade: mov %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
|4eae3: mov %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
0.15 |4eae8: mov %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
94.16 |4eaed: push 0x10(%rsp)
2.79 |4eaf1: push 0x10(%rsp)
0.07 |4eaf5: push %rax
// Call net_dim()
0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b>
To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.12-rc6).
Conflicts:
drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
cbe84e9ad5e2 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd")
188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links")
https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/
net/mac80211/cfg.c
c4382d5ca1af ("wifi: mac80211: update the right link for tx power")
8dd0498983ee ("wifi: mac80211: Fix setting txpower with emulate_chanctx")
drivers/net/ethernet/intel/ice/ice_ptp_hw.h
6e58c3310622 ("ice: fix crash on probe for DPLL enabled E810 LOM")
e4291b64e118 ("ice: Align E810T GPIO to other products")
ebb2693f8fbd ("ice: Read SDP section from NVM for pin definitions")
ac532f4f4251 ("ice: Cleanup unused declarations")
https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The E810 Lan On Motherboard (LOM) design is vendor specific. Intel
provides the reference design, but it is up to vendor on the final
product design. For some cases, like Linux DPLL support, the static
values defined in the driver does not reflect the actual LOM design.
Current implementation of dpll pins is causing the crash on probe
of the ice driver for such DPLL enabled E810 LOM designs:
WARNING: (...) at drivers/dpll/dpll_core.c:495 dpll_pin_get+0x2c4/0x330
...
Call Trace:
<TASK>
? __warn+0x83/0x130
? dpll_pin_get+0x2c4/0x330
? report_bug+0x1b7/0x1d0
? handle_bug+0x42/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? dpll_pin_get+0x117/0x330
? dpll_pin_get+0x2c4/0x330
? dpll_pin_get+0x117/0x330
ice_dpll_get_pins.isra.0+0x52/0xe0 [ice]
...
The number of dpll pins enabled by LOM vendor is greater than expected
and defined in the driver for Intel designed NICs, which causes the crash.
Prevent the crash and allow generic pin initialization within Linux DPLL
subsystem for DPLL enabled E810 LOM designs.
Newly designed solution for described issue will be based on "per HW
design" pin initialization. It requires pin information dynamically
acquired from the firmware and is already in progress, planned for
next-tree only.
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Reviewed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
There is no support for SF in legacy mode. Reflect it in the code.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Fixes: eda69d654c7e ("ice: add basic devlink subfunctions support")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
During testing of SR-IOV, Red Hat QE encountered an issue where the
ip link up command intermittently fails for the igbvf interfaces when
using the PREEMPT_RT variant. Investigation revealed that
e1000_write_posted_mbx returns an error due to the lack of an ACK
from e1000_poll_for_ack.
The underlying issue arises from the fact that IRQs are threaded by
default under PREEMPT_RT. While the exact hardware details are not
available, it appears that the IRQ handled by igb_msix_other must
be processed before e1000_poll_for_ack times out. However,
e1000_write_posted_mbx is called with preemption disabled, leading
to a scenario where the IRQ is serviced only after the failure of
e1000_write_posted_mbx.
To resolve this, we set IRQF_NO_THREAD for the affected interrupt,
ensuring that the kernel handles it immediately, thereby preventing
the aforementioned error.
Reproducer:
#!/bin/bash
# echo 2 > /sys/class/net/ens14f0/device/sriov_numvfs
ipaddr_vlan=3
nic_test=ens14f0
vf=${nic_test}v0
while true; do
ip link set ${nic_test} mtu 1500
ip link set ${vf} mtu 1500
ip link set $vf up
ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
if ! ip link show $vf | grep 'state UP'; then
echo 'Error found'
break
fi
ip link set $vf down
done
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Reported-by: Yuying Ma <yuma@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.12-rc3).
No conflicts and no adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Paolo Abeni says:
====================
net: introduce TX H/W shaping API
We have a plurality of shaping-related drivers API, but none flexible
enough to meet existing demand from vendors[1].
This series introduces new device APIs to configure in a flexible way
TX H/W shaping. The new functionalities are exposed via a newly
defined generic netlink interface and include introspection
capabilities. Some self-tests are included, on top of a dummy
netdevsim implementation. Finally a basic implementation for the iavf
driver is provided.
Some usage examples:
* Configure shaping on a given queue:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
--do set --json '{"ifindex": '$IFINDEX',
"shaper": {"handle":
{"scope": "queue", "id":'$QUEUEID'},
"bw-max": 2000000}}'
* Container B/W sharing
The orchestration infrastructure wants to group the
container-related queues under a RR scheduling and limit the aggregate
bandwidth:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID2'},
"weight": '$W2'}],
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope":"node"},
"bw-max": 10000000}'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}}
Q1 \
\
Q2 -- node 0 ------- netdev
/ (bw-max: 10M)
Q3 /
* Delegation
A containers wants to limit the aggregate B/W bandwidth of 2 of the 3
queues it owns - the starting configuration is the one from the
previous point:
SPEC=Documentation/netlink/specs/net_shaper.yaml
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID2'},
"weight": '$W2'}],
"handle": {"scope": "node"},
"bw-max": 5000000 }'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}}
Q1 -- node 1 --------\
/ (bw-max: 5M) \
Q2 / node 0 ------- netdev
/(bw-max: 10M)
Q3 ------------------/
In a group operation, when prior to the op itself, the leaves have
different parents, the user must specify the parent handle for the
group. I.e., starting from the previous config:
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope": "node"},
"bw-max": 3000000 }'
Netlink error: Invalid argument
nl_len = 96 (80) nl_flags = 0x300 nl_type = 2
error: -22
extack: {'msg': 'All the leaves shapers must have the same old parent'}
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope": "node"},
"parent": {"scope": "node", "id": 1},
"bw-max": 3000000 }
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}}
Q1 -- node 2 ---
/(bw-max:3M)\
Q3 / \
---- node 1 \
/ (bw-max: 5M)\
Q2 node 0 ------- netdev
(bw-max: 10M)
* Cleanup:
Still starting from config 1To delete a single queue shaper
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID3'}}'
Q1 -- node 2 ---
(bw-max:3M)\
\
---- node 1 \
/ (bw-max: 5M)\
Q2 node 0 ------- netdev
(bw-max: 10M)
Deleting a node shaper relinks all its leaves to the node's parent:
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id":2}}'
Q1 ---\
\
node 1----- \
/ (bw-max: 5M)\
Q2----/ node 0 ------- netdev
(bw-max: 10M)
Deleting the last shaper under a node shaper deletes the node, too:
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID1'}}'
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID2'}}'
./tools/net/ynl/cli.py --spec $SPEC --do get --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id": 1}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -2
extack: {'bad-attr': '.handle'}
Such delete recurses on parents that are left over with no leaves:
./tools/net/ynl/cli.py --spec $SPEC --do get --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id": 0}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -2
extack: {'bad-attr': '.handle'}
v8: https://lore.kernel.org/cover.1727704215.git.pabeni@redhat.com
v7: https://lore.kernel.org/cover.1725919039.git.pabeni@redhat.com
v6: https://lore.kernel.org/cover.1725457317.git.pabeni@redhat.com
v5: https://lore.kernel.org/cover.1724944116.git.pabeni@redhat.com
v4: https://lore.kernel.org/cover.1724165948.git.pabeni@redhat.com
v3: https://lore.kernel.org/cover.1722357745.git.pabeni@redhat.com
RFC v2: https://lore.kernel.org/cover.1721851988.git.pabeni@redhat.com
RFC v1: https://lore.kernel.org/cover.1719518113.git.pabeni@redhat.com
====================
Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During driver initialization VF determines QOS capability is allowed
by PF and receives QOS parameters. After which quanta size for queues
is configured which is not configurable and is set to 1KB currently.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/72cbeb9c88d40e557053c57d7531c96bed490576.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement net_shaper_ops support for IAVF. This enables configuration
of rate limiting on per queue basis. Customer intends to enforce
bandwidth limit on Tx traffic steered to the queue by configuring
rate limits on the queue.
To set rate limiting for a queue, update shaper object of given queues
in driver and send VIRTCHNL_OP_CONFIG_QUEUE_BW to PF to update HW
configuration.
Deleting shaper configured for queue is nothing but configuring shaper
with bw_max 0. The PF restores the default rate limiting config
when bw_max is zero.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/5a882cb51998c4c2c3d21fed521498eba1c8f079.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support to configure VF queue rate limit and quanta size.
For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/fddefc2c1ec3ab32b241ce444af401da19e834dd.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for netdev-genl, allowing users to query IRQ, NAPI, and queue
information.
After this patch is applied, note the IRQ assigned to my NIC:
$ cat /proc/interrupts | grep enp0s8 | cut -f1 --delimiter=':'
18
Note the output from the cli:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 513, 'ifindex': 2, 'irq': 18}]
This device supports only 1 rx and 1 tx queue, so querying that:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 513, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 513, 'type': 'tx'}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add support for netdev-genl, allowing users to query IRQ, NAPI, and queue
information.
After this patch is applied, note the IRQs assigned to my NIC:
$ cat /proc/interrupts | grep ens | cut -f1 --delimiter=':'
50
51
52
While e1000e allocates 3 IRQs (RX, TX, and other), it looks like e1000e
only has a single NAPI, so I've associated the NAPI with the RX IRQ (50
on my system, seen above).
Note the output from the cli:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 145, 'ifindex': 2, 'irq': 50}]
This device supports only 1 rx and 1 tx queue. so querying that:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 145, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 145, 'type': 'tx'}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Duplicated register initialization codes exist in e1000_configure_tx()
and e1000_configure_rx().
For example, writel(0, tx_ring->head) writes 0 to tx_ring->head, which
is adapter->hw.hw_addr + E1000_TDH(0).
This initialization is already done in ew32(TDH(0), 0).
ew32(TDH(0), 0) is equivalent to __ew32(hw, E1000_TDH(0), 0). It
executes writel(0, hw->hw_addr + E1000_TDH(0)). Since variable hw is
set to &adapter->hw, it is equal to writel(0, tx_ring->head).
We can remove similar four writel() in e1000_configure_tx() and
e1000_configure_rx().
commit 0845d45e900c ("e1000e: Modify Tx/Rx configurations to avoid
null pointer dereferences in e1000_open") has introduced these
writel(). This commit moved register writing to
e1000_configure_tx/rx(), and as result, it caused duplication in
e1000_configure_tx/rx().
This patch modifies the sequence of register writing, but removing
these writes is safe because the same writes were already there before
the commit.
I also have checked the datasheets [0] [1] and have not found any
description that we need to write RDH, RDT, TDH and TDT registers
twice at initialization. Furthermore, we have tested this patch on an
I219-V device physically.
Link: https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82577-gbe-phy-datasheet.pdf [0]
Link: https://www.intel.com/content/www/us/en/content-details/613460/intel-82583v-gbe-controller-datasheet.html [1]
Tested-by: Kohei Enju <enjuk@amazon.com>
Signed-off-by: Takamitsu Iwai <takamitz@amazon.co.jp>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
e1000_init_function_pointers_82575() is never implemented and used since
commit 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver").
And commit 9835fd7321a6 ("igb: Add new function to read part number from
EEPROM in string format") removed igb_read_part_num() implementation.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
There is no caller and implementation in tree.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Since commit fff292b47ac1 ("ice: add VF representors one by one")
ice_eswitch_configure() is not used anymore.
Commit 1b8f15b64a00 ("ice: refactor filter functions") removed
ice_vsi_cfg_mac_fltr() but leave declaration.
Commit a24b4c6e9aab ("ice: xsk: Do not convert to buff to frame for
XDP_TX") leave ice_xmit_xdp_buff() declaration.
Commit 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C
products") declared ice_phy_cfg_{rx,tx}_offset_eth56g(),
commit a1ffafb0b4a4 ("ice: Support configuring the device to Double
VLAN Mode") declared ice_pkg_buf_get_free_space(), and
commit 8a3a565ff210 ("ice: add admin commands to access cgu
configuration") declared ice_is_pca9575_present(), but all these never
be implemented.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Sporadic issues, such as PHY access loss, have been observed on I219 (19)
devices. It was found that these devices have hardware more closely
related to ADP than MTP and the issues were caused by taking MTP-specific
flows.
Change the MAC and board types of these devices from MTP to ADP to
correctly reflect the LAN hardware, and flows, of these devices.
Fixes: db2d737d63c5 ("e1000e: Separate MTP board type from ADP")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
changed igb_io_error_detected() to ignore non-fatal pcie errors in order
to avoid hung task that can happen when igb_down() is called multiple
times. This caused an issue when processing transient non-fatal errors.
igb_io_resume(), which is called after igb_io_error_detected(), assumes
that device is brought down by igb_io_error_detected() if the interface
is up. This resulted in panic with stacktrace below.
[ T3256] igb 0000:09:00.0 haeth0: igb: haeth0 NIC Link is Down
[ T292] pcieport 0000:00:1c.5: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
[ T292] igb 0000:09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ T292] igb 0000:09:00.0: device [8086:1537] error status/mask=00004000/00000000
[ T292] igb 0000:09:00.0: [14] CmpltTO [ 200.105524,009][ T292] igb 0000:09:00.0: AER: TLP Header: 00000000 00000000 00000000 00000000
[ T292] pcieport 0000:00:1c.5: AER: broadcast error_detected message
[ T292] igb 0000:09:00.0: Non-correctable non-fatal error reported.
[ T292] pcieport 0000:00:1c.5: AER: broadcast mmio_enabled message
[ T292] pcieport 0000:00:1c.5: AER: broadcast resume message
[ T292] ------------[ cut here ]------------
[ T292] kernel BUG at net/core/dev.c:6539!
[ T292] invalid opcode: 0000 [#1] PREEMPT SMP
[ T292] RIP: 0010:napi_enable+0x37/0x40
[ T292] Call Trace:
[ T292] <TASK>
[ T292] ? die+0x33/0x90
[ T292] ? do_trap+0xdc/0x110
[ T292] ? napi_enable+0x37/0x40
[ T292] ? do_error_trap+0x70/0xb0
[ T292] ? napi_enable+0x37/0x40
[ T292] ? napi_enable+0x37/0x40
[ T292] ? exc_invalid_op+0x4e/0x70
[ T292] ? napi_enable+0x37/0x40
[ T292] ? asm_exc_invalid_op+0x16/0x20
[ T292] ? napi_enable+0x37/0x40
[ T292] igb_up+0x41/0x150
[ T292] igb_io_resume+0x25/0x70
[ T292] report_resume+0x54/0x70
[ T292] ? report_frozen_detected+0x20/0x20
[ T292] pci_walk_bus+0x6c/0x90
[ T292] ? aer_print_port_info+0xa0/0xa0
[ T292] pcie_do_recovery+0x22f/0x380
[ T292] aer_process_err_devices+0x110/0x160
[ T292] aer_isr+0x1c1/0x1e0
[ T292] ? disable_irq_nosync+0x10/0x10
[ T292] irq_thread_fn+0x1a/0x60
[ T292] irq_thread+0xe3/0x1a0
[ T292] ? irq_set_affinity_notifier+0x120/0x120
[ T292] ? irq_affinity_notify+0x100/0x100
[ T292] kthread+0xe2/0x110
[ T292] ? kthread_complete_and_exit+0x20/0x20
[ T292] ret_from_fork+0x2d/0x50
[ T292] ? kthread_complete_and_exit+0x20/0x20
[ T292] ret_from_fork_asm+0x11/0x20
[ T292] </TASK>
To fix this issue igb_io_resume() checks if the interface is running and
the device is not down this means igb_io_error_detected() did not bring
the device down and there is no need to bring it up.
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Fixes: 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This patch addresses a macvlan leak issue in the i40e driver caused by
concurrent access to vsi->mac_filter_hash. The leak occurs when multiple
threads attempt to modify the mac_filter_hash simultaneously, leading to
inconsistent state and potential memory leaks.
To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing
vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock),
ensuring atomic operations and preventing concurrent access.
Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in
i40e_add_mac_filter() to help catch similar issues in the future.
Reproduction steps:
1. Spawn VFs and configure port vlan on them.
2. Trigger concurrent macvlan operations (e.g., adding and deleting
portvlan and/or mac filters).
3. Observe the potential memory leak and inconsistent state in the
mac_filter_hash.
This synchronization ensures the integrity of the mac_filter_hash and prevents
the described leak.
Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM")
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add jump targets so that a bit of exception handling can be better reused
at the end of two function implementations.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
We have for some time the assign_bit() API to replace open coded
if (foo)
set_bit(n, bar);
else
clear_bit(n, bar);
Use this API to clean the code. No functional change intended.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The max_frame and rx_buf_len fields of the VSI set the maximum frame size
for packets on the wire, and configure the size of the Rx buffer. In the
hardware, these are per-queue configuration. Most VSI types use a simple
method to determine the size of the buffers for all queues.
However, VFs may potentially configure different values for each queue.
While the Linux iAVF driver does not do this, it is allowed by the virtchnl
interface.
The current virtchnl code simply sets the per-VSI fields inbetween calls to
ice_vsi_cfg_single_rxq(). This technically works, as these fields are only
ever used when programming the Rx ring, and otherwise not checked again.
However, it is confusing to maintain.
The Rx ring also already has an rx_buf_len field in order to access the
buffer length in the hotpath. It also has extra unused bytes in the ring
structure which we can make use of to store the maximum frame size.
Drop the VSI max_frame and rx_buf_len fields. Add max_frame to the Rx ring,
and slightly re-order rx_buf_len to better fit into the gaps in the
structure layout.
Change the ice_vsi_cfg_frame_size function so that it writes to the ring
fields. Call this function once per ring in ice_vsi_cfg_rxqs(). This is
done over calling it inside the ice_vsi_cfg_rxq(), because
ice_vsi_cfg_rxq() is called in the virtchnl flow where the max_frame and
rx_buf_len have already been configured.
Change the accesses for rx_buf_len and max_frame to all point to the ring
structure. This has the added benefit that ice_vsi_cfg_rxq() no longer has
the surprise side effect of updating ring->rx_buf_len based on the VSI
field.
Update the virtchnl ice_vc_cfg_qs_msg() function to set the ring values
directly, and drop references to the removed VSI fields.
This now makes the VF logic clear, as the ring fields are obviously
per-queue. This reduces the required cognitive load when reasoning about
this logic.
Note that removing the VSI fields does leave a 4 byte gap, but the ice_vsi
structure has many gaps, and its layout is not as critical in the hot path.
The structure may benefit from a more thorough repacking, but no attempt
was made in this change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice_vc_cfg_qs_msg() function is used to configure VF queues in response
to a VIRTCHNL_OP_CONFIG_VSI_QUEUES command.
The virtchnl command contains an array of queue pair data for configuring
Tx and Rx queues. This data includes a queue ID. When configuring the
queues, the driver generally uses this queue ID to determine which Tx and
Rx ring to program. However, a handful of places use the index into the
queue pair data from the VF. While most VF implementations appear to send
this data in order, it is not mandated by the virtchnl and it is not
verified that the queue pair data comes in order.
Fix the driver to consistently use the q_idx field instead of the 'i'
iterator value when accessing the rings. For the Rx case, introduce a local
ring variable to keep lines short.
Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
E830 adds hardware support to prevent the VF from overflowing the PF
mailbox with VIRTCHNL messages. E830 will use the hardware feature
(ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf().
To prevent a VF from overflowing the PF, the PF sets the number of
messages per VF that can be in the PF's mailbox queue
(ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF,
the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG
register.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Enable ethtool reset support. Ethtool reset flags are mapped to the
E810 reset type:
PF reset:
$ ethtool --reset <ethX> irq dma filter offload
CORE reset:
$ ethtool --reset <ethX> irq-shared dma-shared filter-shared \
offload-shared ram-shared
GLOBAL reset:
$ ethtool --reset <ethX> irq-shared dma-shared filter-shared \
offload-shared mac-shared phy-shared ram-shared
Calling the same set of flags as in PF reset case on port representor
triggers VF reset.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Increasing MSI-X value on a VF leads to invalid memory operations. This
is caused by not reallocating some arrays.
Reproducer:
modprobe ice
echo 0 > /sys/bus/pci/devices/$PF_PCI/sriov_drivers_autoprobe
echo 1 > /sys/bus/pci/devices/$PF_PCI/sriov_numvfs
echo 17 > /sys/bus/pci/devices/$VF0_PCI/sriov_vf_msix_count
Default MSI-X is 16, so 17 and above triggers this issue.
KASAN reports:
BUG: KASAN: slab-out-of-bounds in ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
Read of size 8 at addr ffff8888b937d180 by task bash/28433
(...)
Call Trace:
(...)
? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
kasan_report+0xed/0x120
? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
ice_vsi_cfg_def+0x3360/0x4770 [ice]
? mutex_unlock+0x83/0xd0
? __pfx_ice_vsi_cfg_def+0x10/0x10 [ice]
? __pfx_ice_remove_vsi_lkup_fltr+0x10/0x10 [ice]
ice_vsi_cfg+0x7f/0x3b0 [ice]
ice_vf_reconfig_vsi+0x114/0x210 [ice]
ice_sriov_set_msix_vec_count+0x3d0/0x960 [ice]
sriov_vf_msix_count_store+0x21c/0x300
(...)
Allocated by task 28201:
(...)
ice_vsi_cfg_def+0x1c8e/0x4770 [ice]
ice_vsi_cfg+0x7f/0x3b0 [ice]
ice_vsi_setup+0x179/0xa30 [ice]
ice_sriov_configure+0xcaa/0x1520 [ice]
sriov_numvfs_store+0x212/0x390
(...)
To fix it, use ice_vsi_rebuild() instead of ice_vf_reconfig_vsi(). This
causes the required arrays to be reallocated taking the new queue count
into account (ice_vsi_realloc_stat_arrays()). Set req_txq and req_rxq
before ice_vsi_rebuild(), so that realloc uses the newly set queue
count.
Additionally, ice_vsi_rebuild() does not remove VSI filters
(ice_fltr_remove_all()), so ice_vf_init_host_cfg() is no longer
necessary.
Reported-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 2a2cb4c6c181 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Triggering the reset while in switchdev mode causes
errors[1]. Rules are already removed by this time
because switch content is flushed in case of the reset.
This means that rules were deleted from HW but SW
still thinks they exist so when we get
SWITCHDEV_FDB_DEL_TO_DEVICE notification we try to
delete not existing rule.
We can avoid these errors by clearing the rules
early in the reset flow before they are removed from HW.
Switchdev API will get notified that the rule was removed
so we won't get SWITCHDEV_FDB_DEL_TO_DEVICE notification.
Remove unnecessary ice_clear_sw_switch_recipes.
[1]
ice 0000:01:00.0: Failed to delete FDB forward rule, err: -2
ice 0000:01:00.0: Failed to delete FDB guard rule, err: -2
Fixes: 7c945a1a8e5f ("ice: Switchdev FDB events support")
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
netif_is_ice() works by checking the pointer to netdev ops. However, it
only checks for the default ice_netdev_ops, not ice_netdev_safe_mode_ops,
so in Safe Mode it always returns false, which is unintuitive. While it
doesn't look like netif_is_ice() is currently being called anywhere in Safe
Mode, this could change and potentially lead to unexpected behaviour.
Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If DDP package is missing or corrupted, the driver should enter Safe Mode.
Instead, an error is returned and probe fails.
To fix this, don't exit init if ice_init_ddp_config() returns an error.
Repro:
* Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg)
* Load ice
Fixes: cc5776fe1832 ("ice: Enable switching default Tx scheduler topology")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The sizeof(struct napi_struct) can change. Don't hardcode the size to
400 bytes and instead use "sizeof(struct napi_struct)".
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20241004105407.73585-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-10-01 (ice)
This series contains updates to ice driver only.
Karol cleans up current PTP GPIO pin handling, fixes minor bugs,
refactors implementation for all products, introduces SDP (Software
Definable Pins) for E825C and implements reading SDP section from NVM
for E810 products.
Sergey replaces multiple aux buses and devices used in the PTP support
code with struct ice_adapter holding the necessary shared data.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Drop auxbus use for PTP to finalize ice_adapter move
ice: Use ice_adapter for PTP shared data instead of auxdev
ice: Initial support for E825C hardware in ice_adapter
ice: Add ice_get_ctrl_ptp() wrapper to simplify the code
ice: Introduce ice_get_phy_model() wrapper
ice: Enable 1PPS out from CGU for E825C products
ice: Read SDP section from NVM for pin definitions
ice: Disable shared pin on E810 on setfunc
ice: Cache perout/extts requests and check flags
ice: Align E810T GPIO to other products
ice: Add SDPs support for E825C
ice: Implement ice_ptp_pin_desc
====================
Link: https://patch.msgid.link/20241001201702.3252954-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-09-30 (ice, idpf)
This series contains updates to ice and idpf drivers:
For ice:
Michal corrects setting of dst VSI on LAN filters and adds clearing of
port VLAN configuration during reset.
Gui-Dong Han corrects failures to decrement refcount in some error
paths.
Przemek resolves a memory leak in ice_init_tx_topology().
Arkadiusz prevents setting of DPLL_PIN_STATE_SELECTABLE to an improper
value.
Dave stops clearing of VLAN tracking bit to allow for VLANs to be properly
restored after reset.
For idpf:
Ahmed sets uninitialized dyn_ctl_intrvl_s value.
Josh corrects use and reporting of mailbox size.
Larysa corrects order of function calls during de-initialization.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: deinit virtchnl transaction manager after vport and vectors
idpf: use actual mbx receive payload length
idpf: fix VF dynamic interrupt ctl register initialization
ice: fix VLAN replay after reset
ice: disallow DPLL_PIN_STATE_SELECTABLE for dpll output pins
ice: fix memleak in ice_init_tx_topology()
ice: clear port vlan config during reset
ice: Fix improper handling of refcount in ice_sriov_set_msix_vec_count()
ice: Fix improper handling of refcount in ice_dpll_init_rclk_pins()
ice: set correct dst VSI in only LAN filters
====================
Link: https://patch.msgid.link/20240930223601.3137464-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
|
|
Drop unused auxbus/auxdev support from the PTP code due to
move to the ice_adapter.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Use struct ice_adapter to hold shared PTP data and control PTP
related actions instead of auxbus. This allows significant code
simplification and faster access to the container fields used in
the PTP support code.
Move the PTP port list to the ice_adapter container to simplify
the code and avoid race conditions which could occur due to the
synchronous nature of the initialization/access and
certain memory saving can be achieved by moving PTP data into
the ice_adapter itself.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Address E825C devices by PCI ID since dual IP core configurations
need 1 ice_adapter for both devices.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add ice_get_ctrl_ptp() wrapper to simplify the PTP support code
in the functions that do not use ctrl_pf directly.
Add the control PF pointer to struct ice_adapter
Rearrange fields in struct ice_adapter
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Introduce ice_get_phy_model() to improve code readability
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Implement configuring 1PPS signal output from CGU. Use maximal amplitude
because Linux PTP pin API does not have any way for user to set signal
level.
This change is necessary for E825C products to properly output any
signal from 1PPS pin.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
PTP pins assignment and their related SDPs (Software Definable Pins) are
currently hardcoded.
Fix that by reading NVM section instead on products supporting this,
which are E810 products.
If SDP section is not defined in NVM, the driver continues to use the
hardcoded table.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Yochai Hagvi <yochai.hagvi@intel.com>
Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When setting a new supported function for a pin on E810, disable other
enabled pin that shares the same GPIO.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Cache original PTP GPIO requests instead of saving each parameter in
internal structures for periodic output or external timestamp request.
Factor out all periodic output register writes from ice_ptp_cfg_clkout
to a separate function to improve readability.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Instead of having separate PTP GPIO implementation for E810T, use
existing one from all other products.
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|