From bd8c624c0cd59de0032752ba3001c107bba97f7b Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Thu, 31 Mar 2022 09:20:06 -0700 Subject: ice: Clear default forwarding VSI during VSI release VSI is set as default forwarding one when promisc mode is set for PF interface, when PF is switched to switchdev mode or when VF driver asks to enable allmulticast or promisc mode for the VF interface (when vf-true-promisc-support priv flag is off). The third case is buggy because in that case VSI associated with VF remains as default one after VF removal. Reproducer: 1. Create VF echo 1 > sys/class/net/ens7f0/device/sriov_numvfs 2. Enable allmulticast or promisc mode on VF ip link set ens7f0v0 allmulticast on ip link set ens7f0v0 promisc on 3. Delete VF echo 0 > sys/class/net/ens7f0/device/sriov_numvfs 4. Try to enable promisc mode on PF ip link set ens7f0 promisc on Although it looks that promisc mode on PF is enabled the opposite is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC handling first checks if any other VSI is set as default forwarding one and if so the function does not do anything. At this point it is not possible to enable promisc mode on PF without re-probe device. To resolve the issue this patch clear default forwarding VSI during ice_vsi_release() when the VSI to be released is the default one. Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Ivan Vecera Reviewed-by: Michal Swiatkowski Reviewed-by: Maciej Fijalkowski Signed-off-by: Alice Michael Signed-off-by: David S. Miller --- drivers/net/ethernet/intel/ice/ice_lib.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index b897926f817d..6d6233204388 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -2983,6 +2983,8 @@ int ice_vsi_release(struct ice_vsi *vsi) } } + if (ice_is_vsi_dflt_vsi(pf->first_sw, vsi)) + ice_clear_dflt_vsi(pf->first_sw); ice_fltr_remove_all(vsi); ice_rm_vsi_lan_cfg(vsi->port_info, vsi->idx); err = ice_rm_vsi_rdma_cfg(vsi->port_info, vsi->idx); -- cgit From 2c0069f3f91f125b1b2ce66cc6bea8eb134723c3 Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Thu, 31 Mar 2022 09:20:07 -0700 Subject: ice: Fix MAC address setting Commit 2ccc1c1ccc671b ("ice: Remove excess error variables") merged the usage of 'status' and 'err' variables into single one in function ice_set_mac_address(). Unfortunately this causes a regression when call of ice_fltr_add_mac() returns -EEXIST because this return value does not indicate an error in this case but value of 'err' remains to be -EEXIST till the end of the function and is returned to caller. Prior mentioned commit this does not happen because return value of ice_fltr_add_mac() was stored to 'status' variable first and if it was -EEXIST then 'err' remains to be zero. Fix the problem by reset 'err' to zero when ice_fltr_add_mac() returns -EEXIST. Fixes: 2ccc1c1ccc671b ("ice: Remove excess error variables") Signed-off-by: Ivan Vecera Reviewed-by: Jacob Keller Acked-by: Alexander Lobakin Signed-off-by: Alice Michael Signed-off-by: David S. Miller --- drivers/net/ethernet/intel/ice/ice_main.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index b588d7995631..d755ce07869f 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5475,16 +5475,19 @@ static int ice_set_mac_address(struct net_device *netdev, void *pi) /* Add filter for new MAC. If filter exists, return success */ err = ice_fltr_add_mac(vsi, mac, ICE_FWD_TO_VSI); - if (err == -EEXIST) + if (err == -EEXIST) { /* Although this MAC filter is already present in hardware it's * possible in some cases (e.g. bonding) that dev_addr was * modified outside of the driver and needs to be restored back * to this value. */ netdev_dbg(netdev, "filter for MAC %pM already exists\n", mac); - else if (err) + + return 0; + } else if (err) { /* error if the new filter addition failed */ err = -EADDRNOTAVAIL; + } err_update_filters: if (err) { -- cgit From 1273f89578f268ea705ddbad60c4bd2dcff80611 Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Thu, 31 Mar 2022 09:20:08 -0700 Subject: ice: Fix broken IFF_ALLMULTI handling Handling of all-multicast flag and associated multicast promiscuous mode is broken in ice driver. When an user switches allmulticast flag on or off the driver checks whether any VLANs are configured over the interface (except default VLAN 0). If any extra VLANs are registered it enables multicast promiscuous mode for all these VLANs (including default VLAN 0) using ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all multicast packets tagged with known VLAN ID or untagged are received and multicast packets tagged with unknown VLAN ID ignored. If no extra VLANs are registered (so only VLAN 0 exists) it enables multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC look-up type. In this situation any multicast packets including tagged ones are received. The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way: ice_vsi_sync_fltr() { ... if (changed_flags & IFF_ALLMULTI) { if (netdev->flags & IFF_ALLMULTI) { if (vsi->num_vlans > 1) ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS); else ice_set_promisc(..., ICE_MCAST_PROMISC_BITS); } else { if (vsi->num_vlans > 1) ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS); else ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS); } } ... } The code above depends on value vsi->num_vlan that specifies number of VLANs configured over the interface (including VLAN 0) and this is problem because that value is modified in NDO callbacks ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid(). Scenario 1: 1. ip link set ens7f0 allmulticast on 2. ip link add vlan10 link ens7f0 type vlan id 10 3. ip link set ens7f0 allmulticast off 4. ip link set ens7f0 allmulticast on [1] In this scenario IFF_ALLMULTI is enabled and the driver calls ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs multicast promisc rule with non-VLAN look-up type. [2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2 [3] Command switches IFF_ALLMULTI off and the driver calls ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this call is effectively NOP because it looks for multicast promisc rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such rules exist. So the all-multicast remains enabled silently in hardware. [4] Command tries to switch IFF_ALLMULTI on and the driver calls ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call fails (-EEXIST) because non-VLAN multicast promisc rule already exists. Scenario 2: 1. ip link add vlan10 link ens7f0 type vlan id 10 2. ip link set ens7f0 allmulticast on 3. ip link add vlan20 link ens7f0 type vlan id 20 4. ip link del vlan10 ; ip link del vlan20 5. ip link set ens7f0 allmulticast off [1] VLAN with ID 10 is added and vsi->num_vlan==2 [2] Command switches IFF_ALLMULTI on and driver installs multicast promisc rules with VLAN look-up type for VLAN 0 and 10 [3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast promisc rules is added for this new VLAN so the interface does not receive MC packets from VLAN 20 [4] Both VLANs are removed but multicast rule for VLAN 10 remains installed so interface receives multicast packets from VLAN 10 [5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1 the driver tries to remove multicast promisc rule for VLAN 0 with non-VLAN look-up that does not exist. All-multicast looks disabled from user point of view but it is partially enabled in HW (interface receives all multicast packets either untagged or tagged with VLAN ID 10) To resolve these issues the patch introduces these changes: 1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed and IFF_ALLMULTI is enabled an appropriate multicast promisc rule for that VLAN ID is added/removed. 2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up type for existing multicast promisc rule for VLAN 0 is updated to ICE_MCAST_VLAN_PROMISC_BITS. 3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up type for existing multicast promisc rule for VLAN 0 is updated to ICE_MCAST_PROMISC_BITS. 4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY bit protection to avoid races with ice_vsi_sync_fltr() that runs in ice_service_task() context. 5. Bit ICE_VSI_VLAN_FLTR_CHANGED is use-less and can be removed. 6. Error messages added to ice_fltr_*_vsi_promisc() helper functions to avoid them in their callers 7. Small improvements to increase readability Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Ivan Vecera Reviewed-by: Jacob Keller Signed-off-by: Alice Michael Signed-off-by: David S. Miller --- drivers/net/ethernet/intel/ice/ice.h | 1 - drivers/net/ethernet/intel/ice/ice_fltr.c | 44 ++++++++++-- drivers/net/ethernet/intel/ice/ice_main.c | 114 +++++++++++++++++++++--------- 3 files changed, 121 insertions(+), 38 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index d4f1874df7d0..26eaee0b6503 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -301,7 +301,6 @@ enum ice_vsi_state { ICE_VSI_NETDEV_REGISTERED, ICE_VSI_UMAC_FLTR_CHANGED, ICE_VSI_MMAC_FLTR_CHANGED, - ICE_VSI_VLAN_FLTR_CHANGED, ICE_VSI_PROMISC_CHANGED, ICE_VSI_STATE_NBITS /* must be last */ }; diff --git a/drivers/net/ethernet/intel/ice/ice_fltr.c b/drivers/net/ethernet/intel/ice/ice_fltr.c index af57eb114966..85a94483c2ed 100644 --- a/drivers/net/ethernet/intel/ice/ice_fltr.c +++ b/drivers/net/ethernet/intel/ice/ice_fltr.c @@ -58,7 +58,16 @@ int ice_fltr_set_vlan_vsi_promisc(struct ice_hw *hw, struct ice_vsi *vsi, u8 promisc_mask) { - return ice_set_vlan_vsi_promisc(hw, vsi->idx, promisc_mask, false); + struct ice_pf *pf = hw->back; + int result; + + result = ice_set_vlan_vsi_promisc(hw, vsi->idx, promisc_mask, false); + if (result) + dev_err(ice_pf_to_dev(pf), + "Error setting promisc mode on VSI %i (rc=%d)\n", + vsi->vsi_num, result); + + return result; } /** @@ -73,7 +82,16 @@ int ice_fltr_clear_vlan_vsi_promisc(struct ice_hw *hw, struct ice_vsi *vsi, u8 promisc_mask) { - return ice_set_vlan_vsi_promisc(hw, vsi->idx, promisc_mask, true); + struct ice_pf *pf = hw->back; + int result; + + result = ice_set_vlan_vsi_promisc(hw, vsi->idx, promisc_mask, true); + if (result) + dev_err(ice_pf_to_dev(pf), + "Error clearing promisc mode on VSI %i (rc=%d)\n", + vsi->vsi_num, result); + + return result; } /** @@ -87,7 +105,16 @@ int ice_fltr_clear_vsi_promisc(struct ice_hw *hw, u16 vsi_handle, u8 promisc_mask, u16 vid) { - return ice_clear_vsi_promisc(hw, vsi_handle, promisc_mask, vid); + struct ice_pf *pf = hw->back; + int result; + + result = ice_clear_vsi_promisc(hw, vsi_handle, promisc_mask, vid); + if (result) + dev_err(ice_pf_to_dev(pf), + "Error clearing promisc mode on VSI %i for VID %u (rc=%d)\n", + ice_get_hw_vsi_num(hw, vsi_handle), vid, result); + + return result; } /** @@ -101,7 +128,16 @@ int ice_fltr_set_vsi_promisc(struct ice_hw *hw, u16 vsi_handle, u8 promisc_mask, u16 vid) { - return ice_set_vsi_promisc(hw, vsi_handle, promisc_mask, vid); + struct ice_pf *pf = hw->back; + int result; + + result = ice_set_vsi_promisc(hw, vsi_handle, promisc_mask, vid); + if (result) + dev_err(ice_pf_to_dev(pf), + "Error setting promisc mode on VSI %i for VID %u (rc=%d)\n", + ice_get_hw_vsi_num(hw, vsi_handle), vid, result); + + return result; } /** diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index d755ce07869f..1d2ca39add95 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -243,8 +243,7 @@ static int ice_add_mac_to_unsync_list(struct net_device *netdev, const u8 *addr) static bool ice_vsi_fltr_changed(struct ice_vsi *vsi) { return test_bit(ICE_VSI_UMAC_FLTR_CHANGED, vsi->state) || - test_bit(ICE_VSI_MMAC_FLTR_CHANGED, vsi->state) || - test_bit(ICE_VSI_VLAN_FLTR_CHANGED, vsi->state); + test_bit(ICE_VSI_MMAC_FLTR_CHANGED, vsi->state); } /** @@ -260,10 +259,15 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m) if (vsi->type != ICE_VSI_PF) return 0; - if (ice_vsi_has_non_zero_vlans(vsi)) - status = ice_fltr_set_vlan_vsi_promisc(&vsi->back->hw, vsi, promisc_m); - else - status = ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx, promisc_m, 0); + if (ice_vsi_has_non_zero_vlans(vsi)) { + promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX); + status = ice_fltr_set_vlan_vsi_promisc(&vsi->back->hw, vsi, + promisc_m); + } else { + status = ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx, + promisc_m, 0); + } + return status; } @@ -280,10 +284,15 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m) if (vsi->type != ICE_VSI_PF) return 0; - if (ice_vsi_has_non_zero_vlans(vsi)) - status = ice_fltr_clear_vlan_vsi_promisc(&vsi->back->hw, vsi, promisc_m); - else - status = ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx, promisc_m, 0); + if (ice_vsi_has_non_zero_vlans(vsi)) { + promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX); + status = ice_fltr_clear_vlan_vsi_promisc(&vsi->back->hw, vsi, + promisc_m); + } else { + status = ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx, + promisc_m, 0); + } + return status; } @@ -302,7 +311,6 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi) struct ice_pf *pf = vsi->back; struct ice_hw *hw = &pf->hw; u32 changed_flags = 0; - u8 promisc_m; int err; if (!vsi->netdev) @@ -320,7 +328,6 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi) if (ice_vsi_fltr_changed(vsi)) { clear_bit(ICE_VSI_UMAC_FLTR_CHANGED, vsi->state); clear_bit(ICE_VSI_MMAC_FLTR_CHANGED, vsi->state); - clear_bit(ICE_VSI_VLAN_FLTR_CHANGED, vsi->state); /* grab the netdev's addr_list_lock */ netif_addr_lock_bh(netdev); @@ -369,29 +376,15 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi) /* check for changes in promiscuous modes */ if (changed_flags & IFF_ALLMULTI) { if (vsi->current_netdev_flags & IFF_ALLMULTI) { - if (ice_vsi_has_non_zero_vlans(vsi)) - promisc_m = ICE_MCAST_VLAN_PROMISC_BITS; - else - promisc_m = ICE_MCAST_PROMISC_BITS; - - err = ice_set_promisc(vsi, promisc_m); + err = ice_set_promisc(vsi, ICE_MCAST_PROMISC_BITS); if (err) { - netdev_err(netdev, "Error setting Multicast promiscuous mode on VSI %i\n", - vsi->vsi_num); vsi->current_netdev_flags &= ~IFF_ALLMULTI; goto out_promisc; } } else { /* !(vsi->current_netdev_flags & IFF_ALLMULTI) */ - if (ice_vsi_has_non_zero_vlans(vsi)) - promisc_m = ICE_MCAST_VLAN_PROMISC_BITS; - else - promisc_m = ICE_MCAST_PROMISC_BITS; - - err = ice_clear_promisc(vsi, promisc_m); + err = ice_clear_promisc(vsi, ICE_MCAST_PROMISC_BITS); if (err) { - netdev_err(netdev, "Error clearing Multicast promiscuous mode on VSI %i\n", - vsi->vsi_num); vsi->current_netdev_flags |= IFF_ALLMULTI; goto out_promisc; } @@ -3488,6 +3481,20 @@ ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid) if (!vid) return 0; + while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) + usleep_range(1000, 2000); + + /* Add multicast promisc rule for the VLAN ID to be added if + * all-multicast is currently enabled. + */ + if (vsi->current_netdev_flags & IFF_ALLMULTI) { + ret = ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx, + ICE_MCAST_VLAN_PROMISC_BITS, + vid); + if (ret) + goto finish; + } + vlan_ops = ice_get_compat_vsi_vlan_ops(vsi); /* Add a switch rule for this VLAN ID so its corresponding VLAN tagged @@ -3495,8 +3502,23 @@ ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid) */ vlan = ICE_VLAN(be16_to_cpu(proto), vid, 0); ret = vlan_ops->add_vlan(vsi, &vlan); - if (!ret) - set_bit(ICE_VSI_VLAN_FLTR_CHANGED, vsi->state); + if (ret) + goto finish; + + /* If all-multicast is currently enabled and this VLAN ID is only one + * besides VLAN-0 we have to update look-up type of multicast promisc + * rule for VLAN-0 from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN. + */ + if ((vsi->current_netdev_flags & IFF_ALLMULTI) && + ice_vsi_num_non_zero_vlans(vsi) == 1) { + ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx, + ICE_MCAST_PROMISC_BITS, 0); + ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx, + ICE_MCAST_VLAN_PROMISC_BITS, 0); + } + +finish: + clear_bit(ICE_CFG_BUSY, vsi->state); return ret; } @@ -3522,6 +3544,9 @@ ice_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid) if (!vid) return 0; + while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) + usleep_range(1000, 2000); + vlan_ops = ice_get_compat_vsi_vlan_ops(vsi); /* Make sure VLAN delete is successful before updating VLAN @@ -3530,10 +3555,33 @@ ice_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid) vlan = ICE_VLAN(be16_to_cpu(proto), vid, 0); ret = vlan_ops->del_vlan(vsi, &vlan); if (ret) - return ret; + goto finish; - set_bit(ICE_VSI_VLAN_FLTR_CHANGED, vsi->state); - return 0; + /* Remove multicast promisc rule for the removed VLAN ID if + * all-multicast is enabled. + */ + if (vsi->current_netdev_flags & IFF_ALLMULTI) + ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx, + ICE_MCAST_VLAN_PROMISC_BITS, vid); + + if (!ice_vsi_has_non_zero_vlans(vsi)) { + /* Update look-up type of multicast promisc rule for VLAN 0 + * from ICE_SW_LKUP_PROMISC_VLAN to ICE_SW_LKUP_PROMISC when + * all-multicast is enabled and VLAN 0 is the only VLAN rule. + */ + if (vsi->current_netdev_flags & IFF_ALLMULTI) { + ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx, + ICE_MCAST_VLAN_PROMISC_BITS, + 0); + ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx, + ICE_MCAST_PROMISC_BITS, 0); + } + } + +finish: + clear_bit(ICE_CFG_BUSY, vsi->state); + + return ret; } /** -- cgit From ccfee1822042b87e5135d33cad8ea353e64612d2 Mon Sep 17 00:00:00 2001 From: Anatolii Gerasymenko Date: Mon, 4 Apr 2022 11:35:47 -0700 Subject: ice: Set txq_teid to ICE_INVAL_TEID on ring creation When VF is freshly created, but not brought up, ring->txq_teid value is by default set to 0. But 0 is a valid TEID. On some platforms the Root Node of Tx scheduler has a TEID = 0. This can cause issues as shown below. The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF). Testing Hints: echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs ip link set dev ens785f0v0 up ip link set dev ens785f0v0 down If we have freshly created VF and quickly turn it on and off, so there would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error: [ 639.531454] disable queue 89 failed 14 [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5 The reason for the fail is that we are trying to send AQ command to delete queue 89, which has never been created and receive an "invalid argument" error from firmware. As this queue has never been created, it's teid and ring->txq_teid have default value 0. ice_dis_vsi_txq has a check against non-existent queues: node = ice_sched_find_node_by_teid(pi->root, q_teids[i]); if (!node) continue; But on some platforms the Root Node of Tx scheduler has a teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go further to submit an erroneous request to firmware. Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko Reviewed-by: Maciej Fijalkowski Tested-by: Konrad Jankowski Signed-off-by: Alice Michael Signed-off-by: Tony Nguyen Signed-off-by: Paolo Abeni --- drivers/net/ethernet/intel/ice/ice_lib.c | 1 + 1 file changed, 1 insertion(+) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 6d6233204388..2774cbd5b12a 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -1480,6 +1480,7 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi) ring->tx_tstamps = &pf->ptp.port.tx; ring->dev = dev; ring->count = vsi->num_tx_desc; + ring->txq_teid = ICE_INVAL_TEID; if (dvm_ena) ring->flags |= ICE_TX_FLAGS_RING_VLAN_L2TAG2; else -- cgit From 05ef6813b234db3196f083b91db3963f040b65bb Mon Sep 17 00:00:00 2001 From: Anatolii Gerasymenko Date: Mon, 4 Apr 2022 11:35:48 -0700 Subject: ice: Do not skip not enabled queues in ice_vc_dis_qs_msg Disable check for queue being enabled in ice_vc_dis_qs_msg, because there could be a case when queues were created, but were not enabled. We still need to delete those queues. Normal workflow for VF looks like: Enable path: VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10) VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6) VIRTCHNL_OP_ENABLE_QUEUES (opcode 8) Disable path: VIRTCHNL_OP_DISABLE_QUEUES (opcode 9) VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11) The issue appears only in stress conditions when VF is enabled and disabled very fast. Eventually there will be a case, when queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by VIRTCHNL_OP_ENABLE_QUEUES. In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES, because there is a check whether queues are enabled in ice_vc_dis_qs_msg. When we bring up the VF again, we will see the "Failed to set LAN Tx queue context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This happens because old 16 queues were not deleted and VF requests to create 16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to find a parent node for first newly requested queue (because all nodes are allocated to 16 old queues). Testing Hints: Just enable and disable VF fast enough, so it would be disabled before reaching VIRTCHNL_OP_ENABLE_QUEUES. while true; do ip link set dev ens785f0v0 up sleep 0.065 # adjust delay value for you machine ip link set dev ens785f0v0 down done Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Anatolii Gerasymenko Tested-by: Konrad Jankowski Signed-off-by: Alice Michael Signed-off-by: Tony Nguyen Signed-off-by: Paolo Abeni --- drivers/net/ethernet/intel/ice/ice_virtchnl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index 3f1a63815bac..69ff4b929772 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -1358,9 +1358,9 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) goto error_param; } - /* Skip queue if not enabled */ if (!test_bit(vf_q_id, vf->txq_ena)) - continue; + dev_dbg(ice_pf_to_dev(vsi->back), "Queue %u on VSI %u is not enabled, but stopping it anyway\n", + vf_q_id, vsi->vsi_num); ice_fill_txq_meta(vsi, ring, &txq_meta); -- cgit From f9124c68f05ffdb87a47e3ea6d5fae9dad7cb6eb Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Thu, 17 Mar 2022 19:36:27 +0100 Subject: ice: synchronize_rcu() when terminating rings Unfortunately, the ice driver doesn't respect the RCU critical section that XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to paths that destroy resources that might be in use. This was addressed in other AF_XDP ZC enabled drivers, for reference see for example commit b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK") Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski Tested-by: Shwetha Nagaraju Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice.h | 2 +- drivers/net/ethernet/intel/ice/ice_main.c | 4 +++- drivers/net/ethernet/intel/ice/ice_xsk.c | 4 +++- 3 files changed, 7 insertions(+), 3 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 26eaee0b6503..8ed3c9ab7ff7 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -671,7 +671,7 @@ static inline struct ice_pf *ice_netdev_to_pf(struct net_device *netdev) static inline bool ice_is_xdp_ena_vsi(struct ice_vsi *vsi) { - return !!vsi->xdp_prog; + return !!READ_ONCE(vsi->xdp_prog); } static inline void ice_set_ring_xdp(struct ice_tx_ring *ring) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 1d2ca39add95..d2039a9306b8 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2758,8 +2758,10 @@ free_qmap: ice_for_each_xdp_txq(vsi, i) if (vsi->xdp_rings[i]) { - if (vsi->xdp_rings[i]->desc) + if (vsi->xdp_rings[i]->desc) { + synchronize_rcu(); ice_free_tx_ring(vsi->xdp_rings[i]); + } kfree_rcu(vsi->xdp_rings[i], rcu); vsi->xdp_rings[i] = NULL; } diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index dfbcaf08520e..33b28a72ffcb 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -41,8 +41,10 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx) static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx) { ice_clean_tx_ring(vsi->tx_rings[q_idx]); - if (ice_is_xdp_ena_vsi(vsi)) + if (ice_is_xdp_ena_vsi(vsi)) { + synchronize_rcu(); ice_clean_tx_ring(vsi->xdp_rings[q_idx]); + } ice_clean_rx_ring(vsi->rx_rings[q_idx]); } -- cgit From 72b915a2b444e9247c9d424a840e94263db07c27 Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Thu, 17 Mar 2022 19:36:28 +0100 Subject: ice: xsk: fix VSI state check in ice_xsk_wakeup() ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on vsi->state in ice_xsk_wakeup(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski Tested-by: Shwetha Nagaraju Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 33b28a72ffcb..866ee4df9671 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -920,7 +920,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, struct ice_vsi *vsi = np->vsi; struct ice_tx_ring *ring; - if (test_bit(ICE_DOWN, vsi->state)) + if (test_bit(ICE_VSI_DOWN, vsi->state)) return -ENETDOWN; if (!ice_is_xdp_ena_vsi(vsi)) -- cgit From e19778e6c911691856447c3bf9617f00b3e1347f Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Thu, 17 Mar 2022 19:36:29 +0100 Subject: ice: clear cmd_type_offset_bsz for TX rings Currently when XDP rings are created, each descriptor gets its DD bit set, which turns out to be the wrong approach as it can lead to a situation where more descriptors get cleaned than it was supposed to, e.g. when AF_XDP busy poll is run with a large batch size. In this situation, the driver would request for more buffers than it is able to handle. Fix this by not setting the DD bits in ice_xdp_alloc_setup_rings(). They should be initialized to zero instead. Fixes: 9610bd988df9 ("ice: optimize XDP_TX workloads") Signed-off-by: Maciej Fijalkowski Tested-by: Shwetha Nagaraju Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index d2039a9306b8..d768925785ca 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2562,7 +2562,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi) spin_lock_init(&xdp_ring->tx_lock); for (j = 0; j < xdp_ring->count; j++) { tx_desc = ICE_TX_DESC(xdp_ring, j); - tx_desc->cmd_type_offset_bsz = cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE); + tx_desc->cmd_type_offset_bsz = 0; } } -- cgit From d7442f512b71fc63a99c8a801422dde4fbbf9f93 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Mon, 4 Apr 2022 18:15:09 +0200 Subject: ice: arfs: fix use-after-free when freeing @rx_cpu_rmap The CI testing bots triggered the following splat: [ 718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80 [ 718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834 [ 718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S W IOE 5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1 [ 718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 718.223418] Call Trace: [ 718.227139] [ 718.230783] dump_stack_lvl+0x33/0x42 [ 718.234431] print_address_description.constprop.9+0x21/0x170 [ 718.238177] ? free_irq_cpu_rmap+0x53/0x80 [ 718.241885] ? free_irq_cpu_rmap+0x53/0x80 [ 718.245539] kasan_report.cold.18+0x7f/0x11b [ 718.249197] ? free_irq_cpu_rmap+0x53/0x80 [ 718.252852] free_irq_cpu_rmap+0x53/0x80 [ 718.256471] ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice] [ 718.260174] ice_remove_arfs+0x5f/0x70 [ice] [ 718.263810] ice_rebuild_arfs+0x3b/0x70 [ice] [ 718.267419] ice_rebuild+0x39c/0xb60 [ice] [ 718.270974] ? asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 718.274472] ? ice_init_phy_user_cfg+0x360/0x360 [ice] [ 718.278033] ? delay_tsc+0x4a/0xb0 [ 718.281513] ? preempt_count_sub+0x14/0xc0 [ 718.284984] ? delay_tsc+0x8f/0xb0 [ 718.288463] ice_do_reset+0x92/0xf0 [ice] [ 718.292014] ice_pci_err_resume+0x91/0xf0 [ice] [ 718.295561] pci_reset_function+0x53/0x80 <...> [ 718.393035] Allocated by task 690: [ 718.433497] Freed by task 20834: [ 718.495688] Last potentially related work creation: [ 718.568966] The buggy address belongs to the object at ffff8881bd127e00 which belongs to the cache kmalloc-96 of size 96 [ 718.574085] The buggy address is located 0 bytes inside of 96-byte region [ffff8881bd127e00, ffff8881bd127e60) [ 718.579265] The buggy address belongs to the page: [ 718.598905] Memory state around the buggy address: [ 718.601809] ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 718.604796] ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc [ 718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 718.610811] ^ [ 718.613819] ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 718.617107] ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc This is due to that free_irq_cpu_rmap() is always being called *after* (devm_)free_irq() and thus it tries to work with IRQ descs already freed. For example, on device reset the driver frees the rmap right before allocating a new one (the splat above). Make rmap creation and freeing function symmetrical with {request,free}_irq() calls i.e. do that on ifup/ifdown instead of device probe/remove/resume. These operations can be performed independently from the actual device aRFS configuration. Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers only when aRFS is disabled -- otherwise, CPU rmap sets and clears its own and they must not be touched manually. Fixes: 28bf26724fdb0 ("ice: Implement aRFS") Co-developed-by: Ivan Vecera Signed-off-by: Ivan Vecera Signed-off-by: Alexander Lobakin Tested-by: Ivan Vecera Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_arfs.c | 9 ++------- drivers/net/ethernet/intel/ice/ice_lib.c | 5 ++++- drivers/net/ethernet/intel/ice/ice_main.c | 18 ++++++++---------- 3 files changed, 14 insertions(+), 18 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_arfs.c b/drivers/net/ethernet/intel/ice/ice_arfs.c index 5daade32ea62..fba178e07600 100644 --- a/drivers/net/ethernet/intel/ice/ice_arfs.c +++ b/drivers/net/ethernet/intel/ice/ice_arfs.c @@ -577,7 +577,7 @@ void ice_free_cpu_rx_rmap(struct ice_vsi *vsi) { struct net_device *netdev; - if (!vsi || vsi->type != ICE_VSI_PF || !vsi->arfs_fltr_list) + if (!vsi || vsi->type != ICE_VSI_PF) return; netdev = vsi->netdev; @@ -599,7 +599,7 @@ int ice_set_cpu_rx_rmap(struct ice_vsi *vsi) int base_idx, i; if (!vsi || vsi->type != ICE_VSI_PF) - return -EINVAL; + return 0; pf = vsi->back; netdev = vsi->netdev; @@ -636,7 +636,6 @@ void ice_remove_arfs(struct ice_pf *pf) if (!pf_vsi) return; - ice_free_cpu_rx_rmap(pf_vsi); ice_clear_arfs(pf_vsi); } @@ -653,9 +652,5 @@ void ice_rebuild_arfs(struct ice_pf *pf) return; ice_remove_arfs(pf); - if (ice_set_cpu_rx_rmap(pf_vsi)) { - dev_err(ice_pf_to_dev(pf), "Failed to rebuild aRFS\n"); - return; - } ice_init_arfs(pf_vsi); } diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 2774cbd5b12a..6d19c58ccacd 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -2689,6 +2689,8 @@ void ice_vsi_free_irq(struct ice_vsi *vsi) return; vsi->irqs_ready = false; + ice_free_cpu_rx_rmap(vsi); + ice_for_each_q_vector(vsi, i) { u16 vector = i + base; int irq_num; @@ -2702,7 +2704,8 @@ void ice_vsi_free_irq(struct ice_vsi *vsi) continue; /* clear the affinity notifier in the IRQ descriptor */ - irq_set_affinity_notifier(irq_num, NULL); + if (!IS_ENABLED(CONFIG_RFS_ACCEL)) + irq_set_affinity_notifier(irq_num, NULL); /* clear the affinity_mask in the IRQ descriptor */ irq_set_affinity_hint(irq_num, NULL); diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index d768925785ca..5b1198859da7 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2510,6 +2510,13 @@ static int ice_vsi_req_irq_msix(struct ice_vsi *vsi, char *basename) irq_set_affinity_hint(irq_num, &q_vector->affinity_mask); } + err = ice_set_cpu_rx_rmap(vsi); + if (err) { + netdev_err(vsi->netdev, "Failed to setup CPU RMAP on VSI %u: %pe\n", + vsi->vsi_num, ERR_PTR(err)); + goto free_q_irqs; + } + vsi->irqs_ready = true; return 0; @@ -3692,20 +3699,12 @@ static int ice_setup_pf_sw(struct ice_pf *pf) */ ice_napi_add(vsi); - status = ice_set_cpu_rx_rmap(vsi); - if (status) { - dev_err(dev, "Failed to set CPU Rx map VSI %d error %d\n", - vsi->vsi_num, status); - goto unroll_napi_add; - } status = ice_init_mac_fltr(pf); if (status) - goto free_cpu_rx_map; + goto unroll_napi_add; return 0; -free_cpu_rx_map: - ice_free_cpu_rx_rmap(vsi); unroll_napi_add: ice_tc_indir_block_unregister(vsi); unroll_cfg_netdev: @@ -5167,7 +5166,6 @@ static int __maybe_unused ice_suspend(struct device *dev) continue; ice_vsi_free_q_vectors(pf->vsi[v]); } - ice_free_cpu_rx_rmap(ice_get_main_vsi(pf)); ice_clear_interrupt_scheme(pf); pci_save_state(pdev); -- cgit From 7d59706dbef8de83b3662026766507bc494223d7 Mon Sep 17 00:00:00 2001 From: Mateusz Palczewski Date: Thu, 24 Mar 2022 14:19:15 +0100 Subject: Revert "iavf: Fix deadlock occurrence during resetting VF interface" This change caused a regression with resetting while changing network namespaces. By clearing the IFF_UP flag, the kernel now thinks it has fully closed the device. This reverts commit 0cc318d2e8408bc0ffb4662a0c3e5e57005ac6ff. Fixes: 0cc318d2e840 ("iavf: Fix deadlock occurrence during resetting VF interface") Signed-off-by: Mateusz Palczewski Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf_main.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 190590d32faf..7dfcf78b57fb 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -2871,7 +2871,6 @@ continue_reset: running = adapter->state == __IAVF_RUNNING; if (running) { - netdev->flags &= ~IFF_UP; netif_carrier_off(netdev); netif_tx_stop_all_queues(netdev); adapter->link_up = false; @@ -2988,7 +2987,7 @@ continue_reset: * to __IAVF_RUNNING */ iavf_up_complete(adapter); - netdev->flags |= IFF_UP; + iavf_irq_enable(adapter, true); } else { iavf_change_state(adapter, __IAVF_DOWN); @@ -3004,10 +3003,8 @@ continue_reset: reset_err: mutex_unlock(&adapter->client_lock); mutex_unlock(&adapter->crit_lock); - if (running) { + if (running) iavf_change_state(adapter, __IAVF_RUNNING); - netdev->flags |= IFF_UP; - } dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n"); iavf_close(netdev); } -- cgit From 907862e9aef75bf89e2b265efcc58870be06081e Mon Sep 17 00:00:00 2001 From: Sasha Neftin Date: Tue, 1 Mar 2022 15:32:10 +0200 Subject: igc: Fix infinite loop in release_swfw_sync An infinite loop may occur if we fail to acquire the HW semaphore, which is needed for resource release. This will typically happen if the hardware is surprise-removed. At this stage there is nothing to do, except log an error and quit. Fixes: c0071c7aa5fe ("igc: Add HW initialization code") Suggested-by: Dima Ruinskiy Signed-off-by: Sasha Neftin Tested-by: Naama Meir Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/igc/igc_i225.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/igc/igc_i225.c b/drivers/net/ethernet/intel/igc/igc_i225.c index 66ea566488d1..59d5c467ea6e 100644 --- a/drivers/net/ethernet/intel/igc/igc_i225.c +++ b/drivers/net/ethernet/intel/igc/igc_i225.c @@ -156,8 +156,15 @@ void igc_release_swfw_sync_i225(struct igc_hw *hw, u16 mask) { u32 swfw_sync; - while (igc_get_hw_semaphore_i225(hw)) - ; /* Empty */ + /* Releasing the resource requires first getting the HW semaphore. + * If we fail to get the semaphore, there is nothing we can do, + * except log an error and quit. We are not allowed to hang here + * indefinitely, as it may cause denial of service or system crash. + */ + if (igc_get_hw_semaphore_i225(hw)) { + hw_dbg("Failed to release SW_FW_SYNC.\n"); + return; + } swfw_sync = rd32(IGC_SW_FW_SYNC); swfw_sync &= ~mask; -- cgit From c80a29f0fe9b6f5457e0788e27d1110577eba99b Mon Sep 17 00:00:00 2001 From: Sasha Neftin Date: Wed, 9 Mar 2022 08:19:19 +0200 Subject: igc: Fix BUG: scheduling while atomic Replace usleep_range() method with udelay() method to allow atomic contexts in low-level MDIO access functions. The following issue can be seen by doing the following: $ modprobe -r bonding $ modprobe -v bonding max_bonds=1 mode=1 miimon=100 use_carrier=0 $ ip link set bond0 up $ ifenslave bond0 eth0 eth1 [ 982.357308] BUG: scheduling while atomic: kworker/u64:0/9/0x00000002 [ 982.364431] INFO: lockdep is turned off. [ 982.368824] Modules linked in: bonding sctp ip6_udp_tunnel udp_tunnel mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core nfp tls sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate intel_uncore pcspkr lpc_ich mei_me ipmi_ssif mei ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci crc32c_intel libata i2c_algo_bit tg3 megaraid_sas igc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: bonding] [ 982.437941] CPU: 25 PID: 9 Comm: kworker/u64:0 Kdump: loaded Tainted: G W --------- - - 4.18.0-348.el8.x86_64+debug #1 [ 982.451333] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 2.7.0 12/005/2017 [ 982.459791] Workqueue: bond0 bond_mii_monitor [bonding] [ 982.465622] Call Trace: [ 982.468355] dump_stack+0x8e/0xd0 [ 982.472056] __schedule_bug.cold.60+0x3a/0x60 [ 982.476919] __schedule+0x147b/0x1bc0 [ 982.481007] ? firmware_map_remove+0x16b/0x16b [ 982.485967] ? hrtimer_fixup_init+0x40/0x40 [ 982.490625] schedule+0xd9/0x250 [ 982.494227] schedule_hrtimeout_range_clock+0x10d/0x2c0 [ 982.500058] ? hrtimer_nanosleep_restart+0x130/0x130 [ 982.505598] ? hrtimer_init_sleeper_on_stack+0x90/0x90 [ 982.511332] ? usleep_range+0x88/0x130 [ 982.515514] ? recalibrate_cpu_khz+0x10/0x10 [ 982.520279] ? ktime_get+0xab/0x1c0 [ 982.524175] ? usleep_range+0x88/0x130 [ 982.528355] usleep_range+0xdd/0x130 [ 982.532344] ? console_conditional_schedule+0x30/0x30 [ 982.537987] ? igc_put_hw_semaphore+0x17/0x60 [igc] [ 982.543432] igc_read_phy_reg_gpy+0x111/0x2b0 [igc] [ 982.548887] igc_phy_has_link+0xfa/0x260 [igc] [ 982.553847] ? igc_get_phy_id+0x210/0x210 [igc] [ 982.558894] ? lock_acquire+0x34d/0x890 [ 982.563187] ? lock_downgrade+0x710/0x710 [ 982.567659] ? rcu_read_unlock+0x50/0x50 [ 982.572039] igc_check_for_copper_link+0x106/0x210 [igc] [ 982.577970] ? igc_config_fc_after_link_up+0x840/0x840 [igc] [ 982.584286] ? rcu_read_unlock+0x50/0x50 [ 982.588661] ? lock_release+0x591/0xb80 [ 982.592939] ? lock_release+0x591/0xb80 [ 982.597220] igc_has_link+0x113/0x330 [igc] [ 982.601887] ? lock_downgrade+0x710/0x710 [ 982.606362] igc_ethtool_get_link+0x6d/0x90 [igc] [ 982.611614] bond_check_dev_link+0x131/0x2c0 [bonding] [ 982.617350] ? bond_time_in_interval+0xd0/0xd0 [bonding] [ 982.623277] ? rcu_read_lock_held+0x62/0xc0 [ 982.627944] ? rcu_read_lock_sched_held+0xe0/0xe0 [ 982.633198] bond_mii_monitor+0x314/0x2500 [bonding] [ 982.638738] ? lock_contended+0x880/0x880 [ 982.643214] ? bond_miimon_link_change+0xa0/0xa0 [bonding] [ 982.649336] ? lock_acquire+0x34d/0x890 [ 982.653615] ? lock_downgrade+0x710/0x710 [ 982.658089] ? debug_object_deactivate+0x221/0x340 [ 982.663436] ? rcu_read_unlock+0x50/0x50 [ 982.667811] ? debug_print_object+0x2b0/0x2b0 [ 982.672672] ? __switch_to_asm+0x41/0x70 [ 982.677049] ? __switch_to_asm+0x35/0x70 [ 982.681426] ? _raw_spin_unlock_irq+0x24/0x40 [ 982.686288] ? trace_hardirqs_on+0x20/0x195 [ 982.690956] ? _raw_spin_unlock_irq+0x24/0x40 [ 982.695818] process_one_work+0x8f0/0x1770 [ 982.700390] ? pwq_dec_nr_in_flight+0x320/0x320 [ 982.705443] ? debug_show_held_locks+0x50/0x50 [ 982.710403] worker_thread+0x87/0xb40 [ 982.714489] ? process_one_work+0x1770/0x1770 [ 982.719349] kthread+0x344/0x410 [ 982.722950] ? kthread_insert_work_sanity_check+0xd0/0xd0 [ 982.728975] ret_from_fork+0x3a/0x50 Fixes: 5586838fe9ce ("igc: Add code for PHY support") Reported-by: Corinna Vinschen Suggested-by: Dima Ruinskiy Signed-off-by: Sasha Neftin Tested-by: Corinna Vinschen Tested-by: Naama Meir Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/igc/igc_phy.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/igc/igc_phy.c b/drivers/net/ethernet/intel/igc/igc_phy.c index 40dbf4b43234..6961f65d36b9 100644 --- a/drivers/net/ethernet/intel/igc/igc_phy.c +++ b/drivers/net/ethernet/intel/igc/igc_phy.c @@ -581,7 +581,7 @@ static s32 igc_read_phy_reg_mdic(struct igc_hw *hw, u32 offset, u16 *data) * the lower time out */ for (i = 0; i < IGC_GEN_POLL_TIMEOUT; i++) { - usleep_range(500, 1000); + udelay(50); mdic = rd32(IGC_MDIC); if (mdic & IGC_MDIC_READY) break; @@ -638,7 +638,7 @@ static s32 igc_write_phy_reg_mdic(struct igc_hw *hw, u32 offset, u16 data) * the lower time out */ for (i = 0; i < IGC_GEN_POLL_TIMEOUT; i++) { - usleep_range(500, 1000); + udelay(50); mdic = rd32(IGC_MDIC); if (mdic & IGC_MDIC_READY) break; -- cgit From 822f52e7efdc88fccffb9fbf6250a4b7666a0b0f Mon Sep 17 00:00:00 2001 From: Vinicius Costa Gomes Date: Tue, 12 Apr 2022 18:58:15 -0700 Subject: igc: Fix suspending when PTM is active Some mainboard/CPU combinations, in particular, Alder Lake-S with a W680 mainboard, have shown problems (system hangs usually, no kernel logs) with suspend/resume when PCIe PTM is enabled and active. In some cases, it could be reproduced when removing the igc module. The best we can do is to stop PTM dialogs from the downstream/device side before the interface is brought down. PCIe PTM will be re-enabled when the interface is being brought up. Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Vinicius Costa Gomes Tested-by: Naama Meir Acked-by: Paul Menzel Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/igc/igc_ptp.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c b/drivers/net/ethernet/intel/igc/igc_ptp.c index 0d6e3215e98f..653e9f1e35b5 100644 --- a/drivers/net/ethernet/intel/igc/igc_ptp.c +++ b/drivers/net/ethernet/intel/igc/igc_ptp.c @@ -992,6 +992,17 @@ static void igc_ptp_time_restore(struct igc_adapter *adapter) igc_ptp_write_i225(adapter, &ts); } +static void igc_ptm_stop(struct igc_adapter *adapter) +{ + struct igc_hw *hw = &adapter->hw; + u32 ctrl; + + ctrl = rd32(IGC_PTM_CTRL); + ctrl &= ~IGC_PTM_CTRL_EN; + + wr32(IGC_PTM_CTRL, ctrl); +} + /** * igc_ptp_suspend - Disable PTP work items and prepare for suspend * @adapter: Board private structure @@ -1009,8 +1020,10 @@ void igc_ptp_suspend(struct igc_adapter *adapter) adapter->ptp_tx_skb = NULL; clear_bit_unlock(__IGC_PTP_TX_IN_PROGRESS, &adapter->state); - if (pci_device_is_present(adapter->pdev)) + if (pci_device_is_present(adapter->pdev)) { igc_ptp_time_save(adapter); + igc_ptm_stop(adapter); + } } /** -- cgit From 04ebaa1cfddae5f240cc7404f009133bb0389a47 Mon Sep 17 00:00:00 2001 From: Sasha Neftin Date: Tue, 5 Apr 2022 18:56:01 +0300 Subject: e1000e: Fix possible overflow in LTR decoding When we decode the latency and the max_latency, u16 value may not fit the required size and could lead to the wrong LTR representation. Scaling is represented as: scale 0 - 1 (2^(5*0)) = 2^0 scale 1 - 32 (2^(5 *1))= 2^5 scale 2 - 1024 (2^(5 *2)) =2^10 scale 3 - 32768 (2^(5 *3)) =2^15 scale 4 - 1048576 (2^(5 *4)) = 2^20 scale 5 - 33554432 (2^(5 *4)) = 2^25 scale 4 and scale 5 required 20 and 25 bits respectively. scale 6 reserved. Replace the u16 type with the u32 type and allow corrected LTR representation. Cc: stable@vger.kernel.org Fixes: 44a13a5d99c7 ("e1000e: Fix the max snoop/no-snoop latency for 10M") Reported-by: James Hutchinson Link: https://bugzilla.kernel.org/show_bug.cgi?id=215689 Suggested-by: Dima Ruinskiy Signed-off-by: Sasha Neftin Tested-by: Naama Meir Tested-by: James Hutchinson Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c index d60e2016d03c..e6c8e6d5234f 100644 --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c @@ -1009,8 +1009,8 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link) { u32 reg = link << (E1000_LTRV_REQ_SHIFT + E1000_LTRV_NOSNOOP_SHIFT) | link << E1000_LTRV_REQ_SHIFT | E1000_LTRV_SEND; - u16 max_ltr_enc_d = 0; /* maximum LTR decoded by platform */ - u16 lat_enc_d = 0; /* latency decoded */ + u32 max_ltr_enc_d = 0; /* maximum LTR decoded by platform */ + u32 lat_enc_d = 0; /* latency decoded */ u16 lat_enc = 0; /* latency encoded */ if (link) { -- cgit From d1fc4c6feac18f893e55aeefa267a281e132c7b7 Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Thu, 17 Mar 2022 18:57:27 +0100 Subject: ice: xsk: check if Rx ring was filled up to the end __ice_alloc_rx_bufs_zc() checks if a number of the descriptors to be allocated would cause the ring wrap. In that case, driver will issue two calls to xsk_buff_alloc_batch() - one that will fill the ring up to the end and the second one that will start with filling descriptors from the beginning of the ring. ice_fill_rx_descs() is a wrapper for taking care of what xsk_buff_alloc_batch() gave back to the driver. It works in a best effort approach, so for example when driver asks for 64 buffers, ice_fill_rx_descs() could assign only 32. Such case needs to be checked when ring is being filled up to the end, because in that situation ntu might not reached the end of the ring. Fix the ring wrap by checking if nb_buffs_extra has the expected value. If not, bump ntu and go directly to tail update. Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often") Signed-off-by: Magnus Karlsson Signed-off-by: Maciej Fijalkowski Tested-by: Shwetha Nagaraju Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_xsk.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 866ee4df9671..9dd38f667059 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -415,8 +415,8 @@ static u16 ice_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp, */ static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count) { + u32 nb_buffs_extra = 0, nb_buffs = 0; union ice_32b_rx_flex_desc *rx_desc; - u32 nb_buffs_extra = 0, nb_buffs; u16 ntu = rx_ring->next_to_use; u16 total_count = count; struct xdp_buff **xdp; @@ -428,6 +428,10 @@ static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count) nb_buffs_extra = ice_fill_rx_descs(rx_ring->xsk_pool, xdp, rx_desc, rx_ring->count - ntu); + if (nb_buffs_extra != rx_ring->count - ntu) { + ntu += nb_buffs_extra; + goto exit; + } rx_desc = ICE_RX_DESC(rx_ring, 0); xdp = ice_xdp_buf(rx_ring, 0); ntu = 0; @@ -441,6 +445,7 @@ static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count) if (ntu == rx_ring->count) ntu = 0; +exit: if (rx_ring->next_to_use != ntu) ice_release_rx_desc(rx_ring, ntu); -- cgit From aacca7a83b9753c562395ef328352dfd8c003c59 Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Fri, 1 Apr 2022 14:54:38 +0200 Subject: ice: allow creating VFs for !CONFIG_NET_SWITCHDEV Currently for !CONFIG_NET_SWITCHDEV kernel builds it is not possible to create VFs properly as call to ice_eswitch_configure() returns -EOPNOTSUPP for us. This is because CONFIG_ICE_SWITCHDEV depends on CONFIG_NET_SWITCHDEV. Change the ice_eswitch_configure() implementation for !CONFIG_ICE_SWITCHDEV to return 0 instead -EOPNOTSUPP and let ice_ena_vfs() finish its work properly. CC: Grzegorz Nitka Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment") Signed-off-by: Maciej Fijalkowski Signed-off-by: Michal Swiatkowski Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_eswitch.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.h b/drivers/net/ethernet/intel/ice/ice_eswitch.h index bd58d9d2e565..6a413331572b 100644 --- a/drivers/net/ethernet/intel/ice/ice_eswitch.h +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.h @@ -52,7 +52,7 @@ static inline void ice_eswitch_update_repr(struct ice_vsi *vsi) { } static inline int ice_eswitch_configure(struct ice_pf *pf) { - return -EOPNOTSUPP; + return 0; } static inline int ice_eswitch_rebuild(struct ice_pf *pf) -- cgit From d201665147ae788b7cca9fab58a1826f64152034 Mon Sep 17 00:00:00 2001 From: Wojciech Drewek Date: Fri, 8 Apr 2022 09:56:10 +0200 Subject: ice: fix crash in switchdev mode Below steps end up with crash: - modprobe ice - devlink dev eswitch set $PF1_PCI mode switchdev - echo 64 > /sys/class/net/$PF1/device/sriov_numvfs - rmmod ice Calling ice_eswitch_port_start_xmit while the process of removing VFs is in progress ends up with NULL pointer dereference. That's because PR netdev is not released but some resources are already freed. Fix it by checking if ICE_VF_DIS bit is set. Call trace: [ 1379.595146] BUG: kernel NULL pointer dereference, address: 0000000000000040 [ 1379.595284] #PF: supervisor read access in kernel mode [ 1379.595410] #PF: error_code(0x0000) - not-present page [ 1379.595535] PGD 0 P4D 0 [ 1379.595657] Oops: 0000 [#1] PREEMPT SMP PTI [ 1379.595783] CPU: 4 PID: 974 Comm: NetworkManager Kdump: loaded Tainted: G OE 5.17.0-rc8_mrq_dev-queue+ #12 [ 1379.595926] Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.03.01.0042.013020190050 01/30/2019 [ 1379.596063] RIP: 0010:ice_eswitch_port_start_xmit+0x46/0xd0 [ice] [ 1379.596292] Code: c7 c8 09 00 00 e8 9a c9 fc ff 84 c0 0f 85 82 00 00 00 4c 89 e7 e8 ca 70 fe ff 48 8b 7d 58 48 89 c3 48 85 ff 75 5e 48 8b 53 20 <8b> 42 40 85 c0 74 78 8d 48 01 f0 0f b1 4a 40 75 f2 0f b6 95 84 00 [ 1379.596456] RSP: 0018:ffffaba0c0d7bad0 EFLAGS: 00010246 [ 1379.596584] RAX: ffff969c14c71680 RBX: ffff969c14c71680 RCX: 000100107a0f0000 [ 1379.596715] RDX: 0000000000000000 RSI: ffff969b9d631000 RDI: 0000000000000000 [ 1379.596846] RBP: ffff969c07b46500 R08: ffff969becfca8ac R09: 0000000000000001 [ 1379.596977] R10: 0000000000000004 R11: ffffaba0c0d7bbec R12: ffff969b9d631000 [ 1379.597106] R13: ffffffffc08357a0 R14: ffff969c07b46500 R15: ffff969b9d631000 [ 1379.597237] FS: 00007f72c0e25c80(0000) GS:ffff969f13500000(0000) knlGS:0000000000000000 [ 1379.597414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1379.597562] CR2: 0000000000000040 CR3: 000000012b316006 CR4: 00000000003706e0 [ 1379.597713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1379.597863] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1379.598015] Call Trace: [ 1379.598153] [ 1379.598294] dev_hard_start_xmit+0xd9/0x220 [ 1379.598444] sch_direct_xmit+0x8a/0x340 [ 1379.598592] __dev_queue_xmit+0xa3c/0xd30 [ 1379.598739] ? packet_parse_headers+0xb4/0xf0 [ 1379.598890] packet_sendmsg+0xa15/0x1620 [ 1379.599038] ? __check_object_size+0x46/0x140 [ 1379.599186] sock_sendmsg+0x5e/0x60 [ 1379.599330] ____sys_sendmsg+0x22c/0x270 [ 1379.599474] ? import_iovec+0x17/0x20 [ 1379.599622] ? sendmsg_copy_msghdr+0x59/0x90 [ 1379.599771] ___sys_sendmsg+0x81/0xc0 [ 1379.599917] ? __pollwait+0xd0/0xd0 [ 1379.600061] ? preempt_count_add+0x68/0xa0 [ 1379.600210] ? _raw_write_lock_irq+0x1a/0x40 [ 1379.600369] ? ep_done_scan+0xc9/0x110 [ 1379.600494] ? _raw_spin_unlock_irqrestore+0x25/0x40 [ 1379.600622] ? preempt_count_add+0x68/0xa0 [ 1379.600747] ? _raw_spin_lock_irq+0x1a/0x40 [ 1379.600899] ? __fget_light+0x8f/0x110 [ 1379.601024] __sys_sendmsg+0x49/0x80 [ 1379.601148] ? release_ds_buffers+0x50/0xe0 [ 1379.601274] do_syscall_64+0x3b/0x90 [ 1379.601399] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 1379.601525] RIP: 0033:0x7f72c1e2e35d Fixes: f5396b8a663f ("ice: switchdev slow path") Signed-off-by: Wojciech Drewek Reported-by: Marcin Szycik Reviewed-by: Michal Swiatkowski Tested-by: Sandeep Penigalapati Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_eswitch.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c index 9a84d746a6c4..6a463b242c7d 100644 --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c @@ -361,7 +361,8 @@ ice_eswitch_port_start_xmit(struct sk_buff *skb, struct net_device *netdev) np = netdev_priv(netdev); vsi = np->vsi; - if (ice_is_reset_in_progress(vsi->back->state)) + if (ice_is_reset_in_progress(vsi->back->state) || + test_bit(ICE_VF_DIS, vsi->back->state)) return NETDEV_TX_BUSY; repr = ice_netdev_to_repr(netdev); -- cgit From 7c8881b77908a51814a050da408c89f1a25b7fb7 Mon Sep 17 00:00:00 2001 From: Jianglei Nie Date: Sat, 2 Apr 2022 10:17:15 +0800 Subject: ice: Fix memory leak in ice_get_orom_civd_data() A memory chunk was allocated for orom_data in ice_get_orom_civd_data() by vzmalloc(). But when ice_read_flash_module() fails, the allocated memory is not freed, which will lead to a memory leak. We can fix it by freeing the orom_data when ce_read_flash_module() fails. Fixes: af18d8866c80 ("ice: reduce time to read Option ROM CIVD data") Signed-off-by: Jianglei Nie Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_nvm.c | 1 + 1 file changed, 1 insertion(+) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c index 4eb0599714f4..13cdb5ea594d 100644 --- a/drivers/net/ethernet/intel/ice/ice_nvm.c +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c @@ -641,6 +641,7 @@ ice_get_orom_civd_data(struct ice_hw *hw, enum ice_bank_select bank, status = ice_read_flash_module(hw, bank, ICE_SR_1ST_OROM_BANK_PTR, 0, orom_data, hw->flash.banks.orom_size); if (status) { + vfree(orom_data); ice_debug(hw, ICE_DBG_NVM, "Unable to read Option ROM data\n"); return status; } -- cgit From aaf461af729b81dbb19ec33abe6da74702b352d2 Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Fri, 1 Apr 2022 12:40:52 +0200 Subject: ice: Fix incorrect locking in ice_vc_process_vf_msg() Usage of mutex_trylock() in ice_vc_process_vf_msg() is incorrect because message sent from VF is ignored and never processed. Use mutex_lock() instead to fix the issue. It is safe because this mutex is used to prevent races between VF related NDOs and handlers processing request messages from VF and these handlers are running in ice_service_task() context. Additionally move this mutex lock prior ice_vc_is_opcode_allowed() call to avoid potential races during allowlist access. Fixes: e6ba5273d4ed ("ice: Fix race conditions between virtchnl handling and VF ndo ops") Signed-off-by: Ivan Vecera Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_virtchnl.c | 21 +++++++-------------- 1 file changed, 7 insertions(+), 14 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index 69ff4b929772..5612c032f15a 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -3642,14 +3642,6 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event) err = -EINVAL; } - if (!ice_vc_is_opcode_allowed(vf, v_opcode)) { - ice_vc_send_msg_to_vf(vf, v_opcode, - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL, - 0); - ice_put_vf(vf); - return; - } - error_handler: if (err) { ice_vc_send_msg_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_PARAM, @@ -3660,12 +3652,13 @@ error_handler: return; } - /* VF is being configured in another context that triggers a VFR, so no - * need to process this message - */ - if (!mutex_trylock(&vf->cfg_lock)) { - dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n", - vf->vf_id); + mutex_lock(&vf->cfg_lock); + + if (!ice_vc_is_opcode_allowed(vf, v_opcode)) { + ice_vc_send_msg_to_vf(vf, v_opcode, + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL, + 0); + mutex_unlock(&vf->cfg_lock); ice_put_vf(vf); return; } -- cgit From 77d64d285be5f8d427893e9c54425b1e4f5d9be7 Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Tue, 19 Apr 2022 16:22:21 +0200 Subject: ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg() Previous patch labelled "ice: Fix incorrect locking in ice_vc_process_vf_msg()" fixed an issue with ignored messages sent by VF driver but a small race window still left. Recently caught trace during 'ip link set ... vf 0 vlan ...' operation: [ 7332.995625] ice 0000:3b:00.0: Clearing port VLAN on VF 0 [ 7333.001023] iavf 0000:3b:01.0: Reset indication received from the PF [ 7333.007391] iavf 0000:3b:01.0: Scheduling reset task [ 7333.059575] iavf 0000:3b:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 3 [ 7333.059626] ice 0000:3b:00.0: Invalid message from VF 0, opcode 3, len 4, error -1 Setting of VLAN for VF causes a reset of the affected VF using ice_reset_vf() function that runs with cfg_lock taken: 1. ice_notify_vf_reset() informs IAVF driver that reset is needed and IAVF schedules its own reset procedure 2. Bit ICE_VF_STATE_DIS is set in vf->vf_state 3. Misc initialization steps 4. ice_sriov_post_vsi_rebuild() -> ice_vf_set_initialized() and that clears ICE_VF_STATE_DIS in vf->vf_state Step 3 is mentioned race window because IAVF reset procedure runs in parallel and one of its step is sending of VIRTCHNL_OP_GET_VF_RESOURCES message (opcode==3). This message is handled in ice_vc_process_vf_msg() and if it is received during the mentioned race window then it's marked as invalid and error is returned to VF driver. Protect vf_state check in ice_vc_process_vf_msg() by cfg_lock to avoid this race condition. Fixes: e6ba5273d4ed ("ice: Fix race conditions between virtchnl handling and VF ndo ops") Tested-by: Fei Liu Signed-off-by: Ivan Vecera Reviewed-by: Jacob Keller Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_virtchnl.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index 5612c032f15a..b72606c9e6d0 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -3625,6 +3625,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event) return; } + mutex_lock(&vf->cfg_lock); + /* Check if VF is disabled. */ if (test_bit(ICE_VF_STATE_DIS, vf->vf_states)) { err = -EPERM; @@ -3648,19 +3650,14 @@ error_handler: NULL, 0); dev_err(dev, "Invalid message from VF %d, opcode %d, len %d, error %d\n", vf_id, v_opcode, msglen, err); - ice_put_vf(vf); - return; + goto finish; } - mutex_lock(&vf->cfg_lock); - if (!ice_vc_is_opcode_allowed(vf, v_opcode)) { ice_vc_send_msg_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL, 0); - mutex_unlock(&vf->cfg_lock); - ice_put_vf(vf); - return; + goto finish; } switch (v_opcode) { @@ -3773,6 +3770,7 @@ error_handler: vf_id, v_opcode, err); } +finish: mutex_unlock(&vf->cfg_lock); ice_put_vf(vf); } -- cgit From b537752e6cbf0e4475c165178ca02241b53ff6ef Mon Sep 17 00:00:00 2001 From: Petr Oros Date: Wed, 13 Apr 2022 17:37:45 +0200 Subject: ice: wait 5 s for EMP reset after firmware flash MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We need to wait 5 s for EMP reset after firmware flash. Code was extracted from OOT driver (ice v1.8.3 downloaded from sourceforge). Without this wait, fw_activate let card in inconsistent state and recoverable only by second flash/activate. Flash was tested on these fw's: From -> To 3.00 -> 3.10/3.20 3.10 -> 3.00/3.20 3.20 -> 3.00/3.10 Reproducer: [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin Preparing to flash [fw.mgmt] Erasing [fw.mgmt] Erasing done [fw.mgmt] Flashing 100% [fw.mgmt] Flashing done 100% [fw.undi] Erasing [fw.undi] Erasing done [fw.undi] Flashing 100% [fw.undi] Flashing done 100% [fw.netlist] Erasing [fw.netlist] Erasing done [fw.netlist] Flashing 100% [fw.netlist] Flashing done 100% Activate new firmware by devlink reload [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate reload_actions_performed: fw_activate [root@host ~]# ip link show ens7f0 71: ens7f0: mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff altname enp202s0f0 dmesg after flash: [ 55.120788] ice: Copyright (c) 2018, Intel Corporation. [ 55.274734] ice 0000:ca:00.0: Get PHY capabilities failed status = -5, continuing anyway [ 55.569797] ice 0000:ca:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.28.0 [ 55.603629] ice 0000:ca:00.0: Get PHY capability failed. [ 55.608951] ice 0000:ca:00.0: ice_init_nvm_phy_type failed: -5 [ 55.647348] ice 0000:ca:00.0: PTP init successful [ 55.675536] ice 0000:ca:00.0: DCB is enabled in the hardware, max number of TCs supported on this port are 8 [ 55.685365] ice 0000:ca:00.0: FW LLDP is disabled, DCBx/LLDP in SW mode. [ 55.692179] ice 0000:ca:00.0: Commit DCB Configuration to the hardware [ 55.701382] ice 0000:ca:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:c9:02.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link) Reboot doesn’t help, only second flash/activate with OOT or patched driver put card back in consistent state. After patch: [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin Preparing to flash [fw.mgmt] Erasing [fw.mgmt] Erasing done [fw.mgmt] Flashing 100% [fw.mgmt] Flashing done 100% [fw.undi] Erasing [fw.undi] Erasing done [fw.undi] Flashing 100% [fw.undi] Flashing done 100% [fw.netlist] Erasing [fw.netlist] Erasing done [fw.netlist] Flashing 100% [fw.netlist] Flashing done 100% Activate new firmware by devlink reload [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate reload_actions_performed: fw_activate [root@host ~]# ip link show ens7f0 19: ens7f0: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff altname enp202s0f0 Fixes: 399e27dbbd9e94 ("ice: support immediate firmware activation via devlink reload") Signed-off-by: Petr Oros Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_main.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 5b1198859da7..9a0a358a15c2 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -6929,12 +6929,15 @@ static void ice_rebuild(struct ice_pf *pf, enum ice_reset_req reset_type) dev_dbg(dev, "rebuilding PF after reset_type=%d\n", reset_type); +#define ICE_EMP_RESET_SLEEP_MS 5000 if (reset_type == ICE_RESET_EMPR) { /* If an EMP reset has occurred, any previously pending flash * update will have completed. We no longer know whether or * not the NVM update EMP reset is restricted. */ pf->fw_emp_reset_disabled = false; + + msleep(ICE_EMP_RESET_SLEEP_MS); } err = ice_init_all_ctrlq(hw); -- cgit From b668f4cd715a297737c6e5952bc609a25b9af944 Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Tue, 12 Apr 2022 10:34:22 -0700 Subject: ice: fix use-after-free when deinitializing mailbox snapshot During ice_sriov_configure, if num_vfs is 0, we are being asked by the kernel to remove all VFs. The driver first de-initializes the snapshot before freeing all the VFs. This results in a use-after-free BUG detected by KASAN. The bug occurs because the snapshot can still be accessed until all VFs are removed. Fix this by freeing all the VFs first before calling ice_mbx_deinit_snapshot. [ +0.032591] ================================================================== [ +0.000021] BUG: KASAN: use-after-free in ice_mbx_vf_state_handler+0x1c3/0x410 [ice] [ +0.000315] Write of size 28 at addr ffff889908eb6f28 by task kworker/55:2/1530996 [ +0.000029] CPU: 55 PID: 1530996 Comm: kworker/55:2 Kdump: loaded Tainted: G S I 5.17.0-dirty #1 [ +0.000022] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 1.6.13 12/17/2018 [ +0.000013] Workqueue: ice ice_service_task [ice] [ +0.000279] Call Trace: [ +0.000012] [ +0.000011] dump_stack_lvl+0x33/0x42 [ +0.000030] print_report.cold.13+0xb2/0x6b3 [ +0.000028] ? ice_mbx_vf_state_handler+0x1c3/0x410 [ice] [ +0.000295] kasan_report+0xa5/0x120 [ +0.000026] ? __switch_to_asm+0x21/0x70 [ +0.000024] ? ice_mbx_vf_state_handler+0x1c3/0x410 [ice] [ +0.000298] kasan_check_range+0x183/0x1e0 [ +0.000019] memset+0x1f/0x40 [ +0.000018] ice_mbx_vf_state_handler+0x1c3/0x410 [ice] [ +0.000304] ? ice_conv_link_speed_to_virtchnl+0x160/0x160 [ice] [ +0.000297] ? ice_vsi_dis_spoofchk+0x40/0x40 [ice] [ +0.000305] ice_is_malicious_vf+0x1aa/0x250 [ice] [ +0.000303] ? ice_restore_all_vfs_msi_state+0x160/0x160 [ice] [ +0.000297] ? __mutex_unlock_slowpath.isra.15+0x410/0x410 [ +0.000022] ? ice_debug_cq+0xb7/0x230 [ice] [ +0.000273] ? __kasan_slab_alloc+0x2f/0x90 [ +0.000022] ? memset+0x1f/0x40 [ +0.000017] ? do_raw_spin_lock+0x119/0x1d0 [ +0.000022] ? rwlock_bug.part.2+0x60/0x60 [ +0.000024] __ice_clean_ctrlq+0x3a6/0xd60 [ice] [ +0.000273] ? newidle_balance+0x5b1/0x700 [ +0.000026] ? ice_print_link_msg+0x2f0/0x2f0 [ice] [ +0.000271] ? update_cfs_group+0x1b/0x140 [ +0.000018] ? load_balance+0x1260/0x1260 [ +0.000022] ? ice_process_vflr_event+0x27/0x130 [ice] [ +0.000301] ice_service_task+0x136e/0x1470 [ice] [ +0.000281] process_one_work+0x3b4/0x6c0 [ +0.000030] worker_thread+0x65/0x660 [ +0.000023] ? __kthread_parkme+0xe4/0x100 [ +0.000021] ? process_one_work+0x6c0/0x6c0 [ +0.000020] kthread+0x179/0x1b0 [ +0.000018] ? kthread_complete_and_exit+0x20/0x20 [ +0.000022] ret_from_fork+0x22/0x30 [ +0.000026] [ +0.000018] Allocated by task 10742: [ +0.000013] kasan_save_stack+0x1c/0x40 [ +0.000018] __kasan_kmalloc+0x84/0xa0 [ +0.000016] kmem_cache_alloc_trace+0x16c/0x2e0 [ +0.000015] intel_iommu_probe_device+0xeb/0x860 [ +0.000015] __iommu_probe_device+0x9a/0x2f0 [ +0.000016] iommu_probe_device+0x43/0x270 [ +0.000015] iommu_bus_notifier+0xa7/0xd0 [ +0.000015] blocking_notifier_call_chain+0x90/0xc0 [ +0.000017] device_add+0x5f3/0xd70 [ +0.000014] pci_device_add+0x404/0xa40 [ +0.000015] pci_iov_add_virtfn+0x3b0/0x550 [ +0.000016] sriov_enable+0x3bb/0x600 [ +0.000013] ice_ena_vfs+0x113/0xa79 [ice] [ +0.000293] ice_sriov_configure.cold.17+0x21/0xe0 [ice] [ +0.000291] sriov_numvfs_store+0x160/0x200 [ +0.000015] kernfs_fop_write_iter+0x1db/0x270 [ +0.000018] new_sync_write+0x21d/0x330 [ +0.000013] vfs_write+0x376/0x410 [ +0.000013] ksys_write+0xba/0x150 [ +0.000012] do_syscall_64+0x3a/0x80 [ +0.000012] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000028] Freed by task 10742: [ +0.000011] kasan_save_stack+0x1c/0x40 [ +0.000015] kasan_set_track+0x21/0x30 [ +0.000016] kasan_set_free_info+0x20/0x30 [ +0.000012] __kasan_slab_free+0x104/0x170 [ +0.000016] kfree+0x9b/0x470 [ +0.000013] devres_destroy+0x1c/0x20 [ +0.000015] devm_kfree+0x33/0x40 [ +0.000012] ice_mbx_deinit_snapshot+0x39/0x70 [ice] [ +0.000295] ice_sriov_configure+0xb0/0x260 [ice] [ +0.000295] sriov_numvfs_store+0x1bc/0x200 [ +0.000015] kernfs_fop_write_iter+0x1db/0x270 [ +0.000016] new_sync_write+0x21d/0x330 [ +0.000012] vfs_write+0x376/0x410 [ +0.000012] ksys_write+0xba/0x150 [ +0.000012] do_syscall_64+0x3a/0x80 [ +0.000012] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000024] Last potentially related work creation: [ +0.000010] kasan_save_stack+0x1c/0x40 [ +0.000016] __kasan_record_aux_stack+0x98/0xa0 [ +0.000013] insert_work+0x34/0x160 [ +0.000015] __queue_work+0x20e/0x650 [ +0.000016] queue_work_on+0x4c/0x60 [ +0.000015] nf_nat_masq_schedule+0x297/0x2e0 [nf_nat] [ +0.000034] masq_device_event+0x5a/0x60 [nf_nat] [ +0.000031] raw_notifier_call_chain+0x5f/0x80 [ +0.000017] dev_close_many+0x1d6/0x2c0 [ +0.000015] unregister_netdevice_many+0x4e3/0xa30 [ +0.000015] unregister_netdevice_queue+0x192/0x1d0 [ +0.000014] iavf_remove+0x8f9/0x930 [iavf] [ +0.000058] pci_device_remove+0x65/0x110 [ +0.000015] device_release_driver_internal+0xf8/0x190 [ +0.000017] pci_stop_bus_device+0xb5/0xf0 [ +0.000014] pci_stop_and_remove_bus_device+0xe/0x20 [ +0.000016] pci_iov_remove_virtfn+0x19c/0x230 [ +0.000015] sriov_disable+0x4f/0x170 [ +0.000014] ice_free_vfs+0x9a/0x490 [ice] [ +0.000306] ice_sriov_configure+0xb8/0x260 [ice] [ +0.000294] sriov_numvfs_store+0x1bc/0x200 [ +0.000015] kernfs_fop_write_iter+0x1db/0x270 [ +0.000016] new_sync_write+0x21d/0x330 [ +0.000012] vfs_write+0x376/0x410 [ +0.000012] ksys_write+0xba/0x150 [ +0.000012] do_syscall_64+0x3a/0x80 [ +0.000012] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000025] The buggy address belongs to the object at ffff889908eb6f00 which belongs to the cache kmalloc-96 of size 96 [ +0.000016] The buggy address is located 40 bytes inside of 96-byte region [ffff889908eb6f00, ffff889908eb6f60) [ +0.000026] The buggy address belongs to the physical page: [ +0.000010] page:00000000b7e99a2e refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1908eb6 [ +0.000016] flags: 0x57ffffc0000200(slab|node=1|zone=2|lastcpupid=0x1fffff) [ +0.000024] raw: 0057ffffc0000200 ffffea0069d9fd80 dead000000000002 ffff88810004c780 [ +0.000015] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ +0.000009] page dumped because: kasan: bad access detected [ +0.000016] Memory state around the buggy address: [ +0.000012] ffff889908eb6e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ +0.000014] ffff889908eb6e80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ +0.000014] >ffff889908eb6f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ +0.000011] ^ [ +0.000013] ffff889908eb6f80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ +0.000013] ffff889908eb7000: fa fb fb fb fb fb fb fb fc fc fc fc fa fb fb fb [ +0.000012] ================================================================== Fixes: 0891c89674e8 ("ice: warn about potentially malicious VFs") Reported-by: Slawomir Laba Signed-off-by: Jacob Keller Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_sriov.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c index 8915a9d39e36..0c438219f7a3 100644 --- a/drivers/net/ethernet/intel/ice/ice_sriov.c +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c @@ -1046,8 +1046,8 @@ int ice_sriov_configure(struct pci_dev *pdev, int num_vfs) if (!num_vfs) { if (!pci_vfs_assigned(pdev)) { - ice_mbx_deinit_snapshot(&pf->hw); ice_free_vfs(pf); + ice_mbx_deinit_snapshot(&pf->hw); if (pf->lag) ice_enable_lag(pf->lag); return 0; -- cgit From f049efc7f7cd2f3c419f55040928eaefb13b3636 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 27 Apr 2022 10:31:52 -0700 Subject: ixgbe: ensure IPsec VF<->PF compatibility The VF driver can forward any IPsec flags and such makes the function is not extendable and prone to backward/forward incompatibility. If new software runs on VF, it won't know that PF configured something completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag. Fixes: eda0333ac293 ("ixgbe: add VF IPsec management") Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky Reviewed-by: Shannon Nelson Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen Link: https://lore.kernel.org/r/20220427173152.443102-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c index e596e1a9fc75..69d11ff7677d 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c @@ -903,7 +903,8 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) /* Tx IPsec offload doesn't seem to work on this * device, so block these requests for now. */ - if (!(sam->flags & XFRM_OFFLOAD_INBOUND)) { + sam->flags = sam->flags & ~XFRM_OFFLOAD_IPV6; + if (sam->flags != XFRM_OFFLOAD_INBOUND) { err = -EOPNOTSUPP; goto err_out; } -- cgit From 486b9eee57ddca5c9a2d59fc41153f36002e0a00 Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Sat, 23 Apr 2022 12:20:21 +0200 Subject: ice: Fix race during aux device (un)plugging Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky Signed-off-by: Ivan Vecera Reviewed-by: Dave Ertman Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice.h | 1 + drivers/net/ethernet/intel/ice/ice_idc.c | 25 +++++++++++++++++-------- drivers/net/ethernet/intel/ice/ice_main.c | 2 ++ 3 files changed, 20 insertions(+), 8 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 8ed3c9ab7ff7..a895e3a8e988 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -540,6 +540,7 @@ struct ice_pf { struct mutex avail_q_mutex; /* protects access to avail_[rx|tx]qs */ struct mutex sw_mutex; /* lock for protecting VSI alloc flow */ struct mutex tc_mutex; /* lock to protect TC changes */ + struct mutex adev_mutex; /* lock to protect aux device access */ u32 msg_enable; struct ice_ptp ptp; struct tty_driver *ice_gnss_tty_driver; diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c index 25a436d342c2..3e3b2ed4cd5d 100644 --- a/drivers/net/ethernet/intel/ice/ice_idc.c +++ b/drivers/net/ethernet/intel/ice/ice_idc.c @@ -37,14 +37,17 @@ void ice_send_event_to_aux(struct ice_pf *pf, struct iidc_event *event) if (WARN_ON_ONCE(!in_task())) return; + mutex_lock(&pf->adev_mutex); if (!pf->adev) - return; + goto finish; device_lock(&pf->adev->dev); iadrv = ice_get_auxiliary_drv(pf); if (iadrv && iadrv->event_handler) iadrv->event_handler(pf, event); device_unlock(&pf->adev->dev); +finish: + mutex_unlock(&pf->adev_mutex); } /** @@ -290,7 +293,6 @@ int ice_plug_aux_dev(struct ice_pf *pf) return -ENOMEM; adev = &iadev->adev; - pf->adev = adev; iadev->pf = pf; adev->id = pf->aux_idx; @@ -300,18 +302,20 @@ int ice_plug_aux_dev(struct ice_pf *pf) ret = auxiliary_device_init(adev); if (ret) { - pf->adev = NULL; kfree(iadev); return ret; } ret = auxiliary_device_add(adev); if (ret) { - pf->adev = NULL; auxiliary_device_uninit(adev); return ret; } + mutex_lock(&pf->adev_mutex); + pf->adev = adev; + mutex_unlock(&pf->adev_mutex); + return 0; } @@ -320,12 +324,17 @@ int ice_plug_aux_dev(struct ice_pf *pf) */ void ice_unplug_aux_dev(struct ice_pf *pf) { - if (!pf->adev) - return; + struct auxiliary_device *adev; - auxiliary_device_delete(pf->adev); - auxiliary_device_uninit(pf->adev); + mutex_lock(&pf->adev_mutex); + adev = pf->adev; pf->adev = NULL; + mutex_unlock(&pf->adev_mutex); + + if (adev) { + auxiliary_device_delete(adev); + auxiliary_device_uninit(adev); + } } /** diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 9a0a358a15c2..949669fed7d6 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -3769,6 +3769,7 @@ u16 ice_get_avail_rxq_count(struct ice_pf *pf) static void ice_deinit_pf(struct ice_pf *pf) { ice_service_task_stop(pf); + mutex_destroy(&pf->adev_mutex); mutex_destroy(&pf->sw_mutex); mutex_destroy(&pf->tc_mutex); mutex_destroy(&pf->avail_q_mutex); @@ -3847,6 +3848,7 @@ static int ice_init_pf(struct ice_pf *pf) mutex_init(&pf->sw_mutex); mutex_init(&pf->tc_mutex); + mutex_init(&pf->adev_mutex); INIT_HLIST_HEAD(&pf->aq_wait_list); spin_lock_init(&pf->aq_wait_lock); -- cgit From 6096dae926a22e2892ef9169f582589c16d39639 Mon Sep 17 00:00:00 2001 From: Anatolii Gerasymenko Date: Thu, 28 Apr 2022 12:01:00 +0000 Subject: ice: clear stale Tx queue settings before configuring The iAVF driver uses 3 virtchnl op codes to communicate with the PF regarding the VF Tx queues: * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware logic for the Tx queues * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings. There is a bug in the iAVF driver due to the race condition between VF reset request and shutdown being executed in parallel. This leads to a break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent. If this occurs, the PF driver never cleans up the Tx queues. This results in leaving behind stale Tx queue settings in the hardware and firmware. The most obvious outcome is that upon the next VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx scheduler node due to a lack of space. We need to protect ICE driver against such situation. To fix this, make sure we clear existing stale settings out when handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the previous settings. Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if the queue is not configured. The function already handles the case when the Tx queue is not currently configured and exits with a 0 return in that case. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller Signed-off-by: Anatolii Gerasymenko Tested-by: Konrad Jankowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_virtchnl.c | 68 ++++++++++++++++++++------- 1 file changed, 50 insertions(+), 18 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index b72606c9e6d0..2889e050a4c9 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -1307,13 +1307,52 @@ error_param: NULL, 0); } +/** + * ice_vf_vsi_dis_single_txq - disable a single Tx queue + * @vf: VF to disable queue for + * @vsi: VSI for the VF + * @q_id: VF relative (0-based) queue ID + * + * Attempt to disable the Tx queue passed in. If the Tx queue was successfully + * disabled then clear q_id bit in the enabled queues bitmap and return + * success. Otherwise return error. + */ +static int +ice_vf_vsi_dis_single_txq(struct ice_vf *vf, struct ice_vsi *vsi, u16 q_id) +{ + struct ice_txq_meta txq_meta = { 0 }; + struct ice_tx_ring *ring; + int err; + + if (!test_bit(q_id, vf->txq_ena)) + dev_dbg(ice_pf_to_dev(vsi->back), "Queue %u on VSI %u is not enabled, but stopping it anyway\n", + q_id, vsi->vsi_num); + + ring = vsi->tx_rings[q_id]; + if (!ring) + return -EINVAL; + + ice_fill_txq_meta(vsi, ring, &txq_meta); + + err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, vf->vf_id, ring, &txq_meta); + if (err) { + dev_err(ice_pf_to_dev(vsi->back), "Failed to stop Tx ring %d on VSI %d\n", + q_id, vsi->vsi_num); + return err; + } + + /* Clear enabled queues flag */ + clear_bit(q_id, vf->txq_ena); + + return 0; +} + /** * ice_vc_dis_qs_msg * @vf: pointer to the VF info * @msg: pointer to the msg buffer * - * called from the VF to disable all or specific - * queue(s) + * called from the VF to disable all or specific queue(s) */ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) { @@ -1350,30 +1389,15 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) q_map = vqs->tx_queues; for_each_set_bit(vf_q_id, &q_map, ICE_MAX_RSS_QS_PER_VF) { - struct ice_tx_ring *ring = vsi->tx_rings[vf_q_id]; - struct ice_txq_meta txq_meta = { 0 }; - if (!ice_vc_isvalid_q_id(vf, vqs->vsi_id, vf_q_id)) { v_ret = VIRTCHNL_STATUS_ERR_PARAM; goto error_param; } - if (!test_bit(vf_q_id, vf->txq_ena)) - dev_dbg(ice_pf_to_dev(vsi->back), "Queue %u on VSI %u is not enabled, but stopping it anyway\n", - vf_q_id, vsi->vsi_num); - - ice_fill_txq_meta(vsi, ring, &txq_meta); - - if (ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, vf->vf_id, - ring, &txq_meta)) { - dev_err(ice_pf_to_dev(vsi->back), "Failed to stop Tx ring %d on VSI %d\n", - vf_q_id, vsi->vsi_num); + if (ice_vf_vsi_dis_single_txq(vf, vsi, vf_q_id)) { v_ret = VIRTCHNL_STATUS_ERR_PARAM; goto error_param; } - - /* Clear enabled queues flag */ - clear_bit(vf_q_id, vf->txq_ena); } } @@ -1622,6 +1646,14 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg) if (qpi->txq.ring_len > 0) { vsi->tx_rings[i]->dma = qpi->txq.dma_ring_addr; vsi->tx_rings[i]->count = qpi->txq.ring_len; + + /* Disable any existing queue first */ + if (ice_vf_vsi_dis_single_txq(vf, vsi, q_idx)) { + v_ret = VIRTCHNL_STATUS_ERR_PARAM; + goto error_param; + } + + /* Configure a queue with the requested settings */ if (ice_vsi_cfg_single_txq(vsi, vsi->tx_rings, q_idx)) { v_ret = VIRTCHNL_STATUS_ERR_PARAM; goto error_param; -- cgit From a11b6c1a383ff092f432e040c20e032503785d47 Mon Sep 17 00:00:00 2001 From: Michal Michalik Date: Wed, 20 Apr 2022 14:23:02 +0200 Subject: ice: fix PTP stale Tx timestamps cleanup Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik Reviewed-by: Jacob Keller Reviewed-by: Paul Menzel Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_ptp.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index a1cd33273ca4..da025c204577 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -2287,6 +2287,7 @@ ice_ptp_init_tx_e810(struct ice_pf *pf, struct ice_ptp_tx *tx) /** * ice_ptp_tx_tstamp_cleanup - Cleanup old timestamp requests that got dropped + * @hw: pointer to the hw struct * @tx: PTP Tx tracker to clean up * * Loop through the Tx timestamp requests and see if any of them have been @@ -2295,7 +2296,7 @@ ice_ptp_init_tx_e810(struct ice_pf *pf, struct ice_ptp_tx *tx) * timestamp will never be captured. This might happen if the packet gets * discarded before it reaches the PHY timestamping block. */ -static void ice_ptp_tx_tstamp_cleanup(struct ice_ptp_tx *tx) +static void ice_ptp_tx_tstamp_cleanup(struct ice_hw *hw, struct ice_ptp_tx *tx) { u8 idx; @@ -2304,11 +2305,16 @@ static void ice_ptp_tx_tstamp_cleanup(struct ice_ptp_tx *tx) for_each_set_bit(idx, tx->in_use, tx->len) { struct sk_buff *skb; + u64 raw_tstamp; /* Check if this SKB has been waiting for too long */ if (time_is_after_jiffies(tx->tstamps[idx].start + 2 * HZ)) continue; + /* Read tstamp to be able to use this register again */ + ice_read_phy_tstamp(hw, tx->quad, idx + tx->quad_offset, + &raw_tstamp); + spin_lock(&tx->lock); skb = tx->tstamps[idx].skb; tx->tstamps[idx].skb = NULL; @@ -2330,7 +2336,7 @@ static void ice_ptp_periodic_work(struct kthread_work *work) ice_ptp_update_cached_phctime(pf); - ice_ptp_tx_tstamp_cleanup(&pf->ptp.port.tx); + ice_ptp_tx_tstamp_cleanup(&pf->hw, &pf->ptp.port.tx); /* Run twice a second */ kthread_queue_delayed_work(ptp->kworker, &ptp->work, -- cgit From 3f95a7472d14abef284d8968734fe2ae7ff4845f Mon Sep 17 00:00:00 2001 From: Xiaomeng Tong Date: Tue, 10 May 2022 13:48:46 -0700 Subject: i40e: i40e_main: fix a missing check on list iterator The bug is here: ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err); The list iterator 'ch' will point to a bogus position containing HEAD if the list is empty or no element is found. This case must be checked before any use of the iterator, otherwise it will lead to a invalid memory access. To fix this bug, use a new variable 'iter' as the list iterator, while use the origin variable 'ch' as a dedicated pointer to point to the found element. Cc: stable@vger.kernel.org Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e") Signed-off-by: Xiaomeng Tong Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen Link: https://lore.kernel.org/r/20220510204846.2166999-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/intel/i40e/i40e_main.c | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 6778df2177a1..98871f014994 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -7549,42 +7549,43 @@ static void i40e_free_macvlan_channels(struct i40e_vsi *vsi) static int i40e_fwd_ring_up(struct i40e_vsi *vsi, struct net_device *vdev, struct i40e_fwd_adapter *fwd) { + struct i40e_channel *ch = NULL, *ch_tmp, *iter; int ret = 0, num_tc = 1, i, aq_err; - struct i40e_channel *ch, *ch_tmp; struct i40e_pf *pf = vsi->back; struct i40e_hw *hw = &pf->hw; - if (list_empty(&vsi->macvlan_list)) - return -EINVAL; - /* Go through the list and find an available channel */ - list_for_each_entry_safe(ch, ch_tmp, &vsi->macvlan_list, list) { - if (!i40e_is_channel_macvlan(ch)) { - ch->fwd = fwd; + list_for_each_entry_safe(iter, ch_tmp, &vsi->macvlan_list, list) { + if (!i40e_is_channel_macvlan(iter)) { + iter->fwd = fwd; /* record configuration for macvlan interface in vdev */ for (i = 0; i < num_tc; i++) netdev_bind_sb_channel_queue(vsi->netdev, vdev, i, - ch->num_queue_pairs, - ch->base_queue); - for (i = 0; i < ch->num_queue_pairs; i++) { + iter->num_queue_pairs, + iter->base_queue); + for (i = 0; i < iter->num_queue_pairs; i++) { struct i40e_ring *tx_ring, *rx_ring; u16 pf_q; - pf_q = ch->base_queue + i; + pf_q = iter->base_queue + i; /* Get to TX ring ptr */ tx_ring = vsi->tx_rings[pf_q]; - tx_ring->ch = ch; + tx_ring->ch = iter; /* Get the RX ring ptr */ rx_ring = vsi->rx_rings[pf_q]; - rx_ring->ch = ch; + rx_ring->ch = iter; } + ch = iter; break; } } + if (!ch) + return -EINVAL; + /* Guarantee all rings are updated before we update the * MAC address filter. */ -- cgit From 4503cc7fdf9a84cd631b0cb8ecb3c9b1bdbf3594 Mon Sep 17 00:00:00 2001 From: Arkadiusz Kubalewski Date: Thu, 28 Apr 2022 10:33:50 +0200 Subject: ice: fix crash when writing timestamp on RX rings Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski Reviewed-by: Michal Schmidt Tested-by: Dave Cain Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_ptp.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index da025c204577..662947c882e8 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -500,12 +500,19 @@ ice_ptp_read_src_clk_reg(struct ice_pf *pf, struct ptp_system_timestamp *sts) * This function must be called periodically to ensure that the cached value * is never more than 2 seconds old. It must also be called whenever the PHC * time has been changed. + * + * Return: + * * 0 - OK, successfully updated + * * -EAGAIN - PF was busy, need to reschedule the update */ -static void ice_ptp_update_cached_phctime(struct ice_pf *pf) +static int ice_ptp_update_cached_phctime(struct ice_pf *pf) { u64 systime; int i; + if (test_and_set_bit(ICE_CFG_BUSY, pf->state)) + return -EAGAIN; + /* Read the current PHC time */ systime = ice_ptp_read_src_clk_reg(pf, NULL); @@ -528,6 +535,9 @@ static void ice_ptp_update_cached_phctime(struct ice_pf *pf) WRITE_ONCE(vsi->rx_rings[j]->cached_phctime, systime); } } + clear_bit(ICE_CFG_BUSY, pf->state); + + return 0; } /** @@ -2330,17 +2340,18 @@ static void ice_ptp_periodic_work(struct kthread_work *work) { struct ice_ptp *ptp = container_of(work, struct ice_ptp, work.work); struct ice_pf *pf = container_of(ptp, struct ice_pf, ptp); + int err; if (!test_bit(ICE_FLAG_PTP, pf->flags)) return; - ice_ptp_update_cached_phctime(pf); + err = ice_ptp_update_cached_phctime(pf); ice_ptp_tx_tstamp_cleanup(&pf->hw, &pf->ptp.port.tx); - /* Run twice a second */ + /* Run twice a second or reschedule if phc update failed */ kthread_queue_delayed_work(ptp->kworker, &ptp->work, - msecs_to_jiffies(500)); + msecs_to_jiffies(err ? 10 : 500)); } /** -- cgit From 31b6298fd8e29effe9ed6b77351ac5969be56ce0 Mon Sep 17 00:00:00 2001 From: Paul Greenwalt Date: Thu, 28 Apr 2022 14:11:42 -0700 Subject: ice: fix possible under reporting of ethtool Tx and Rx statistics The hardware statistics counters are not cleared during resets so the drivers first access is to initialize the baseline and then subsequent reads are for reporting the counters. The statistics counters are read during the watchdog subtask when the interface is up. If the baseline is not initialized before the interface is up, then there can be a brief window in which some traffic can be transmitted/received before the initial baseline reading takes place. Directly initialize ethtool statistics in driver open so the baseline will be initialized when the interface is up, and any dropped packets incremented before the interface is up won't be reported. Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init") Signed-off-by: Paul Greenwalt Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_main.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 949669fed7d6..963a5f40e071 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -6172,9 +6172,10 @@ static int ice_up_complete(struct ice_vsi *vsi) ice_ptp_link_change(pf, pf->hw.pf_id, true); } - /* clear this now, and the first stats read will be used as baseline */ - vsi->stat_offsets_loaded = false; - + /* Perform an initial read of the statistics registers now to + * set the baseline so counters are ready when interface is up + */ + ice_update_eth_stats(vsi); ice_service_task_schedule(pf); return 0; -- cgit From bf13502ed5f941b0777b3fd1e24dac5d93f3886c Mon Sep 17 00:00:00 2001 From: Michal Wilczynski Date: Sun, 8 May 2022 19:33:48 -0400 Subject: ice: Fix interrupt moderation settings getting cleared Adaptive-rx and Adaptive-tx are interrupt moderation settings that can be enabled/disabled using ethtool: ethtool -C ethX adaptive-rx on/off adaptive-tx on/off Unfortunately those settings are getting cleared after changing number of queues, or in ethtool world 'channels': ethtool -L ethX rx 1 tx 1 Clearing was happening due to introduction of bit fields in ice_ring_container struct. This way only itr_setting bits were rebuilt during ice_vsi_rebuild_set_coalesce(). Introduce an anonymous struct of bitfields and create a union to refer to them as a single variable. This way variable can be easily saved and restored. Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild") Signed-off-by: Michal Wilczynski Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_lib.c | 16 ++++++++-------- drivers/net/ethernet/intel/ice/ice_txrx.h | 11 ++++++++--- 2 files changed, 16 insertions(+), 11 deletions(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 6d19c58ccacd..454e01ae09b9 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -3043,8 +3043,8 @@ ice_vsi_rebuild_get_coalesce(struct ice_vsi *vsi, ice_for_each_q_vector(vsi, i) { struct ice_q_vector *q_vector = vsi->q_vectors[i]; - coalesce[i].itr_tx = q_vector->tx.itr_setting; - coalesce[i].itr_rx = q_vector->rx.itr_setting; + coalesce[i].itr_tx = q_vector->tx.itr_settings; + coalesce[i].itr_rx = q_vector->rx.itr_settings; coalesce[i].intrl = q_vector->intrl; if (i < vsi->num_txq) @@ -3100,21 +3100,21 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi, */ if (i < vsi->alloc_rxq && coalesce[i].rx_valid) { rc = &vsi->q_vectors[i]->rx; - rc->itr_setting = coalesce[i].itr_rx; + rc->itr_settings = coalesce[i].itr_rx; ice_write_itr(rc, rc->itr_setting); } else if (i < vsi->alloc_rxq) { rc = &vsi->q_vectors[i]->rx; - rc->itr_setting = coalesce[0].itr_rx; + rc->itr_settings = coalesce[0].itr_rx; ice_write_itr(rc, rc->itr_setting); } if (i < vsi->alloc_txq && coalesce[i].tx_valid) { rc = &vsi->q_vectors[i]->tx; - rc->itr_setting = coalesce[i].itr_tx; + rc->itr_settings = coalesce[i].itr_tx; ice_write_itr(rc, rc->itr_setting); } else if (i < vsi->alloc_txq) { rc = &vsi->q_vectors[i]->tx; - rc->itr_setting = coalesce[0].itr_tx; + rc->itr_settings = coalesce[0].itr_tx; ice_write_itr(rc, rc->itr_setting); } @@ -3128,12 +3128,12 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi, for (; i < vsi->num_q_vectors; i++) { /* transmit */ rc = &vsi->q_vectors[i]->tx; - rc->itr_setting = coalesce[0].itr_tx; + rc->itr_settings = coalesce[0].itr_tx; ice_write_itr(rc, rc->itr_setting); /* receive */ rc = &vsi->q_vectors[i]->rx; - rc->itr_setting = coalesce[0].itr_rx; + rc->itr_settings = coalesce[0].itr_rx; ice_write_itr(rc, rc->itr_setting); vsi->q_vectors[i]->intrl = coalesce[0].intrl; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index cead3eb149bd..ffb3f6a589da 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -384,9 +384,14 @@ struct ice_ring_container { /* this matches the maximum number of ITR bits, but in usec * values, so it is shifted left one bit (bit zero is ignored) */ - u16 itr_setting:13; - u16 itr_reserved:2; - u16 itr_mode:1; + union { + struct { + u16 itr_setting:13; + u16 itr_reserved:2; + u16 itr_mode:1; + }; + u16 itr_settings; + }; enum ice_container_type type; }; -- cgit From 942d2ad5d2e0df758a645ddfadffde2795322728 Mon Sep 17 00:00:00 2001 From: Kevin Mitchell Date: Tue, 17 May 2022 11:01:05 -0700 Subject: igb: skip phy status check where unavailable igb_read_phy_reg() will silently return, leaving phy_data untouched, if hw->ops.read_reg isn't set. Depending on the uninitialized value of phy_data, this led to the phy status check either succeeding immediately or looping continuously for 2 seconds before emitting a noisy err-level timeout. This message went out to the console even though there was no actual problem. Instead, first check if there is read_reg function pointer. If not, proceed without trying to check the phy status register. Fixes: b72f3f72005d ("igb: When GbE link up, wait for Remote receiver status condition") Signed-off-by: Kevin Mitchell Tested-by: Gurucharan (A Contingent worker at Intel) Signed-off-by: Tony Nguyen Signed-off-by: David S. Miller --- drivers/net/ethernet/intel/igb/igb_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel') diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 34b33b21e0dc..68be2976f539 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -5505,7 +5505,8 @@ static void igb_watchdog_task(struct work_struct *work) break; } - if (adapter->link_speed != SPEED_1000) + if (adapter->link_speed != SPEED_1000 || + !hw->phy.ops.read_reg) goto no_wait; /* wait for Remote receiver status OK */ -- cgit