summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/iavf
AgeCommit message (Collapse)Author
2023-10-27iavf: fix the waiting time for initial resetMichal Schmidt
Every time I create VFs on ice, I receive at least one "Device is still in reset (-16), retrying" message per VF. It recovers fine, but typical usecases should not trigger scary-looking messages. The waiting for reset is too short. It makes no sense to check every 10 microseconds. Typical reset waiting times are at least tens of milliseconds and can be several seconds. I suspect the polling interval was meant to be 10 milliseconds all along. IAVF_RESET_WAIT_COMPLETE_COUNT is defined as 2000, so the total waiting time could be over 20 seconds. I have seen resets take 5 seconds (with 128 VFs on ice). The added benefit of not triggering the "Device is still in reset" path is that we avoid going through the __IAVF_INIT_FAILED state, which would take a full second before retrying. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231027175941.1340255-5-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27iavf: in iavf_down, don't queue watchdog_task if comms failedMichal Schmidt
The reason for queueing watchdog_task is to have it process the aq_required flags that are being set here. If comms failed, there's nothing to do, so return early. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231027175941.1340255-4-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27iavf: simplify mutex_trylock+sleep loopsMichal Schmidt
This pattern appears in two places in the iavf source code: while (!mutex_trylock(...)) usleep_range(...); That's just mutex_lock with extra steps. The pattern is a leftover from when iavf used bit flags instead of mutexes for locking. Commit 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") replaced test_and_set_bit with !mutex_trylock, preserving the pattern. Simplify it to mutex_lock. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231027175941.1340255-3-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27iavf: fix comments about old bit locksMichal Schmidt
Bit lock __IAVF_IN_CRITICAL_TASK does not exist anymore since commit 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections"). Adjust the comments accordingly. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231027175941.1340255-2-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25iavf: in iavf_down, disable queues when removing the driverMichal Schmidt
In iavf_down, we're skipping the scheduling of certain operations if the driver is being removed. However, the IAVF_FLAG_AQ_DISABLE_QUEUES request must not be skipped in this case, because iavf_close waits for the transition to the __IAVF_DOWN state, which happens in iavf_virtchnl_completion after the queues are released. Without this fix, "rmmod iavf" takes half a second per interface that's up and prints the "Device resources not yet released" warning. Fixes: c8de44b577eb ("iavf: do not process adminq tasks when __IAVF_IN_REMOVE_TASK is set") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025183213.874283-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20iavf: initialize waitqueues before starting watchdog_taskMichal Schmidt
It is not safe to initialize the waitqueues after queueing the watchdog_task. It will be using them. The chance of this causing a real problem is very small, because there will be some sleeping before any of the waitqueues get used. I got a crash only after inserting an artificial sleep in iavf_probe. Queue the watchdog_task as the last step in iavf_probe. Add a comment to prevent repeating the mistake. Fixes: fe2647ab0c99 ("i40evf: prevent VF close returning before state transitions to DOWN") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-19iavf: delete unused iavf_mac_info fieldsMichal Schmidt
'san_addr' and 'mac_fcoeq' members of struct iavf_mac_info are unused. 'type' is write-only. Delete all three. The function iavf_set_mac_type that sets 'type' also checks if the PCI vendor ID is Intel. This is unnecessary. Delete the whole function. If in the future there's a need for the MAC type (or other PCI ID-dependent data), I would prefer to use .driver_data in iavf_pci_tbl[] for this purpose. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231018111527.78194-1-mschmidt@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-18intel: fix format warningsJesse Brandeburg
Get ahead of the game and fix all the -Wformat=2 noted warnings in the intel drivers directory. There are one set of i40e and iavf warnings I couldn't figure out how to fix because the driver is already using vsnprintf without an explicit "const char *" format string. Tested with both gcc-12 and clang-15. I found gcc-12 runs clean after this series but clang-15 is a little worried about the vsnprintf lines. summary of warnings: drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c:148:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/iavf/iavf_ethtool.c:199:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/i40e/i40e_ethtool.c:208:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: note: treat the string as an argument to avoid this Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-3-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18intel: fix string truncation warningsJesse Brandeburg
Fix -Wformat-truncated warnings to complete the intel directories' W=1 clean efforts. The W=1 recently got enhanced with a few new flags and this brought up some new warnings. Switch to using kasprintf() when possible so we always allocate the right length strings. summary of warnings: drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1425:60: warning: ‘%s’ directive output may be truncated writing 4 bytes into a region of size between 1 and 11 [-Wformat-truncation=] drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1425:17: note: ‘snprintf’ output between 7 and 17 bytes into a destination of size 13 drivers/net/ethernet/intel/ice/ice_ptp.c:43:27: warning: ‘%s’ directive output may be truncated writing up to 479 bytes into a region of size 64 [-Wformat-truncation=] drivers/net/ethernet/intel/ice/ice_ptp.c:42:17: note: ‘snprintf’ output between 1 and 480 bytes into a destination of size 64 drivers/net/ethernet/intel/igb/igb_main.c:3092:53: warning: ‘%d’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 1 and 13 [-Wformat-truncation=] drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note: directive argument in the range [0, 65535] drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note: directive argument in the range [0, 65535] drivers/net/ethernet/intel/igb/igb_main.c:3090:25: note: ‘snprintf’ output between 23 and 43 bytes into a destination of size 32 Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-2-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03iavf: remove "inline" functions from iavf_txrx.cJacob Keller
The iAVF txrx hotpath code has several functions that are marked as "static inline" in the iavf_txrx.c file. This use of inline is frowned upon in the netdev community and explicitly marked as something to avoid in the Linux coding-style document (section 15). Even though these functions are only used once, it is expected that GCC is smart enough to decide when to perform function inlining where appropriate without the "hint". ./scripts/bloat-o-meter is showing zero difference with this changes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-15iavf: schedule a request immediately after add/delete vlanPetr Oros
When the iavf driver wants to reconfigure the VLAN filters (iavf_add_vlan, iavf_del_vlan), it sets a flag in aq_required: adapter->aq_required |= IAVF_FLAG_AQ_ADD_VLAN_FILTER; or: adapter->aq_required |= IAVF_FLAG_AQ_DEL_VLAN_FILTER; This is later processed by the watchdog_task, but it runs periodically every 2 seconds, so it can be a long time before it processes the request. In the worst case, the interface is unable to receive traffic for more than 2 seconds for no objective reason. Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Petr Oros <poros@redhat.com> Co-developed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-15iavf: add iavf_schedule_aq_request() helperPetr Oros
Add helper for set iavf aq request AVF_FLAG_AQ_* and immediately schedule watchdog_task. Helper will be used in cases where it is necessary to run aq requests asap Signed-off-by: Petr Oros <poros@redhat.com> Co-developed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-15iavf: do not process adminq tasks when __IAVF_IN_REMOVE_TASK is setRadoslaw Tyl
Prevent schedule operations for adminq during device remove and when __IAVF_IN_REMOVE_TASK flag is set. Currently, the iavf_down function adds operations for adminq that shouldn't be processed when the device is in the __IAVF_REMOVE state. Reproduction: echo 4 > /sys/bus/pci/devices/0000:17:00.0/sriov_numvfs ip link set dev ens1f0 vf 0 trust on ip link set dev ens1f0 vf 1 trust on ip link set dev ens1f0 vf 2 trust on ip link set dev ens1f0 vf 3 trust on ip link set dev ens1f0 vf 0 mac 00:22:33:44:55:66 ip link set dev ens1f0 vf 1 mac 00:22:33:44:55:67 ip link set dev ens1f0 vf 2 mac 00:22:33:44:55:68 ip link set dev ens1f0 vf 3 mac 00:22:33:44:55:69 echo 0000:17:02.0 > /sys/bus/pci/devices/0000\:17\:02.0/driver/unbind echo 0000:17:02.1 > /sys/bus/pci/devices/0000\:17\:02.1/driver/unbind echo 0000:17:02.2 > /sys/bus/pci/devices/0000\:17\:02.2/driver/unbind echo 0000:17:02.3 > /sys/bus/pci/devices/0000\:17\:02.3/driver/unbind sleep 10 echo 0000:17:02.0 > /sys/bus/pci/drivers/iavf/bind echo 0000:17:02.1 > /sys/bus/pci/drivers/iavf/bind echo 0000:17:02.2 > /sys/bus/pci/drivers/iavf/bind echo 0000:17:02.3 > /sys/bus/pci/drivers/iavf/bind modprobe vfio-pci echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/new_id qemu-system-x86_64 -accel kvm -m 4096 -cpu host \ -drive file=centos9.qcow2,if=none,id=virtio-disk0 \ -device virtio-blk-pci,drive=virtio-disk0,bootindex=0 -smp 4 \ -device vfio-pci,host=17:02.0 -net none \ -device vfio-pci,host=17:02.1 -net none \ -device vfio-pci,host=17:02.2 -net none \ -device vfio-pci,host=17:02.3 -net none \ -daemonize -vnc :5 Current result: There is a probability that the mac of VF in guest is inconsistent with it in host Expected result: When passthrough NIC VF to guest, the VF in guest should always get the same mac as it in host. Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage") Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13iavf: Add ability to turn off CRC stripping for VFNorbert Zulinski
Previously CRC stripping was always enabled for VF. Now it is possible to turn off CRC stripping via ethtool: #ethtool -K <interface> rx-fcs on To turn off CRC stripping, first VLAN stripping must be disabled: #ethtool -K <interface> rx-vlan-offload off if any VLAN interfaces exists, otherwise VLAN stripping will be turned off by the driver. In iavf_configure_queues add check if CRC stripping is enabled for VF, if it's enabled then set crc_disabled to false on every VF's queue. In iavf_set_features add check if CRC stripping setting was changed then schedule reset. Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-11iavf: Fix promiscuous mode configuration flow messagesBrett Creeley
Currently when configuring promiscuous mode on the AVF we detect a change in the netdev->flags. We use IFF_PROMISC and IFF_ALLMULTI to determine whether or not we need to request/release promiscuous mode and/or multicast promiscuous mode. The problem is that the AQ calls for setting/clearing promiscuous/multicast mode are treated separately. This leads to a case where we can trigger two promiscuous mode AQ calls in a row with the incorrect state. To fix this make a few changes. Use IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE instead of the previous IAVF_FLAG_AQ_[REQUEST|RELEASE]_[PROMISC|ALLMULTI] flags. In iavf_set_rx_mode() detect if there is a change in the netdev->flags in comparison with adapter->flags and set the IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE aq_required bit. Then in iavf_process_aq_command() only check for IAVF_FLAG_CONFIGURE_PROMISC_MODE and call iavf_set_promiscuous() if it's set. In iavf_set_promiscuous() check again to see which (if any) promiscuous mode bits have changed when comparing the netdev->flags with the adapter->flags. Use this to set the flags which get sent to the PF driver. Add a spinlock that is used for updating current_netdev_promisc_flags and only allows one promiscuous mode AQ at a time. [1] Fixes the fact that we will only have one AQ call in the aq_required queue at any one time. [2] Streamlines the change in promiscuous mode to only set one AQ required bit. [3] This allows us to keep track of the current state of the flags and also makes it so we can take the most recent netdev->flags promiscuous mode state. [4] This fixes the problem where a change in the netdev->flags can cause IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE to be set in iavf_set_rx_mode(), but cleared in iavf_set_promiscuous() before the change is ever made via AQ call. Fixes: 47d3483988f6 ("i40evf: Add driver support for promiscuous mode") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-18Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== virtchnl: fix fake 1-elem arrays Alexander Lobakin says: 6.5-rc1 started spitting warning splats when composing virtchnl messages, precisely on virtchnl_rss_key and virtchnl_lut: [ 84.167709] memcpy: detected field-spanning write (size 52) of single field "vrk->key" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1095 (size 1) [ 84.169915] WARNING: CPU: 3 PID: 11 at drivers/net/ethernet/intel/ iavf/iavf_virtchnl.c:1095 iavf_set_rss_key+0x123/0x140 [iavf] ... [ 84.191982] Call Trace: [ 84.192439] <TASK> [ 84.192900] ? __warn+0xc9/0x1a0 [ 84.193353] ? iavf_set_rss_key+0x123/0x140 [iavf] [ 84.193818] ? report_bug+0x12c/0x1b0 [ 84.194266] ? handle_bug+0x42/0x70 [ 84.194714] ? exc_invalid_op+0x1a/0x50 [ 84.195149] ? asm_exc_invalid_op+0x1a/0x20 [ 84.195592] ? iavf_set_rss_key+0x123/0x140 [iavf] [ 84.196033] iavf_watchdog_task+0xb0c/0xe00 [iavf] ... [ 84.225476] memcpy: detected field-spanning write (size 64) of single field "vrl->lut" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1127 (size 1) [ 84.227190] WARNING: CPU: 27 PID: 1044 at drivers/net/ethernet/intel/ iavf/iavf_virtchnl.c:1127 iavf_set_rss_lut+0x123/0x140 [iavf] ... [ 84.246601] Call Trace: [ 84.247228] <TASK> [ 84.247840] ? __warn+0xc9/0x1a0 [ 84.248263] ? iavf_set_rss_lut+0x123/0x140 [iavf] [ 84.248698] ? report_bug+0x12c/0x1b0 [ 84.249122] ? handle_bug+0x42/0x70 [ 84.249549] ? exc_invalid_op+0x1a/0x50 [ 84.249970] ? asm_exc_invalid_op+0x1a/0x20 [ 84.250390] ? iavf_set_rss_lut+0x123/0x140 [iavf] [ 84.250820] iavf_watchdog_task+0xb16/0xe00 [iavf] Gustavo already tried to fix those back in 2021[0][1]. Unfortunately, a VM can run a different kernel than the host, meaning that those structures are sorta ABI. However, it is possible to have proper flex arrays + struct_size() calculations and still send the very same messages with the same sizes. The common rule is: elem[1] -> elem[] size = struct_size() + <difference between the old and the new msg size> The "old" size in the current code is calculated 3 different ways for 10 virtchnl structures total. Each commit addresses one of the ways cumulatively instead of per-structure. I was planning to send it to -net initially, but given that virtchnl was renamed from i40evf and got some fat style cleanup commits in the past, it's not very straightforward to even pick appropriate SHAs, not speaking of automatic portability. I may send manual backports for a couple of the latest supported kernels later on if anyone needs it at all. [0] https://lore.kernel.org/all/20210525230912.GA175802@embeddedor [1] https://lore.kernel.org/all/20210525231851.GA176647@embeddedor * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: virtchnl: fix fake 1-elem arrays for structures allocated as `nents` virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1` virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1 ==================== Link: https://lore.kernel.org/r/20230816210657.1326772-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16virtchnl: fix fake 1-elem arrays for structures allocated as `nents`Alexander Lobakin
Finally, fix 3 structures which are allocated technically correctly, i.e. the calculated size equals to the one that struct_size() would return, except for sizeof(). For &virtchnl_vlan_filter_list_v2, use the same approach when there are no enough space as taken previously for &virtchnl_vlan_filter_list, i.e. let the maximum size be calculated automatically instead of trying to guestimate it using maths. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-16virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1`Alexander Lobakin
There are five virtchnl structures, which are allocated and checked in the code as `nents + 1`, meaning that they always have memory for one excessive element regardless of their actual number. This comes from that their sizeof() includes space for 1 element and then they get allocated via struct_size() or its open-coded equivalents, passing the actual number of elements. Expand virtchnl_struct_size() to handle such structures and replace those 1-elem arrays with proper flex ones. Also fix several places which open-code %IAVF_VIRTCHNL_VF_RESOURCE_SIZE. Finally, let the virtchnl_ether_addr_list size be computed automatically when there's no enough space for the whole list, otherwise we have to open-code reverse struct_size() logics. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-16virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1Alexander Lobakin
The two most problematic virtchnl structures are virtchnl_rss_key and virtchnl_rss_lut. Their "flex" arrays have the type of u8, thus, when allocating / checking, the actual size is calculated as `sizeof + nents - 1 byte`. But their sizeof() is not 1 byte larger than the size of such structure with proper flex array, it's two bytes larger due to the padding. That said, their size is always 1 byte larger unless there are no tail elements -- then it's +2 bytes. Add virtchnl_struct_size() macro which will handle this case (and later other cases as well). Make its calling conv the same as we call struct_size() to allow it to be drop-in, even though it's unlikely to become possible to switch to generic API. The macro will calculate a proper size of a structure with a flex array at the end, so that it becomes transparent for the compilers, but add the difference from the old values, so that the real size of sorta-ABI-messages doesn't change. Use it on the allocation side in IAVF and the receiving side (defined as static inline in virtchnl.h) for the mentioned two structures. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-16iavf: fix FDIR rule fields masks validationPiotr Gardocki
Return an error if a field's mask is neither full nor empty. When a mask is only partial the field is not being used for rule programming but it gives a wrong impression it is used. Fix by returning an error on any partial mask to make it clear they are not supported. The ip_ver assignment is moved earlier in code to allow using it in iavf_validate_fdir_fltr_masks. Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters") Fixes: e90cbc257a6f ("iavf: Support IPv6 Flow Director filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/intel/igc/igc_main.c 06b412589eef ("igc: Add lock to safeguard global Qbv variables") d3750076d464 ("igc: Add TransmissionOverrun counter") drivers/net/ethernet/microsoft/mana/mana_en.c a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive") a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h") 92272ec4107e ("eth: add missing xdp.h includes in drivers") net/mptcp/protocol.h 511b90e39250 ("mptcp: fix disconnect vs accept race") b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") tools/testing/selftests/net/mptcp/mptcp_join.sh c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test") 03668c65d153 ("selftests: mptcp: join: rework detailed report") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08iavf: fix potential races for FDIR filtersPiotr Gardocki
Add fdir_fltr_lock locking in unprotected places. The change in iavf_fdir_is_dup_fltr adds a spinlock around a loop which iterates over all filters and looks for a duplicate. The filter can be removed from list and freed from memory at the same time it's being compared. All other places where filters are deleted are already protected with spinlock. The remaining changes protect adapter->fdir_active_fltr variable so now all its uses are under a spinlock. Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807205011.3129224-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31net: flow_dissector: Use 64bits for used_keysRatheesh Kannoth
As 32bits of dissector->used_keys are exhausted, increase the size to 64bits. This is base change for ESP/AH flow dissector patch. Please find patch and discussions at https://lore.kernel.org/netdev/ZMDNjD46BvZ5zp5I@corigine.com/T/#t Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw Tested-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILEDJacob Keller
In iavf_adminq_task(), if the function can't acquire the adapter->crit_lock, it checks if the driver is removing. If so, it simply exits without re-enabling the interrupt. This is done to ensure that the task stops processing as soon as possible once the driver is being removed. However, if the IAVF_FLAG_PF_COMMS_FAILED is set, the function checks this before attempting to acquire the lock. In this case, the function exits early and re-enables the interrupt. This will happen even if the driver is already removing. Avoid this, by moving the check to after the adapter->crit_lock is acquired. This way, if the driver is removing, we will not re-enable the interrupt. Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-21iavf: fix potential deadlock on allocation failureJacob Keller
In iavf_adminq_task(), if kzalloc() fails to allocate the event.msg_buf, the function will exit without releasing the adapter->crit_lock. This is unlikely, but if it happens, the next access to that mutex will deadlock. Fix this by moving the unlock to the end of the function, and adding a new label to allow jumping to the unlock portion of the function exit flow. Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: fix reset task race with iavf_remove()Ahmed Zaki
The reset task is currently scheduled from the watchdog or adminq tasks. First, all direct calls to schedule the reset task are replaced with the iavf_schedule_reset(), which is modified to accept the flag showing the type of reset. To prevent the reset task from starting once iavf_remove() starts, we need to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now easily added to iavf_schedule_reset(). Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task. It is redundant since all callers who set the flag immediately schedules the reset task. Fixes: 3ccd54ef44eb ("iavf: Fix init state closure on remove") Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: fix a deadlock caused by rtnl and driver's lock circular dependenciesAhmed Zaki
A driver's lock (crit_lock) is used to serialize all the driver's tasks. Lockdep, however, shows a circular dependency between rtnl and crit_lock. This happens when an ndo that already holds the rtnl requests the driver to reset, since the reset task (in some paths) tries to grab rtnl to either change real number of queues of update netdev features. [566.241851] ====================================================== [566.241893] WARNING: possible circular locking dependency detected [566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE [566.241984] ------------------------------------------------------ [566.242025] repro.sh/2604 is trying to acquire lock: [566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf] [566.242167] but task is already holding lock: [566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf] [566.242300] which lock already depends on the new lock. [566.242353] the existing dependency chain (in reverse order) is: [566.242401] -> #1 (rtnl_mutex){+.+.}-{3:3}: [566.242451] __mutex_lock+0xc1/0xbb0 [566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf] [566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf] [566.242627] process_one_work+0x2b3/0x560 [566.242663] worker_thread+0x4f/0x3a0 [566.242696] kthread+0xf2/0x120 [566.242730] ret_from_fork+0x29/0x50 [566.242763] -> #0 (&adapter->crit_lock){+.+.}-{3:3}: [566.242815] __lock_acquire+0x15ff/0x22b0 [566.242869] lock_acquire+0xd2/0x2c0 [566.242901] __mutex_lock+0xc1/0xbb0 [566.242934] iavf_close+0x3c/0x240 [iavf] [566.242997] __dev_close_many+0xac/0x120 [566.243036] dev_close_many+0x8b/0x140 [566.243071] unregister_netdevice_many_notify+0x165/0x7c0 [566.243116] unregister_netdevice_queue+0xd3/0x110 [566.243157] iavf_remove+0x6c1/0x730 [iavf] [566.243217] pci_device_remove+0x33/0xa0 [566.243257] device_release_driver_internal+0x1bc/0x240 [566.243299] pci_stop_bus_device+0x6c/0x90 [566.243338] pci_stop_and_remove_bus_device+0xe/0x20 [566.243380] pci_iov_remove_virtfn+0xd1/0x130 [566.243417] sriov_disable+0x34/0xe0 [566.243448] ice_free_vfs+0x2da/0x330 [ice] [566.244383] ice_sriov_configure+0x88/0xad0 [ice] [566.245353] sriov_numvfs_store+0xde/0x1d0 [566.246156] kernfs_fop_write_iter+0x15e/0x210 [566.246921] vfs_write+0x288/0x530 [566.247671] ksys_write+0x74/0xf0 [566.248408] do_syscall_64+0x58/0x80 [566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc [566.249886] other info that might help us debug this: [566.252014] Possible unsafe locking scenario: [566.253432] CPU0 CPU1 [566.254118] ---- ---- [566.254800] lock(rtnl_mutex); [566.255514] lock(&adapter->crit_lock); [566.256233] lock(rtnl_mutex); [566.256897] lock(&adapter->crit_lock); [566.257388] *** DEADLOCK *** The deadlock can be triggered by a script that is continuously resetting the VF adapter while doing other operations requiring RTNL, e.g: while :; do ip link set $VF up ethtool --set-channels $VF combined 2 ip link set $VF down ip link set $VF up ethtool --set-channels $VF combined 4 ip link set $VF down done Any operation that triggers a reset can substitute "ethtool --set-channles" As a fix, add a new task "finish_config" that do all the work which needs rtnl lock. With the exception of iavf_remove(), all work that require rtnl should be called from this task. As for iavf_remove(), at the point where we need to call unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent finish_config work cannot restart after that since the task is guarded by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config(). Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17Revert "iavf: Do not restart Tx queues after reset task failure"Marcin Szycik
This reverts commit 08f1c147b7265245d67321585c68a27e990e0c4b. Netdev is no longer being detached during reset, so this fix can be reverted. We leave the removal of "hacky" IFF_UP flag update. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17Revert "iavf: Detach device during reset task"Marcin Szycik
This reverts commit aa626da947e9cd30c4cf727493903e1adbb2c0a0. Detaching device during reset was not fully fixing the rtnl locking issue, as there could be a situation where callback was already in progress before detaching netdev. Furthermore, detaching netdevice causes TX timeouts if traffic is running. To reproduce: ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null & while :; do for i in 200 7000 400 5000 300 3000 ; do ip netns exec ns1 ip link set $VF1 mtu $i sleep 2 done sleep 10 done Currently, callbacks such as iavf_change_mtu() wait for the reset. If the reset fails to acquire the rtnl_lock, they schedule the netdev update for later while continuing the reset flow. Operations like MTU changes are performed under the rtnl_lock. Therefore, when the operation finishes, another callback that uses rtnl_lock can start. Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: Wait for reset in callbacks which trigger itMarcin Szycik
There was a fail when trying to add the interface to bonding right after changing the MTU on the interface. It was caused by bonding interface unable to open the interface due to interface being in __RESETTING state because of MTU change. Add new reset_waitqueue to indicate that reset has finished. Add waiting for reset to finish in callbacks which trigger hw reset: iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam(). We use a 5000ms timeout period because on Hyper-V based systems, this operation takes around 3000-4000ms. In normal circumstances, it doesn't take more than 500ms to complete. Add a function iavf_wait_for_reset() to reuse waiting for reset code and use it also in iavf_set_channels(), which already waits for reset. We don't use error handling in iavf_set_channels() as this could cause the device to be in incorrect state if the reset was scheduled but hit timeout or the waitng function was interrupted by a signal. Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: use internal state to free traffic IRQsAhmed Zaki
If the system tries to close the netdev while iavf_reset_task() is running, __LINK_STATE_START will be cleared and netif_running() will return false in iavf_reinit_interrupt_scheme(). This will result in iavf_free_traffic_irqs() not being called and a leak as follows: [7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0' [7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0 is shown when pci_disable_msix() is later called. Fix by using the internal adapter state. The traffic IRQs will always exist if state == __IAVF_RUNNING. Fixes: 5b36e8d04b44 ("i40evf: Enable VF to request an alternate queue allocation") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: Fix out-of-bounds when setting channels on removeDing Hui
If we set channels greater during iavf_remove(), and waiting reset done would be timeout, then returned with error but changed num_active_queues directly, that will lead to OOB like the following logs. Because the num_active_queues is greater than tx/rx_rings[] allocated actually. Reproducer: [root@host ~]# cat repro.sh #!/bin/bash pf_dbsf="0000:41:00.0" vf0_dbsf="0000:41:02.0" g_pids=() function do_set_numvf() { echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) } function do_set_channel() { local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/) [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; } ifconfig $nic 192.168.18.5 netmask 255.255.255.0 ifconfig $nic up ethtool -L $nic combined 1 ethtool -L $nic combined 4 sleep $((RANDOM%3)) } function on_exit() { local pid for pid in "${g_pids[@]}"; do kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null done g_pids=() } trap "on_exit; exit" EXIT while :; do do_set_numvf ; done & g_pids+=($!) while :; do do_set_channel ; done & g_pids+=($!) wait Result: [ 3506.152887] iavf 0000:41:02.0: Removing device [ 3510.400799] ================================================================== [ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536 [ 3510.400823] [ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021 [ 3510.400835] Call Trace: [ 3510.400851] dump_stack+0x71/0xab [ 3510.400860] print_address_description+0x6b/0x290 [ 3510.400865] ? iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400868] kasan_report+0x14a/0x2b0 [ 3510.400873] iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400880] iavf_remove+0x2b6/0xc70 [iavf] [ 3510.400884] ? iavf_free_all_rx_resources+0x160/0x160 [iavf] [ 3510.400891] ? wait_woken+0x1d0/0x1d0 [ 3510.400895] ? notifier_call_chain+0xc1/0x130 [ 3510.400903] pci_device_remove+0xa8/0x1f0 [ 3510.400910] device_release_driver_internal+0x1c6/0x460 [ 3510.400916] pci_stop_bus_device+0x101/0x150 [ 3510.400919] pci_stop_and_remove_bus_device+0xe/0x20 [ 3510.400924] pci_iov_remove_virtfn+0x187/0x420 [ 3510.400927] ? pci_iov_add_virtfn+0xe10/0xe10 [ 3510.400929] ? pci_get_subsys+0x90/0x90 [ 3510.400932] sriov_disable+0xed/0x3e0 [ 3510.400936] ? bus_find_device+0x12d/0x1a0 [ 3510.400953] i40e_free_vfs+0x754/0x1210 [i40e] [ 3510.400966] ? i40e_reset_all_vfs+0x880/0x880 [i40e] [ 3510.400968] ? pci_get_device+0x7c/0x90 [ 3510.400970] ? pci_get_subsys+0x90/0x90 [ 3510.400982] ? pci_vfs_assigned.part.7+0x144/0x210 [ 3510.400987] ? __mutex_lock_slowpath+0x10/0x10 [ 3510.400996] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 3510.401001] sriov_numvfs_store+0x214/0x290 [ 3510.401005] ? sriov_totalvfs_show+0x30/0x30 [ 3510.401007] ? __mutex_lock_slowpath+0x10/0x10 [ 3510.401011] ? __check_object_size+0x15a/0x350 [ 3510.401018] kernfs_fop_write+0x280/0x3f0 [ 3510.401022] vfs_write+0x145/0x440 [ 3510.401025] ksys_write+0xab/0x160 [ 3510.401028] ? __ia32_sys_read+0xb0/0xb0 [ 3510.401031] ? fput_many+0x1a/0x120 [ 3510.401032] ? filp_close+0xf0/0x130 [ 3510.401038] do_syscall_64+0xa0/0x370 [ 3510.401041] ? page_fault+0x8/0x30 [ 3510.401043] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 3510.401073] RIP: 0033:0x7f3a9bb842c0 [ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24 [ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0 [ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001 [ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700 [ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001 [ 3510.401090] [ 3510.401093] Allocated by task 76795: [ 3510.401098] kasan_kmalloc+0xa6/0xd0 [ 3510.401099] __kmalloc+0xfb/0x200 [ 3510.401104] iavf_init_interrupt_scheme+0x26f/0x1310 [iavf] [ 3510.401108] iavf_watchdog_task+0x1d58/0x4050 [iavf] [ 3510.401114] process_one_work+0x56a/0x11f0 [ 3510.401115] worker_thread+0x8f/0xf40 [ 3510.401117] kthread+0x2a0/0x390 [ 3510.401119] ret_from_fork+0x1f/0x40 [ 3510.401122] 0xffffffffffffffff [ 3510.401123] In timeout handling, we should keep the original num_active_queues and reset num_req_queues to 0. Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Cc: Huang Cun <huangcun@sangfor.com.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: Fix use-after-free in free_netdevDing Hui
We do netif_napi_add() for all allocated q_vectors[], but potentially do netif_napi_del() for part of them, then kfree q_vectors and leave invalid pointers at dev->napi_list. Reproducer: [root@host ~]# cat repro.sh #!/bin/bash pf_dbsf="0000:41:00.0" vf0_dbsf="0000:41:02.0" g_pids=() function do_set_numvf() { echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) } function do_set_channel() { local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/) [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; } ifconfig $nic 192.168.18.5 netmask 255.255.255.0 ifconfig $nic up ethtool -L $nic combined 1 ethtool -L $nic combined 4 sleep $((RANDOM%3)) } function on_exit() { local pid for pid in "${g_pids[@]}"; do kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null done g_pids=() } trap "on_exit; exit" EXIT while :; do do_set_numvf ; done & g_pids+=($!) while :; do do_set_channel ; done & g_pids+=($!) wait Result: [ 4093.900222] ================================================================== [ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390 [ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699 [ 4093.900233] [ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021 [ 4093.900239] Call Trace: [ 4093.900244] dump_stack+0x71/0xab [ 4093.900249] print_address_description+0x6b/0x290 [ 4093.900251] ? free_netdev+0x308/0x390 [ 4093.900252] kasan_report+0x14a/0x2b0 [ 4093.900254] free_netdev+0x308/0x390 [ 4093.900261] iavf_remove+0x825/0xd20 [iavf] [ 4093.900265] pci_device_remove+0xa8/0x1f0 [ 4093.900268] device_release_driver_internal+0x1c6/0x460 [ 4093.900271] pci_stop_bus_device+0x101/0x150 [ 4093.900273] pci_stop_and_remove_bus_device+0xe/0x20 [ 4093.900275] pci_iov_remove_virtfn+0x187/0x420 [ 4093.900277] ? pci_iov_add_virtfn+0xe10/0xe10 [ 4093.900278] ? pci_get_subsys+0x90/0x90 [ 4093.900280] sriov_disable+0xed/0x3e0 [ 4093.900282] ? bus_find_device+0x12d/0x1a0 [ 4093.900290] i40e_free_vfs+0x754/0x1210 [i40e] [ 4093.900298] ? i40e_reset_all_vfs+0x880/0x880 [i40e] [ 4093.900299] ? pci_get_device+0x7c/0x90 [ 4093.900300] ? pci_get_subsys+0x90/0x90 [ 4093.900306] ? pci_vfs_assigned.part.7+0x144/0x210 [ 4093.900309] ? __mutex_lock_slowpath+0x10/0x10 [ 4093.900315] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 4093.900318] sriov_numvfs_store+0x214/0x290 [ 4093.900320] ? sriov_totalvfs_show+0x30/0x30 [ 4093.900321] ? __mutex_lock_slowpath+0x10/0x10 [ 4093.900323] ? __check_object_size+0x15a/0x350 [ 4093.900326] kernfs_fop_write+0x280/0x3f0 [ 4093.900329] vfs_write+0x145/0x440 [ 4093.900330] ksys_write+0xab/0x160 [ 4093.900332] ? __ia32_sys_read+0xb0/0xb0 [ 4093.900334] ? fput_many+0x1a/0x120 [ 4093.900335] ? filp_close+0xf0/0x130 [ 4093.900338] do_syscall_64+0xa0/0x370 [ 4093.900339] ? page_fault+0x8/0x30 [ 4093.900341] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 4093.900357] RIP: 0033:0x7f16ad4d22c0 [ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24 [ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0 [ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001 [ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700 [ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001 [ 4093.900367] [ 4093.900368] Allocated by task 820: [ 4093.900371] kasan_kmalloc+0xa6/0xd0 [ 4093.900373] __kmalloc+0xfb/0x200 [ 4093.900376] iavf_init_interrupt_scheme+0x63b/0x1320 [iavf] [ 4093.900380] iavf_watchdog_task+0x3d51/0x52c0 [iavf] [ 4093.900382] process_one_work+0x56a/0x11f0 [ 4093.900383] worker_thread+0x8f/0xf40 [ 4093.900384] kthread+0x2a0/0x390 [ 4093.900385] ret_from_fork+0x1f/0x40 [ 4093.900387] 0xffffffffffffffff [ 4093.900387] [ 4093.900388] Freed by task 6699: [ 4093.900390] __kasan_slab_free+0x137/0x190 [ 4093.900391] kfree+0x8b/0x1b0 [ 4093.900394] iavf_free_q_vectors+0x11d/0x1a0 [iavf] [ 4093.900397] iavf_remove+0x35a/0xd20 [iavf] [ 4093.900399] pci_device_remove+0xa8/0x1f0 [ 4093.900400] device_release_driver_internal+0x1c6/0x460 [ 4093.900401] pci_stop_bus_device+0x101/0x150 [ 4093.900402] pci_stop_and_remove_bus_device+0xe/0x20 [ 4093.900403] pci_iov_remove_virtfn+0x187/0x420 [ 4093.900404] sriov_disable+0xed/0x3e0 [ 4093.900409] i40e_free_vfs+0x754/0x1210 [i40e] [ 4093.900415] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 4093.900416] sriov_numvfs_store+0x214/0x290 [ 4093.900417] kernfs_fop_write+0x280/0x3f0 [ 4093.900418] vfs_write+0x145/0x440 [ 4093.900419] ksys_write+0xab/0x160 [ 4093.900420] do_syscall_64+0xa0/0x370 [ 4093.900421] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 4093.900422] 0xffffffffffffffff [ 4093.900422] [ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200 which belongs to the cache kmalloc-8k of size 8192 [ 4093.900425] The buggy address is located 5184 bytes inside of 8192-byte region [ffff88b4dc144200, ffff88b4dc146200) [ 4093.900425] The buggy address belongs to the page: [ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0 [ 4093.900430] flags: 0x10000000008100(slab|head) [ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80 [ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000 [ 4093.900434] page dumped because: kasan: bad access detected [ 4093.900435] [ 4093.900435] Memory state around the buggy address: [ 4093.900436] ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900437] ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900438] ^ [ 4093.900439] ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900440] ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900440] ================================================================== Although the patch #2 (of 2) can avoid the issue triggered by this repro.sh, there still are other potential risks that if num_active_queues is changed to less than allocated q_vectors[] by unexpected, the mismatched netif_napi_add/del() can also cause UAF. Since we actually call netif_napi_add() for all allocated q_vectors unconditionally in iavf_alloc_q_vectors(), so we should fix it by letting netif_napi_del() match to netif_napi_add(). Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Cc: Huang Cun <huangcun@sangfor.com.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-22iavf: make functions static where possiblePrzemek Kitszel
Make all possible functions static. Move iavf_force_wb() up to avoid forward declaration. Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-22iavf: remove some unused functions and pointless wrappersPrzemek Kitszel
Remove iavf_aq_get_rss_lut(), iavf_aq_get_rss_key(), iavf_vf_reset(). Remove some "OS specific memory free for shared code" wrappers ;) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-22iavf: fix err handling for MAC replacePrzemek Kitszel
Defer removal of current primary MAC until a replacement is successfully added. Previous implementation would left filter list with no primary MAC. This was found while reading the code. The patch takes advantage of the fact that there can only be a single primary MAC filter at any time ([1] by Piotr) Piotr has also applied some review suggestions during our internal patch submittal process. [1] https://lore.kernel.org/netdev/20230614145302.902301-2-piotrx.gardocki@intel.com/ Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-10iavf: remove mask from iavf_irq_enable_queues()Ahmed Zaki
Enable more than 32 IRQs by removing the u32 bit mask in iavf_irq_enable_queues(). There is no need for the mask as there are no callers that select individual IRQs through the bitmask. Also, if the PF allocates more than 32 IRQs, this mask will prevent us from using all of them. Modify the comment in iavf_register.h to show that the maximum number allowed for the IRQ index is 63 as per the iAVF standard 1.0 [1]. link: [1] https://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ethernet-adaptive-virtual-function-hardware-spec.pdf Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230608200226.451861-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-16iavf: send VLAN offloading caps once after VFRAhmed Zaki
When the user disables rxvlan offloading and then changes the number of channels, all VLAN ports are unable to receive traffic. Changing the number of channels triggers a VFR reset. During re-init, when VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is received, we do: 1 - set the IAVF_FLAG_SETUP_NETDEV_FEATURES flag 2 - call iavf_set_vlan_offload_features(adapter, 0, netdev->features); The second step sends to the PF the __default__ features, in this case aq_required |= IAVF_FLAG_AQ_ENABLE_CTAG_VLAN_STRIPPING While the first step forces the watchdog task to call netdev_update_features() -> iavf_set_features() -> iavf_set_vlan_offload_features(adapter, netdev->features, features). Since the user disabled the "rxvlan", this sets: aq_required |= IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_STRIPPING When we start processing the AQ commands, both flags are enabled. Since we process DISABLE_XTAG first then ENABLE_XTAG, this results in the PF enabling the rxvlan offload. This breaks all communications on the VLAN net devices. Fix by removing the call to iavf_set_vlan_offload_features() (second step). Calling netdev_update_features() from watchdog task is enough for both init and reset paths. Fixes: 7598f4b40bd6 ("iavf: Move netdev_update_features() into watchdog task") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-04-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: tools/testing/selftests/net/config 62199e3f1658 ("selftests: net: Add VXLAN MDB test") 3a0385be133e ("selftests: add the missing CONFIG_IP_SCTP in net config") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07iavf: remove active_cvlans and active_svlans bitmapsAhmed Zaki
The VLAN filters info is currently being held in a list and 2 bitmaps (active_cvlans and active_svlans). We are experiencing some racing where data is not in sync in the list and bitmaps. For example, the VLAN is initially added to the list but only when the PF replies, it is added to the bitmap. If a user adds many V2 VLANS before the PF responds: while [ $((i++)) ] ip l add l eth0 name eth0.$i type vlan id $i we might end up with more VLAN list entries than the designated limit. Also, The "ip link show" will show more links added than the PF limit. On the other and, the bitmaps are only used to check the number of VLAN filters and to re-enable the filters when the interface goes from DOWN to UP. This patch gets rid of the bitmaps and uses the list only. To do that, the states of the VLAN filter are modified: 1 - IAVF_VLAN_REMOVE: the entry needs to be totally removed after informing the PF. This is the "ip link del eth0.$i" path. 2 - IAVF_VLAN_DISABLE: (new) the netdev went down. The filter needs to be removed from the PF and then marked INACTIVE. 3 - IAVF_VLAN_INACTIVE: (new) no PF filter exists, but the user did not delete the VLAN. Fixes: 48ccc43ecf10 ("iavf: Add support VIRTCHNL_VF_OFFLOAD_VLAN_V2 during netdev config") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-04-07iavf: refactor VLAN filter statesAhmed Zaki
The VLAN filter states are currently being saved as individual bits. This is error prone as multiple bits might be mistakenly set. Fix by replacing the bits with a single state enum. Also, add an "ACTIVE" state for filters that are accepted by the PF. Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 6e9d51b1a5cb ("net/mlx5e: Initialize link speed to zero") 1bffcea42926 ("net/mlx5e: Add devlink hairpin queues parameters") https://lore.kernel.org/all/20230324120623.4ebbc66f@canb.auug.org.au/ https://lore.kernel.org/all/20230321211135.47711-1-saeed@kernel.org/ Adjacent changes: drivers/net/phy/phy.c 323fe43cf9ae ("net: phy: Improved PHY error reporting in state machine") 4203d84032e2 ("net: phy: Ensure state transitions are processed from phy_stop()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21iavf: fix hang on reboot with iceStefan Assmann
When a system with E810 with existing VFs gets rebooted the following hang may be observed. Pid 1 is hung in iavf_remove(), part of a network driver: PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow" #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930 #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf] #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513 #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice] #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice] #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice] #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90 R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005 R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000 ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b During reboot all drivers PM shutdown callbacks are invoked. In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE. In ice_shutdown() the call chain above is executed, which at some point calls iavf_remove(). However iavf_remove() expects the VF to be in one of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If that's not the case it sleeps forever. So if iavf_shutdown() gets invoked before iavf_remove() the system will hang indefinitely because the adapter is already in state __IAVF_REMOVE. Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE, as we already went through iavf_shutdown(). Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove") Reported-by: Marius Cornea <mcornea@redhat.com> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-16iavf: do not track VLAN 0 filtersAhmed Zaki
When an interface with the maximum number of VLAN filters is brought up, a spurious error is logged: [257.483082] 8021q: adding VLAN 0 to HW filter on device enp0s3 [257.483094] iavf 0000:00:03.0 enp0s3: Max allowed VLAN filters 8. Remove existing VLANs or disable filtering via Ethtool if supported. The VF driver complains that it cannot add the VLAN 0 filter. On the other hand, the PF driver always adds VLAN 0 filter on VF initialization. The VF does not need to ask the PF for that filter at all. Fix the error by not tracking VLAN 0 filters altogether. With that, the check added by commit 0e710a3ffd0c ("iavf: Fix VF driver counting VLAN 0 filters") in iavf_virtchnl.c is useless and might be confusing if left as it suggests that we track VLAN 0. Fixes: 0e710a3ffd0c ("iavf: Fix VF driver counting VLAN 0 filters") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-16iavf: fix non-tunneled IPv6 UDP packet type and hashingAlexander Lobakin
Currently, IAVF's decode_rx_desc_ptype() correctly reports payload type of L4 for IPv4 UDP packets and IPv{4,6} TCP, but only L3 for IPv6 UDP. Originally, i40e, ice and iavf were affected. Commit 73df8c9e3e3d ("i40e: Correct UDP packet header for non_tunnel-ipv6") fixed that in i40e, then commit 638a0c8c8861 ("ice: fix incorrect payload indicator on PTYPE") fixed that for ice. IPv6 UDP is L4 obviously. Fix it and make iavf report correct L4 hash type for such packets, so that the stack won't calculate it on CPU when needs it. Fixes: 206812b5fccb ("i40e/i40evf: i40e implementation for skb_set_hash") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-16iavf: fix inverted Rx hash condition leading to disabled hashAlexander Lobakin
Condition, which checks whether the netdev has hashing enabled is inverted. Basically, the tagged commit effectively disabled passing flow hash from descriptor to skb, unless user *disables* it via Ethtool. Commit a876c3ba59a6 ("i40e/i40evf: properly report Rx packet hash") fixed this problem, but only for i40e. Invert the condition now in iavf and unblock passing hash to skbs again. Fixes: 857942fd1aa1 ("i40e: Fix Rx hash reported to the stack by our driver") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-08iavf: Remove unnecessary aer.h includeBjorn Helgaas
<linux/aer.h> is unused, so remove it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>