path: root/drivers/net/ethernet/intel/ice/ice_virtchnl.c
12 days  ice: use libie_aq_str  (Michal Swiatkowski)
Simple: s/ice_aq_str/libie_aq_str. Add the libie_adminq module to the ice Kconfig. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
12 days  ice, libie: move generic adminq descriptors to lib  (Michal Swiatkowski)
The descriptor structure is the same in ice, ixgbe and i40e. Move it to a common libie header so it can be used across the different drivers. Leave the device-specific adminq commands in their separate folders. This leads to a change in how descriptors are filled and read: - previous: struct specific_desc *cmd; cmd = &desc.params.specific_desc; - now: struct specific_desc *cmd; cmd = libie_aq_raw(&desc); Apply these changes across the driver to keep the build clean. The cast only has to be done for device-specific descriptors; for the generic ones the union can still be used. Changes besides moving code: - change the ICE_ prefix to LIBIE_ (ice_ and libie_ too) - remove shift variables that are not otherwise needed (in libie_aq_flags) - fill/read descriptor data via desc.params.raw whenever the descriptor isn't defined in libie - move defines outside of the libie_aq_* structures - add the libie_aq_raw() helper and use it instead of explicit casting. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
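A minimal sketch of the access-pattern change described above. The descriptor layout and the helper body are illustrative stand-ins; only the desc.params.raw idea and the libie_aq_raw() name come from the commit text.

    /* Stand-in for the generic adminq descriptor now shared via libie. */
    struct libie_aq_desc {
            __le16 flags;
            __le16 opcode;
            __le16 datalen;
            __le16 retval;
            __le32 cookie_high;
            __le32 cookie_low;
            union {
                    u8 raw[16];     /* device-specific command bytes */
            } params;
    };

    /* Helper in the spirit of libie_aq_raw(): hand back the raw parameter
     * area so a driver can cast it to its device-specific command struct. */
    static inline void *libie_aq_raw(struct libie_aq_desc *desc)
    {
            return &desc->params.raw;
    }

    /* before: cmd = &desc.params.specific_desc;
     * after:  struct ice_specific_cmd *cmd = libie_aq_raw(&desc);  (hypothetical type) */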
2025-07-18  ice: breakout common LAG code into helpers  (Dave Ertman)
In the VF handling code, parts of the code for LAG can be broken out into helper functions to reduce code duplication. Break this code out into helper functions. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10  ice: expose VF functions used by live migration  (Jacob Keller)
The live migration process will require configuring the target VF with the data provided from the source host. A few helper functions in ice_sriov.c and ice_virtchnl.c will be needed for this process, but are currently static. Expose these functions in their respective headers so that the live migration module can use them during the migration process. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10  ice: move ice_vsi_update_l2tsel to ice_lib.c  (Jacob Keller)
A future change is going to need to call ice_vsi_update_l2tsel from a new context outside of ice_virtchnl.c. Since this function deals with a generic VSI, move it into ice_lib.c to enable calling it from other places in the ice driver. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10  ice: save RSS hash configuration for migration  (Jacob Keller)
The VF can program the RSS hash configuration over virtchnl. It does this by sending a u64 bitmask which represents the current hash configuration. It is not trivial to reverse the hardware configuration back to this hash set for migration. Instead, save the value to the ice_vf structure when it is modified by the VF. The rss_hashcfg value is an 8-byte field. Make room for it in ice_vf by re-arranging some of the existing fields. There is a 4-byte gap after first_vector_idx, and a 4-byte gap between max_tx_rate and vf_states. Move first_vector_idx into the latter 4-byte gap, creating an 8-byte area where rss_hashcfg can be placed. Also move the num_msix field near min_tx_rate, filling 2 bytes of a 3-byte hole. The end result of these changes enables placing the rss_hashcfg field into the structure while also saving 8 bytes in size. It looks like there are a handful more possible cleanups to reduce the size even further, but those have been left as a future cleanup. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
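A rough sketch of the repacking idea. Only the field names mentioned in the commit text are taken from it; the surrounding members, their types, and the exact offsets are assumptions made for illustration.

    /* Illustrative fragment of struct ice_vf after the re-arrangement. */
    struct ice_vf {
            /* ... */
            u16 num_msix;            /* moved next to the rate fields, fills
                                      * part of an existing 3-byte hole */
            unsigned int min_tx_rate;
            unsigned int max_tx_rate;
            int first_vector_idx;    /* moved into the old 4-byte gap before
                                      * vf_states */
            u64 rss_hashcfg;         /* new: last hash config set by the VF,
                                      * placed in the 8-byte area freed above */
            /* ... vf_states and the remaining members follow ... */
    };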
2025-06-09  net: intel: rename 'hena' to 'hashcfg' for clarity  (Jacob Keller)
i40e, ice, and iAVF all use 'hena' as a shorthand for the "hash enable" configuration. This comes originally from the X710 datasheet 'xxQF_HENA' registers. In the context of the registers the meaning is fairly clear. However, on its own, hena is a weird name that can be more difficult to understand. This is especially true in ice. The E810 hardware doesn't even have registers with HENA in the name. Replace the shorthand 'hena' with 'hashcfg'. This makes it clear the variables deal with the Hash configuration, not just a single boolean on/off for all hashing. Do not update the register names. These come directly from the datasheet for X710 and X722, and it is more important that the names can be searched. Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Cross-merge networking fixes after downstream PR (net-6.15-rc8). Conflicts: 80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X") 4bcc063939a5 ("ice, irdma: fix an off by one in error handling code") c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers") https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au No extra adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-19  ice: fix vf->num_mac count with port representors  (Jacob Keller)
The ice_vc_repr_add_mac() function indicates that it does not store the MAC address filters in the firmware. However, it still increments vf->num_mac. This is incorrect, as vf->num_mac should represent the number of MAC filters currently programmed to firmware. Indeed, we only perform this increment if the requested filter is a unicast address that doesn't match the existing vf->hw_lan_addr. In addition, ice_vc_repr_del_mac() does not decrement the vf->num_mac counter. This results in the counter becoming out of sync with the actual count. As it turns out, vf->num_mac is currently only used in legacy mode without port representors. The single place where the value is checked is for enforcing a filter limit on untrusted VFs. Upcoming patches to support VF Live Migration will use this value when determining the size of the TLV for MAC address filters. Fix the representor mode function to stop incrementing the counter incorrectly. Fixes: ac19e03ef780 ("ice: allow process VF opcodes in different ways") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11  ice: receive LLDP on trusted VFs  (Mateusz Pacuszka)
When a trusted VF tries to configure an LLDP multicast address, configure a rule that mirrors the LLDP traffic to this VF. Untrusted VFs are not allowed to receive LLDP at all, so the request to add an LLDP MAC address will always fail for them. Add a forwarding LLDP filter to a trusted VF when it tries to add an LLDP multicast MAC address. The MAC address has to be added after enabling trust (through restarting the LLDP service). Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com> Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-26  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Merge in late fixes to prepare for the 6.15 net-next PR. No conflicts, adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c 919f9f497dbc ("eth: bnxt: fix out-of-range access of vnic_info array") fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-18  ice: fix input validation for virtchnl BW  (Lukasz Czapnik)
Add missing validation of the tc and queue id values sent by a VF in ice_vc_cfg_q_bw(). Additionally, fix the logged value in the warning message, where max_tx_rate was incorrectly referenced instead of min_tx_rate. Also correct the error handling in this function by properly exiting when an invalid configuration is detected. Fixes: 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration") Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Co-developed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18  ice: validate queue quanta parameters to prevent OOB access  (Jan Glaza)
Add queue wraparound prevention in quanta configuration. Ensure end_qid does not overflow by validating start_qid and num_queues. Fixes: 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration") Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jan Glaza <jan.glaza@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
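The shape of the wraparound check described above, as a minimal sketch. The helper name, parameter names, and the exact upper bound are hypothetical; only the start_qid/num_queues/end_qid relationship comes from the commit text.

    /* Hypothetical helper: reject a quanta config whose queue range would
     * wrap past the VSI's queue count. */
    static bool quanta_range_valid(u16 start_qid, u16 num_queues, u16 max_queues)
    {
            u32 end_qid = (u32)start_qid + num_queues;  /* widen before adding
                                                         * so the sum cannot wrap */

            return num_queues && end_qid <= max_queues;
    }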
2025-03-18  ice: stop truncating queue ids when checking  (Jan Glaza)
Queue IDs can be up to 4096, so fix the invalid check that truncates IDs to 8 bits. Fixes: bf93bf791cec8 ("ice: introduce ice_virtchnl.c and ice_virtchnl.h") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jan Glaza <jan.glaza@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
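The bug class being fixed, shown as a sketch with hypothetical function names: an 8-bit parameter silently drops the upper bits of any queue ID of 256 or more, so the range check operates on the wrong value.

    /* Before: the ID arrives as u8, so 0x100 is truncated to 0x00. */
    static bool qid_valid_buggy(u8 qid, u16 num_queues)
    {
            return qid < num_queues;
    }

    /* After: keep the full 16-bit ID (queue IDs can go up to 4096). */
    static bool qid_valid_fixed(u16 qid, u16 num_queues)
    {
            return qid < num_queues;
    }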
2025-02-14  iavf: add support for negotiating flexible RXDID format  (Jacob Keller)
Enable support for VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC, giving the VF driver the ability to determine which Rx descriptor formats are available. This requires sending an additional message during initialization and reset, the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS. This operation requests the supported Rx descriptor IDs available from the PF. This is treated the same way that VLAN V2 capabilities are handled. Add a new set of extended capability flags, used to process send and receipt of the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message. This ensures we finish negotiating for the supported descriptor formats prior to beginning configuration of receive queues. This change stores the supported format bitmap into the iavf_adapter structure. Additionally, if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is enabled by the PF, we need to make sure that the Rx queue configuration specifies the format. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-14  ice: support Rx timestamp on flex descriptor  (Simei Su)
To support Rx timestamp offload, VIRTCHNL_OP_1588_PTP_CAPS is sent by the VF to request PTP capability, and the PF responds with the capability enabled for that VF. Hardware captures timestamps which contain only 32 bits of nominal nanoseconds, as opposed to the 64-bit timestamps that the stack expects. To convert 32b to 64b, we need a current PHC time. VIRTCHNL_OP_1588_PTP_GET_TIME is sent by the VF, and the PF responds with the current PHC time. Signed-off-by: Simei Su <simei.su@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
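A sketch of the usual way such a 32-bit nanosecond capture is widened with a PHC sample taken close in time; this is the general technique implied by the commit text, not necessarily the exact code used by the driver.

    /* Extend a 32-bit captured timestamp to 64 bits using a cached PHC time.
     * Valid as long as the capture and the PHC sample are within about two
     * seconds of each other (half of the 2^32 ns window). */
    static u64 extend_32b_ts(u64 cached_phc_time, u32 in_tstamp)
    {
            u32 phc_low = (u32)cached_phc_time;
            u32 delta = in_tstamp - phc_low;

            /* A "large" unsigned delta means the capture happened just before
             * the PHC sample rather than after it. */
            if (delta > (U32_MAX / 2))
                    return cached_phc_time - (phc_low - in_tstamp);

            return cached_phc_time + delta;
    }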
2024-12-03  ice: Fix NULL pointer dereference in switchdev  (Wojciech Drewek)
Commit 608a5c05c39b ("virtchnl: support queue rate limit and quanta size configuration") introduced new virtchnl ops: - get_qos_caps - cfg_q_bw - cfg_q_quanta New ops were added to ice_virtchnl_dflt_ops, in commit 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration"), but not to the ice_virtchnl_repr_ops. Because of that, if we get one of those messages in switchdev mode we end up with NULL pointer dereference: [ 1199.794701] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 1199.794804] Workqueue: ice ice_service_task [ice] [ 1199.794878] RIP: 0010:0x0 [ 1199.795027] Call Trace: [ 1199.795033] <TASK> [ 1199.795039] ? __die+0x20/0x70 [ 1199.795051] ? page_fault_oops+0x140/0x520 [ 1199.795064] ? exc_page_fault+0x7e/0x270 [ 1199.795074] ? asm_exc_page_fault+0x22/0x30 [ 1199.795086] ice_vc_process_vf_msg+0x6e5/0xd30 [ice] [ 1199.795165] __ice_clean_ctrlq+0x734/0x9d0 [ice] [ 1199.795207] ice_service_task+0xccf/0x12b0 [ice] [ 1199.795248] process_one_work+0x21a/0x620 [ 1199.795260] worker_thread+0x18d/0x330 [ 1199.795269] ? __pfx_worker_thread+0x10/0x10 [ 1199.795279] kthread+0xec/0x120 [ 1199.795288] ? __pfx_kthread+0x10/0x10 [ 1199.795296] ret_from_fork+0x2d/0x50 [ 1199.795305] ? __pfx_kthread+0x10/0x10 [ 1199.795312] ret_from_fork_asm+0x1a/0x30 [ 1199.795323] </TASK> Fixes: 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
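The shape of the fix, as a hedged sketch: the representor ops table must populate the same three handlers that were added to the default table, so switchdev mode never dispatches through a NULL pointer. The ops-table name and members follow the ops listed in the commit; the handler symbols and the ops struct type are assumptions.

    /* Sketch: both dispatch tables carry the new handlers. */
    static const struct ice_virtchnl_ops ice_virtchnl_repr_ops = {
            /* ... existing representor handlers ... */
            .get_qos_caps = ice_vc_get_qos_caps,   /* assumed handler names */
            .cfg_q_bw     = ice_vc_cfg_q_bw,
            .cfg_q_quanta = ice_vc_cfg_q_quanta,
    };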
2024-11-13  ice: use stack variable for virtchnl_supported_rxdids  (Jacob Keller)
The ice_vc_query_rxdid() function allocates memory to store the virtchnl_supported_rxdids structure used to communicate the bitmap of supported RXDIDs to a VF. This structure is only 8 bytes in size. The function must hold the allocated length on the stack as well as the pointer to the structure, which itself is 8 bytes. Allocating this storage on the heap adds unnecessary overhead, including a potential error path that must be handled in case kzalloc fails. Because this structure is so small, we're not saving stack space. Additionally, because we must ensure that we free the allocated memory, the return value from ice_vc_send_msg_to_vf() must also be saved in the stack ret variable. Depending on compiler optimization, this means allocating the 8-byte structure requires up to 16 bytes of stack memory! Simplify this function to keep the rxdid variable on the stack, saving memory and removing a potential failure exit path from this function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
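A simplified sketch of the refactored query path. Struct, opcode, and reply-helper names follow the commit text, but the exact signatures and any locking or capability checks are assumptions.

    /* After the refactor: the 8-byte reply lives on the stack, so there is
     * no kzalloc() and no matching error/free path. */
    static int ice_vc_query_rxdid(struct ice_vf *vf)
    {
            struct virtchnl_supported_rxdids rxdid = {};

            rxdid.supported_rxdids = vf->pf->supported_rxdids;

            return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS,
                                         VIRTCHNL_STATUS_SUCCESS,
                                         (u8 *)&rxdid, sizeof(rxdid));
    }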
2024-11-13  ice: initialize pf->supported_rxdids immediately after loading DDP  (Jacob Keller)
The pf->supported_rxdids field is used to populate the list of valid RXDIDs that a VF may use when negotiating VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. The set of supported RXDIDs is dependent on the DDP, and can be read from the GLFLXP_RXDID_FLAGS register. The PF needs to send this list to the VF upon receiving VIRTCHNL_OP_GET_SUPPORTED_RXDIDS. It also needs to use this list to validate the requested descriptor ID from the VF when programming the Rx queues. A future update to support VF live migration will also want to validate that the target VF can support the same descriptor ID when migrating. Currently, pf->supported_rxdids is initialized inside the ice_vc_query_rxdid() function. This means that it is only ever initialized if at least one VF actually tries to negotiate VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. It is also unnecessarily re-initialized every time the VF loads and requests the descriptor list. This worked before because the PF only checks pf->supported_rxdids when programming the Rx queue if the VF actually negotiates the VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC feature. This will be problematic for VF live migration. We need the list of supported Rx descriptor IDs when migrating. It is possible that no VF on the target PF has ever actually issued a VIRTCHNL_OP_GET_SUPPORTED_RXDIDS. Refactor the driver to initialize pf->supported_rxdids during driver initialization after the DDP is loaded. This is simpler, avoids unnecessary duplicate work, and avoids issues with the live migration process. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: only allow Tx promiscuous for multicast  (Brett Creeley)
Currently, when any VF is trusted and true promiscuous mode is enabled on the PF, the VF will receive all unicast traffic directed to the device's internal switch. This includes traffic external to the NIC and also from other VSIs (i.e. VFs). This does not match the expected behavior, as unicast traffic should only be visible from external sources in this case. Disable the Tx promiscuous mode bits for unicast promiscuous mode. Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-10  Merge branch 'net-introduce-tx-h-w-shaping-api'  (Jakub Kicinski)
Paolo Abeni says: ==================== net: introduce TX H/W shaping API We have a plurality of shaping-related drivers API, but none flexible enough to meet existing demand from vendors[1]. This series introduces new device APIs to configure in a flexible way TX H/W shaping. The new functionalities are exposed via a newly defined generic netlink interface and include introspection capabilities. Some self-tests are included, on top of a dummy netdevsim implementation. Finally a basic implementation for the iavf driver is provided. Some usage examples: * Configure shaping on a given queue: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do set --json '{"ifindex": '$IFINDEX', "shaper": {"handle": {"scope": "queue", "id":'$QUEUEID'}, "bw-max": 2000000}}' * Container B/W sharing The orchestration infrastructure wants to group the container-related queues under a RR scheduling and limit the aggregate bandwidth: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope":"node"}, "bw-max": 10000000}' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}} Q1 \ \ Q2 -- node 0 ------- netdev / (bw-max: 10M) Q3 / * Delegation A containers wants to limit the aggregate B/W bandwidth of 2 of the 3 queues it owns - the starting configuration is the one from the previous point: SPEC=Documentation/netlink/specs/net_shaper.yaml ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], "handle": {"scope": "node"}, "bw-max": 5000000 }' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}} Q1 -- node 1 --------\ / (bw-max: 5M) \ Q2 / node 0 ------- netdev /(bw-max: 10M) Q3 ------------------/ In a group operation, when prior to the op itself, the leaves have different parents, the user must specify the parent handle for the group. 
I.e., starting from the previous config: ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "bw-max": 3000000 }' Netlink error: Invalid argument nl_len = 96 (80) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'All the leaves shapers must have the same old parent'} ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "parent": {"scope": "node", "id": 1}, "bw-max": 3000000 } {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}} Q1 -- node 2 --- /(bw-max:3M)\ Q3 / \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) * Cleanup: Still starting from the previous config. To delete a single queue shaper: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID3'}}' Q1 -- node 2 --- (bw-max:3M)\ \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) Deleting a node shaper relinks all its leaves to the node's parent: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id":2}}' Q1 ---\ \ node 1----- \ / (bw-max: 5M)\ Q2----/ node 0 ------- netdev (bw-max: 10M) Deleting the last shaper under a node shaper deletes the node, too: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID1'}}' ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID2'}}' ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 1}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} Such a delete recurses on parents that are left with no leaves: ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 0}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} v8: https://lore.kernel.org/cover.1727704215.git.pabeni@redhat.com v7: https://lore.kernel.org/cover.1725919039.git.pabeni@redhat.com v6: https://lore.kernel.org/cover.1725457317.git.pabeni@redhat.com v5: https://lore.kernel.org/cover.1724944116.git.pabeni@redhat.com v4: https://lore.kernel.org/cover.1724165948.git.pabeni@redhat.com v3: https://lore.kernel.org/cover.1722357745.git.pabeni@redhat.com RFC v2: https://lore.kernel.org/cover.1721851988.git.pabeni@redhat.com RFC v1: https://lore.kernel.org/cover.1719518113.git.pabeni@redhat.com ==================== Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10  ice: Support VF queue rate limit and quanta size configuration  (Wenjun Wu)
Add support to configure the VF queue rate limit and quanta size. For quanta size configuration, the quanta profiles are divided evenly among the PFs. For each port, the first quanta profile is reserved as the default. When a VF requests to set the queue quanta size, the PF will search for an available profile, change the fields and assign this profile to the queue. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/fddefc2c1ec3ab32b241ce444af401da19e834dd.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08  ice: store max_frame and rx_buf_len only in ice_rx_ring  (Jacob Keller)
The max_frame and rx_buf_len fields of the VSI set the maximum frame size for packets on the wire, and configure the size of the Rx buffer. In the hardware, these are per-queue configuration. Most VSI types use a simple method to determine the size of the buffers for all queues. However, VFs may potentially configure different values for each queue. While the Linux iAVF driver does not do this, it is allowed by the virtchnl interface. The current virtchnl code simply sets the per-VSI fields in between calls to ice_vsi_cfg_single_rxq(). This technically works, as these fields are only ever used when programming the Rx ring, and otherwise not checked again. However, it is confusing to maintain. The Rx ring also already has an rx_buf_len field in order to access the buffer length in the hotpath. It also has extra unused bytes in the ring structure which we can make use of to store the maximum frame size. Drop the VSI max_frame and rx_buf_len fields. Add max_frame to the Rx ring, and slightly re-order rx_buf_len to better fit into the gaps in the structure layout. Change the ice_vsi_cfg_frame_size function so that it writes to the ring fields. Call this function once per ring in ice_vsi_cfg_rxqs(). This is done instead of calling it inside ice_vsi_cfg_rxq(), because ice_vsi_cfg_rxq() is called in the virtchnl flow where the max_frame and rx_buf_len have already been configured. Change the accesses for rx_buf_len and max_frame to all point to the ring structure. This has the added benefit that ice_vsi_cfg_rxq() no longer has the surprise side effect of updating ring->rx_buf_len based on the VSI field. Update the virtchnl ice_vc_cfg_qs_msg() function to set the ring values directly, and drop references to the removed VSI fields. This now makes the VF logic clear, as the ring fields are obviously per-queue. This reduces the required cognitive load when reasoning about this logic. Note that removing the VSI fields does leave a 4-byte gap, but the ice_vsi structure has many gaps, and its layout is not as critical in the hot path. The structure may benefit from a more thorough repacking, but no attempt was made in this change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-08  ice: consistently use q_idx in ice_vc_cfg_qs_msg()  (Jacob Keller)
The ice_vc_cfg_qs_msg() function is used to configure VF queues in response to a VIRTCHNL_OP_CONFIG_VSI_QUEUES command. The virtchnl command contains an array of queue pair data for configuring Tx and Rx queues. This data includes a queue ID. When configuring the queues, the driver generally uses this queue ID to determine which Tx and Rx ring to program. However, a handful of places use the index into the queue pair data from the VF. While most VF implementations appear to send this data in order, it is not mandated by the virtchnl and it is not verified that the queue pair data comes in order. Fix the driver to consistently use the q_idx field instead of the 'i' iterator value when accessing the rings. For the Rx case, introduce a local ring variable to keep lines short. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
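The indexing pattern the fix moves to, as a simplified fragment; the virtchnl field names here follow common virtchnl naming but should be treated as assumptions, and validation and error handling are omitted.

    /* Index rings by the queue ID carried in the message, not by the loop
     * counter, since the queue pairs are not guaranteed to arrive in order. */
    for (i = 0; i < qci->num_queue_pairs; i++) {
            u16 q_idx = qci->qpair[i].rxq.queue_id;
            struct ice_rx_ring *ring = vsi->rx_rings[q_idx];  /* not rx_rings[i] */

            /* ... program 'ring' from qci->qpair[i] ... */
    }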
2024-10-08  ice: add E830 HW VF mailbox message limit support  (Paul Greenwalt)
E830 adds hardware support to prevent the VF from overflowing the PF mailbox with VIRTCHNL messages. E830 will use the hardware feature (ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf(). To prevent a VF from overflowing the PF, the PF sets the number of messages per VF that can be in the PF's mailbox queue (ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF, the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG register. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13  iavf: add support for offloading tc U32 cls filters  (Ahmed Zaki)
Add support for offloading cls U32 filters. Only "skbedit queue_mapping" and "drop" actions are supported. Also, only "ip" and "802_3" tc protocols are allowed. The PF must advertise the VIRTCHNL_VF_OFFLOAD_TC_U32 capability flag. Since the filters will be enabled via the FD stage at the PF, a new type of FDIR filters is added and the existing list and state machine are used. The new filters can be used to configure flow directors based on raw (binary) pattern in the rx packet. Examples: 0. # tc qdisc add dev enp175s0v0 ingress 1. Redirect UDP from src IP 192.168.2.1 to queue 12: # tc filter add dev <dev> protocol ip ingress u32 \ match u32 0x45000000 0xff000000 at 0 \ match u32 0x00110000 0x00ff0000 at 8 \ match u32 0xC0A80201 0xffffffff at 12 \ match u32 0x00000000 0x00000000 at 24 \ action skbedit queue_mapping 12 skip_sw 2. Drop all ICMP: # tc filter add dev <dev> protocol ip ingress u32 \ match u32 0x45000000 0xff000000 at 0 \ match u32 0x00010000 0x00ff0000 at 8 \ match u32 0x00000000 0x00000000 at 24 \ action drop skip_sw 3. Redirect ICMP traffic from MAC 3c:fd:fe:a5:47:e0 to queue 7 (note proto: 802_3): # tc filter add dev <dev> protocol 802_3 ingress u32 \ match u32 0x00003CFD 0x0000ffff at 4 \ match u32 0xFEA547E0 0xffffffff at 8 \ match u32 0x08004500 0xffffff00 at 12 \ match u32 0x00000001 0x000000ff at 20 \ match u32 0x0000 0x0000 at 40 \ action skbedit queue_mapping 7 skip_sw Notes on matches: 1 - All intermediate fields that are needed to parse the correct PTYPE must be provided (in e.g. 3: Ethernet Type 0x0800 in MAC, IP version and IP length: 0x45 and protocol: 0x01 (ICMP)). 2 - The last match must provide an offset that guarantees all required headers are accounted for, even if the last header is not matched. For example, in #2, the last match is 4 bytes at offset 24 starting from IP header, so the total is 14 (MAC) + 24 + 4 = 42, which is the sum of MAC+IP+ICMP headers. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-12  ice: store VF relative MSI-X index in q_vector->vf_reg_idx  (Jacob Keller)
The ice physical function driver needs to configure the association of queues and interrupts on behalf of its virtual functions. This is done over virtchnl by the VF sending messages during its initialization phase. These messages contain a vector_id which the VF wants to associate with a given queue. This ID is relative to the VF space, where 0 indicates the control IRQ for non-queue interrupts. When programming the mapping, the PF driver currently passes this vector_id directly to the low level functions for programming. This works for SR-IOV, because the hardware uses the VF-based indexing for interrupts. This won't work for Scalable IOV, which uses PF-based indexing for programming its VSIs. To handle this, the driver needs to be able to look up the proper index to use for programming. For typical IRQs, this would be the q_vector->reg_idx field. The q_vector->reg_idx can't be set to a VF relative value, because it is used when the PF needs to control the interrupt, such as when triggering a software interrupt on stopping the Tx queue. Thus, introduce a new q_vector->vf_reg_idx which can store the VF relative index for registers which expect this. Use this in ice_cfg_interrupt to look up the VF index from the q_vector. This allows removing the vector ID parameter of ice_cfg_interrupt. Also notice that this function returns an int, but then is cast to the virtchnl error enumeration, virtchnl_status_code. Update the return type to indicate it does not return an integer error code. We can't use normal error codes here because the return values are passed across the virtchnl interface. This will allow the future Scalable IOV VFs to correctly look up the index needed for programming the VF queues without breaking SR-IOV. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-07  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05  ice: virtchnl: stop pretending to support RSS over AQ or registers  (Jacob Keller)
The E800 series hardware uses the same iAVF driver as older devices, including the virtchnl negotiation scheme. This negotiation scheme includes a mechanism to determine what type of RSS should be supported, including RSS over PF virtchnl messages, RSS over firmware AdminQ messages, and RSS via direct register access. The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if it is supported by the VF driver. However, if an older VF driver is loaded, it may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ. The ice driver happily agrees to support these methods. Unfortunately, the underlying hardware does not support these mechanisms. The E800 series VFs don't have the appropriate registers for RSS_REG. The mailbox queue used by VFs for VF to PF communication blocks messages which do not have the VF-to-PF opcode. Stop lying to the VF that it could support RSS over AdminQ or registers, as these interfaces do not work when the hardware is operating on an E800 series device. In practice this is unlikely to be hit by any normal user. The iAVF driver has supported RSS over PF virtchnl commands since 2016, and always defaults to using RSS_PF if possible. In principle, nothing actually stops the existing VF from attempting to access the registers or send an AQ command. However, a properly coded VF will check the capability flags and will report a more useful error if it detects a case where the PF does not support the RSS offload method it needs. Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Alan Brady <alan.brady@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
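A minimal sketch of the capability filtering this describes: when building the reply to the VF's capability request, never echo back the AQ or register RSS methods. The flag names are the standard virtchnl defines mentioned above; the surrounding variable names are hypothetical.

    /* Only acknowledge RSS over PF virtchnl messages; E800 VFs cannot use
     * the register or AdminQ paths even if an old VF requests them. */
    vfres->vf_cap_flags &= ~(VIRTCHNL_VF_OFFLOAD_RSS_AQ |
                             VIRTCHNL_VF_OFFLOAD_RSS_REG);
    if (vf_requested_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF)
            vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_PF;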
2024-03-04  ice: use relative VSI index for VFs instead of PF VSI number  (Jacob Keller)
When initializing over virtchnl, the PF is required to pass a VSI ID to the VF as part of its capabilities exchange. The VF driver reports this value back to the PF in a variety of commands. The PF driver validates that this value matches the value it sent to the VF. Some hardware families such as the E700 series could use this value when reading RSS registers or communicating directly with firmware over the Admin Queue. However, E800 series hardware does not support any of these interfaces and the VF's only use for this value is to report it back to the PF. Thus, there is no requirement that this value be an actual VSI ID value of any kind. The PF driver already does not trust that the VF sends it a real VSI ID. The VSI structure is always looked up from the VF structure. The PF does validate that the VSI ID provided matches a VSI associated with the VF, but otherwise does not use the VSI ID for any purpose. Instead of reporting the VSI number relative to the PF space, report a fixed value of 1. When communicating with the VF over virtchnl, validate that the VSI number is returned appropriately. This avoids leaking information about the firmware or the PF state. Currently the ice driver only supplies a VF with a single VSI. However, it appears that virtchnl has some support for allowing multiple VSIs. I did not attempt to implement this. However, space is left open to allow further relative indexes if additional VSIs are provided in future feature development. For this reason, keep the ice_vc_isvalid_vsi_id function in place to allow extending it for multiple VSIs in the future. This change will also simplify handling of live migration in a future series. Since we will no longer provide a real VSI number to the VF, there will be no need to keep track of this number when migrating to a new host. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-04  ice: pass VSI pointer into ice_vc_isvalid_q_id  (Jacob Keller)
The ice_vc_isvalid_q_id() function takes a VSI index and a queue ID. It looks up the VSI from its index, and then validates that the queue number is valid for that VSI. The VSI ID passed is typically a VSI index from the VF. This VSI number is validated by the PF to ensure that it matches the VSI associated with the VF already. In every flow where ice_vc_isvalid_q_id() is called, the PF driver already has a pointer to the VSI associated with the VF. This pointer is obtained using ice_get_vf_vsi(), rather than looking up the VSI using the index sent by the VF. Since we already know which VSI to operate on, we can modify ice_vc_isvalid_q_id() to take a VSI pointer instead of a VSI index. Pass the VSI we found from ice_get_vf_vsi() instead of re-doing the lookup. This removes some unnecessary computation and scanning of the VSI list. It also removes the last place where the driver directly used the VSI number from the VF. This will pave the way for refactoring to communicate relative VSI numbers to the VF instead of absolute numbers from the PF space. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18  ice: field get conversion  (Jesse Brandeburg)
Refactor the ice driver to use FIELD_GET() for mask and shift reads, which reduces lines of code and adds clarity of intent. This code was generated by the following coccinelle/spatch script and then manually repaired. @get@ constant shift,mask; type T; expression a; @@ -(((T)(a) & mask) >> shift) +FIELD_GET(mask, a) and applied via: spatch --sp-file field_prep.cocci --in-place --dir \ drivers/net/ethernet/intel/ CC: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
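A tiny before/after of what the conversion above produces. FIELD_GET() and GENMASK() come from the kernel's <linux/bitfield.h> and <linux/bits.h>; the example mask is hypothetical.

    #include <linux/bitfield.h>

    #define EXAMPLE_FIELD_M  GENMASK(15, 8)   /* hypothetical register field */

    /* before: val = (u8)((reg & EXAMPLE_FIELD_M) >> 8);
     * after:  the shift is derived from the mask, so only the mask is named. */
    u8 val = FIELD_GET(EXAMPLE_FIELD_M, reg);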
2023-12-13  iavf: enable symmetric-xor RSS for Toeplitz hash function  (Ahmed Zaki)
Allow the user to set the symmetric Toeplitz hash function via: # ethtool -X eth0 hfunc toeplitz symmetric-xor The driver will reject any new RSS configuration if a field other than (IP src/dst and L4 src/dst ports) is requested for hashing. The symmetric RSS will not be supported on PFs not advertising the ADV RSS Offload flag (ADV_RSS_SUPPORT()), for example the E700 series (i40e). Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://lore.kernel.org/r/20231213003321.605376-9-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13  ice: enable symmetric-xor RSS for Toeplitz hash function  (Jeff Guo)
Allow the user to set the symmetric Toeplitz hash function via: # ethtool -X eth0 hfunc toeplitz symmetric-xor All existing RSS configurations will be converted to symmetric unless they have a non-symmetric field (other than IP src/dst and L4 src/dst ports) used for hashing. The driver will reject a new RSS configuration if such a field is requested. The hash function in the E800 NICs is set per-VSI and a specific AQ command is needed to modify the hash function. Use the AQ command to enable setting the symmetric Toeplitz RSS hash function for any VSI in the new ice_set_rss_hfunc(). When the Symmetric Toeplitz hash function is used, the hardware sets the input set of the RSS (Toeplitz) algorithm to be the XOR of the fields indexed by HSYMM and the fields indexed by the INSET registers. We use this to create a symmetric hash by setting the HSYMM registers to point to their counterparts in the INSET registers: HSYMM [src_fv] = dst_fv; HSYMM [dst_fv] = src_fv; where src_fv and dst_fv are the indexes of the protocol's src and dst fields. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Jeff Guo <jia.guo@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://lore.kernel.org/r/20231213003321.605376-8-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13  ice: refactor RSS configuration  (Qi Zhang)
Refactor the driver to use a communication data structure for RSS config. To do so we introduce the new ice_rss_hash_cfg struct, and then pass it as an argument to several functions. Also introduce enum ice_rss_cfg_hdr_type to specify a more granular and flexible RSS configuration: ICE_RSS_OUTER_HEADERS - take outer layer as RSS input set ICE_RSS_INNER_HEADERS - take inner layer as RSS input set ICE_RSS_INNER_HEADERS_W_OUTER_IPV4 - take inner layer as RSS input set for packet with outer IPV4 ICE_RSS_INNER_HEADERS_W_OUTER_IPV6 - take inner layer as RSS input set for packet with outer IPV6 ICE_RSS_ANY_HEADERS - try with outer first then inner (same as the behaviour without this change) Finally, move the virtchnl_rss_algorithm enum to be with the other RSS related structures in the virtchnl.h file. There should be no functional change due to this patch. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://lore.kernel.org/r/20231213003321.605376-6-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
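A sketch of the configuration struct and header-type enum described above. The enum values are the ones listed in the commit text; the struct members are assumptions made to show how such a config object is typically passed between helpers.

    enum ice_rss_cfg_hdr_type {
            ICE_RSS_OUTER_HEADERS,                 /* outer layer as input set */
            ICE_RSS_INNER_HEADERS,                 /* inner layer as input set */
            ICE_RSS_INNER_HEADERS_W_OUTER_IPV4,    /* inner, only for outer IPv4 */
            ICE_RSS_INNER_HEADERS_W_OUTER_IPV6,    /* inner, only for outer IPv6 */
            ICE_RSS_ANY_HEADERS,                   /* try outer first, then inner */
    };

    /* Assumed shape of the communication struct passed as an argument. */
    struct ice_rss_hash_cfg {
            u32 addl_hdrs;                         /* protocol header selection */
            u64 hash_flds;                         /* fields to hash on */
            enum ice_rss_cfg_hdr_type hdr_type;
            bool symm;                             /* symmetric Toeplitz */
    };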
2023-12-13  ice: fix ICE_AQ_VSI_Q_OPT_RSS_* register values  (Ahmed Zaki)
Fix the values of the ICE_AQ_VSI_Q_OPT_RSS_* registers. Shifting is already done when the values are used, so there is no need to double shift. The bug was not discovered earlier since only ICE_AQ_VSI_Q_OPT_RSS_TPLZ (Zero) is currently used. Also, rename ICE_AQ_VSI_Q_OPT_RSS_XXX to ICE_AQ_VSI_Q_OPT_RSS_HASH_XXX for consistency. Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://lore.kernel.org/r/20231213003321.605376-5-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05  ice: change vfs.num_msix_per to vf->num_msix  (Michal Swiatkowski)
vfs::num_msix_per should only be used as the default value for vf->num_msix. For other use cases vf->num_msix should be used, as VFs can have different MSI-X amounts. Fix incorrect register index calculation. vfs::num_msix_per and pf->sriov_base_vector shouldn't be used after the implementation of changing the MSI-X amount on VFs. Instead vf->first_vector_idx should be used, as it stores the first IRQ index. Fixes: fe1c5ca2fe76 ("ice: implement num_msix field per VF") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-11-29  ice: Fix VF Reset paths when interface in a failed over aggregate  (Dave Ertman)
There is an error when an interface has the following conditions: - PF is in an aggregate (bond) - PF has VFs created on it - bond is in a state where it is failed-over to the secondary interface - A VF reset is issued on one or more of those VFs The issue is generated by the originating PF trying to rebuild or reconfigure the VF resources. Since the bond is failed over to the secondary interface the queue contexts are in a modified state. To fix this issue, have the originating interface reclaim its resources prior to the tear-down and rebuild or reconfigure. Then after the process is complete, move the resources back to the currently active interface. There are multiple paths that can be used depending on what triggered the event, so create a helper function to move the queues and use paired calls to the helper (back to origin, process, then move back to active interface) under the same lag_mutex lock. Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20231127212340.1137657-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20  ice: implement num_msix field per VF  (Michal Swiatkowski)
Store the amount of MSI-X per VF instead of storing it in the pf struct. It is used to calculate the number of q_vectors (and queues) for the VF VSI. This is necessary because, with follow-up changes, the number of MSI-X can differ between VFs. Use it instead of the pf->vf_msix value in all cases. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-05  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-28  ice: always add legacy 32byte RXDID in supported_rxdids  (Michal Schmidt)
When the PF and VF drivers both support flexible rx descriptors and have negotiated the VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC capability, the VF driver queries the PF for the list of supported descriptor formats (VIRTCHNL_OP_GET_SUPPORTED_RXDIDS). The PF driver is supposed to set the supported_rxdids bits that correspond to the descriptor formats the firmware implements. The legacy 32-byte rx desc format is always supported, even though it is not expressed in GLFLXP_RXDID_FLAGS. The ice driver does not advertise the legacy 32-byte rx desc support, which leads to this failure to bring up the VF using the Intel out-of-tree iavf driver: iavf 0000:41:01.0: PF does not list support for default Rx descriptor format ... iavf 0000:41:01.0: PF returned error -5 (VIRTCHNL_STATUS_ERR_PARAM) to our request 6 The in-tree iavf driver does not expose this bug, because it does not yet implement VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. The ice driver must always set the ICE_RXDID_LEGACY_1 bit in supported_rxdids. The Intel out-of-tree ice driver and the ice driver in DPDK both do this. I copied this piece of the code and the comment text from the Intel out-of-tree driver. Fixes: e753df8fbca5 ("ice: Add support Flex RXD") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20230920115439.61172-1-mschmidt@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
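The shape of the fix above, as a hedged sketch. The ICE_RXDID_LEGACY_1 name and the GLFLXP_RXDID_FLAGS register come from the commit text; the loop bounds and the helper that reads the register are hypothetical.

    /* The legacy 32-byte descriptor is always supported even though the
     * register never reports it, so seed the bitmap with it unconditionally. */
    pf->supported_rxdids = BIT(ICE_RXDID_LEGACY_1);

    for (i = ICE_RXDID_FLEX_NIC; i < ICE_FLEX_DESC_RXDID_MAX_NUM; i++)
            if (rxdid_enabled_in_hw(hw, i))   /* hypothetical helper around
                                               * GLFLXP_RXDID_FLAGS(i) */
                    pf->supported_rxdids |= BIT(i);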
2023-09-13  ice: Check CRC strip requirement for VLAN strip  (Haiyue Wang)
When VLAN strip is enabled, CRC strip must not be disabled, and when CRC strip is disabled, VLAN strip must not be enabled. The driver needs to check the CRC strip disable parameter before configuring the Rx/Tx queues; otherwise, with the current error handling, the already-set Tx queue context doesn't roll back correctly and causes the Tx queue setup to fail the next time: "Failed to set LAN Tx queue context" Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13  ice: Support FCS/CRC strip disable for VF  (Haiyue Wang)
To support the CRC strip enable/disable functionality, the VF needs to explicitly request the VIRTCHNL_VF_OFFLOAD_CRC offload. The queue context is then set up according to the crc_disable flag in the Rx queue configuration information. Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Cross-merge networking fixes after downstream PR. Conflicts: include/net/inet_sock.h f866fbc842de ("ipv4: fix data-races around inet->inet_id") c274af224269 ("inet: introduce inet->inet_flags") https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/ Adjacent changes: drivers/net/bonding/bond_alb.c e74216b8def3 ("bonding: fix macvlan over alb bond support") f11e5bd159b0 ("bonding: support balance-alb with openvswitch") drivers/net/ethernet/broadcom/bgmac.c d6499f0b7c7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()") 23a14488ea58 ("net: bgmac: Fix return value check for fixed_phy_register()") drivers/net/ethernet/broadcom/genet/bcmmii.c 32bbe64a1386 ("net: bcmgenet: Fix return value check for fixed_phy_register()") acf50d1adbf4 ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()") net/sctp/socket.c f866fbc842de ("ipv4: fix data-races around inet->inet_id") b09bde5c3554 ("inet: move inet->mc_loop to inet->inet_frags") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21  Revert "ice: Fix ice VF reset during iavf initialization"  (Petr Oros)
This reverts commit 7255355a0636b4eff08d5e8139c77d98f151c4fc. After this commit we are not able to attach a VF to a VM: virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52 error: Failed to attach interface error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable ice_check_vf_ready_for_cfg() already contains waiting for the reset. The new condition in ice_check_vf_ready_for_reset() only causes problems. Fixes: 7255355a0636 ("ice: Fix ice VF reset during iavf initialization") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-16  virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1`  (Alexander Lobakin)
There are five virtchnl structures which are allocated and checked in the code as `nents + 1`, meaning that they always have memory for one excessive element regardless of their actual number. This comes from the fact that their sizeof() includes space for 1 element, and then they get allocated via struct_size() or its open-coded equivalents, passing the actual number of elements. Expand virtchnl_struct_size() to handle such structures and replace those 1-elem arrays with proper flex ones. Also fix several places which open-code %IAVF_VIRTCHNL_VF_RESOURCE_SIZE. Finally, let the virtchnl_ether_addr_list size be computed automatically when there's not enough space for the whole list, otherwise we would have to open-code reverse struct_size() logic. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
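A small sketch of the flex-array plus struct_size() pattern the commit moves to. struct_size() is the kernel helper from <linux/overflow.h>; the struct layout below is a simplified stand-in, not the exact virtchnl definition.

    #include <linux/if_ether.h>
    #include <linux/overflow.h>
    #include <linux/slab.h>

    struct example_ether_addr {
            u8 addr[ETH_ALEN];
            u8 type;
            u8 pad;
    };

    /* Stand-in list with a proper trailing flexible array instead of [1]. */
    struct example_ether_addr_list {
            u16 vsi_id;
            u16 num_elements;
            struct example_ether_addr list[];
    };

    struct example_ether_addr_list *al;

    /* Sized for exactly num_elements entries: no hidden "+ 1" element, and
     * the multiply/add is overflow-checked inside struct_size(). */
    al = kzalloc(struct_size(al, list, num_elements), GFP_KERNEL);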
2023-08-07  ice: clean up __ice_aq_get_set_rss_lut()  (Przemek Kitszel)
Refactor __ice_aq_get_set_rss_lut() to improve reader experience and limit misuse scenarios (undesired LUT size for a given LUT type). Allow only 3 RSS LUT type+size variants: PF LUT sized 2048, GLOBAL LUT sized 512, and VSI LUT sized 64, which were used on default flows prior to this commit. Prior to the change, the code was mixing the meaning of @params->lut_size and @params->lut_type, the flag-assignment logic was cryptic, and long defines made everything harder to follow. Fix that by extracting some code out to separate helpers. Drop some of the "shift by 0" statements that originated from Intel's internal HW documentation. Drop some redundant VSI masks (since ice_is_vsi_valid() gives "valid" for up to 0x300 VSIs). After sweeping all the defines out of struct ice_aqc_get_set_rss_lut, it fits into 7 lines. Finally, apply some cleanup to the callsite (use of the new enums, a tmp var for lengthy bit extraction). Note that the flags for the 128- and 64-sized VSI LUTs are the same, and 64 is used everywhere in the code (updated to the new enum here); it just happened that there was 128 in the flag name. __ice_aq_get_set_rss_key() uses the same VSI valid bit, so make the constant common for it and __ice_aq_get_set_rss_lut(). Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-27  ice: process events created by lag netdev event handler  (Dave Ertman)
Add in the function framework for the processing of LAG events. Also add a helper function to perform common tasks. Add the basis of the process of linking a lower netdev to an upper netdev. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19  ice: use src VSI instead of src MAC in slow-path  (Michal Swiatkowski)
The use of a source MAC to direct packets from the VF to the corresponding port representor is only OK if there is only one MAC on a VF. To support this functionality when the number of MACs on a VF is greater, it is necessary to match on a source VSI instead of a source MAC. Let's use the new switch API that allows matching on metadata. If the MAC isn't used in the match criteria, there is no need to handle adding a rule after the virtchnl command. Instead, add the new rule while the port representor is being configured. Remove the rule_added field; checking for sp_rule can be used instead. Also remove the check for switchdev running when deleting the rule, as it can be called from the unroll context when the running flag isn't set. Checking for sp_rule covers both contexts (with and without the running flag). Rules are added in the eswitch configuration flow, so there is no need to have a replay function. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-16  ice: Fix ice VF reset during iavf initialization  (Dawid Wesierski)
Fix the current implementation that causes ice_trigger_vf_reset() to start resetting the VF even when the VF-NIC is still initializing. When we reset the NIC with the ice driver it can interfere with iavf VF initialization, e.g. during consecutive resets induced by ice: iavf ice | | |<-----------------| | ice resets vf iavf | reset | start | |<-----------------| | ice resets vf | causing iavf | initialization | error | | iavf reset end This leads to a series of -53 errors (failed to init adminq) from the IAVF. Change the state of the vf_state field to be not active when the IAVF is still initializing. Make sure to wait until receiving the message on the message box to ensure that the VF is ready and initialized. In simple terms, we use the ACTIVE flag to make sure that the ice driver knows if the iavf is ready for another reset: iavf ice | | | | |<------------- ice resets vf iavf vf_state != ACTIVE reset | start | | | | | iavf | reset-------> vf_state == ACTIVE end ice resets vf | | | | Fixes: c54d209c78b8 ("ice: Wait for VF to be reset/ready before configuration") Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com> Acked-by: Jacob Keller <Jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>