summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/idpf
AgeCommit message (Collapse)Author
8 daysidpf: set mac type when adding and removing MAC filtersEmil Tantilov
On control planes that allow changing the MAC address of the interface, the driver must provide a MAC type to avoid errors such as: idpf 0000:0a:00.0: Transaction failed (op 535) idpf 0000:0a:00.0: Received invalid MAC filter payload (op 535) (len 0) idpf 0000:0a:00.0: Transaction failed (op 536) These errors occur during driver load or when changing the MAC via: ip link set <iface> address <mac> Add logic to set the MAC type when sending ADD/DEL (opcodes 535/536) to the control plane. Since only one primary MAC is supported per vport, the driver only needs to send an ADD opcode when setting it. Remove the old address by calling __idpf_del_mac_filter(), which skips the message and just clears the entry from the internal list. This avoids an error on DEL as it attempts to remove an address already cleared by the preceding ADD opcode. Fixes: ce1b75d0635c ("idpf: add ptypes and MAC filter support") Reported-by: Jian Liu <jianliu@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
8 daysidpf: fix UAF in RDMA core aux dev deinitializationJoshua Hay
Free the adev->id before auxiliary_device_uninit. The call to uninit triggers the release callback, which frees the iadev memory containing the adev. The previous flow results in a UAF during rmmod due to the adev->id access. [264939.604077] ================================================================== [264939.604093] BUG: KASAN: slab-use-after-free in idpf_idc_deinit_core_aux_device+0xe4/0x100 [idpf] [264939.604134] Read of size 4 at addr ff1100109eb6eaf8 by task rmmod/17842 ... [264939.604635] Allocated by task 17597: [264939.604643] kasan_save_stack+0x20/0x40 [264939.604654] kasan_save_track+0x14/0x30 [264939.604663] __kasan_kmalloc+0x8f/0xa0 [264939.604672] idpf_idc_init_aux_core_dev+0x4bd/0xb60 [idpf] [264939.604700] idpf_idc_init+0x55/0xd0 [idpf] [264939.604726] process_one_work+0x658/0xfe0 [264939.604742] worker_thread+0x6e1/0xf10 [264939.604750] kthread+0x382/0x740 [264939.604762] ret_from_fork+0x23a/0x310 [264939.604772] ret_from_fork_asm+0x1a/0x30 [264939.604785] Freed by task 17842: [264939.604790] kasan_save_stack+0x20/0x40 [264939.604799] kasan_save_track+0x14/0x30 [264939.604808] kasan_save_free_info+0x3b/0x60 [264939.604820] __kasan_slab_free+0x37/0x50 [264939.604830] kfree+0xf1/0x420 [264939.604840] device_release+0x9c/0x210 [264939.604850] kobject_put+0x17c/0x4b0 [264939.604860] idpf_idc_deinit_core_aux_device+0x4f/0x100 [idpf] [264939.604886] idpf_vc_core_deinit+0xba/0x3a0 [idpf] [264939.604915] idpf_remove+0xb0/0x7c0 [idpf] [264939.604944] pci_device_remove+0xab/0x1e0 [264939.604955] device_release_driver_internal+0x371/0x530 [264939.604969] driver_detach+0xbf/0x180 [264939.604981] bus_remove_driver+0x11b/0x2a0 [264939.604991] pci_unregister_driver+0x2a/0x250 [264939.605005] __do_sys_delete_module.constprop.0+0x2eb/0x540 [264939.605014] do_syscall_64+0x64/0x2c0 [264939.605024] entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: f4312e6bfa2a ("idpf: implement core RDMA auxiliary dev create, init, and destroy") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Tested-by: Samuel Salin <Samuel.salin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-08-21idpf: remove obsolete stashing codeJoshua Hay
With the new Tx buffer management scheme, there is no need for all of the stashing mechanisms, the hash table, the reserve buffer stack, etc. Remove all of that. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-08-21idpf: stop Tx if there are insufficient buffer resourcesJoshua Hay
The Tx refillq logic will cause packets to be silently dropped if there are not enough buffer resources available to send a packet in flow scheduling mode. Instead, determine how many buffers are needed along with number of descriptors. Make sure there are enough of both resources to send the packet, and stop the queue if not. Fixes: 7292af042bcf ("idpf: fix a race in txq wakeup") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-08-21idpf: replace flow scheduling buffer ring with buffer poolJoshua Hay
Replace the TxQ buffer ring with one large pool/array of buffers (only for flow scheduling). This eliminates the tag generation and makes it impossible for a tag to be associated with more than one packet. The completion tag passed to HW through the descriptor is the index into the array. That same completion tag is posted back to the driver in the completion descriptor, and used to index into the array to quickly retrieve the buffer during cleaning. In this way, the tags are treated as a fix sized resource. If all tags are in use, no more packets can be sent on that particular queue (until some are freed up). The tag pool size is 64K since the completion tag width is 16 bits. For each packet, the driver pulls a free tag from the refillq to get the next free buffer index. When cleaning is complete, the tag is posted back to the refillq. A multi-frag packet spans multiple buffers in the driver, therefore it uses multiple buffer indexes/tags from the pool. Each frag pulls from the refillq to get the next free buffer index. These are tracked in a next_buf field that replaces the completion tag field in the buffer struct. This chains the buffers together so that the packet can be cleaned from the starting completion tag taken from the completion descriptor, then from the next_buf field for each subsequent buffer. In case of a dma_mapping_error occurs or the refillq runs out of free buf_ids, the packet will execute the rollback error path. This unmaps any buffers previously mapped for the packet. Since several free buf_ids could have already been pulled from the refillq, we need to restore its original state as well. Otherwise, the buf_ids/tags will be leaked and not used again until the queue is reallocated. Descriptor completions only advance the descriptor ring index to "clean" the descriptors. The packet completions only clean the buffers associated with the given packet completion tag and do not update the descriptor ring index. When operating in queue based scheduling mode, the array still acts as a ring and will only have TxQ descriptor count entries. The tx_bufs are still associated 1:1 with the descriptor ring entries and we can use the conventional indexing mechanisms. Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Signed-off-by: Luigi Rizzo <lrizzo@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-08-21idpf: simplify and fix splitq Tx packet rollback error pathJoshua Hay
Move (and rename) the existing rollback logic to singleq.c since that will be the only consumer. Create a simplified splitq specific rollback function to loop through and unmap tx_bufs based on the completion tag. This is critical before replacing the Tx buffer ring with the buffer pool since the previous rollback indexing will not work to unmap the chained buffers from the pool. Cache the next_to_use index before any portion of the packet is put on the descriptor ring. In case of an error, the rollback will bump tail to the correct next_to_use value. Because the splitq path now supports different types of context descriptors (and potentially multiple in the future), this will take care of rolling back any and all context descriptors encoded on the ring for the erroneous packet. The previous rollback logic was broken for PTP packets since it would not account for the PTP context descriptor. Fixes: 1a49cf814fe1 ("idpf: add Tx timestamp flows") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-08-21idpf: improve when to set RE bit logicJoshua Hay
Track the gap between next_to_use and the last RE index. Set RE again if the gap is large enough to ensure RE bit is set frequently. This is critical before removing the stashing mechanisms because the opportunistic descriptor ring cleaning from the out-of-order completions will go away. Previously the descriptors would be "cleaned" by both the descriptor (RE) completion and the out-of-order completions. Without the latter, we must ensure the RE bit is set more frequently. Otherwise, it's theoretically possible for the descriptor ring next_to_clean to never advance. The previous implementation was dependent on the start of a packet falling on a 64th index in the descriptor ring, which is not guaranteed with large packets. Signed-off-by: Luigi Rizzo <lrizzo@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-08-21idpf: add support for Tx refillqs in flow scheduling modeJoshua Hay
In certain production environments, it is possible for completion tags to collide, meaning N packets with the same completion tag are in flight at the same time. In this environment, any given Tx queue is effectively used to send both slower traffic and higher throughput traffic simultaneously. This is the result of a customer's specific configuration in the device pipeline, the details of which Intel cannot provide. This configuration results in a small number of out-of-order completions, i.e., a small number of packets in flight. The existing guardrails in the driver only protect against a large number of packets in flight. The slower flow completions are delayed which causes the out-of-order completions. The fast flow will continue sending traffic and generating tags. Because tags are generated on the fly, the fast flow eventually uses the same tag for a packet that is still in flight from the slower flow. The driver has no idea which packet it should clean when it processes the completion with that tag, but it will look for the packet on the buffer ring before the hash table. If the slower flow packet completion is processed first, it will end up cleaning the fast flow packet on the ring prematurely. This leaves the descriptor ring in a bad state resulting in a crash or Tx timeout. In summary, generating a tag when a packet is sent can lead to the same tag being associated with multiple packets. This can lead to resource leaks, crashes, and/or Tx timeouts. Before we can replace the tag generation, we need a new mechanism for the send path to know what tag to use next. The driver will allocate and initialize a refillq for each TxQ with all of the possible free tag values. During send, the driver grabs the next free tag from the refillq from next_to_clean. While cleaning the packet, the clean routine posts the tag back to the refillq's next_to_use to indicate that it is now free to use. This mechanism works exactly the same way as the existing Rx refill queues, which post the cleaned buffer IDs back to the buffer queue to be reposted to HW. Since we're using the refillqs for both Rx and Tx now, genericize some of the existing refillq support. Note: the refillqs will not be used yet. This is only demonstrating how they will be used to pass free tags back to the send path. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-23idpf: access ->pp through netmem_desc instead of pageByungchul Park
To eliminate the use of struct page in page pool, the page pool users should use netmem descriptor and APIs instead. Make idpf access ->pp through netmem_desc instead of page. Signed-off-by: Byungchul Park <byungchul@sk.com> Link: https://patch.msgid.link/20250721021835.63939-10-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18idpf: preserve coalescing settings across resetsAhmed Zaki
The IRQ coalescing config currently reside only inside struct idpf_q_vector. However, all idpf_q_vector structs are de-allocated and re-allocated during resets. This leads to user-set coalesce configuration to be lost. Add new fields to struct idpf_vport_user_config_data to save the user settings and re-apply them after reset. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18idpf: add cross timestampingMilena Olech
Add cross timestamp support through virtchnl mailbox messages and directly, through PCIe BAR registers. Cross timestamping assumes that both system time and device clock time values are cached simultaneously, what is triggered by HW. Feature is enabled for both ARM and x86 archs. Signed-off-by: Milena Olech <milena.olech@intel.com> Reviewed-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18idpf: add flow steering supportAhmed Zaki
Use the new virtchnl2 OP codes to communicate with the Control Plane to add flow steering filters. We add the basic functionality for add/delete with TCP/UDP IPv4 only. Support for other OP codes and protocols will be added later. Standard 'ethtool -N|--config-ntuple' should be used, for example: # ethtool -N ens801f0d1 flow-type tcp4 src-ip 10.0.0.1 action 6 to route all IPv4/TCP traffic from IP 10.0.0.1 to queue 6. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18virtchnl2: add flow steering supportSudheer Mogilappagari
Add opcodes and corresponding message structure to add and delete flow steering rules. Flow steering enables configuration of rules to take an action or subset of actions based on a match criteria. Actions could be redirect to queue, redirect to queue group, drop packet or mark. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Co-developed-by: Dinesh Kumar <dinesh.kumar@intel.com> Signed-off-by: Dinesh Kumar <dinesh.kumar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18virtchnl2: rename enum virtchnl2_cap_rssAhmed Zaki
The "enum virtchnl2_cap_rss" will be used for negotiating flow steering capabilities. Instead of adding a new enum, rename virtchnl2_cap_rss to virtchnl2_flow_types. Also rename the enum's constants. Flow steering will use this enum in the next patches. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-17Merge branch 'for-next' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Tony Nguyen says: ==================== Add RDMA support for Intel IPU E2000 in idpf Tatyana Nikolova says: This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ IWL reviews: v3: https://lore.kernel.org/all/20250708210554.1662-1-tatyana.e.nikolova@intel.com/ v2: https://lore.kernel.org/all/20250612220002.1120-1-tatyana.e.nikolova@intel.com/ v1 (split from previous series): https://lore.kernel.org/all/20250523170435.668-1-tatyana.e.nikolova@intel.com/ v3: https://lore.kernel.org/all/20250207194931.1569-1-tatyana.e.nikolova@intel.com/ RFC v2: https://lore.kernel.org/all/20240824031924.421-1-tatyana.e.nikolova@intel.com/ RFC: https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: idpf: implement get LAN MMIO memory regions idpf: implement IDC vport aux driver MTU change handler idpf: implement remaining IDC RDMA core callbacks and handlers idpf: implement RDMA vport auxiliary dev create, init, and destroy idpf: implement core RDMA auxiliary dev create, init, and destroy idpf: use reserved RDMA vectors from control plane ==================== Link: https://patch.msgid.link/20250714181002.2865694-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-14idpf: implement get LAN MMIO memory regionsJoshua Hay
The RDMA driver needs to map its own MMIO regions for the sake of performance, meaning the IDPF needs to avoid mapping portions of the BAR space. However, to be HW agnostic, the IDPF cannot assume where these are and must avoid mapping hard coded regions as much as possible. The IDPF maps the bare minimum to load and communicate with the control plane, i.e., the mailbox registers and the reset state registers. Because of how and when mailbox register offsets are initialized, it is easier to adjust the existing defines to be relative to the mailbox region starting address. Use a specific mailbox register write function that uses these relative offsets. The reset state register addresses are calculated the same way as for other registers, described below. The IDPF then calls a new virtchnl op to fetch a list of MMIO regions that it should map. The addresses for the registers in these regions are calculated by determining what region the register resides in, adjusting the offset to be relative to that region, and then adding the register's offset to that region's mapped address. If the new virtchnl op is not supported, the IDPF will fallback to mapping the whole bar. However, it will still map them as separate regions outside the mailbox and reset state registers. This way we can use the same logic in both cases to access the MMIO space. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement IDC vport aux driver MTU change handlerJoshua Hay
The only event an RDMA vport aux driver cares about right now is an MTU change on its underlying vport. Implement and plumb the handler to signal the pre MTU change event and post MTU change events to the RDMA vport aux driver. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement remaining IDC RDMA core callbacks and handlersJoshua Hay
Implement the idpf_idc_request_reset and idpf_idc_rdma_vc_send_sync callbacks for the rdma core auxiliary driver to issue reset events to the idpf and send (synchronous) virtchnl messages to the control plane respectively. Implement and plumb the reset handler for the opposite flow as well, i.e. when the idpf is resetiing and needs to notify the rdma core auxiliary driver. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement RDMA vport auxiliary dev create, init, and destroyJoshua Hay
Implement the functions to create, initialize, and destroy an RDMA vport auxiliary device. The vport aux dev creation is dependent on the core aux device to call idpf_idc_vport_dev_ctrl to signal that it is ready for vport aux devices. Implement that core callback to either create and initialize the vport aux dev or deinitialize. RDMA vport aux dev creation is also dependent on the control plane to tell us the vport is RDMA enabled. Add a flag in the create vport message to signal individual vport RDMA capabilities. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement core RDMA auxiliary dev create, init, and destroyJoshua Hay
Add the initial idpf_idc.c file with the functions to kick off the IDC initialization, create and initialize a core RDMA auxiliary device, and destroy said device. The RDMA core has a dependency on the vports being created by the control plane before it can be initialized. Therefore, once all the vports are up after a hard reset (either during driver load a function level reset), the core RDMA device info will be created. It is populated with the function type (as distinguished by the IDC initialization function pointer), the core idc_ops function points (just stubs for now), the reserved RDMA MSIX table, and various other info the core RDMA auxiliary driver will need. It is then plugged on to the bus. During a function level reset or driver unload, the device will be unplugged from the bus and destroyed. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: use reserved RDMA vectors from control planeJoshua Hay
Fetch the number of reserved RDMA vectors from the control plane. Adjust the number of reserved LAN vectors if necessary. Adjust the minimum number of vectors the OS should reserve to include RDMA; and fail if the OS cannot reserve enough vectors for the minimum number of LAN and RDMA vectors required. Create a separate msix table for the reserved RDMA vectors, which will just get handed off to the RDMA core device to do with what it will. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR (net-6.16-rc5). No conflicts. No adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-01idpf: convert control queue mutex to a spinlockAhmed Zaki
With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated on module load: [ 324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578 [ 324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager [ 324.701689] preempt_count: 201, expected: 0 [ 324.701693] RCU nest depth: 0, expected: 0 [ 324.701697] 2 locks held by NetworkManager/1582: [ 324.701702] #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0 [ 324.701730] #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870 [ 324.701749] Preemption disabled at: [ 324.701752] [<ffffffff9cd23b9d>] __dev_open+0x3dd/0x870 [ 324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary) [ 324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022 [ 324.701774] Call Trace: [ 324.701777] <TASK> [ 324.701779] dump_stack_lvl+0x5d/0x80 [ 324.701788] ? __dev_open+0x3dd/0x870 [ 324.701793] __might_resched.cold+0x1ef/0x23d <..> [ 324.701818] __mutex_lock+0x113/0x1b80 <..> [ 324.701917] idpf_ctlq_clean_sq+0xad/0x4b0 [idpf] [ 324.701935] ? kasan_save_track+0x14/0x30 [ 324.701941] idpf_mb_clean+0x143/0x380 [idpf] <..> [ 324.701991] idpf_send_mb_msg+0x111/0x720 [idpf] [ 324.702009] idpf_vc_xn_exec+0x4cc/0x990 [idpf] [ 324.702021] ? rcu_is_watching+0x12/0xc0 [ 324.702035] idpf_add_del_mac_filters+0x3ed/0xb50 [idpf] <..> [ 324.702122] __hw_addr_sync_dev+0x1cf/0x300 [ 324.702126] ? find_held_lock+0x32/0x90 [ 324.702134] idpf_set_rx_mode+0x317/0x390 [idpf] [ 324.702152] __dev_open+0x3f8/0x870 [ 324.702159] ? __pfx___dev_open+0x10/0x10 [ 324.702174] __dev_change_flags+0x443/0x650 <..> [ 324.702208] netif_change_flags+0x80/0x160 [ 324.702218] do_setlink.isra.0+0x16a0/0x3960 <..> [ 324.702349] rtnl_newlink+0x12fd/0x21e0 The sequence is as follows: rtnl_newlink()-> __dev_change_flags()-> __dev_open()-> dev_set_rx_mode() - > # disables BH and grabs "dev->addr_list_lock" idpf_set_rx_mode() -> # proceed only if VIRTCHNL2_CAP_MACFILTER is ON __dev_uc_sync() -> idpf_add_mac_filter -> idpf_add_del_mac_filters -> idpf_send_mb_msg() -> idpf_mb_clean() -> idpf_ctlq_clean_sq() # mutex_lock(cq_lock) Fix by converting cq_lock to a spinlock. All operations under the new lock are safe except freeing the DMA memory, which may use vunmap(). Fix by requesting a contiguous physical memory for the DMA mapping. Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-01idpf: return 0 size for RSS key if not supportedMichal Swiatkowski
Returning -EOPNOTSUPP from function returning u32 is leading to cast and invalid size value as a result. -EOPNOTSUPP as a size probably will lead to allocation fail. Command: ethtool -x eth0 It is visible on all devices that don't have RSS caps set. [ 136.615917] Call Trace: [ 136.615921] <TASK> [ 136.615927] ? __warn+0x89/0x130 [ 136.615942] ? __alloc_frozen_pages_noprof+0x322/0x330 [ 136.615953] ? report_bug+0x164/0x190 [ 136.615968] ? handle_bug+0x58/0x90 [ 136.615979] ? exc_invalid_op+0x17/0x70 [ 136.615987] ? asm_exc_invalid_op+0x1a/0x20 [ 136.616001] ? rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616016] ? __alloc_frozen_pages_noprof+0x322/0x330 [ 136.616028] __alloc_pages_noprof+0xe/0x20 [ 136.616038] ___kmalloc_large_node+0x80/0x110 [ 136.616072] __kmalloc_large_node_noprof+0x1d/0xa0 [ 136.616081] __kmalloc_noprof+0x32c/0x4c0 [ 136.616098] ? rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616105] rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616114] ethnl_default_doit+0x107/0x3d0 [ 136.616131] genl_family_rcv_msg_doit+0x100/0x160 [ 136.616147] genl_rcv_msg+0x1b8/0x2c0 [ 136.616156] ? __pfx_ethnl_default_doit+0x10/0x10 [ 136.616168] ? __pfx_genl_rcv_msg+0x10/0x10 [ 136.616176] netlink_rcv_skb+0x58/0x110 [ 136.616186] genl_rcv+0x28/0x40 [ 136.616195] netlink_unicast+0x19b/0x290 [ 136.616206] netlink_sendmsg+0x222/0x490 [ 136.616215] __sys_sendto+0x1fd/0x210 [ 136.616233] __x64_sys_sendto+0x24/0x30 [ 136.616242] do_syscall_64+0x82/0x160 [ 136.616252] ? __sys_recvmsg+0x83/0xe0 [ 136.616265] ? syscall_exit_to_user_mode+0x10/0x210 [ 136.616275] ? do_syscall_64+0x8e/0x160 [ 136.616282] ? __count_memcg_events+0xa1/0x130 [ 136.616295] ? count_memcg_events.constprop.0+0x1a/0x30 [ 136.616306] ? handle_mm_fault+0xae/0x2d0 [ 136.616319] ? do_user_addr_fault+0x379/0x670 [ 136.616328] ? clear_bhb_loop+0x45/0xa0 [ 136.616340] ? clear_bhb_loop+0x45/0xa0 [ 136.616349] ? clear_bhb_loop+0x45/0xa0 [ 136.616359] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 136.616369] RIP: 0033:0x7fd30ba7b047 [ 136.616376] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d bd d5 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 71 c3 55 48 83 ec 30 44 89 4c 24 2c 4c 89 44 [ 136.616381] RSP: 002b:00007ffde1796d68 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 136.616388] RAX: ffffffffffffffda RBX: 000055d7bd89f2a0 RCX: 00007fd30ba7b047 [ 136.616392] RDX: 0000000000000028 RSI: 000055d7bd89f3b0 RDI: 0000000000000003 [ 136.616396] RBP: 00007ffde1796e10 R08: 00007fd30bb4e200 R09: 000000000000000c [ 136.616399] R10: 0000000000000000 R11: 0000000000000202 R12: 000055d7bd89f340 [ 136.616403] R13: 000055d7bd89f3b0 R14: 000055d78943f200 R15: 0000000000000000 Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16libeth: convert to netmemAlexander Lobakin
Back when the libeth Rx core was initially written, devmem was a draft and netmem_ref didn't exist in the mainline. Now that it's here, make libeth MP-agnostic before introducing any new code or any new library users. When it's known that the created PP/FQ is for header buffers, use faster "unsafe" underscored netmem <--> virt accessors as netmem_is_net_iov() is always false in that case, but consumes some cycles (bit test + true branch). Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-30idpf: avoid mailbox timeout delays during resetEmil Tantilov
Mailbox operations are not possible while the driver is in reset. Operations that require MBX exchange with the control plane will result in long delays if executed while a reset is in progress: ethtool -L <inf> combined 8& echo 1 > /sys/class/net/<inf>/device/reset idpf 0000:83:00.0: HW reset detected idpf 0000:83:00.0: Device HW Reset initiated idpf 0000:83:00.0: Transaction timed-out (op:504 cookie:be00 vc_op:504 salt:be timeout:2000ms) idpf 0000:83:00.0: Transaction timed-out (op:508 cookie:bf00 vc_op:508 salt:bf timeout:2000ms) idpf 0000:83:00.0: Transaction timed-out (op:512 cookie:c000 vc_op:512 salt:c0 timeout:2000ms) idpf 0000:83:00.0: Transaction timed-out (op:510 cookie:c100 vc_op:510 salt:c1 timeout:2000ms) idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c200 vc_op:509 salt:c2 timeout:60000ms) idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c300 vc_op:509 salt:c3 timeout:60000ms) idpf 0000:83:00.0: Transaction timed-out (op:505 cookie:c400 vc_op:505 salt:c4 timeout:60000ms) idpf 0000:83:00.0: Failed to configure queues for vport 0, -62 Disable mailbox communication in case of a reset, unless it's done during a driver load, where the virtchnl operations are needed to configure the device. Fixes: 8077c727561aa ("idpf: add controlq init and reset checks") Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-30idpf: fix a race in txq wakeupBrian Vazquez
Add a helper function to correctly handle the lockless synchronization when the sender needs to block. The paradigm is if (no_resources()) { stop_queue(); barrier(); if (!no_resources()) restart_queue(); } netif_subqueue_maybe_stop already handles the paradigm correctly, but the code split the check for resources in three parts, the first one (descriptors) followed the protocol, but the other two (completions and tx_buf) were only doing the first part and so race prone. Luckily netif_subqueue_maybe_stop macro already allows you to use a function to evaluate the start/stop conditions so the fix only requires the right helper function to evaluate all the conditions at once. The patch removes idpf_tx_maybe_stop_common since it's no longer needed and instead adjusts separately the conditions for singleq and splitq. Note that idpf_tx_buf_hw_update doesn't need to check for resources since that will be covered in idpf_tx_splitq_frame. To reproduce: Reduce the threshold for pending completions to increase the chances of hitting this pause by changing your kernel: drivers/net/ethernet/intel/idpf/idpf_txrx.h -#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 1) +#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 4) Use pktgen to force the host to push small pkts very aggressively: ./pktgen_sample02_multiqueue.sh -i eth1 -s 100 -6 -d $IP -m $MAC \ -p 10000-10000 -t 16 -n 0 -v -x -c 64 Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Luigi Rizzo <lrizzo@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.15-rc8). Conflicts: 80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X") 4bcc063939a5 ("ice, irdma: fix an off by one in error handling code") c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers") https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au No extra adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21idpf: fix idpf_vport_splitq_napi_poll()Eric Dumazet
idpf_vport_splitq_napi_poll() can incorrectly return @budget after napi_complete_done() has been called. This violates NAPI rules, because after napi_complete_done(), current thread lost napi ownership. Move the test against POLL_MODE before the napi_complete_done(). Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Reported-by: Peter Newman <peternewman@google.com> Closes: https://lore.kernel.org/netdev/20250520121908.1805732-1-edumazet@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Joshua Hay <joshua.a.hay@intel.com> Cc: Alan Brady <alan.brady@intel.com> Cc: Madhu Chittim <madhu.chittim@intel.com> Cc: Phani Burra <phani.r.burra@intel.com> Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Link: https://patch.msgid.link/20250520124030.1983936-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-19idpf: fix null-ptr-deref in idpf_features_checkPavan Kumar Linga
idpf_features_check is used to validate the TX packet. skb header length is compared with the hardware supported value received from the device control plane. The value is stored in the adapter structure and to access it, vport pointer is used. During reset all the vports are released and the vport pointer that the netdev private structure points to is NULL. To avoid null-ptr-deref, store the max header length value in netdev private structure. This also helps to cache the value and avoid accessing adapter pointer in hot path. BUG: kernel NULL pointer dereference, address: 0000000000000068 ... RIP: 0010:idpf_features_check+0x6d/0xe0 [idpf] Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x154/0x520 ? exc_page_fault+0x76/0x190 ? asm_exc_page_fault+0x26/0x30 ? idpf_features_check+0x6d/0xe0 [idpf] netif_skb_features+0x88/0x310 validate_xmit_skb+0x2a/0x2b0 validate_xmit_skb_list+0x4c/0x70 sch_direct_xmit+0x19d/0x3a0 __dev_queue_xmit+0xb74/0xe70 ... Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops") Reviewed-by: Madhu Chititm <madhu.chittim@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add support for Rx timestampingMilena Olech
Add Rx timestamp function when the Rx timestamp value is read directly from the Rx descriptor. In order to extend the Rx timestamp value to 64 bit in hot path, the PHC time is cached in the receive groups. Add supported Rx timestamp modes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: YiFei Zhu <zhuyifei@google.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add Tx timestamp flowsMilena Olech
Add functions to request Tx timestamp for the PTP packets, read the Tx timestamp when the completion tag for that packet is being received, extend the Tx timestamp value and set the supported timestamping modes. Tx timestamp is requested for the PTP packets by setting a TSYN bit and index value in the Tx context descriptor. The driver assumption is that the Tx timestamp value is ready to be read when the completion tag is received. Then the driver schedules delayed work and the Tx timestamp value read is requested through virtchnl message. At the end, the Tx timestamp value is extended to 64-bit and provided back to the skb. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add Tx timestamp capabilities negotiationMilena Olech
Tx timestamp capabilities are negotiated for the uplink Vport. Driver receives information about the number of available Tx timestamp latches, the size of Tx timestamp value and the set of indexes used for Tx timestamping. Add function to get the Tx timestamp capabilities and parse the uplink vport flag. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add PTP clock configurationMilena Olech
PTP clock configuration operations - set time, adjust time and adjust frequency are required to control the clock and maintain synchronization process. Extend get PTP capabilities function to request for the clock adjustments and add functions to enable these actions using dedicated virtchnl messages. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add mailbox access to read PTP clock timeMilena Olech
When the access to read PTP clock is specified as mailbox, the driver needs to send virtchnl message to perform PTP actions. Message is sent using idpf_mbq_opc_send_msg_to_peer_drv mailbox opcode, with the parameters received during PTP capabilities negotiation. Add functions to recognize PTP messages, move them to dedicated secondary mailbox, read the PTP clock time and cross timestamp using mailbox messages. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: negotiate PTP capabilities and get PTP clockMilena Olech
PTP capabilities are negotiated using virtchnl command. Add get capabilities function, direct access to read the PTP clock. Set initial PTP capabilities exposed to the stack. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Willem de Bruijn <willemb@google.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: move virtchnl structures to the header fileMilena Olech
Move virtchnl structures to the header file to expose them for the PTP virtchnl file. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16virtchnl: add PTP virtchnl definitionsMilena Olech
PTP capabilities are negotiated using virtchnl commands. There are two available modes of the PTP support: direct and mailbox. When the direct access to PTP resources is negotiated, virtchnl messages returns a set of registers that allow read/write directly. When the mailbox access to PTP resources is negotiated, virtchnl messages are used to access PTP clock and to read the timestamp values. Virtchnl API covers both modes and exposes a set of PTP capabilities. Using virtchnl API, the driver recognizes also HW abilities - maximum adjustment of the clock and the basic increment value. Additionally, API allows to configure the secondary mailbox, dedicated exclusively for PTP purposes. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add initial PTP supportMilena Olech
PTP feature is supported if the VIRTCHNL2_CAP_PTP is negotiated during the capabilities recognition. Initial PTP support includes PTP initialization and registration of the clock. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: change the method for mailbox workqueue allocationMilena Olech
Since workqueues are created per CPU, the works scheduled to this workqueues are run on the CPU they were assigned. It may result in overloaded CPU that is not able to handle virtchnl messages in relatively short time. Allocating workqueue with WQ_UNBOUND and WQ_HIGHPRI flags allows scheduler to queue virtchl messages on less loaded CPUs, what eliminates delays. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-01Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-04-29 (igb, igc, ixgbe, idpf) For igb: Kurt Kanzenbach adds linking of IRQs and queues to NAPI instances and adds persistent NAPI config. Lastly, he removes undesired IRQs that occur while busy polling. For igc: Kurt Kanzenbach switches the Tx mode for MQPRIO offload to harmonize the current implementation with TAPRIO. For ixgbe: Jedrzej adds separate ethtool ops for E610 devices to account for device differences. Slawomir adds devlink region support for E610 devices. For idpf: Mateusz assigns and utilizes the ptype field out of libeth_rqe_info. Michal removes unreachable code. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: idpf: remove unreachable code from setting mailbox idpf: assign extracted ptype to struct libeth_rqe_info field ixgbe: devlink: add devlink region support for E610 ixgbe: add E610 .set_phys_id() callback implementation ixgbe: apply different rules for setting FC on E610 ixgbe: add support for ACPI WOL for E610 ixgbe: create E610 specific ethtool_ops structure igc: Change Tx mode for MQPRIO offloading igc: Limit netdev_tc calls to MQPRIO igb: Get rid of spurious interrupts igb: Add support for persistent NAPI config igb: Link queues to NAPI instances igb: Link IRQs to NAPI instances ==================== Link: https://patch.msgid.link/20250429234651.3982025-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29idpf: remove unreachable code from setting mailboxMichal Swiatkowski
Remove code that isn't reached. There is no need to check for adapter->req_vec_chunks, because if it isn't set idpf_set_mb_vec_id() won't be called. Only one path when idpf_set_mb_vec_id() is called: idpf_intr_req() -> idpf_send_alloc_vectors_msg() -> adapter->req_vec_chunk is allocated here, otherwise an error is returned and idpf_intr_req() exits with an error. The idpf_set_mb_vec_id() becomes one-liner and it is called only once. Remove it and set mailbox vector index directly. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29idpf: assign extracted ptype to struct libeth_rqe_info fieldMateusz Polchlopek
Assign the ptype extracted from qword to the ptype field of struct libeth_rqe_info. Remove the now excess ptype param of idpf_rx_singleq_extract_fields(), idpf_rx_singleq_extract_base_fields() and idpf_rx_singleq_extract_flex_fields(). Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29idpf: protect shutdown from resetLarysa Zaremba
Before the referenced commit, the shutdown just called idpf_remove(), this way IDPF_REMOVE_IN_PROG was protecting us from the serv_task rescheduling reset. Without this flag set the shutdown process is vulnerable to HW reset or any other triggering conditions (such as default mailbox being destroyed). When one of conditions checked in idpf_service_task becomes true, vc_event_task can be rescheduled during shutdown, this leads to accessing freed memory e.g. idpf_req_rel_vector_indexes() trying to read vport->q_vector_idxs. This in turn causes the system to become defunct during e.g. systemctl kexec. Considering using IDPF_REMOVE_IN_PROG would lead to more heavy shutdown process, instead just cancel the serv_task before cancelling adapter->serv_task before cancelling adapter->vc_event_task to ensure that reset will not be scheduled while we are doing a shutdown. Fixes: 4c9106f4906a ("idpf: fix adapter NULL pointer dereference on reboot") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29idpf: fix potential memory leak on kcalloc() failureMichal Swiatkowski
In case of failing on rss_data->rss_key allocation the function is freeing vport without freeing earlier allocated q_vector_idxs. Fix it. Move from freeing in error branch to goto scheme. Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport") Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Suggested-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-28idpf: fix offloads support for encapsulated packetsMadhu Chittim
Split offloads into csum, tso and other offloads so that tunneled packets do not by default have all the offloads enabled. Stateless offloads for encapsulated packets are not yet supported in firmware/software but in the driver we were setting the features same as non encapsulated features. Fixed naming to clarify CSUM bits are being checked for Tx. Inherit netdev features to VLAN interfaces as well. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Zachary Goldstein <zachmgoldstein@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250425222636.3188441-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02idpf: fix adapter NULL pointer dereference on rebootEmil Tantilov
With SRIOV enabled, idpf ends up calling into idpf_remove() twice. First via idpf_shutdown() and then again when idpf_remove() calls into sriov_disable(), because the VF devices use the idpf driver, hence the same remove routine. When that happens, it is possible for the adapter to be NULL from the first call to idpf_remove(), leading to a NULL pointer dereference. echo 1 > /sys/class/net/<netif>/device/sriov_numvfs reboot BUG: kernel NULL pointer dereference, address: 0000000000000020 ... RIP: 0010:idpf_remove+0x22/0x1f0 [idpf] ... ? idpf_remove+0x22/0x1f0 [idpf] ? idpf_remove+0x1e4/0x1f0 [idpf] pci_device_remove+0x3f/0xb0 device_release_driver_internal+0x19f/0x200 pci_stop_bus_device+0x6d/0x90 pci_stop_and_remove_bus_device+0x12/0x20 pci_iov_remove_virtfn+0xbe/0x120 sriov_disable+0x34/0xe0 idpf_sriov_configure+0x58/0x140 [idpf] idpf_remove+0x1b9/0x1f0 [idpf] idpf_shutdown+0x12/0x30 [idpf] pci_device_shutdown+0x35/0x60 device_shutdown+0x156/0x200 ... Replace the direct idpf_remove() call in idpf_shutdown() with idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform the bulk of the cleanup, such as stopping the init task, freeing IRQs, destroying the vports and freeing the mailbox. This avoids the calls to sriov_disable() in addition to a small netdev cleanup, and destroying workqueues, which don't seem to be required on shutdown. Reported-by: Yuying Ma <yuma@redhat.com> Fixes: e850efed5e15 ("idpf: add module register and probe functionality") Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.15 net-next PR. No conflicts, adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c 919f9f497dbc ("eth: bnxt: fix out-of-range access of vnic_info array") fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-18idpf: check error for register_netdev() on initEmil Tantilov
Current init logic ignores the error code from register_netdev(), which will cause WARN_ON() on attempt to unregister it, if there was one, and there is no info for the user that the creation of the netdev failed. WARNING: CPU: 89 PID: 6902 at net/core/dev.c:11512 unregister_netdevice_many_notify+0x211/0x1a10 ... [ 3707.563641] unregister_netdev+0x1c/0x30 [ 3707.563656] idpf_vport_dealloc+0x5cf/0xce0 [idpf] [ 3707.563684] idpf_deinit_task+0xef/0x160 [idpf] [ 3707.563712] idpf_vc_core_deinit+0x84/0x320 [idpf] [ 3707.563739] idpf_remove+0xbf/0x780 [idpf] [ 3707.563769] pci_device_remove+0xab/0x1e0 [ 3707.563786] device_release_driver_internal+0x371/0x530 [ 3707.563803] driver_detach+0xbf/0x180 [ 3707.563816] bus_remove_driver+0x11b/0x2a0 [ 3707.563829] pci_unregister_driver+0x2a/0x250 Introduce an error check and log the vport number and error code. On removal make sure to check VPORT_REG_NETDEV flag prior to calling unregister and free on the netdev. Add local variables for idx, vport_config and netdev for readability. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>