path: root/drivers/net/ethernet/intel
Age  Commit message  Author
2021-10-19  ice: Add support for VF rate limiting  (Brett Creeley)
Implement ndo_set_vf_rate to support setting of min_tx_rate and max_tx_rate; set the appropriate bandwidth in the scheduler for the node representing the specified VF VSI. Co-developed-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
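A minimal sketch of what such an ndo_set_vf_rate handler can look like; the pf fields and the lookup/bandwidth helpers below (ice_netdev_to_pf(), ice_cfg_vf_vsi_bw()) are assumptions for illustration, not the driver's actual internals:

static int ice_set_vf_bw_sketch(struct net_device *netdev, int vf_id,
                                int min_tx_rate, int max_tx_rate)
{
        struct ice_pf *pf = ice_netdev_to_pf(netdev);   /* assumed lookup helper */
        struct ice_vf *vf;

        if (vf_id < 0 || vf_id >= pf->num_alloc_vfs)
                return -EINVAL;
        if (min_tx_rate && max_tx_rate && min_tx_rate > max_tx_rate)
                return -EINVAL;

        vf = &pf->vf[vf_id];
        /* program min/max Tx bandwidth on the scheduler node of the VF VSI */
        return ice_cfg_vf_vsi_bw(vf, min_tx_rate, max_tx_rate); /* assumed helper */
}

/* wired up through struct net_device_ops:
 *      .ndo_set_vf_rate = ice_set_vf_bw_sketch,
 */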
2021-10-19  e1000e: Remove redundant statement  (luo penghao)
This assignment is redundant because execution falls through to the "set_itr_now" label regardless, so the stored value is never read. clang_analyzer complains as follows: drivers/net/ethernet/intel/e1000e/netdev.c:2552:3 warning: Value stored to 'current_itr' is never read. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
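A generic illustration of this class of warning (a simplified sketch, not the actual e1000e code):

        u32 new_itr = adapter->itr;             /* default: keep the current value */

        if (link_is_down(adapter)) {
                current_itr = 0;                /* dead store: nothing after the goto reads it */
                goto set_itr_now;
        }
        current_itr = compute_itr(adapter);
        new_itr = map_itr_to_reg(current_itr);
set_itr_now:
        if (new_itr != adapter->itr)
                write_itr(adapter, new_itr);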
2021-10-18  iavf: Combine init and watchdog state machines  (Mateusz Palczewski)
Use a single state machine for driver initialization and for servicing the initialized driver. The init state machine implemented in init_task() is merged into watchdog_task(), and the init_task() function is removed. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-18  iavf: Add __IAVF_INIT_FAILED state  (Mateusz Palczewski)
This commit adds a new state, __IAVF_INIT_FAILED to the state machine. From now on initialization functions report errors not by returning an error value, but by changing the state to indicate that something went wrong. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-18  iavf: Refactor iavf state machine tracking  (Mateusz Palczewski)
Replace direct state changes of the iavf state machine with a method that also tracks the previous state the machine was in. This change is required for further refactoring of the init and watchdog state machines. Tracking the previous state will help recover the iavf after a failure has occurred. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
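A hedged sketch of the kind of state-change helper this describes; the field and function names (last_state, iavf_change_state()) are assumptions:

static void iavf_change_state(struct iavf_adapter *adapter,
                              enum iavf_state_t state)
{
        if (adapter->state == state)
                return;

        adapter->last_state = adapter->state;   /* remember where we came from */
        adapter->state = state;
}

/* recovery paths can then consult adapter->last_state to decide how to retry */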
2021-10-16  ethernet: ixgb: use eth_hw_addr_set()  (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). ixgb_get_ee_mac_addr() is used with a non-netdev->dev_addr pointer, so we can't deal with the problem inside it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
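The pattern described above, sketched out (the ixgb call is shown only for illustration; the exact call site may differ):

        u8 addr[ETH_ALEN];

        ixgb_get_ee_mac_addr(&adapter->hw, addr);       /* read MAC into a stack buffer */
        eth_hw_addr_set(netdev, addr);                  /* write dev_addr via the helper */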
2021-10-15  ice: make use of ice_for_each_* macros  (Maciej Fijalkowski)
Go through the code base and use the ice_for_each_* macros. While at it, introduce an ice_for_each_xdp_txq() macro that can be used for looping over the xdp_rings array. This commit does not introduce any new functionality. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
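One plausible shape for such a macro, with a usage sketch (the exact definition in the driver may differ):

#define ice_for_each_xdp_txq(vsi, i) \
        for ((i) = 0; (i) < (vsi)->num_xdp_txq; (i)++)

/* usage: */
        ice_for_each_xdp_txq(vsi, i)
                ice_clean_tx_ring(vsi->xdp_rings[i]);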
2021-10-15  ice: introduce XDP_TX fallback path  (Maciej Fijalkowski)
Under rare circumstances the requirement of having one XDP Tx queue per CPU cannot be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by a spinlock. These accesses happen to be in the hot path, so introduce a static branch that will be triggered from the control plane when the driver cannot provide a Tx queue dedicated to XDP on each CPU. Currently, the chosen design is to allow any number of XDP Tx queues that is at least half the count of CPUs the platform has. For a lower number the driver will bail out with a response to the user that there were not enough Tx resources to allow configuring XDP. The sharing of rings is signalled via static branch enablement, which in turn indicates that the lock for xdp_ring accesses needs to be taken in the hot path. The approach based on a static branch has no impact on the performance of the non-fallback path. It should be mentioned that the static branch acts as a global driver switch, meaning that if one PF runs out of Tx resources, then the other PFs that the ice driver is servicing will suffer as well. However, given that the HW handled by the ice driver has 1024 Tx queues per PF, this is currently an unlikely scenario. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
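A hedged sketch of the hot-path gating this describes, built on a static key (the key and field names are assumptions):

DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);

/* control plane: enabled when there are fewer XDP Tx queues than CPUs */
        static_branch_inc(&ice_xdp_locking_key);

/* hot path: take the per-ring lock only when sharing is in effect */
        if (static_branch_unlikely(&ice_xdp_locking_key))
                spin_lock(&xdp_ring->tx_lock);
        /* ... produce Tx descriptors onto xdp_ring ... */
        if (static_branch_unlikely(&ice_xdp_locking_key))
                spin_unlock(&xdp_ring->tx_lock);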
2021-10-15  ice: optimize XDP_TX workloads  (Maciej Fijalkowski)
Optimize Tx descriptor cleaning for XDP. The current approach doesn't really scale and chokes when multiple flows are handled. Introduce two ring fields, @next_dd and @next_rs, that keep track of the descriptor that should be looked at when the need for cleaning arises and the descriptor that should have the RS bit set, respectively. Note that at this point the threshold is a constant (32), but it is something we could make configurable. The first thing is to get away from setting the RS bit on every descriptor. Do this only once NTU is higher than the current @next_rs value; in that case, grab tx_desc[next_rs], set the RS bit in the descriptor and advance @next_rs by 32. The second thing is to clean the Tx ring only when there are fewer than 32 free entries. For that case, look up tx_desc[next_dd] for the DD bit. This bit is written back by HW to let the driver know that the xmit was successful; it happens only for descriptors that had the RS bit set. Clean only 32 descriptors and advance @next_dd accordingly. The actual cleaning routine is moved from ice_napi_poll() down to ice_xmit_xdp_ring(). It is safe to do so, as the XDP ring will not get any SKBs that would rely on interrupts for cleaning. A nice side effect is that for the rare Tx fallback path (which the next patch is going to introduce) we don't have to trigger the SW irq to clean the ring. With these two concepts the ring is kept almost full, but it is guaranteed that the driver will be able to produce Tx descriptors. This approach seems to work out well even though the Tx descriptors are produced one by one. The test was conducted with the ice HW bombarded with packets from a HW generator configured to generate 30 flows. The xdp2 sample yields the following results: <snip> proto 17: 79973066 pkt/s proto 17: 80018911 pkt/s proto 17: 80004654 pkt/s proto 17: 79992395 pkt/s proto 17: 79975162 pkt/s proto 17: 79955054 pkt/s proto 17: 79869168 pkt/s proto 17: 79823947 pkt/s proto 17: 79636971 pkt/s </snip> As that sample reports the Rx'ed frames, let's look at the sar output. It says that what we Rx'ed we do actually Tx, with no noticeable drops. Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00 0.00 0.00 38.32 with tx_busy staying calm. When compared to the state before: Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00 0.00 0.00 43.64 it can be observed that the amount of txpck/s is almost doubled, meaning that the performance is improved by around 90%. All of this is due to the drops in the driver; previously the tx_busy stat was bumped at a 7 Mpps rate. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
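A simplified sketch of the batched RS/DD scheme described above; the threshold constant, field names and the batch-clean helper are assumptions loosely modeled on the description:

#define ICE_TX_THRESH   32

        /* producer: set RS only once per ICE_TX_THRESH descriptors */
        if (xdp_ring->next_to_use > xdp_ring->next_rs) {
                tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
                tx_desc->cmd_type_offset_bsz |=
                        cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
                xdp_ring->next_rs += ICE_TX_THRESH;
        }

        /* before producing: clean a 32-descriptor batch only when space is low;
         * the batch is released once the descriptor at next_dd reports DD
         */
        if (unlikely(ICE_DESC_UNUSED(xdp_ring) < ICE_TX_THRESH))
                ice_clean_xdp_tx_batch(xdp_ring);       /* assumed helper */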
2021-10-15  ice: propagate xdp_ring onto rx_ring  (Maciej Fijalkowski)
With rings being split, it is now convenient to introduce a pointer to the XDP ring within the Rx ring. For XDP_TX workloads this means that the xdp_rings array access, which was executed per processed frame, will be skipped. Also, read the XDP prog once per NAPI and, if a prog is present, set up the local xdp_ring pointer. Reading the prog a single time was discussed in [1] with some concern raised by Toke around dispatcher handling and the need to go through an RCU grace period in the ndo_bpf driver callback, but ice currently tears down NAPI instances regardless of the prog presence on the VSI. Although the pointer to the XDP ring introduced to the Rx ring makes things a lot slimmer/simpler, I still feel that a single prog read per NAPI lifetime is beneficial. A further patch that will introduce the fallback path will also benefit from that, as the xdp_ring pointer will be set during the XDP rings setup. [1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/ Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
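The per-NAPI caching this describes, as a hedged sketch (field names assumed):

        struct bpf_prog *xdp_prog = READ_ONCE(rx_ring->xdp_prog);
        struct ice_tx_ring *xdp_ring = NULL;

        if (xdp_prog)
                xdp_ring = rx_ring->xdp_ring;   /* cached pointer, set at XDP ring setup */

        /* in the per-frame loop, XDP_TX then uses the cached xdp_ring directly
         * instead of indexing vsi->xdp_rings[] for every packet
         */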
2021-10-15  ice: do not create xdp_frame on XDP_TX  (Maciej Fijalkowski)
An xdp_frame is not needed for the XDP_TX data path in the ice driver's case. For this data path, cleaning of the sent descriptor does not happen anywhere outside of the driver, which means that the information about the underlying memory model carried via xdp_frame would never be used. Therefore, this conversion can simply be dropped, which relieves the CPU a bit. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  ice: unify xdp_rings accesses  (Maciej Fijalkowski)
There has been a long-lasting issue of improper xdp_rings indexing for the XDP_TX and XDP_REDIRECT actions. Given that rx_ring->q_index is currently mixed with smp_processor_id(), there can be a situation where Tx descriptors are produced onto an XDP Tx ring but the tail is never bumped, for example when a particular queue id is pinned to a non-matching IRQ line. Address this problem by ignoring the user ring count setting and always initializing the xdp_rings array to be of num_possible_cpus() size. Then, always use smp_processor_id() as the index into the xdp_rings array. This provides serialization, as at any given time only a single softirq can run on a particular CPU. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
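A hedged sketch of the sizing and indexing scheme described (the vsi field names are assumptions):

        /* setup: one XDP Tx ring per possible CPU, regardless of user ring count */
        vsi->num_xdp_txq = num_possible_cpus();

        /* hot path: smp_processor_id() picks the ring; softirq context guarantees
         * that only one user runs on this CPU (and thus this ring) at a time
         */
        xdp_ring = vsi->xdp_rings[smp_processor_id()];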
2021-10-15  ice: split ice_ring onto Tx/Rx separate structs  (Maciej Fijalkowski)
While it was convenient to have a generic ring structure that served both Tx and Rx sides, next commits are going to introduce several Tx-specific fields, so in order to avoid hurting the Rx side, let's pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs. Rx ring could be handled by the old ice_ring which would reduce the code churn within this patch, but this would make things asymmetric. Make the union out of the ring container within ice_q_vector so that it is possible to iterate over newly introduced ice_tx_ring. Remove the @size as it's only accessed from control path and it can be calculated pretty easily. Change definitions of ice_update_ring_stats and ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be used for both Rx and Tx rings. Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In Rx ring xdp_rxq_info occupies its own cacheline, so it's the major difference now. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  ice: move ice_container_type onto ice_ring_container  (Maciej Fijalkowski)
Currently ice_container_type is scoped only to ice_ethtool.c. The next commit, which will split the ice_ring struct into Rx/Tx-specific ring structs, is also going to modify the type of the linked list of rings within ice_ring_container. Therefore, the functions that take ice_ring_container as an input argument will need to be aware of the ring type that will be looked up. Embed ice_container_type within ice_ring_container and initialize it properly when allocating the q_vectors. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  ice: remove ring_active from ice_ring  (Maciej Fijalkowski)
This field is dead and the driver does not make any use of it. Simply remove it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  net: intel: igc_ptp: fix build for UML  (Randy Dunlap)
On a UML build, the igc_ptp driver uses CONFIG_X86_TSC for timestamp conversion. The function that is used is not available on UML builds, so have the function use the default system_counterval_t timestamp instead for UML builds. Prevents this build error on UML: ../drivers/net/ethernet/intel/igc/igc_ptp.c: In function ‘igc_device_tstamp_to_system’: ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: implicit declaration of function ‘convert_art_ns_to_tsc’ [-Werror=implicit-function-declaration] return convert_art_ns_to_tsc(tstamp); ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: incompatible types when returning type ‘int’ but ‘struct system_counterval_t’ was expected return convert_art_ns_to_tsc(tstamp); Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/r/20211014050516.6846-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
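The shape of the fix, sketched in simplified form (the exact config guards used in the driver may differ):

static struct system_counterval_t igc_device_tstamp_to_system(u64 tstamp)
{
#if IS_ENABLED(CONFIG_X86_TSC) && !defined(CONFIG_UML)
        return convert_art_ns_to_tsc(tstamp);
#else
        /* fall back to a default system_counterval_t on builds (e.g. UML)
         * where convert_art_ns_to_tsc() is not available
         */
        struct system_counterval_t system_counter = {};

        return system_counter;
#endif
}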
2021-10-14  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14  ice: Print the api_patch as part of the fw.mgmt.api  (Brett Creeley)
Currently when a user uses "devlink dev info", the fw.mgmt.api will be the major.minor numbers as shown below: devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver ice serial_number 00-01-00-ff-ff-00-00-00 versions: fixed: board.id K91258-000 running: fw.mgmt 6.1.2 fw.mgmt.api 1.7 <--- No patch number included fw.mgmt.build 0xd75e7d06 fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.app.name ICE OS Default Package fw.app 1.3.27.0 fw.app.bundle_id 0xc0000001 fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 stored: fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 There are many features in the driver that depend on the major, minor, and patch version of the FW. Without the patch number in the output for fw.mgmt.api debugging issues related to the FW API version is difficult. Also, using major.minor.patch aligns with the existing firmware version which uses a 3 digit value. Fix this by making the fw.mgmt.api print the major.minor.patch versions. Shown below is the result: devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver ice serial_number 00-01-00-ff-ff-00-00-00 versions: fixed: board.id K91258-000 running: fw.mgmt 6.1.2 fw.mgmt.api 1.7.9 <--- patch number included fw.mgmt.build 0xd75e7d06 fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.app.name ICE OS Default Package fw.app 1.3.27.0 fw.app.bundle_id 0xc0000001 fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 stored: fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 Fixes: ff2e5c700e08 ("ice: add basic handler for devlink .info_get") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  ice: fix getting UDP tunnel entry  (Michal Swiatkowski)
Correct the parameter order in the call to the ice_tunnel_idx_to_entry function. An entry in the sparse port table is correct when the idx is 0. For idx 1 one valid entry should be skipped, for idx 2 two of them should be skipped, and so on. Change the if condition to be true when idx is 0, which means that all previous valid entries of this tunnel type have been skipped. Fixes: b20e6c17c468 ("ice: convert to new udp_tunnel infrastructure") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  ice: Avoid crash from unnecessary IDA free  (Dave Ertman)
In the remove path, there is an attempt to free the aux_idx IDA whether it was allocated or not. This can potentially cause a crash when unloading the driver on systems that do not initialize support for RDMA. But, this free cannot be gated by the status bit for RDMA, since it is allocated if the driver detects support for RDMA at probe time, but the driver can enter into a state where RDMA is not supported after the IDA has been allocated at probe time and this would lead to a memory leak. Initialize aux_idx to an invalid value and check for a valid value when unloading to determine if an IDA free is necessary. Fixes: d25a0fc41c1f9 ("ice: Initialize RDMA support") Reported-by: Jun Miao <jun.miao@windriver.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
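A hedged sketch of the guard described; the global IDA name and the capability check are assumptions:

        /* probe: mark aux_idx invalid until an IDA id is actually allocated */
        pf->aux_idx = -1;
        if (ice_rdma_supported(pf)) {                   /* assumed capability check */
                pf->aux_idx = ida_alloc(&ice_aux_ida, GFP_KERNEL);
                if (pf->aux_idx < 0)
                        dev_err(dev, "failed to allocate device ID for AUX driver\n");
        }

        /* remove: free only if allocation actually happened */
        if (pf->aux_idx >= 0)
                ida_free(&ice_aux_ida, pf->aux_idx);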
2021-10-14  ice: Fix failure to re-add LAN/RDMA Tx queues  (Brett Creeley)
Currently, if the VSI is rebuilt/removed and the RDMA PF driver is active, the RDMA Tx queue scheduler node configuration will not be cleaned up. This will cause the rebuild/re-add of the VSI to fail because the software structures are not correctly cleaned up for the VSI index. Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSIs. If there are no RDMA scheduler nodes created, then there is no harm in calling ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if RDMA support is added for other VSI types they will also get this change. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  ethernet: constify references to netdev->dev_addr in drivers  (Jakub Kicinski)
This big patch sprinkles const on local variables and function arguments which may refer to netdev->dev_addr. Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all writes to it go through appropriate helpers. Some of the changes here are not strictly required - const is sometimes cast off but the pointer is not used for writing. It seems like it's still better to add the const in case the code changes later or relevant -W flags get enabled for the build. No functional changes. Link: https://lore.kernel.org/r/20211014142432.449314-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14  ice: Implement support for SMA and U.FL on E810-T  (Maciej Machnikowski)
Expose the SMA and U.FL connectors as ptp_pins on E810-T based adapters and allow controlling them. E810-T adapters are equipped with: - 2 external bidirectional SMA connectors - 1 internal TX U.FL - 1 internal RX U.FL The U.FL connectors share signal lines with the SMA connectors: TX U.FL1 shares the line with SMA1 and RX U.FL2 shares the line with SMA2. This dependency is controlled by ice_verify_pin_e810t. Additionally, add support for E810-T-based devices which don't use the SMA/U.FL controller. If the IO expander is not detected, don't expose the pins and use 2 predefined 1PPS input and output pins. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  ice: Add support for SMA control multiplexer  (Maciej Machnikowski)
E810-T adapters have two external bidirectional SMA connectors and two internal unidirectional U.FL connectors. Multiplexing between U.FL and SMA and SMA direction is controlled using the PCA9575 expander. Add support for the PCA9575 detection and control of the respective pins of the SMA/U.FL multiplexer using the GPIO AQ API. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  ice: Implement functions for reading and setting GPIO pins  (Maciej Machnikowski)
Implement ice_aq_get_gpio and ice_aq_set_gpio for reading and changing the state of GPIO pins described in the topology. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  ice: Refactor ice_aqc_link_topo_addr  (Maciej Machnikowski)
Separate link topo parameters in struct ice_aqc_link_topo_addr into new struct ice_aqc_link_topo_params. This keeps input parameters for the get_link_topo command in a separate structure and is required by future commands that operate only on link topo params without the node handle. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-12  ice: fix locking for Tx timestamp tracking flush  (Jacob Keller)
Commit 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") added a lock around the Tx timestamp tracker flow which is used to cleanup any left over SKBs and prepare for device removal. This lock is problematic because it is being held around a call to ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY write command to firmware. This could lead to a deadlock if the mutex actually sleeps, and causes the following warning on a kernel with preemption debugging enabled: [ 715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573 [ 715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod [ 715.435652] INFO: lockdep is turned off. [ 715.439591] Preemption disabled at: [ 715.439594] [<0000000000000000>] 0x0 [ 715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G W OE 5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c [ 715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ 715.468483] Call Trace: [ 715.470940] dump_stack_lvl+0x6a/0x9a [ 715.474613] ___might_sleep.cold+0x224/0x26a [ 715.478895] __mutex_lock+0xb3/0x1440 [ 715.482569] ? stack_depot_save+0x378/0x500 [ 715.486763] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.494979] ? kfree+0xc1/0x520 [ 715.498128] ? mutex_lock_io_nested+0x12a0/0x12a0 [ 715.502837] ? kasan_set_free_info+0x20/0x30 [ 715.507110] ? __kasan_slab_free+0x10b/0x140 [ 715.511385] ? slab_free_freelist_hook+0xc7/0x220 [ 715.516092] ? kfree+0xc1/0x520 [ 715.519235] ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.527359] ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.535133] ? pci_device_remove+0xab/0x1d0 [ 715.539318] ? __device_release_driver+0x35b/0x690 [ 715.544110] ? driver_detach+0x214/0x2f0 [ 715.548035] ? bus_remove_driver+0x11d/0x2f0 [ 715.552309] ? pci_unregister_driver+0x26/0x250 [ 715.556840] ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.564799] ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0 [ 715.570554] ? do_syscall_64+0x3b/0x90 [ 715.574303] ? entry_SYSCALL_64_after_hwframe+0x44/0xae [ 715.579529] ? start_flush_work+0x542/0x8f0 [ 715.583719] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.591923] ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.599960] ? wait_for_completion_io+0x250/0x250 [ 715.604662] ? lock_acquire+0x196/0x200 [ 715.608504] ? do_raw_spin_trylock+0xa5/0x160 [ 715.612864] ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.620813] ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.628497] ? __debug_check_no_obj_freed+0x1e8/0x3c0 [ 715.633550] ? trace_hardirqs_on+0x1c/0x130 [ 715.637748] ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.646220] ? do_raw_spin_trylock+0xa5/0x160 [ 715.650581] ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.658797] ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.667013] ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.675403] ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.683440] ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.691037] ? 
_raw_spin_unlock_irqrestore+0x46/0x73 [ 715.696005] pci_device_remove+0xab/0x1d0 [ 715.700018] __device_release_driver+0x35b/0x690 [ 715.704637] driver_detach+0x214/0x2f0 [ 715.708389] bus_remove_driver+0x11d/0x2f0 [ 715.712489] pci_unregister_driver+0x26/0x250 [ 715.716857] ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.724637] __do_sys_delete_module.constprop.0+0x2d8/0x4e0 [ 715.730210] ? free_module+0x6d0/0x6d0 [ 715.733963] ? task_work_run+0xe1/0x170 [ 715.737803] ? exit_to_user_mode_loop+0x17f/0x1d0 [ 715.742509] ? rcu_read_lock_sched_held+0x12/0x80 [ 715.747215] ? trace_hardirqs_on+0x1c/0x130 [ 715.751401] do_syscall_64+0x3b/0x90 [ 715.754981] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 715.760033] RIP: 0033:0x7f4dfe59000b [ 715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48 [ 715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b [ 715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918 [ 715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940 [ 715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0 Notice that this is the only case where we use the lock in this way. In the cleanup kthread and work kthread the lock is only taken around the bit accesses. This was done intentionally to avoid this kind of issue. The way the lock is used, we only protect ordering of bit sets vs bit clears. The Tx writers in the hot path don't need to be protected against the entire kthread loop. The Tx queues threads only need to ensure that they do not re-use an index that is currently in use. The cleanup loop does not need to block all new set bits, since it will re-queue itself if new timestamps are present. Fix the tracker flow so that it uses the same flow as the standard cleanup thread. In addition, ensure the in_use bitmap actually gets cleared properly. This fixes the warning and also avoids the potential deadlock that might have occurred otherwise. Fixes: 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-11  ice: ndo_setup_tc implementation for PR  (Michal Swiatkowski)
Add tc-flower support for VF port representor devices. Implement the ndo_setup_tc callback for TC HW offload on VF port representor devices. Both methods are implemented: adding and deleting tc-flower flows. Mark the NETIF_F_HW_TC bit in the net device's feature set to enable the TC offload infrastructure for the port representor. Implement a TC filter replay function required to restore filter settings while the switchdev configuration is rebuilt. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11  ice: ndo_setup_tc implementation for PF  (Kiran Patil)
Implement the ndo_setup_tc net device callback for TC HW offload on the PF device. ndo_setup_tc provides support for HW offloading of various TC filters. Add support for configuring the following filters with tc-flower: - default L2 filters (src/dst mac addresses, ethertype, VLAN) - variations of L3, L3+L4, L2+L3+L4 filters using advanced filters (including ipv4 and ipv6 addresses). Allow adding/removing TC flows when the PF device is configured in eswitch switchdev mode. Two types of actions are supported at the moment: FLOW_ACTION_DROP and FLOW_ACTION_REDIRECT. Co-developed-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com> Signed-off-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
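A hedged sketch of the top-level dispatch such an ndo_setup_tc callback typically performs; the block callback handler named below is an assumption:

static LIST_HEAD(ice_block_cb_list);

static int ice_setup_tc(struct net_device *netdev, enum tc_setup_type type,
                        void *type_data)
{
        switch (type) {
        case TC_SETUP_BLOCK:
                /* register a flow block callback that will receive
                 * FLOW_CLS_REPLACE / FLOW_CLS_DESTROY commands for tc-flower
                 */
                return flow_block_cb_setup_simple(type_data,
                                                  &ice_block_cb_list,
                                                  ice_setup_tc_block_cb, /* assumed handler */
                                                  netdev, netdev, true);
        default:
                return -EOPNOTSUPP;
        }
}

/* hooked up via struct net_device_ops:
 *      .ndo_setup_tc = ice_setup_tc,
 */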
2021-10-11  ice: Allow changing lan_en and lb_en on all kinds of filters  (Michal Swiatkowski)
There is no way to change the default lan_en and lb_en flags while adding a new rule. Add a function that allows changing these flags on a rule identified by rule id and recipe id. The function checks whether the rule is present on the regular rules list or the advanced rules list and calls the appropriate function to update the rule entry. As rules with the ICE_SW_LKUP_DFLT recipe aren't tracked in a list, implement a function which updates the flags without searching for rules, based only on the rule id. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11  ice: cleanup rules info  (Victor Raj)
Change ICE_SW_LKUP_LAST to ICE_MAX_NUM_RECIPES, as there can now also be recipes other than the default ones. Free all structures created for advanced recipes in the cleanup function. Write a function to free the structures allocated for advanced rule info. Signed-off-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11  ice: allow deleting advanced rules  (Shivanshu Shukla)
To remove an advanced rule, the same protocol list as used when adding it should be passed to the function. Based on this information, the list of advanced rules is searched to find the correct rule id. Remove the advanced rule if it forwards to only one VSI. If it forwards to a list of VSIs, remove only the input VSI from that list. Introduce a function to remove a rule by id. It is used in case a rule needs to be removed even if it forwards to a list of VSIs. Allow removing all advanced rules from a particular VSI. This is useful when rebuilding the VSI path. Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Shivanshu Shukla <shivanshu.shukla@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11  ice: allow adding advanced rules  (Grishma Kotecha)
Define dummy packet headers to allow adding advanced rules in HW. Such a header is used as an admin queue command parameter for adding a rule. The firmware will extract the correct fields and use them in look ups. Define each supported packet header and the offsets to the words used in the recipe. Supported headers: - MAC + IPv4 + UDP - MAC + VLAN + IPv4 + UDP - MAC + IPv4 + TCP - MAC + VLAN + IPv4 + TCP - MAC + IPv6 + UDP - MAC + VLAN + IPv6 + UDP - MAC + IPv6 + TCP - MAC + VLAN + IPv6 + TCP Add code for creating an advanced rule. The rule needs to match a defined dummy packet; if it doesn't, return an error, which means that this type of rule isn't currently supported. The first step in adding an advanced rule is searching for an advanced recipe matching this kind of rule. If it doesn't exist, a new recipe is created. The dummy packet has to be filled with the correct header field values from the rule definition. It will be used to do the look up in HW. Support searching for an existing advanced rule entry. It is used when adding the same rule on a different VSI; in this case, instead of creating a new rule, the existing one should be updated with a refreshed VSI list. Add initialization of the prof_res_bm_init flag to zero so that the possible resources for field vectors in the files can be initialized. Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11  ice: create advanced switch recipe  (Dan Nowlin)
These changes introduce code for creating advanced recipes for the switch in hardware. There are a couple of recipes already defined in the HW. They apply to matching on basic protocol headers, like MAC, VLAN, MACVLAN, ethertype or direction (promiscuous), etc. If the user wants to match on other protocol headers (e.g. ip address, src/dst port, etc.) or on a different variation of already supported protocols, a new, more complex recipe needs to be created. That new recipe is referred to as an 'advanced recipe', and a filtering rule created on top of such a recipe is called an 'advanced rule'. One recipe can have up to 5 words, but the first word is always reserved for the match on switch id, so the driver can define up to 4 words for one recipe. To support recipes with more words, up to 5 recipes can be chained, so 20 words can be programmed for a look up. The input to the add recipe function is a list of protocols to support. Based on this list the correct profile is chosen. A correct profile is one that contains all protocol types from the list. Each profile has up to 48 field vector words and each of these words has a protocol id and an offset. These two fields need to match the input data of the add recipe function. If the correct profile can't be found, the function returns an error. The next step after finding the correct profile is grouping words into groups. One group can have up to 4 words. This is done to simplify sending recipes to HW (because a recipe can also have up to 4 words). In case of chaining (when a look up consists of more than 4 words), the last recipe will always have the results from the previous recipes used as words. A recipe to profile map is used to store information about which profile is associated with a recipe. This map is an array of 64 elements (the max number of recipes) and each element is a 256-bit bitmap (the max number of profiles). A profile to recipe map is used to store information about which recipe is associated with a profile. This map is an array of 256 elements (the max number of profiles) and each element is a 64-bit bitmap (the max number of recipes). Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11  ice: manage profiles and field vectors  (Dan Nowlin)
Implement functions to manage profiles and field vectors in hardware. In hardware there are up to 256 profiles and each of these profiles can have 48 field vector words. Each field vector word is described by a protocol id and an offset in the packet. To add a new recipe, all used profiles need to be searched. If a profile contains all the required protocol ids and offsets from the recipe, it can be used. The driver has to add this profile to recipe association to tell hardware that the newly added recipe is going to be associated with this profile. The number of used profiles depends on the package. To avoid searching across unused profiles, the max profile id value is calculated during the init flow. A profile is considered unused when all field vector words in the profile are invalid (protocol id 0xff and offset 0x1ff). Profiles are read from the package section ICE_SID_FLD_VEC_SW. Empty field vector words can be used for recipe results. Store all unused field vector words in prof_res_bm. It is a 256-element array (the max number of profiles) where each element is a 48-bit bitmap (the max number of field vector words). For now, support only the non-tunnel profile type. Co-developed-by: Grishma Kotecha <grishma.kotecha@intel.com> Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11  ice: implement low level recipes functions  (Grishma Kotecha)
Add code to manage recipes and profiles at the admin queue layer. Allow the driver to add a new recipe and update an existing one. Get recipe and get recipe to profile association are mostly used in the code that updates existing recipes. Only default recipes can be updated. An update is done by reading recipes from HW, changing their params and calling the add recipe command. Support the following admin queue commands: - ice_aqc_opc_add_recipe (0x0290) - create a recipe with protocol header information and other details that determine how this recipe filter works - ice_aqc_opc_recipe_to_profile (0x0291) - associate a switch recipe to a profile - ice_aqc_opc_get_recipe (0x0292) - get details of an existing recipe - ice_aqc_opc_get_recipe_to_profile (0x0293) - get a recipe associated with profile ID Define the ICE_AQC_RES_TYPE_RECIPE resource type to hold a switch recipe. It is needed when a new switch recipe is created. Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-08  Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue  (David S. Miller)
Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-10-07 Michal Swiatkowski says: The following patch series introduces basic switchdev model support in the ice driver. Implement the following blocks of the switchdev framework: - VF port representors creation - control plane VSI definition - exception path (a. k. a. "slow-path") - to allow a virtual switch or linux bridge to receive any packet that doesn't match any hw filter - link state management of virtual ports - query virtual port statistics Hardware offload support in switchdev mode is out of scope of this patchset. The devlink interface is used to toggle between switchdev and legacy (the default) modes of the driver. --- Note: This series includes the use of enum ice_status; however, we have patches in our queue to remove it from the driver [1]. We are working through the patches that precede the removal series. [1] https://patchwork.ozlabs.org/project/intel-wired-lan/list/?series=265957 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07  ice: add port representor ethtool ops and stats  (Wojciech Drewek)
Introduce the following ethtool operations for the VF's representor: -get_drvinfo -get_strings -get_ethtool_stats -get_sset_count -get_link In all cases, existing operations were used with minor changes which allow us to detect whether the ethtool op was called for a representor. Only VF VSI stats will be available for the representor. Implement ndo_get_stats64 for the port representor. This will update the VF VSI stats and read them. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: switchdev slow path  (Grzegorz Nitka)
Slow path means allowing a packet to go from the uplink to the representor and from the representor to the correct VF on the Rx side, and from the VF to the representor and then to the uplink on the Tx side. To accomplish this, the driver has to set the correct Tx descriptor. When a packet is sent from a representor to a VF, the destination should be set to the VF VSI. When a packet is sent from the uplink port, the destination should be the uplink, to bypass the switch infrastructure and send the packet outside. On the Rx side the driver should check the source VSI field from the Rx descriptor and, based on that, forward the packet to the correct netdev. To allow this there is a target netdevs table in the control plane VSI struct. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: rebuild switchdev when resetting all VFs  (Grzegorz Nitka)
As resetting all VFs behaves mostly like creating new VFs, the eswitch infrastructure also has to be recreated. The easiest way to do that is to rebuild the eswitch after resetting the VFs. Implement helper functions to start and stop all representors' queues. This is used to disable traffic on port representors. In the rebuild path: - NAPI has to be disabled - the eswitch environment has to be set up - new port representors have to be created, because the old ones held pointers to VFs that no longer exist - the new control plane VSI ring should be remapped - NAPI has to be enabled - rxdid has to be set to FLEX_NIC_2, because this descriptor id supports source_vsi, which is needed on control plane VSI queues - port representors' queues have to be started Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: enable/disable switchdev when managing VFs  (Grzegorz Nitka)
The only way to enable switchdev is to create VFs while the eswitch mode is set to switchdev. Check if the correct mode is set and enable switchdev in the function which creates VFs. Disable switchdev when the user changes the number of VFs to 0. Changing the eswitch mode back to legacy while VFs are created in switchdev mode isn't allowed. As switchdev takes care of managing filter rules, adding new rules on a VF is blocked. When a VF is reset, the driver has to update the pointer in the ice_repr struct, because VSI-related things can change after the reset. Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: introduce new type of VSI for switchdev  (Grzegorz Nitka)
A new type of VSI has to be defined for the switchdev control plane VSI. The number of allocated Tx and Rx queues has to be equal to the number of VFs, because each port representor should have one Tx and one Rx queue. Also, to avoid increasing the number of used irqs too much, the control plane VSI uses only one q_vector and handles all queues in one irq. To allow handling all queues in one irq, a new function to clean msix for the eswitch was introduced. This function schedules napi for each representor instead of scheduling it only for one, like the normal clean irq function does. Only one additional msix has to be requested. Always try to request it in the ice_ena_msix_range function. Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: set and release switchdev environment  (Grzegorz Nitka)
The switchdev environment has to be set up when the user creates VFs and the eswitch mode is switchdev. The release is done when the user deletes all VFs. The data path in this implementation is based on the control plane VSI. This VSI is used to pass traffic from the port representors to the corresponding VFs and vice versa. A default Tx rule has to be added to forward packets to the control plane VSI. This will redirect packets from VFs which don't match other rules to the control plane VSI. On the Rx side a default rule is added on the uplink VSI to receive all traffic that doesn't match other rules. When setting up the switchdev environment all other rules from VFs should be removed. Packets to VFs will be forwarded by the control plane VSI. As a VF without any MAC rules can't send any packets because of the antispoof mechanism, VSI antispoof should be turned off on each VF. To send a packet from a representor to the correct VSI, the destination VSI field in the Tx descriptor has to be filled in. Allow that by setting the destination override bit in the control plane VSI security config. Packets from VFs will be received on the control plane VSI. The driver should decide which netdev to forward the packet to. The decision is made based on the src_vsi field from the descriptor. There is a target netdev list in the control plane VSI struct which chooses the netdev based on the src_vsi number. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: allow changing lan_en and lb_en on dflt rules  (Michal Swiatkowski)
There is no way to change the default lan_en and lb_en flags while adding a new rule. Add a function that allows changing these flags on the ICE_SW_LKUP_DFLT recipe and any rule id. lan_en allows a packet to go outside if the rule is matched; clearing this bit will prevent the packet from being sent outside. lb_en allows a packet to be forwarded to another VSI; clearing this bit will prevent the packet from being forwarded to another VSI. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: manage VSI antispoof and destination override  (Michal Swiatkowski)
Implement functions to make setting the VSI security config easier. The main function, ice_update_security, fills the security section field and checks for errors when updating the VSI. Reset functions are responsible for filling the config correctly according to user expectations. This helper is needed because the destination override is located in this section. The driver has to set this bit to allow steering Tx packets to a VSI based on the value in the Tx descriptors. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: allow process VF opcodes in different ways  (Michal Swiatkowski)
In switchdev mode the driver shouldn't add MAC, VLAN and promisc filters on iavf demand, but should return success so as not to break the normal iavf flow. Achieve that by creating a table of function pointers with default functions used to parse iavf commands. While parsing an iavf command, call the correct function from the table instead of calling the function directly. When port representors are being created, change the functions in the table to new ones that behave correctly for switchdev purposes (ignoring new filters). Change back to the default ops when the representors are being removed. Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: introduce VF port representor  (Michal Swiatkowski)
A port representor is used to manage a VF from the host side. To allow this, each created representor registers a netdevice with a random hw address. A devlink port is also created for each representor. The port representor name is created based on the switch id, or is managed by the devlink core if the devlink port was registered successfully. Open and stop ndo ops are implemented to allow managing the VF link state. The link state is tracked in the VF struct. Struct ice_netdev_priv is extended by a pointer-to-representor field. This is needed to get the correct representor from a netdev struct, mostly in ndo calls. Implement helper functions to check whether a given netdev is the netdev of a port representor (ice_is_port_repr_netdev) and to get the representor from a netdev (ice_netdev_to_repr). As the driver will mostly create or destroy port representors on all VFs instead of on a single one, write functions to add and remove a representor for each VF. The representor struct contains a pointer to the source VSI (the VSI configured on the VF), a backpointer to the VF, a backpointer to the netdev, a q_vector pointer and a metadata_dst which will be used in the data path. Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
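Hedged sketches of the two helpers mentioned above; the exact private-data layout and additional sanity checks (e.g. comparing netdev_ops) are assumptions:

static bool ice_is_port_repr_netdev(struct net_device *netdev)
{
        struct ice_netdev_priv *np = netdev_priv(netdev);

        /* a representor netdev carries a repr pointer in its private data;
         * real code would likely also verify that the netdev ops match the
         * representor ops before trusting this layout
         */
        return np && np->repr;
}

static struct ice_repr *ice_netdev_to_repr(struct net_device *netdev)
{
        struct ice_netdev_priv *np = netdev_priv(netdev);

        return np->repr;
}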
2021-10-07  ice: Move devlink port to PF/VF struct  (Wojciech Drewek)
Keeping the devlink port inside the VSI data structure causes some issues. Since the VF VSI is released during reset, we have to unregister the devlink port and register it again every time a reset is triggered. With the new changes in the devlink API this might cause deadlock issues. After calling devlink_port_register/devlink_port_unregister, the devlink API is going to lock rtnl_mutex. This is an issue when a VF reset is triggered in a netlink operation context (like setting the VF MAC address or VLAN), because rtnl_lock is already taken by netlink. Another call to rtnl_lock from the devlink API results in a deadlock. By moving the devlink port to the PF/VF we avoid creating/destroying it during reset. With this patch, devlink ports are created during ice_probe and destroyed during ice_remove for the PF, and created during ice_repr_add and destroyed during ice_repr_rem for the VF. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: support basic E-Switch mode control  (Michal Swiatkowski)
Write the set and get eswitch mode functions used by devlink ops. Use a new pf struct member, eswitch_mode, to track the current eswitch mode in the driver. Changing the eswitch mode is only allowed when no VFs are created. Create a new file for eswitch-related code. Add a config flag, ICE_SWITCHDEV, to allow the user to choose whether switchdev support should be enabled or disabled. Use case examples: - show current eswitch mode ('legacy' is the default one) [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode legacy - move to 'switchdev' mode [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode switchdev [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode switchdev - create 2 VFs [root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs - unsuccessful attempt to change eswitch mode while VFs are created [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy devlink answers: Operation not supported - destroy VFs [root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs - restore 'legacy' mode [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode legacy Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
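A hedged sketch of how such devlink eswitch mode ops can be wired up; the pf fields and the absence of the actual environment setup/teardown step are simplifications, not the driver's real flow:

static int ice_eswitch_mode_get(struct devlink *devlink, u16 *mode)
{
        struct ice_pf *pf = devlink_priv(devlink);

        *mode = pf->eswitch_mode;
        return 0;
}

static int ice_eswitch_mode_set(struct devlink *devlink, u16 mode,
                                struct netlink_ext_ack *extack)
{
        struct ice_pf *pf = devlink_priv(devlink);

        if (pf->num_alloc_vfs) {        /* assumed VF count field */
                NL_SET_ERR_MSG_MOD(extack, "Changing eswitch mode is allowed only if there are no VFs created");
                return -EOPNOTSUPP;
        }

        switch (mode) {
        case DEVLINK_ESWITCH_MODE_LEGACY:
        case DEVLINK_ESWITCH_MODE_SWITCHDEV:
                pf->eswitch_mode = mode;        /* real code would also set up/tear down the environment */
                return 0;
        default:
                return -EINVAL;
        }
}

static const struct devlink_ops ice_devlink_ops = {
        .eswitch_mode_get = ice_eswitch_mode_get,
        .eswitch_mode_set = ice_eswitch_mode_set,
};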