summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
2023-03-22wifi: ath12k: fix firmware assert during channel switch for peer staAditya Kumar Singh
Currently, during change in bandwidth for peer sta, host sends the new value of channel width via WMI_PEER_CHWIDTH set peer param command alone. This can lead to firmware assert in some scenario since before the command, firmware was having value of channel width and its corresponding phymode. After the command, host tries to set the new value of channel width alone which can become incompatible when compared with its phymode. For example: Bandwidth Upgrade ~~~~~~~~~~~~~~~~~~ After association, sta is in 40 MHz bandwidth in 11ax-HE40 phymode. After bandwidth upgrades, sta moves to 80 MHz but as per phymode, max bandwidth is still 40 MHz. Hence, firmware assert is seen. So in this case first phymode should be moved to 11ax-HE80 followed by bandwidth change. Bandwidth Downgrade ~~~~~~~~~~~~~~~~~~ Similarly, reverse of above is also possible when sta is in 40 MHz bandwidth in 11ax-HE40 phymode. Bandwidth should be changed to 20 MHz and if host sends phymode first then, phymode will become 11ax-HE20 and will be incompatible with bandwidth value and hence firmware assert will be seen. Hence, in this case first channel width should be set followed by phymode. Fix this issue by sending WMI set peer param command for phymode as well as bandwidth based on the type of bandwidth change i.e upgrade or downgrade. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Signed-off-by: Aaradhana Sahu <quic_aarasahu@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230315113202.8774-1-quic_aarasahu@quicinc.com
2023-03-22wifi: ath12k: fix memory leak in ath12k_qmi_driver_event_work()Rajat Soni
Currently the buffer pointed by event is not freed in case ATH12K_FLAG_UNREGISTERING bit is set, this causes memory leak. Add a goto skip instead of return, to ensure event and all the list entries are freed properly. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Rajat Soni <quic_rajson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230315090632.15065-1-quic_rajson@quicinc.com
2023-03-22wifi: ath11k: fix BUFFER_DONE read on monitor ring rx bufferHarshitha Prem
Perform dma_sync_single_for_cpu() on monitor ring rx buffer before reading BUFFER_DONE tag and do dma_unmap_single() only after device had set BUFFER_DONE tag to the buffer. Also when BUFFER_DONE tag is not set, allow the buffer to get read next time without freeing skb. This helps to fix AP+Monitor VAP with flood traffic scenario to see monitor ring rx buffer overrun missing BUFFER_DONE tag to be set. Also remove redundant rx dma buf free performed on DP rx_mon_status_refill_ring. Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1 Signed-off-by: Sathishkumar Muruganandam <quic_murugana@quicinc.com> Signed-off-by: Harshitha Prem <quic_hprem@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230309164434.32660-1-quic_hprem@quicinc.com
2023-03-21net/sonic: use dma_mapping_error() for error checkZhang Changzhong
The DMA address returned by dma_map_single() should be checked with dma_mapping_error(). Fix it accordingly. Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/6645a4b5c1e364312103f48b7b36783b94e197a2.1679370343.git.fthain@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: mscc: ocelot: add TX_MM_HOLD to ocelot_mm_stats_layoutVladimir Oltean
The lack of a definition for this counter is what initially prompted me to investigate a problem which really manifested itself as the previous change, "net: mscc: ocelot: fix transfer from region->buf to ocelot->stats". When TX_MM_HOLD is defined in enum ocelot_stat but not in struct ocelot_stat_layout ocelot_mm_stats_layout, this creates a hole, which due to the aforementioned bug, makes all counters following TX_MM_HOLD be recorded off by one compared to their correct position. So for example, a non-zero TX_PMAC_OCTETS would be reported as TX_MERGE_FRAGMENTS, TX_PMAC_UNICAST would be reported as TX_PMAC_OCTETS, TX_PMAC_64 would be reported as TX_PMAC_PAUSE, etc etc. This is because the size of the hole (1) is much smaller than the size of the region, so the phenomenon where the stats are off-by-one, rather than lost, prevails. However, the phenomenon where stats are lost can be seen too, for example with DROP_LOCAL, which is at the beginning of its own region (offset 0x000400 vs the previous 0x0002b0 constitutes a discontinuity). This is also reported as off by one and saved to TX_PMAC_1527_MAX, but that counter is not reported to the unstructured "ethtool -S", as opposed to DROP_LOCAL which is (as "drop_local"). Fixes: ab3f97a9610a ("net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: mscc: ocelot: fix transfer from region->buf to ocelot->statsVladimir Oltean
To understand the problem, we need some definitions. The driver is aware of multiple counters (enum ocelot_stat), yet not all switches supported by the driver implement all counters. There are 2 statistics layouts: ocelot_stats_layout and ocelot_mm_stats_layout, the latter having 36 counters more than the former. ocelot->stats[] is not a compact array, i.e. there are elements within it which are not going to be populated for ocelot_stats_layout. On the other hand, ocelot->stats[] is easily indexable, for example "tx_octets" for port 3 can be found at ocelot->stats[3 * OCELOT_NUM_STATS + OCELOT_STAT_TX_OCTETS], and that is why we keep it sparse. Regions, as created by ocelot_prepare_stats_regions(), are compact (every element from region->buf will correspond to a counter that is present in this switch's layout) but are not easily indexable. Let's define holes as the ranges of values of enum ocelot_stat for which ocelot_stats_layout doesn't have a "reg" defined. For example, there is a hole between OCELOT_STAT_RX_GREEN_PRIO_7 and OCELOT_STAT_TX_OCTETS which is of 23 elements that are only present on ocelot_mm_stats_layout, and as such, they are also present in enum ocelot_stat. Let's define the left extremity of the hole - the last enum ocelot_stat still defined - as A (in this case OCELOT_STAT_RX_GREEN_PRIO_7) and the right extremity - the first enum ocelot_stat that is defined after a series of undefined ones - as B (in this case OCELOT_STAT_TX_OCTETS). There is a bug in the procedure which transfers stats from region->buf[] to ocelot->stats[]. For each hole in the ocelot_stats_layout, the logic transfers the stats starting with enum ocelot_stat B to ocelot->stats[] index A + 1. So all stats after a hole are saved to a position which is off by B - A + 1 elements. This causes 2 kinds of issues: (a) counters which shouldn't increment increment (b) counters which should increment don't Holes in the ocelot_stat_layout automatically imply the end of a region and the beginning of a new one; however the reverse is not necessarily true. For example, for ocelot_mm_stat_layout, there could be multiple regions (which indicate discontinuities in register addresses) while there is no hole (which indicates discontinuities in enum ocelot_stat values). In the example above, the stats from the second region->buf[] are not transferred to ocelot->stats starting with index "port * OCELOT_NUM_STATS + OCELOT_STAT_TX_OCTETS" as they should, but rather, starting with element "port * OCELOT_NUM_STATS + OCELOT_STAT_RX_GREEN_PRIO_7 + 1". That stats[] array element is not reported to user space for switches that use ocelot_stat_layout, and that is how issue (b) occurs. However, if the length of the second region is larger than the hole, then some stats will start to be transferred to the ocelot->stats[] indices which *are* reported to user space, but those indices contain wrong values (corresponding to unexpected counters). This is how issue (a) occurs. The procedure, as it was introduced in commit d87b1c08f38a ("net: mscc: ocelot: use bulk reads for stats"), was not buggy, because there were no holes in the struct ocelot_stat_layout instances at that time. The problem is that when those holes were introduced, the function was not updated to take them into consideration. To update the procedure, we need to know, for each region, which enum ocelot_stat corresponds to its region->base. We have no way of deducing that based on the contents of struct ocelot_stats_region, so we need to add this information. Fixes: ab3f97a9610a ("net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: mscc: ocelot: fix stats region batchingVladimir Oltean
The blamed commit changed struct ocelot_stat_layout :: "u32 offset" to "u32 reg". However, "u32 reg" is not quite a register address, but an enum ocelot_reg, which in itself encodes an enum ocelot_target target in the upper bits, and an index into the ocelot->map[target][] array in the lower bits. So, whereas the previous code comparison between stats_layout[i].offset and last + 1 was correct (because those "offsets" at the time were 32-bit relative addresses), the new code, comparing layout[i].reg to last + 4 is not correct, because the "reg" here is an enum/index, not an actual register address. What we want to compare are indeed register addresses, but to do that, we need to actually go through the same motions as __ocelot_bulk_read_ix() itself. With this bug, all statistics counters are deemed by ocelot_prepare_stats_regions() as constituting their own region. (Truncated) log on VSC9959 (Felix) below (prints added by me): Before: region of 1 contiguous counters starting with SYS:STAT:CNT[0x000] region of 1 contiguous counters starting with SYS:STAT:CNT[0x001] region of 1 contiguous counters starting with SYS:STAT:CNT[0x002] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x041] region of 1 contiguous counters starting with SYS:STAT:CNT[0x042] region of 1 contiguous counters starting with SYS:STAT:CNT[0x080] region of 1 contiguous counters starting with SYS:STAT:CNT[0x081] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x0ac] region of 1 contiguous counters starting with SYS:STAT:CNT[0x100] region of 1 contiguous counters starting with SYS:STAT:CNT[0x101] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x111] After: region of 67 contiguous counters starting with SYS:STAT:CNT[0x000] region of 45 contiguous counters starting with SYS:STAT:CNT[0x080] region of 18 contiguous counters starting with SYS:STAT:CNT[0x100] Since commit d87b1c08f38a ("net: mscc: ocelot: use bulk reads for stats") intended bulking as a performance improvement, and since now, with trivial-sized regions, performance is even worse than without bulking at all, this could easily qualify as a performance regression. Fixes: d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: atheros: atl1c: remove unused atl1c_irq_reset functionTom Rix
clang with W=1 reports drivers/net/ethernet/atheros/atl1c/atl1c_main.c:214:20: error: unused function 'atl1c_irq_reset' [-Werror,-Wunused-function] static inline void atl1c_irq_reset(struct atl1c_adapter *adapter) ^ This function is not used, so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230320232317.1729464-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net/mlx5: E-Switch, Fix an Oops in error handling codeDan Carpenter
The error handling dereferences "vport". There is nothing we can do if it is an error pointer except returning the error code. Fixes: 133dcfc577ea ("net/mlx5: E-Switch, Alloc and free unique metadata for match") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5: Read the TC mapping of all priorities on ETS queryMaher Sanalla
When ETS configurations are queried by the user to get the mapping assignment between packet priority and traffic class, only priorities up to maximum TCs are queried from QTCT register in FW to retrieve their assigned TC, leaving the rest of the priorities mapped to the default TC #0 which might be misleading. Fix by querying the TC mapping of all priorities on each ETS query, regardless of the maximum number of TCs configured in FW. Fixes: 820c2c5e773d ("net/mlx5e: Read ETS settings directly from firmware") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5e: Overcome slow response for first macsec ASO WQEEmeel Hakim
First ASO WQE poll causes a cache miss in hardware hence the resut is delayed. It causes to the situation where such WQE is polled earlier than it is needed. Add logic to retry ASO CQ polling operation. Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context")  Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5e: Initialize link speed to zeroRoy Novich
mlx5e_port_max_linkspeed does not guarantee value assignment for speed. Avoid cases where link_speed might be used uninitialized. In case mlx5e_port_max_linkspeed fails, a default link speed of 50000 will be used for the calculations. Fixes: 3f6d08d196b2 ("net/mlx5e: Add RSS support for hairpin") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5: Fix steering rules cleanupLama Kayal
vport's mc, uc and multicast rules are not deleted in teardown path when EEH happens. Since the vport's promisc settings(uc, mc and all) in firmware are reset after EEH, mlx5 driver will try to delete the above rules in the initialization path. This cause kernel crash because these software rules are no longer valid. Fix by nullifying these rules right after delete to avoid accessing any dangling pointers. Call Trace: __list_del_entry_valid+0xcc/0x100 (unreliable) tree_put_node+0xf4/0x1b0 [mlx5_core] tree_remove_node+0x30/0x70 [mlx5_core] mlx5_del_flow_rules+0x14c/0x1f0 [mlx5_core] esw_apply_vport_rx_mode+0x10c/0x200 [mlx5_core] esw_update_vport_rx_mode+0xb4/0x180 [mlx5_core] esw_vport_change_handle_locked+0x1ec/0x230 [mlx5_core] esw_enable_vport+0x130/0x260 [mlx5_core] mlx5_eswitch_enable_sriov+0x2a0/0x2f0 [mlx5_core] mlx5_device_enable_sriov+0x74/0x440 [mlx5_core] mlx5_load_one+0x114c/0x1550 [mlx5_core] mlx5_pci_resume+0x68/0xf0 [mlx5_core] eeh_report_resume+0x1a4/0x230 eeh_pe_dev_traverse+0x98/0x170 eeh_handle_normal_event+0x3e4/0x640 eeh_handle_event+0x4c/0x370 eeh_event_handler+0x14c/0x210 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0x84 Fixes: a35f71f27a61 ("net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5e: Block entering switchdev mode with ns inconsistencyGavin Li
Upon entering switchdev mode, VF/SF representors are spawned in the devlink instance's net namespace, whereas the PF net device transforms into the uplink representor, remaining in the net namespace the PF net device was in. Therefore, if a PF net device's namespace is different from its parent devlink net namespace, entering switchdev mode can create an illegal situation where all representors sharing the same core device are NOT in the same net namespace. To avoid this issue, block entering switchdev mode for devices whose child netdev net namespace has diverged from the parent devlink's. Fixes: 7768d1971de6 ("net/mlx5: E-Switch, Add control for encapsulation") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5e: Set uplink rep as NETNS_LOCALGavin Li
Previously, NETNS_LOCAL was not set for uplink representors, inconsistent with VF representors, and allowed the uplink representor to be moved between net namespaces and separated from the VF representors it shares the core device with. Such usage would break the isolation model of namespaces, as devices in different namespaces would have access to shared memory. To solve this issue, set NETNS_LOCAL for uplink representors if eswitch is in switchdev mode. Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21igc: Remove obsolete DMA coalescing codeSasha Neftin
DMA coalescing is not applicable for i225 parts. This patch comes to tidy up the driver code. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21igbvf: add PCI reset handler functionsDawid Wesierski
There was a problem with resuming ping after conducting a PCI reset. This commit adds two functions, igbvf_io_prepare and igbvf_io_done, which, after being added to the pci_error_handlers struct, will prepare the drivers for a PCI reset and then bring the interface up and reset it after. This will prevent the driver from ending up in incorrect state. Test_and_set_bit is highly reliable in this context, so we are not including a timeout in this commit This introduces 900ms - 1100ms of overhead to this operation but it's in non-time-critical flow. And also allows the driver to continue functioning after the reset. Functionality documented in ethernet-controller-i350-datasheet 4.2.1.3 https://www.intel.com/content/www/us/en/products/details/ethernet/gigabit-controllers/i350-controllers/docs.html Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21igb: refactor igb_ptp_adjfine_82580 to use diff_by_scaled_ppmAndrii Staikov
Driver's .adjfine interface functions use adjust_by_scaled_ppm and diff_by_scaled_ppm introduced in commit 1060707e3809 ("ptp: introduce helpers to adjust by scaled parts per million") to calculate the required adjustment in a concise manner, but not igb_ptp_adjfine_82580. Fix it by introducing IGB_82580_BASE_PERIOD and changing function logic to use diff_by_scaled_ppm. Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21i40e: fix flow director packet filter programmingRadoslaw Tyl
Initialize to zero structures to build a valid Tx Packet used for the filter programming. Fixes: a9219b332f52 ("i40e: VLAN field for flow director") Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21iavf: fix hang on reboot with iceStefan Assmann
When a system with E810 with existing VFs gets rebooted the following hang may be observed. Pid 1 is hung in iavf_remove(), part of a network driver: PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow" #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930 #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf] #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513 #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice] #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice] #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice] #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90 R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005 R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000 ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b During reboot all drivers PM shutdown callbacks are invoked. In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE. In ice_shutdown() the call chain above is executed, which at some point calls iavf_remove(). However iavf_remove() expects the VF to be in one of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If that's not the case it sleeps forever. So if iavf_shutdown() gets invoked before iavf_remove() the system will hang indefinitely because the adapter is already in state __IAVF_REMOVE. Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE, as we already went through iavf_shutdown(). Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove") Reported-by: Marius Cornea <mcornea@redhat.com> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21ice: remove filters only if VSI is deletedMichal Swiatkowski
Filters shouldn't be removed in VSI rebuild path. Removing them on PF VSI results in no rule for PF MAC after changing for example queues amount. Remove all filters only in the VSI remove flow. As unload should also cause the filter to be removed introduce, a new function ice_stop_eth(). It will unroll ice_start_eth(), so remove filters and close VSI. Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21ice: check if VF exists before mode checkMichal Swiatkowski
Setting trust on VF should return EINVAL when there is no VF. Move checking for switchdev mode after checking if VF exists. Fixes: c54d209c78b8 ("ice: Wait for VF to be reset/ready before configuration") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Kalyan Kodamagula <kalyan.kodamagula@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21ice: fix rx buffers handling for flow director packetsPiotr Raczynski
Adding flow director filters stopped working correctly after commit 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side"). As a result, only first flow director filter can be added, adding next filter leads to NULL pointer dereference attached below. Rx buffer handling and reallocation logic has been optimized, however flow director specific traffic was not accounted for. As a result driver handled those packets incorrectly since new logic was based on ice_rx_ring::first_desc which was not set in this case. Fix this by setting struct ice_rx_ring::first_desc to next_to_clean for flow director received packets. [ 438.544867] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 438.551840] #PF: supervisor read access in kernel mode [ 438.556978] #PF: error_code(0x0000) - not-present page [ 438.562115] PGD 7c953b2067 P4D 0 [ 438.565436] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 438.569794] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.2.0-net-bug #1 [ 438.577531] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [ 438.588470] RIP: 0010:ice_clean_rx_irq+0x2b9/0xf20 [ice] [ 438.593860] Code: 45 89 f7 e9 ac 00 00 00 8b 4d 78 41 31 4e 10 41 09 d5 4d 85 f6 0f 84 82 00 00 00 49 8b 4e 08 41 8b 76 1c 65 8b 3d 47 36 4a 3f <48> 8b 11 48 c1 ea 36 39 d7 0f 85 a6 00 00 00 f6 41 08 02 0f 85 9c [ 438.612605] RSP: 0018:ff8c732640003ec8 EFLAGS: 00010082 [ 438.617831] RAX: 0000000000000800 RBX: 00000000000007ff RCX: 0000000000000000 [ 438.624957] RDX: 0000000000000800 RSI: 0000000000000000 RDI: 0000000000000000 [ 438.632089] RBP: ff4ed275a2158200 R08: 00000000ffffffff R09: 0000000000000020 [ 438.639222] R10: 0000000000000000 R11: 0000000000000020 R12: 0000000000001000 [ 438.646356] R13: 0000000000000000 R14: ff4ed275d0daffe0 R15: 0000000000000000 [ 438.653485] FS: 0000000000000000(0000) GS:ff4ed2738fa00000(0000) knlGS:0000000000000000 [ 438.661563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 438.667310] CR2: 0000000000000000 CR3: 0000007c9f0d6006 CR4: 0000000000771ef0 [ 438.674444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 438.681573] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 438.688697] PKRU: 55555554 [ 438.691404] Call Trace: [ 438.693857] <IRQ> [ 438.695877] ? profile_tick+0x17/0x80 [ 438.699542] ice_msix_clean_ctrl_vsi+0x24/0x50 [ice] [ 438.702571] ice 0000:b1:00.0: VF 1: ctrl_vsi irq timeout [ 438.704542] __handle_irq_event_percpu+0x43/0x1a0 [ 438.704549] handle_irq_event+0x34/0x70 [ 438.704554] handle_edge_irq+0x9f/0x240 [ 438.709901] iavf 0000:b1:01.1: Failed to add Flow Director filter with status: 6 [ 438.714571] __common_interrupt+0x63/0x100 [ 438.714580] common_interrupt+0xb4/0xd0 [ 438.718424] iavf 0000:b1:01.1: Rule ID: 127 dst_ip: 0.0.0.0 src_ip 0.0.0.0 UDP: dst_port 4 src_port 0 [ 438.722255] </IRQ> [ 438.722257] <TASK> [ 438.722257] asm_common_interrupt+0x22/0x40 [ 438.722262] RIP: 0010:cpuidle_enter_state+0xc8/0x430 [ 438.722267] Code: 6e e9 25 ff e8 f9 ef ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 d7 f1 24 ff 45 84 ff 0f 85 57 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [ 438.722269] RSP: 0018:ffffffff86003e50 EFLAGS: 00000246 [ 438.784108] RAX: ff4ed2738fa00000 RBX: ffbe72a64fc01020 RCX: 0000000000000000 [ 438.791234] RDX: 0000000000000000 RSI: ffffffff858d84de RDI: ffffffff85893641 [ 438.798365] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000003158af9d [ 438.805490] R10: 0000000000000008 R11: 0000000000000354 R12: ffffffff862365a0 [ 438.812622] R13: 000000661b472a87 R14: 0000000000000002 R15: 0000000000000000 [ 438.819757] cpuidle_enter+0x29/0x40 [ 438.823333] do_idle+0x1b6/0x230 [ 438.826566] cpu_startup_entry+0x19/0x20 [ 438.830492] rest_init+0xcb/0xd0 [ 438.833717] arch_call_rest_init+0xa/0x30 [ 438.837731] start_kernel+0x776/0xb70 [ 438.841396] secondary_startup_64_no_verify+0xe5/0xeb [ 438.846449] </TASK> Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21net: pasemi: Fix return type of pasemi_mac_start_tx()Nathan Chancellor
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed. A warning in clang aims to catch these at compile time, which reveals: drivers/net/ethernet/pasemi/pasemi_mac.c:1665:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict] .ndo_start_xmit = pasemi_mac_start_tx, ^~~~~~~~~~~~~~~~~~~ 1 error generated. ->ndo_start_xmit() in 'struct net_device_ops' expects a return type of 'netdev_tx_t', not 'int'. Adjust the return type of pasemi_mac_start_tx() to match the prototype's to resolve the warning. While PowerPC does not currently implement support for kCFI, it could in the future, which means this warning becomes a fatal CFI failure at run time. Link: https://github.com/ClangBuiltLinux/linux/issues/1750 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230319-pasemi-incompatible-pointer-types-strict-v1-1-1b9459d8aef0@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-21net: geneve: accept every ethertypeJosef Miegl
The Geneve encapsulation, as defined in RFC 8926, has a Protocol Type field, which states the Ethertype of the payload appearing after the Geneve header. Commit 435fe1c0c1f7 ("net: geneve: support IPv4/IPv6 as inner protocol") introduced a new IFLA_GENEVE_INNER_PROTO_INHERIT flag that allowed the use of other Ethertypes than Ethernet. However, it did not get rid of a restriction that prohibits receiving payloads other than Ethernet, instead the commit white-listed additional Ethertypes, IPv4 and IPv6. This patch removes this restriction, making it possible to receive any Ethertype as a payload, if the IFLA_GENEVE_INNER_PROTO_INHERIT flag is set. The restriction was set in place back in commit 0b5e8b8eeae4 ("net: Add Geneve tunneling protocol driver"), which implemented a protocol layer driver for Geneve to be used with Open vSwitch. The relevant discussion about introducing the Ethertype white-list can be found here: https://lore.kernel.org/netdev/CAEP_g=_1q3ACX5NTHxLDnysL+dTMUVzdLpgw1apLKEdDSWPztw@mail.gmail.com/ <quote> >> + if (unlikely(geneveh->proto_type != htons(ETH_P_TEB))) > > Why? I thought the point of geneve carrying protocol field was to > allow protocols other than Ethernet... is this temporary maybe? Yes, it is temporary. Currently OVS only handles Ethernet packets but this restriction can be lifted once we have a consumer that is capable of handling other protocols. </quote> This white-list was then ported to a generic Geneve netdevice in commit 371bd1061d29 ("geneve: Consolidate Geneve functionality in single module."). Preserving the Ethertype white-list at this point made sense, as the Geneve device could send out only Ethernet payloads anyways. However, now that the Geneve netdevice supports encapsulating other payloads with IFLA_GENEVE_INNER_PROTO_INHERIT and we have a consumer capable of other protocols, it seems appropriate to lift the restriction and allow any Geneve payload to be received. Signed-off-by: Josef Miegl <josef@miegl.cz> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20230319220954.21834-1-josef@miegl.cz Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-21net: dsa: b53: add support for BCM63xx RGMIIsÁlvaro Fernández Rojas
BCM63xx RGMII ports require additional configuration in order to work. Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230319220805.124024-1-noltari@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-21net: dsa: qca8k: remove assignment of an_enabled in pcs_get_state()Russell King (Oracle)
pcs_get_state() implementations are not supposed to alter an_enabled. Remove this assignment. Fixes: b3591c2a3661 ("net: dsa: qca8k: Switch to PHYLINK instead of PHYLIB") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1pdsE5-00Dl2l-8F@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-21net: dsa: mv88e6xxx: fix mdio bus' phy_mask memberMarek Behún
Commit 2c7e46edbd03 ("net: dsa: mv88e6xxx: mask apparently non-existing phys during probing") added non-trivial bus->phy_mask in mv88e6xxx_mdio_register() in order to avoid excessive mdio bus transactions during probing. But the mask is incorrect for switches with non-zero phy_base_addr (such as 88E6341). Fix this. Fixes: 2c7e46edbd03 ("net: dsa: mv88e6xxx: mask apparently non-existing phys during probing") Signed-off-by: Marek Behún <kabel@kernel.org> Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20230319140238.9470-1-kabel@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-20octeontx2-vf: Add missing free for alloc_percpuJiasheng Jiang
Add the free_percpu for the allocated "vf->hw.lmt_info" in order to avoid memory leak, same as the "pf->hw.lmt_info" in `drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c`. Fixes: 5c0512072f65 ("octeontx2-pf: cn10k: Use runtime allocated LMTLINE region") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Geethasowjanya Akula <gakula@marvell.com> Link: https://lore.kernel.org/r/20230317064337.18198-1-jiasheng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: cxgb3: remove unused fl_to_qset functionTom Rix
clang with W=1 reports drivers/net/ethernet/chelsio/cxgb3/sge.c:169:32: error: unused function 'fl_to_qset' [-Werror,-Wunused-function] static inline struct sge_qset *fl_to_qset(const struct sge_fl *q, int qidx) ^ This function is not used, so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230319172433.1708161-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: dsa: mt7530: use external PCS driverDaniel Golle
Implement regmap access wrappers, for now only to be used by the pcs-mtk-lynxi driver. Make use of this external PCS driver and drop the now reduntant implementation in mt7530.c. As a nice side effect the SGMII registers can now also more easily be inspected for debugging via /sys/kernel/debug/regmap. Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: ethernet: mtk_eth_soc: switch to external PCS driverDaniel Golle
Now that we got a PCS driver, use it and remove the now redundant PCS code and it's header macros from the Ethernet driver. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: pcs: add driver for MediaTek SGMII PCSDaniel Golle
The SGMII core found in several MediaTek SoCs is identical to what can also be found in MediaTek's MT7531 Ethernet switch IC. As this has not always been clear, both drivers developed different implementations to deal with the PCS. Recently Alexander Couzens pointed out this fact which lead to the development of this shared driver. Add a dedicated driver, mostly by copying the code now found in the Ethernet driver. The now redundant code will be removed by a follow-up commit. Suggested-by: Alexander Couzens <lynxis@fe80.eu> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: ethernet: mtk_eth_soc: ppe: add support for flow accountingDaniel Golle
The PPE units found in MT7622 and newer support packet and byte accounting of hw-offloaded flows. Add support for reading those counters as found in MediaTek's SDK[1]. [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/bc6a6a375c800dc2b80e1a325a2c732d1737df92 Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: ethernet: mtk_eth_soc: set MDIO bus clock frequencyDaniel Golle
Set MDIO bus clock frequency and allow setting a custom maximum frequency from device tree. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: ethernet: mtk_eth_soc: add support for MT7981 SoCDaniel Golle
The MediaTek MT7981 SoC comes with two 1G/2.5G SGMII ports, just like MT7986. In addition MT7981 is equipped with a built-in 1000Base-T PHY which can be used with GMAC1. As many MT7981 boards make use of inverting SGMII signal polarity, add new device-tree attribute 'mediatek,pn_swap' to support them. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20r8169: consolidate disabling ASPM before EPHY accessHeiner Kallweit
Now that rtl_hw_aspm_clkreq_enable() is a no-op for chip versions < 32, we can consolidate disabling ASPM before EPHY access in rtl_hw_start(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net: phy: meson-gxl: reuse functionality of the SMSC PHY driverHeiner Kallweit
The Amlogic Meson internal PHY's have the same register layout as certain SMSC PHY's (also for non-c22-standard registers). This seems to be more than just coincidence. Apparently they also need the same workaround for EDPD mode (energy detect power down). Therefore let's reuse SMSC PHY driver functionality in the meson-gxl PHY driver. Tested with a G12A internal PHY. I don't have GXL test hw, therefore I replace only the callbacks that are identical in the SMSC PHY driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net: phy: smsc: export functions for use by meson-gxl PHY driverHeiner Kallweit
The Amlogic Meson internal PHY's have the same register layout as certain SMSC PHY's (also for non-c22-standard registers). This seems to be more than just coincidence. Apparently they also need the same workaround for EDPD mode (energy detect power down). Therefore let's export SMSC PHY driver functionality for use by the meson-gxl PHY driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Chris Healy <healych@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net/ps3_gelic_net: Use dma_mapping_errorGeoff Levand
The current Gelic Etherenet driver was checking the return value of its dma_map_single call, and not using the dma_mapping_error() routine. Fixes runtime problems like these: DMA-API: ps3_gelic_driver sb_05: device driver failed to check map error WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:1027 .check_unmap+0x888/0x8dc Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3") Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net/ps3_gelic_net: Fix RX sk_buff lengthGeoff Levand
The Gelic Ethernet device needs to have the RX sk_buffs aligned to GELIC_NET_RXBUF_ALIGN, and also the length of the RX sk_buffs must be a multiple of GELIC_NET_RXBUF_ALIGN. The current Gelic Ethernet driver was not allocating sk_buffs large enough to allow for this alignment. Also, correct the maximum and minimum MTU sizes, and add a new preprocessor macro for the maximum frame size, GELIC_NET_MAX_FRAME. Fixes various randomly occurring runtime network errors. Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3") Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20usb: plusb: remove unused pl_clear_QuickLink_features functionTom Rix
clang with W=1 reports drivers/net/usb/plusb.c:65:1: error: unused function 'pl_clear_QuickLink_features' [-Werror,-Wunused-function] pl_clear_QuickLink_features(struct usbnet *dev, int val) ^ This static function is not used, so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net: usb: lan78xx: Limit packet length to skb->lenSzymon Heidrich
Packet length retrieved from descriptor may be larger than the actual socket buffer length. In such case the cloned skb passed up the network stack will leak kernel memory contents. Additionally prevent integer underflow when size is less than ETH_FCS_LEN. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net/mlx5e: Update IPsec per SA packets/bytes countRaed Salem
Providing per SA packets/bytes statistics mandates creating unique counter per SA flow for Rx/Tx, whenever offloaded SA statistics is desired query the specific SA counter to provide the stack with the needed data. Signed-off-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/7d5ce20ac495f3054afb633128700e7b7eeeb3cd.1678714336.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20net/mlx5e: Use one rule to count all IPsec Tx offloaded trafficRaed Salem
Currently one counter is shared between all IPsec Tx offloaded rules to count the total amount of packets/bytes that was IPsec Tx offloaded, replace this scheme by adding a new flow table (ft) with one rule that counts all flows that passes through this table (like Rx status ft), this ft is pointed by all IPsec Tx offloaded rules. The above allows to have a counter per tx flow rule in while keeping a separate global counter that store the aggregation outcome of all these per flow counters. Signed-off-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/09b9119d1deb6e482fd2d17e1f5760d7c5be1e48.1678714336.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20net/mlx5e: Support IPsec acquire default SARaed Salem
During XFRM stack acquire flow, a default SA is created to be updated later, once acquire netlink message is handled in user space. This SA is also passed to IPsec offload supporting driver, however this SA acts only as placeholder and does not have context suitable for offloading in HW yet. Identify this kind of SA by special offload flag (XFRM_DEV_OFFLOAD_FLAG_ACQ), and create a SW only context. In such cases with special mark so it won't be installed in HW in addition flow and on remove/delete free this SW only context. Signed-off-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/8f36d6b61631dcd73fef0a0ac623456030bc9db0.1678714336.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20net/mlx5e: Allow policies with reqid 0, to support IKE policy holesRaed Salem
IKE policies hole, is special policy that exists to allow for IKE traffic to bypass IPsec encryption even though there is already a policies and SA(s) configured on same endpoints, these policies does not nessecarly have the reqid configured, so need to add an exception for such policies. These kind of policies are allowed under the condition that at least upper protocol and/or ips are not 0. Signed-off-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/cbcadde312c24de74c47d9b0616f86a5818cc9bf.1678714336.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20net/mlx5e: Use chains for IPsec policy priority offloadPaul Blakey
Currently, policy priority field is ignored and so order of matching is unpredictable. Use chains for RX/TX policy offload to support the priority field. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/9ef3ef88858217932696ad413b1b147b799a11be.1678714336.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20net/mlx5: fs_core: Allow ignore_flow_level on TX destPaul Blakey
ignore_flow_level is also supported by firmware on TX, remove this limitation. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/d0025722bfac0a82da758eb540fbf1ff3cacdf74.1678714336.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20net/mlx5: fs_chains: Refactor to detach chains from tc usagePaul Blakey
To support more generic chains that will be used on other namespaces and without tc, refactor to remove the dependency on tc terms. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/bb8570d532d569285b5bff981578507bd15350cb.1678714336.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>