Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-11-10
This series contains updates to i40e and igc drivers and the MAINTAINERS
file.
Slawomir fixes updating VF MAC addresses to fix various issues related
to reporting and setting of these addresses for i40e.
Dan Carpenter fixes a possible used before being initialized issue for
i40e.
Vinicius fixes reporting of netdev stats for igc.
Tony updates repositories for Intel Ethernet Drivers.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
MAINTAINERS: Update repositories for Intel Ethernet Drivers
igc: Fix returning wrong statistics
i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc()
i40e: Fix MAC address setting for a VF via Host/VM
====================
Link: https://lore.kernel.org/r/20201111001955.533210-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In function ‘strncpy’,
inlined from ‘sky2_name’ at drivers/net/ethernet/marvell/sky2.c:4903:3,
inlined from ‘sky2_probe’ at drivers/net/ethernet/marvell/sky2.c:5049:2:
./include/linux/string.h:297:30: warning: ‘__builtin_strncpy’ specified bound 16 equals destination size [-Wstringop-truncation]
None of the device names are 16 characters long, so it was never an
issue. But replace the strncpy with an snprintf() to prevent the
theoretical overflow.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20201110023222.1479398-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Stop the queue and ask for the credits if queue reaches to
threashold.
Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
context id and port id should be filled while sending tcb update.
Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If TCP congestion caused a very small packets which only has some
part fo the TAG, and that too is not till the end. HW can't handle
such case, so falling back to sw crypto in such cases.
v1->v2:
- Marked chcr_ktls_sw_fallback() static.
Fixes: dc05f3df8fac ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If its a last packet and fin is set. Make sure FIN is informed
to HW before skb gets freed.
Fixes: 429765a149f1 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There could be a case where ACK for tls exchanges prior to start
marker is missed out, and by the time tls is offloaded. This pkt
should not be discarded and handled carefully. It could be
plaintext alone or plaintext + finish as well.
Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a record starts in middle, reset TCB UNA so that we could
avoid sending out extra packet which is needed to make it 16
byte aligned to start AES CTR.
Check also considers prev_seq, which should be what is
actually sent, not the skb data length.
Avoid updating partial TAG to HW at any point of time, that's
why we need to check if remaining part is smaller than TAG
size, then reset TX_MAX to be TAG starting sequence number.
Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If an skb has only header part which doesn't start from
beginning, is not being handled properly.
Fixes: dc05f3df8fac ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
trimmed length calculation goes wrong if skb has only tag part
to send. It should be zero if there is no data bytes apart from
TAG.
Fixes: dc05f3df8fac ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Creating SKB per tls record and freeing the original one causes
panic. There will be race if connection reset is requested. By
freeing original skb, refcnt will be decremented and that means,
there is no pending record to send, and so tls_dev_del will be
requested in control path while SKB of related connection is in
queue.
Better approach is to use same SKB to send one record (partial
data) at a time. We still have to create a new SKB when partial
last part of a record is requested.
This fix introduces new API cxgb4_write_partial_sgl() to send
partial part of skb. Present cxgb4_write_sgl can only provide
feasibility to start from an offset which limits to header only
and it can write sgls for the whole skb len. But this new API
will help in both. It can start from any offset and can end
writing in middle of the skb.
v4->v5:
- Removed extra changes.
Fixes: 429765a149f1 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Checksum update was missing in the WR.
Fixes: 429765a149f1 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a possibility of linear skbs coming in. Correcting
the length extraction logic.
v2->v3:
- Separated un-related changes from this patch.
Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If skb has retransmit data starting before start marker, e.g. ccs,
decrypted bit won't be set for that, and if it has some data to
encrypt, then it must be given to crypto ULD. So in place of
decrypted, check if socket is tls offloaded. Also, unless skb has
some data to encrypt, no need to give it for tls offload handling.
v2->v3:
- Removed ifdef.
Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The return pointer of dwc_eth_dwmac_data's .probe isn't used, and
"probe" usually return int, so change the prototype to follow standard
way. Secondly, it can simplify the tegra_eqos_probe() code.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Link: https://lore.kernel.org/r/20201109160440.3a736ee3@xhacker.debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the net core, the struct net_device_ops -> ndo_set_rx_mode()
callback is called with the dev->addr_list_lock spinlock held.
However, this driver's ndo_set_rx_mode callback eventually calls
lan743x_dp_write(), which acquires a mutex. Mutex acquisition
may sleep, and this is not allowed when holding a spinlock.
Fix by removing the dp_lock mutex entirely. Its purpose is to
prevent concurrent accesses to the data port. No concurrent
accesses are possible, because the dev->addr_list_lock
spinlock in the core only lets through one thread at a time.
Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201109203828.5115-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 6f197fb63850 ("lan743x: Added fixed link and RGMII support")
assumes that chips with an internal PHY will never have a devicetree
entry. This is incorrect: even for these chips, a devicetree entry
can be useful e.g. to pass the mac address from bootloader to chip:
&pcie {
status = "okay";
host@0 {
reg = <0 0 0 0 0>;
#address-cells = <3>;
#size-cells = <2>;
lan7430: ethernet@0 {
/* LAN7430 with internal PHY */
compatible = "microchip,lan743x";
status = "okay";
reg = <0 0 0 0 0>;
/* filled in by bootloader */
local-mac-address = [00 00 00 00 00 00];
};
};
};
If a devicetree entry is present, the driver will not attach the chip
to its internal phy, and the chip will be non-operational.
Fix by tweaking the phy connection algorithm:
- first try to connect to a phy specified in the devicetree
(could be 'real' phy, or just a 'fixed-link')
- if that doesn't succeed, try to connect to an internal phy, even
if the chip has a devnode
Tested on a LAN7430 with internal PHY. I cannot test a device using
fixed-link, as I do not have access to one.
Fixes: 6f197fb63850 ("lan743x: Added fixed link and RGMII support")
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201108171224.23829-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the following coccicheck warning:
./drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c:2415:5-11: Unneeded variable: "retval". Return "0" on line 2435
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1604837580-12419-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The '!=' expression itself is bool, no need to convert it to bool.
Fix the following coccicheck warning:
./drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1477:34-39: WARNING: conversion to bool not needed here
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1604797919-10157-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'igc_update_stats()' was not updating 'netdev->stats', so the returned
statistics, for example, requested by:
$ ip -s link show dev enp3s0
were not being updated and were always zero.
Fix by returning a set of statistics that are actually being
updated (adapter->stats64).
Fixes: c9a11c23ceb6 ("igc: Add netdev")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The "failure" variable is used without being initialized. It should be
set to false.
Fixes: 8cbf74149903 ("i40e, xsk: move buffer allocation out of the Rx processing loop")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix MAC setting flow for the PF driver.
Update the unicast VF's MAC address in VF structure if it is
a new setting in i40e_vc_add_mac_addr_msg.
When unicast MAC address gets deleted, record that and
set the new unicast MAC address that is already waiting in the filter
list. This logic is based on the order of messages arriving to
the PF driver.
Without this change the MAC address setting was interpreted
incorrectly in the following use cases:
1) Print incorrect VF MAC or zero MAC
ip link show dev $pf
2) Don't preserve MAC between driver reload
rmmod iavf; modprobe iavf
3) Update VF MAC when macvlan was set
ip link add link $vf address $mac $vf.1 type macvlan
4) Failed to update mac address when VF was trusted
ip link set dev $vf address $mac
This includes all other configurations including above commands.
Fixes: f657a6e1313b ("i40e: Fix VF driver MAC address configuration")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix the following coccinelle warnings:
./drivers/net/ethernet/mellanox/mlx4/en_rx.c:687:1-17: WARNING: Assignment of 0/1 to bool variable
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/1604732038-6057-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set all EHL/TGL phy_addr to -1 so that the driver will automatically
detect it at run-time by probing all the possible 32 addresses.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Link: https://lore.kernel.org/r/20201106094341.4241-1-vee.khee.wong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch fixes NULL pointer dereference due to NULL pcs_config
in pcs_ops.
Fixes: e4e143e26ce8 ("net: macb: add support for high speed interface")
Reported-by: Nicolas Ferre <Nicolas.Ferre@microchip.com>
Link: https://lore.kernel.org/netdev/2db854c7-9ffb-328a-f346-f68982723d29@microchip.com/
Signed-off-by: Parshuram Thombare <pthombar@cadence.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/1604599113-2488-1-git-send-email-pthombar@cadence.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With CONFIG_BRIDGE=m the compilation fails:
ld: drivers/net/ethernet/marvell/prestera/prestera_switchdev.o: in function `prestera_bridge_port_event':
prestera_switchdev.c:(.text+0x2ebd): undefined reference to `br_vlan_enabled'
in case the driver is statically enabled.
Fix it by adding 'BRIDGE || BRIDGE=n' dependency.
Fixes: e1189d9a5fbe ("net: marvell: prestera: Add Switchdev driver implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20201106161128.24069-1-vadym.kochan@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2020-11-03
v1->v2:
- Fix fixes line tag in patch #1
- Toss ktls refcount leak fix, Maxim will look further into the root
cause.
- Toss eswitch chain 0 prio patch, until we determine if it is needed
for -rc and net.
* tag 'mlx5-fixes-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix incorrect access of RCU-protected xdp_prog
net/mlx5e: Fix VXLAN synchronization after function reload
net/mlx5: E-switch, Avoid extack error log for disabled vport
net/mlx5: Fix deletion of duplicate rules
net/mlx5e: Use spin_lock_bh for async_icosq_lock
net/mlx5e: Protect encap route dev from concurrent release
net/mlx5e: Fix modify header actions memory leak
====================
Link: https://lore.kernel.org/r/20201105202129.23644-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RTL8125B has same or similar short packet hw padding bug as RTL8168evl.
The main workaround has been extended accordingly, however we have to
disable also hw checksumming for short packets on affected new chip
versions. Instead of checking for an affected chip version let's
simply disable hw checksumming for short packets in general.
v2:
- remove the version checks and disable short packet hw csum in general
- reflect this in commit title and message
Fixes: 0439297be951 ("r8169: add support for RTL8125B")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/7fbb35f0-e244-ef65-aa55-3872d7d38698@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The caller of rtl8169_tso_csum_v2() frees the skb if false is returned.
eth_skb_pad() internally frees the skb on error what would result in a
double free. Therefore use __skb_put_padto() directly and instruct it
to not free the skb on error.
Fixes: b423e9ae49d7 ("r8169: fix offloaded tx checksum for small packets.")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/f7e68191-acff-9ded-4263-c016428a8762@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the gcc warning:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c:2673:9: warning: this 'for' clause does not guard... [-Wmisleading-indentation]
2673 | for (i = 0; i < n; ++i) \
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1604467444-23043-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
MDIO spec does not require an MDC at all times, only when MDIO
transactions are occurring. This patch allows the xilinx_axienet
driver to disable the MDC when not in use, and re-enable it when
needed. It also simplifies the driver by removing MDC disable
and enable in device reset sequence.
Signed-off-by: Clayton Rayment <clayton.rayment@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce helper functions to enable/disable MDIO interface clock. This
change serves a preparatory patch for the coming feature to dynamically
control the management bus clock.
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 16b5f5ce351f8709a6b518cc3cbf240c378305bf
where it restructures do_reset. There are patches being tested that
would require major rework if this is committed first.
We will resend this after the other patches have been applied.
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20201106191745.1679846-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc3, including fixes from wireless, can, and
netfilter subtrees.
Current merge window - bugs in new features:
- can: isotp: isotp_rcv_cf(): enable RX timeout handling in
listen-only mode
Previous releases - regressions:
- mac80211:
- don't require VHT elements for HE on 2.4 GHz
- fix regression where EAPOL frames were sent in plaintext
- netfilter:
- ipset: Update byte and packet counters regardless of whether
they match
- ip_tunnel: fix over-mtu packet send by allowing fragmenting even if
inner packet has IP_DF (don't fragment) set in its header (when
TUNNEL_DONT_FRAGMENT flag is not set on the tunnel dev)
- net: fec: fix MDIO probing for some FEC hardware blocks
- ip6_tunnel: set inner ipproto before ip6_tnl_encap to un-break gso
support
- sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian
platforms, sparse-related fix used the wrong integer size
Previous releases - always broken:
- netfilter: use actual socket sk rather than skb sk when routing
harder
- r8169: work around short packet hw bug on RTL8125 by padding frames
- net: ethernet: ti: cpsw: disable PTPv1 hw timestamping
advertisement, the hardware does not support it
- chelsio/chtls: fix always leaking ctrl_skb and another leak caused
by a race condition
- fix drivers incorrectly writing into skbs on TX:
- cadence: force nonlinear buffers to be cloned
- gianfar: Account for Tx PTP timestamp in the skb headroom
- gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
- can: flexcan:
- remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
- add ECC initialization for VF610 and LX2160A
- flexcan_remove(): disable wakeup completely
- can: fix packet echo functionality:
- peak_canfd: fix echo management when loopback is on
- make sure skbs are not freed in IRQ context in case they need to
be dropped
- always clone the skbs to make sure they have a reference on the
socket, and prevent it from disappearing
- fix real payload length return value for RTR frames
- can: j1939: return failure on bind if netdev is down, rather than
waiting indefinitely
Misc:
- IPv6: reply ICMP error if the first fragment don't include all
headers to improve compliance with RFC 8200"
* tag 'net-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
ionic: check port ptr before use
r8169: work around short packet hw bug on RTL8125
net: openvswitch: silence suspicious RCU usage warning
chelsio/chtls: fix always leaking ctrl_skb
chelsio/chtls: fix memory leaks caused by a race
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-11-03
This series includes updates to mlx5 software steering component.
1) Few improvements in the DR area, such as removing unneeded checks,
renaming to better general names, refactor in some places, etc.
2) Software steering (DR) Memory management improvements
This patch series contains SW Steering memory management improvements:
using buddy allocator instead of an existing bucket allocator, and
several other optimizations.
The buddy system is a memory allocation and management algorithm
that manages memory in power of two increments.
The algorithm is well-known and well-described, such as here:
https://en.wikipedia.org/wiki/Buddy_memory_allocation
Linux uses this algorithm for managing and allocating physical pages,
as described here:
https://www.kernel.org/doc/gorman/html/understand/understand009.html
In our case, although the algorithm in principal is similar to the
Linux physical page allocator, the "building blocks" and the circumstances
are different: in SW steering, buddy allocator doesn't really allocates
a memory, but rather manages ICM (Interconnect Context Memory) that was
previously allocated and registered.
The ICM memory that is used in SW steering is always power
of 2 (order), so buddy system is a good fit for this.
Patches in this series:
[PATH 4] net/mlx5: DR, Add buddy allocator utilities
This patch adds a modified implementation of a well-known buddy allocator,
adjusted for SW steering needs: the algorithm in principal is similar to
the Linux physical page allocator, but in our case buddy allocator doesn't
really allocate a memory, but rather manages ICM memory that was previously
allocated and registered.
[PATH 5] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management
This patch changes ICM management of SW steering to use buddy-system mechanism
Instead of the previous bucket management.
[PATH 6] net/mlx5: DR, Sync chunks only during free
This patch makes syncing happen only when freeing memory chunks.
[PATH 7] net/mlx5: DR, ICM memory pools sync optimization
This patch adds tracking of pool's "hot" memory and makes the
check whether steering sync is required much shorter and faster.
[PATH 8] net/mlx5: DR, Free buddy ICM memory if it is unused
This patch adds tracking buddy's used ICM memory,
and frees the buddy if all its memory becomes unused.
3) Misc code cleanups
* tag 'mlx5-updates-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net: mlx5: Replace in_irq() usage
net/mlx5: Cleanup kernel-doc warnings
net/mlx4: Cleanup kernel-doc warnings
net/mlx5e: Validate stop_room size upon user input
net/mlx5: DR, Free unused buddy ICM memory
net/mlx5: DR, ICM memory pools sync optimization
net/mlx5: DR, Sync chunks only during free
net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets
net/mlx5: DR, Add buddy allocator utilities
net/mlx5: DR, Rename matcher functions to be more HW agnostic
net/mlx5: DR, Rename builders HW specific names
net/mlx5: DR, Remove unused member of action struct
====================
Link: https://lore.kernel.org/r/20201105201242.21716-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rq->xdp_prog is RCU-protected and should be accessed only with
rcu_access_pointer for the NULL check in mlx5e_poll_rx_cq.
rq->xdp_prog may change on the fly only from one non-NULL value to
another non-NULL value, so the checks in mlx5e_xdp_handle and
mlx5e_poll_rx_cq will have the same result during one NAPI cycle,
meaning that no additional synchronization is needed.
Fixes: fe45386a2082 ("net/mlx5e: Use RCU to protect rq->xdp_prog")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
During driver reload, perform firmware tear-down which results in
firmware losing the configured VXLAN ports. These ports are still
available in the driver's database. Fix this by cleaning up driver's
VXLAN database in the nic unload flow, before firmware tear-down. With
that, minimize mlx5_vxlan_destroy() to remove only what was added in
mlx5_vxlan_create() and warn on leftover UDP ports.
Fixes: 18a2b7f969c9 ("net/mlx5: convert to new udp_tunnel infrastructure")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When E-switch vport is disabled, querying its hardware address is
unsupported.
Avoid setting extack error log message in such case.
Fixes: f099fde16db3 ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When a rule is duplicated, the refcount of the rule is increased so only
the second deletion of the rule should cause destruction of the FTE.
Currently, the FTE will be destroyed in the first deletion of rule since
the modify_mask will be 0.
Fix it and call to destroy FTE only if all the rules (FTE's children)
have been removed.
Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
async_icosq_lock may be taken from softirq and non-softirq contexts. It
requires protection with spin_lock_bh, otherwise a softirq may be
triggered in the middle of the critical section, and it may deadlock if
it tries to take the same lock. This patch fixes such a scenario by
using spin_lock_bh to disable softirqs on that CPU while inside the
critical section.
Fixes: 8d94b590f1e4 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In functions mlx5e_route_lookup_ipv{4|6}() route_dev can be arbitrary net
device and not necessary mlx5 eswitch port representor. As such, in order
to ensure that route_dev is not destroyed concurrent the code needs either
explicitly take reference to the device before releasing reference to
rtable instance or ensure that caller holds rtnl lock. First approach is
chosen as a fix since rtnl lock dependency was intentionally removed from
mlx5 TC layer.
To prevent unprotected usage of route_dev in encap code take a reference to
the device before releasing rt. Don't save direct pointer to the device in
mlx5_encap_entry structure and use ifindex instead. Modify users of
route_dev pointer to properly obtain the net device instance from its
ifindex.
Fixes: 61086f391044 ("net/mlx5e: Protect encap hash table with mutex")
Fixes: 6707f74be862 ("net/mlx5e: Update hw flows when encap source mac changed")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Modify header actions are allocated during parse tc actions and only
freed during the flow creation, however, on error flow the allocated
memory is wrongly unfreed.
Fix this by calling dealloc_mod_hdr_actions in __mlx5e_add_fdb_flow
and mlx5e_add_nic_flow error flow.
Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Fixes: 2f4fe4cab073 ("net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5_eq_async_int() uses in_irq() to decide whether eq::lock needs to be
acquired and released with spin_[un]lock() or the irq saving/restoring
variants.
The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
seperated or the context be conveyed in an argument passed by the caller,
which usually knows the context.
mlx5_eq_async_int() knows the context via the action argument already so
using it for the lock variant decision is a straight forward replacement
for in_irq().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
$ git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/ | \
xargs scripts/kernel-doc -none
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57:
warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_I2C' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57:
warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_DONTCARE' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:118:
warning: Function parameter or member 'cb_arg' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160:
warning: Function parameter or member 'conn' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160:
warning: Excess function parameter 'fdev' description ...
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
|
|
$ git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/ | \
xargs scripts/kernel-doc -none
drivers/net/ethernet/mellanox/mlx4/fw_qos.h:144:
warning: Function parameter or member 'in_param' not described ...
drivers/net/ethernet/mellanox/mlx4/fw_qos.h:144:
warning: Excess function parameter 'out_param' description ...
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
|
|
Stop room is a space that may be taken by WQEs in the SQ during a packet
transmit. It is used to check if next packet has enough room in the SQ.
Stop room guarantees this packet can be served and if not, the queue is
stopped, so no more packets are passed to the driver until it's ready.
Currently, stop_room size is calculated and validated upon tx queues
allocation. This makes it impossible to know if user provided valid
input for certain parameters when interface is down.
Instead, store stop_room in mlx5e_sq_param and create
mlx5e_validate_params(), to validate its fields upon user input even
when the interface is down.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Track buddy's used ICM memory, and free it if all
of the buddy's memory bacame unused.
Do this only for STEs.
MODIFY_ACTION buddies are much smaller, so in case there
is a large amount of modify_header actions, which result
in large amount of MODIFY_ACTION buddies, doing this
cleanup during sync will result in performance hit while
not freeing significant amount of memory.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Track the pool's hot ICM memory when freeing/allocating
chunk, so that when checking if the sync is required, just
check if the pool hot memory has reached the sync threshold.
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When freeing chunks, we want to sync the steering
so that all the "hot" memory will be written to ICM
and all the chunks that are in the hot_list will be
actually destroyed.
When allocating from the pool, we don't have a need
to sync the steering, as we're not freeing anything,
and sync might just hurt the performance in terms of
flow-per-second offloaded.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|