git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-10-31	net: ftmac100: prepare data path for receiving single segment packets > 1514	Vladimir Oltean
	Eliminate one check in the data path and move it elsewhere, to where our real limitation is. We'll want to start processing "too long" frames in the driver (currently there is a hardware MAC setting which drops theses). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Sergei Antonov <saproj@gmail.com> Link: https://lore.kernel.org/r/20221028183220.155948-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31	net: dpaa2: Add some debug prints on deferred probe	Sean Anderson
	When this device is deferred, there is often no way to determine what the cause was. Add some debug prints to make it easier to figure out what is blocking the probe. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Link: https://lore.kernel.org/r/20221027190005.400839-1-sean.anderson@seco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31	net: mvneta: Remove unused variable i	Colin Ian King
	Variable i is just being incremented and it's never used anywhere else. The variable and the increment are redundant so remove it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	ptp: xgbe: convert to .adjfine and adjust_by_scaled_ppm	Jacob Keller
	The xgbe implementation of .adjfreq is implemented in terms of a straight forward "base * ppb / 1 billion" calculation. Convert this driver to .adjfine and use adjust_by_scaled_ppm to calculate the new addend value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	ptp: ravb: convert to .adjfine and adjust_by_scaled_ppm	Jacob Keller
	The ravb implementation of .adjfreq is implemented in terms of a straight forward "base * ppb / 1 billion" calculation. Convert this driver to .adjfine and use the adjust_by_scaled_ppm helper function to calculate the new addend. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Sergey Shtylyov <s.shtylyov@omp.ru> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Cc: linux-renesas-soc@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	ptp: lan743x: use diff_by_scaled_ppm in .adjfine implementation	Jacob Keller
	Update the lan743x driver to use the recently added diff_by_scaled_ppm helper function. This reduces the amount of code required in lan743x_ptp.c driver file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Bryan Whitehead <bryan.whitehead@microchip.com> Cc: UNGLinuxDriver@microchip.com Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	ptp: lan743x: remove .adjfreq implementation	Jacob Keller
	The lan743x driver implements both .adjfreq and .adjfine, but the core PTP subsystem prefers .adjfine if implemented. There is no reason to carry a .adjfreq implementation, so we can remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Bryan Whitehead <bryan.whitehead@microchip.com> Cc: UNGLinuxDriver@microchip.com Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	ptp: mlx5: convert to .adjfine and adjust_by_scaled_ppm	Jacob Keller
	The mlx5 implementation of .adjfreq is implemented in terms of a straight forward "base * ppb / 1 billion" calculation. Convert this to the .adjfine interface and use adjust_by_scaled_ppm for the calculation of the new mult value. Note that the mlx5_ptp_adjfreq_real_time function expects input in terms of ppb, so use the scaled_ppm_to_ppb to convert before passing to this function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Shirly Ohnona <shirlyo@nvidia.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Gal Pressman <gal@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Aya Levin <ayal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	ptp: mlx4: convert to .adjfine and adjust_by_scaled_ppm	Jacob Keller
	The mlx4 implementation of .adjfreq is implemented in terms of a straight forward "base * ppb / 1 billion" calculation. Convert this driver to .adjfine and use adjust_by_scaled_ppm to perform the calculation. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	ptp: introduce helpers to adjust by scaled parts per million	Jacob Keller
	Many drivers implement the .adjfreq or .adjfine PTP op function with the same basic logic: 1. Determine a base frequency value 2. Multiply this by the abs() of the requested adjustment, then divide by the appropriate divisor (1 billion, or 65,536 billion). 3. Add or subtract this difference from the base frequency to calculate a new adjustment. A few drivers need the difference and direction rather than the combined new increment value. I recently converted the Intel drivers to .adjfine and the scaled parts per million (65.536 parts per billion) logic. To avoid overflow with minimal loss of precision, mul_u64_u64_div_u64 was used. The basic logic used by all of these drivers is very similar, and leads to a lot of duplicate code to perform the same task. Rather than keep this duplicate code, introduce diff_by_scaled_ppm and adjust_by_scaled_ppm. These helper functions calculate the difference or adjustment necessary based on the scaled parts per million input. The diff_by_scaled_ppm function returns true if the difference should be subtracted, and false otherwise. Update the Intel drivers to use the new helper functions. Other vendor drivers will be converted to .adjfine and this helper function in the following changes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	net: microchip: sparx5: kunit test: change test_callbacks and test_vctrl to ↵	Yang Yingliang
	static test_callbacks and test_vctrl are only used in vcap_api_kunit.c now, change them to static. Fixes: 67d637516fa9 ("net: microchip: sparx5: Adding KUNIT test for the VCAP API") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	net: fec: fix improper use of NETDEV_TX_BUSY	Zhang Changzhong
	The ndo_start_xmit() method must not free skb when returning NETDEV_TX_BUSY, since caller is going to requeue freed skb. Fix it by returning NETDEV_TX_OK in case of dma_map_single() fails. Fixes: 79f339125ea3 ("net: fec: Add software TSO support") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	net: hns: hnae: remove unnecessary __module_get() and module_put()	Yang Yingliang
	hnae_ae_register() is called from hns_dsaf_probe(), the refcount of module hnae has already be got in resolve_symbol() while calling the function, so the __module_get()/module_put() can be removed. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31	drivers: net: convert to boolean for the mac_managed_pm flag	Denis Kirjanov
	Signed-off-by: Dennis Kirjanov <dkirjanov@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28	Merge tag 'mlx5-updates-2022-10-24' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-10-24 SW steering updates from Yevgeny Kliteynik: 1) 1st Four patches: small fixes / optimizations for SW steering: - Patch 1: Don't abort destroy flow if failed to destroy table - continue and free everything else. - Patches 2 and 3 deal with fast teardown: + Skip sync during fast teardown, as PCI device is not there any more. + Check device state when polling CQ - otherwise SW steering keeps polling the CQ forever, because nobody is there to flush it. - Patch 4: Removing unneeded function argument. 2) Deal with the hiccups that we get during rules insertion/deletion, which sometimes reach 1/4 of a second. While insertion/deletion rate improvement was not the focus here, it still is a by-product of removing these hiccups. Another by-product is the reduced standard deviation in measuring the duration of rules insertion/deletion bursts. In the testing we add K rules (warm-up phase), and then continuously do insertion/deletion bursts of N rules. During the test execution, the driver measures hiccups (amount and duration) and total time for insertion/deletion of a batch of rules. Here are some numbers, before and after these patches: +--------------------------------------------+-----------------+----------------+ \| \| Create rules \| Delete rules \| \| +--------+--------+--------+-------+ \| \| Before \| After \| Before \| After \| +--------------------------------------------+--------+--------+--------+-------+ \| Max hiccup [msec] \| 253 \| 42 \| 254 \| 68 \| +--------------------------------------------+--------+--------+--------+-------+ \| Avg duration of 10K rules add/remove [msec]\| 140.07 \| 124.32 \| 106.99 \| 99.51 \| +--------------------------------------------+--------+--------+--------+-------+ \| Num of hiccups per 100K rules add/remove \| 7.77 \| 7.97 \| 12.60 \| 11.57 \| +--------------------------------------------+--------+--------+--------+-------+ \| Avg hiccup duration [msec] \| 36.92 \| 33.25 \| 36.15 \| 33.74 \| +--------------------------------------------+--------+--------+--------+-------+ - Patch 5: Allocate a short array on stack instead of dynamically- it is destroyed at the end of the function. - Patch 6: Rather than cleaning the corresponding chunk's section of ste_arrays on chunk deletion, initialize these areas upon chunk creation. Chunk destruction tend to come in large batches (during pool syncing), so instead of doing huge memory initialization during pool sync, we amortize this by doing small initsializations on chunk creation. - Patch 7: In order to simplifies error flow and allows cleaner addition of new pools, handle creation/destruction of all the domain's memory pools and other memory-related fields in a separate init/uninit functions. - Patch 8: During rehash, write each table row immediately instead of waiting for the whole table to be ready and writing it all - saves allocations of ste_send_info structures and improves performance. - Patch 9: Instead of allocating/freeing send info objects dynamically, manage them in pool. The number of send info objects doesn't depend on number of rules, so after pre-populating the pool with an initial batch of send info objects, the pool is not expected to grow. This way we save alloc/free during writing STEs to ICM, which by itself can sometimes take up to 40msec. - Patch 10: Allocate icm_chunks from their own slab allocator, which lowered the alloc/free "hiccups" frequency. - Patch 11: Similar to patch 9, allocate htbl from its own slab allocator. - Patch 12: Lower sync threshold for ICM hot memory - set the threshold for sync to 1/4 of the pool instead of 1/2 of the pool. Although we will have more syncs, each sync will be shorter and will help with insertion rate stability. Also, notice that the overall number of hiccups wasn't increased due to all the other patches. - Patch 13: Keep track of hot ICM chunks in an array instead of list. After steering sync, we traverse the hot list and finally free all the chunks. It appears that traversing a long list takes unusually long time due to cache misses on many entries, which causes a big "hiccup" during rule insertion. This patch replaces the list with pre-allocated array that stores only the bookkeeping information that is needed to later free the chunks in its buddy allocator. - Patch 14: Remove the unneeded buddy used_list - we don't need to have the list of used chunks, we only need the total amount of used memory. * tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Remove the buddy used_list net/mlx5: DR, Keep track of hot ICM chunks in an array instead of list net/mlx5: DR, Lower sync threshold for ICM hot memory net/mlx5: DR, Allocate htbl from its own slab allocator net/mlx5: DR, Allocate icm_chunks from their own slab allocator net/mlx5: DR, Manage STE send info objects in pool net/mlx5: DR, In rehash write the line in the entry immediately net/mlx5: DR, Handle domain memory resources init/uninit separately net/mlx5: DR, Initialize chunk's ste_arrays at chunk creation net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically net/mlx5: DR, Remove unneeded argument from dr_icm_chunk_destroy net/mlx5: DR, Check device state when polling CQ net/mlx5: DR, Fix the SMFS sync_steering for fast teardown net/mlx5: DR, In destroy flow, free resources even if FW command failed ==================== Link: https://lore.kernel.org/r/20221027145643.6618-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	ice: Add additional CSR registers to ETHTOOL_GREGS	Lukasz Czapnik
	In the event of a Tx hang it can be useful to read a variety of hardware registers to capture some state about why the transmit queue got stuck. Extend the ETHTOOL_GREGS dump provided by the ice driver with several CSR registers that provide such relevant information regarding the hardware Tx state. This enables capturing relevant data to enable debugging such a Tx hang. Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Link: https://lore.kernel.org/r/20221027104239.1691549-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: ethernet: adi: adin1110: Fix notifiers	Alexandru Tachici
	ADIN1110 was registering netdev_notifiers on each device probe. This leads to warnings/probe failures because of double registration of the same notifier when to adin1110/2111 devices are connected to the same system. Move the registration of netdev_notifiers in module init call, in this way multiple driver instances can use the same notifiers. Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support") Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Link: https://lore.kernel.org/r/20221027095655.89890-2-alexandru.tachici@analog.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: add support for in-band 802.3z negotiation	Russell King (Oracle)
	As a result of help from Frank Wunderlich to investigate and test, we now know how to program this PCS for in-band 802.3z negotiation. Add support for this by moving the contents of the two functions into the common mtk_pcs_config() function and adding the register settings for 802.3z negotiation. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: move and correct link timer programming	Russell King (Oracle)
	Program the link timer appropriately for the interface mode being used, using the newly introduced phylink helper that provides the nanosecond link timer interval. The intervals are 1.6ms for SGMII based protocols and 10ms for 802.3z based protocols. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: add advertisement programming	Russell King (Oracle)
	Program the advertisement into the mtk PCS block. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: move interface speed selection	Russell King (Oracle)
	Move the selection of the underlying interface speed to the pcs_config function, so we always program the interface speed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: move PHY power up	Russell King (Oracle)
	The PHY power up is common to both configuration paths, so move it into the parent function. We need to do this for all serdes modes. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: add out of band forcing of speed and duplex in pcs_link_up	Russell King (Oracle)
	Add support for forcing the link speed and duplex setting in the pcs_link_up() method for out of band modes, which will be useful when we finish converting the pcs_config() method. Until then, we still have to force duplex for 802.3z modes to work correctly. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: convert mtk_sgmii to use regmap_update_bits()	Russell King (Oracle)
	mtk_sgmii does a lot of read-modify-write operations, for which there is a specific regmap function. Use this function instead of open-coding the operations. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: add pcs_get_state() implementation	Russell King (Oracle)
	Add a pcs_get_state() implementation which uses the advertisements to compute the resulting link modes, and BMSR contents to determine negotiation and link status. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: eliminate unnecessary error handling	Russell King (Oracle)
	The functions called by the pcs_config() method always return zero, so there is no point trying to handle an error from these functions. Make these functions void, eliminate the "err" variable and simply return zero from the pcs_config() function itself. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: mtk_eth_soc: add definitions for PCS	Russell King (Oracle)
	As a result of help from Frank Wunderlich to investigate and test, we know a bit more about the PCS on the Mediatek platforms. Update the definitions from this investigation. This PCS appears similar, but not identical to the Lynx PCS. Although not included in this patch, but for future reference, the PHY ID registers at offset 4 read as 0x4d544950 'MTIP'. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers).	Thomas Gleixner
	Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	net: emaclite: update reset_lock member documentation	Radhey Shyam Pandey
	Instead of generic description, mention what reset_lock actually protects i.e. lock to serialize xmit and tx_timeout execution. Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28	net: txgbe: Set MAC address and register netdev	Jiawen Wu
	Add MAC address related operations, and register netdev. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28	net: txgbe: Reset hardware	Jiawen Wu
	Reset and initialize the hardware by configuring the MAC layer. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28	net: txgbe: Store PCI info	Jiawen Wu
	Get PCI config space info, set LAN id and check flash status. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-27	net: dpaa2-eth: Simplify bool conversion	Yang Li
	./drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c:453:42-47: WARNING: conversion to bool not needed here Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2577 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20221026051824.38730-1-yang.lee@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	ionic: refactor use of ionic_rx_fill()	Neel Patel
	The same pre-work code is used before each call to ionic_rx_fill(), so bring it in and make it a part of the routine. Signed-off-by: Neel Patel <neel@pensando.io> Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	ionic: enable tunnel offloads	Neel Patel
	Support stateless offloads for GRE, VXLAN, GENEVE, IPXIP4 and IPXIP6 when the FW supports them. Signed-off-by: Neel Patel <neel@pensando.io> Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	ionic: new ionic device identity level and VF start control	Shannon Nelson
	A new ionic dev_cmd is added to the interface in ionic_if.h, with a new capabilities field in the ionic device identity to signal its availability in the FW. The identity level code is incremented to '2' to show support for this new capabilities bitfield. If the driver has indicated with the new identity level that it has the VF_CTRL command, newer FW will wait for the start command before starting the VFs after a FW update or crash recovery. This patch updates the driver to make use of the new VF start control in fw_up path to be sure that the PF has set the user attributes on the VF before the FW allows the VFs to restart. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	ionic: only save the user set VF attributes	Shannon Nelson
	Report the current FW values for the VF attributes, but don't save the FW values locally, only save the vf attributes that are given to us from the user. This allows us to replay user data, and doesn't end up confusing things like "who set the mac address". Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	ionic: replay VF attributes after fw crash recovery	Shannon Nelson
	The VF attributes that the user has set into the FW through the PF can be lost over a FW crash recovery. Much like we already replay the PF mac/vlan filters, we now add a replay in the recovery path to be sure the FW has the up-to-date VF configurations. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c 2871edb32f46 ("can: kvaser_usb: Fix possible completions during init_completion") abb8670938b2 ("can: kvaser_usb_leaf: Ignore stale bus-off after start") 8d21f5927ae6 ("can: kvaser_usb_leaf: Fix improved state not being reported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net: enetc: survive memory pressure without crashing	Vladimir Oltean
	Under memory pressure, enetc_refill_rx_ring() may fail, and when called during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not checked for. An extreme case of memory pressure will result in exactly zero buffers being allocated for the RX ring, and in such a case it is expected that hardware drops all RX packets due to lack of buffers. This does not happen, because the reset-default value of the consumer and produces index is 0, and this makes the ENETC think that all buffers have been initialized and that it owns them (when in reality none were). The hardware guide explains this best: \| Configure the receive ring producer index register RBaPIR with a value \| of 0. The producer index is initially configured by software but owned \| by hardware after the ring has been enabled. Hardware increments the \| index when a frame is received which may consume one or more BDs. \| Hardware is not allowed to increment the producer index to match the \| consumer index since it is used to indicate an empty condition. The ring \| can hold at most RBLENR[LENGTH]-1 received BDs. \| \| Configure the receive ring consumer index register RBaCIR. The \| consumer index is owned by software and updated during operation of the \| of the BD ring by software, to indicate that any receive data occupied \| in the BD has been processed and it has been prepared for new data. \| - If consumer index and producer index are initialized to the same \| value, it indicates that all BDs in the ring have been prepared and \| hardware owns all of the entries. \| - If consumer index is initialized to producer index plus N, it would \| indicate N BDs have been prepared. Note that hardware cannot start if \| only a single buffer is prepared due to the restrictions described in \| (2). \| - Software may write consumer index to match producer index anytime \| while the ring is operational to indicate all received BDs prior have \| been processed and new BDs prepared for hardware. Normally, the value of rx_ring->rcir (consumer index) is brought in sync with the rx_ring->next_to_use software index, but this only happens if page allocation ever succeeded. When PI==CI==0, the hardware appears to receive frames and write them to DMA address 0x0 (?!), then set the READY bit in the BD. The enetc_clean_rx_ring() function (and its XDP derivative) is naturally not prepared to handle such a condition. It will attempt to process those frames using the rx_swbd structure associated with index i of the RX ring, but that structure is not fully initialized (enetc_new_page() does all of that). So what happens next is undefined behavior. To operate using no buffer, we must initialize the CI to PI + 1, which will block the hardware from advancing the CI any further, and drop everything. The issue was seen while adding support for zero-copy AF_XDP sockets, where buffer memory comes from user space, which can even decide to supply no buffers at all (example: "xdpsock --txonly"). However, the bug is present also with the network stack code, even though it would take a very determined person to trigger a page allocation failure at the perfect time (a series of ifup/ifdown under memory pressure should eventually reproduce it given enough retries). Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20221027182925.3256653-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5e: Fix macsec sci endianness at rx sa update	Raed Salem
	The cited commit at rx sa update operation passes the sci object attribute, in the wrong endianness and not as expected by the HW effectively create malformed hw sa context in case of update rx sa consequently, HW produces unexpected MACsec packets which uses this sa. Fix by passing sci to create macsec object with the correct endianness, while at it add __force u64 to prevent sparse check error of type "sparse: error: incorrect type in assignment". Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-16-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function	Raed Salem
	The cited commit produces a sparse check error of type "sparse: error: restricted __be64 degrades to integer". The offending line wrongly did a bitwise operation between two different storage types one of 64 bit when the other smaller side is 16 bit which caused the above sparse error, furthermore bitwise operation usage here is wrong in the first place as the constant MACSEC_PORT_ES is not a bitwise field. Fix by using the right mask to get the lower 16 bit if the sci number, and use comparison operator '==' instead of bitwise '&' operator. Fixes: 3b20949cb21b ("net/mlx5e: Add MACsec RX steering rules") Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-15-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5e: Fix macsec rx security association (SA) update/delete	Raed Salem
	The cited commit adds the support for update/delete MACsec Rx SA, naturally, these operations need to check if the SA in question exists to update/delete the SA and return error code otherwise, however they do just the opposite i.e. return with error if the SA exists Fix by change the check to return error in case the SA in question does not exist, adjust error message and code accordingly. Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support") Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-14-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5e: Fix macsec coverity issue at rx sa update	Raed Salem
	The cited commit at update rx sa operation passes object attributes to MACsec object create function without initializing/setting all attributes fields leaving some of them with garbage values, therefore violating the implicit assumption at create object function, which assumes that all input object attributes fields are set. Fix by initializing the object attributes struct to zero, thus leaving unset fields with the legal zero value. Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-13-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5: Fix crash during sync firmware reset	Suresh Devarakonda
	When setting Bluefield to DPU NIC mode using mlxconfig tool + sync firmware reset flow, we run into scenario where the host was not eswitch manager at the time of mlx5 driver load but becomes eswitch manager after the sync firmware reset flow. This results in null pointer access of mpfs structure during mac filter add. This change prevents null pointer access but mpfs table entries will not be added. Fixes: 5ec697446f46 ("net/mlx5: Add support for devlink reload action fw activate") Signed-off-by: Suresh Devarakonda <ramad@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-12-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5: Update fw fatal reporter state on PCI handlers successful recover	Roy Novich
	Update devlink health fw fatal reporter state to "healthy" is needed by strictly calling devlink_health_reporter_state_update() after recovery was done by PCI error handler. This is needed when fw_fatal reporter was triggered due to PCI error. Poll health is called and set reporter state to error. Health recovery failed (since EEH didn't re-enable the PCI). PCI handlers keep on recover flow and succeed later without devlink acknowledgment. Fix this by adding devlink state update at the end of the PCI handler recovery process. Fixes: 6181e5cb752e ("devlink: add support for reporter recovery completion") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed	Roi Dayan
	On multi table split the driver creates a new attr instance with data being copied from prev attr instance zeroing action flags. Also need to reset dests properties to avoid incorrect dests per attr. Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-10-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5e: TC, Reject forwarding from internal port to internal port	Ariel Levkovich
	Reject TC rules that forward from internal port to internal port as it is not supported. This include rules that are explicitly have internal port as the filter device as well as rules that apply on tunnel interfaces as the route device for the tunnel interface can be an internal port. Fixes: 27484f7170ed ("net/mlx5e: Offload tc rules that redirect to ovs internal port") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5: Fix possible use-after-free in async command interface	Tariq Toukan
	mlx5_cmd_cleanup_async_ctx should return only after all its callback handlers were completed. Before this patch, the below race between mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and lead to a use-after-free: 1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e. elevated by 1, a single inflight callback). 2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1. 3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and is about to call wake_up(). 4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns immediately as the condition (num_inflight == 0) holds. 5. mlx5_cmd_cleanup_async_ctx returns. 6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx object. 7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed object. Fix it by syncing using a completion object. Mark it completed when num_inflight reaches 0. Trace: BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270 Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x57/0x7d print_report.cold+0x2d5/0x684 ? do_raw_spin_lock+0x23d/0x270 kasan_report+0xb1/0x1a0 ? do_raw_spin_lock+0x23d/0x270 do_raw_spin_lock+0x23d/0x270 ? rwlock_bug.part.0+0x90/0x90 ? __delete_object+0xb8/0x100 ? lock_downgrade+0x6e0/0x6e0 _raw_spin_lock_irqsave+0x43/0x60 ? __wake_up_common_lock+0xb9/0x140 __wake_up_common_lock+0xb9/0x140 ? __wake_up_common+0x650/0x650 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kasan_set_track+0x21/0x30 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kfree+0x1ba/0x520 ? do_raw_spin_unlock+0x54/0x220 mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core] ? dump_command+0xcc0/0xcc0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core] cmd_comp_notifier+0x7e/0xb0 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 ? irq_release+0x140/0x140 [mlx5_core] irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x1f2/0x620 handle_irq_event+0xb2/0x1d0 handle_edge_irq+0x21e/0xb00 __common_interrupt+0x79/0x1a0 common_interrupt+0x78/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0x42/0x60 Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00 RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110 RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3 R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005 R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000 ? default_idle_call+0xcc/0x450 default_idle_call+0xec/0x450 do_idle+0x394/0x450 ? arch_cpu_idle_exit+0x40/0x40 ? do_idle+0x17/0x450 cpu_startup_entry+0x19/0x20 start_secondary+0x221/0x2b0 ? set_cpu_sibling_map+0x2070/0x2070 secondary_startup_64_no_verify+0xcd/0xdb </TASK> Allocated by task 49502: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kvmalloc_node+0x48/0xe0 mlx5e_bulk_async_init+0x35/0x110 [mlx5_core] mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 49502: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x11d/0x1b0 kfree+0x1ba/0x520 mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: e355477ed9e4 ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net/mlx5: ASO, Create the ASO SQ with the correct timestamp format	Saeed Mahameed
	mlx5 SQs must select the timestamp format explicitly according to the active clock mode, select the current active timestamp mode so ASO SQ create will succeed. This fixes the following error prints when trying to create ipsec ASO SQ while the timestamp format is real time mode. mlx5_cmd_out_err:778:(pid 34874): CREATE_SQ(0x904) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xd61c0b), err(-22) mlx5_aso_create_sq:285:(pid 34874): Failed to open aso wq sq, err=-22 mlx5e_ipsec_init:436:(pid 34874): IPSec initialization failed, -22 Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>