summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2024-09-10i2c: imx: Switch to RUNTIME_PM_OPS()Fabio Estevam
Replace SET_RUNTIME_PM_OPS() with its modern RUNTIME_PM_OPS() alternative. The combined usage of pm_ptr() and RUNTIME_PM_OPS() allows the compiler to evaluate if the runtime suspend/resume() functions are used at build time or are simply dead code. This allows removing the __maybe_unused notation from the runtime suspend/resume() functions. Signed-off-by: Fabio Estevam <festevam@denx.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-09-10i2c: mt65xx: Avoid double initialization of restart_flag in isrAngeloGioacchino Del Regno
In the mtk_i2c_irq() handler, variable restart_flag is initialized to zero and then reassigned with I2C_RS_TRANSFER if and only if auto_restart is enabled. Avoid a double initialization of this variable by transferring the auto_restart check to the restart_flag declaration. This commit brings no functional changes. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-09-10i2c: don't use ',' after delimitersWolfram Sang
Delimiters are meant to be last, no need for a ',' there. Remove a superfluous newline in the ali1535 driver while here. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-09-10i2c: designware: Fix wrong setting for {ss,fs,hs}_{h,l}cnt registersAdrian Huang
When disabling CONFIG_X86_AMD_PLATFORM_DEVICE option, the driver 'drivers/acpi/acpi_apd.c' won't be compiled. This leads to a situation where BMC (Baseboard Management Controller) cannot retrieve the memory temperature via the i2c interface after i2c DW driver is loaded. Note that BMC can retrieve the memory temperature before booting into OS. [Debugging Detail] 1. dev->pclk and dev->clk are NULL when calling devm_clk_get_optional() in dw_i2c_plat_probe(). 2. The callings of i2c_dw_scl_hcnt() in i2c_dw_set_timings_master() return 65528 (-8 in integer format) or 65533 (-3 in integer format). The following log shows SS's HCNT/LCNT: i2c_designware AMDI0010:01: Standard Mode HCNT:LCNT = 65533:65535 3. The callings of i2c_dw_scl_lcnt() in i2c_dw_set_timings_master() return 65535 (-1 in integer format). The following log shows SS's HCNT/LCNT: i2c_designware AMDI0010:01: Fast Mode HCNT:LCNT = 65533:65535 4. i2c_dw_init_master() configures the register IC_SS_SCL_HCNT with the value 65533. However, the DW i2c databook mentioned the value cannot be higher than 65525. Quote from the DW i2c databook: NOTE: This register must not be programmed to a value higher than 65525, because DW_apb_i2c uses a 16-bit counter to flag an I2C bus idle condition when this counter reaches a value of IC_SS_SCL_HCNT + 10. 5. Since ss_hcnt, ss_lcnt, fs_hcnt, and fs_lcnt are the invalid values, we should not write the corresponding registers. Fix the issue by reading dev->{ss,fs,hs}_hcnt and dev->{ss,fs,hs}_lcnt from HW registers if ic_clk is not set. Reported-by: Dong Wang <wangdong28@lenovo.com> Suggested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Tested-by: Dong Wang <wangdong28@lenovo.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/linux-i2c/8295cbe1-a7c5-4a35-a189-5d0bff51ede6@linux.intel.com/
2024-09-09clk: rockchip: remove unused mclk_pdm0_p/pdm0_p definitionsArnd Bergmann
When -Wunused-const-variable is enabled (not the default), there is a warning about two definitions in this file: In file included from drivers/clk/rockchip/clk-rk3576.c:14: drivers/clk/rockchip/clk-rk3576.c:334:7: error: 'mclk_pdm0_p' defined but not used [-Werror=unused-const-variable=] 334 | PNAME(mclk_pdm0_p) = { "mclk_pdm0_src_top", "xin24m" }; | ^~~~~~~~~~~ drivers/clk/rockchip/clk.h:564:43: note: in definition of macro 'PNAME' 564 | #define PNAME(x) static const char *const x[] __initconst | ^ drivers/clk/rockchip/clk-rk3576.c:333:7: error: 'pdm0_p' defined but not used [-Werror=unused-const-variable=] 333 | PNAME(pdm0_p) = { "clk_pdm0_src_top", "xin24m" }; | ^~~~~~ drivers/clk/rockchip/clk.h:564:43: note: in definition of macro 'PNAME' 564 | #define PNAME(x) static const char *const x[] __initconst | ^ Remove them for the moment. If they are needed later, they can be added back at that point. Fixes: cc40f5baa91b ("clk: rockchip: Add clock controller for the RK3576") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240909121116.254036-1-arnd@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-09-09clk: qcom: clk-alpha-pll: Simplify the zonda_pll_adjust_l_val()Satya Priya Kakitapalli
In zonda_pll_adjust_l_val() replace the divide operator with comparison operator to fix below build error and smatch warning. drivers/clk/qcom/clk-alpha-pll.o: In function `clk_zonda_pll_set_rate': clk-alpha-pll.c:(.text+0x45dc): undefined reference to `__aeabi_uldivmod' smatch warnings: drivers/clk/qcom/clk-alpha-pll.c:2129 zonda_pll_adjust_l_val() warn: replace divide condition '(remainder * 2) / prate' with '(remainder * 2) >= prate' Fixes: f4973130d255 ("clk: qcom: clk-alpha-pll: Update set_rate for Zonda PLL") Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202408110724.8pqbpDiD-lkp@intel.com/ Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com> Link: https://lore.kernel.org/r/20240906113905.641336-1-quic_skakitap@quicinc.com Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-09-09idpf: enable WB_ON_ITRJoshua Hay
Tell hardware to write back completed descriptors even when interrupts are disabled. Otherwise, descriptors might not be written back until the hardware can flush a full cacheline of descriptors. This can cause unnecessary delays when traffic is light (or even trigger Tx queue timeout). The example scenario to reproduce the Tx timeout if the fix is not applied: - configure at least 2 Tx queues to be assigned to the same q_vector, - generate a huge Tx traffic on the first Tx queue - try to send a few packets using the second Tx queue. In such a case Tx timeout will appear on the second Tx queue because no completion descriptors are written back for that queue while interrupts are disabled due to NAPI polling. Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: fix netdev Tx queue stop/wakeMichal Kubiak
netif_txq_maybe_stop() returns -1, 0, or 1, while idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result, there sometimes are Tx queue timeout warnings despite that the queue is empty or there is at least enough space to restart it. Make idpf_tx_maybe_stop_common() inline and returning true or false, handling the return of netif_txq_maybe_stop() properly. Use a correct goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or incrementing the stops counter twice. Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Cc: stable@vger.kernel.org # 6.7+ Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: refactor Tx completion routinesJoshua Hay
Add a mechanism to guard against stashing partial packets into the hash table to make the driver more robust, with more efficient decision making when cleaning. Don't stash partial packets. This can happen when an RE (Report Event) completion is received in flow scheduling mode, or when an out of order RS (Report Status) completion is received. The first buffer with the skb is stashed, but some or all of its frags are not because the stack is out of reserve buffers. This leaves the ring in a weird state since the frags are still on the ring. Use the field libeth_sqe::nr_frags to track the number of fragments/tx_bufs representing the packet. The clean routines check to make sure there are enough reserve buffers on the stack before stashing any part of the packet. If there are not, next_to_clean is left pointing to the first buffer of the packet that failed to be stashed. This leaves the whole packet on the ring, and the next time around, cleaning will start from this packet. An RS completion is still expected for this packet in either case. So instead of being cleaned from the hash table, it will be cleaned from the ring directly. This should all still be fine since the DESC_UNUSED and BUFS_UNUSED will reflect the state of the ring. If we ever fall below the thresholds, the TxQ will still be stopped, giving the completion queue time to catch up. This may lead to stopping the queue more frequently, but it guarantees the Tx ring will always be in a good state. Also, always use the idpf_tx_splitq_clean function to clean descriptors, i.e. use it from clean_buf_ring as well. This way we avoid duplicating the logic and make sure we're using the same reserve buffers guard rail. This does require a switch from the s16 next_to_clean overflow descriptor ring wrap calculation to u16 and the normal ring size check. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: convert to libeth Tx buffer completionAlexander Lobakin
&idpf_tx_buffer is almost identical to the previous generations, as well as the way it's handled. Moreover, relying on dma_unmap_addr() and !!buf->skb instead of explicit defining of buffer's type was never good. Use the newly added libeth helpers to do it properly and reduce the copy-paste around the Tx code. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09regulator: tps6287x: Constify struct regulator_descChristophe JAILLET
'struct regulator_desc' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 4974 736 16 5726 165e drivers/regulator/tps6287x-regulator.o After: ===== text data bss dec hex filename 5294 416 16 5726 165e drivers/regulator/tps6287x-regulator.o -- Compile tested only Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/7727e493490d37775a653905dfe0cc1d8478f8e0.1725908163.git.christophe.jaillet@wanadoo.fr Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-09regulator: wm8400: Constify struct regulator_descChristophe JAILLET
'struct regulator_desc' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 4419 2512 0 6931 1b13 drivers/regulator/wm8400-regulator.o After: ===== text data bss dec hex filename 6307 624 0 6931 1b13 drivers/regulator/wm8400-regulator.o -- Compile tested only Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/fde33ecfd9bbdbdc1da1620c9f3b1b7a72f9d805.1725906876.git.christophe.jaillet@wanadoo.fr Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-09net/mlx5: Fix bridge mode operations when there are no VFsBenjamin Poirier
Currently, trying to set the bridge mode attribute when numvfs=0 leads to a crash: bridge link set dev eth2 hwmode vepa [ 168.967392] BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] [ 168.969989] RIP: 0010:mlx5_add_flow_rules+0x1f/0x300 [mlx5_core] [...] [ 168.976037] Call Trace: [ 168.976188] <TASK> [ 168.978620] _mlx5_eswitch_set_vepa_locked+0x113/0x230 [mlx5_core] [ 168.979074] mlx5_eswitch_set_vepa+0x7f/0xa0 [mlx5_core] [ 168.979471] rtnl_bridge_setlink+0xe9/0x1f0 [ 168.979714] rtnetlink_rcv_msg+0x159/0x400 [ 168.980451] netlink_rcv_skb+0x54/0x100 [ 168.980675] netlink_unicast+0x241/0x360 [ 168.980918] netlink_sendmsg+0x1f6/0x430 [ 168.981162] ____sys_sendmsg+0x3bb/0x3f0 [ 168.982155] ___sys_sendmsg+0x88/0xd0 [ 168.985036] __sys_sendmsg+0x59/0xa0 [ 168.985477] do_syscall_64+0x79/0x150 [ 168.987273] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 168.987773] RIP: 0033:0x7f8f7950f917 (esw->fdb_table.legacy.vepa_fdb is null) The bridge mode is only relevant when there are multiple functions per port. Therefore, prevent setting and getting this setting when there are no VFs. Note that after this change, there are no settings to change on the PF interface using `bridge link` when there are no VFs, so the interface no longer appears in the `bridge link` output. Fixes: 4b89251de024 ("net/mlx5: Support ndo bridge_setlink and getlink") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: Verify support for scheduling element and TSAR typeCarolina Jubran
Before creating a scheduling element in a NIC or E-Switch scheduler, ensure that the requested element type is supported. If the element is of type Transmit Scheduling Arbiter (TSAR), also verify that the specific TSAR type is supported. Fixes: 214baf22870c ("net/mlx5e: Support HTB offload") Fixes: 85c5f7c9200e ("net/mlx5: E-switch, Create QoS on demand") Fixes: 0fe132eac38c ("net/mlx5: E-switch, Allow to add vports to rate groups") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: Explicitly set scheduling element and TSAR typeCarolina Jubran
Ensure the scheduling element type and TSAR type are explicitly initialized in the QoS rate group creation. This prevents potential issues due to default values. Fixes: 1ae258f8b343 ("net/mlx5: E-switch, Introduce rate limiting groups API") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5e: Add missing link mode to ptys2ext_ethtool_mapShahar Shitrit
Add MLX5E_400GAUI_8_400GBASE_CR8 to the extended modes in ptys2ext_ethtool_table, since it was missing. Fixes: 6a897372417e ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5e: Add missing link modes to ptys2ethtool_mapShahar Shitrit
Add MLX5E_1000BASE_T and MLX5E_100BASE_TX to the legacy modes in ptys2legacy_ethtool_table, since they were missing. Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: Update the list of the PCI supported devicesMaher Sanalla
Add the upcoming ConnectX-9 device ID to the table of supported PCI device IDs. Fixes: f908a35b2218 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09ieee802154: Fix build errorJinjie Ruan
If REGMAP_SPI is m and IEEE802154_MCR20A is y, mcr20a.c:(.text+0x3ed6c5b): undefined reference to `__devm_regmap_init_spi' ld: mcr20a.c:(.text+0x3ed6cb5): undefined reference to `__devm_regmap_init_spi' Select REGMAP_SPI for IEEE802154_MCR20A to fix it. Fixes: 8c6ad9cc5157 ("ieee802154: Add NXP MCR20A IEEE 802.15.4 transceiver driver") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://lore.kernel.org/20240909131740.1296608-1-ruanjinjie@huawei.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2024-09-09cxl/region: Remove lock from memory notifier callbackIra Weiny
In testing Dynamic Capacity Device (DCD) support, a lockdep splat revealed an ABBA issue between the memory notifiers and the DCD extent processing code.[0] Changing the lock ordering within DCD proved difficult because regions must be stable while searching for the proper region and then the device lock must be held to properly notify the DAX region driver of memory changes. Dan points out in the thread that notifiers should be able to trust that it is safe to access static data. Region data is static once the device is realized and until it's destruction. Thus it is better to manage the notifiers within the region driver. Remove the need for a lock by ensuring the notifiers are active only during the region's lifetime. Furthermore, remove cxl_region_nid() because resource can't be NULL while the region is stable. Link: https://lore.kernel.org/all/66b4cf539a79b_a36e829416@iweiny-mobl.notmuch/ [0] Cc: Ying Huang <ying.huang@intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ying Huang <ying.huang@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20240904-fix-notifiers-v3-1-576b4e950266@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-09cxl/pci: simplify the check of mem_enabled in cxl_hdm_decode_init()Yanfei Xu
Cases can be divided into two categories which are DVSEC range enabled and not enabled when HDM decoders exist but is not enabled. To avoid checking info->mem_enabled, which indicates the enablement of DVSEC range, every time, we can check !info->mem_enabled once in advance. This simplification can make the code clearer. No functional change intended. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20240828084231.1378789-5-yanfei.xu@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-09cxl/pci: Check Mem_info_valid bit for each applicable DVSECYanfei Xu
In theory a device might set the mem_info_valid bit for a first range after it is ready but before as second range has reached that state. Therefore, the correct approach is to check the Mem_info_valid bit for each applicable DVSEC range against HDM_COUNT, rather than only for the DVSEC range 1. Consequently, let's move the check into the "for loop" that handles each DVSEC range. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20240828084231.1378789-4-yanfei.xu@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-09cxl/pci: Remove duplicated implementation of waiting for memory_info_validYanfei Xu
commit ce17ad0d5498 ("cxl: Wait Memory_Info_Valid before access memory related info") added another implementation, which is cxl_dvsec_mem_range_valid(), of waiting for memory_info_valid without realizing it duplicated wait_for_valid(). Remove wait_for_valid() and retain cxl_dvsec_mem_range_valid() as the former is hardcoded to check only the Memory_Info_Valid bit of DVSEC range 1, while the latter allows for selection between DVSEC range 1 or 2 via parameter. Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20240828084231.1378789-3-yanfei.xu@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-09cxl/pci: Fix to record only non-zero rangesYanfei Xu
The function cxl_dvsec_rr_decode() retrieves and records DVSEC ranges into info->dvsec_range[], regardless of whether it is non-zero range, and the variable info->ranges indicates the number of non-zero ranges. However, in cxl_hdm_decode_init(), the validation for info->dvsec_range[] occurs in a for loop that iterates based on info->ranges. It may result in zero range to be validated but non-zero range not be validated, in turn, the number of allowed ranges is to be 0. Address it by only record non-zero ranges. This fix is not urgent as it requires a configuration that zeroes out the first dvsec range while populating the second. This has not been observed, but it is theoretically possible. If this gets picked up for -stable, no harm done, but there is no urgency to backport. Fixes: 560f78559006 ("cxl/pci: Retrieve CXL DVSEC memory info") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20240828084231.1378789-2-yanfei.xu@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-09RDMA/bnxt_re: Fix the max WQE size for static WQE supportSelvin Xavier
When variable size WQE is supported, max_qp_sges reported is more than 6. For devices that supports variable size WQE, the Send WQE size calculation is wrong when an an older library that doesn't support variable size WQE is used. Set the WQE size to 128 when static WQE is supported. Fixes: de1d364c3815 ("RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725444253-13221-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/bnxt_re: Fix the compatibility flag for variable size WQESelvin Xavier
For older adapters that doesn't support variable size WQE, driver is wrongly reporting that variable WQE is supported, when the latest library is used. Report the variable WQE capability only if the driver supports it. Fixes: 10a104c0debb ("RDMA/bnxt_re: Enable variable size WQEs for user space applications") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725444253-13221-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Fix MR cache temp entries cleanupMichael Guralnik
Fix the cleanup of the temp cache entries that are dynamically created in the MR cache. The cleanup of the temp cache entries is currently scheduled only when a new entry is created. Since in the cleanup of the entries only the mkeys are destroyed and the cache entry stays in the cache, subsequent registrations might reuse the entry and it will eventually be filled with new mkeys without cleanup ever getting scheduled again. On workloads that register and deregister MRs with a wide range of properties we see the cache ends up holding many cache entries, each holding the max number of mkeys that were ever used through it. Additionally, as the cleanup work is scheduled to run over the whole cache, any mkey that is returned to the cache after the cleanup was scheduled will be held for less than the intended 30 seconds timeout. Solve both issues by dropping the existing remove_ent_work and reusing the existing per-entry work to also handle the temp entries cleanup. Schedule the work to run with a 30 seconds delay every time we push an mkey to a clean temp entry. This ensures the cleanup runs on each entry only 30 seconds after the first mkey was pushed to an empty entry. As we have already been distinguishing between persistent and temp entries when scheduling the cache_work_func, it is not being scheduled in any other flows for the temp entries. Another benefit from moving to a per-entry cleanup is we now not required to hold the rb_tree mutex, thus enabling other flow to run concurrently. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/e4fa4bb03bebf20dceae320f26816cd2dde23a26.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Limit usage of over-sized mkeys from the MR cacheMichael Guralnik
When searching the MR cache for suitable cache entries, don't use mkeys larger than twice the size required for the MR. This should ensure the usage of mkeys closer to the minimal required size and reduce memory waste. On driver init we create entries for mkeys with clear attributes and powers of 2 sizes from 4 to the max supported size. This solves the issue for anyone using mkeys that fit these requirements. In the use case where an MR is registered with different attributes, like an access flag we can't UMR, we'll create a new cache entry to store it upon dereg. Without this fix, any later registration with same attributes and smaller size will use the newly created cache entry and it's mkeys, disregarding the memory waste of using mkeys larger than required. For example, one worst-case scenario can be when registering and deregistering a 1GB mkey with ATS enabled which will cause the creation of a new cache entry to hold those type of mkeys. A user registering a 4k MR with ATS will end up using the new cache entry and an mkey that can support a 1GB MR, thus wasting x250k memory than actually needed in the HW. Additionally, allow all small registration to use the smallest size cache entry that is initialized on driver load even if size is larger than twice the required size. Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/8ba3a6e3748aace2026de8b83da03aba084f78f4.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Fix counter update on MR cache mkey creationMichael Guralnik
After an mkey is created, update the counter for pending mkeys before reshceduling the work that is filling the cache. Rescheduling the work with a full MR cache entry and a wrong 'pending' counter will cause us to miss disabling the fill_to_high_water flag. Thus leaving the cache full but with an indication that it's still needs to be filled up to it's full size (2 * limit). Next time an mkey will be taken from the cache, we'll unnecessarily continue the process of filling the cache to it's full size. Fixes: 57e7071683ef ("RDMA/mlx5: Implement mkeys management via LIFO queue") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/0f44f462ba22e45f72cb3d0ec6a748634086b8d0.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Drop redundant work canceling from clean_keys()Michael Guralnik
The canceling of dealyed work in clean_keys() is a leftover from years back and was added to prevent races in the cleanup process of MR cache. The cleanup process was rewritten a few years ago and the canceling of delayed work and flushing of workqueue was added before the call to clean_keys(). Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/943d21f5a9dba7b98a3e1d531e3561ffe9745d71.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/erdma: Return QP state in erdma_query_qpCheng Xu
Fix qp_state and cur_qp_state to return correct values in struct ib_qp_attr. Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://patch.msgid.link/20240902112920.58749-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/erdma: Add disassociate ucontext supportCheng Xu
All IO pages mapped to user space are handled by rdma_user_mmap_io, so add empty stub for disassociate ucontext. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://patch.msgid.link/20240902112920.58749-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/erdma: Refactor the initialization and destruction of EQCheng Xu
We extracted the common parts of the initialization/destruction process to make the code cleaner. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://patch.msgid.link/20240902112920.58749-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Enable ATS when allocating kernel MRsMaher Sanalla
When creating kernel MRs, it is not definitive whether they will be used for peer-to-peer transactions or for other usecases, since address mapping is performed only after the MR is created. Since peer-to-peer transactions benefit significantly from ATS performance-wise, enable ATS on newly-allocated kernel MRs when supported. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Gal Shalom <galshalom@nvidia.com> Link: https://patch.msgid.link/fafd4c9f14cf438d2882d88649c2947e1d05d0b4.1725273403.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09net/mlx5: HWS, added API and enabled HWS supportYevgeny Kliteynik
Enabling HWS support in the mlx5 driver: - added HWS API header - added HWS files in the mlx5 driver makefile - added kconfig flag that enables HWS compilation Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added send engine and context handlingYevgeny Kliteynik
Added implementation of send engine and handling of HWS context. Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added debug dump and internal headersYevgeny Kliteynik
Added debug dump of the existing HWS state, and all the required internal definitions. To dump the HWS state, cat the following debugfs node: cat /sys/kernel/debug/mlx5/<PCI>/steering/fdb/ctx_<ctx_id> Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added backward-compatible API handlingYevgeny Kliteynik
Added implementation of backward-compatible (BWC) steering API. Native HWS API is very different from SWS API: - SWS is synchronous (rule creation/deletion API call returns when the rule is created/deleted), while HWS is asynchronous (it requires polling for completion in order to know when the rule creation/deletion happened) - SWS manages its own memory (it allocates/frees all the needed memory for steering rules, while HWS requires the rules memory to be allocated/freed outside the API In order to make HWS fit the existing fs-core steering API paradigm, this patch adds implementation of backward-compatible (BWC) steering API that has the bahaviour similar to SWS: among others, it encompasses all the rules' memory management and completion polling, presenting the usual synchronous API for the upper layer. A user that wishes to utilize the full speed potential of HWS can call the HWS async API and have rule insertion/deletion batching, lower memory management overhead, and lower CPU utilization. Such approach will be taken by the future Connection Tracking. Note that BWC steering doesn't support yet rules that require more than one match STE - complex rules. This support will be added later on. Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added memory management handlingYevgeny Kliteynik
Added object pools and buddy allocator functionality. Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added vport handlingYevgeny Kliteynik
Vport is a virtual eswitch port that is associated with its virtual function (VF), physical function (PF) or sub-function (SF). This patch adds handling of vports in HWS. Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added modify header pattern and args handlingYevgeny Kliteynik
Packet headers/metadta manipulations are split into two parts: - Header Modify Pattern: an object that describes which fields will be modified and in which way - Header Modify Argument: an object that provides the values to be used for header modification Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added FW commands handlingYevgeny Kliteynik
This patch adds implementation of FW object handling, such as creation/destruction, modification, and querying. Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added matchers functionalityYevgeny Kliteynik
Matcher object encompasses all the building blocks that are needed in order to perform flow steering of a given flow: - flow table that serves as entering point of this matcher - Rule Table Context (RTC) objects to hold ll the Steering Table Entries (STEs), both for matching the flow and for performing actions - rules that describe the set of matching parameters for a flow and actions to perform in case of a hit. This patch adds implementation of matchers handling in HWS. Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added definers handlingYevgeny Kliteynik
The Match Definer combines packet fields and a mask, creating a key which can be used for packet matching during steering flow processing. This patch adds handling of definer objects in HWS. Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added rules handlingYevgeny Kliteynik
Steering rule is a concept that includes match parameters for a flow, and actions to perform on the flows that match these parameters. This patch adds rules handling part of HW Steering. Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added tables handlingYevgeny Kliteynik
Flow tables are SW objects that are comprised of list of matchers, that in turn define the properties of a flow to match on and set of actions to perform on the flows in case of match hit or miss. Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: HWS, added actions handlingYevgeny Kliteynik
When a packet matches a flow, the actions specified for the flow are applied. The supported actions include (but not limited to) the following: - drop: packet processing is stopped - go to vport: packet is forwarded to a specified vport - go to flow table: packet is forwarded to a specified table and processing continues there - push/pop vlan: add/remove vlan header respectively to/from the packet - insert/remove header: add/remove a user-defined header to/from the packet - counter: count the packet bytes in the specified counter - tag: tag the matching flow with a provided tag value - reformat: change the packet format by adding or removing some of its headers - modify header: modify the value of the packet headers with set/add/copy ops - range: match packet on range of values Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: Added missing definitions in preparation for HW SteeringYevgeny Kliteynik
As part of preparation for HWS, added missing definitions in qp.h and fs_core.h: - FS_FT_FDB_RX/TX table types that are used by HWS in addition to an existing FS_FT_FDB - MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE that is used by HWS to require fence in WQE Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09net/mlx5: Added missing mlx5_ifc definition for HW SteeringYevgeny Kliteynik
Add mlx5_ifc definitions that are required for HWS support. Note that due to change in the mlx5_ifc_flow_table_context_bits structure that now includes both SWS and HWS bits in a union, this patch also includes small change in one of SWS files that was required for compilation. Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09igb: Always call igb_xdp_ring_update_tail() under Tx lockSriram Yagnaraman
Always call igb_xdp_ring_update_tail() under __netif_tx_lock, add a comment and lockdep assert to indicate that. This is needed to share the same TX ring between XDP, XSK and slow paths. Furthermore, the current XDP implementation is racy on tail updates. Fixes: 9cbc948b5a20 ("igb: add XDP support") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Add lockdep assert and fixes tag] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>