Age | Commit message | Author
2025-05-14 | dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call | Shuai Xue
The remove call stack is missing idxd cleanup to free bitmap, ida and the idxd_device. Call idxd_free() helper routines to make sure we exit gracefully. Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") Cc: stable@vger.kernel.org Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20250404120217.48772-9-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | dmaengine: idxd: fix memory leak in error handling path of idxd_pci_probe | Shuai Xue
Memory allocated for idxd is not freed if an error occurs during idxd_pci_probe(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error. Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Link: https://lore.kernel.org/r/20250404120217.48772-8-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | dmaengine: idxd: fix memory leak in error handling path of idxd_alloc | Shuai Xue
Memory allocated for idxd is not freed if an error occurs during idxd_alloc(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error. Fixes: a8563a33a5e2 ("dmanegine: idxd: reformat opcap output to match bitmap_parse() input") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Link: https://lore.kernel.org/r/20250404120217.48772-7-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
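The "free in the reverse order of allocation" pattern these fixes describe looks roughly like the sketch below; the struct layout and names are illustrative, not the actual idxd code:

	#include <linux/bitmap.h>
	#include <linux/idr.h>
	#include <linux/slab.h>

	struct example_dev {
		int id;
		unsigned long *opcap_bmap;
	};

	static DEFINE_IDA(example_ida);

	static struct example_dev *example_alloc(void)
	{
		struct example_dev *d;

		d = kzalloc(sizeof(*d), GFP_KERNEL);
		if (!d)
			return NULL;

		d->id = ida_alloc(&example_ida, GFP_KERNEL);
		if (d->id < 0)
			goto err_free_dev;

		d->opcap_bmap = bitmap_zalloc(64, GFP_KERNEL);
		if (!d->opcap_bmap)
			goto err_free_ida;

		return d;

		/* unwind in the reverse order of allocation */
	err_free_ida:
		ida_free(&example_ida, d->id);
	err_free_dev:
		kfree(d);
		return NULL;
	}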
2025-05-14 | dmaengine: idxd: Add missing cleanups in cleanup internals | Shuai Xue
The idxd_cleanup_internals() function only decreases the reference count of groups, engines, and wqs but is missing the step to release memory resources. To fix this, use the cleanup helper to properly release the memory resources. Fixes: ddf742d4f3f1 ("dmaengine: idxd: Add missing cleanup for early error out in probe call") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20250404120217.48772-6-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | dmaengine: idxd: Add missing cleanup for early error out in idxd_setup_internals | Shuai Xue
The idxd_setup_internals() is missing some cleanup when things fail in the middle. Add the appropriate cleanup routines: - cleanup groups - cleanup engines - cleanup wqs to make sure it exits gracefully. Fixes: defe49f96012 ("dmaengine: idxd: fix group conf_dev lifetime") Cc: stable@vger.kernel.org Suggested-by: Fenghua Yu <fenghuay@nvidia.com> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20250404120217.48772-5-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | dmaengine: idxd: fix memory leak in error handling path of idxd_setup_groups | Shuai Xue
Memory allocated for groups is not freed if an error occurs during idxd_setup_groups(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error. Fixes: defe49f96012 ("dmaengine: idxd: fix group conf_dev lifetime") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Link: https://lore.kernel.org/r/20250404120217.48772-4-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | dmaengine: idxd: fix memory leak in error handling path of idxd_setup_engines | Shuai Xue
Memory allocated for engines is not freed if an error occurs during idxd_setup_engines(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error. Fixes: 75b911309060 ("dmaengine: idxd: fix engine conf_dev lifetime") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Link: https://lore.kernel.org/r/20250404120217.48772-3-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs | Shuai Xue
Memory allocated for wqs is not freed if an error occurs during idxd_setup_wqs(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error. Fixes: 7c5dd23e57c1 ("dmaengine: idxd: fix wq conf_dev 'struct device' lifetime") Fixes: 700af3a0a26c ("dmaengine: idxd: add 'struct idxd_dev' as wrapper for conf_dev") Fixes: de5819b99489 ("dmaengine: idxd: track enabled workqueues in bitmap") Fixes: b0325aefd398 ("dmaengine: idxd: add WQ operation cap restriction support") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Link: https://lore.kernel.org/r/20250404120217.48772-2-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
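Unwinding a partially populated array in the reverse order of allocation, as these setup fixes do, can be sketched like this (hypothetical types standing in for the idxd wq structures):

	struct example_wq {
		int id;
	};

	struct example_dev {
		struct example_wq **wqs;
	};

	static int example_setup_wqs(struct example_dev *d, int max_wqs)
	{
		int i, rc;

		d->wqs = kcalloc(max_wqs, sizeof(*d->wqs), GFP_KERNEL);
		if (!d->wqs)
			return -ENOMEM;

		for (i = 0; i < max_wqs; i++) {
			d->wqs[i] = kzalloc(sizeof(*d->wqs[i]), GFP_KERNEL);
			if (!d->wqs[i]) {
				rc = -ENOMEM;
				goto err_unwind;
			}
		}

		return 0;

	err_unwind:
		/* free only what was actually allocated, newest first */
		while (--i >= 0)
			kfree(d->wqs[i]);
		kfree(d->wqs);
		d->wqs = NULL;
		return rc;
	}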
2025-05-14 | xfs: Fix comment on xfs_trans_ail_update_bulk() | Carlos Maiolino
This function doesn't take the AIL lock, but should be called with the AIL lock held. Also (hopefully) simplify the comment. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-05-14 | xfs: Fix a comment on xfs_ail_delete | Carlos Maiolino
It doesn't return anything. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-05-14 | xfs: Fail remount with noattr2 on a v5 with v4 enabled | Nirjhar Roy (IBM)
Bug: When we compile the kernel with CONFIG_XFS_SUPPORT_V4=y, remount with "-o remount,noattr2" on a v5 XFS does not fail explicitly. Reproduction: mkfs.xfs -f /dev/loop0 mount /dev/loop0 /mnt/scratch mount -o remount,noattr2 /dev/loop0 /mnt/scratch However, with CONFIG_XFS_SUPPORT_V4=n, the remount correctly fails explicitly. This is because of the way the following 2 functions are defined: static inline bool xfs_has_attr2 (struct xfs_mount *mp) { return !IS_ENABLED(CONFIG_XFS_SUPPORT_V4) || (mp->m_features & XFS_FEAT_ATTR2); } static inline bool xfs_has_noattr2 (const struct xfs_mount *mp) { return mp->m_features & XFS_FEAT_NOATTR2; } xfs_has_attr2() returns true when CONFIG_XFS_SUPPORT_V4=n and hence, the following if condition in xfs_fs_validate_params() succeeds and returns -EINVAL: /* * We have not read the superblock at this point, so only the attr2 * mount option can set the attr2 feature by this stage. */ if (xfs_has_attr2(mp) && xfs_has_noattr2(mp)) { xfs_warn(mp, "attr2 and noattr2 cannot both be specified."); return -EINVAL; } With CONFIG_XFS_SUPPORT_V4=y, xfs_has_attr2() returns false here (the attr2 feature bit has not been set at this stage) and hence no error is returned. Fix: Check if the existing mount has crc enabled (i.e., is of type v5 and has attr2 enabled) and the remount has noattr2; if yes, return -EINVAL. I have tested xfs/{189,539} in fstests with v4 and v5 XFS with both CONFIG_XFS_SUPPORT_V4=y/n and they both behave as expected. This patch also fixes remount from noattr2 -> attr2 (on a v4 xfs). Related discussion in [1] [1] https://lore.kernel.org/all/Z65o6nWxT00MaUrW@dread.disaster.area/ Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
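A hedged sketch of the kind of check the fix describes, using the existing xfs_has_crc() and xfs_has_noattr2() helpers; new_mp stands for the mount structure built from the new mount options, and the exact placement in the remount path may differ:

	/* Reject noattr2 on a remount of a V5 (crc-enabled) filesystem. */
	if (xfs_has_crc(mp) && xfs_has_noattr2(new_mp)) {
		xfs_warn(mp, "noattr2 cannot be set on a V5 filesystem.");
		return -EINVAL;
	}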
2025-05-14 | xfs: fix zoned GC data corruption due to wrong bv_offset | Christoph Hellwig
xfs_zone_gc_write_chunk writes out the data buffer read in earlier using the same bio, and currently looks at bv_offset for the offset into the scratch folio for that. But commit 26064d3e2b4d ("block: fix adding folio to bio") changed how bv_page and bv_offset are calculated for adding larger folios, breaking this fragile logic. Switch to extracting the full physical address from the old bio_vec, and calculate the offset into the folio from that instead. This fixes data corruption during garbage collection with heavy rocksdb workloads. Thanks to Hans for tracking down the culprit commit during long bisection sessions. Fixes: 26064d3e2b4d ("block: fix adding folio to bio") Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <Hans.Holmberg@wdc.com> Tested-by: Hans Holmberg <Hans.Holmberg@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
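In rough terms, the change computes the folio offset from the bio_vec's physical address instead of trusting bv_offset; a sketch of that idea (not the exact patch, and it assumes the scratch folio is physically contiguous):

	struct bio_vec bv = bio_iter_iovec(bio, bio->bi_iter);
	/* physical address of the data, independent of how bv_page/bv_offset were split */
	phys_addr_t paddr = page_to_phys(bv.bv_page) + bv.bv_offset;
	/* offset into the scratch folio derived from the physical address */
	size_t offset = paddr - page_to_phys(folio_page(folio, 0));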
2025-05-14 | xfs: free up mp->m_free[0].count in error case | Wengang Wang
In xfs_init_percpu_counters(), the memory for mp->m_free[0].count wasn't freed in the error case. Free it in this patch. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Fixes: 712bae96631852 ("xfs: generalize the freespace and reserved blocks handling") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-05-14 | dma-buf: insert memory barrier before updating num_fences | Hyejeong Choi
smp_store_mb() inserts the memory barrier after the store operation. That is not what the original comment intended, so a NULL pointer dereference can happen if the memory update is reordered. Signed-off-by: Hyejeong Choi <hjeong.choi@samsung.com> Fixes: a590d0fdbaa5 ("dma-buf: Update reservation shared_count after adding the new fence") CC: stable@vger.kernel.org Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250513020638.GA2329653@au1-maretx-p37.eng.sarc.samsung.com Signed-off-by: Christian König <christian.koenig@amd.com>
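In pattern form, the intended ordering puts the barrier before the counter update (a sketch of the ordering only, not the exact dma-resv code; fobj and count are illustrative names):

	/* make the new fence entry visible before publishing the new count */
	smp_wmb();
	WRITE_ONCE(fobj->num_fences, count);

whereas smp_store_mb(fobj->num_fences, count) places the barrier after the store, which is the problem the commit describes.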
2025-05-14 | nvme: all namespaces in a subsystem must adhere to a common atomic write size | Alan Adamson
The first namespace configured in a subsystem sets the subsystem's atomic write size based on its AWUPF or NAWUPF. Subsequent namespaces must have an atomic write size (per their AWUPF or NAWUPF) less than or equal to the subsystem's atomic write size, or their probing will be rejected. Signed-off-by: Alan Adamson <alan.adamson@oracle.com> [hch: fold in review comments from John Garry] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com>
2025-05-14 | io_uring/fdinfo: grab ctx->uring_lock around io_uring_show_fdinfo() | Jens Axboe
Not everything requires locking in there, which is why the 'has_lock' variable exists. But enough does that it's a bit unwieldy to manage. Wrap the whole thing in a ->uring_lock trylock, and just return with no output if we fail to grab it. The existing trylock() will already have greatly diminished utility/output for the failure case. This fixes an issue with reading the SQE fields, if the ring is being actively resized at the same time. Reported-by: Jann Horn <jannh@google.com> Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Signed-off-by: Jens Axboe <axboe@kernel.dk>
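The trylock-and-bail approach amounts to something like this sketch (not the exact fdinfo code):

	static void example_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
	{
		if (!mutex_trylock(&ctx->uring_lock))
			return;	/* ring is busy (e.g. being resized): emit no output */

		/* ... walk SQ/CQ/SQE state and print it while it cannot change ... */

		mutex_unlock(&ctx->uring_lock);
	}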
2025-05-14 | phy: Fix error handling in tegra_xusb_port_init | Ma Ke
If device_add() fails, do not use device_unregister() for error handling. device_unregister() consists of two functions: device_del() and put_device(). device_unregister() should only be called after device_add() succeeded because device_del() undoes what device_add() does if successful. Change the device_unregister() call to put_device() before returning from the function. As the comment of device_add() says, 'if device_add() succeeds, you should call device_del() when you want to get rid of it. If device_add() has not succeeded, use only put_device() to drop the reference count'. Found by code review. Cc: stable@vger.kernel.org Fixes: 53d2a715c240 ("phy: Add Tegra XUSB pad controller support") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20250303072739.3874987-1-make24@iscas.ac.cn Signed-off-by: Vinod Koul <vkoul@kernel.org>
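The rule the commit describes reduces to the following pattern (the port variable is illustrative):

	err = device_add(&port->dev);
	if (err) {
		/* device_add() failed: only drop the reference, never device_del() */
		put_device(&port->dev);
		return err;
	}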
2025-05-14 | phy: renesas: rcar-gen3-usb2: Set timing registers only once | Claudiu Beznea
phy-rcar-gen3-usb2 driver exports 4 PHYs. The timing registers are common to all PHYs. There is no need to set them every time a PHY is initialized. Set timing register only when the 1st PHY is initialized. Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver") Cc: stable@vger.kernel.org Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Link: https://lore.kernel.org/r/20250507125032.565017-6-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | phy: renesas: rcar-gen3-usb2: Assert PLL reset on PHY power off | Claudiu Beznea
Assert PLL reset on PHY power off. This saves power. Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver") Cc: stable@vger.kernel.org Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Link: https://lore.kernel.org/r/20250507125032.565017-5-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | phy: renesas: rcar-gen3-usb2: Lock around hardware registers and driver data | Claudiu Beznea
The phy-rcar-gen3-usb2 driver exposes four individual PHYs that are requested and configured by PHY users. The struct phy_ops APIs access the same set of registers to configure all PHYs. Additionally, PHY settings can be modified through sysfs or an IRQ handler. While some struct phy_ops APIs are protected by a driver-wide mutex, others rely on individual PHY-specific mutexes. This approach can lead to various issues, including: 1/ the IRQ handler may interrupt PHY settings in progress, racing with hardware configuration protected by a mutex lock 2/ due to msleep(20) in rcar_gen3_init_otg(), while a configuration thread suspends to wait for the delay, another thread may try to configure another PHY (with phy_init() + phy_power_on()); re-running the phy_init() goes to the exact same configuration code, re-running the same hardware configuration on the same set of registers (and bits) which might impact the result of the msleep for the 1st configuring thread 3/ sysfs can configure the hardware (through role_store()) and it can still race with the phy_init()/phy_power_on() APIs calling into the driver's struct phy_ops To address these issues, add a spinlock to protect hardware register access and driver private data structures (e.g., calls to rcar_gen3_is_any_rphy_initialized()). Checking driver-specific data remains necessary as all PHY instances share common settings. With this change, the existing mutex protection is removed and the cleanup.h helpers are used. While at it, to keep the code simpler, do not skip regulator_enable()/regulator_disable() APIs in rcar_gen3_phy_usb2_power_on()/rcar_gen3_phy_usb2_power_off() as the regulators enable/disable operations are reference counted anyway. Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver") Cc: stable@vger.kernel.org Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Link: https://lore.kernel.org/r/20250507125032.565017-4-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
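A minimal sketch of the locking scheme described above: one spinlock guards both the shared PHY registers and the driver data, in the configuration paths and in the IRQ handler alike, using the cleanup.h guards (names are illustrative, not the actual driver symbols):

	struct example_phy_drv {
		spinlock_t lock;	/* guards shared registers and driver data */
		void __iomem *base;
	};

	static irqreturn_t example_phy_irq(int irq, void *dev_id)
	{
		struct example_phy_drv *drv = dev_id;

		guard(spinlock_irqsave)(&drv->lock);
		/* ... read status and update the shared registers/driver data ... */
		return IRQ_HANDLED;
	}

	static int example_phy_init(struct phy *p)
	{
		struct example_phy_drv *drv = phy_get_drvdata(p);

		scoped_guard(spinlock_irqsave, &drv->lock) {
			/* ... apply the common channel/timing configuration ... */
		}
		return 0;
	}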
2025-05-14 | phy: renesas: rcar-gen3-usb2: Move IRQ request in probe | Claudiu Beznea
Commit 08b0ad375ca6 ("phy: renesas: rcar-gen3-usb2: move IRQ registration to init") moved the IRQ request operation from probe to struct phy_ops::phy_init API to avoid triggering interrupts (which lead to register accesses) while the PHY clocks (enabled through runtime PM APIs) are not active. If this happens, it results in a synchronous abort. One way to reproduce this issue is by enabling CONFIG_DEBUG_SHIRQ, which calls free_irq() on driver removal. Move the IRQ request and free operations back to probe, and take the runtime PM state into account in IRQ handler. This commit is preparatory for the subsequent fixes in this series. Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Link: https://lore.kernel.org/r/20250507125032.565017-3-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | phy: renesas: rcar-gen3-usb2: Fix role detection on unbind/bind | Claudiu Beznea
It has been observed on the Renesas RZ/G3S SoC that unbinding and binding the PHY driver leads to role autodetection failures. This issue occurs when PHY 3 is the first initialized PHY. PHY 3 does not have an interrupt associated with the USB2_INT_ENABLE register (as rcar_gen3_int_enable[3] = 0). As a result, rcar_gen3_init_otg() is called to initialize OTG without enabling PHY interrupts. To resolve this, add rcar_gen3_is_any_otg_rphy_initialized() and call it in role_store(), role_show(), and rcar_gen3_init_otg(). At the same time, rcar_gen3_init_otg() is only called when initialization for a PHY with interrupt bits is in progress. As a result, the struct rcar_gen3_phy::otg_initialized is no longer needed. Fixes: 549b6b55b005 ("phy: renesas: rcar-gen3-usb2: enable/disable independent irqs") Cc: stable@vger.kernel.org Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Link: https://lore.kernel.org/r/20250507125032.565017-2-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | phy: tegra: xusb: remove a stray unlock | Dan Carpenter
We used to take a lock in tegra186_utmi_bias_pad_power_on() but now we have moved the lock into the caller. Unfortunately, when we moved the lock this unlock was left behind and it results in a double unlock. Delete it now. Fixes: b47158fb4295 ("phy: tegra: xusb: Use a bitmask for UTMI pad power state tracking") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/r/aAjmR6To4EnvRl4G@stanley.mountain Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2() | Wentao Liang
The function snd_es1968_capture_open() calls the function snd_pcm_hw_constraint_pow2(), but does not check its return value. A proper implementation can be found in snd_cx25821_pcm_open(). Add error handling for snd_pcm_hw_constraint_pow2() and propagate its error code. Fixes: b942cf815b57 ("[ALSA] es1968 - Fix stuttering capture") Cc: stable@vger.kernel.org # v2.6.22 Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Link: https://patch.msgid.link/20250514092444.331-1-vulab@iscas.ac.cn Signed-off-by: Takashi Iwai <tiwai@suse.de>
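The fix boils down to checking and propagating the helper's return value, roughly as follows (the hw_param argument shown here is only an illustration; the existing call site determines the real one):

	err = snd_pcm_hw_constraint_pow2(substream->runtime, 0,
					 SNDRV_PCM_HW_PARAM_BUFFER_BYTES);
	if (err < 0)
		return err;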
2025-05-14 | octeontx2-pf: Fix ethtool support for SDP representors | Hariprasad Kelam
The hardware supports multiple MAC types, including RPM, SDP, and LBK. However, features such as link settings and pause frames are only available on RPM MAC, and not supported on SDP or LBK. This patch updates the ethtool operations logic accordingly to reflect this behavior. Fixes: 2f7f33a09516 ("octeontx2-pf: Add representors for sdp MAC") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-14 | regulator: max20086: fix invalid memory access | Cosmin Tanislav
max20086_parse_regulators_dt() calls of_regulator_match() using an array of struct of_regulator_match allocated on the stack for the matches argument. of_regulator_match() calls devm_of_regulator_put_matches(), which calls devres_alloc() to allocate a struct devm_of_regulator_matches which will be de-allocated using devm_of_regulator_put_matches(). struct devm_of_regulator_matches is populated with the stack allocated matches array. If the device fails to probe, devm_of_regulator_put_matches() will be called and will try to call of_node_put() on that stack pointer, generating the following dmesg entries: max20086 6-0028: Failed to read DEVICE_ID reg: -121 kobject: '\xc0$\xa5\x03' (000000002cebcb7a): is not initialized, yet kobject_put() is being called. Followed by a stack trace matching the call flow described above. Switch to allocating the matches array using devm_kcalloc() to avoid accessing the stack pointer long after it's out of scope. This also has the advantage of allowing multiple max20086 to probe without overriding the data stored inside the global of_regulator_match. Fixes: bfff546aae50 ("regulator: Add MAX20086-MAX20089 driver") Signed-off-by: Cosmin Tanislav <demonsingur@gmail.com> Link: https://patch.msgid.link/20250508064947.2567255-1-demonsingur@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
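The switch to device-managed allocation follows the usual devm pattern; a sketch with an assumed num_outputs count variable:

	struct of_regulator_match *matches;

	matches = devm_kcalloc(dev, num_outputs, sizeof(*matches), GFP_KERNEL);
	if (!matches)
		return -ENOMEM;
	/*
	 * matches[] is now heap-allocated and tied to the device lifetime,
	 * so the devres release callback can still safely reference it.
	 */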
2025-05-14 | spi: spi-sun4i: fix early activation | Alessandro Grassi
The SPI interface is activated before the CPOL setting is applied. In that moment, the clock idles high and CS goes low. After a short delay, CPOL and other settings are applied, which may cause the clock to change state and idle low. This transition is not part of a clock cycle, and it can confuse the receiving device. To prevent this unexpected transition, activate the interface while CPOL and the other settings are being applied. Signed-off-by: Alessandro Grassi <alessandro.grassi@mailbox.org> Link: https://patch.msgid.link/20250502095520.13825-1-alessandro.grassi@mailbox.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-14 | phy: phy-rockchip-samsung-hdptx: Fix PHY PLL output 50.25MHz error | Algea Cao
When using the HDMI PLL frequency division coefficients for 50.25MHz that are calculated by rk_hdptx_phy_clk_pll_calc(), the PHY fails to get LANE lock, even though the calculated values are within the allowable range of the PHY PLL configuration. In order to fix the PHY LANE lock error and provide the expected 50.25MHz output, manually compute the required PHY PLL frequency division coefficients and add them to the ropll_tmds_cfg configuration table. Signed-off-by: Algea Cao <algea.cao@rock-chips.com> Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Acked-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250427095124.3354439-1-algea.cao@rock-chips.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | net: enetc: fix implicit declaration of function FIELD_PREP | Wei Fang
The kernel test robot reported the following error: drivers/net/ethernet/freescale/enetc/ntmp.c: In function 'ntmp_fill_request_hdr': drivers/net/ethernet/freescale/enetc/ntmp.c:203:38: error: implicit declaration of function 'FIELD_PREP' [-Wimplicit-function-declaration] 203 | cbd->req_hdr.access_method = FIELD_PREP(NTMP_ACCESS_METHOD, | ^~~~~~~~~~ Therefore, add "bitfield.h" to ntmp_private.h to fix this issue. Fixes: 4701073c3deb ("net: enetc: add initial netc-lib driver to support NTMP") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505101047.NTMcerZE-lkp@intel.com/ Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
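FIELD_PREP() is declared in linux/bitfield.h, so the fix is the one-line include; usage stays as in the error message (the method value below is only a placeholder):

	#include <linux/bitfield.h>

	/* FIELD_PREP() shifts 'method' into the bit positions selected by the mask */
	cbd->req_hdr.access_method = FIELD_PREP(NTMP_ACCESS_METHOD, method);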
2025-05-14 | net: wangxun: Correct clerical errors in comments | Jiawen Wu
There are wrong "#endif" comments in .h files that need to be corrected. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-14 | phy: starfive: jh7110-usb: Fix USB 2.0 host occasional detection failure | Hal Feng
JH7110 USB 2.0 host fails to detect USB 2.0 devices occasionally. After a long time of debugging and testing, we found that setting the Rx clock gating control signal to normal power consumption mode solves this problem. Signed-off-by: Hal Feng <hal.feng@starfivetech.com> Link: https://lore.kernel.org/r/20250422101244.51686-1-hal.feng@starfivetech.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-14 | nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing | Alan Adamson
A change to QEMU resulted in all nvme controllers (single and multi-controller subsystems) having their CMIC.MCTRS bit set, which indicates the subsystem supports multiple controllers and that a namespace can be shared between those multiple controllers in a multipath configuration. When a namespace of a CMIC.MCTRS enabled subsystem is allocated, a multipath node is created. The queue limits for this node are inherited from the namespace being allocated. When inheriting queue limits, the features being inherited need to be specified. The atomic write feature (BLK_FEAT_ATOMIC_WRITES) was not specified so the atomic queue limits were not inherited by the multipath disk node which resulted in the sysfs atomic write attributes being zeroed. The fix is to include BLK_FEAT_ATOMIC_WRITES in the list of features to be inherited. Signed-off-by: Alan Adamson <alan.adamson@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-14 | xfrm: Sanitize marks before insert | Paul Chaignon
Prior to this patch, the mark is sanitized (applying the state's mask to the state's value) only on inserts when checking if a conflicting XFRM state or policy exists. We discovered in Cilium that this same sanitization does not occur in the hot-path __xfrm_state_lookup. In the hot-path, the sk_buff's mark is simply compared to the state's value: if ((mark & x->mark.m) != x->mark.v) continue; Therefore, users can define unsanitized marks (ex. 0xf42/0xf00) which will never match any packet. This commit updates __xfrm_state_insert and xfrm_policy_insert to store the sanitized marks, thus removing this footgun. This has the side effect of changing the ip output, as the returned mark will have the mask applied to it when printed. Fixes: 3d6acfa7641f ("xfrm: SA lookups with mark") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com> Co-developed-by: Louis DeLosSantos <louis.delos.devel@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
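The sanitization amounts to storing the masked value at insert time, e.g. (a sketch; the real patch touches both the state and the policy insert paths):

	/* store only bits that can ever match, so lookups compare like with like */
	x->mark.v = mark->v & mark->m;
	x->mark.m = mark->m;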
2025-05-13 | qlcnic: fix memory leak in qlcnic_sriov_channel_cfg_cmd() | Abdun Nihaal
In one of the error paths in qlcnic_sriov_channel_cfg_cmd(), the memory allocated in qlcnic_sriov_alloc_bc_mbx_args() for mailbox arguments is not freed. Fix that by jumping to the error path that frees them, by calling qlcnic_free_mbx_args(). This was found using static analysis. Fixes: f197a7aa6288 ("qlcnic: VF-PF communication channel implementation") Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250512044829.36400-1-abdun.nihaal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net: phy: remove stub for mdiobus_register_board_info | Heiner Kallweit
The functionality of mdiobus_register_board_info() typically isn't optional for the caller. Therefore remove the stub. Note: Currently we have only one caller of mdiobus_register_board_info(), in a DSA/PHYLINK context. Therefore CONFIG_MDIO_DEVICE is selected anyway. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/410a2222-c4e8-45b0-9091-d49674caeb00@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net: mlxsw: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() | Vladimir Oltean
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the mlxsw driver to the new API, so that the ndo_eth_ioctl() path can be removed completely. The UAPI is still ioctl-only, but it's best to remove the "ioctl" mentions from the driver in case a netlink variant appears. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250512154411.848614-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
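For reference, the shape of the new API that converted drivers converge on looks roughly like this (function names are illustrative):

	static int example_hwtstamp_get(struct net_device *ndev,
					struct kernel_hwtstamp_config *config)
	{
		/* report the currently programmed timestamping configuration */
		return 0;
	}

	static int example_hwtstamp_set(struct net_device *ndev,
					struct kernel_hwtstamp_config *config,
					struct netlink_ext_ack *extack)
	{
		/* program the hardware from *config; reject unsupported modes */
		return 0;
	}

	static const struct net_device_ops example_netdev_ops = {
		.ndo_hwtstamp_get	= example_hwtstamp_get,
		.ndo_hwtstamp_set	= example_hwtstamp_set,
	};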
2025-05-13 | net: ipa: Make the SMEM item ID constant | Konrad Dybcio
It can't vary, stop storing the same magic number everywhere. Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Alex Elder <elder@kernel.org> Link: https://patch.msgid.link/20250512-topic-ipa_smem-v1-1-302679514a0d@oss.qualcomm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | docs: networking: timestamping: improve stacked PHC sentence | Vladimir Oltean
The first paragraph makes no grammatical sense. I suppose a portion of the intended sentence is missing: "[The challenge with ] stacked PHCs (...) is that they uncover bugs". Rephrase, and at the same time simplify the structure of the sentence a little bit, as it is not easy to follow. Fixes: 94d9f78f4d64 ("docs: networking: timestamping: add section for stacked PHC devices") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://patch.msgid.link/20250512131751.320283-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net: enetc: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() | Vladimir Oltean
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the ENETC driver to the new API, so that the ndo_eth_ioctl() path can be removed completely. Move the enetc_hwtstamp_get() and enetc_hwtstamp_set() calls away from enetc_ioctl() to dedicated net_device_ops for the LS1028A PF and VF (NETC v4 does not yet implement enetc_ioctl()), adapt the prototypes and export these symbols (enetc_ioctl() is also exported). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250512112402.4100618-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net: txgbe: Fix pending interrupt | Jiawen Wu
For unknown reasons, the value of the MISC interrupt is sometimes 0 in the IRQ handler function. In this case, wx_intr_enable() should also be invoked to clear the interrupt. Otherwise, the next interrupt would never be reported. Fixes: a9843689e2de ("net: txgbe: add sriov function support") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/F4F708403CE7090B+20250512100652.139510-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5e: Disable MACsec offload for uplink representor profile | Carolina Jubran
MACsec offload is not supported in switchdev mode for uplink representors. When switching to the uplink representor profile, the MACsec offload feature must be cleared from the netdevice's features. If left enabled, attempts to add offloads result in a null pointer dereference, as the uplink representor does not support MACsec offload even though the feature bit remains set. Clear NETIF_F_HW_MACSEC in mlx5e_fix_uplink_rep_features(). Kernel log: Oops: general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f] CPU: 29 UID: 0 PID: 4714 Comm: ip Not tainted 6.14.0-rc4_for_upstream_debug_2025_03_02_17_35 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mutex_lock+0x128/0x1dd0 Code: d0 7c 08 84 d2 0f 85 ad 15 00 00 8b 35 91 5c fe 03 85 f6 75 29 49 8d 7e 60 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 a6 15 00 00 4d 3b 76 60 0f 85 fd 0b 00 00 65 ff RSP: 0018:ffff888147a4f160 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000078 RBP: ffff888147a4f2e0 R08: ffffffffa05d2c19 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000018 R15: ffff888152de0000 FS: 00007f855e27d800(0000) GS:ffff88881ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004e5768 CR3: 000000013ae7c005 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 Call Trace: <TASK> ? die_addr+0x3d/0xa0 ? exc_general_protection+0x144/0x220 ? asm_exc_general_protection+0x22/0x30 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? __mutex_lock+0x128/0x1dd0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mutex_lock_io_nested+0x1ae0/0x1ae0 ? lock_acquire+0x1c2/0x530 ? macsec_upd_offload+0x145/0x380 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? kasan_save_stack+0x30/0x40 ? kasan_save_stack+0x20/0x40 ? kasan_save_track+0x10/0x30 ? __kasan_kmalloc+0x77/0x90 ? __kmalloc_noprof+0x249/0x6b0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0xb5/0x240 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mlx5e_macsec_add_rxsa+0x11a0/0x11a0 [mlx5_core] macsec_update_offload+0x26c/0x820 ? macsec_set_mac_address+0x4b0/0x4b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? _raw_spin_unlock_irqrestore+0x47/0x50 macsec_upd_offload+0x2c8/0x380 ? macsec_update_offload+0x820/0x820 ? __nla_parse+0x22/0x30 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x15e/0x240 genl_family_rcv_msg_doit+0x1cc/0x2a0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? cap_capable+0xd4/0x330 genl_rcv_msg+0x3ea/0x670 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? macsec_update_offload+0x820/0x820 netlink_rcv_skb+0x12b/0x390 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? netlink_ack+0xd80/0xd80 ? rwsem_down_read_slowpath+0xf90/0xf90 ? netlink_deliver_tap+0xcd/0xac0 ? netlink_deliver_tap+0x155/0xac0 ? _copy_from_iter+0x1bb/0x12c0 genl_rcv+0x24/0x40 netlink_unicast+0x440/0x700 ? netlink_attachskb+0x760/0x760 ? lock_acquire+0x1c2/0x530 ? __might_fault+0xbb/0x170 netlink_sendmsg+0x749/0xc10 ? netlink_unicast+0x700/0x700 ? __might_fault+0xbb/0x170 ? 
netlink_unicast+0x700/0x700 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x53f/0x760 ? import_iovec+0x7/0x10 ? kernel_sendmsg+0x30/0x30 ? __copy_msghdr+0x3c0/0x3c0 ? filter_irq_stacks+0x90/0x90 ? stack_depot_save_flags+0x28/0xa30 ___sys_sendmsg+0xeb/0x170 ? kasan_save_stack+0x30/0x40 ? copy_msghdr_from_user+0x110/0x110 ? do_syscall_64+0x6d/0x140 ? lock_acquire+0x1c2/0x530 ? __virt_addr_valid+0x116/0x3b0 ? __virt_addr_valid+0x1da/0x3b0 ? lock_downgrade+0x680/0x680 ? __delete_object+0x21/0x50 __sys_sendmsg+0xf7/0x180 ? __sys_sendmsg_sock+0x20/0x20 ? kmem_cache_free+0x14c/0x4e0 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f855e113367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffd15e90c88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f855e113367 RDX: 0000000000000000 RSI: 00007ffd15e90cf0 RDI: 0000000000000004 RBP: 00007ffd15e90dbc R08: 0000000000000028 R09: 000000000045d100 R10: 00007f855e011dd8 R11: 0000000000000246 R12: 0000000000000019 R13: 0000000067c6b785 R14: 00000000004a1e80 R15: 0000000000000000 </TASK> Modules linked in: 8021q garp mrp sch_ingress openvswitch nsh mlx5_ib mlx5_fwctl mlx5_dpll mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1746958552-561295-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | Merge branch 'net-mlx5-hws-complex-matchers-and-rehash-mechanism-fixes' | Jakub Kicinski
Tariq Toukan says: ==================== net/mlx5: HWS, Complex Matchers and rehash mechanism fixes Motivation: ---------- A matcher can match a certain set of match parameters. However, the number and size of match params for a single matcher are limited — all the parameters must fit within a single definer. A common example of this limitation is IPv6 address matching, where matching both source and destination IPs requires more bits than a single definer can support. SW Steering addresses this limitation by chaining multiple Steering Table Entries (STEs) within the same matcher, where each STE matches on a subset of the parameters. In HW Steering, such chaining is not possible — the matcher's STEs are managed in a hash table, and a single definer is used to calculate the hash index for STEs. Overview: -------- To address this limitation in HW Steering, we introduce *Complex Matchers*, which consist of two chained matchers. This allows matching on twice as many parameters. Complex Matchers are filled with *Complex Rules* — rules that are split into two parts and inserted into their respective matchers. The first half of the Complex Matcher is a regular matcher and points to the second half, which is an *Isolated Matcher*. An Isolated Matcher has its own isolated table and is accessible only by traffic coming from the first half of the Complex Matcher. This splitting of matchers/rules into multiple parts is transparent to users. It is hidden behind the BWC HWS API. It becomes visible only when dumping steering debug information, where the Complex Matcher appears as two separate matchers: one in the user-created table and another in its isolated table. Implementation Details: ---------------------- All user actions are performed on the second part of the rules only. The first part handles matching and applies two actions: modify header (set metadata, see details below) and go-to-table (directing traffic to the isolated table containing the isolated matcher). Rule updates (updating rule actions) are applied to the second part of the rule since user-provided actions are not executed in the first matcher. We use REG_C_6 metadata register to set and match on unique per-rule tag (see details below). Splitting rules into two parts introduces new challenges: 1. Invalid Combinations Consider two rules with different matching values: - Rule 1: A+B - Rule 2: C+D Let's split the rules into two parts as follows: |-----Complex Matcher-------| | | | 1st matcher 2nd matcher | | |---| |---| | | | A | | B | | | |---| -----> |---| | | | C | | D | | | |---| |---| | | | |---------------------------| Splitting these rules results in invalid combinations: A+D and C+B: any packet that matched on A will be forwarded to the 2nd matcher, where it will try to match on B (which is legal, and it is what the user asked for), but it will also try to match on D (which is not what the user asked for). To resolve this, we assign unique tags to each rule on the first matcher and match on these tags on the second matcher: |----------| |---------| | A | | B, TagA | | action: | | | | set TagA | | | |----------| --> |---------| | C | | D, TagB | | action: | | | | set TagB | | | |----------| |---------| 2. 
Duplicated Entries: Consider two rules with overlapping values: - Rule 1: A+B - Rule 2: A+D Let's split the rules into two parts as follows: |---| |---| | A | | B | |---| --> |---| | | | D | |---| |---| This leads to the duplicated entries on the first matcher, which HWS doesn't allow: subsequent delete of either of the rules will delete the only entry in the first matcher, leaving the remaining rule broken. To address this, we use a reference count for entries in the first matcher and delete STEs only when their refcount reaches zero. Both challenges are resolved by having a per-matcher data structure (implemented with rhashtable) that manages refcounts for the first part of the rules and holds unique tags (managed via IDA) for these rules to set and to match on the second matcher. Limitations: ----------- We utilize metadata register REG_C_6 in this implementation, so its usage anywhere along the flow that might include the need for Complex Matcher is prohibited. The number and size of match parameters remain limited — now constrained by what can be represented by two definers instead of one. This architectural limitation arises from the structure of Complex Matchers. If future requirements demand more parameters, Complex Matchers can be extended beyond two matchers. Additionally, there is an implementation limit of 32 match parameters per matcher (disregarding parameter size). This limit can be lifted if needed. Patches: ------- - Patches 1-3: small additions/refactoring in preparation for Complex Matcher: exposed mlx5hws_table_ft_set_next_ft() in header, added definer function to convert field name enum to string, expose the polling function mlx5hws_bwc_queue_poll() in a header. - Patch 4: in preparation for Complex Matcher, this patch adds support for Isolated Matcher. - Patch 5: the main patch - Complex Matchers implementation. [2] Patch 6: fixing the usecase where rule insertion was failing, but rehash couldn't be initiated if the number of rules in the table is below the rehash threshold. Patch 7: fixing the usecase where many rules in parallel would require rehash, due to the way the counting of rules was done. Patch 8: fixing the case where rules were requiring action template extension in parallel, leading to unneeded extensions with the same templates. Patch 9: refactor and simplify the rehash loop. Patch 10: dump error completion details, which helps a lot in trying to understand what went wrong, especially during rehash. ==================== Link: https://patch.msgid.link/1746992290-568936-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5: HWS, dump bad completion details | Yevgeny Kliteynik
Failing to insert/delete a rule should not happen. If it does happen, it would be good to know at which stage it happened and what was the failure. This patch adds printing of bad CQE details. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-11-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5: HWS, rework rehash loop | Yevgeny Kliteynik
Reworking the rehash loop - simplifying the code and making it less error prone: - Instead of doing round-robin on all the queues with a batch of rules in each cycle, just go over all the queues and move all the rules that belong to this queue. - If at some stage of moving the rule we get a failure (which should not happen), this can't be rolled back. So instead of aborting rehash and leaving the matcher in a broken state, allow the loop to continue: attempt to move the rest of the rules and delete the old matcher. A rule that failed to move to a new matcher will lose its match STE once the rehash is completed and the old matcher is deleted, so the rule won't match any traffic any more. This rule's packets will fall back to the steering pipeline w/o HW offload. Rehash procedure will return an error, which will cause the rule insertion to fail for the rule that started this whole rehash. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-10-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5: HWS, fix redundant extension of action templates | Yevgeny Kliteynik
When a rule is inserted into a matcher, we search for the suitable action template. If such template is not found, action template array is extended with the new template. However, when several threads are performing this in parallel, there is a race - we can end up with extending the action templates array with the same template. This patch is doing the following: - refactor the code to find action template index in rule create and update, have the common code in an auxiliary function - after locking all the queues, check again if the action template array still needs to be extended Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-9-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5: HWS, fix counting of rules in the matcher | Yevgeny Kliteynik
Currently the counter that counts number of rules in a matcher is increased only when rule insertion is completed. In a multi-threaded usecase this can lead to a scenario that many rules can be in process of insertion in the same matcher, while none of them has completed the insertion and the rule counter is not updated. This results in a rule insertion failure for many of them at first attempt, which leads to all of them requiring rehash and requiring locking of all the queue locks. This patch fixes the case by increasing the rule counter in the beginning of insertion process and decreasing in case of any failure. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-8-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5: HWS, force rehash when rule insertion failed | Yevgeny Kliteynik
Rules are inserted into hash table in accordance with their hash index. When a certain number of rules is reached, the table is rehashed: a bigger new table is allocated and all the rules are moved there. But sometimes a new rule can't be inserted into the hash table because its index is full, even though the number of rules in the table is well below the threshold. The hash function is not perfect, so such cases are not rare. When that happens, we want to do the same rehash, in order to increase the table size and lower the probability for such cases. This patch fixes the usecase where rule insertion was failing, but rehash couldn't be initiated due to low number of rules: it adds flag that denotes that rehash is required, even if the number of rules in the table is below the rehash threshold. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-7-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5: HWS, support complex matchers | Yevgeny Kliteynik
This patch adds support for Complex Matchers/Rules Overview: -------- A matcher can match on a certain set of match parameters. However, the number and size of match params for a single matcher are limited: all the parameters must fit within a single definer. A common example of this limitation is IPv6 address matching, where matching both source and destination IPs requires more bits than a single definer can support. SW Steering addresses this limitation by chaining multiple Steering Table Entries (STEs) within the same matcher, where each STE matches on a subset of the parameters. In HW Steering, such chaining is not possible — the matcher's STEs are managed in a hash table, and a single definer is used to calculate the hash index for STEs. To address this limitation in HW Steering, we introduce Complex Matchers, which consist of two chained matchers. This allows matching on twice as many parameters. Complex Matchers are filled with Complex Rules — rules that are split into two parts and inserted into their respective matchers. The first half of the Complex Matcher is a regular matcher and points to the second half, which is an Isolated Matcher. An Isolated Matcher has its own isolated table and is accessible only by traffic coming from the first half of the Complex Matcher. This splitting of matchers/rules into multiple parts is transparent to users. It is hidden under the BWC HWS API. It becomes visible only when dumping steering debug information, where the Complex Matcher appears as two separate matchers: one in the user-created table and another in its isolated table. Some implementation details: --------------------------- All user actions are performed on the second part of the rules only. The first part handles matching and applies two actions: modify header (set metadata, see details below) and go-to-table (directing traffic to the isolated table containing the isolated matcher). Rule updates (updating rule actions) are applied to the second part of the rule since user-provided actions are not executed in the first matcher. We use REG_C_6 metadata register to set and match on unique per-rule tag (see details below). Splitting rules into two parts introduces new challenges: 1. Invalid Combinations Consider two rules with different matching values: - Rule 1: A+B - Rule 2: C+D Let's split the rules into two parts as follows: |---| |---| | A | | B | |---| --> |---| | C | | D | |---| |---| Splitting these rules results in invalid combinations like A+D and C+B. To resolve this, we assign unique tags to each rule on the first matcher and match these tags on the second matcher (the tag is implemented through modify_hdr action that sets value to metadata register REG_C_6): |----------| |---------| | A | | B, TagA | | action: | | | | set TagA | | | |----------| --> |---------| | C | | D, TagB | | action: | | | | set TagB | | | |----------| |---------| 2. Duplicated Entries: Consider two rules with overlapping values: - Rule 1: A+B - Rule 2: A+D Let's split the rules into two parts as follows: |---| |---| | A | | B | |---| --> |---| | | | D | |---| |---| This leads to the duplicated entries on the first matcher, which HWS doesn't allow: subsequent delete of either of the rules will delete the only entry in the first matcher, leaving the remaining rule broken. To address this, we use a reference count for entries in the first matcher and delete STEs only when their refcount reaches zero. 
Both challenges are resolved by having a per-matcher data structure (implemented with rhashtable) that manages refcounts for the first part of the rules and holds unique tags (managed via IDA) for these rules to set and to match on the second matcher. Limitations: ----------- We utilize metadata register REG_C_6 in this implementation, so its usage anywhere along the steering of the flow that might include the need for Complex Matcher is prohibited. The number and size of match parameters remain limited — now it is constrained by what can be represented by two definers instead of one. This architectural limitation arises from the structure of Complex Matchers. If future requirements demand more parameters, Complex Matchers can be extended beyond two matchers. Additionally, there is an implementation limit of 32 match parameters per rule (disregarding parameter size). This limit can be lifted if needed. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5: HWS, introduce isolated matchers | Yevgeny Kliteynik
In preparation for complex matcher support, introduce the isolated matcher. Isolated matcher is a matcher that has its own isolated table. It is used as the second half of the complex matcher: when the rule is split into two parts (complex rule), then matching on the first part will send the packet to the isolated matcher that will try to match on the second part. In case of miss, the packet goes back to the matcher's end flow table. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 | net/mlx5: HWS, expose polling function in header file | Yevgeny Kliteynik
In preparation for complex matcher support, expose the function that polls the queue for completion (mlx5hws_bwc_queue_poll) in a header file, so that it can be used by the complex matcher code. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>