summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-03-13regulator: check that dummy regulator has been probed before using itChristian Eggers
Due to asynchronous driver probing there is a chance that the dummy regulator hasn't already been probed when first accessing it. Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers <ceggers@arri.de> Link: https://patch.msgid.link/20250313103051.32430-3-ceggers@arri.de Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-13net: mana: cleanup mana struct after debugfs_remove()Shradha Gupta
When on a MANA VM hibernation is triggered, as part of hibernate_snapshot(), mana_gd_suspend() and mana_gd_resume() are called. If during this mana_gd_resume(), a failure occurs with HWC creation, mana_port_debugfs pointer does not get reinitialized and ends up pointing to older, cleaned-up dentry. Further in the hibernation path, as part of power_down(), mana_gd_shutdown() is triggered. This call, unaware of the failures in resume, tries to cleanup the already cleaned up mana_port_debugfs value and hits the following bug: [ 191.359296] mana 7870:00:00.0: Shutdown was called [ 191.359918] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ 191.360584] #PF: supervisor write access in kernel mode [ 191.361125] #PF: error_code(0x0002) - not-present page [ 191.361727] PGD 1080ea067 P4D 0 [ 191.362172] Oops: Oops: 0002 [#1] SMP NOPTI [ 191.362606] CPU: 11 UID: 0 PID: 1674 Comm: bash Not tainted 6.14.0-rc5+ #2 [ 191.363292] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024 [ 191.364124] RIP: 0010:down_write+0x19/0x50 [ 191.364537] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 de cd ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 16 65 48 8b 05 88 24 4c 6a 48 89 43 08 48 8b 5d [ 191.365867] RSP: 0000:ff45fbe0c1c037b8 EFLAGS: 00010246 [ 191.366350] RAX: 0000000000000000 RBX: 0000000000000098 RCX: ffffff8100000000 [ 191.366951] RDX: 0000000000000001 RSI: 0000000000000064 RDI: 0000000000000098 [ 191.367600] RBP: ff45fbe0c1c037c0 R08: 0000000000000000 R09: 0000000000000001 [ 191.368225] R10: ff45fbe0d2b01000 R11: 0000000000000008 R12: 0000000000000000 [ 191.368874] R13: 000000000000000b R14: ff43dc27509d67c0 R15: 0000000000000020 [ 191.369549] FS: 00007dbc5001e740(0000) GS:ff43dc663f380000(0000) knlGS:0000000000000000 [ 191.370213] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.370830] CR2: 0000000000000098 CR3: 0000000168e8e002 CR4: 0000000000b73ef0 [ 191.371557] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 191.372192] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 191.372906] Call Trace: [ 191.373262] <TASK> [ 191.373621] ? show_regs+0x64/0x70 [ 191.374040] ? __die+0x24/0x70 [ 191.374468] ? page_fault_oops+0x290/0x5b0 [ 191.374875] ? do_user_addr_fault+0x448/0x800 [ 191.375357] ? exc_page_fault+0x7a/0x160 [ 191.375971] ? asm_exc_page_fault+0x27/0x30 [ 191.376416] ? down_write+0x19/0x50 [ 191.376832] ? down_write+0x12/0x50 [ 191.377232] simple_recursive_removal+0x4a/0x2a0 [ 191.377679] ? __pfx_remove_one+0x10/0x10 [ 191.378088] debugfs_remove+0x44/0x70 [ 191.378530] mana_detach+0x17c/0x4f0 [ 191.378950] ? __flush_work+0x1e2/0x3b0 [ 191.379362] ? __cond_resched+0x1a/0x50 [ 191.379787] mana_remove+0xf2/0x1a0 [ 191.380193] mana_gd_shutdown+0x3b/0x70 [ 191.380642] pci_device_shutdown+0x3a/0x80 [ 191.381063] device_shutdown+0x13e/0x230 [ 191.381480] kernel_power_off+0x35/0x80 [ 191.381890] hibernate+0x3c6/0x470 [ 191.382312] state_store+0xcb/0xd0 [ 191.382734] kobj_attr_store+0x12/0x30 [ 191.383211] sysfs_kf_write+0x3e/0x50 [ 191.383640] kernfs_fop_write_iter+0x140/0x1d0 [ 191.384106] vfs_write+0x271/0x440 [ 191.384521] ksys_write+0x72/0xf0 [ 191.384924] __x64_sys_write+0x19/0x20 [ 191.385313] x64_sys_call+0x2b0/0x20b0 [ 191.385736] do_syscall_64+0x79/0x150 [ 191.386146] ? __mod_memcg_lruvec_state+0xe7/0x240 [ 191.386676] ? __lruvec_stat_mod_folio+0x79/0xb0 [ 191.387124] ? __pfx_lru_add+0x10/0x10 [ 191.387515] ? queued_spin_unlock+0x9/0x10 [ 191.387937] ? do_anonymous_page+0x33c/0xa00 [ 191.388374] ? __handle_mm_fault+0xcf3/0x1210 [ 191.388805] ? __count_memcg_events+0xbe/0x180 [ 191.389235] ? handle_mm_fault+0xae/0x300 [ 191.389588] ? do_user_addr_fault+0x559/0x800 [ 191.390027] ? irqentry_exit_to_user_mode+0x43/0x230 [ 191.390525] ? irqentry_exit+0x1d/0x30 [ 191.390879] ? exc_page_fault+0x86/0x160 [ 191.391235] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 191.391745] RIP: 0033:0x7dbc4ff1c574 [ 191.392111] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 [ 191.393412] RSP: 002b:00007ffd95a23ab8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 191.393990] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007dbc4ff1c574 [ 191.394594] RDX: 0000000000000005 RSI: 00005a6eeadb0ce0 RDI: 0000000000000001 [ 191.395215] RBP: 00007ffd95a23ae0 R08: 00007dbc50003b20 R09: 0000000000000000 [ 191.395805] R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000005 [ 191.396404] R13: 00005a6eeadb0ce0 R14: 00007dbc500045c0 R15: 00007dbc50001ee0 [ 191.396987] </TASK> To fix this, we explicitly set such mana debugfs variables to NULL after debugfs_remove() is called. Fixes: 6607c17c6c5e ("net: mana: Enable debugfs files for MANA device") Cc: stable@vger.kernel.org Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1741688260-28922-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13net/mlx5e: Prevent bridge link show failure for non-eswitch-allowed devicesCarolina Jubran
mlx5_eswitch_get_vepa returns -EPERM if the device lacks eswitch_manager capability, blocking mlx5e_bridge_getlink from retrieving VEPA mode. Since mlx5e_bridge_getlink implements ndo_bridge_getlink, returning -EPERM causes bridge link show to fail instead of skipping devices without this capability. To avoid this, return -EOPNOTSUPP from mlx5e_bridge_getlink when mlx5_eswitch_get_vepa fails, ensuring the command continues processing other devices while ignoring those without the necessary capability. Fixes: 4b89251de024 ("net/mlx5: Support ndo bridge_setlink and getlink") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741644104-97767-7-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13net/mlx5: Bridge, fix the crash caused by LAG state checkJianbo Liu
When removing LAG device from bridge, NETDEV_CHANGEUPPER event is triggered. Driver finds the lower devices (PFs) to flush all the offloaded entries. And mlx5_lag_is_shared_fdb is checked, it returns false if one of PF is unloaded. In such case, mlx5_esw_bridge_lag_rep_get() and its caller return NULL, instead of the alive PF, and the flush is skipped. Besides, the bridge fdb entry's lastuse is updated in mlx5 bridge event handler. But this SWITCHDEV_FDB_ADD_TO_BRIDGE event can be ignored in this case because the upper interface for bond is deleted, and the entry will never be aged because lastuse is never updated. To make things worse, as the entry is alive, mlx5 bridge workqueue keeps sending that event, which is then handled by kernel bridge notifier. It causes the following crash when accessing the passed bond netdev which is already destroyed. To fix this issue, remove such checks. LAG state is already checked in commit 15f8f168952f ("net/mlx5: Bridge, verify LAG state when adding bond to bridge"), driver still need to skip offload if LAG becomes invalid state after initialization. Oops: stack segment: 0000 [#1] SMP CPU: 3 UID: 0 PID: 23695 Comm: kworker/u40:3 Tainted: G OE 6.11.0_mlnx #1 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_bridge_wq mlx5_esw_bridge_update_work [mlx5_core] RIP: 0010:br_switchdev_event+0x2c/0x110 [bridge] Code: 44 00 00 48 8b 02 48 f7 00 00 02 00 00 74 69 41 54 55 53 48 83 ec 08 48 8b a8 08 01 00 00 48 85 ed 74 4a 48 83 fe 02 48 89 d3 <4c> 8b 65 00 74 23 76 49 48 83 fe 05 74 7e 48 83 fe 06 75 2f 0f b7 RSP: 0018:ffffc900092cfda0 EFLAGS: 00010297 RAX: ffff888123bfe000 RBX: ffffc900092cfe08 RCX: 00000000ffffffff RDX: ffffc900092cfe08 RSI: 0000000000000001 RDI: ffffffffa0c585f0 RBP: 6669746f6e690a30 R08: 0000000000000000 R09: ffff888123ae92c8 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888123ae9c60 R13: 0000000000000001 R14: ffffc900092cfe08 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f15914c8734 CR3: 0000000002830005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x1a/0x60 ? die+0x38/0x60 ? do_trap+0x10b/0x120 ? do_error_trap+0x64/0xa0 ? exc_stack_segment+0x33/0x50 ? asm_exc_stack_segment+0x22/0x30 ? br_switchdev_event+0x2c/0x110 [bridge] ? sched_balance_newidle.isra.149+0x248/0x390 notifier_call_chain+0x4b/0xa0 atomic_notifier_call_chain+0x16/0x20 mlx5_esw_bridge_update+0xec/0x170 [mlx5_core] mlx5_esw_bridge_update_work+0x19/0x40 [mlx5_core] process_scheduled_works+0x81/0x390 worker_thread+0x106/0x250 ? bh_worker+0x110/0x110 kthread+0xb7/0xe0 ? kthread_park+0x80/0x80 ret_from_fork+0x2d/0x50 ? kthread_park+0x80/0x80 ret_from_fork_asm+0x11/0x20 </TASK> Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741644104-97767-6-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13net/mlx5: Lag, Check shared fdb before creating MultiPort E-SwitchShay Drory
Currently, MultiPort E-Switch is requesting to create a LAG with shared FDB without checking the LAG is supporting shared FDB. Add the check. Fixes: a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741644104-97767-5-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13net/mlx5: Fix incorrect IRQ pool usage when releasing IRQsShay Drory
mlx5_irq_pool_get() is a getter for completion IRQ pool only. However, after the cited commit, mlx5_irq_pool_get() is called during ctrl IRQ release flow to retrieve the pool, resulting in the use of an incorrect IRQ pool. Hence, use the newly introduced mlx5_irq_get_pool() getter to retrieve the correct IRQ pool based on the IRQ itself. While at it, rename mlx5_irq_pool_get() to mlx5_irq_table_get_comp_irq_pool() which accurately reflects its purpose and improves code readability. Fixes: 0477d5168bbb ("net/mlx5: Expose SFs IRQs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741644104-97767-4-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13net/mlx5: HWS, Rightsize bwc matcher priorityVlad Dogaru
The bwc layer was clamping the matcher priority from 32 bits to 16 bits. This didn't show up until a matcher was resized, since the initial native matcher was created using the correct 32 bit value. The fix also reorders fields to avoid some padding. Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling") Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1741644104-97767-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13net/mlx5: DR, use the right action structs for STEv3Yevgeny Kliteynik
Some actions in ConnectX-8 (STEv3) have different structure, and they are handled separately in ste_ctx_v3. This separate handling was missing two actions: INSERT_HDR and REMOVE_HDR, which broke SWS for Linux Bridge. This patch resolves the issue by introducing dedicated callbacks for the insert and remove header functions, with version-specific implementations for each STE variant. Fixes: 4d617b57574f ("net/mlx5: DR, add support for ConnectX-8 steering") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1741644104-97767-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13reset: mchp: sparx5: Fix for lan966xHoratiu Vultur
With the blamed commit it seems that lan966x doesn't seem to boot anymore when the internal CPU is used. The reason seems to be the usage of the devm_of_iomap, if we replace this with devm_ioremap, this seems to fix the issue as we use the same region also for other devices. Fixes: 0426a920d6269c ("reset: mchp: sparx5: Map cpu-syscon locally in case of LAN966x") Reviewed-by: Herve Codina <herve.codina@bootlin.com> Tested-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://lore.kernel.org/r/20250227105502.25125-1-horatiu.vultur@microchip.com Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-03-13drm/sched: Fix fence reference count leakqianyi liu
The last_scheduled fence leaks when an entity is being killed and adding the cleanup callback fails. Decrement the reference count of prev when dma_fence_add_callback() fails, ensuring proper balance. Cc: stable@vger.kernel.org # v6.2+ [phasta: add git tag info for stable kernel] Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Signed-off-by: qianyi liu <liuqianyi125@gmail.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250311060251.4041101-1-liuqianyi125@gmail.com
2025-03-13gpio: cdev: use raw notifier for line state eventsBartosz Golaszewski
We use a notifier to implement the mechanism of informing the user-space about changes in GPIO line status. We register with the notifier when the GPIO character device file is opened and unregister when the last reference to the associated file descriptor is dropped. Since commit fcc8b637c542 ("gpiolib: switch the line state notifier to atomic") we use the atomic notifier variant. Atomic notifiers call rcu_synchronize in atomic_notifier_chain_unregister() which caused a significant performance regression in some circumstances, observed by user-space when calling close() on the GPIO device file descriptor. Replace the atomic notifier with the raw variant and provide synchronization with a read-write spinlock. Fixes: fcc8b637c542 ("gpiolib: switch the line state notifier to atomic") Reported-by: David Jander <david@protonic.nl> Closes: https://lore.kernel.org/all/20250311110034.53959031@erd003.prtnl/ Tested-by: David Jander <david@protonic.nl> Tested-by: Kent Gibson <warthog618@gmail.com> Link: https://lore.kernel.org/r/20250311-gpiolib-line-state-raw-notifier-v2-1-138374581e1e@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-13gpiolib: don't check the retval of get_direction() when registering a chipBartosz Golaszewski
During chip registration we should neither check the return value of gc->get_direction() nor hold the SRCU lock when calling it. The former is because pin controllers may have pins set to alternate functions and return errors from their get_direction() callbacks. That's alright - we should default to the safe INPUT state and not bail-out. The latter is not needed because we haven't registered the chip yet so there's nothing to protect against dynamic removal. In fact: we currently hit a lockdep splat. Revert to calling the gc->get_direction() callback directly and *not* checking its value. Fixes: 9d846b1aebbe ("gpiolib: check the return value of gpio_chip::get_direction()") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/all/81f890fc-6688-42f0-9756-567efc8bb97a@samsung.com/ Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250226-retval-fixes-v2-1-c8dc57182441@linaro.org Tested-by: Gene C <arch@sapience.com> Link: https://lore.kernel.org/r/20250311175631.83779-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-12Merge tag 'spi-fix-v6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of driver specific fixes, an error handling fix for the Atmel QuadSPI driver and a fix for a nasty synchronisation issue in the data path for the Microchip driver which affects larger transfers. There's also a MAINTAINERS update for the Samsung driver" * tag 'spi-fix-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: microchip-core: prevent RX overflows when transmit size > FIFO size MAINTAINERS: add tambarus as R for Samsung SPI spi: atmel-quadspi: remove references to runtime PM on error path
2025-03-12drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flagsNatalie Vock
PRT BOs may not have any backing store, so bo->tbo.resource will be NULL. Check for that before dereferencing. Fixes: 0cce5f285d9a ("drm/amdkfd: Check correct memory types for is_system variable") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3e3fcd29b505cebed659311337ea03b7698767fc) Cc: stable@vger.kernel.org # 6.12.x
2025-03-12drm/amd/amdkfd: Evict all queues even HWS remove queue failedYifan Zha
[Why] If reset is detected and kfd need to evict working queues, HWS moving queue will be failed. Then remaining queues are not evicted and in active state. After reset done, kfd uses HWS to termination remaining activated queues but HWS is resetted. So remove queue will be failed again. [How] Keep removing all queues even if HWS returns failed. It will not affect cpsch as it checks reset_domain->sem. v2: If any queue failed, evict queue returns error. v3: Declare err inside the if-block. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 42c854b8fb0cce512534aa2b7141948e80c6ebb0) Cc: stable@vger.kernel.org
2025-03-12RDMA/hns: Fix wrong value of max_sge_rdJunxian Huang
There is no difference between the sge of READ and non-READ operations in hns RoCE. Set max_sge_rd to the same value as max_send_sge. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-12RDMA/hns: Fix missing xa_destroy()Junxian Huang
Add xa_destroy() for xarray in driver. Fixes: 5c1f167af112 ("RDMA/hns: Init SRQ table for hip08") Fixes: 27e19f451089 ("RDMA/hns: Convert cq_table to XArray") Fixes: 736b5a70db98 ("RDMA/hns: Convert qp_table_tree to XArray") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-12RDMA/hns: Fix a missing rollback in error path of hns_roce_create_qp_common()Junxian Huang
When ib_copy_to_udata() fails in hns_roce_create_qp_common(), hns_roce_qp_remove() should be called in the error path to clean up resources in hns_roce_qp_store(). Fixes: 0f00571f9433 ("RDMA/hns: Use new SQ doorbell register for HIP09") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-12RDMA/hns: Fix invalid sq params not being blockedJunxian Huang
SQ params from userspace are checked in by set_user_sq_size(). But when the check fails, the function doesn't return but instead keep running and overwrite 'ret'. As a result, the invalid params will not get blocked actually. Add a return right after the failed check. Besides, although the check result of kernel sq params will not be overwritten, to keep coding style unified, move default_congest_type() before set_kernel_sq_size(). Fixes: 6ec429d5887a ("RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-12RDMA/hns: Fix unmatched condition in error path of alloc_user_qp_db()Junxian Huang
Currently the condition of unmapping sdb in error path is not exactly the same as the condition of mapping in alloc_user_qp_db(). This may cause a problem of unmapping an unmapped db in some case, such as when the QP is XRC TGT. Unified the two conditions. Fixes: 90ae0b57e4a5 ("RDMA/hns: Combine enable flags of qp") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-12RDMA/hns: Fix soft lockup during bt pages loopJunxian Huang
Driver runs a for-loop when allocating bt pages and mapping them with buffer pages. When a large buffer (e.g. MR over 100GB) is being allocated, it may require a considerable loop count. This will lead to soft lockup: watchdog: BUG: soft lockup - CPU#27 stuck for 22s! ... Call trace: hem_list_alloc_mid_bt+0x124/0x394 [hns_roce_hw_v2] hns_roce_hem_list_request+0xf8/0x160 [hns_roce_hw_v2] hns_roce_mtr_create+0x2e4/0x360 [hns_roce_hw_v2] alloc_mr_pbl+0xd4/0x17c [hns_roce_hw_v2] hns_roce_reg_user_mr+0xf8/0x190 [hns_roce_hw_v2] ib_uverbs_reg_mr+0x118/0x290 watchdog: BUG: soft lockup - CPU#35 stuck for 23s! ... Call trace: hns_roce_hem_list_find_mtt+0x7c/0xb0 [hns_roce_hw_v2] mtr_map_bufs+0xc4/0x204 [hns_roce_hw_v2] hns_roce_mtr_create+0x31c/0x3c4 [hns_roce_hw_v2] alloc_mr_pbl+0xb0/0x160 [hns_roce_hw_v2] hns_roce_reg_user_mr+0x108/0x1c0 [hns_roce_hw_v2] ib_uverbs_reg_mr+0x120/0x2bc Add a cond_resched() to fix soft lockup during these loops. In order not to affect the allocation performance of normal-size buffer, set the loop count of a 100GB MR as the threshold to call cond_resched(). Fixes: 38389eaa4db1 ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-12USB: serial: ftdi_sio: add support for Altera USB Blaster 3Boon Khai Ng
The Altera USB Blaster 3, available as both a cable and an on-board solution, is primarily used for programming and debugging FPGAs. It interfaces with host software such as Quartus Programmer, System Console, SignalTap, and Nios Debugger. The device utilizes either an FT2232 or FT4232 chip. Enabling the support for various configurations of the on-board USB Blaster 3 by including the appropriate VID/PID pairs, allowing it to function as a serial device via ftdi_sio. Note that this check-in does not include support for the cable solution, as it does not support UART functionality. The supported configurations are determined by the hardware design and include: 1) PID 0x6022, FT2232, 1 JTAG port (Port A) + Port B as UART 2) PID 0x6025, FT4232, 1 JTAG port (Port A) + Port C as UART 3) PID 0x6026, FT4232, 1 JTAG port (Port A) + Port C, D as UART 4) PID 0x6029, FT4232, 1 JTAG port (Port B) + Port C as UART 5) PID 0x602a, FT4232, 1 JTAG port (Port B) + Port C, D as UART 6) PID 0x602c, FT4232, 1 JTAG port (Port A) + Port B as UART 7) PID 0x602d, FT4232, 1 JTAG port (Port A) + Port B, C as UART 8) PID 0x602e, FT4232, 1 JTAG port (Port A) + Port B, C, D as UART These configurations allow for flexibility in how the USB Blaster 3 is used, depending on the specific needs of the hardware design. Signed-off-by: Boon Khai Ng <boon.khai.ng@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2025-03-12media: rtl2832_sdr: assign vb2 lock before vb2_queue_initHans Verkuil
Commit c780d01cf1a6 ("media: vb2: vb2_core_queue_init(): sanity check lock and wait_prepare/finish") added a sanity check to ensure that if there are no wait_prepare/finish callbacks set by the driver, then the vb2_queue lock must be set, since otherwise the vb2 core cannot do correct locking. The rtl2832_sdr.c triggered this warning: it turns out that while the driver does set this lock, it sets it too late. So move it up to before the vb2_queue_init() call. Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Closes: https://lore.kernel.org/linux-media/20241211042355.8479-1-user@am64/ Fixes: 8fcd2795d22a ("media: rtl2832_sdr: drop vb2_ops_wait_prepare/finish") Cc: stable@vger.kernel.org Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-03-12block: change blk_mq_add_to_batch() third argument type to boolShin'ichiro Kawasaki
Commit 1f47ed294a2b ("block: cleanup and fix batch completion adding conditions") modified the evaluation criteria for the third argument, 'ioerror', in the blk_mq_add_to_batch() function. Initially, the function had checked if 'ioerror' equals zero. Following the commit, it started checking for negative error values, with the presumption that such values, for instance -EIO, would be passed in. However, blk_mq_add_to_batch() callers do not pass negative error values. Instead, they pass status codes defined in various ways: - NVMe PCI and Apple drivers pass NVMe status code - virtio_blk driver passes the virtblk request header status byte - null_blk driver passes blk_status_t These codes are either zero or positive, therefore the revised check fails to function as intended. Specifically, with the NVMe PCI driver, this modification led to the failure of the blktests test case nvme/039. In this test scenario, errors are artificially injected to the NVMe driver, resulting in positive NVMe status codes passed to blk_mq_add_to_batch(), which unexpectedly processes the failed I/O in a batch. Hence the failure. To correct the ioerror check within blk_mq_add_to_batch(), make all callers to uniformly pass the argument as boolean. Modify the callers to check their specific status codes and pass the boolean value 'is_error'. Also describe the arguments of blK_mq_add_to_batch as kerneldoc. Fixes: 1f47ed294a2b ("block: cleanup and fix batch completion adding conditions") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20250311104359.1767728-3-shinichiro.kawasaki@wdc.com [axboe: fold in documentation update] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-12Merge tag 'wireless-2025-03-12' of ↵David S. Miller
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes berg says: ==================== Few more fixes: - cfg80211/mac80211 - stop possible runaway wiphy worker - EHT should not use reserved MPDU size bits - don't run worker for stopped interfaces - fix SA Query processing with MLO - fix lookup of assoc link BSS entries - correct station flush on unauthorize - iwlwifi: - TSO fixes - fix non-MSI-X platforms - stop possible runaway restart worker - rejigger maintainers so I'm not CC'ed on everything ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-12RDMA/bnxt_re: Avoid clearing VLAN_ID mask in modify qp pathSaravanan Vajravel
Driver is always clearing the mask that sets the VLAN ID/Service Level in the adapter. Recent change for supporting multiple traffic class exposed this issue. Allow setting SL and VLAN_ID while QP is moved from INIT to RTR state. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Fixes: c64b16a37b6d ("RDMA/bnxt_re: Support different traffic class") Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1741670196-2919-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-12i2c: sis630: Fix an error handling path in sis630_probe()Christophe JAILLET
If i2c_add_adapter() fails, the request_region() call in sis630_setup() must be undone by a corresponding release_region() call, as done in the remove function. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/3d607601f2c38e896b10207963c6ab499ca5c307.1741033587.git.christophe.jaillet@wanadoo.fr Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-03-12i2c: ali15x3: Fix an error handling path in ali15x3_probe()Christophe JAILLET
If i2c_add_adapter() fails, the request_region() call in ali15x3_setup() must be undone by a corresponding release_region() call, as done in the remove function. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/9b2090cbcc02659f425188ea05f2e02745c4e67b.1741031878.git.christophe.jaillet@wanadoo.fr
2025-03-12i2c: ali1535: Fix an error handling path in ali1535_probe()Christophe JAILLET
If i2c_add_adapter() fails, the request_region() call in ali1535_setup() must be undone by a corresponding release_region() call, as done in the remove function. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/0daf63d7a2ce74c02e2664ba805bbfadab7d25e5.1741031571.git.christophe.jaillet@wanadoo.fr
2025-03-12i2c: omap: fix IRQ stormsAndreas Kemnade
On the GTA04A5 writing a reset command to the gyroscope causes IRQ storms because NACK IRQs are enabled and therefore triggered but not acked. Sending a reset command to the gyroscope by i2cset 1 0x69 0x14 0xb6 with an additional debug print in the ISR (not the thread) itself causes [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00 [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1 [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110) [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010) [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102) [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102) [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102) repeating till infinity [...] (0x2 = NACK, 0x100 = Bus free, which is not enabled) Apparently no other IRQ bit gets set, so this stalls. Do not ignore enabled interrupts and make sure they are acked. If the NACK IRQ is not needed, it should simply not enabled, but according to the above log, caring about it is necessary unless the Bus free IRQ is enabled and handled. The assumption that is will always come with a ARDY IRQ, which was the idea behind ignoring it, proves wrong. It is true for simple reads from an unused address. To still avoid the i2cdetect trouble which is the reason for commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings"), avoid doing much about NACK in omap_i2c_xfer_data() which is used by both IRQ mode and polling mode, so also the false detection fix is extended to polling usage and IRQ storms are avoided. By changing this, the hardirq handler is not needed anymore to filter stuff. The mentioned gyro reset now just causes a -ETIMEDOUT instead of hanging the system. Fixes: c770657bd261 ("i2c: omap: Fix standard mode false ACK readings"). CC: stable@kernel.org Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Aniket Limaye <a-limaye@ti.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20250228140420.379498-1-andreas@kemnade.info
2025-03-12mmc: atmel-mci: Add missing clk_disable_unprepare()Gu Bowen
The error path when atmci_configure_dma() set dma fails in atmci driver does not correctly disable the clock. Add the missing clk_disable_unprepare() to the error path for pair with clk_prepare_enable(). Fixes: 467e081d23e6 ("mmc: atmel-mci: use probe deferring if dma controller is not ready yet") Signed-off-by: Gu Bowen <gubowen5@huawei.com> Acked-by: Aubin Constans <aubin.constans@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250225022856.3452240-1-gubowen5@huawei.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-03-11Merge tag 'hyperv-fixes-signed-20250311' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Patches to fix Hyper-v framebuffer code (Michael Kelley and Saurabh Sengar) - Fix for Hyper-V output argument to hypercall that changes page visibility (Michael Kelley) - Fix for Hyper-V VTL mode (Naman Jain) * tag 'hyperv-fixes-signed-20250311' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Don't release fb_mmio resource in vmbus_free_mmio() x86/hyperv: Fix output argument to hypercall that changes page visibility fbdev: hyperv_fb: Allow graceful removal of framebuffer fbdev: hyperv_fb: Simplify hvfb_putmem fbdev: hyperv_fb: Fix hang in kdump kernel when on Hyper-V Gen 2 VMs drm/hyperv: Fix address space leak when Hyper-V DRM device is removed fbdev: hyperv_fb: iounmap() the correct memory when removing a device x86/hyperv/vtl: Stop kernel from probing VTL0 low memory
2025-03-11drm/i915: Increase I915_PARAM_MMAP_GTT_VERSION version to indicate support ↵José Roberto de Souza
for partial mmaps Commit 255fc1703e42 ("drm/i915/gem: Calculate object page offset for partial memory mapping") was the last patch of several patches fixing multiple partial mmaps. But without a bump in I915_PARAM_MMAP_GTT_VERSION there is no clean way for UMD to know if it can do multiple partial mmaps. Fixes: 255fc1703e42 ("drm/i915/gem: Calculate object page offset for partial memory mapping") Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306210827.171147-1-jose.souza@intel.com (cherry picked from commit bfef148f3680e6b9d28e7fca46d9520f80c5e50e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-11Merge tag 'samsung-clk-fixes-6.14' of ↵Stephen Boyd
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into clk-fixes Pull Samsung clk driver fixes from Krzysztof Kozlowski: - Google GS101: Fix synchronous external abort during system suspend. The driver access registers not available for OS, although issue would not be visible in earlier kernels due to missing suspend support. - Tesla FSD: Correct PLL142XX lock time * tag 'samsung-clk-fixes-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: clk: samsung: update PLL locktime for PLL142XX used on FSD platform clk: samsung: gs101: fix synchronous external abort in samsung_clk_save()
2025-03-11Merge tag 'pinctrl-v6.14-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Fix the regmap settings for bcm281xx, this was missing the stride - NULL check for the Nuvoton npcm8xx devm_kasprintf() - Enable the Spacemit pin controller by default in the SoC config. The SoC will not boot without it so this one is pretty much required * tag 'pinctrl-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: spacemit: enable config option pinctrl: nuvoton: npcm8xx: Add NULL check in npcm8xx_gpio_fw pinctrl: bcm281xx: Fix incorrect regmap max_registers value
2025-03-11platform/surface: aggregator_registry: Add Support for Surface Pro 11Lukas Hetzenecker
Add SAM client device nodes for the Surface Pro 11 (Intel). Like with the Surface Pro 10 already, the node group is compatible, so it can be reused. Signed-off-by: Lukas Hetzenecker <lukas@hetzenecker.me> Link: https://lore.kernel.org/r/20250310232803.23691-1-lukas@hetzenecker.me Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-03-11platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()Dan Carpenter
There are a few problems in this code: First, if amd_pmf_tee_init() fails then the function returns directly instead of cleaning up. We cannot simply do a "goto error;" because the amd_pmf_tee_init() cleanup calls tee_shm_free(dev->fw_shm_pool); and amd_pmf_tee_deinit() calls it as well leading to a double free. I have re-written this code to use an unwind ladder to free the allocations. Second, if amd_pmf_start_policy_engine() fails on every iteration though the loop then the code calls amd_pmf_tee_deinit() twice which is also a double free. Call amd_pmf_tee_deinit() inside the loop for each failed iteration. Also on that path the error codes are not necessarily negative kernel error codes. Set the error code to -EINVAL. There is a very subtle third bug which is that if the call to input_register_device() in amd_pmf_register_input_device() fails then we call input_unregister_device() on an input device that wasn't registered. This will lead to a reference counting underflow because of the device_del(&dev->dev) in __input_unregister_device(). It's unlikely that anyone would ever hit this bug in real life. Fixes: 376a8c2a1443 ("platform/x86/amd/pmf: Update PMF Driver for Compatibility with new PMF-TA") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/232231fc-6a71-495e-971b-be2a76f6db4c@stanley.mountain Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-03-11qlcnic: fix memory leak issues in qlcnic_sriov_common.cHaoxiang Li
Add qlcnic_sriov_free_vlans() in qlcnic_sriov_alloc_vlans() if any sriov_vlans fails to be allocated. Add qlcnic_sriov_free_vlans() to free the memory allocated by qlcnic_sriov_alloc_vlans() if "sriov->allowed_vlans" fails to be allocated. Fixes: 91b7282b613d ("qlcnic: Support VLAN id config.") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Link: https://patch.msgid.link/20250307094952.14874-1-haoxiang_li2024@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-11nvme: move error logging from nvme_end_req() to __nvme_end_req()Shin'ichiro Kawasaki
Before the Commit 1f47ed294a2b ("block: cleanup and fix batch completion adding conditions"), blk_mq_add_to_batch() did not add failed passthrough requests to batch, and returned false. After the commit, blk_mq_add_to_batch() always adds passthrough requests to batch regardless of whether the request failed or not, and returns true. This affected error logging feature in the NVME driver. Before the commit, the call chain of failed passthrough request was as follows: nvme_handle_cqe() blk_mq_add_to_batch() .. false is returned, then call nvme_pci_complete_rq() nvme_pci_complete_rq() nvme_complete_rq() nvme_end_req() nvme_log_err_passthru() .. error logging __nvme_end_req() .. end of the rqeuest After the commit, the call chain is as follows: nvme_handle_cqe() blk_mq_add_to_batch() .. true is returned, then set nvme_pci_complete_batch() .. nvme_pci_complete_batch() nvme_complete_batch() nvme_complete_batch_req() __nvme_end_req() .. end of the request, without error logging To make the error logging feature work again for passthrough requests, move the nvme_log_err_passthru() call from nvme_end_req() to __nvme_end_req(). While at it, move nvme_log_error() call for non-passthrough requests together with nvme_log_err_passthru(). Even though the trigger commit does not affect non-passthrough requests, move it together for code simplicity. Fixes: 1f47ed294a2b ("block: cleanup and fix batch completion adding conditions") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250311104359.1767728-2-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-11regulator: dummy: force synchronous probingChristian Eggers
Sometimes I get a NULL pointer dereference at boot time in kobject_get() with the following call stack: anatop_regulator_probe() devm_regulator_register() regulator_register() regulator_resolve_supply() kobject_get() By placing some extra BUG_ON() statements I could verify that this is raised because probing of the 'dummy' regulator driver is not completed ('dummy_regulator_rdev' is still NULL). In the JTAG debugger I can see that dummy_regulator_probe() and anatop_regulator_probe() can be run by different kernel threads (kworker/u4:*). I haven't further investigated whether this can be changed or if there are other possibilities to force synchronization between these two probe routines. On the other hand I don't expect much boot time penalty by probing the 'dummy' regulator synchronously. Cc: stable@vger.kernel.org Fixes: 259b93b21a9f ("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in 4.14") Signed-off-by: Christian Eggers <ceggers@arri.de> Link: https://patch.msgid.link/20250311091803.31026-1-ceggers@arri.de Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-11rtase: Fix improper release of ring list entries in rtase_sw_resetJustin Lai
Since rtase_init_ring, which is called within rtase_sw_reset, adds ring entries already present in the ring list back into the list, it causes the ring list to form a cycle. This results in list_for_each_entry_safe failing to find an endpoint during traversal, leading to an error. Therefore, it is necessary to remove the previously added ring_list nodes before calling rtase_init_ring. Fixes: 079600489960 ("rtase: Implement net_device_ops") Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250306070510.18129-1-justinlai0215@realtek.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-11bonding: fix incorrect MAC address setting to receive NS messagesHangbin Liu
When validation on the backup slave is enabled, we need to validate the Neighbor Solicitation (NS) messages received on the backup slave. To receive these messages, the correct destination MAC address must be added to the slave. However, the target in bonding is a unicast address, which we cannot use directly. Instead, we should first convert it to a Solicited-Node Multicast Address and then derive the corresponding MAC address. Fix the incorrect MAC address setting on both slave_set_ns_maddr() and slave_set_ns_maddrs(). Since the two function names are similar. Add some description for the functions. Also only use one mac_addr variable in slave_set_ns_maddr() to save some code and logic. Fixes: 8eb36164d1a6 ("bonding: add ns target multicast address to slave device") Acked-by: Jay Vosburgh <jv@jvosburgh.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250306023923.38777-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-11drm/dp_mst: Fix locking when skipping CSN before topology probingImre Deak
The handling of the MST Connection Status Notify message is skipped if the probing of the topology is still pending. Acquiring the drm_dp_mst_topology_mgr::probe_lock for this in drm_dp_mst_handle_up_req() is problematic: the task/work this function is called from is also responsible for handling MST down-request replies (in drm_dp_mst_handle_down_rep()). Thus drm_dp_mst_link_probe_work() - holding already probe_lock - could be blocked waiting for an MST down-request reply while drm_dp_mst_handle_up_req() is waiting for probe_lock while processing a CSN message. This leads to the probe work's down-request message timing out. A scenario similar to the above leading to a down-request timeout is handling a CSN message in drm_dp_mst_handle_conn_stat(), holding the probe_lock and sending down-request messages while a second CSN message sent by the sink subsequently is handled by drm_dp_mst_handle_up_req(). Fix the above by moving the logic to skip the CSN handling to drm_dp_mst_process_up_req(). This function is called from a work (separate from the task/work handling new up/down messages), already holding probe_lock. This solves the above timeout issue, since handling of down-request replies won't be blocked by probe_lock. Fixes: ddf983488c3e ("drm/dp_mst: Skip CSN if topology probing is not done yet") Cc: Wayne Lin <Wayne.Lin@amd.com> Cc: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org # v6.6+ Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307183152.3822170-1-imre.deak@intel.com
2025-03-10eth: bnxt: fix memory leak in queue resetTaehee Yoo
When the queue is reset, the bnxt_alloc_one_tpa_info() is called to allocate tpa_info for the new queue. And then the old queue's tpa_info should be removed by the bnxt_free_one_tpa_info(), but it is not called. So memory leak occurs. It adds the bnxt_free_one_tpa_info() in the bnxt_queue_mem_free(). unreferenced object 0xffff888293cc0000 (size 16384): comm "ncdevmem", pid 2076, jiffies 4296604081 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 40 75 78 93 82 88 ff ff ........@ux..... 40 75 78 93 02 00 00 00 00 00 00 00 00 00 00 00 @ux............. backtrace (crc 5d7d4798): ___kmalloc_large_node+0x10d/0x1b0 __kmalloc_large_node_noprof+0x17/0x60 __kmalloc_noprof+0x3f6/0x520 bnxt_alloc_one_tpa_info+0x5f/0x300 [bnxt_en] bnxt_queue_mem_alloc+0x8e8/0x14f0 [bnxt_en] netdev_rx_queue_restart+0x233/0x620 net_devmem_bind_dmabuf_to_queue+0x2a3/0x600 netdev_nl_bind_rx_doit+0xc00/0x10a0 genl_family_rcv_msg_doit+0x1d4/0x2b0 genl_rcv_msg+0x3fb/0x6c0 netlink_rcv_skb+0x12c/0x360 genl_rcv+0x24/0x40 netlink_unicast+0x447/0x710 netlink_sendmsg+0x712/0xbc0 __sys_sendto+0x3fd/0x4d0 __x64_sys_sendto+0xdc/0x1b0 Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250309134219.91670-7-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10eth: bnxt: fix kernel panic in the bnxt_get_queue_stats{rx | tx}Taehee Yoo
When qstats-get operation is executed, callbacks of netdev_stats_ops are called. The bnxt_get_queue_stats{rx | tx} collect per-queue stats from sw_stats in the rings. But {rx | tx | cp}_ring are allocated when the interface is up. So, these rings are not allocated when the interface is down. The qstats-get is allowed even if the interface is down. However, the bnxt_get_queue_stats{rx | tx}() accesses cp_ring and tx_ring without null check. So, it needs to avoid accessing rings if the interface is down. Reproducer: ip link set $interface down ./cli.py --spec netdev.yaml --dump qstats-get OR ip link set $interface down python ./stats.py Splat looks like: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1680fa067 P4D 1680fa067 PUD 16be3b067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 UID: 0 PID: 1495 Comm: python3 Not tainted 6.14.0-rc4+ #32 5cd0f999d5a15c574ac72b3e4b907341 Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 RIP: 0010:bnxt_get_queue_stats_rx+0xf/0x70 [bnxt_en] Code: c6 87 b5 18 00 00 02 eb a2 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 01 RSP: 0018:ffffabef43cdb7e0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffc04c8710 RCX: 0000000000000000 RDX: ffffabef43cdb858 RSI: 0000000000000000 RDI: ffff8d504e850000 RBP: ffff8d506c9f9c00 R08: 0000000000000004 R09: ffff8d506bcd901c R10: 0000000000000015 R11: ffff8d506bcd9000 R12: 0000000000000000 R13: ffffabef43cdb8c0 R14: ffff8d504e850000 R15: 0000000000000000 FS: 00007f2c5462b080(0000) GS:ffff8d575f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000167fd0000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x15a/0x460 ? sched_balance_find_src_group+0x58d/0xd10 ? exc_page_fault+0x6e/0x180 ? asm_exc_page_fault+0x22/0x30 ? bnxt_get_queue_stats_rx+0xf/0x70 [bnxt_en cdd546fd48563c280cfd30e9647efa420db07bf1] netdev_nl_stats_by_netdev+0x2b1/0x4e0 ? xas_load+0x9/0xb0 ? xas_find+0x183/0x1d0 ? xa_find+0x8b/0xe0 netdev_nl_qstats_get_dumpit+0xbf/0x1e0 genl_dumpit+0x31/0x90 netlink_dump+0x1a8/0x360 Fixes: af7b3b4adda5 ("eth: bnxt: support per-queue statistics") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250309134219.91670-6-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10eth: bnxt: do not update checksum in bnxt_xdp_build_skb()Taehee Yoo
The bnxt_rx_pkt() updates ip_summed value at the end if checksum offload is enabled. When the XDP-MB program is attached and it returns XDP_PASS, the bnxt_xdp_build_skb() is called to update skb_shared_info. The main purpose of bnxt_xdp_build_skb() is to update skb_shared_info, but it updates ip_summed value too if checksum offload is enabled. This is actually duplicate work. When the bnxt_rx_pkt() updates ip_summed value, it checks if ip_summed is CHECKSUM_NONE or not. It means that ip_summed should be CHECKSUM_NONE at this moment. But ip_summed may already be updated to CHECKSUM_UNNECESSARY in the XDP-MB-PASS path. So the by skb_checksum_none_assert() WARNS about it. This is duplicate work and updating ip_summed in the bnxt_xdp_build_skb() is not needed. Splat looks like: WARNING: CPU: 3 PID: 5782 at ./include/linux/skbuff.h:5155 bnxt_rx_pkt+0x479b/0x7610 [bnxt_en] Modules linked in: bnxt_re bnxt_en rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_] CPU: 3 UID: 0 PID: 5782 Comm: socat Tainted: G W 6.14.0-rc4+ #27 Tainted: [W]=WARN Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 RIP: 0010:bnxt_rx_pkt+0x479b/0x7610 [bnxt_en] Code: 54 24 0c 4c 89 f1 4c 89 ff c1 ea 1f ff d3 0f 1f 00 49 89 c6 48 85 c0 0f 84 4c e5 ff ff 48 89 c7 e8 ca 3d a0 c8 e9 8f f4 ff ff <0f> 0b f RSP: 0018:ffff88881ba09928 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00000000c7590303 RCX: 0000000000000000 RDX: 1ffff1104e7d1610 RSI: 0000000000000001 RDI: ffff8881c91300b8 RBP: ffff88881ba09b28 R08: ffff888273e8b0d0 R09: ffff888273e8b070 R10: ffff888273e8b010 R11: ffff888278b0f000 R12: ffff888273e8b080 R13: ffff8881c9130e00 R14: ffff8881505d3800 R15: ffff888273e8b000 FS: 00007f5a2e7be080(0000) GS:ffff88881ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff2e708ff8 CR3: 000000013e3b0000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> ? __warn+0xcd/0x2f0 ? bnxt_rx_pkt+0x479b/0x7610 ? report_bug+0x326/0x3c0 ? handle_bug+0x53/0xa0 ? exc_invalid_op+0x14/0x50 ? asm_exc_invalid_op+0x16/0x20 ? bnxt_rx_pkt+0x479b/0x7610 ? bnxt_rx_pkt+0x3e41/0x7610 ? __pfx_bnxt_rx_pkt+0x10/0x10 ? napi_complete_done+0x2cf/0x7d0 __bnxt_poll_work+0x4e8/0x1220 ? __pfx___bnxt_poll_work+0x10/0x10 ? __pfx_mark_lock.part.0+0x10/0x10 bnxt_poll_p5+0x36a/0xfa0 ? __pfx_bnxt_poll_p5+0x10/0x10 __napi_poll.constprop.0+0xa0/0x440 net_rx_action+0x899/0xd00 ... Following ping.py patch adds xdp-mb-pass case. so ping.py is going to be able to reproduce this issue. Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250309134219.91670-5-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logicTaehee Yoo
When a queue is restarted, it sets MRU to 0 for stopping packet flow. MRU variable is a member of vnic_info[], the first vnic_info is default and the second is ntuple. Only when ntuple is enabled(ethtool -K eth0 ntuple on), vnic_info for ntuple is allocated in init logic. The bp->nr_vnics indicates how many vnic_info are allocated. However bnxt_queue_{start | stop}() accesses vnic_info[BNXT_VNIC_NTUPLE] regardless of ntuple state. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Fixes: b9d2956e869c ("bnxt_en: stop packet flow during bnxt_queue_stop/start") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250309134219.91670-4-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10eth: bnxt: return fail if interface is down in bnxt_queue_mem_alloc()Taehee Yoo
The bnxt_queue_mem_alloc() is called to allocate new queue memory when a queue is restarted. It internally accesses rx buffer descriptor corresponding to the index. The rx buffer descriptor is allocated and set when the interface is up and it's freed when the interface is down. So, if queue is restarted if interface is down, kernel panic occurs. Splat looks like: BUG: unable to handle page fault for address: 000000000000b240 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 UID: 0 PID: 1563 Comm: ncdevmem2 Not tainted 6.14.0-rc2+ #9 844ddba6e7c459cafd0bf4db9a3198e Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 RIP: 0010:bnxt_queue_mem_alloc+0x3f/0x4e0 [bnxt_en] Code: 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 89 f5 53 48 89 fb 4c 8d b5 40 05 00 00 48 83 ec 15 RSP: 0018:ffff9dcc83fef9e8 EFLAGS: 00010202 RAX: ffffffffc0457720 RBX: ffff934ed8d40000 RCX: 0000000000000000 RDX: 000000000000001f RSI: ffff934ea508f800 RDI: ffff934ea508f808 RBP: ffff934ea508f800 R08: 000000000000b240 R09: ffff934e84f4b000 R10: ffff9dcc83fefa30 R11: ffff934e84f4b000 R12: 000000000000001f R13: ffff934ed8d40ac0 R14: ffff934ea508fd40 R15: ffff934e84f4b000 FS: 00007fa73888c740(0000) GS:ffff93559f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000b240 CR3: 0000000145a2e000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x15a/0x460 ? exc_page_fault+0x6e/0x180 ? asm_exc_page_fault+0x22/0x30 ? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 7f85e76f4d724ba07471d7e39d9e773aea6597b7] ? bnxt_queue_mem_alloc+0x3f/0x4e0 [bnxt_en 7f85e76f4d724ba07471d7e39d9e773aea6597b7] netdev_rx_queue_restart+0xc5/0x240 net_devmem_bind_dmabuf_to_queue+0xf8/0x200 netdev_nl_bind_rx_doit+0x3a7/0x450 genl_family_rcv_msg_doit+0xd9/0x130 genl_rcv_msg+0x184/0x2b0 ? __pfx_netdev_nl_bind_rx_doit+0x10/0x10 ? __pfx_genl_rcv_msg+0x10/0x10 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 ... Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250309134219.91670-3-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10eth: bnxt: fix truesize for mb-xdp-pass caseTaehee Yoo
When mb-xdp is set and return is XDP_PASS, packet is converted from xdp_buff to sk_buff with xdp_update_skb_shared_info() in bnxt_xdp_build_skb(). bnxt_xdp_build_skb() passes incorrect truesize argument to xdp_update_skb_shared_info(). The truesize is calculated as BNXT_RX_PAGE_SIZE * sinfo->nr_frags but the skb_shared_info was wiped by napi_build_skb() before. So it stores sinfo->nr_frags before bnxt_xdp_build_skb() and use it instead of getting skb_shared_info from xdp_get_shared_info_from_buff(). Splat looks like: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 0 at net/core/skbuff.c:6072 skb_try_coalesce+0x504/0x590 Modules linked in: xt_nat xt_tcpudp veth af_packet xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xt_addrtype nft_coms CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.14.0-rc2+ #3 RIP: 0010:skb_try_coalesce+0x504/0x590 Code: 4b fd ff ff 49 8b 34 24 40 80 e6 40 0f 84 3d fd ff ff 49 8b 74 24 48 40 f6 c6 01 0f 84 2e fd ff ff 48 8d 4e ff e9 25 fd ff ff <0f> 0b e99 RSP: 0018:ffffb62c4120caa8 EFLAGS: 00010287 RAX: 0000000000000003 RBX: ffffb62c4120cb14 RCX: 0000000000000ec0 RDX: 0000000000001000 RSI: ffffa06e5d7dc000 RDI: 0000000000000003 RBP: ffffa06e5d7ddec0 R08: ffffa06e6120a800 R09: ffffa06e7a119900 R10: 0000000000002310 R11: ffffa06e5d7dcec0 R12: ffffe4360575f740 R13: ffffe43600000000 R14: 0000000000000002 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffffa0755f700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f147b76b0f8 CR3: 00000001615d4000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> ? __warn+0x84/0x130 ? skb_try_coalesce+0x504/0x590 ? report_bug+0x18a/0x1a0 ? handle_bug+0x53/0x90 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? skb_try_coalesce+0x504/0x590 inet_frag_reasm_finish+0x11f/0x2e0 ip_defrag+0x37a/0x900 ip_local_deliver+0x51/0x120 ip_sublist_rcv_finish+0x64/0x70 ip_sublist_rcv+0x179/0x210 ip_list_rcv+0xf9/0x130 How to reproduce: <Node A> ip link set $interface1 xdp obj xdp_pass.o ip link set $interface1 mtu 9000 up ip a a 10.0.0.1/24 dev $interface1 <Node B> ip link set $interfac2 mtu 9000 up ip a a 10.0.0.2/24 dev $interface2 ping 10.0.0.1 -s 65000 Following ping.py patch adds xdp-mb-pass case. so ping.py is going to be able to reproduce this issue. Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250309134219.91670-2-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10net: usb: lan78xx: Sanitize return values of register read/write functionsOleksij Rempel
usb_control_msg() returns the number of transferred bytes or a negative error code. The current implementation propagates the transferred byte count, which is unintended. This affects code paths that assume a boolean success/failure check, such as the EEPROM detection logic. Fix this by ensuring lan78xx_read_reg() and lan78xx_write_reg() return only 0 on success and preserve negative error codes. This approach is consistent with existing usage, as the transferred byte count is not explicitly checked elsewhere. Fixes: 8b1b2ca83b20 ("net: usb: lan78xx: Improve error handling in EEPROM and OTP operations") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/ac965de8-f320-430f-80f6-b16f4e1ba06d@sirena.org.uk Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Tested-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20250307101223.3025632-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>