summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-25smb: client: let smbd_post_send_iter() respect the peers max_send_size and ↵Stefan Metzmacher
transmit all data We should not send smbdirect_data_transfer messages larger than the negotiated max_send_size, typically 1364 bytes, which means 24 bytes of the smbdirect_data_transfer header + 1340 payload bytes. This happened when doing an SMB2 write with more than 1340 bytes (which is done inline as it's below rdma_readwrite_threshold). It means the peer resets the connection. When testing between cifs.ko and ksmbd.ko something like this is logged: client: CIFS: VFS: RDMA transport re-established siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 CIFS: VFS: \\carina Send error in SessSetup = -11 smb2_reconnect: 12 callbacks suppressed CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: SMB: Zero rsize calculated, using minimum value 65536 and: CIFS: VFS: RDMA transport re-established siw: got TERMINATE. layer 1, type 2, code 2 CIFS: VFS: smbd_recv:1894 disconnected siw: got TERMINATE. layer 1, type 2, code 2 The ksmbd dmesg is showing things like: smb_direct: Recv error. status='local length error (1)' opcode=128 smb_direct: disconnected smb_direct: Recv error. status='local length error (1)' opcode=128 ksmbd: smb_direct: disconnected ksmbd: sock_read failed: -107 As smbd_post_send_iter() limits the transmitted number of bytes we need loop over it in order to transmit the whole iter. Reviewed-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Meetakshi Setiya <msetiya@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: <stable+noautosel@kernel.org> # sp->max_send_size should be info->max_send_size in backports Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-25drm/bridge: ti-sn65dsi86: Add HPD for DisplayPort connector typeJayesh Choudhary
By default, HPD was disabled on SN65DSI86 bridge. When the driver was added (commit "a095f15c00e27"), the HPD_DISABLE bit was set in pre-enable call which was moved to other function calls subsequently. Later on, commit "c312b0df3b13" added detect utility for DP mode. But with HPD_DISABLE bit set, all the HPD events are disabled[0] and the debounced state always return 1 (always connected state). Set HPD_DISABLE bit conditionally based on display sink's connector type. Since the HPD_STATE is reflected correctly only after waiting for debounce time (~100-400ms) and adding this delay in detect() is not feasible owing to the performace impact (glitches and frame drop), remove runtime calls in detect() and add hpd_enable()/disable() bridge hooks with runtime calls, to detect hpd properly without any delay. [0]: <https://www.ti.com/lit/gpn/SN65DSI86> (Pg. 32) Fixes: c312b0df3b13 ("drm/bridge: ti-sn65dsi86: Implement bridge connector operations for DP") Cc: Max Krummenacher <max.krummenacher@toradex.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com> Signed-off-by: Jayesh Choudhary <j-choudhary@ti.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250624044835.165708-1-j-choudhary@ti.com
2025-06-25EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMsAvadhut Naik
Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD Zen-based SOCs has an Address Mask and a Secondary Address Mask register associated with it. The amd64_edac module logs DIMM sizes on a per-UMC per-CS granularity during init using these two registers. Currently, the module primarily considers only the Address Mask register for computing DIMM sizes. The Secondary Address Mask register is only considered for odd CS. Additionally, if it has been considered, the Address Mask register is ignored altogether for that CS. For power-of-two DIMMs i.e. DIMMs whose total capacity is a power of two (32GB, 64GB, etc), this is not an issue since only the Address Mask register is used. For non-power-of-two DIMMs i.e., DIMMs whose total capacity is not a power of two (48GB, 96GB, etc), however, the Secondary Address Mask register is used in conjunction with the Address Mask register. However, since the module only considers either of the two registers for a CS, the size computed by the module is incorrect. The Secondary Address Mask register is not considered for even CS, and the Address Mask register is not considered for odd CS. Introduce a new helper function so that both Address Mask and Secondary Address Mask registers are considered, when valid, for computing DIMM sizes. Furthermore, also rename some variables for greater clarity. Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs") Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt Reported-by: Žilvinas Žaltiena <zilvinas@natrix.lt> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Tested-by: Žilvinas Žaltiena <zilvinas@natrix.lt> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250529205013.403450-1-avadhut.naik@amd.com
2025-06-25KVM: SVM: Add missing member in SNP_LAUNCH_START command structureNikunj A Dadhania
The sev_data_snp_launch_start structure should include a 4-byte desired_tsc_khz field before the gosvw field, which was missed in the initial implementation. As a result, the structure is 4 bytes shorter than expected by the firmware, causing the gosvw field to start 4 bytes early. Fix this by adding the missing 4-byte member for the desired TSC frequency. Fixes: 3a45dc2b419e ("crypto: ccp: Define the SEV-SNP commands") Cc: stable@vger.kernel.org Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Vaishali Thakkar <vaishali.thakkar@suse.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250408093213.57962-3-nikunj@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-25io_uring: fix resource leak in io_import_dmabuf()Penglei Jiang
Replace the return statement with setting ret = -EINVAL and jumping to the err label to ensure resources are released via io_release_dmabuf. Fixes: a5c98e942457 ("io_uring/zcrx: dmabuf backed zerocopy receive") Signed-off-by: Penglei Jiang <superman.xpt@gmail.com> Link: https://lore.kernel.org/r/20250625102703.68336-1-superman.xpt@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-25Documentation: KVM: Fix unexpected unindent warningsBinbin Wu
Add proper indentations to bullet list items to resolve the warning: "Bullet list ends without a blank line; unexpected unindent." Closes:https://lore.kernel.org/kvm/20250623162110.6e2f4241@canb.auug.org.au/ Fixes: cf207eac06f6 ("KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>") Fixes: 25e8b1dd4883 ("KVM: TDX: Exit to userspace for GetTdVmCallInfo") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20250625014829.82289-1-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-25ata: ahci: Use correct DMI identifier for ASUSPRO-D840SA LPM quirkNiklas Cassel
ASUS store the board name in DMI_PRODUCT_NAME rather than DMI_PRODUCT_VERSION. (Apparently it is only Lenovo that stores the model-name in DMI_PRODUCT_VERSION.) Use the correct DMI identifier, DMI_PRODUCT_NAME, to match the ASUSPRO-D840SA board, such that the quirk actually gets applied. Cc: stable@vger.kernel.org Reported-by: Andy Yang <andyybtc79@gmail.com> Tested-by: Andy Yang <andyybtc79@gmail.com> Closes: https://lore.kernel.org/linux-ide/aFb3wXAwJSSJUB7o@ryzen/ Fixes: b5acc3628898 ("ata: ahci: Disallow LPM for ASUSPRO-D840SA motherboard") Reviewed-by: Hans de Goede <hansg@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250624074029.963028-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2025-06-25mtk-sd: reset host->mrq on prepare_data() errorSergey Senozhatsky
Do not leave host with dangling ->mrq pointer if we hit the msdc_prepare_data() error out path. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: f5de469990f1 ("mtk-sd: Prevent memory corruption from DMA map failure") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250625052106.584905-1-senozhatsky@chromium.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-06-25platform/mellanox: mlxbf-pmc: Fix duplicate event ID for CACHE_DATA1Alok Tiwari
same ID (103) was assigned to both GDC_BANK0_G_RSE_PIPE_CACHE_DATA0 and GDC_BANK0_G_RSE_PIPE_CACHE_DATA1. This could lead to incorrect event mapping. Updated the ID to 104 to ensure uniqueness. Fixes: 423c3361855c ("platform/mellanox: mlxbf-pmc: Add support for BlueField-3") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20250619060502.3594350-1-alok.a.tiwari@oracle.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-25platform/x86: thinkpad_acpi: handle HKEY 0x1402 eventMark Pearson
2025 Thinkpads F11 key launch the Intel Unison app on Windows, which does some sort of smart sharing between laptop and phone. Map this key event to KEY_LINK_PHONE as the closest thing we have. This prevents an error message being displayed on key press. Reported-by: Damjan Georgievski <gdamjan@gmail.com> Closes: https://sourceforge.net/p/ibm-acpi/mailman/message/59189556/ Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20250620181119.2519546-1-mpearson-lenovo@squebb.ca [ij: converted directory to pre-lenovo move as this is fixes material.] Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-25platform/x86: asus-nb-wmi: add DMI quirk for ASUS Zenbook Duo UX8406CARahul Chandra
Add a DMI quirk entry for the ASUS Zenbook Duo UX8406CA 2025 model to use the existing zenbook duo keyboard quirk. Signed-off-by: Rahul Chandra <rahul@chandra.net> Link: https://lore.kernel.org/r/20250624073301.602070-1-rahul@chandra.net Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-25platform/x86: dell-lis3lv02d: Add Latitude 5500Paul Menzel
Add 0x29 as the accelerometer address for the Dell Latitude 5500 to lis3lv02d_devices[]. The address was verified as below: $ cd /sys/bus/pci/drivers/i801_smbus/0000:00:1f.4 $ ls -d i2c-? i2c-2 $ sudo modprobe i2c-dev $ sudo i2cdetect 2 WARNING! This program can confuse your I2C bus, cause data loss and worse! I will probe file /dev/i2c-2. I will probe address range 0x08-0x77. Continue? [Y/n] Y 0 1 2 3 4 5 6 7 8 9 a b c d e f 00: 08 -- -- -- -- -- -- -- 10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 20: -- -- -- -- -- -- -- -- -- 29 -- -- -- -- -- -- 30: 30 -- -- -- -- 35 UU UU -- -- -- -- -- -- -- -- 40: -- -- -- -- 44 -- -- -- -- -- -- -- -- -- -- -- 50: UU -- 52 -- -- -- -- -- -- -- -- -- -- -- -- -- 60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 70: -- -- -- -- -- -- -- -- $ echo lis3lv02d 0x29 | sudo tee /sys/bus/i2c/devices/i2c-2/new_device lis3lv02d 0x29 $ sudo dmesg [ 0.000000] Linux version 6.12.32-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.32-1 (2025-06-07) […] [ 0.000000] DMI: Dell Inc. Latitude 5500/0M14W7, BIOS 1.38.0 03/06/2025 […] [ 609.063488] i2c_dev: i2c /dev entries driver [ 639.135020] i2c i2c-2: new_device: Instantiated device lis3lv02d at 0x29 Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Hans de Goede <hansg@kernel.org> Link: https://lore.kernel.org/r/20250622080721.4661-1-pmenzel@molgen.mpg.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-25Merge tag 'iwlwifi-fixes-2025-06-25' of ↵Johannes Berg
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Miri Korenblit says: ==================== iwlwifi-fixes: fix failure in interface up ==================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25RDMA/mlx5: Fix vport loopback for MPV devicePatrisious Haddad
Always enable vport loopback for both MPV devices on driver start. Previously in some cases related to MPV RoCE, packets weren't correctly executing loopback check at vport in FW, since it was disabled. Due to complexity of identifying such cases for MPV always enable vport loopback for both GVMIs when binding the slave to the master port. Fixes: 0042f9e458a5 ("RDMA/mlx5: Enable vport loopback when user context or QP mandate") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/d4298f5ebb2197459e9e7221c51ecd6a34699847.1750064969.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-06-25RDMA/mlx5: Fix CC counters query for MPVPatrisious Haddad
In case, CC counters are querying for the second port use the correct core device for the query instead of always using the master core device. Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/9cace74dcf106116118bebfa9146d40d4166c6b0.1750064969.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-06-25RDMA/mlx5: Fix HW counters query for non-representor devicesPatrisious Haddad
To get the device HW counters, a non-representor switchdev device should use the mlx5_ib_query_q_counters() function and query all of the available counters. While a representor device in switchdev mode should use the mlx5_ib_query_q_counters_vport() function and query only the Q_Counters without the PPCNT counters and congestion control counters, since they aren't relevant for a representor device. Currently a non-representor switchdev device skips querying the PPCNT counters and congestion control counters, leaving them unupdated. Fix that by properly querying those counters for non-representor devices. Fixes: d22467a71ebe ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/56bf8af4ca8c58e3fb9f7e47b1dca2009eeeed81.1750064969.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-06-25IB/core: Annotate umem_mutex acquisition under fs_reclaim for lockdepOr Har-Toov
Following the fix in the previous commit ("IB/mlx5: Fix potential deadlock in MR deregistration"), teach lockdep explicitly about the locking order between fs_reclaim and umem_mutex. The previous commit resolved a potential deadlock scenario where kzalloc(GFP_KERNEL) was called while holding umem_mutex, which could lead to reclaim and eventually invoke the MMU notifier (mlx5_ib_invalidate_range()), causing a recursive acquisition of umem_mutex. To prevent such issues from reoccurring unnoticed in future code changes, add a lockdep annotation in ib_init_umem_odp() that simulates taking umem_mutex inside a reclaim context. This makes lockdep aware of this locking dependency and ensures that future violations—such as calling kzalloc() or any memory allocator that may enter reclaim while holding umem_mutex—will immediately raise a lockdep warning. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/9d31b9d8fe1db648a9f47cec3df6b8463319dee5.1750061698.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-06-25IB/mlx5: Fix potential deadlock in MR deregistrationOr Har-Toov
The issue arises when kzalloc() is invoked while holding umem_mutex or any other lock acquired under umem_mutex. This is problematic because kzalloc() can trigger fs_reclaim_aqcuire(), which may, in turn, invoke mmu_notifier_invalidate_range_start(). This function can lead to mlx5_ib_invalidate_range(), which attempts to acquire umem_mutex again, resulting in a deadlock. The problematic flow: CPU0 | CPU1 ---------------------------------------|------------------------------------------------ mlx5_ib_dereg_mr() | → revoke_mr() | → mutex_lock(&umem_odp->umem_mutex) | | mlx5_mkey_cache_init() | → mutex_lock(&dev->cache.rb_lock) | → mlx5r_cache_create_ent_locked() | → kzalloc(GFP_KERNEL) | → fs_reclaim() | → mmu_notifier_invalidate_range_start() | → mlx5_ib_invalidate_range() | → mutex_lock(&umem_odp->umem_mutex) → cache_ent_find_and_store() | → mutex_lock(&dev->cache.rb_lock) | Additionally, when kzalloc() is called from within cache_ent_find_and_store(), we encounter the same deadlock due to re-acquisition of umem_mutex. Solve by releasing umem_mutex in dereg_mr() after umr_revoke_mr() and before acquiring rb_lock. This ensures that we don't hold umem_mutex while performing memory allocations that could trigger the reclaim path. This change prevents the deadlock by ensuring proper lock ordering and avoiding holding locks during memory allocation operations that could trigger the reclaim path. The following lockdep warning demonstrates the deadlock: python3/20557 is trying to acquire lock: ffff888387542128 (&umem_odp->umem_mutex){+.+.}-{4:4}, at: mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib] but task is already holding lock: ffffffff82f6b840 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: unmap_vmas+0x7b/0x1a0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: fs_reclaim_acquire+0x60/0xd0 mem_cgroup_css_alloc+0x6f/0x9b0 cgroup_init_subsys+0xa4/0x240 cgroup_init+0x1c8/0x510 start_kernel+0x747/0x760 x86_64_start_reservations+0x25/0x30 x86_64_start_kernel+0x73/0x80 common_startup_64+0x129/0x138 -> #2 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0x91/0xd0 __kmalloc_cache_noprof+0x4d/0x4c0 mlx5r_cache_create_ent_locked+0x75/0x620 [mlx5_ib] mlx5_mkey_cache_init+0x186/0x360 [mlx5_ib] mlx5_ib_stage_post_ib_reg_umr_init+0x3c/0x60 [mlx5_ib] __mlx5_ib_add+0x4b/0x190 [mlx5_ib] mlx5r_probe+0xd9/0x320 [mlx5_ib] auxiliary_bus_probe+0x42/0x70 really_probe+0xdb/0x360 __driver_probe_device+0x8f/0x130 driver_probe_device+0x1f/0xb0 __driver_attach+0xd4/0x1f0 bus_for_each_dev+0x79/0xd0 bus_add_driver+0xf0/0x200 driver_register+0x6e/0xc0 __auxiliary_driver_register+0x6a/0xc0 do_one_initcall+0x5e/0x390 do_init_module+0x88/0x240 init_module_from_file+0x85/0xc0 idempotent_init_module+0x104/0x300 __x64_sys_finit_module+0x68/0xc0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #1 (&dev->cache.rb_lock){+.+.}-{4:4}: __mutex_lock+0x98/0xf10 __mlx5_ib_dereg_mr+0x6f2/0x890 [mlx5_ib] mlx5_ib_dereg_mr+0x21/0x110 [mlx5_ib] ib_dereg_mr_user+0x85/0x1f0 [ib_core] uverbs_free_mr+0x19/0x30 [ib_uverbs] destroy_hw_idr_uobject+0x21/0x80 [ib_uverbs] uverbs_destroy_uobject+0x60/0x3d0 [ib_uverbs] uobj_destroy+0x57/0xa0 [ib_uverbs] ib_uverbs_cmd_verbs+0x4d5/0x1210 [ib_uverbs] ib_uverbs_ioctl+0x129/0x230 [ib_uverbs] __x64_sys_ioctl+0x596/0xaa0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (&umem_odp->umem_mutex){+.+.}-{4:4}: __lock_acquire+0x1826/0x2f00 lock_acquire+0xd3/0x2e0 __mutex_lock+0x98/0xf10 mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib] __mmu_notifier_invalidate_range_start+0x18e/0x1f0 unmap_vmas+0x182/0x1a0 exit_mmap+0xf3/0x4a0 mmput+0x3a/0x100 do_exit+0x2b9/0xa90 do_group_exit+0x32/0xa0 get_signal+0xc32/0xcb0 arch_do_signal_or_restart+0x29/0x1d0 syscall_exit_to_user_mode+0x105/0x1d0 do_syscall_64+0x79/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Chain exists of: &dev->cache.rb_lock --> mmu_notifier_invalidate_range_start --> &umem_odp->umem_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&umem_odp->umem_mutex); lock(mmu_notifier_invalidate_range_start); lock(&umem_odp->umem_mutex); lock(&dev->cache.rb_lock); *** DEADLOCK *** Fixes: abb604a1a9c8 ("RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/3c8f225a8a9fade647d19b014df1172544643e4a.1750061612.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-06-25um: vector: Reduce stack usage in vector_eth_configure()Tiwei Bie
When compiling with clang (19.1.7), initializing *vp using a compound literal may result in excessive stack usage. Fix it by initializing the required fields of *vp individually. Without this patch: $ objdump -d arch/um/drivers/vector_kern.o | ./scripts/checkstack.pl x86_64 0 ... 0x0000000000000540 vector_eth_configure [vector_kern.o]:1472 ... With this patch: $ objdump -d arch/um/drivers/vector_kern.o | ./scripts/checkstack.pl x86_64 0 ... 0x0000000000000540 vector_eth_configure [vector_kern.o]:208 ... Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506221017.WtB7Usua-lkp@intel.com/ Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250623110829.314864-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25um: Use correct data source in fpregs_legacy_set()Tiwei Bie
Read from the buffer pointed to by 'from' instead of '&buf', as 'buf' contains no valid data when 'ubuf' is NULL. Fixes: b1e1bd2e6943 ("um: Add helper functions to get/set state for SECCOMP") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250606124428.148164-5-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25um: vfio: Prevent duplicate device assignmentsTiwei Bie
Ensure devices are assigned only once. Reject subsequent requests for duplicate assignments. Fixes: a0e2cb6a9063 ("um: Add VFIO-based virtual PCI driver") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250606124428.148164-4-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25um: ubd: Add missing error check in start_io_thread()Tiwei Bie
The subsequent call to os_set_fd_block() overwrites the previous return value. OR the two return values together to fix it. Fixes: f88f0bdfc32f ("um: UBD Improvements") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250606124428.148164-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25drm/i915: fix build error some moreArnd Bergmann
An earlier patch fixed a build failure with clang, but I still see the same problem with some configurations using gcc: drivers/gpu/drm/i915/i915_pmu.c: In function 'config_mask': include/linux/compiler_types.h:568:38: error: call to '__compiletime_assert_462' declared with attribute error: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1 drivers/gpu/drm/i915/i915_pmu.c:116:3: note: in expansion of macro 'BUILD_BUG_ON' 116 | BUILD_BUG_ON(bit > As I understand it, the problem is that the function is not always fully inlined, but the __builtin_constant_p() can still evaluate the argument as being constant. Marking it as __always_inline so far works for me in all configurations. Fixes: a7137b1825b5 ("drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled") Fixes: a644fde77ff7 ("drm/i915/pmu: Change bitmask of enabled events to u32") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250620111824.3395007-1-arnd@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit ef69f9dd1cd7301cdf04ba326ed28152a3affcf6) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-06-25ALSA: usb: qcom: fix NULL pointer dereference in qmi_stop_sessionPei Xiao
The find_substream() call may return NULL, but the error path dereferenced 'subs' unconditionally via dev_err(&subs->dev->dev, ...), causing a NULL pointer dereference when subs is NULL. Fix by switching to &uadev[idx].udev->dev which is always valid in this context. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Link: https://patch.msgid.link/86ac2939273ac853535049e60391c09d7688714e.1750755508.git.xiaopei01@kylinos.cn Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-06-24io_uring: don't assume uaddr alignment in io_vec_fill_bvecPavel Begunkov
There is no guaranteed alignment for user pointers. Don't use mask trickery and adjust the offset by bv_offset. Cc: stable@vger.kernel.org Reported-by: David Hildenbrand <david@redhat.com> Fixes: 9ef4cbbcb4ac3 ("io_uring: add infra for importing vectored reg buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/io-uring/19530391f5c361a026ac9b401ff8e123bde55d98.1750771718.git.asml.silence@gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24io_uring/rsrc: don't rely on user vaddr alignmentPavel Begunkov
There is no guaranteed alignment for user pointers, however the calculation of an offset of the first page into a folio after coalescing uses some weird bit mask logic, get rid of it. Cc: stable@vger.kernel.org Reported-by: David Hildenbrand <david@redhat.com> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/io-uring/e387b4c78b33f231105a601d84eefd8301f57954.1750771718.git.asml.silence@gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24io_uring/rsrc: fix folio unpinningPavel Begunkov
syzbot complains about an unmapping failure: [ 108.070381][ T14] kernel BUG at mm/gup.c:71! [ 108.070502][ T14] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 108.123672][ T14] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20250221-8.fc42 02/21/2025 [ 108.127458][ T14] Workqueue: iou_exit io_ring_exit_work [ 108.174205][ T14] Call trace: [ 108.175649][ T14] sanity_check_pinned_pages+0x7cc/0x7d0 (P) [ 108.178138][ T14] unpin_user_page+0x80/0x10c [ 108.180189][ T14] io_release_ubuf+0x84/0xf8 [ 108.182196][ T14] io_free_rsrc_node+0x250/0x57c [ 108.184345][ T14] io_rsrc_data_free+0x148/0x298 [ 108.186493][ T14] io_sqe_buffers_unregister+0x84/0xa0 [ 108.188991][ T14] io_ring_ctx_free+0x48/0x480 [ 108.191057][ T14] io_ring_exit_work+0x764/0x7d8 [ 108.193207][ T14] process_one_work+0x7e8/0x155c [ 108.195431][ T14] worker_thread+0x958/0xed8 [ 108.197561][ T14] kthread+0x5fc/0x75c [ 108.199362][ T14] ret_from_fork+0x10/0x20 We can pin a tail page of a folio, but then io_uring will try to unpin the head page of the folio. While it should be fine in terms of keeping the page actually alive, mm folks say it's wrong and triggers a debug warning. Use unpin_user_folio() instead of unpin_user_page*. Cc: stable@vger.kernel.org Debugged-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+1d335893772467199ab6@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/683f1551.050a0220.55ceb.0017.GAE@google.com Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/io-uring/a28b0f87339ac2acf14a645dad1e95bbcbf18acd.1750771718.git.asml.silence@gmail.com/ [axboe: adapt to current tree, massage commit message] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24ublk: setup ublk_io correctly in case of ublk_get_data() failureMing Lei
If ublk_get_data() fails, -EIOCBQUEUED is returned and the current command becomes ASYNC. And the only reason is that mapping data can't move on, because of no enough pages or pending signal, then the current ublk request has to be requeued. Once the request need to be requeued, we have to setup `ublk_io` correctly, including io->cmd and flags, otherwise the request may not be forwarded to ublk server successfully. Fixes: 9810362a57cb ("ublk: don't call ublk_dispatch_req() for NEED_GET_DATA") Reported-by: Changhui Zhong <czhong@redhat.com> Closes: https://lore.kernel.org/linux-block/CAGVVp+VN9QcpHUz_0nasFf5q9i1gi8H8j-G-6mkBoqa3TyjRHA@mail.gmail.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Changhui Zhong <czhong@redhat.com> Link: https://lore.kernel.org/r/20250624104121.859519-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI headerCaleb Sander Mateos
UBLK_F_SUPPORT_ZERO_COPY has a very old comment describing the initial idea for how zero-copy would be implemented. The actual implementation added in commit 1f6540e2aabb ("ublk: zc register/unregister bvec") uses io_uring registered buffers rather than shared memory mapping. Remove the inaccurate remarks about mapping ublk request memory into the ublk server's address space and requiring 4K block size. Replace them with a description of the current zero-copy mechanism. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250621171015.354932-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24ublk: fix narrowing warnings in UAPI headerCaleb Sander Mateos
When a C++ file compiled with -Wc++11-narrowing includes the UAPI header linux/ublk_cmd.h, ublk_sqe_addr_to_auto_buf_reg()'s assignments of u64 values to u8, u16, and u32 fields result in compiler warnings. Add explicit casts to the intended types to avoid these warnings. Drop the unnecessary bitmasks. Reported-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG") Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250621162842.337452-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24selftests: ublk: don't take same backing file for more than one ublk devicesMing Lei
Don't use same backing file for more than one ublk devices, and avoid concurrent write on same file from more ublk disks. Fixes: 8ccebc19ee3d ("selftests: ublk: support UBLK_F_AUTO_BUF_REG") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250623011934.741788-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24ublk: build batch from IOs in same io_ring_ctx and io taskMing Lei
ublk_queue_cmd_list() dispatches the whole batch list by scheduling task work via the tail request's io_uring_cmd, this way is fine even though more than one io_ring_ctx are involved for this batch since it is just one running context. However, the task work handler ublk_cmd_list_tw_cb() takes `issue_flags` of tail uring_cmd's io_ring_ctx for completing all commands. This way is wrong if any uring_cmd is issued from different io_ring_ctx. Fixes it by always building batch IOs from same io_ring_ctx and io task because ublk_dispatch_req() does validate task context, and IO needs to be aborted in case of running from fallback task work context. For typical per-queue or per-io daemon implementation, this way shouldn't make difference from performance viewpoint, because single io_ring_ctx is taken in each daemon for normal use case. Fixes: d796cea7b9f3 ("ublk: implement ->queue_rqs()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250625022554.883571-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-24scsi: ufs: core: Fix spelling of a sysfs attribute nameBart Van Assche
Change "resourse" into "resource" in the name of a sysfs attribute. Fixes: d829fc8a1058 ("scsi: ufs: sysfs: unit descriptor") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250624181658.336035-1-bvanassche@acm.org Reviewed-by: Avri Altman <avri.altman@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-24scsi: core: Enforce unlimited max_segment_size when virt_boundary_mask is setChristoph Hellwig
The virt_boundary_mask limit requires an unlimited max_segment_size for bio splitting to not corrupt data. Historically, the block layer tried to validate this, although the check was half-hearted until the addition of the atomic queue limits API. The full blown check then triggered issues with stacked devices incorrectly inheriting limits such as the virt boundary and got disabled in commit b561ea56a264 ("block: allow device to have both virt_boundary_mask and max segment size") instead of fixing the issue properly. Ensure that the SCSI mid layer doesn't set the default low max_segment_size limit for this case, and check for invalid max_segment_size values in the host template, similar to the original block layer check given that SCSI devices can't be stacked. This fixes reported data corruption on storvsc, although as far as I can tell storvsc always failed to properly set the max_segment_size limit as the SCSI APIs historically applied that when setting up the host, while storvsc only set the virt_boundary_mask when configuring the scsi_device. Fixes: 81988a0e6b03 ("storvsc: get rid of bounce buffer") Fixes: b561ea56a264 ("block: allow device to have both virt_boundary_mask and max segment size") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250624125233.219635-3-hch@lst.de Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-24scsi: RDMA/srp: Don't set a max_segment_size when virt_boundary_mask is setChristoph Hellwig
virt_boundary_mask implies an unlimited max_segment_size. Setting both can lead to data corruption because __blk_rq_map_sg() can split requests so that the virt_boundary_mask is not respected if max_segment_size is not UINT_MAX. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250624125233.219635-2-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-24scsi: sd: Fix VPD page 0xb7 length checkjackysliu
sd_read_block_limits_ext() currently assumes that vpd->len excludes the size of the page header. However, vpd->len describes the size of the entire VPD page, therefore the sanity check is incorrect. In practice this is not really a problem since we don't attach VPD pages unless they actually report data trailing the header. But fix the length check regardless. This issue was identified by Wukong-Agent (formerly Tencent Woodpecker), a code security AI agent, through static code analysis. [mkp: rewrote patch description] Signed-off-by: jackysliu <1972843537@qq.com> Link: https://lore.kernel.org/r/tencent_ADA5210D1317EEB6CD7F3DE9FE9DA4591D05@qq.com Fixes: 96b171d6dba6 ("scsi: core: Query the Block Limits Extension VPD page") Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-24bnxt: properly flush XDP redirect listsYan Zhai
We encountered following crash when testing a XDP_REDIRECT feature in production: [56251.579676] list_add corruption. next->prev should be prev (ffff93120dd40f30), but was ffffb301ef3a6740. (next=ffff93120dd 40f30). [56251.601413] ------------[ cut here ]------------ [56251.611357] kernel BUG at lib/list_debug.c:29! [56251.621082] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [56251.632073] CPU: 111 UID: 0 PID: 0 Comm: swapper/111 Kdump: loaded Tainted: P O 6.12.33-cloudflare-2025.6. 3 #1 [56251.653155] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE [56251.663877] Hardware name: MiTAC GC68B-B8032-G11P6-GPU/S8032GM-HE-CFR, BIOS V7.020.B10-sig 01/22/2025 [56251.682626] RIP: 0010:__list_add_valid_or_report+0x4b/0xa0 [56251.693203] Code: 0e 48 c7 c7 68 e7 d9 97 e8 42 16 fe ff 0f 0b 48 8b 52 08 48 39 c2 74 14 48 89 f1 48 c7 c7 90 e7 d9 97 48 89 c6 e8 25 16 fe ff <0f> 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 e8 e7 d9 97 4c 89 [56251.725811] RSP: 0018:ffff93120dd40b80 EFLAGS: 00010246 [56251.736094] RAX: 0000000000000075 RBX: ffffb301e6bba9d8 RCX: 0000000000000000 [56251.748260] RDX: 0000000000000000 RSI: ffff9149afda0b80 RDI: ffff9149afda0b80 [56251.760349] RBP: ffff9131e49c8000 R08: 0000000000000000 R09: ffff93120dd40a18 [56251.772382] R10: ffff9159cf2ce1a8 R11: 0000000000000003 R12: ffff911a80850000 [56251.784364] R13: ffff93120fbc7000 R14: 0000000000000010 R15: ffff9139e7510e40 [56251.796278] FS: 0000000000000000(0000) GS:ffff9149afd80000(0000) knlGS:0000000000000000 [56251.809133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [56251.819561] CR2: 00007f5e85e6f300 CR3: 00000038b85e2006 CR4: 0000000000770ef0 [56251.831365] PKRU: 55555554 [56251.838653] Call Trace: [56251.845560] <IRQ> [56251.851943] cpu_map_enqueue.cold+0x5/0xa [56251.860243] xdp_do_redirect+0x2d9/0x480 [56251.868388] bnxt_rx_xdp+0x1d8/0x4c0 [bnxt_en] [56251.877028] bnxt_rx_pkt+0x5f7/0x19b0 [bnxt_en] [56251.885665] ? cpu_max_write+0x1e/0x100 [56251.893510] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.902276] __bnxt_poll_work+0x190/0x340 [bnxt_en] [56251.911058] bnxt_poll+0xab/0x1b0 [bnxt_en] [56251.919041] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.927568] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.935958] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.944250] __napi_poll+0x2b/0x160 [56251.951155] bpf_trampoline_6442548651+0x79/0x123 [56251.959262] __napi_poll+0x5/0x160 [56251.966037] net_rx_action+0x3d2/0x880 [56251.973133] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.981265] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.989262] ? __hrtimer_run_queues+0x162/0x2a0 [56251.996967] ? srso_alias_return_thunk+0x5/0xfbef5 [56252.004875] ? srso_alias_return_thunk+0x5/0xfbef5 [56252.012673] ? bnxt_msix+0x62/0x70 [bnxt_en] [56252.019903] handle_softirqs+0xcf/0x270 [56252.026650] irq_exit_rcu+0x67/0x90 [56252.032933] common_interrupt+0x85/0xa0 [56252.039498] </IRQ> [56252.044246] <TASK> [56252.048935] asm_common_interrupt+0x26/0x40 [56252.055727] RIP: 0010:cpuidle_enter_state+0xb8/0x420 [56252.063305] Code: dc 01 00 00 e8 f9 79 3b ff e8 64 f7 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 a5 32 3a ff 45 84 ff 0f 85 ae 01 00 00 fb 45 85 f6 <0f> 88 88 01 00 00 48 8b 04 24 49 63 ce 4c 89 ea 48 6b f1 68 48 29 [56252.088911] RSP: 0018:ffff93120c97fe98 EFLAGS: 00000202 [56252.096912] RAX: ffff9149afd80000 RBX: ffff9141d3a72800 RCX: 0000000000000000 [56252.106844] RDX: 00003329176c6b98 RSI: ffffffe36db3fdc7 RDI: 0000000000000000 [56252.116733] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000000000004e [56252.126652] R10: ffff9149afdb30c4 R11: 071c71c71c71c71c R12: ffffffff985ff860 [56252.136637] R13: 00003329176c6b98 R14: 0000000000000002 R15: 0000000000000000 [56252.146667] ? cpuidle_enter_state+0xab/0x420 [56252.153909] cpuidle_enter+0x2d/0x40 [56252.160360] do_idle+0x176/0x1c0 [56252.166456] cpu_startup_entry+0x29/0x30 [56252.173248] start_secondary+0xf7/0x100 [56252.179941] common_startup_64+0x13e/0x141 [56252.186886] </TASK> From the crash dump, we found that the cpu_map_flush_list inside redirect info is partially corrupted: its list_head->next points to itself, but list_head->prev points to a valid list of unflushed bq entries. This turned out to be a result of missed XDP flush on redirect lists. By digging in the actual source code, we found that commit 7f0a168b0441 ("bnxt_en: Add completion ring pointer in TX and RX ring structures") incorrectly overwrites the event mask for XDP_REDIRECT in bnxt_rx_xdp. We can stably reproduce this crash by returning XDP_TX and XDP_REDIRECT randomly for incoming packets in a naive XDP program. Properly propagate the XDP_REDIRECT events back fixes the crash. Fixes: a7559bc8c17c ("bnxt: support transmit and free of aggregation buffers") Tested-by: Andrew Rzeznik <arzeznik@cloudflare.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Link: https://patch.msgid.link/aFl7jpCNzscumuN2@debian.debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-24Merge tag 'selinux-pr-20250624' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "Another small SELinux patch to fix a problem seen by the dracut-ng folks during early boot when SELinux is enabled, but the policy has yet to be loaded" * tag 'selinux-pr-20250624' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: change security_compute_sid to return the ssid or tsid on match
2025-06-24vsock/uapi: fix linux/vm_sockets.h userspace compilation errorsStefano Garzarella
If a userspace application just include <linux/vm_sockets.h> will fail to build with the following errors: /usr/include/linux/vm_sockets.h:182:39: error: invalid application of ‘sizeof’ to incomplete type ‘struct sockaddr’ 182 | unsigned char svm_zero[sizeof(struct sockaddr) - | ^~~~~~ /usr/include/linux/vm_sockets.h:183:39: error: ‘sa_family_t’ undeclared here (not in a function) 183 | sizeof(sa_family_t) - | Include <sys/socket.h> for userspace (guarded by ifndef __KERNEL__) where `struct sockaddr` and `sa_family_t` are defined. We already do something similar in <linux/mptcp.h> and <linux/if.h>. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250623100053.40979-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-24bcachefs: fix bch2_journal_keys_peek_prev_min() underflowKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: Use wait_on_allocator() when allocating journalKent Overstreet
wait_on_allocator() emits debug info when we hang trying to allocate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24x86/traps: Initialize DR7 by writing its architectural reset valueXin Li (Intel)
Initialize DR7 by writing its architectural reset value to always set bit 10, which is reserved to '1', when "clearing" DR7 so as not to trigger unanticipated behavior if said bit is ever unreserved, e.g. as a feature enabling flag with inverted polarity. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Sean Christopherson <seanjc@google.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-3-xin%40zytor.com
2025-06-24x86/traps: Initialize DR6 by writing its architectural reset valueXin Li (Intel)
Initialize DR6 by writing its architectural reset value to avoid incorrectly zeroing DR6 to clear DR6.BLD at boot time, which leads to a false bus lock detected warning. The Intel SDM says: 1) Certain debug exceptions may clear bits 0-3 of DR6. 2) BLD induced #DB clears DR6.BLD and any other debug exception doesn't modify DR6.BLD. 3) RTM induced #DB clears DR6.RTM and any other debug exception sets DR6.RTM. To avoid confusion in identifying debug exceptions, debug handlers should set DR6.BLD and DR6.RTM, and clear other DR6 bits before returning. The DR6 architectural reset value 0xFFFF0FF0, already defined as macro DR6_RESERVED, satisfies these requirements, so just use it to reinitialize DR6 whenever needed. Since clear_all_debug_regs() no longer zeros all debug registers, rename it to initialize_debug_regs() to better reflect its current behavior. Since debug_read_clear_dr6() no longer clears DR6, rename it to debug_read_reset_dr6() to better reflect its current behavior. Fixes: ebb1064e7c2e9 ("x86/traps: Handle #DB for bus lock") Reported-by: Sohil Mehta <sohil.mehta@intel.com> Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/lkml/06e68373-a92b-472e-8fd9-ba548119770c@intel.com/ Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-2-xin%40zytor.com
2025-06-24KVM: selftests: Add back the missing check of MONITOR/MWAIT availabilityChenyi Qiang
The revamp of monitor/mwait test missed the original check of feature availability [*]. If MONITOR/MWAIT is not supported or is disabled by IA32_MISC_ENABLE on the host, executing MONITOR or MWAIT instruction from guest doesn't cause monitor/mwait VM exits, but a #UD. [*] https://lore.kernel.org/all/20240411210237.34646-1-zide.chen@intel.com/ Reported-by: Xuelian Guo <xuelian.guo@intel.com> Fixes: 80fd663590cf ("selftests: kvm: revamp MONITOR/MWAIT tests") Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250620062219.342930-1-chenyi.qiang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-24bcachefs: Check for bad write buffer key when moving from journalKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blockedAlan Huang
Reported-by: syzbot+d540192e763531d307ff@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24KVM: Allow CPU to reschedule while setting per-page memory attributesLiam Merwick
When running an SEV-SNP guest with a sufficiently large amount of memory (1TB+), the host can experience CPU soft lockups when running an operation in kvm_vm_set_mem_attributes() to set memory attributes on the whole range of guest memory. watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [qemu-kvm:6372] CPU: 8 UID: 0 PID: 6372 Comm: qemu-kvm Kdump: loaded Not tainted 6.15.0-rc7.20250520.el9uek.rc1.x86_64 #1 PREEMPT(voluntary) Hardware name: Oracle Corporation ORACLE SERVER E4-2c/Asm,MB Tray,2U,E4-2c, BIOS 78016600 11/13/2024 RIP: 0010:xas_create+0x78/0x1f0 Code: 00 00 00 41 80 fc 01 0f 84 82 00 00 00 ba 06 00 00 00 bd 06 00 00 00 49 8b 45 08 4d 8d 65 08 41 39 d6 73 20 83 ed 06 48 85 c0 <74> 67 48 89 c2 83 e2 03 48 83 fa 02 75 0c 48 3d 00 10 00 00 0f 87 RSP: 0018:ffffad890a34b940 EFLAGS: 00000286 RAX: ffff96f30b261daa RBX: ffffad890a34b9c8 RCX: 0000000000000000 RDX: 000000000000001e RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffad890a356868 R13: ffffad890a356860 R14: 0000000000000000 R15: ffffad890a356868 FS: 00007f5578a2a400(0000) GS:ffff97ed317e1000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f015c70fb18 CR3: 00000001109fd006 CR4: 0000000000f70ef0 PKRU: 55555554 Call Trace: <TASK> xas_store+0x58/0x630 __xa_store+0xa5/0x130 xa_store+0x2c/0x50 kvm_vm_set_mem_attributes+0x343/0x710 [kvm] kvm_vm_ioctl+0x796/0xab0 [kvm] __x64_sys_ioctl+0xa3/0xd0 do_syscall_64+0x8c/0x7a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f5578d031bb Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 4c 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe0a742b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000000004020aed2 RCX: 00007f5578d031bb RDX: 00007ffe0a742c80 RSI: 000000004020aed2 RDI: 000000000000000b RBP: 0000010000000000 R08: 0000010000000000 R09: 0000017680000000 R10: 0000000000000080 R11: 0000000000000246 R12: 00005575e5f95120 R13: 00007ffe0a742c80 R14: 0000000000000008 R15: 00005575e5f961e0 While looping through the range of memory setting the attributes, call cond_resched() to give the scheduler a chance to run a higher priority task on the runqueue if necessary and avoid staying in kernel mode long enough to trigger the lockup. Fixes: 5a475554db1e ("KVM: Introduce per-page memory attributes") Cc: stable@vger.kernel.org # 6.12.x Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Link: https://lore.kernel.org/r/20250609091121.2497429-2-liam.merwick@oracle.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-24KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.David Woodhouse
To avoid imposing an ordering constraint on userspace, allow 'invalid' event channel targets to be configured in the IRQ routing table. This is the same as accepting interrupts targeted at vCPUs which don't exist yet, which is already the case for both Xen event channels *and* for MSIs (which don't do any filtering of permitted APIC ID targets at all). If userspace actually *triggers* an IRQ with an invalid target, that will fail cleanly, as kvm_xen_set_evtchn_fast() also does the same range check. If KVM enforced that the IRQ target must be valid at the time it is *configured*, that would force userspace to create all vCPUs and do various other parts of setup (in this case, setting the Xen long_mode) before restoring the IRQ table. Cc: stable@vger.kernel.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/e489252745ac4b53f1f7f50570b03fb416aa2065.camel@infradead.org [sean: massage comment] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-24KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masksSean Christopherson
Use a preallocated per-vCPU bitmap for tracking the unpacked set of vCPUs being targeted for Hyper-V's paravirt TLB flushing. If KVM_MAX_NR_VCPUS is set to 4096 (which is allowed even for MAXSMP=n builds), putting the vCPU mask on-stack pushes kvm_hv_flush_tlb() past the default FRAME_WARN limit. arch/x86/kvm/hyperv.c:2001:12: error: stack frame size (1288) exceeds limit (1024) in 'kvm_hv_flush_tlb' [-Werror,-Wframe-larger-than] 2001 | static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc) | ^ 1 error generated. Note, sparse_banks was given the same treatment by commit 7d5e88d301f8 ("KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'"), for the exact same reason. Reported-by: Abinash Lalotra <abinashsinghlalotra@gmail.com> Closes: https://lore.kernel.org/all/20250613111023.786265-1-abinashsinghlalotra@gmail.com Link: https://lore.kernel.org/all/aEylI-O8kFnFHrOH@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-24KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULLSean Christopherson
When creating an SEV-ES vCPU for intra-host migration, set its vmsa_pa to INVALID_PAGE to harden against doing VMRUN with a bogus VMSA (KVM checks for a valid VMSA page in pre_sev_run()). Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lore.kernel.org/r/20250602224459.41505-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>