Age | Commit message (Collapse) | Author |
|
Acknowledge the byte count submitted by the target.
When I2C_SMBUS_BLOCK_DATA read operation is executed by
i2c_smbus_xfer_emulated(), the length of the second (read) message is set
to 1. Length of the block is supposed to be obtained from the target by the
underlying bus driver.
The i2c_imx_isr_read() function should emit the acknowledge on i2c bus
after reading the first byte (i.e., byte count) while processing such
message (as defined in Section 6.5.7 of System Management Bus
Specification [1]). Without this acknowledge, the target does not submit
subsequent bytes and the controller only reads 0xff's.
In addition, store the length of block data obtained from the target in
the buffer provided by i2c_smbus_xfer_emulated() - otherwise the first
byte of actual data is erroneously interpreted as length of the data
block.
[1] https://smbus.org/specs/SMBus_3_3_20240512.pdf
Fixes: 5f5c2d4579ca ("i2c: imx: prevent rescheduling in non dma mode")
Signed-off-by: Lukasz Kucharczyk <lukasz.kucharczyk@leica-geosystems.com>
Cc: <stable@vger.kernel.org> # v6.13+
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Carlos Song <carlos.song@nxp.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250520122252.1475403-1-lukasz.kucharczyk@leica-geosystems.com
|
|
The `name` field in `obj->externs` points into the BTF data at initial
open time. However, some functions may invalidate this after opening and
before loading (e.g. `bpf_map__set_value_size`), which results in
pointers into freed memory and undefined behavior.
The simplest solution is to simply `strdup` these strings, similar to
the `essent_name`, and free them at the same time.
In order to test this path, the `global_map_resize` BPF selftest is
modified slightly to ensure the presence of an extern, which causes this
test to fail prior to the fix. Given there isn't an obvious API or error
to test against, I opted to add this to the existing test as an aspect
of the resizing feature rather than duplicate the test.
Fixes: 9d0a23313b1a ("libbpf: Add capability for resizing datasec maps")
Signed-off-by: Adin Scannell <amscanne@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250625050215.2777374-1-amscanne@meta.com
|
|
cxl_find_rec_dram() is used to find a DRAM event record based on the
inputted attributes. Different repair_type of the inputted attributes
will check the DRAM event record in different ways.
When EDAC driver is performing a memory rank sparing, it should use
CXL_RANK_SPARING rather than CXL_BANK_SPARING as repair_type for DRAM
event record checking.
Fixes: 588ca944c277 ("cxl/edac: Add CXL memory device memory sparing control feature")
Signed-off-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://patch.msgid.link/20250620052924.138892-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One fix for a runtime PM underflow when removing the Cadence QuadSPI
driver"
* tag 'spi-fix-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-cadence-quadspi: Fix pm runtime unbalance
|
|
The generate '[FAILED TO PARSE]' strings in trace-cmd report output like this:
rm-5298 [001] 6084.533748493: smb3_exit_err: [FAILED TO PARSE] xid=972 func_name=cifs_rmdir rc=-39
rm-5298 [001] 6084.533959234: smb3_enter: [FAILED TO PARSE] xid=973 func_name=cifs_closedir
rm-5298 [001] 6084.533967630: smb3_close_enter: [FAILED TO PARSE] xid=973 fid=94489281833 tid=1 sesid=96758029877361
rm-5298 [001] 6084.534004008: smb3_cmd_enter: [FAILED TO PARSE] tid=1 sesid=96758029877361 cmd=6 mid=566
rm-5298 [001] 6084.552248232: smb3_cmd_done: [FAILED TO PARSE] tid=1 sesid=96758029877361 cmd=6 mid=566
rm-5298 [001] 6084.552280542: smb3_close_done: [FAILED TO PARSE] xid=973 fid=94489281833 tid=1 sesid=96758029877361
rm-5298 [001] 6084.552316034: smb3_exit_done: [FAILED TO PARSE] xid=973 func_name=cifs_closedir
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Fixes all in drivers.
ufs and megaraid_sas are small and obvious.
The large diffstat in fnic comes from two pieces: the addition of
quite a bit of logging (no change to function) and the reworking of
the timeout allocation path for the two conditions that can occur
simultaneously to prevent reusing the same abort frame and then both
trying to free it"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Fix missing DMA mapping error in fnic_send_frame()
scsi: fnic: Set appropriate logging level for log message
scsi: fnic: Add and improve logs in FDMI and FDMI ABTS paths
scsi: fnic: Turn off FDMI ACTIVE flags on link down
scsi: fnic: Fix crash in fnic_wq_cmpl_handler when FDMI times out
scsi: ufs: core: Fix clk scaling to be conditional in reset and restore
scsi: megaraid_sas: Fix invalid node index
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML fixes from Johannes Berg:
- fix FP registers in seccomp mode
- prevent duplicate devices in VFIO support
- don't ignore errors in UBD thread start
- reduce stack use with clang 19
* tag 'uml-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: vector: Reduce stack usage in vector_eth_configure()
um: Use correct data source in fpregs_legacy_set()
um: vfio: Prevent duplicate device assignments
um: ubd: Add missing error check in start_io_thread()
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Just a few fixes:
- iwlegacy: work around large stack with clang/kasan
- mac80211: fix integer overflow
- mac80211: fix link struct init vs. RCU publish
- iwlwifi: fix warning on IFF_UP
* tag 'wireless-2025-06-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: finish link init before RCU publish
wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version
wifi: mac80211: fix beacon interval calculation overflow
wifi: iwlegacy: work around excessive stack usage on clang/kasan
====================
Link: https://patch.msgid.link/20250625115433.41381-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A bigger array of vecs could've been allocated, but
io_ring_buffers_peek() still decided to cap the mapped range depending
on how much data was available. Hence don't rely on the segment count
to know if the request should be marked as needing cleanup, always
check upfront if the iov array is different than the fast_iov array.
Fixes: 26ec15e4b0c1 ("io_uring/kbuf: don't truncate end buffer for multiple buffer peeks")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
transmit all data
We should not send smbdirect_data_transfer messages larger than
the negotiated max_send_size, typically 1364 bytes, which means
24 bytes of the smbdirect_data_transfer header + 1340 payload bytes.
This happened when doing an SMB2 write with more than 1340 bytes
(which is done inline as it's below rdma_readwrite_threshold).
It means the peer resets the connection.
When testing between cifs.ko and ksmbd.ko something like this
is logged:
client:
CIFS: VFS: RDMA transport re-established
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
siw: got TERMINATE. layer 1, type 2, code 2
CIFS: VFS: \\carina Send error in SessSetup = -11
smb2_reconnect: 12 callbacks suppressed
CIFS: VFS: reconnect tcon failed rc = -11
CIFS: VFS: reconnect tcon failed rc = -11
CIFS: VFS: reconnect tcon failed rc = -11
CIFS: VFS: SMB: Zero rsize calculated, using minimum value 65536
and:
CIFS: VFS: RDMA transport re-established
siw: got TERMINATE. layer 1, type 2, code 2
CIFS: VFS: smbd_recv:1894 disconnected
siw: got TERMINATE. layer 1, type 2, code 2
The ksmbd dmesg is showing things like:
smb_direct: Recv error. status='local length error (1)' opcode=128
smb_direct: disconnected
smb_direct: Recv error. status='local length error (1)' opcode=128
ksmbd: smb_direct: disconnected
ksmbd: sock_read failed: -107
As smbd_post_send_iter() limits the transmitted number of bytes
we need loop over it in order to transmit the whole iter.
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Meetakshi Setiya <msetiya@microsoft.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: <stable+noautosel@kernel.org> # sp->max_send_size should be info->max_send_size in backports
Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
By default, HPD was disabled on SN65DSI86 bridge. When the driver was
added (commit "a095f15c00e27"), the HPD_DISABLE bit was set in pre-enable
call which was moved to other function calls subsequently.
Later on, commit "c312b0df3b13" added detect utility for DP mode. But with
HPD_DISABLE bit set, all the HPD events are disabled[0] and the debounced
state always return 1 (always connected state).
Set HPD_DISABLE bit conditionally based on display sink's connector type.
Since the HPD_STATE is reflected correctly only after waiting for debounce
time (~100-400ms) and adding this delay in detect() is not feasible
owing to the performace impact (glitches and frame drop), remove runtime
calls in detect() and add hpd_enable()/disable() bridge hooks with runtime
calls, to detect hpd properly without any delay.
[0]: <https://www.ti.com/lit/gpn/SN65DSI86> (Pg. 32)
Fixes: c312b0df3b13 ("drm/bridge: ti-sn65dsi86: Implement bridge connector operations for DP")
Cc: Max Krummenacher <max.krummenacher@toradex.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Jayesh Choudhary <j-choudhary@ti.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250624044835.165708-1-j-choudhary@ti.com
|
|
Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD Zen-based
SOCs has an Address Mask and a Secondary Address Mask register associated with
it. The amd64_edac module logs DIMM sizes on a per-UMC per-CS granularity
during init using these two registers.
Currently, the module primarily considers only the Address Mask register for
computing DIMM sizes. The Secondary Address Mask register is only considered
for odd CS. Additionally, if it has been considered, the Address Mask register
is ignored altogether for that CS. For power-of-two DIMMs i.e. DIMMs whose
total capacity is a power of two (32GB, 64GB, etc), this is not an issue
since only the Address Mask register is used.
For non-power-of-two DIMMs i.e., DIMMs whose total capacity is not a power of
two (48GB, 96GB, etc), however, the Secondary Address Mask register is used
in conjunction with the Address Mask register. However, since the module only
considers either of the two registers for a CS, the size computed by the
module is incorrect. The Secondary Address Mask register is not considered for
even CS, and the Address Mask register is not considered for odd CS.
Introduce a new helper function so that both Address Mask and Secondary
Address Mask registers are considered, when valid, for computing DIMM sizes.
Furthermore, also rename some variables for greater clarity.
Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
Reported-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250529205013.403450-1-avadhut.naik@amd.com
|
|
Replace the return statement with setting ret = -EINVAL and jumping to
the err label to ensure resources are released via io_release_dmabuf.
Fixes: a5c98e942457 ("io_uring/zcrx: dmabuf backed zerocopy receive")
Signed-off-by: Penglei Jiang <superman.xpt@gmail.com>
Link: https://lore.kernel.org/r/20250625102703.68336-1-superman.xpt@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ASUS store the board name in DMI_PRODUCT_NAME rather than
DMI_PRODUCT_VERSION. (Apparently it is only Lenovo that stores the
model-name in DMI_PRODUCT_VERSION.)
Use the correct DMI identifier, DMI_PRODUCT_NAME, to match the
ASUSPRO-D840SA board, such that the quirk actually gets applied.
Cc: stable@vger.kernel.org
Reported-by: Andy Yang <andyybtc79@gmail.com>
Tested-by: Andy Yang <andyybtc79@gmail.com>
Closes: https://lore.kernel.org/linux-ide/aFb3wXAwJSSJUB7o@ryzen/
Fixes: b5acc3628898 ("ata: ahci: Disallow LPM for ASUSPRO-D840SA motherboard")
Reviewed-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250624074029.963028-2-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
Do not leave host with dangling ->mrq pointer if we hit
the msdc_prepare_data() error out path.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: f5de469990f1 ("mtk-sd: Prevent memory corruption from DMA map failure")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250625052106.584905-1-senozhatsky@chromium.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
same ID (103) was assigned to both GDC_BANK0_G_RSE_PIPE_CACHE_DATA0
and GDC_BANK0_G_RSE_PIPE_CACHE_DATA1. This could lead to incorrect
event mapping.
Updated the ID to 104 to ensure uniqueness.
Fixes: 423c3361855c ("platform/mellanox: mlxbf-pmc: Add support for BlueField-3")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20250619060502.3594350-1-alok.a.tiwari@oracle.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
2025 Thinkpads F11 key launch the Intel Unison app on Windows,
which does some sort of smart sharing between laptop and phone.
Map this key event to KEY_LINK_PHONE as the closest thing we have.
This prevents an error message being displayed on key press.
Reported-by: Damjan Georgievski <gdamjan@gmail.com>
Closes: https://sourceforge.net/p/ibm-acpi/mailman/message/59189556/
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250620181119.2519546-1-mpearson-lenovo@squebb.ca
[ij: converted directory to pre-lenovo move as this is fixes material.]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Add a DMI quirk entry for the ASUS Zenbook Duo UX8406CA 2025 model to use
the existing zenbook duo keyboard quirk.
Signed-off-by: Rahul Chandra <rahul@chandra.net>
Link: https://lore.kernel.org/r/20250624073301.602070-1-rahul@chandra.net
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Add 0x29 as the accelerometer address for the Dell Latitude 5500 to
lis3lv02d_devices[].
The address was verified as below:
$ cd /sys/bus/pci/drivers/i801_smbus/0000:00:1f.4
$ ls -d i2c-?
i2c-2
$ sudo modprobe i2c-dev
$ sudo i2cdetect 2
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-2.
I will probe address range 0x08-0x77.
Continue? [Y/n] Y
0 1 2 3 4 5 6 7 8 9 a b c d e f
00: 08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- 29 -- -- -- -- -- --
30: 30 -- -- -- -- 35 UU UU -- -- -- -- -- -- -- --
40: -- -- -- -- 44 -- -- -- -- -- -- -- -- -- -- --
50: UU -- 52 -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
$ echo lis3lv02d 0x29 | sudo tee /sys/bus/i2c/devices/i2c-2/new_device
lis3lv02d 0x29
$ sudo dmesg
[ 0.000000] Linux version 6.12.32-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.32-1 (2025-06-07)
[…]
[ 0.000000] DMI: Dell Inc. Latitude 5500/0M14W7, BIOS 1.38.0 03/06/2025
[…]
[ 609.063488] i2c_dev: i2c /dev entries driver
[ 639.135020] i2c i2c-2: new_device: Instantiated device lis3lv02d at 0x29
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20250622080721.4661-1-pmenzel@molgen.mpg.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
Miri Korenblit says:
====================
iwlwifi-fixes: fix failure in interface up
====================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Always enable vport loopback for both MPV devices on driver start.
Previously in some cases related to MPV RoCE, packets weren't correctly
executing loopback check at vport in FW, since it was disabled.
Due to complexity of identifying such cases for MPV always enable vport
loopback for both GVMIs when binding the slave to the master port.
Fixes: 0042f9e458a5 ("RDMA/mlx5: Enable vport loopback when user context or QP mandate")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/d4298f5ebb2197459e9e7221c51ecd6a34699847.1750064969.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In case, CC counters are querying for the second port use the correct
core device for the query instead of always using the master core device.
Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/9cace74dcf106116118bebfa9146d40d4166c6b0.1750064969.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
To get the device HW counters, a non-representor switchdev device
should use the mlx5_ib_query_q_counters() function and query all of
the available counters. While a representor device in switchdev mode
should use the mlx5_ib_query_q_counters_vport() function and query only
the Q_Counters without the PPCNT counters and congestion control counters,
since they aren't relevant for a representor device.
Currently a non-representor switchdev device skips querying the PPCNT
counters and congestion control counters, leaving them unupdated.
Fix that by properly querying those counters for non-representor devices.
Fixes: d22467a71ebe ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Link: https://patch.msgid.link/56bf8af4ca8c58e3fb9f7e47b1dca2009eeeed81.1750064969.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Following the fix in the previous commit ("IB/mlx5: Fix potential
deadlock in MR deregistration"), teach lockdep explicitly about the
locking order between fs_reclaim and umem_mutex.
The previous commit resolved a potential deadlock scenario where
kzalloc(GFP_KERNEL) was called while holding umem_mutex, which could
lead to reclaim and eventually invoke the MMU notifier
(mlx5_ib_invalidate_range()), causing a recursive acquisition of
umem_mutex.
To prevent such issues from reoccurring unnoticed in future code
changes, add a lockdep annotation in ib_init_umem_odp() that simulates
taking umem_mutex inside a reclaim context. This makes lockdep aware
of this locking dependency and ensures that future violations—such as
calling kzalloc() or any memory allocator that may enter reclaim while
holding umem_mutex—will immediately raise a lockdep warning.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/9d31b9d8fe1db648a9f47cec3df6b8463319dee5.1750061698.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The issue arises when kzalloc() is invoked while holding umem_mutex or
any other lock acquired under umem_mutex. This is problematic because
kzalloc() can trigger fs_reclaim_aqcuire(), which may, in turn, invoke
mmu_notifier_invalidate_range_start(). This function can lead to
mlx5_ib_invalidate_range(), which attempts to acquire umem_mutex again,
resulting in a deadlock.
The problematic flow:
CPU0 | CPU1
---------------------------------------|------------------------------------------------
mlx5_ib_dereg_mr() |
→ revoke_mr() |
→ mutex_lock(&umem_odp->umem_mutex) |
| mlx5_mkey_cache_init()
| → mutex_lock(&dev->cache.rb_lock)
| → mlx5r_cache_create_ent_locked()
| → kzalloc(GFP_KERNEL)
| → fs_reclaim()
| → mmu_notifier_invalidate_range_start()
| → mlx5_ib_invalidate_range()
| → mutex_lock(&umem_odp->umem_mutex)
→ cache_ent_find_and_store() |
→ mutex_lock(&dev->cache.rb_lock) |
Additionally, when kzalloc() is called from within
cache_ent_find_and_store(), we encounter the same deadlock due to
re-acquisition of umem_mutex.
Solve by releasing umem_mutex in dereg_mr() after umr_revoke_mr()
and before acquiring rb_lock. This ensures that we don't hold
umem_mutex while performing memory allocations that could trigger
the reclaim path.
This change prevents the deadlock by ensuring proper lock ordering and
avoiding holding locks during memory allocation operations that could
trigger the reclaim path.
The following lockdep warning demonstrates the deadlock:
python3/20557 is trying to acquire lock:
ffff888387542128 (&umem_odp->umem_mutex){+.+.}-{4:4}, at:
mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib]
but task is already holding lock:
ffffffff82f6b840 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at:
unmap_vmas+0x7b/0x1a0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
fs_reclaim_acquire+0x60/0xd0
mem_cgroup_css_alloc+0x6f/0x9b0
cgroup_init_subsys+0xa4/0x240
cgroup_init+0x1c8/0x510
start_kernel+0x747/0x760
x86_64_start_reservations+0x25/0x30
x86_64_start_kernel+0x73/0x80
common_startup_64+0x129/0x138
-> #2 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0x91/0xd0
__kmalloc_cache_noprof+0x4d/0x4c0
mlx5r_cache_create_ent_locked+0x75/0x620 [mlx5_ib]
mlx5_mkey_cache_init+0x186/0x360 [mlx5_ib]
mlx5_ib_stage_post_ib_reg_umr_init+0x3c/0x60 [mlx5_ib]
__mlx5_ib_add+0x4b/0x190 [mlx5_ib]
mlx5r_probe+0xd9/0x320 [mlx5_ib]
auxiliary_bus_probe+0x42/0x70
really_probe+0xdb/0x360
__driver_probe_device+0x8f/0x130
driver_probe_device+0x1f/0xb0
__driver_attach+0xd4/0x1f0
bus_for_each_dev+0x79/0xd0
bus_add_driver+0xf0/0x200
driver_register+0x6e/0xc0
__auxiliary_driver_register+0x6a/0xc0
do_one_initcall+0x5e/0x390
do_init_module+0x88/0x240
init_module_from_file+0x85/0xc0
idempotent_init_module+0x104/0x300
__x64_sys_finit_module+0x68/0xc0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #1 (&dev->cache.rb_lock){+.+.}-{4:4}:
__mutex_lock+0x98/0xf10
__mlx5_ib_dereg_mr+0x6f2/0x890 [mlx5_ib]
mlx5_ib_dereg_mr+0x21/0x110 [mlx5_ib]
ib_dereg_mr_user+0x85/0x1f0 [ib_core]
uverbs_free_mr+0x19/0x30 [ib_uverbs]
destroy_hw_idr_uobject+0x21/0x80 [ib_uverbs]
uverbs_destroy_uobject+0x60/0x3d0 [ib_uverbs]
uobj_destroy+0x57/0xa0 [ib_uverbs]
ib_uverbs_cmd_verbs+0x4d5/0x1210 [ib_uverbs]
ib_uverbs_ioctl+0x129/0x230 [ib_uverbs]
__x64_sys_ioctl+0x596/0xaa0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #0 (&umem_odp->umem_mutex){+.+.}-{4:4}:
__lock_acquire+0x1826/0x2f00
lock_acquire+0xd3/0x2e0
__mutex_lock+0x98/0xf10
mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib]
__mmu_notifier_invalidate_range_start+0x18e/0x1f0
unmap_vmas+0x182/0x1a0
exit_mmap+0xf3/0x4a0
mmput+0x3a/0x100
do_exit+0x2b9/0xa90
do_group_exit+0x32/0xa0
get_signal+0xc32/0xcb0
arch_do_signal_or_restart+0x29/0x1d0
syscall_exit_to_user_mode+0x105/0x1d0
do_syscall_64+0x79/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Chain exists of:
&dev->cache.rb_lock --> mmu_notifier_invalidate_range_start -->
&umem_odp->umem_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&umem_odp->umem_mutex);
lock(mmu_notifier_invalidate_range_start);
lock(&umem_odp->umem_mutex);
lock(&dev->cache.rb_lock);
*** DEADLOCK ***
Fixes: abb604a1a9c8 ("RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/3c8f225a8a9fade647d19b014df1172544643e4a.1750061612.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When compiling with clang (19.1.7), initializing *vp using a compound
literal may result in excessive stack usage. Fix it by initializing the
required fields of *vp individually.
Without this patch:
$ objdump -d arch/um/drivers/vector_kern.o | ./scripts/checkstack.pl x86_64 0
...
0x0000000000000540 vector_eth_configure [vector_kern.o]:1472
...
With this patch:
$ objdump -d arch/um/drivers/vector_kern.o | ./scripts/checkstack.pl x86_64 0
...
0x0000000000000540 vector_eth_configure [vector_kern.o]:208
...
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506221017.WtB7Usua-lkp@intel.com/
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250623110829.314864-1-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Read from the buffer pointed to by 'from' instead of '&buf', as
'buf' contains no valid data when 'ubuf' is NULL.
Fixes: b1e1bd2e6943 ("um: Add helper functions to get/set state for SECCOMP")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250606124428.148164-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Ensure devices are assigned only once. Reject subsequent requests
for duplicate assignments.
Fixes: a0e2cb6a9063 ("um: Add VFIO-based virtual PCI driver")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250606124428.148164-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The subsequent call to os_set_fd_block() overwrites the previous
return value. OR the two return values together to fix it.
Fixes: f88f0bdfc32f ("um: UBD Improvements")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250606124428.148164-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
An earlier patch fixed a build failure with clang, but I still see the
same problem with some configurations using gcc:
drivers/gpu/drm/i915/i915_pmu.c: In function 'config_mask':
include/linux/compiler_types.h:568:38: error: call to '__compiletime_assert_462' declared with attribute error: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1
drivers/gpu/drm/i915/i915_pmu.c:116:3: note: in expansion of macro 'BUILD_BUG_ON'
116 | BUILD_BUG_ON(bit >
As I understand it, the problem is that the function is not always fully
inlined, but the __builtin_constant_p() can still evaluate the argument
as being constant.
Marking it as __always_inline so far works for me in all configurations.
Fixes: a7137b1825b5 ("drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled")
Fixes: a644fde77ff7 ("drm/i915/pmu: Change bitmask of enabled events to u32")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250620111824.3395007-1-arnd@kernel.org
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit ef69f9dd1cd7301cdf04ba326ed28152a3affcf6)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
The find_substream() call may return NULL, but the error path
dereferenced 'subs' unconditionally via dev_err(&subs->dev->dev, ...),
causing a NULL pointer dereference when subs is NULL.
Fix by switching to &uadev[idx].udev->dev which is always valid
in this context.
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Link: https://patch.msgid.link/86ac2939273ac853535049e60391c09d7688714e.1750755508.git.xiaopei01@kylinos.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
There is no guaranteed alignment for user pointers. Don't use mask
trickery and adjust the offset by bv_offset.
Cc: stable@vger.kernel.org
Reported-by: David Hildenbrand <david@redhat.com>
Fixes: 9ef4cbbcb4ac3 ("io_uring: add infra for importing vectored reg buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/io-uring/19530391f5c361a026ac9b401ff8e123bde55d98.1750771718.git.asml.silence@gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is no guaranteed alignment for user pointers, however the
calculation of an offset of the first page into a folio after coalescing
uses some weird bit mask logic, get rid of it.
Cc: stable@vger.kernel.org
Reported-by: David Hildenbrand <david@redhat.com>
Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/io-uring/e387b4c78b33f231105a601d84eefd8301f57954.1750771718.git.asml.silence@gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
syzbot complains about an unmapping failure:
[ 108.070381][ T14] kernel BUG at mm/gup.c:71!
[ 108.070502][ T14] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 108.123672][ T14] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20250221-8.fc42 02/21/2025
[ 108.127458][ T14] Workqueue: iou_exit io_ring_exit_work
[ 108.174205][ T14] Call trace:
[ 108.175649][ T14] sanity_check_pinned_pages+0x7cc/0x7d0 (P)
[ 108.178138][ T14] unpin_user_page+0x80/0x10c
[ 108.180189][ T14] io_release_ubuf+0x84/0xf8
[ 108.182196][ T14] io_free_rsrc_node+0x250/0x57c
[ 108.184345][ T14] io_rsrc_data_free+0x148/0x298
[ 108.186493][ T14] io_sqe_buffers_unregister+0x84/0xa0
[ 108.188991][ T14] io_ring_ctx_free+0x48/0x480
[ 108.191057][ T14] io_ring_exit_work+0x764/0x7d8
[ 108.193207][ T14] process_one_work+0x7e8/0x155c
[ 108.195431][ T14] worker_thread+0x958/0xed8
[ 108.197561][ T14] kthread+0x5fc/0x75c
[ 108.199362][ T14] ret_from_fork+0x10/0x20
We can pin a tail page of a folio, but then io_uring will try to unpin
the head page of the folio. While it should be fine in terms of keeping
the page actually alive, mm folks say it's wrong and triggers a debug
warning. Use unpin_user_folio() instead of unpin_user_page*.
Cc: stable@vger.kernel.org
Debugged-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+1d335893772467199ab6@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/683f1551.050a0220.55ceb.0017.GAE@google.com
Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/io-uring/a28b0f87339ac2acf14a645dad1e95bbcbf18acd.1750771718.git.asml.silence@gmail.com/
[axboe: adapt to current tree, massage commit message]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If ublk_get_data() fails, -EIOCBQUEUED is returned and the current command
becomes ASYNC. And the only reason is that mapping data can't move on,
because of no enough pages or pending signal, then the current ublk request
has to be requeued.
Once the request need to be requeued, we have to setup `ublk_io` correctly,
including io->cmd and flags, otherwise the request may not be forwarded to
ublk server successfully.
Fixes: 9810362a57cb ("ublk: don't call ublk_dispatch_req() for NEED_GET_DATA")
Reported-by: Changhui Zhong <czhong@redhat.com>
Closes: https://lore.kernel.org/linux-block/CAGVVp+VN9QcpHUz_0nasFf5q9i1gi8H8j-G-6mkBoqa3TyjRHA@mail.gmail.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Link: https://lore.kernel.org/r/20250624104121.859519-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
UBLK_F_SUPPORT_ZERO_COPY has a very old comment describing the initial
idea for how zero-copy would be implemented. The actual implementation
added in commit 1f6540e2aabb ("ublk: zc register/unregister bvec") uses
io_uring registered buffers rather than shared memory mapping.
Remove the inaccurate remarks about mapping ublk request memory into the
ublk server's address space and requiring 4K block size. Replace them
with a description of the current zero-copy mechanism.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250621171015.354932-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When a C++ file compiled with -Wc++11-narrowing includes the UAPI header
linux/ublk_cmd.h, ublk_sqe_addr_to_auto_buf_reg()'s assignments of u64
values to u8, u16, and u32 fields result in compiler warnings. Add
explicit casts to the intended types to avoid these warnings. Drop the
unnecessary bitmasks.
Reported-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250621162842.337452-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't use same backing file for more than one ublk devices, and avoid
concurrent write on same file from more ublk disks.
Fixes: 8ccebc19ee3d ("selftests: ublk: support UBLK_F_AUTO_BUF_REG")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250623011934.741788-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ublk_queue_cmd_list() dispatches the whole batch list by scheduling task
work via the tail request's io_uring_cmd, this way is fine even though
more than one io_ring_ctx are involved for this batch since it is just
one running context.
However, the task work handler ublk_cmd_list_tw_cb() takes `issue_flags`
of tail uring_cmd's io_ring_ctx for completing all commands. This way is
wrong if any uring_cmd is issued from different io_ring_ctx.
Fixes it by always building batch IOs from same io_ring_ctx and io task
because ublk_dispatch_req() does validate task context, and IO needs to
be aborted in case of running from fallback task work context.
For typical per-queue or per-io daemon implementation, this way shouldn't
make difference from performance viewpoint, because single io_ring_ctx is
taken in each daemon for normal use case.
Fixes: d796cea7b9f3 ("ublk: implement ->queue_rqs()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250625022554.883571-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Change "resourse" into "resource" in the name of a sysfs attribute.
Fixes: d829fc8a1058 ("scsi: ufs: sysfs: unit descriptor")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250624181658.336035-1-bvanassche@acm.org
Reviewed-by: Avri Altman <avri.altman@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The virt_boundary_mask limit requires an unlimited max_segment_size for
bio splitting to not corrupt data. Historically, the block layer tried
to validate this, although the check was half-hearted until the addition
of the atomic queue limits API. The full blown check then triggered
issues with stacked devices incorrectly inheriting limits such as the
virt boundary and got disabled in commit b561ea56a264 ("block: allow
device to have both virt_boundary_mask and max segment size") instead of
fixing the issue properly.
Ensure that the SCSI mid layer doesn't set the default low
max_segment_size limit for this case, and check for invalid
max_segment_size values in the host template, similar to the original
block layer check given that SCSI devices can't be stacked.
This fixes reported data corruption on storvsc, although as far as I can
tell storvsc always failed to properly set the max_segment_size limit as
the SCSI APIs historically applied that when setting up the host, while
storvsc only set the virt_boundary_mask when configuring the scsi_device.
Fixes: 81988a0e6b03 ("storvsc: get rid of bounce buffer")
Fixes: b561ea56a264 ("block: allow device to have both virt_boundary_mask and max segment size")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250624125233.219635-3-hch@lst.de
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
virt_boundary_mask implies an unlimited max_segment_size. Setting both
can lead to data corruption because __blk_rq_map_sg() can split requests
so that the virt_boundary_mask is not respected if max_segment_size is
not UINT_MAX.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250624125233.219635-2-hch@lst.de
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
sd_read_block_limits_ext() currently assumes that vpd->len excludes the
size of the page header. However, vpd->len describes the size of the entire
VPD page, therefore the sanity check is incorrect.
In practice this is not really a problem since we don't attach VPD
pages unless they actually report data trailing the header. But fix
the length check regardless.
This issue was identified by Wukong-Agent (formerly Tencent Woodpecker), a
code security AI agent, through static code analysis.
[mkp: rewrote patch description]
Signed-off-by: jackysliu <1972843537@qq.com>
Link: https://lore.kernel.org/r/tencent_ADA5210D1317EEB6CD7F3DE9FE9DA4591D05@qq.com
Fixes: 96b171d6dba6 ("scsi: core: Query the Block Limits Extension VPD page")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We encountered following crash when testing a XDP_REDIRECT feature
in production:
[56251.579676] list_add corruption. next->prev should be prev (ffff93120dd40f30), but was ffffb301ef3a6740. (next=ffff93120dd
40f30).
[56251.601413] ------------[ cut here ]------------
[56251.611357] kernel BUG at lib/list_debug.c:29!
[56251.621082] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[56251.632073] CPU: 111 UID: 0 PID: 0 Comm: swapper/111 Kdump: loaded Tainted: P O 6.12.33-cloudflare-2025.6.
3 #1
[56251.653155] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[56251.663877] Hardware name: MiTAC GC68B-B8032-G11P6-GPU/S8032GM-HE-CFR, BIOS V7.020.B10-sig 01/22/2025
[56251.682626] RIP: 0010:__list_add_valid_or_report+0x4b/0xa0
[56251.693203] Code: 0e 48 c7 c7 68 e7 d9 97 e8 42 16 fe ff 0f 0b 48 8b 52 08 48 39 c2 74 14 48 89 f1 48 c7 c7 90 e7 d9 97 48
89 c6 e8 25 16 fe ff <0f> 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 e8 e7 d9 97 4c 89
[56251.725811] RSP: 0018:ffff93120dd40b80 EFLAGS: 00010246
[56251.736094] RAX: 0000000000000075 RBX: ffffb301e6bba9d8 RCX: 0000000000000000
[56251.748260] RDX: 0000000000000000 RSI: ffff9149afda0b80 RDI: ffff9149afda0b80
[56251.760349] RBP: ffff9131e49c8000 R08: 0000000000000000 R09: ffff93120dd40a18
[56251.772382] R10: ffff9159cf2ce1a8 R11: 0000000000000003 R12: ffff911a80850000
[56251.784364] R13: ffff93120fbc7000 R14: 0000000000000010 R15: ffff9139e7510e40
[56251.796278] FS: 0000000000000000(0000) GS:ffff9149afd80000(0000) knlGS:0000000000000000
[56251.809133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56251.819561] CR2: 00007f5e85e6f300 CR3: 00000038b85e2006 CR4: 0000000000770ef0
[56251.831365] PKRU: 55555554
[56251.838653] Call Trace:
[56251.845560] <IRQ>
[56251.851943] cpu_map_enqueue.cold+0x5/0xa
[56251.860243] xdp_do_redirect+0x2d9/0x480
[56251.868388] bnxt_rx_xdp+0x1d8/0x4c0 [bnxt_en]
[56251.877028] bnxt_rx_pkt+0x5f7/0x19b0 [bnxt_en]
[56251.885665] ? cpu_max_write+0x1e/0x100
[56251.893510] ? srso_alias_return_thunk+0x5/0xfbef5
[56251.902276] __bnxt_poll_work+0x190/0x340 [bnxt_en]
[56251.911058] bnxt_poll+0xab/0x1b0 [bnxt_en]
[56251.919041] ? srso_alias_return_thunk+0x5/0xfbef5
[56251.927568] ? srso_alias_return_thunk+0x5/0xfbef5
[56251.935958] ? srso_alias_return_thunk+0x5/0xfbef5
[56251.944250] __napi_poll+0x2b/0x160
[56251.951155] bpf_trampoline_6442548651+0x79/0x123
[56251.959262] __napi_poll+0x5/0x160
[56251.966037] net_rx_action+0x3d2/0x880
[56251.973133] ? srso_alias_return_thunk+0x5/0xfbef5
[56251.981265] ? srso_alias_return_thunk+0x5/0xfbef5
[56251.989262] ? __hrtimer_run_queues+0x162/0x2a0
[56251.996967] ? srso_alias_return_thunk+0x5/0xfbef5
[56252.004875] ? srso_alias_return_thunk+0x5/0xfbef5
[56252.012673] ? bnxt_msix+0x62/0x70 [bnxt_en]
[56252.019903] handle_softirqs+0xcf/0x270
[56252.026650] irq_exit_rcu+0x67/0x90
[56252.032933] common_interrupt+0x85/0xa0
[56252.039498] </IRQ>
[56252.044246] <TASK>
[56252.048935] asm_common_interrupt+0x26/0x40
[56252.055727] RIP: 0010:cpuidle_enter_state+0xb8/0x420
[56252.063305] Code: dc 01 00 00 e8 f9 79 3b ff e8 64 f7 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 a5 32 3a ff 45 84 ff 0f 85 ae
01 00 00 fb 45 85 f6 <0f> 88 88 01 00 00 48 8b 04 24 49 63 ce 4c 89 ea 48 6b f1 68 48 29
[56252.088911] RSP: 0018:ffff93120c97fe98 EFLAGS: 00000202
[56252.096912] RAX: ffff9149afd80000 RBX: ffff9141d3a72800 RCX: 0000000000000000
[56252.106844] RDX: 00003329176c6b98 RSI: ffffffe36db3fdc7 RDI: 0000000000000000
[56252.116733] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000000000004e
[56252.126652] R10: ffff9149afdb30c4 R11: 071c71c71c71c71c R12: ffffffff985ff860
[56252.136637] R13: 00003329176c6b98 R14: 0000000000000002 R15: 0000000000000000
[56252.146667] ? cpuidle_enter_state+0xab/0x420
[56252.153909] cpuidle_enter+0x2d/0x40
[56252.160360] do_idle+0x176/0x1c0
[56252.166456] cpu_startup_entry+0x29/0x30
[56252.173248] start_secondary+0xf7/0x100
[56252.179941] common_startup_64+0x13e/0x141
[56252.186886] </TASK>
From the crash dump, we found that the cpu_map_flush_list inside
redirect info is partially corrupted: its list_head->next points to
itself, but list_head->prev points to a valid list of unflushed bq
entries.
This turned out to be a result of missed XDP flush on redirect lists. By
digging in the actual source code, we found that
commit 7f0a168b0441 ("bnxt_en: Add completion ring pointer in TX and RX
ring structures") incorrectly overwrites the event mask for XDP_REDIRECT
in bnxt_rx_xdp. We can stably reproduce this crash by returning XDP_TX
and XDP_REDIRECT randomly for incoming packets in a naive XDP program.
Properly propagate the XDP_REDIRECT events back fixes the crash.
Fixes: a7559bc8c17c ("bnxt: support transmit and free of aggregation buffers")
Tested-by: Andrew Rzeznik <arzeznik@cloudflare.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Link: https://patch.msgid.link/aFl7jpCNzscumuN2@debian.debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"Another small SELinux patch to fix a problem seen by the dracut-ng
folks during early boot when SELinux is enabled, but the policy has
yet to be loaded"
* tag 'selinux-pr-20250624' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: change security_compute_sid to return the ssid or tsid on match
|
|
If a userspace application just include <linux/vm_sockets.h> will fail
to build with the following errors:
/usr/include/linux/vm_sockets.h:182:39: error: invalid application of ‘sizeof’ to incomplete type ‘struct sockaddr’
182 | unsigned char svm_zero[sizeof(struct sockaddr) -
| ^~~~~~
/usr/include/linux/vm_sockets.h:183:39: error: ‘sa_family_t’ undeclared here (not in a function)
183 | sizeof(sa_family_t) -
|
Include <sys/socket.h> for userspace (guarded by ifndef __KERNEL__)
where `struct sockaddr` and `sa_family_t` are defined.
We already do something similar in <linux/mptcp.h> and <linux/if.h>.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Reported-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250623100053.40979-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
wait_on_allocator() emits debug info when we hang trying to allocate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Initialize DR7 by writing its architectural reset value to always set
bit 10, which is reserved to '1', when "clearing" DR7 so as not to
trigger unanticipated behavior if said bit is ever unreserved, e.g. as
a feature enabling flag with inverted polarity.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250620231504.2676902-3-xin%40zytor.com
|
|
Initialize DR6 by writing its architectural reset value to avoid
incorrectly zeroing DR6 to clear DR6.BLD at boot time, which leads
to a false bus lock detected warning.
The Intel SDM says:
1) Certain debug exceptions may clear bits 0-3 of DR6.
2) BLD induced #DB clears DR6.BLD and any other debug exception
doesn't modify DR6.BLD.
3) RTM induced #DB clears DR6.RTM and any other debug exception
sets DR6.RTM.
To avoid confusion in identifying debug exceptions, debug handlers
should set DR6.BLD and DR6.RTM, and clear other DR6 bits before
returning.
The DR6 architectural reset value 0xFFFF0FF0, already defined as
macro DR6_RESERVED, satisfies these requirements, so just use it to
reinitialize DR6 whenever needed.
Since clear_all_debug_regs() no longer zeros all debug registers,
rename it to initialize_debug_regs() to better reflect its current
behavior.
Since debug_read_clear_dr6() no longer clears DR6, rename it to
debug_read_reset_dr6() to better reflect its current behavior.
Fixes: ebb1064e7c2e9 ("x86/traps: Handle #DB for bus lock")
Reported-by: Sohil Mehta <sohil.mehta@intel.com>
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/lkml/06e68373-a92b-472e-8fd9-ba548119770c@intel.com/
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250620231504.2676902-2-xin%40zytor.com
|