Age | Commit message (Collapse) | Author |
|
A warning on driver removal started occurring after commit 9dd05df8403b
("net: warn if NAPI instance wasn't shut down"). Disable tx napi before
deleting it in mt76_dma_cleanup().
WARNING: CPU: 4 PID: 18828 at net/core/dev.c:7288 __netif_napi_del_locked+0xf0/0x100
CPU: 4 UID: 0 PID: 18828 Comm: modprobe Not tainted 6.15.0-rc4 #4 PREEMPT(lazy)
Hardware name: ASUS System Product Name/PRIME X670E-PRO WIFI, BIOS 3035 09/05/2024
RIP: 0010:__netif_napi_del_locked+0xf0/0x100
Call Trace:
<TASK>
mt76_dma_cleanup+0x54/0x2f0 [mt76]
mt7921_pci_remove+0xd5/0x190 [mt7921e]
pci_device_remove+0x47/0xc0
device_release_driver_internal+0x19e/0x200
driver_detach+0x48/0x90
bus_remove_driver+0x6d/0xf0
pci_unregister_driver+0x2e/0xb0
__do_sys_delete_module.isra.0+0x197/0x2e0
do_syscall_64+0x7b/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Tested with mt7921e but the same pattern can be actually applied to other
mt76 drivers calling mt76_dma_cleanup() during removal. Tx napi is enabled
in their *_dma_init() functions and only toggled off and on again inside
their suspend/resume/reset paths. So it should be okay to disable tx
napi in such a generic way.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 2ac515a5d74f ("mt76: mt76x02: use napi polling for tx cleanup")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Tested-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Link: https://patch.msgid.link/20250506115540.19045-1-pchelkin@ispras.ru
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
On our HDFS servers with 12 HDDs per server, a HDFS datanode[0] startup
involves scanning all files and caching their metadata (including dentries
and inodes) in memory. Each HDD contains approximately 2 million files,
resulting in a total of ~20 million cached dentries after initialization.
To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite
this configuration, memory pressure conditions can still trigger
reclamation of up to 50% of cached dentries, reducing the cache from 20
million to approximately 10 million entries. During the subsequent cache
rebuild period, any HDFS datanode restart operation incurs substantial
latency penalties until full cache recovery completes.
To maintain service stability, we need to preserve more dentries during
memory reclamation. The current minimum reclaim ratio (1/100 of total
dentries) remains too aggressive for our workload. This patch introduces
vfs_cache_pressure_denom for more granular cache pressure control. The
configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000]
effectively maintains the full 20 million dentry cache under memory
pressure, preventing datanode restart performance degradation.
Link: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes [0]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/20250511083624.9305-1-laoar.shao@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When getting the directory contents, the entries are first fetched to a
kernel buffer, then they are copied to userspace with dir_emit(). This
second phase is non-blocking as long as the userspace buffer is not paged
out, making it interruptible makes zero sense.
Overload d_type as flags, since it only uses 4 bits from 32.
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/20250513112335.1473177-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Both packet and crypto offloads perform decryption while packet is
arriving to the HW from the wire. It means that there is no possible
way to perform lookup on XFRM if_id as it can't be set to be "before' HW.
So instead of silently ignore this configuration, let's warn users about
misconfiguration.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Users can set any seq/seq_hi/oseq/oseq_hi values. The XFRM core code
doesn't prevent from them to set even 0xFFFFFFFF, however this value
will cause for traffic drop.
Is is happening because SEQ numbers here mean that packet with such
number was processed and next number should be sent on the wire. In this
case, the next number will be 0, and it means overflow which causes to
(expected) packet drops.
While it can be considered as misconfiguration and handled by XFRM
datapath in the same manner as any other SEQ number, let's add
validation to easy for packet offloads implementations which need to
configure HW with next SEQ to send and not with current SEQ like it is
done in core code.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
mt8183_is_volatile_reg() is a large switch-case block that lists out
every register that is volatile. Since many pairs of registers have
consecutive addresses, the cases can be compressed down with the
ellipsis, i.e. GCC extension "case ranges" [1] to cover more addresses
in one case, shortening the source code.
This is not completely the same, since the addresses are 4-byte aligned,
and using the case ranges feature adds all unaligned addresses in
between. In practice this doesn't matter since the unaligned addresses
are blocked by the regmap core. This also ends up compiling slightly
smaller with a reduction of 128 bytes in the text section.
[1] https://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250515073825.4155297-4-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The irq_data table describes all the supported interrupts for the audio
frontend. The parameters are either the same or can be derived from the
interrupt number. This results in a very long table (in source code)
that can be shortened with macros.
Do just that.
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250515073825.4155297-3-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The memif_data table describes all the supported PCM channels for the
audio frontend. Most of the fields are either the same or can be derived
from the interface's name. This results in a very long table (in source
code) that can be shortened with macros.
Do just that. Some "convenience" macros were added to cover non-existent
register fields that would otherwise require multiple layers of macros
to handle.
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250515073825.4155297-2-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add the 'iommus' property to the Tegra QSPI device tree binding.
The property is needed for Tegra234 when using the internal DMA
controller, and is not supported on other Tegra chips, as DMA is
handled by an external controller.
Signed-off-by: Vishwaroop A <va@nvidia.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20250513200043.608292-1-va@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add support for internal DMA in Tegra234 devices. Tegra234 has an
internal DMA controller, while Tegra241 continues to use an external
DMA controller (GPCDMA). This patch adds support for both internal
and external DMA controllers.
Signed-off-by: Vishwaroop A <va@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20250513200043.608292-2-va@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Change the port enable failure error message format specifier to make
it less confusing.
Take the chance to align the style ('fail'->'Failed') while at it.
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20250514-topic-asoc_print_hexdec-v1-1-85e90947ec4f@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
use too
Expose certain 'struct cpuinfo_x86' fields via asm-offsets for x86_64
too, so that it will be possible to set CPU capabilities from 64-bit
asm code.
32-bit already used these fields, so simply move those offset exports into
the unified asm-offsets.c file.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250514104242.1275040-12-ardb+git@google.com
|
|
In the current code, if an ROC is started on a vif that already has an
active ROC we reject it and warn.
But really there is no such limitation. The actual limitation is to not
have 2 ROCs of the same type simultaneously.
Add a helper function to find a vif that has an active ROC of a given
type, and only if one exist - reject the ROC.
This allows also to remove bss_roc_vif.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.1f8c55198578.I17cb191596ed4e97a4854108f8ca5ca197662a62@changeid
|
|
Various headers are required for these to build properly.
Include the needed files.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.956281013349.I4c537dffb82f5e5042e4a880cde3c6da38a56cbc@changeid
|
|
Context info was introduced in 22000, and was significantly changed in
ax210. The new version of context info was called 'gen3',
probably because in 22000, the gen2 transport was added.
But this name is just wrong:
- if 'gen' enumerates transports, there was not a gen3 transport, just a
few modifications to gen1/2 transports needed for ax210.
- if 'gen' enumerates devices, then we can just use the device names.
Also, context info will soon become a lib, agnostic of the transport
generations.
Simply replace 'gen3' with 'v2'.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.a580bd8d4f74.Ie413a02233f1a5ad538e13071c09760b9d97be3b@changeid
|
|
iwl_pcie_apply_destination is used in all generation.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.7eaf79a07226.I615cfd21001208b366c94a5fcb8e30a7d97f0ac2@changeid
|
|
Map iwl_context_info to the matching struct in the FW.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.a7240935006e.I75e2e13421b5dac2c1bdbd01fdfd34c38f2d3d8c@changeid
|
|
TFD_QUEUE_SIZE_MAX_GEN3 is not used, remove it.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.a0154cca6afb.Ifb4915e0acd51be6a75d33a8b96b3f7b0928b312@changeid
|
|
As those are now the same, unify and adjust the documentation.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.b7ddfade8fec.I2bf97252c4bd751077ade204767eed02d815614d@changeid
|
|
iwlagn_scd_bc_tbl is used for pre-ax210 devices,
and iwl_gen3_bc_tbl_entry is used for ax210 and on. But there is no
difference between the the 22000 version and the AX210+ one.
In order to unify the two, as first step make iwlagn_scd_bc_tbl an entry
as well, and adjust the code. In a later patch both structures will be
unified.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.645cd82ebf48.Iaa7e88179372d60ef31157e379737b5babe54012@changeid
|
|
'GEN3' here really means 'AX210'. Rename.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.b7fb5b854ded.Ib52b84c6e36e312b2eeb84a3cf71c6185fb52ee7@changeid
|
|
We have iwl_tx_cmd for devices older than 22000, iwl_tx_cmd_gen2 for
22000 devices, and iwl_tx_cmd_gen3 ax210 and up.
But the convention for all other APIs is to have the latest version
without any prefix and the older ones - with a _vX prefix,
where X is the highest version that this struct support.
The term 'gen' was introduced as the name of the (back then) new
transport, and should not be used as a device name (for that we have the
actual names: 22000, ax210, etc.)
Now as a new transport, called 'gen3', is going to be written and it can
be confused with this API.
Move iwl_tx_cmd to use the regular versioning convention.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.806e40c8f767.Ibc0e95e43a6fa6d47f72823bf804314d5db84618@changeid
|
|
This version is not used on any device. Don't support it.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Add support and enable decoding of H264 High 10 and 4:2:2 profiles.
Decoded CAPTURE buffer width is aligned to 64 pixels to accommodate HW
requirement of 10-bit format buffers, fixes decoding of all the 4:2:2
and 10bit 4:2:2 flusters tests except two stream that present some
visual artifacts.
- Hi422FREXT17_SONY_A
- Hi422FREXT19_SONY_A
The get_image_fmt() ops is implemented to select an image format
required for the provided SPS control, and returns RKVDEC_IMG_FMT_ANY
for other controls.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Tested-by: Christopher Obbard <chris.obbard@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
Add support for a get_image_fmt() ops that returns the required image
format.
The CAPTURE format is reset when the required image format changes and
the buffer queue is not busy.
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Jonas Karlman <jonas@kwiboo.se>
Co-developed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
Setting up the control handler calls into .s_ctrl ops. While validating
the controls the ops may need to access some of the context state, which
could lead to a crash if not properly initialized.
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
Neither the hardware nor the kernel API support FMO/ASO features
required by the full baseline profile. Therefore limit the minimum
profile to the constrained baseline profile explicitly.
Suggested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
The HW iommu is able to support a 34-bit iova address-space (16GB),
enable this feature for the encoder/decoder driver by shifting the
address by two bits and setting the extended address registers.
Signed-off-by: Jianhua Lin <jianhua.lin@mediatek.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
During initialization, the post processor allocates the same number of
buffers as the buf queue.
As the init function is called in streamon(), if an allocation fails,
streamon will return an error and streamoff() will not be called, keeping
all post processor buffers allocated.
To avoid that, all post proc buffers are freed in case of an allocation
error.
Fixes: 26711491a807 ("media: verisilicon: Refactor postprocessor to store more buffers")
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
mdp_get_plat_device() was added in 2022 but has remained unused.
Remove it.
Fixes: 61890ccaefaf ("media: platform: mtk-mdp3: add MediaTek MDP3 driver")
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Add proper pahole version dependency to CONFIG_GENDWARFKSYMS to avoid
module loading errors
- Fix UAPI header tests for the OpenRISC architecture
- Add dependency on the libdw package in Debian and RPM packages
- Disable -Wdefault-const-init-unsafe warnings on Clang
- Make "make clean ARCH=um" also clean the arch/x86/ directory
- Revert the use of -fmacro-prefix-map=, which causes issues with
debugger usability
* tag 'kbuild-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: fix typos "module.builtin" to "modules.builtin"
Revert "kbuild, rust: use -fremap-path-prefix to make paths relative"
Revert "kbuild: make all file references relative to source root"
kbuild: fix dependency on sorttable
init: remove unused CONFIG_CC_CAN_LINK_STATIC
um: let 'make clean' properly clean underlying SUBARCH as well
kbuild: Disable -Wdefault-const-init-unsafe
kbuild: rpm-pkg: Add (elfutils-devel or libdw-devel) to BuildRequires
kbuild: deb-pkg: Add libdw-dev:native to Build-Depends-Arch
usr/include: openrisc: don't HDRTEST bpf_perf_event.h
kbuild: Require pahole <v1.28 or >v1.29 with GENDWARFKSYMS on X86
|
|
Remove hard-coded strings by using the str_disabled_enabled() helper.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250210224246.363318-1-thorsten.blum@linux.dev
|
|
Remove hard-coded strings by using the str_enabled_disabled() and
str_on_off() helper functions.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250117114625.64903-2-thorsten.blum@linux.dev
|
|
Remove hard-coded strings by using the str_write_read() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250210100648.1440-2-thorsten.blum@linux.dev
|
|
strcpy() is deprecated; use strscpy() instead.
Don't cast the destination buffer from 'u8[]' to 'char *' to satisfy the
__must_be_array() requirement of strscpy().
No functional changes intended.
Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250421183110.436265-1-thorsten.blum@linux.dev
|
|
When a device is opened by a userspace driver, via VFIO interface, DMA
window is created. This DMA window has TCE Table and a corresponding
data for userview of TCE table.
When the userspace driver closes the device, all the above infrastructure
is free'ed and the device control given back to kernel. Both DMA window
and TCE table is getting free'ed. But due to a code bug, userview of the
TCE table is not getting free'ed. This is resulting in a memory leak.
Befow is the information from KMEMLEAK
unreferenced object 0xc008000022af0000 (size 16777216):
comm "senlib_unit_tes", pid 9346, jiffies 4294983174
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 0):
kmemleak_vmalloc+0xc8/0x1a0
__vmalloc_node_range+0x284/0x340
vzalloc+0x58/0x70
spapr_tce_create_table+0x4b0/0x8d0
tce_iommu_create_table+0xcc/0x170 [vfio_iommu_spapr_tce]
tce_iommu_create_window+0x144/0x2f0 [vfio_iommu_spapr_tce]
tce_iommu_ioctl.part.0+0x59c/0xc90 [vfio_iommu_spapr_tce]
vfio_fops_unl_ioctl+0x88/0x280 [vfio]
sys_ioctl+0xf4/0x160
system_call_exception+0x164/0x310
system_call_vectored_common+0xe8/0x278
unreferenced object 0xc008000023b00000 (size 4194304):
comm "senlib_unit_tes", pid 9351, jiffies 4294984116
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 0):
kmemleak_vmalloc+0xc8/0x1a0
__vmalloc_node_range+0x284/0x340
vzalloc+0x58/0x70
spapr_tce_create_table+0x4b0/0x8d0
tce_iommu_create_table+0xcc/0x170 [vfio_iommu_spapr_tce]
tce_iommu_create_window+0x144/0x2f0 [vfio_iommu_spapr_tce]
tce_iommu_create_default_window+0x88/0x120 [vfio_iommu_spapr_tce]
tce_iommu_ioctl.part.0+0x57c/0xc90 [vfio_iommu_spapr_tce]
vfio_fops_unl_ioctl+0x88/0x280 [vfio]
sys_ioctl+0xf4/0x160
system_call_exception+0x164/0x310
system_call_vectored_common+0xe8/0x278
Fixes: f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250512224653.35697-1-gbatra@linux.ibm.com
|
|
For multiple devices, both primary and extra devices should be the
same type. `erofs_init_device` has already guaranteed that if the
primary is a file-backed device, extra devices should also be
regular files.
However, if the primary is a block device while the extra device
is a file-backed device, `erofs_init_device` will get an ENOTBLK,
which is not treated as an error in `erofs_fc_get_tree`, and that
leads to an UAF:
erofs_fc_get_tree
get_tree_bdev_flags(erofs_fc_fill_super)
erofs_read_superblock
erofs_init_device // sbi->dif0 is not inited yet,
// return -ENOTBLK
deactivate_locked_super
free(sbi)
if (err is -ENOTBLK)
sbi->dif0.file = filp_open() // sbi UAF
So if -ENOTBLK is hitted in `erofs_init_device`, it means the
primary device must be a block device, and the extra device
is not a block device. The error can be converted to -EINVAL.
Fixes: fb176750266a ("erofs: add file-backed mount support")
Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20250515014837.3315886-1-shengyong1@xiaomi.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.16-rc1:
Once more, with async flips.
UAPI Changes:
- Add IN_FORMATS_ASYNC property, use in i915.
Cross-subsystem Changes:
- Remove some unused debug code in dma-buf.
Core Changes:
Driver Changes:
- Add Novatek NT37801 panel.
- Allow submitting empty commands in amdxdna.
- Convert cirrus to use managed request_all_regions.
- Move Sitronix from tiny to their own place.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/23ded62c-6a62-4195-9c08-4dfb81eafd72@linux.intel.com
|
|
Michael Kelley says:
====================
hv_netvsc: Fix error "nvsp_rndis_pkt_complete error status: 2"
Starting with commit dca5161f9bd0 in the 6.3 kernel, the Linux driver
for Hyper-V synthetic networking (netvsc) occasionally reports
"nvsp_rndis_pkt_complete error status: 2".[1] This error indicates
that Hyper-V has rejected a network packet transmit request from the
guest, and the outgoing network packet is dropped. Higher level
network protocols presumably recover and resend the packet so there is
no functional error, but performance is slightly impacted. Commit
dca5161f9bd0 is not the cause of the error -- it only added reporting
of an error that was already happening without any notice. The error
has presumably been present since the netvsc driver was originally
introduced into Linux.
This patch set fixes the root cause of the problem, which is that the
netvsc driver in Linux may send an incorrectly formatted VMBus message
to Hyper-V when transmitting the network packet. The incorrect
formatting occurs when the rndis header of the VMBus message crosses a
page boundary due to how the Linux skb head memory is aligned. In such
a case, two PFNs are required to describe the location of the rndis
header, even though they are contiguous in guest physical address
(GPA) space. Hyper-V requires that two PFNs be in a single "GPA range"
data struture, but current netvsc code puts each PFN in its own GPA
range, which Hyper-V rejects as an error in the case of the rndis
header.
The incorrect formatting occurs only for larger packets that netvsc
must transmit via a VMBus "GPA Direct" message. There's no problem
when netvsc transmits a smaller packet by copying it into a pre-
allocated send buffer slot because the pre-allocated slots don't have
page crossing issues.
After commit 14ad6ed30a10 in the 6.14 kernel, the error occurs much
more frequently in VMs with 16 or more vCPUs. It may occur every few
seconds, or even more frequently, in a ssh session that outputs a lot
of text. Commit 14ad6ed30a10 subtly changes how skb head memory is
allocated, making it much more likely that the rndis header will cross
a page boundary when the vCPU count is 16 or more. The changes in
commit 14ad6ed30a10 are perfectly valid -- they just had the side
effect of making the netvsc bug more prominent.
One fix is to check for adjacent PFNs in vmbus_sendpacket_pagebuffer()
and just combine them into a single GPA range. Such a fix is very
contained. But conceptually it is fixing the problem at the wrong
level. So this patch set takes the broader approach of maintaining
the already known grouping of contiguous PFNs at a higher level in
the netvsc driver code, and propagating that grouping down to the
creation of the VMBus message to send to Hyper-V. Maintaining the
grouping fixes this problem, and has the added benefit of allowing
netvsc_dma_map() to make fewer calls to dma_map_single() to do bounce
buffering in CoCo VMs.
Patch 1 is a preparatory change to allow vmbus_sendpacket_mpb_desc()
to specify multiple GPA ranges. In current code
vmbus_sendpacket_mpb_desc() is used only by the storvsc synthetic SCSI
driver, and it always creates a single GPA range.
Patch 2 updates the netvsc driver to use vmbus_sendpacket_mpb_desc()
instead of vmbus_sendpacket_pagebuffer(). Because the higher levels of
netvsc still don't group contiguous PFNs, this patch is functionally
neutral. The VMBus message to Hyper-V still has many GPA ranges, each
with a single PFN. But it lays the groundwork for the next patch.
Patch 3 changes the higher levels of netvsc to preserve the already
known grouping of contiguous PFNs. When the contiguous groupings are
passed to vmbus_sendpacket_mpb_desc(), GPA ranges containing multiple
PFNs are produced, as expected by Hyper-V. This is point at which the
core problem is fixed.
Patches 4 and 5 remove code that is no longer necessary after the
previous patches.
These changes provide a net reduction of about 65 lines of code, which
is an added benefit.
These changes have been tested in normal VMs, in SEV-SNP and TDX CoCo
VMs, and in Dv6-series VMs where the netvsp implementation is in the
OpenHCL paravisor instead of the Hyper-V host.
These changes are built against kernel version 6.15-rc6.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217503
====================
Link: https://patch.msgid.link/20250513000604.1396-1-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With the netvsc driver changed to use vmbus_sendpacket_mpb_desc()
instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining
callers. Remove it.
Cc: <stable@vger.kernel.org> # 6.1.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
init_page_array() now always creates a single page buffer array entry
for the rndis message, even if the rndis message crosses a page
boundary. As such, the number of page buffer array entries used for
the rndis message must no longer be tracked -- it is always just 1.
Remove the rmsg_pgcnt field and use "1" where the value is needed.
Cc: <stable@vger.kernel.org> # 6.1.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-5-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Starting with commit dca5161f9bd0 ("hv_netvsc: Check status in
SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux
driver for Hyper-V synthetic networking (netvsc) occasionally reports
"nvsp_rndis_pkt_complete error status: 2".[1] This error indicates
that Hyper-V has rejected a network packet transmit request from the
guest, and the outgoing network packet is dropped. Higher level
network protocols presumably recover and resend the packet so there is
no functional error, but performance is slightly impacted. Commit
dca5161f9bd0 is not the cause of the error -- it only added reporting
of an error that was already happening without any notice. The error
has presumably been present since the netvsc driver was originally
introduced into Linux.
The root cause of the problem is that the netvsc driver in Linux may
send an incorrectly formatted VMBus message to Hyper-V when
transmitting the network packet. The incorrect formatting occurs when
the rndis header of the VMBus message crosses a page boundary due to
how the Linux skb head memory is aligned. In such a case, two PFNs are
required to describe the location of the rndis header, even though
they are contiguous in guest physical address (GPA) space. Hyper-V
requires that two rndis header PFNs be in a single "GPA range" data
struture, but current netvsc code puts each PFN in its own GPA range,
which Hyper-V rejects as an error.
The incorrect formatting occurs only for larger packets that netvsc
must transmit via a VMBus "GPA Direct" message. There's no problem
when netvsc transmits a smaller packet by copying it into a pre-
allocated send buffer slot because the pre-allocated slots don't have
page crossing issues.
After commit 14ad6ed30a10 ("net: allow small head cache usage with
large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs
much more frequently in VMs with 16 or more vCPUs. It may occur every
few seconds, or even more frequently, in an ssh session that outputs a
lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is
allocated, making it much more likely that the rndis header will cross
a page boundary when the vCPU count is 16 or more. The changes in
commit 14ad6ed30a10 are perfectly valid -- they just had the side
effect of making the netvsc bug more prominent.
Current code in init_page_array() creates a separate page buffer array
entry for each PFN required to identify the data to be transmitted.
Contiguous PFNs get separate entries in the page buffer array, and any
information about contiguity is lost.
Fix the core issue by having init_page_array() construct the page
buffer array to represent contiguous ranges rather than individual
pages. When these ranges are subsequently passed to
netvsc_build_mpb_array(), it can build GPA ranges that contain
multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete
error status: 2". If instead the network packet is sent by copying
into a pre-allocated send buffer slot, the copy proceeds using the
contiguous ranges rather than individual pages, but the result of the
copying is the same. Also fix rndis_filter_send_request() to construct
a contiguous range, since it has its own page buffer array.
This change has a side benefit in CoCo VMs in that netvsc_dma_map()
calls dma_map_single() on each contiguous range instead of on each
page. This results in fewer calls to dma_map_single() but on larger
chunks of memory, which should reduce contention on the swiotlb.
Since the page buffer array now contains one entry for each contiguous
range instead of for each individual page, the number of entries in
the array can be reduced, saving 208 bytes of stack space in
netvsc_xmit() when MAX_SKG_FRAGS has the default value of 17.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217503
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503
Cc: <stable@vger.kernel.org> # 6.1.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-4-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus
messages. This function creates a series of GPA ranges, each of which
contains a single PFN. However, if the rndis header in the VMBus
message crosses a page boundary, the netvsc protocol with the host
requires that both PFNs for the rndis header must be in a single "GPA
range" data structure, which isn't possible with
vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a
new function netvsc_build_mpb_array() to build a VMBus message with
multiple GPA ranges, each of which may contain multiple PFNs. Use
vmbus_sendpacket_mpb_desc() to send this VMBus message to the host.
There's no functional change since higher levels of netvsc don't
maintain or propagate knowledge of contiguous PFNs. Based on its
input, netvsc_build_mpb_array() still produces a separate GPA range
for each PFN and the behavior is the same as with
vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a
subsequent patch to provide the necessary grouping.
Cc: <stable@vger.kernel.org> # 6.1.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-3-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver
and is hardcoded to create a single GPA range. To allow it to also be
used by the netvsc driver to create multiple GPA ranges, no longer
hardcode as having a single GPA range. Allow the calling driver to
specify the rangecount in the supplied descriptor.
Update the storvsc driver to reflect this new approach.
Cc: <stable@vger.kernel.org> # 6.1.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
cpsw->slaves[slave_no].phy should be equal to netdev->phydev, because it
is assigned from phy_attach_direct(). The latter is indirectly called
from the two identically named cpsw_slave_open() functions, one in
cpsw.c and another in cpsw_new.c.
Thus, the driver should not need custom logic to find the PHY, the core
can find it, and phy_do_ioctl_running() achieves exactly that.
However, that is only the case for cpsw_new and for the cpsw driver in
dual EMAC mode. This is explained in more detail in the previous commit.
Thus, allow the simpler core logic to execute for cpsw_new, and move
cpsw_ndo_ioctl() to cpsw.c.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250512114422.4176010-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the two cpsw drivers to the new API, so that the
ndo_eth_ioctl() path can be removed completely.
The cpsw_hwtstamp_get() and cpsw_hwtstamp_set() methods (and their shim
definitions, for the case where CONFIG_TI_CPTS is not enabled) must have
their prototypes adjusted.
These methods are used by two drivers (cpsw and cpsw_new), with vastly
different configurations:
- cpsw has two operating modes:
- "dual EMAC" - enabled through the "dual_emac" device tree property -
creates one net_device per EMAC / slave interface (but there is no
bridging offload)
- "switch mode" - default - there is a single net_device, with two
EMACs/slaves behind it (and switching between them happens
unbeknownst to the network stack).
- cpsw_new always registers one net_device for each EMAC which doesn't
have status = "disabled". In terms of switching, it has two modes:
- "dual EMAC": default, no switching between ports, no switchdev
offload.
- "switch mode": enabled through the "switch_mode" devlink parameter,
offloads the Linux bridge through switchdev
Essentially, in 3 out of 4 operating modes, there is a bijective
relation between the net_device and the slave. Timestamping can thus be
configured on individual slaves. But in the "switch mode" of the cpsw
driver, ndo_eth_ioctl() targets a single slave, designated using the
"active_slave" device tree property.
To deal with these different cases, the common portion of the drivers,
cpsw_priv.c, has the cpsw_slave_index() function pointer, set to
separate, identically named cpsw_slave_index_priv() by the 2 drivers.
This is all relevant because cpsw_ndo_ioctl() has the old-style
phy_has_hwtstamp() logic which lets the PHY handle the timestamping
ioctls. Normally, that logic should be obsoleted by the more complex
logic in the core, which permits dynamically selecting the timestamp
provider - see dev_set_hwtstamp_phylib().
But I have doubts as to how this works for the "switch mode" of the dual
EMAC driver, because the core logic only engages if the PHY is visible
through ndev->phydev (this is set by phy_attach_direct()).
In cpsw.c, we have:
cpsw_ndo_open()
-> for_each_slave(priv, cpsw_slave_open, priv); // continues on errors
-> of_phy_connect()
-> phy_connect_direct()
-> phy_attach_direct()
OR
-> phy_connect()
-> phy_connect_direct()
-> phy_attach_direct()
The problem for "switch mode" is that the behavior of phy_attach_direct()
called twice in a row for the same net_device (once for each slave) is
probably undefined.
For sure it will overwrite dev->phydev. I don't see any explicit error
checks for this case, and even if there were, the for_each_slave() call
makes them non-fatal to cpsw_ndo_open() anyway.
I have no idea what is the extent to which this provides a usable
result, but the point is: only the last attached PHY will be visible
in dev->phydev, and this may well be a different PHY than
cpsw->slaves[slave_no].phy for the "active_slave".
In dual EMAC mode, as well as in cpsw_new, this should not be a problem.
I don't know whether PHY timestamping is a use case for the cpsw "switch
mode" as well, and I hope that there isn't, because for the sake of
simplicity, I've decided to deliberately break that functionality, by
refusing all PHY timestamping. Keeping it would mean blocking the old
API from ever being removed. In the new dev_set_hwtstamp_phylib() API,
it is not possible to operate on a phylib PHY other than dev->phydev,
and I would very much prefer not adding that much complexity for bizarre
driver decisions.
Final point about the cpsw_hwtstamp_get() conversion: we don't need to
propagate the unnecessary "config.flags = 0;", because dev_get_hwtstamp()
provides a zero-initialized struct kernel_hwtstamp_config.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250512114422.4176010-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
EROFS uses NID to indicate the on-disk inode offset, which can
exceed 32 bits. However, the default encode_fh uses the ino32,
thus it doesn't work if the image is larger than 128GiB.
Let's introduce our own helpers to encode file handles.
It's easy to reproduce:
1. prepare an erofs image with nid bigger than U32_MAX
2. mount -t erofs foo.img /mnt/erofs
3. set exportfs with configuration: /mnt/erofs *(rw,sync,
no_root_squash)
4. mount -t nfs $IP:/mnt/erofs /mnt/nfs
5. md5sum /mnt/nfs/foo # foo is the file which nid bigger
than U32_MAX. # you will get ESTALE error.
In the case of overlayfs, the underlying filesystem's file
handle is encoded in ovl_fb.fid, which is similar to NFS's
case. If the NID of file is larger than U32_MAX, the overlay
will get -ESTALE error when calls exportfs_decode_fh.
Fixes: 3e917cc305c6 ("erofs: make filesystem exportable")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250507094015.14007-1-lihongbo22@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"A few last minute fixes for v6.15"
* tag 'tpmdd-next-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: tis: Double the timeout B to 4s
char: tpm: tpm-buf: Add sanity check fallback in read helpers
tpm: Mask TPM RC in tpm2_start_auth_session()
|
|
Jason Xing says:
====================
misc drivers' sw timestamp changes
This series modified three outstanding drivers among more than 100 drivers
because the software timestamp generation is too early. The idea of this
series is derived from the brief talk[1] with Willem. In conclusion, this
series makes the generation of software timestamp near/before kicking the
doorbell for drivers.
[1]: https://lore.kernel.org/all/681b9d2210879_1f6aad294bc@willemb.c.googlers.com.notmuch/
v2: https://lore.kernel.org/20250508033328.12507-1-kerneljasonxing@gmail.com
====================
Link: https://patch.msgid.link/20250510134812.48199-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make sure the call of skb_tx_timestamp is as close as possbile to the
doorbell.
The patch also adjusts the order of setting SKBTX_IN_PROGRESS and
generate software timestamp so that without SOF_TIMESTAMPING_OPT_TX_SWHW
being set the software and hardware timestamps will not appear in the
error queue of socket nearly at the same time (Please see __skb_tstamp_tx()).
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20250510134812.48199-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|