|
Add a test to verify that detached mounts behave correctly.
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-10-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Currently, detached mounts can only be mounted onto attached mounts.
This limitation makes it impossible to assemble a new private rootfs and
move it into place. That's an extremely powerful concept for container
and service workloads that we should support.
Right now, a detached tree must be created, attached, then it can gain
additional mounts and then it can either be moved (if it doesn't reside
under a shared mount) or a detached mount created again. Lift this
restriction.
In order to allow mounting detached mounts onto other detached mounts
the same permission model used for creating detached mounts from
detached mounts can be used:
(1) Check that the caller is privileged over the owning user namespace
of its current mount namespace.
(2) Check that the caller is located in the mount namespace of the mount
it wants to create a detached copy of.
The origin mount namespace of the anonymous mount namespace must be the
same as the caller's mount namespace. To establish this the sequence
number of the caller's mount namespace and the origin sequence number of
the anonymous mount namespace are compared.
The caller is always located in a non-anonymous mount namespace since
anonymous mount namespaces cannot be setns()ed into. The caller's mount
namespace will thus always have a valid sequence number.
The owning namespace of any mount namespace, anonymous or non-anonymous,
can never change. A mount attached to a non-anonymous mount namespace
can never change mount namespace.
If the sequence number of the non-anonymous mount namespace and the
origin sequence number of the anonymous mount namespace match, the
owning namespaces must match as well.
Hence, the capability check on the owning namespace of the caller's
mount namespace ensures that the caller has the ability to attach the
mount tree.
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-9-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow move_mount() to work with NULL path arguments.
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-8-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-7-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add the ability to create detached mounts from detached mounts.
Currently, detached mounts can only be created from attached mounts.
This limitation prevents various use-cases. For example, the ability to
mount a subdirectory without ever having to make the whole filesystem
visible first.
The current permission model for the OPEN_TREE_CLONE flag of the
open_tree() system call is:
(1) Check that the caller is privileged over the owning user namespace
of its current mount namespace.
(2) Check that the caller is located in the mount namespace of the mount
it wants to create a detached copy of.
While it is not strictly necessary to do it this way, it is consistently
applied in the new mount API. This model will also be used when allowing
the creation of a detached mount from another detached mount.
The (1) requirement can simply be met by performing the same check as
for the non-detached case, i.e., verify that the caller is privileged
over its current mount namespace.
To meet the (2) requirement it must be possible to infer the origin
mount namespace that the anonymous mount namespace of the detached mount
was created from.
The origin mount namespace of an anonymous mount is the mount namespace
that the mounts that were copied into the anonymous mount namespace
originate from.
The origin mount namespace of the anonymous mount namespace must be the
same as the caller's mount namespace. To establish this the sequence
number of the caller's mount namespace and the origin sequence number of
the anonymous mount namespace are compared.
The caller is always located in a non-anonymous mount namespace since
anonymous mount namespaces cannot be setns()ed into. The caller's mount
namespace will thus always have a valid sequence number.
The owning namespace of any mount namespace, anonymous or non-anonymous,
can never change. A mount attached to a non-anonymous mount namespace
can never change mount namespace.
If the sequence number of the non-anonymous mount namespace and the
origin sequence number of the anonymous mount namespace match, the
owning namespaces must match as well.
Hence, the capability check on the owning namespace of the caller's
mount namespace ensures that the caller has the ability to copy the
mount tree.
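As a rough userspace illustration of the two checks (not the kernel code; the struct, its fields and the function names are hypothetical, and a sequence number of 0 is assumed here to mean "no valid sequence number"), the comparison could be sketched as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a mount namespace; not the kernel's struct. */
struct mnt_ns_sketch {
	uint64_t seq;        /* own sequence number; 0 models "anonymous" */
	uint64_t seq_origin; /* anonymous only: seq of the namespace it was copied from */
};

/* Requirement (2): the detached tree must originate from the caller's namespace. */
static bool origin_matches(const struct mnt_ns_sketch *caller_ns,
			   const struct mnt_ns_sketch *anon_ns)
{
	/* The caller is never in an anonymous namespace, so its seq is valid. */
	assert(caller_ns->seq != 0);
	return anon_ns->seq_origin == caller_ns->seq;
}

/*
 * Both requirements together. caller_privileged models the capability
 * check (1) against the owning user namespace of the caller's namespace.
 */
static bool may_copy_detached_tree(bool caller_privileged,
				   const struct mnt_ns_sketch *caller_ns,
				   const struct mnt_ns_sketch *anon_ns)
{
	if (!caller_privileged)
		return false;
	return origin_matches(caller_ns, anon_ns);
}
```

Because the owning namespace can never change, a matching sequence number is enough for the capability check on the caller's side to cover the detached tree as well.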
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-6-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a helper that verifies whether a caller may copy a given mount tree.
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-5-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Instead of acquiring the namespace semaphore and the mount lock
every time we close a file with FMODE_NEED_UNMOUNT set, add a fastpath
that checks whether we need to at all. Most of the time the caller will
have attached the mount to the filesystem hierarchy and there's nothing
to do.
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-4-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
After we've attached a detached mount tree the anonymous mount namespace
must be empty. Add an assert and make this assumption explicit.
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-3-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a helper that checks whether a given mount namespace is empty instead
of open-coding the specific data structure check. This will also be
used in follow-up patches.
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-2-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Store the sequence number of the mount namespace the anonymous mount
namespace has been created from. This information will be used in
follow-up patches.
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-1-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If a user calls p = kmalloc(1024); kfree(p); kfree(p); and 'p' was the
only object in the slab, we may free the slab after the first call to
kfree(). If we do, we clear PGTY_slab and the second call to kfree()
will call free_large_kmalloc(). That will leave a trace in the logs
("object pointer: 0x%p"), but otherwise proceed to free the memory,
which is likely to corrupt the page allocator's metadata.
Allocate a new page type for large kmalloc and mark the memory with it
while it's allocated. That lets us detect this double-free and return
without harming any data structures.
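The idea can be illustrated with a small userspace model (a sketch only; the struct, enum and function names are hypothetical and malloc()/free() stand in for the page allocator): the memory is tagged with a dedicated type while allocated, and the free path refuses to proceed when the tag is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical page types; only the "large kmalloc" marker matters here. */
enum page_type_sketch { PGTY_NONE_SKETCH, PGTY_LARGE_KMALLOC_SKETCH };

struct page_sketch {
	enum page_type_sketch type;
	void *mem;
};

/* Allocate and mark the page for as long as the memory is in use. */
static bool large_alloc(struct page_sketch *page, size_t size)
{
	page->mem = malloc(size);
	if (!page->mem)
		return false;
	page->type = PGTY_LARGE_KMALLOC_SKETCH;
	return true;
}

/* Returns true if the memory was freed, false if the free was rejected. */
static bool large_free(struct page_sketch *page)
{
	if (page->type != PGTY_LARGE_KMALLOC_SKETCH) {
		/* Not marked: double free or bogus pointer. Report and bail out. */
		fprintf(stderr, "bad free detected, ignoring\n");
		return false;
	}
	page->type = PGTY_NONE_SKETCH;
	free(page->mem);
	page->mem = NULL;
	return true;
}
```

The second call to large_free() on the same page returns early instead of handing already-freed memory back to the allocator.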
Reported-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Add a test_kfree_rcu_wq_destroy test to verify a kmem_cache_destroy()
from a workqueue context. The problem is that, before destroying any
cache, kvfree_rcu_barrier() is invoked to guarantee that in-flight
freed objects are flushed.
The _barrier() function queues and flushes its own internal workers
which might conflict with a workqueue type a kmem-cache gets destroyed
from.
One example is when a WQ_MEM_RECLAIM workqueue is flushing !WQ_MEM_RECLAIM
events which leads to a kernel splat. See the check_flush_dependency() in
the workqueue.c file.
If this test does not emit any kernel warning, it has passed.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Co-developed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
slab_err() has variadic printf arguments but instead of passing them to
slab_bug() it does vsnprintf() to a buffer and passes %s, buf.
To allow passing them directly, turn slab_bug() to __slab_bug() with a
va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
Then slab_err() can call __slab_bug() without the intermediate buffer.
Also constify fmt everywhere, which also simplifies object_err()'s
call to slab_bug().
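The general C pattern behind this refactoring, shown as a userspace sketch (function names and the output buffer are hypothetical, not the slab code): the core reporter takes a va_list, and the variadic entry points are thin wrappers that forward their arguments without formatting into an intermediate buffer first.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Captures the last report so the result can be inspected. */
static char last_report[256];

/* Core reporter: takes a va_list so any variadic wrapper can forward to it. */
static void report_bug_va(const char *cache, const char *fmt, va_list args)
{
	int n = snprintf(last_report, sizeof(last_report), "BUG %s: ", cache);

	if (n < 0 || (size_t)n >= sizeof(last_report))
		return;
	vsnprintf(last_report + n, sizeof(last_report) - (size_t)n, fmt, args);
}

/* Thin variadic wrapper around the va_list version. */
static void report_bug(const char *cache, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	report_bug_va(cache, fmt, args);
	va_end(args);
}

/* A second entry point forwards its own arguments directly, with no buffer. */
static void report_err(const char *cache, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	report_bug_va(cache, fmt, args);
	va_end(args);
	/* err-specific follow-up work would go here */
}
```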
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
|
|
If a slab object is corrupted or an error occurs in its internal
validation, continuing after restoration may cause other side effects.
At this point, it is difficult to debug because the problem occurred in
the past. It is useful to use WARN() to catch errors at the point of
issue because WARN() can trigger a panic for system debugging when
panic_on_warn is enabled. WARN() is added where the error is detected, in
slab_err and object_err.
It makes sense to only do the WARN() after printing the logs. slab_err
is split into __slab_err, which calls the WARN() and is called after
printing the logs.
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Previously, the restore occurred after printing the object in slub.
After commit 47d911b02cbe ("slab: make check_object() more consistent"),
the bytes are printed after the restore. This information about the bytes
before the restore is highly valuable for debugging purposes.
For instance, in the event of a cache issue, it displays byte patterns
by breaking them down into 64-byte units. Without this information,
we can only speculate on how it was broken. Hence the corrupted regions
should be printed prior to the restoration process. However if an object
breaks in multiple places, the same log may be output multiple times.
Therefore the slub log is reported only once to prevent redundant printing,
by sending a parameter indicating whether an error has occurred previously.
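A simplified userspace model of that ordering (hypothetical helper names, not the slub code): each region check dumps the still-corrupted object before restoring it, and a flag passed in by the caller suppresses repeated dumps for the same object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Counts how many times the object was dumped. */
static int dumps_printed;

static void dump_object(const unsigned char *obj, size_t len)
{
	dumps_printed++;
	for (size_t i = 0; i < len; i++)
		fprintf(stderr, "%02x ", obj[i]);
	fputc('\n', stderr);
}

/*
 * Check one region against the expected pattern. On corruption, dump the
 * object first (unless an earlier check already did), then restore it.
 * Returns true if the region was intact.
 */
static bool check_region(unsigned char *obj, size_t len, size_t start,
			 size_t count, unsigned char pattern,
			 bool *already_reported)
{
	bool intact = true;

	for (size_t i = start; i < start + count; i++) {
		if (obj[i] != pattern) {
			intact = false;
			break;
		}
	}
	if (intact)
		return true;

	if (!*already_reported) {
		dump_object(obj, len); /* corrupted bytes are still visible here */
		*already_reported = true;
	}
	memset(obj + start, pattern, count); /* restore after printing */
	return false;
}
```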
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
As revealed by this writeup[1], due to the fact that __kmalloc_node (now
renamed to __kmalloc_node_noprof) is an exported symbol and will never
get inlined, using it in kvmalloc_node (now __kvmalloc_node_noprof)
would make the RET_IP inside always point to the same address:
upper_caller
kvmalloc
kvmalloc_node
kvmalloc_node_noprof
__kvmalloc_node_noprof <-- all macros all the way down here
__kmalloc_node_noprof
__do_kmalloc_node(.., _RET_IP_)
... <-- _RET_IP_ points to
That literally means all kmalloc invoked via kvmalloc would use the same
seed for cache randomization (CONFIG_RANDOM_KMALLOC_CACHES), which makes
this hardening non-functional.
The root cause of this problem, IMHO, is that using RET_IP only cannot
identify the actual allocation site in case of kmalloc being called
inside non-inlined wrappers or helper functions. And I believe there
could be similar cases in other functions. Nevertheless, I haven't
thought of any good solution for this. So for now let's solve this
specific case first.
For __kvmalloc_node_noprof, replace __kmalloc_node_noprof and call
__do_kmalloc_node directly instead, so that RET_IP can take the return
address of kvmalloc and differentiate each kvmalloc invocation:
upper_caller
kvmalloc
kvmalloc_node
kvmalloc_node_noprof
__kvmalloc_node_noprof <-- all macros all the way down here
__do_kmalloc_node(.., _RET_IP_)
... <-- _RET_IP_ points to
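The effect can be reproduced in userspace with the GCC/Clang builtin __builtin_return_address(0) standing in for _RET_IP_. This is only a sketch: all function names are hypothetical, and the volatile sink exists solely to keep the compiler from turning the calls into tail calls.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's _RET_IP_ (GCC/Clang builtin). */
#define RET_IP_SKETCH ((uintptr_t)__builtin_return_address(0))

/* Written after each call so the calls below are not tail calls. */
static volatile uintptr_t sink;

/* Models __do_kmalloc_node(): the caller address is used as the seed. */
static inline uintptr_t do_alloc_sketch(uintptr_t caller)
{
	return caller;
}

/* Models an exported, never-inlined allocator entry point. */
__attribute__((noinline)) static uintptr_t exported_alloc_sketch(void)
{
	uintptr_t seed = do_alloc_sketch(RET_IP_SKETCH);

	sink = seed;
	return seed;
}

/* Before: the seed is always the one call site inside this wrapper. */
__attribute__((noinline)) static uintptr_t kv_alloc_before(void)
{
	uintptr_t seed = exported_alloc_sketch();

	sink = seed;
	return seed;
}

/* After: the wrapper passes its own return address, i.e. its caller. */
__attribute__((noinline)) static uintptr_t kv_alloc_after(void)
{
	uintptr_t seed = do_alloc_sketch(RET_IP_SKETCH);

	sink = seed;
	return seed;
}
```

Two separate call sites of kv_alloc_before() observe the same seed, while two call sites of kv_alloc_after() observe different ones.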
Thanks to Tamás Koczka for the report and discussion!
Link: https://github.com/google/security-research/blob/908d59b573960dc0b90adda6f16f7017aca08609/pocs/linux/kernelctf/CVE-2024-27397_mitigation/docs/exploit.md?plain=1#L259 [1]
Reported-by: Tamás Koczka <poprdi@google.com>
Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Move __kvmalloc_node_noprof (as well as kvfree*, kvrealloc_noprof and
kmalloc_gfp_adjust for consistency) into mm/slub.c so that it can
directly invoke __do_kmalloc_node, which is needed for the next patch.
No functional changes intended.
Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
SLUB is the only remaining allocator. We can therefore get rid of
the logic for allocator-specific flags:
* Merge SLAB_CACHE_FLAGS into SLAB_CORE_FLAGS.
* Remove CACHE_CREATE_MASK and instead mask out SLAB_DEBUG_FLAGS if
!CONFIG_SLUB_DEBUG. SLAB_DEBUG_FLAGS is now defined
unconditionally (no impact on existing code, which ignores it if
!CONFIG_SLUB_DEBUG).
* Define SLAB_FLAGS_PERMITTED in terms of SLAB_CORE_FLAGS and
SLAB_DEBUG_FLAGS (no functional change).
While at it also remove misleading comments that suggest that
multiple allocators are available.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Currently kvfree_rcu() APIs use a system workqueue, "system_unbound_wq",
to drive the RCU machinery to reclaim memory.
Recently, it has been noted that the following kernel warning can
be observed:
<snip>
workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
WARNING: CPU: 21 PID: 330 at kernel/workqueue.c:3719 check_flush_dependency+0x112/0x120
Modules linked in: intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) ...
CPU: 21 UID: 0 PID: 330 Comm: kworker/u144:6 Tainted: G E 6.13.2-0_g925d379822da #1
Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM20 02/01/2023
Workqueue: nvme-wq nvme_scan_work
RIP: 0010:check_flush_dependency+0x112/0x120
Code: 05 9a 40 14 02 01 48 81 c6 c0 00 00 00 48 8b 50 18 48 81 c7 c0 00 00 00 48 89 f9 48 ...
RSP: 0018:ffffc90000df7bd8 EFLAGS: 00010082
RAX: 000000000000006a RBX: ffffffff81622390 RCX: 0000000000000027
RDX: 00000000fffeffff RSI: 000000000057ffa8 RDI: ffff88907f960c88
RBP: 0000000000000000 R08: ffffffff83068e50 R09: 000000000002fffd
R10: 0000000000000004 R11: 0000000000000000 R12: ffff8881001a4400
R13: 0000000000000000 R14: ffff88907f420fb8 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88907f940000(0000) knlGS:0000000000000000
CR2: 00007f60c3001000 CR3: 000000107d010005 CR4: 00000000007726f0
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0xa4/0x140
? check_flush_dependency+0x112/0x120
? report_bug+0xe1/0x140
? check_flush_dependency+0x112/0x120
? handle_bug+0x5e/0x90
? exc_invalid_op+0x16/0x40
? asm_exc_invalid_op+0x16/0x20
? timer_recalc_next_expiry+0x190/0x190
? check_flush_dependency+0x112/0x120
? check_flush_dependency+0x112/0x120
__flush_work.llvm.1643880146586177030+0x174/0x2c0
flush_rcu_work+0x28/0x30
kvfree_rcu_barrier+0x12f/0x160
kmem_cache_destroy+0x18/0x120
bioset_exit+0x10c/0x150
disk_release.llvm.6740012984264378178+0x61/0xd0
device_release+0x4f/0x90
kobject_put+0x95/0x180
nvme_put_ns+0x23/0xc0
nvme_remove_invalid_namespaces+0xb3/0xd0
nvme_scan_work+0x342/0x490
process_scheduled_works+0x1a2/0x370
worker_thread+0x2ff/0x390
? pwq_release_workfn+0x1e0/0x1e0
kthread+0xb1/0xe0
? __kthread_parkme+0x70/0x70
ret_from_fork+0x30/0x40
? __kthread_parkme+0x70/0x70
ret_from_fork_asm+0x11/0x20
</TASK>
---[ end trace 0000000000000000 ]---
<snip>
To address this, switch to an independent WQ_MEM_RECLAIM
workqueue, so the rules are not violated from the workqueue framework's
point of view.
Apart from that, since kvfree_rcu() does reclaim memory, it is worth
going with a WQ_MEM_RECLAIM type of wq because it is designed for
this purpose.
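The rule behind the warning can be modelled in a few lines of plain C. This is a deliberately simplified sketch, not the logic of check_flush_dependency(): the flag value, the struct and the name of the dedicated workqueue are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; the kernel defines the real WQ_MEM_RECLAIM. */
#define WQ_MEM_RECLAIM_SKETCH (1u << 0)

struct wq_sketch {
	const char *name;
	unsigned int flags;
};

/*
 * A worker running on a reclaim-capable workqueue must not wait on work
 * queued to a workqueue that lacks the reclaim guarantee.
 */
static bool flush_violates_reclaim_rule(const struct wq_sketch *flusher,
					const struct wq_sketch *target)
{
	return (flusher->flags & WQ_MEM_RECLAIM_SKETCH) &&
	       !(target->flags & WQ_MEM_RECLAIM_SKETCH);
}
```

In this model, nvme-wq flushing events_unbound trips the rule, while flushing a dedicated reclaim-capable workqueue does not.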
Fixes: 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
Reported-by: Keith Busch <kbusch@kernel.org>
Closes: https://lore.kernel.org/all/Z7iqJtCjHKfo8Kho@kbusch-mbp/
Cc: stable@vger.kernel.org
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
After phy initialization, some phy operations can only be executed while
in lower P states. Ensure GUSB3PIPECTL.SUSPENDENABLE and
GUSB2PHYCFG.SUSPHY are set soon after initialization to avoid blocking
phy ops.
Previously the SUSPENDENABLE bits were only set after the controller
initialization, which may not happen right away if there's no gadget
driver or xhci driver bound. Revise this to clear SUSPENDENABLE bits
only when there's mode switching (change in GCTL.PRTCAPDIR).
Fixes: 6d735722063a ("usb: dwc3: core: Prevent phy suspend during init")
Cc: stable <stable@kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/633aef0afee7d56d2316f7cc3e1b2a6d518a8cc9.1738280911.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add missing "avdd-0v9-supply" and "avdd-1v8-supply" properties to the "hdmi"
node in the Pine64 RockPro64 board dtsi file. To achieve this, also add the
associated "vcca_0v9" regulator that produces the 0.9 V supply, [1][2] which
hasn't been defined previously in the board dtsi file.
This also eliminates the following warnings from the kernel log:
dwhdmi-rockchip ff940000.hdmi: supply avdd-0v9 not found, using dummy regulator
dwhdmi-rockchip ff940000.hdmi: supply avdd-1v8 not found, using dummy regulator
There are no functional changes to the way the board works with these additions,
because the "vcc1v8_dvp" and "vcca_0v9" regulators are always enabled, [1][2]
but these additions improve the accuracy of hardware description.
These changes apply to both supported hardware revisions of the Pine64
RockPro64, i.e. to the production-run revisions 2.0 and 2.1. [1][2]
[1] https://files.pine64.org/doc/rockpro64/rockpro64_v21-SCH.pdf
[2] https://files.pine64.org/doc/rockpro64/rockpro64_v20-SCH.pdf
Fixes: e4f3fb490967 ("arm64: dts: rockchip: add initial dts support for Rockpro64")
Cc: stable@vger.kernel.org
Suggested-by: Diederik de Haas <didi.debian@cknow.org>
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Tested-by: Diederik de Haas <didi.debian@cknow.org>
Link: https://lore.kernel.org/r/df3d7e8fe74ed5e727e085b18c395260537bb5ac.1740941097.git.dsimic@manjaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Firmware reports support for additional SRQs in the max_srq_ext field.
In CREQ_QUERY_FUNC response, if MAX_SRQ_EXTENDED flag is set, driver
should derive the total number of max SRQs by the summation of
"max_srq" and "max_srq_ext" fields.
Fixes: b1b66ae094cd ("bnxt_en: Use FW defined resource limits for RoCE")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Preethi G <preethi.gurusiddalingeswaraswamy@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1741021178-2569-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The modulo operation returns the wrong result without the
parentheses, which resulted in wrong QP table indexing.
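The precedence issue is easy to reproduce in isolation. The following is a generic illustration with hypothetical function names, not the driver's indexing code: `%` binds tighter than `-`, so the subtraction is applied to the remainder unless the divisor is parenthesized.

```c
#include <assert.h>
#include <stdint.h>

/* Parsed as (id % table_size) - 1: the subtraction hits the remainder. */
static uint32_t index_without_parens(uint32_t id, uint32_t table_size)
{
	return id % table_size - 1;
}

/* Intended form: reduce the id modulo (table_size - 1). */
static uint32_t index_with_parens(uint32_t id, uint32_t table_size)
{
	return id % (table_size - 1);
}
```

With a table size of 8, an id of 8 yields 0 - 1 in the first form, which wraps to UINT32_MAX and is far outside the table, while the second form yields 1.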
Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1741021178-2569-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The driver creates the QP table too early during probe, before
querying firmware capabilities. It currently uses a hard-coded
value of 64K as the size when creating the QP table.
This results in a crash when the firmware-supported QP count is
more than 64K.
To fix the issue, move the QP table creation after the firmware
capabilities are queried. Use the firmware-returned maximum number
of QPs when creating the QP table.
Fixes: b1b66ae094cd ("bnxt_en: Use FW defined resource limits for RoCE")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1741021178-2569-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In rdma-core, the following failures appear.
"
$ ./build/bin/run_tests.py -k device
ssssssss....FF........s
======================================================================
FAIL: test_query_device (tests.test_device.DeviceTest.test_query_device)
Test ibv_query_device()
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ubuntu/rdma-core/tests/test_device.py", line 63, in
test_query_device
self.verify_device_attr(attr, dev)
File "/home/ubuntu/rdma-core/tests/test_device.py", line 200, in
verify_device_attr
assert attr.sys_image_guid != 0
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
======================================================================
FAIL: test_query_device_ex (tests.test_device.DeviceTest.test_query_device_ex)
Test ibv_query_device_ex()
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ubuntu/rdma-core/tests/test_device.py", line 222, in
test_query_device_ex
self.verify_device_attr(attr_ex.orig_attr, dev)
File "/home/ubuntu/rdma-core/tests/test_device.py", line 200, in
verify_device_attr
assert attr.sys_image_guid != 0
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
"
The root cause is: before a net device is set with rxe, this net device
is used to generate a sys_image_guid.
Fixes: 2ac5415022d1 ("RDMA/rxe: Remove the direct link to net_device")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20250302215444.3742072-1-yanjun.zhu@linux.dev
Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Tested-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Pull for-6.14-fixes to receive:
9360dfe4cbd6 ("sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()")
which conflicts with:
337d1b354a29 ("sched_ext: Move built-in idle CPU selection policy to a separate file")
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
If a BPF scheduler provides an invalid CPU (outside the nr_cpu_ids
range) as prev_cpu to scx_bpf_select_cpu_dfl() it can cause a kernel
crash.
To prevent this, validate prev_cpu in scx_bpf_select_cpu_dfl() and
trigger an scx error if an invalid CPU is specified.
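The shape of such a check, as a minimal userspace sketch (function names and the error-reporting convention are hypothetical; the real code reports an scx error instead of setting a flag):

```c
#include <assert.h>
#include <stdbool.h>

/* A CPU id is usable only if it lies in [0, nr_cpu_ids). */
static bool cpu_is_valid(int cpu, int nr_cpu_ids)
{
	return cpu >= 0 && cpu < nr_cpu_ids;
}

/*
 * Validate prev_cpu before using it. On an invalid id, flag the error
 * and return -1 instead of touching any per-CPU state.
 */
static int select_cpu_checked(int prev_cpu, int nr_cpu_ids, bool *error)
{
	if (!cpu_is_valid(prev_cpu, nr_cpu_ids)) {
		*error = true;
		return -1;
	}
	*error = false;
	return prev_cpu; /* placeholder for the actual selection policy */
}
```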
Fixes: f0e1a0643a59b ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull affs fixes from David Sterba:
"Two fixes from Simon Tatham. They're real bugfixes for problems with
OFS floppy disks created on linux and then read in the emulated
Workbench environment"
* tag 'affs-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: don't write overlarge OFS data block size fields
affs: generate OFS sequence numbers starting at 1
|
|
Pull xfs cleanups from Carlos Maiolino:
"Just a few cleanups"
* tag 'xfs-fixes-6.14-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove the XBF_STALE check from xfs_buf_rele_cached
xfs: remove most in-flight buffer accounting
xfs: decouple buffer readahead from the normal buffer read path
xfs: reduce context switches for synchronous buffered I/O
|
|
Replace MODULE_COMPRESS_NONE with MODULE_COMPRESS, as the former is no longer
available for kernel modules. As MODULE_COMPRESS and MODULE_DECOMPRESS
depend on MODULES, remove MODULES as well.
Fixes: c7ff693fa209 ("module: Split modules_install compression and in-kernel decompression")
Signed-off-by: Arulpandiyan Vadivel <arulpandiyan.vadivel@siemens.com>
Link: https://lore.kernel.org/r/20250302103831.285381-1-arulpandiyan.vadivel@siemens.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Rearrange misplaced functions in sorted order.
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: R Sundar <prosunofficial@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20241119021719.7659-2-prosunofficial@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Since these functions handle moving between C strings and non-C strings,
they should check for the appropriate presence/lack of the nonstring
attribute on arguments.
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
In preparation for adding more type checking to the memtostr/strtomem*()
helpers, introduce the ability to check for the "nonstring" attribute.
This is the reverse of what was added to strscpy*() in commit 559048d156ff
("string: Check for "nonstring" attribute on strscpy() arguments").
Note that __annotated() must be explicitly tested for, as GCC added
__builtin_has_attribute() after it added the "nonstring" attribute. Do
so here to avoid the !__annotated() test triggering build failures
when __builtin_has_attribute() was missing but __nonstring was defined.
(I've opted to squash this fix into this patch so we don't end up with
a possible bisection target that would leave the kernel unbuildable.)
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reported-by: Michael Kelley <mhklinux@outlook.com>
Closes: https://lore.kernel.org/all/adbe8dd1-a725-4811-ae7e-76fe770cf096@linux.vnet.ibm.com/
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probe events fixes from Masami Hiramatsu:
- probe-events: Remove unused MAX_ARG_BUF_LEN macro - it is not used
- fprobe-events: Log error for exceeding the number of entry args.
Since the max number of entry args is limited, it should be checked
and rejected when the parser detects it.
- tprobe-events: Reject invalid tracepoint name
If a user specifies an invalid tracepoint name (e.g. including '/')
then the new event is not defined correctly in the eventfs.
- tprobe-events: Fix a memory leak when tprobe defined with $retval
There is a memory leak if tprobe is defined with $retval.
* tag 'probes-fixes-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: probe-events: Remove unused MAX_ARG_BUF_LEN macro
tracing: fprobe-events: Log error for exceeding the number of entry args
tracing: tprobe-events: Reject invalid tracepoint name
tracing: tprobe-events: Fix a memory leak when tprobe with $retval
|
|
Print out the index of mismatching XSAVE bytes using unsigned decimal
format. Some versions of clang complain about trying to print an integer
as an unsigned char.
x86/sev_smoke_test.c:55:51: error: format specifies type 'unsigned char'
but the argument has type 'int' [-Werror,-Wformat]
Fixes: 8c53183dbaa2 ("selftests: kvm: add test for transferring FPU state into VMSA")
Link: https://lore.kernel.org/r/20250228233852.3855676-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
During the initial mprotect(RO) stage of mmu_stress_test, keep vCPUs
spinning until all vCPUs have hit -EFAULT, i.e. until all vCPUs have tried
to write to a read-only page. If a vCPU manages to complete an entire
iteration of the loop without hitting a read-only page, *and* the vCPU
observes mprotect_ro_done before starting a second iteration, then the
vCPU will prematurely fall through to GUEST_SYNC(3) (on x86 and arm64) and
get out of sequence.
Replace the "do-while (!r)" loop around the associated _vcpu_run() with
a single invocation, as barring a KVM bug, the vCPU is guaranteed to hit
-EFAULT, and retrying on success is super confusing, hides KVM bugs, and
complicates this fix. The do-while loop was semi-unintentionally added
specifically to fudge around a KVM x86 bug, and said bug is unhittable
without modifying the test to force x86 down the !(x86||arm64) path.
On x86, if forced emulation is enabled, vcpu_arch_put_guest() may trigger
emulation of the store to memory. Due to a (very, very) longstanding bug in
KVM x86's emulator, emulated writes to guest memory that fail during
__kvm_write_guest_page() unconditionally return KVM_EXIT_MMIO. While that
is desirable in the !memslot case, it's wrong in this case as the failure
happens due to __copy_to_user() hitting a read-only page, not an emulated
MMIO region.
But as above, x86 only uses vcpu_arch_put_guest() if the __x86_64__ guards
are clobbered to force x86 down the common path, and of course the
unexpected MMIO is a KVM bug, i.e. *should* cause a test failure.
Fixes: b6c304aec648 ("KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)")
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/20250208105318.16861-1-yan.y.zhao@intel.com
Debugged-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20250228230804.3845860-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Never rely on the CPU to restore/load host DR0..DR3 values, even if the
CPU supports DebugSwap, as there are no guarantees that SNP guests will
actually enable DebugSwap on APs. E.g. if KVM were to rely on the CPU to
load DR0..DR3 and skipped them during hw_breakpoint_restore(), KVM would
run with clobbered-to-zero DRs if an SNP guest created APs without
DebugSwap enabled.
Update the comment to explain the dangers, and hopefully prevent breaking
KVM in the future.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250227012541.3234589-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When running SEV-SNP guests on a CPU that supports DebugSwap, always save
the host's DR0..DR3 mask MSR values irrespective of whether or not
DebugSwap is enabled, to ensure the host values aren't clobbered by the
CPU. And for now, also save DR0..DR3, even though doing so isn't
necessary (see below).
SVM_VMGEXIT_AP_CREATE is deeply flawed in that it allows the *guest* to
create a VMSA with guest-controlled SEV_FEATURES. A well behaved guest
can inform the hypervisor, i.e. KVM, of its "requested" features, but on
CPUs without ALLOWED_SEV_FEATURES support, nothing prevents the guest from
lying about which SEV features are being enabled (or not!).
If a misbehaving guest enables DebugSwap in a secondary vCPU's VMSA, the
CPU will load the DR0..DR3 mask MSRs on #VMEXIT, i.e. will clobber the
MSRs with '0' if KVM doesn't save its desired value.
Note, DR0..DR3 themselves are "ok", as DR7 is reset on #VMEXIT, and KVM
restores all DRs in common x86 code as needed via hw_breakpoint_restore().
I.e. there is no risk of host DR0..DR3 being clobbered (when it matters).
However, there is a flaw in the opposite direction; because the guest can
lie about enabling DebugSwap, i.e. can *disable* DebugSwap without KVM's
knowledge, KVM must not rely on the CPU to restore DRs. Defer fixing
that wart, as it's more of a documentation issue than a bug in the code.
Note, KVM added support for DebugSwap in commit d1f85fbe836e ("KVM: SEV:
Enable data breakpoints in SEV-ES"), but that is not an appropriate Fixes,
as the underlying flaw exists in hardware, not in KVM. I.e. all kernels
that support SEV-SNP need to be patched, not just kernels with KVM's full
support for DebugSwap (ignoring that DebugSwap support landed first).
Opportunistically fix an incorrect statement in the comment; on CPUs
without DebugSwap, the CPU does NOT save or load debug registers, i.e.
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Cc: stable@vger.kernel.org
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250227012541.3234589-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The header guard for gpu_scheduler_trace.h apparently contains a typo.
Fix the typo and document the scope of the guard.
Fixes: 353da3c520b4 ("drm/amdgpu: add tracepoint for scheduler (v2)")
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218124149.118002-2-phasta@kernel.org
|
|
The devm_memremap() function returns error pointers on error,
it doesn't return NULL.
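The difference between the two checks can be sketched in userspace C. This is a minimal illustration, not the driver's code: ERR_PTR(), PTR_ERR() and IS_ERR() are re-created here to mirror the kernel helpers of the same names, and fake_memremap(), probe_buggy() and probe_fixed() are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical stand-in for a mapping function that, like
 * devm_memremap(), reports failure with an error pointer, never NULL.
 */
static void *fake_memremap(int fail)
{
	static char region[16];

	return fail ? ERR_PTR(-ENOMEM) : (void *)region;
}

/* Buggy pattern: the NULL test never fires, so the failure is missed. */
static int probe_buggy(int fail)
{
	void *p = fake_memremap(fail);

	if (!p)
		return -ENOMEM;
	return 0;
}

/* Fixed pattern: test with IS_ERR() and propagate the encoded errno. */
static int probe_fixed(int fail)
{
	void *p = fake_memremap(fail);

	if (IS_ERR(p))
		return (int)PTR_ERR(p);
	return 0;
}
```

With a failing mapping, probe_buggy() reports success and would go on to use the error pointer, while probe_fixed() returns the errno.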
Fixes: c7cefce03e69 ("hwmon: (xgene) access mailbox as RAM")
Signed-off-by: Xinghuo Chen <xinghuo.chen@foxmail.com>
Link: https://lore.kernel.org/r/tencent_9AD8E7683EC29CAC97496B44F3F865BA070A@qq.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
syzbot is able to crash hosts [1], using llc and devices
not supporting IFF_TX_SKB_SHARING.
In this case, e1000 driver calls eth_skb_pad(), while
the skb is shared.
Simply replace skb_get() with skb_clone() in net/llc/llc_s_ac.c.
Note that the e1000 driver might have an issue with pktgen,
because it does not clear IFF_TX_SKB_SHARING; this is an
orthogonal change.
We need to audit other skb_get() uses in net/llc.
[1]
kernel BUG at net/core/skbuff.c:2178 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 16371 Comm: syz.2.2764 Not tainted 6.14.0-rc4-syzkaller-00052-gac9c34d1e45a #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:pskb_expand_head+0x6ce/0x1240 net/core/skbuff.c:2178
Call Trace:
<TASK>
__skb_pad+0x18a/0x610 net/core/skbuff.c:2466
__skb_put_padto include/linux/skbuff.h:3843 [inline]
skb_put_padto include/linux/skbuff.h:3862 [inline]
eth_skb_pad include/linux/etherdevice.h:656 [inline]
e1000_xmit_frame+0x2d99/0x5800 drivers/net/ethernet/intel/e1000/e1000_main.c:3128
__netdev_start_xmit include/linux/netdevice.h:5151 [inline]
netdev_start_xmit include/linux/netdevice.h:5160 [inline]
xmit_one net/core/dev.c:3806 [inline]
dev_hard_start_xmit+0x9a/0x7b0 net/core/dev.c:3822
sch_direct_xmit+0x1ae/0xc30 net/sched/sch_generic.c:343
__dev_xmit_skb net/core/dev.c:4045 [inline]
__dev_queue_xmit+0x13d4/0x43e0 net/core/dev.c:4621
dev_queue_xmit include/linux/netdevice.h:3313 [inline]
llc_sap_action_send_test_c+0x268/0x320 net/llc/llc_s_ac.c:144
llc_exec_sap_trans_actions net/llc/llc_sap.c:153 [inline]
llc_sap_next_state net/llc/llc_sap.c:182 [inline]
llc_sap_state_process+0x239/0x510 net/llc/llc_sap.c:209
llc_ui_sendmsg+0xd0d/0x14e0 net/llc/af_llc.c:993
sock_sendmsg_nosec net/socket.c:718 [inline]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+da65c993ae113742a25f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67c020c0.050a0220.222324.0011.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 4204eccc7b2a ("ASoC: tegra: Add support for S24_LE audio format")
added support for the S24_LE audio format, but duplicated S16_LE in
OUT_DAI() for ADX instead.
Fix this by adding support for the S24_LE audio format.
Compile-tested only.
Cc: stable@vger.kernel.org
Fixes: 4204eccc7b2a ("ASoC: tegra: Add support for S24_LE audio format")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250222225700.539673-2-thorsten.blum@linux.dev
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Temperatures are reported in units of Celsius, but hwmon expects
values in millidegrees Celsius. Userspace tools therefore observe values
close to zero and report them as "Not available" or as incorrect values
like 0C or 1C. Add a simple conversion to fix that.
Before the change:
wsa884x-virtual-0
Adapter: Virtual device
temp1: +0.0°C
--
wsa884x-virtual-0
Adapter: Virtual device
temp1: +0.0°C
Also reported as N/A before first amplifier power on.
After this change and initial wsa884x power on:
wsa884x-virtual-0
Adapter: Virtual device
temp1: +39.0°C
--
wsa884x-virtual-0
Adapter: Virtual device
temp1: +37.0°C
Tested on sm8550 only.
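The unit conversion described above amounts to the following. This is a minimal sketch; celsius_to_millicelsius() is a hypothetical helper name, not the function used in the driver.

```c
/* hwmon temperature values are expressed in millidegrees Celsius. */
#define MILLIDEGREES_PER_DEGREE 1000L

/* Hypothetical helper: scale whole degrees Celsius to millidegrees. */
static long celsius_to_millicelsius(long temp_c)
{
	return temp_c * MILLIDEGREES_PER_DEGREE;
}
```

Without the scaling, a reading of 39 is interpreted as 0.039 degrees Celsius, which is why tools showed values close to zero.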
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Link: https://patch.msgid.link/20250221044024.1207921-1-alexey.klimov@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
create_sdw_dailinks()
Initialize current_be_id to 0 to handle the unlikely case when there are
no devices connected to a DAI.
In this case create_sdw_dailink() would return without touching the passed
pointer to current_be_id.
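The pattern can be illustrated with a toy example. This is only a sketch: create_link() and create_links() are simplified, hypothetical stand-ins for the real functions, which take different arguments.

```c
/*
 * Hypothetical stand-in for the callee: with no devices it returns
 * early and never writes through be_id.
 */
static int create_link(int num_devices, int *be_id)
{
	if (num_devices == 0)
		return 0;	/* *be_id is left untouched */

	*be_id += num_devices;
	return 0;
}

/* Hypothetical stand-in for the caller. */
static int create_links(int num_devices)
{
	/*
	 * Initializing to 0 is the fix: without it, the early return
	 * above would leave this variable indeterminate.
	 */
	int current_be_id = 0;

	create_link(num_devices, &current_be_id);
	return current_be_id;
}
```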
Found by gcc -fanalyzer
Fixes: 59bf457d8055 ("ASoC: intel: sof_sdw: Factor out SoundWire DAI creation")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20250303065552.78328-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
nft_ct_pcpu_template is a per-CPU variable and relies on disabled BH for its
locking. The refcounter is read and, if its value is one, the
refcounter is incremented and the variable is used; otherwise it is
already in use and left untouched.
Without per-CPU locking in local_bh_disable() on PREEMPT_RT the
read-then-increment operation is not atomic and therefore racy.
This can be avoided by unconditionally using __refcount_inc(), which will
increment the counter and return the old value as an atomic operation.
In case the returned counter is not one, the variable is in use and we
need to decrement the counter. Otherwise we can use it.
Use __refcount_inc() instead of read and a conditional increment.
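The two approaches can be contrasted in userspace with C11 atomics. This is a sketch under stated assumptions: atomic_fetch_add() stands in for the kernel's __refcount_inc(), and struct tmpl and the tmpl_*() helpers are hypothetical names, not the nft_ct code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical template object; refs == 1 means "free to use". */
struct tmpl {
	atomic_int refs;
};

/*
 * Racy pattern: another context can pass the same check between
 * the load and the increment, so both believe they own the object.
 */
static bool tmpl_try_get_racy(struct tmpl *t)
{
	if (atomic_load(&t->refs) == 1) {
		atomic_fetch_add(&t->refs, 1);
		return true;
	}
	return false;
}

/*
 * Race-free pattern: increment unconditionally and inspect the old
 * value that the atomic operation returns.
 */
static bool tmpl_try_get(struct tmpl *t)
{
	int old = atomic_fetch_add(&t->refs, 1);

	if (old == 1)
		return true;	/* we are the only user */

	/* Already in use: undo our increment and leave it untouched. */
	atomic_fetch_sub(&t->refs, 1);
	return false;
}

/* Drop the reference taken by a successful tmpl_try_get(). */
static void tmpl_put(struct tmpl *t)
{
	atomic_fetch_sub(&t->refs, 1);
}
```

A caller that gets false from tmpl_try_get() must fall back to some other object; the counter is back at its previous value at that point.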
Fixes: edee4f1e9245 ("netfilter: nft_ct: add zone id set support")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The module parameter defines the number of iso packets per URB. The user
is allowed to set this parameter, which is of type int, to any value. That
can lead to various kinds of weird and incorrect behavior like integer
overflows, truncations, etc. The number of packets should be a small
non-negative number.
Since this parameter is read-only, its value can be validated on driver
probe.
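A probe-time range check of this kind could look as follows. This is a sketch only: the message above does not state the accepted range, so PACKETS_MIN and PACKETS_MAX are hypothetical bounds, and validate_packets_per_urb() is a hypothetical helper name.

```c
#include <errno.h>

/* Hypothetical bounds; the real limits are not given in the message. */
#define PACKETS_MIN 1
#define PACKETS_MAX 1024

/*
 * Reject out-of-range values once, at probe time, so later size
 * calculations cannot overflow or truncate.
 */
static int validate_packets_per_urb(int packets)
{
	if (packets < PACKETS_MIN || packets > PACKETS_MAX)
		return -EINVAL;
	return 0;
}
```

Because the parameter is read-only after module load, a single check at probe time covers every later use.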
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Link: https://patch.msgid.link/20250303100413.835-1-m.masimov@mt-integration.ru
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Add the PCI ID for Diamond Rapids platforms.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20250226214728.1256747-1-david.e.box@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Based on the dmesg messages from the original reporter:
[ 4.964073] ACPI: \_SB_.PCI0.LPCB.EC__.HKEY: BCTG evaluated but flagged as error
[ 4.964083] thinkpad_acpi: Error probing battery 2
Lenovo ThinkPad X131e also needs this battery quirk.
Reported-by: Fan Yang <804284660@qq.com>
Tested-by: Fan Yang <804284660@qq.com>
Co-developed-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250221164825.77315-1-jeffbai@aosc.io
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Volume buttons on Microsoft Surface Go 4 tablet didn't send any events.
Add Surface Go 4 DMI match to button_array_table to fix this.
Signed-off-by: Dmitry Panchenko <dmitry@d-systems.ee>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20250220154016.3620917-1-dmitry@d-systems.ee
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
When firmware traces are enabled, the firmware dumps 48-bit timestamps
for each trace as two 32-bit values, highest 32 bits (of which only 16
are useful) first.
The driver was reassembling them the other way round, i.e. interpreting
the first value in memory as the lowest 32 bits, and the second value
as the highest 32 bits (then truncated to 16 bits).
Due to this, firmware trace dumps showed very large timestamps even for
traces recorded shortly after GPU boot. The timestamps in these dumps
would also sometimes jump backwards because of the truncation.
Example trace dumped after loading the powervr module and enabling
firmware traces, where each line is commented with the timestamp value
in hexadecimal to better show both issues:
[93540092739584] : Host Sync Partition marker: 1 // 0x551300000000
[28419798597632] : GPU units deinit // 0x19d900000000
[28548647616512] : GPU deinit // 0x19f700000000
Update the logic to reassemble the timestamp halves in the correct order.
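Both orderings can be sketched in plain C. This is an illustration, not the driver's code: ts_wrong() and ts_fixed() are hypothetical names, and the raw words used in the usage note are assumed values chosen to reproduce the first two timestamps in the example trace.

```c
#include <stdint.h>

/*
 * Old behavior: treat the first word as the low 32 bits and the
 * second word, truncated to 16 bits, as the high bits.
 */
static uint64_t ts_wrong(const uint32_t w[2])
{
	return ((uint64_t)(w[1] & 0xffff) << 32) | w[0];
}

/*
 * Corrected behavior: the first word holds the high bits (only 16
 * of them useful), the second word holds the low 32 bits.
 */
static uint64_t ts_fixed(const uint32_t w[2])
{
	return ((uint64_t)(w[0] & 0xffff) << 32) | w[1];
}
```

For the assumed words { 0x00000000, 0x00015513 }, ts_wrong() yields 0x551300000000 and ts_fixed() yields 0x15513. For a later assumed sample { 0x00000000, 0x000219d9 }, ts_wrong() yields 0x19d900000000, which is smaller than the earlier value even though the real timestamp increased.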
Fixes: cb56cd610866 ("drm/imagination: Add firmware trace to debugfs")
Signed-off-by: Alessio Belle <alessio.belle@imgtec.com>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-fix-fw-trace-timestamps-v1-1-dba4aeb030ca@imgtec.com
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
|