|
Both regs_get_kernel_stack_nth() and regs_get_register() are not
inlined. With the new ftrace funcgraph-args feature they show up in
function graph tracing:
4) | sched_core_idle_cpu(cpu=4) {
4) 0.257 us | regs_get_register(regs=0x37fe00afa10, offset=2);
4) 0.218 us | regs_get_register(regs=0x37fe00afa10, offset=3);
4) 0.225 us | regs_get_register(regs=0x37fe00afa10, offset=4);
4) 0.239 us | regs_get_register(regs=0x37fe00afa10, offset=5);
4) 0.239 us | regs_get_register(regs=0x37fe00afa10, offset=6);
4) 0.245 us | regs_get_kernel_stack_nth(regs=0x37fe00afa10, n=20);
This is suboptimal, since both functions are supposed to be ftrace
internal helper functions. If they appear in ftrace traces, this reduces
readability significantly and adds tons of useless extra entries.
Address this by moving both functions and the required helpers to ptrace.h
and always inlining them. This way they don't appear in traces anymore. In
addition, the overhead that comes with function calls is reduced.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
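The effect of forced inlining can be sketched in plain user-space C. This is an illustration only, not the s390 code: `struct toy_regs`, `TOY_NUM_GPRS` and `toy_get_register()` are made-up names, and the attribute spelled out below is the GCC/Clang one that the kernel's `__always_inline` macro is built on.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_GPRS 16 /* hypothetical register count for this sketch */

struct toy_regs {
	unsigned long gprs[TOY_NUM_GPRS];
};

/*
 * Lives in a header and is forced inline: every caller gets the body
 * expanded in place, so there is no call for a function tracer to record.
 */
static inline __attribute__((always_inline))
unsigned long toy_get_register(const struct toy_regs *regs, unsigned int idx)
{
	assert(regs != NULL);
	if (idx >= TOY_NUM_GPRS)
		return 0; /* out of range: return 0 rather than read past the array */
	return regs->gprs[idx];
}
```

A plain `static inline` would only be a hint; the attribute is what keeps the helper out of the trace regardless of optimization decisions.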
|
|
asm/thread_info.h requires PAGE_SIZE, which is defined in vdso/page.h,
but doesn't need to include asm/lowcore.h or asm/page.h.
Therefore change the includes accordingly and reduce header dependencies.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
When calling the diag for DCSS unload on a non-IPL CPU, the sclp maximum
memory detection on the next IPL would falsely return the end of the
previously loaded DCSS.
This is because of an issue in z/VM, so work around it by always calling
the diag for DCSS unload on IPL CPU 0. That CPU cannot be set offline,
so the dcss_diag() call can directly be scheduled to CPU 0.
The wrong maximum memory value returned by sclp would only affect KASAN
kernels. When a DCSS within the falsely reported extra memory range is
loaded and accessed again, it would result in a kernel crash:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 001c0000a3ffe000 TEID: 001c0000a3ffe803
Fault in home space mode while using kernel ASCE.
AS:000000039955400b R2:00000003fe3b400b R3:000000037a2a8007 S:0000000000000020
Oops: 0010 ilc:3 [#1]SMP
[...]
CPU: 2 UID: 0 PID: 1563 Comm: mount Kdump: loaded Not tainted 6.15.0-rc5-11546-g3ea93fb3d026-dirty #7 NONE
Hardware name: IBM 3931 A01 704 (z/VM 7.4.0)
Krnl PSW : 0704c00180000000 000da6f2b338faf2 (kasan_check_range+0x172/0x310)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000040 001c0000a3ffe000 000000051fff0000 0000000000001000
0000000000000000 000da6f233380ff6 00000000000001f8 0000000000000000
001c0000a3ffe200 0000000000000040 001c0000a3ffe200 0000000000000200
000003ff97a2cfa8 0000000000000000 0000000000000010 000da672b58af070
Krnl Code: 000da6f2b338fae2: 41101008 la %r1,8(%r1)
000da6f2b338fae6: eca100268064 cgrj %r10,%r1,8,000da6f2b338fb32
#000da6f2b338faec: ebe00002000c srlg %r14,%r0,2
>000da6f2b338faf2: e3b010000002 ltg %r11,0(%r1)
000da6f2b338faf8: a77400a8 brc 7,000da6f2b338fc48
000da6f2b338fafc: 41b01008 la %r11,8(%r1)
000da6f2b338fb00: b904001b lgr %r1,%r11
000da6f2b338fb04: e3a0b0000002 ltg %r10,0(%r11)
Call Trace:
[<000da6f2b338faf2>] kasan_check_range+0x172/0x310
[<000da6f2b3390b3c>] __asan_memcpy+0x3c/0x90
[<000da6f233380ff6>] dcssblk_submit_bio+0x3a6/0x620 [dcssblk]
[<000da6f2b3eb403c>] __submit_bio+0x25c/0x4a0
[<000da6f2b3eb43bc>] __submit_bio_noacct+0x13c/0x450
[<000da6f2b3eb4bde>] submit_bio_noacct_nocheck+0x50e/0x620
[<000da6f2b34f4978>] mpage_readahead+0x318/0x3f0
[<000da6f2b31edbe6>] read_pages+0x156/0x740
[<000da6f2b31ee594>] page_cache_ra_unbounded+0x3c4/0x610
[<000da6f2b31ef094>] force_page_cache_ra+0x1f4/0x2d0
[<000da6f2b31d092e>] filemap_get_pages+0x2ce/0xaa0
[<000da6f2b31d1428>] filemap_read+0x328/0x9a0
[<000da6f2b3e9b7e8>] blkdev_read_iter+0x228/0x3b0
[<000da6f2b340f7a6>] vfs_read+0x5b6/0x7f0
[<000da6f2b34110be>] ksys_read+0x10e/0x1e0
[<000da6f2b4e7acb2>] __do_syscall+0x122/0x1f0
[<000da6f2b4e93ffe>] system_call+0x6e/0x90
Last Breaking-Event-Address:
[<000da6f2b338faac>] kasan_check_range+0x12c/0x310
Kernel panic - not syncing: Fatal exception: panic_on_oops
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Harald Freudenberger says:
====================
This is a complete rework of the protected key AES (PAES) implementation.
The goal of this rework is to implement the 4 modes (ecb, cbc, ctr, xts)
in a real asynchronous fashion:
- init(), exit() and setkey() are synchronous and don't allocate any memory.
- the encrypt/decrypt functions first try to do the job in a synchronous
manner. If this fails, for example because the protected key became invalid
due to a guest suspend/resume or guest migration action, the encrypt/decrypt
is transferred to an instance of the crypto engine (see below) for
asynchronous processing.
These postponed requests are then handled by the crypto engine by
invoking the do_one_request() callback but may of course again run into
a still not converted key or a key that has become invalid. If the key is
still not converted, the first thread does the conversion and updates
the key status in the transformation context. The conversion is
invoked via pkey API with a new flag PKEY_XFLAG_NOMEMALLOC.
Note that once there is an active request enqueued for async
processing via the crypto engine, further requests also need to go via
the crypto engine to keep the request sequence.
This patch together with the pkey/zcrypt/AP extensions to support
the new PKEY_XFLAG_NOMEMALLOC should toughen the paes crypto algorithms
to truly meet the requirements for in-kernel skcipher implementations
and the usage patterns for the dm-crypt and dm-integrity layers.
The new flag PKEY_XFLAG_NOMEMALLOC tells the PKEY layer (and
subsidiary layers) that it must not allocate any memory causing IO
operations. Note that the patches for these pkey/zcrypt/AP extensions
are currently in the features branch but may be seen in the master
branch with the next merge.
There is still some confusion about how paes treats the key
within the transformation context. The tfm context may be shared by
multiple requests running en/decryption with the same key. So the tfm
context is supposed to be read-only.
The s390 protected key support is in fact an encrypted key with the
wrapping key sitting in the firmware. On each invocation of a
protected key instruction the firmware unwraps the pkey and performs
the operation. Part of the protected key is a hash of the wrapping
key used, so the firmware is able to detect whether a protected key
matches the wrapping key or not. If there is a mismatch, the cpacf
operation fails with cc 1 (key invalid). Such a situation can occur,
for example, with a kvm live guest migration to another machine where
the guest simply awakens in a new environment. As the wrapping key is
NOT transferred, after the reawakening all protected key cpacf
operations fail with "key invalid". There are other situations
where a protected key cpacf operation may run into "key invalid" and
thus the code needs to be prepared for such cpacf failures.
The recovery is simple: via the pkey API, a new protected key needs to be
generated from the source key material (in real cases this is usually a
secure key bound to an HSM), which is then wrapped by the wrapping key of
the current firmware.
So the paes tfms hold the source key material to be able to
re-generate the protected key at any time. A naive implementation
would hold the protected key in some kind of running context (for
example the request context) and only the source key would be stored
in the tfm context. But the derivation of the protected key from the
source key is an expensive and time consuming process often involving
interaction with a crypto card. And such a naive implementation would
then trigger the derivation process individually for every tfm in use.
So why not store the protected key in the tfm context and have only the
very first process hitting the "invalid key" cc run the derivation and
update the protected key stored in the tfm? The only really important
thing is that the protected key update and cloning from this value
need to be done in an atomic fashion.
Please note that there are still race conditions where the protected
key stored in the tfm may get updated by an (outdated) protected key
value. This is not an issue and the code handles this correctly by
again re-deriving the protected key. The only fact that matters is
that the protected key must always be in a state where the cpacf
instructions can figure out if it is valid (the hash part of the
protected key matches the hash of the wrapping key) or invalid
(and refuse the crypto operation with "invalid key").
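The "update and clone atomically" rule can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the paes code: `struct toy_tfm_ctx`, `TOY_PK_MAX` and the helper names are hypothetical, and a C11 `atomic_flag` spin loop stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define TOY_PK_MAX 64 /* hypothetical maximum protected key size */

enum toy_pk_state { TOY_PK_INVALID, TOY_PK_VALID };

struct toy_tfm_ctx {
	atomic_flag lock;             /* stand-in for the kernel spinlock */
	enum toy_pk_state state;
	unsigned char pk[TOY_PK_MAX]; /* shared protected key */
	size_t pk_len;
};

static void toy_lock(struct toy_tfm_ctx *ctx)
{
	while (atomic_flag_test_and_set_explicit(&ctx->lock, memory_order_acquire))
		; /* spin until the flag was clear */
}

static void toy_unlock(struct toy_tfm_ctx *ctx)
{
	atomic_flag_clear_explicit(&ctx->lock, memory_order_release);
}

/* Store a freshly derived key; readers never observe a half-written key. */
static void toy_update_pk(struct toy_tfm_ctx *ctx, const unsigned char *pk, size_t len)
{
	assert(len <= TOY_PK_MAX);
	toy_lock(ctx);
	memcpy(ctx->pk, pk, len);
	ctx->pk_len = len;
	ctx->state = TOY_PK_VALID;
	toy_unlock(ctx);
}

/*
 * Clone the shared key into a per-request buffer of at least TOY_PK_MAX
 * bytes. Returns 0 on success, -1 if no valid key is stored.
 */
static int toy_clone_pk(struct toy_tfm_ctx *ctx, unsigned char *out, size_t *out_len)
{
	int rc = -1;

	toy_lock(ctx);
	if (ctx->state == TOY_PK_VALID) {
		memcpy(out, ctx->pk, ctx->pk_len);
		*out_len = ctx->pk_len;
		rc = 0;
	}
	toy_unlock(ctx);
	return rc;
}
```

Each request works on its own clone, so a concurrent update of the shared key cannot corrupt an operation that is already running; at worst the clone is outdated and the operation reports "invalid key" again.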
Changelog:
v1 - first version. Applied and tested on top of the mentioned
pkey/zcrypt/AP changes. Selftests and multithreaded testcases
executed via AF_ALG interface ran successfully and even instrumented
code (with some sleeps to force async paths) ran fine.
Code is good enough for a first code review and collecting feedback.
v2 - A new patch which does a slight rework of the cpacf_pcc() inline
function to return the condition code.
A rework of the paes implementation based on feedback from Herbert
and Ingo:
- the spinlock is now consistently used to protect updates and
changes on the protected key and protected key state within
the transformation context.
- setkey() is now synchronous
- the walk is now held in the request context and thus the
postponing of a request to the engine and later processing
can continue at exactly the same state.
- the param block needed for the cpacf instructions is constructed
once and held in the request context.
- if a request can't be handled synchronously, it is postponed
for async processing via an instance of the crypto engine.
With v2 comes a patch which updates the crypto engine docu
in Documentation/crypto. Feel free to use it or drop it or
do some rework - at least it needs some review.
v2 was only posted internally to collect some feedback within IBM.
v3 - Slight improvements based on feedback from Finn.
v4 - With feedback from Holger and Herbert Xu. Holger gave some good
hints about better readability of the code and I picked nearly
all his suggestions. Herbert noted that once a request goes via the
engine, the following requests should also go via the engine for as
long as there are requests enqueued, to keep the sequence. This
is now realized via a via_engine_ctr atomic counter in the tfm
context.
Stress tested with lots of debug code to run through all the
failure paths of the code. Looks good.
v5 - Fixed two typos and 1 too long line in the commit message found
by Holger. Added Acked-by and Reviewed-by.
Removed patch #3 which updates the crypto engine docu - this
will go separate. All prepared for picking in the s390 subsystem.
====================
Link: https://lore.kernel.org/r/20250514090955.72370-1-freude@linux.ibm.com/
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This is a complete rework of the protected key AES (PAES) implementation.
The goal of this rework is to implement the 4 modes (ecb, cbc, ctr, xts)
in a real asynchronous fashion:
- init(), exit() and setkey() are synchronous and don't allocate any
memory.
- the encrypt/decrypt functions first try to do the job in a synchronous
manner. If this fails, for example because the protected key became invalid
due to a guest suspend/resume or guest migration action, the encrypt/decrypt
is transferred to an instance of the crypto engine (see below) for
asynchronous processing.
These postponed requests are then handled by the crypto engine by
invoking the do_one_request() callback but may of course again run into
a still not converted key or a key that has become invalid. If the key is
still not converted, the first thread does the conversion and updates
the key status in the transformation context. The conversion is
invoked via pkey API with a new flag PKEY_XFLAG_NOMEMALLOC.
Note that once there is an active request enqueued for async
processing via the crypto engine, further requests also need to go via
the crypto engine to keep the request sequence.
This patch together with the pkey/zcrypt/AP extensions to support
the new PKEY_XFLAG_NOMEMALLOC should toughen the paes crypto algorithms
to truly meet the requirements for in-kernel skcipher implementations
and the usage patterns for the dm-crypt and dm-integrity layers.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20250514090955.72370-3-freude@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
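The "try synchronously, otherwise go via the engine, and keep the order" rule described in this commit message can be sketched as follows. This is a simplified user-space model, not the driver code: `struct toy_paes_ctx`, `toy_submit()` and `toy_engine_done()` are hypothetical names, the counter is assumed to track requests currently queued to the engine, and the sketch ignores the window between checking the counter and incrementing it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_paes_ctx {
	atomic_int via_engine_ctr; /* requests currently queued to the engine */
};

enum toy_dispatch { TOY_DONE_SYNC, TOY_QUEUED_ASYNC };

/*
 * sync_ok models the outcome of the synchronous attempt; false stands for
 * a failure such as an invalid protected key.
 */
static enum toy_dispatch toy_submit(struct toy_paes_ctx *ctx, bool sync_ok)
{
	/* Once anything is queued, later requests must queue too (ordering). */
	if (atomic_load(&ctx->via_engine_ctr) == 0 && sync_ok)
		return TOY_DONE_SYNC;

	atomic_fetch_add(&ctx->via_engine_ctr, 1);
	return TOY_QUEUED_ASYNC;
}

/* Called when the engine has finished one queued request. */
static void toy_engine_done(struct toy_paes_ctx *ctx)
{
	int prev = atomic_fetch_sub(&ctx->via_engine_ctr, 1);

	assert(prev > 0);
	(void)prev; /* keep the build quiet when assertions are compiled out */
}
```

The third call in a sequence "succeeds, fails, would succeed" is still queued, because the second request has not completed yet; only after the engine drains does the synchronous fast path become available again.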
|
|
Some of the pcc sub-functions have a protected key as
input and thus may run into the situation that this
key is invalid, for example due to live guest migration
to other physical hardware.
Rework the inline assembler function cpacf_pcc() to
return the condition code (cc) as return value:
0 - cc code 0 (normal completion)
1 - cc code 1 (prot key wkvp mismatch or src op out of range)
2 - cc code 2 (something invalid, scalar multiply infinity, ...)
Note that cc 3 (partial completion) is handled within the asm code
and never returned.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Link: https://lore.kernel.org/r/20250514090955.72370-2-freude@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
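The return convention described above can be modelled in portable C. This is a sketch only: the real function performs the retry on partial completion inside the inline assembly, while here a hypothetical callback type `toy_pcc_step_fn` stands in for one execution of the instruction, and the two example step functions exist purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one execution of the instruction; returns cc 0..3. */
typedef int (*toy_pcc_step_fn)(void *param);

/*
 * Re-issue the operation while it reports partial completion (cc 3) and
 * hand every other condition code back to the caller.
 */
static int toy_pcc(toy_pcc_step_fn step, void *param)
{
	int cc;

	assert(step != NULL);
	do {
		cc = step(param);
	} while (cc == 3);

	return cc; /* 0, 1 or 2; never 3 */
}

/* Example step: reports partial completion until the countdown reaches zero. */
static int toy_countdown_step(void *param)
{
	int *remaining = param;

	if (*remaining > 0) {
		(*remaining)--;
		return 3;
	}
	return 0;
}

/* Example step: always reports cc 1, as for a wrapping key mismatch. */
static int toy_invalid_key_step(void *param)
{
	(void)param;
	return 1;
}
```

A caller can then treat a return value of 1 as the trigger for re-deriving the protected key and retrying.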
|
|
PARAVIRT_XXL is exclusively utilized by XEN_PV, which is only compatible
with 64-bit machines.
Clearly designate PARAVIRT_XXL as 64-bit only and remove ifdefs to
support CONFIG_PGTABLE_LEVELS < 5.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-5-kirill.shutemov@linux.intel.com
|
|
Both Intel and AMD CPUs support 5-level paging, which is expected to
become more widely adopted in the future. All major x86 Linux
distributions have the feature enabled.
Remove CONFIG_X86_5LEVEL and the related #ifdeffery to make the code more readable.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com
|
|
5-level paging only supports SPARSEMEM_VMEMMAP. CONFIG_X86_5LEVEL is
being phased out, making 5-level paging support mandatory.
Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate
any associated conditional statements.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-3-kirill.shutemov@linux.intel.com
|
|
Dynamic memory layout is used by KASLR and 5-level paging.
CONFIG_X86_5LEVEL is going to be removed, making 5-level paging support
unconditional which requires unconditional support of dynamic memory
layout.
Remove CONFIG_DYNAMIC_MEMORY_LAYOUT.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-2-kirill.shutemov@linux.intel.com
|
|
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.15-rc7
- designware: cleanup properly on probe failure
|
|
Add a helper to check if an event is in freq mode to improve readability.
No functional changes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250516182853.2610284-2-kan.liang@linux.intel.com
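A helper of this kind might look like the sketch below. The commit message does not name the helper or show its body, so everything here is an assumption: `struct toy_event`, `struct toy_attr` and `toy_event_in_freq_mode()` are hypothetical, modelled loosely on the `freq` flag and `sample_freq` field of `struct perf_event_attr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the perf attribute and event types. */
struct toy_attr {
	unsigned int freq : 1;          /* sample by frequency, not by period */
	unsigned long long sample_freq; /* requested samples per second */
};

struct toy_event {
	struct toy_attr attr;
};

/* One named predicate instead of repeating the two-part condition at each call site. */
static inline bool toy_event_in_freq_mode(const struct toy_event *event)
{
	assert(event != NULL);
	return event->attr.freq && event->attr.sample_freq;
}
```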
|
|
Pull smb client fixes from Steve French:
- Fix memory leak in mkdir error path
- Fix max rsize miscalculation after channel reconnect
* tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix zero rsize error messages
smb: client: fix memory leak during error handling for POSIX mkdir
|
|
Guoyu Yin reported a splat in the ipmr netns cleanup path:
WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_free_table net/ipv4/ipmr.c:440 [inline]
WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361
Modules linked in:
CPU: 2 UID: 0 PID: 14564 Comm: syz.4.838 Not tainted 6.14.0 #1
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:ipmr_free_table net/ipv4/ipmr.c:440 [inline]
RIP: 0010:ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361
Code: ff df 48 c1 ea 03 80 3c 02 00 75 7d 48 c7 83 60 05 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e e9 71 67 7f 00 e8 4c 2d 8a fd 90 <0f> 0b 90 eb 93 e8 41 2d 8a fd 0f b6 2d 80 54 ea 01 31 ff 89 ee e8
RSP: 0018:ffff888109547c58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888108c12dc0 RCX: ffffffff83e09868
RDX: ffff8881022b3300 RSI: ffffffff83e098d4 RDI: 0000000000000005
RBP: ffff888104288000 R08: 0000000000000000 R09: ffffed10211825c9
R10: 0000000000000001 R11: ffff88801816c4a0 R12: 0000000000000001
R13: ffff888108c13320 R14: ffff888108c12dc0 R15: fffffbfff0b74058
FS: 00007f84f39316c0(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f84f3930f98 CR3: 0000000113b56000 CR4: 0000000000350ef0
Call Trace:
<TASK>
ipmr_net_exit_batch+0x50/0x90 net/ipv4/ipmr.c:3160
ops_exit_list+0x10c/0x160 net/core/net_namespace.c:177
setup_net+0x47d/0x8e0 net/core/net_namespace.c:394
copy_net_ns+0x25d/0x410 net/core/net_namespace.c:516
create_new_namespaces+0x3f6/0xaf0 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc3/0x180 kernel/nsproxy.c:228
ksys_unshare+0x78d/0x9a0 kernel/fork.c:3342
__do_sys_unshare kernel/fork.c:3413 [inline]
__se_sys_unshare kernel/fork.c:3411 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:3411
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xa6/0x1a0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f84f532cc29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f84f3931038 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f84f5615fa0 RCX: 00007f84f532cc29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000400
RBP: 00007f84f53fba18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f84f5615fa0 R15: 00007fff51c5f328
</TASK>
The running kernel has CONFIG_IP_MROUTE_MULTIPLE_TABLES disabled, and
the sanity check for such builds is still too loose.
Address the issue by consolidating the relevant sanity check in a single
helper regardless of the kernel configuration. Also share it between
the ipv4 and ipv6 code.
Reported-by: Guoyu Yin <y04609127@gmail.com>
Fixes: 50b94204446e ("ipmr: tune the ipmr_can_free_table() checks.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/372dc261e1bf12742276e1b984fc5a071b7fc5a8.1747321903.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not recycle the page twice in airoha_qdma_rx_process routine in case
of error. Just run dev_kfree_skb() if the skb has been allocated and marked
for recycling. Run page_pool_put_full_page() directly if the skb has not
been allocated yet.
Moreover, rely on DMA address from queue entry element instead of reading
it from the DMA descriptor for DMA syncing in airoha_qdma_rx_process().
Fixes: e12182ddb6e71 ("net: airoha: Enable Rx Scatter-Gather")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250515-airoha-fix-rx-process-error-condition-v2-1-657e92c894b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, I'll be honest and say I think this is larger than
I'd prefer at this point, the main blow out point is that xe has two
larger fixes.
One is a fix for active context utilisation reporting, it's for a
reported regression and will end up in stable anyways, so I don't see
any point in holding it up.
The second is a fix for mixed cpu/gpu atomics, which are currently
broken, but are also not something your average desktop/laptop user is
going to hit in normal operation, and having them fixed now is better
than threading them through stable later.
Other than those, it's mostly the usual, a bunch of amdgpu randoms and
a few other minor fixes.
dma-buf:
- Avoid memory reordering in fence handling
meson:
- Avoid integer overflow in mode-clock calculations
panel-mipi-dbi:
- Fix output with drm_client_setup_with_fourcc()
amdgpu:
- Fix CSA unmap
- Fix MALL size reporting on GFX11.5
- AUX fix
- DCN 3.5 fix
- VRR fix
- DP MST fix
- DML 2.1 fixes
- Silence DP AUX spam
- DCN 4.0.1 cursor fix
- VCN 4.0.5 fix
ivpu:
- Fix buffer size in debugfs code
gpuvm:
- Add timeslicing and allocation restriction for SVM
xe:
- Fix shrinker debugfs name
- Add HW workaround to Xe2
- Fix SVM when mixing GPU and CPU atomics
- Fix per client engine utilization due to active contexts not saving
timestamp with lite restore enabled"
* tag 'drm-fixes-2025-05-17' of https://gitlab.freedesktop.org/drm/kernel: (24 commits)
drm/xe: Add WA BB to capture active context utilization
drm/xe: Save the gt pointer in lrc and drop the tile
drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value
drm/xe: Timeslice GPU on atomic SVM fault
drm/gpusvm: Add timeslicing support to GPU SVM
drm/xe: Strict migration policy for atomic SVM faults
drm/gpusvm: Introduce devmem_only flag for allocation
drm/xe/xe2hpg: Add Wa_22021007897
drm/amdgpu: read back register after written for VCN v4.0.5
Revert "drm/amd/display: Hardware cursor changes color when switched to software cursor"
dma-buf: insert memory barrier before updating num_fences
drm/xe: Fix the gem shrinker name
drm/amd/display: Avoid flooding unnecessary info messages
drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp
drm/amd/display: check stream id dml21 wrapper to get plane_id
drm/amd/display: fix link_set_dpms_off multi-display MST corner case
drm/amd/display: Defer BW-optimization-blocked DRR adjustments
Revert: "drm/amd/display: Enable urgent latency adjustment on DCN35"
drm/amd/display: Correct the reply value when AUX write incomplete
drm/amdgpu: fix incorrect MALL size for GFX1151
...
|
|
Currently, when device mtu is updated, vmxnet3 updates netdev mtu, quiesces
the device and then reactivates it for the ESXi to know about the new mtu.
So, technically the OS stack can start using the new mtu before ESXi knows
about the new mtu.
This can lead to issues for TSO packets which use mss as per the new mtu
configured. This patch fixes this issue by moving the mtu write after
device quiesce.
Cc: stable@vger.kernel.org
Fixes: d1a890fa37f2 ("net: VMware virtual Ethernet NIC driver: vmxnet3")
Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com>
Acked-by: Guolin Yang <guolin.yang@broadcom.com>
Changes v1-> v2:
Moved MTU write after destroy of rx rings
Link: https://patch.msgid.link/20250515190457.8597-1-ronak.doshi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When netfilter defrag hooks are loaded (due to the presence of conntrack
rules, for example), fragmented packets entering the bridge will be
defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
ipv4_conntrack_defrag()).
Later on, in the bridge's post-routing hook, the defragged packet will
be fragmented again. If the size of the largest fragment is larger than
what the kernel has determined as the destination MTU (using
ip_skb_dst_mtu()), the defragged packet will be dropped.
Before commit ac6627a28dbf ("net: ipv4: Consolidate ipv4_mtu and
ip_dst_mtu_maybe_forward"), ip_skb_dst_mtu() would return dst_mtu() as
the destination MTU. Assuming the dst entry attached to the packet is
the bridge's fake rtable one, this would simply be the bridge's MTU (see
fake_mtu()).
However, after the above-mentioned commit, ip_skb_dst_mtu() ends up
returning the route's MTU stored in the dst entry's metrics. Ideally, in
case the dst entry is the bridge's fake rtable one, this should be the
bridge's MTU as the bridge takes care of updating this metric when its
MTU changes (see br_change_mtu()).
Unfortunately, the last operation is a no-op given the metrics attached
to the fake rtable entry are marked as read-only. Therefore,
ip_skb_dst_mtu() ends up returning 1500 (the initial MTU value) and
defragged packets are dropped during fragmentation when dealing with
large fragments and high MTU (e.g., 9k).
Fix by moving the fake rtable entry's metrics to be per-bridge (in a
similar fashion to the fake rtable entry itself) and marking them as
writable, thereby allowing MTU changes to be reflected.
Fixes: 62fa8a846d7d ("net: Implement read-only protection and COW'ing of metrics.")
Fixes: 33eb9873a283 ("bridge: initialize fake_rtable metrics")
Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Closes: https://lore.kernel.org/netdev/PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com/
Tested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250515084848.727706-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
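Why the MTU update was lost can be shown with a toy model. This is not the kernel's dst/metrics code: `struct toy_metrics` and `toy_metric_set()` are hypothetical and only mirror the behaviour described above, where a write to metrics marked read-only has no effect.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_METRIC_MTU, TOY_METRIC_MAX };

struct toy_metrics {
	bool read_only;
	unsigned int val[TOY_METRIC_MAX];
};

/* Writes to read-only metrics are silently dropped, as described above. */
static void toy_metric_set(struct toy_metrics *m, int idx, unsigned int v)
{
	assert(idx >= 0 && idx < TOY_METRIC_MAX);
	if (!m->read_only)
		m->val[idx] = v;
}
```

With a shared read-only instance the stored MTU stays at its initial value after an update attempt, while a writable per-bridge instance reflects the change, which is the situation the fix establishes.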
|
|
The pointer arithmetic for accessing the tail tag only works
for linear skbs.
For nonlinear skbs, it reads uninitialized memory inside the
skb headroom, essentially randomizing the tag. I have observed
it gets set to 6 most of the time.
Example where ksz9477_rcv thinks that the packet from port 1 comes from port 6
(which does not exist for the ksz9896 that's in use), dropping the packet.
Debug prints added by me (not included in this patch):
[ 256.645337] ksz9477_rcv:323 tag0=6
[ 256.645349] skb len=47 headroom=78 headlen=0 tailroom=0
mac=(64,14) mac_len=14 net=(78,0) trans=78
shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=1 iif=3
priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 256.645377] dev name=end1 feat=0x0002e10200114bb3
[ 256.645386] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645395] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645403] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645411] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645420] skb headroom: 00000040: ff ff ff ff ff ff 00 1c 19 f2 e2 db 08 06
[ 256.645428] skb frag: 00000000: 00 01 08 00 06 04 00 01 00 1c 19 f2 e2 db 0a 02
[ 256.645436] skb frag: 00000010: 00 83 00 00 00 00 00 00 0a 02 a0 2f 00 00 00 00
[ 256.645444] skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
[ 256.645452] ksz_common_rcv:92 dsa_conduit_find_user returned NULL
Call skb_linearize before trying to access the tag.
This patch fixes ksz9477_rcv which is used by the ksz9896 I have at
hand, and also applies the same fix to ksz8795_rcv which seems to have
the same problem.
Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher@cherry.de>
CC: stable@vger.kernel.org
Fixes: 016e43a26bab ("net: dsa: ksz: Add KSZ8795 tag code")
Fixes: 8b8010fb7876 ("dsa: add support for Microchip KSZ tail tagging")
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20250515072920.2313014-1-jakob.unterwurzacher@cherry.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
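The underlying problem can be reproduced with a toy packet structure in user space. This is a sketch under assumptions, not the DSA tagger: `struct toy_skb`, `toy_linearize()` and `toy_tail_tag()` are hypothetical, the packet has at most one fragment, and both buffers are assumed to come from `malloc()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy packet: a linear head plus at most one fragment (not the kernel's sk_buff). */
struct toy_skb {
	unsigned char *head;
	size_t head_len;
	unsigned char *frag;
	size_t frag_len;
};

/* Copy the fragment behind the head so all bytes are contiguous. Returns 0 or -1. */
static int toy_linearize(struct toy_skb *skb)
{
	unsigned char *buf;

	if (skb->frag_len == 0)
		return 0; /* already linear */

	buf = malloc(skb->head_len + skb->frag_len);
	if (!buf)
		return -1;
	if (skb->head_len)
		memcpy(buf, skb->head, skb->head_len);
	memcpy(buf + skb->head_len, skb->frag, skb->frag_len);

	free(skb->head);
	free(skb->frag);
	skb->head = buf;
	skb->head_len += skb->frag_len;
	skb->frag = NULL;
	skb->frag_len = 0;
	return 0;
}

/* Read the last byte of the packet. Returns the tag value or -1 on error. */
static int toy_tail_tag(struct toy_skb *skb)
{
	/* Linearize first: head + head_len - 1 is only the tail once data is contiguous. */
	if (toy_linearize(skb) || skb->head_len == 0)
		return -1;
	return skb->head[skb->head_len - 1];
}
```

With `head_len == 0` and all payload in the fragment, as in the dump above, computing the tail from the head pointer alone would read a byte that is not part of the packet at all.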
|
|
Syzkaller reports the following issue:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
netdev_lock include/linux/netdevice.h:2751 [inline]
netdev_lock_ops include/net/netdev_lock.h:42 [inline]
dev_set_promiscuity+0x10e/0x260 net/core/dev_api.c:285
bond_set_promiscuity drivers/net/bonding/bond_main.c:922 [inline]
bond_change_rx_flags+0x219/0x690 drivers/net/bonding/bond_main.c:4732
dev_change_rx_flags net/core/dev.c:9145 [inline]
__dev_set_promiscuity+0x3f5/0x590 net/core/dev.c:9189
netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201
dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286
^^ all of the above is under rcu lock
team_change_rx_flags+0x1b3/0x330 drivers/net/team/team_core.c:1785
dev_change_rx_flags net/core/dev.c:9145 [inline]
__dev_set_promiscuity+0x3f5/0x590 net/core/dev.c:9189
netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201
dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286
hsr_del_port+0x25e/0x2d0 net/hsr/hsr_slave.c:233
hsr_netdev_notify+0x827/0xb60 net/hsr/hsr_main.c:104
notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2214 [inline]
call_netdevice_notifiers net/core/dev.c:2228 [inline]
unregister_netdevice_many_notify+0x15d8/0x2330 net/core/dev.c:11970
rtnl_delete_link net/core/rtnetlink.c:3522 [inline]
rtnl_dellink+0x488/0x710 net/core/rtnetlink.c:3564
rtnetlink_rcv_msg+0x7cc/0xb70 net/core/rtnetlink.c:6955
netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x758/0x8d0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
team_change_rx_flags runs under rcu lock which means we can't grab
instance lock for the lower devices. Switch to team->lock, similar
to what we already do for team_set_mac_address and team_change_mtu.
Fixes: 78cd408356fe ("net: add missing instance lock to dev_set_promiscuity")
Reported-by: syzbot+53485086a41dbb43270a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=53485086a41dbb43270a
Link: https://lore.kernel.org/netdev/6822cc81.050a0220.f2294.00e8.GAE@google.com
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Link: https://patch.msgid.link/20250514220319.3505158-1-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The debugfs summary output could access uninitialized elements in
the freq_in[] and signal_out[] arrays, causing NULL pointer
dereferences and triggering a kernel Oops (page_fault_oops).
This patch adds u8 fields (nr_freq_in, nr_signal_out) to track the
number of initialized elements, with a maximum of 4 per array.
The summary output functions are updated to respect these limits,
preventing out-of-bounds access and ensuring safe array handling.
Widen the label variables because the change confuses GCC about
max length of the strings.
Fixes: ef61f5528fca ("ptp: ocp: add Adva timecard support")
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250514073541.35817-1-maimon.sagi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
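
The counted-array idea described in this commit can be sketched as
follows. This is a minimal illustration in Python, not the driver code:
the Board class, MAX_SLOTS, add_freq_in() and summary_lines() are
hypothetical names; only freq_in, nr_freq_in and the limit of 4 come
from the message above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_SLOTS = 4  # the commit message caps each array at 4 entries


@dataclass
class Board:
    # Hypothetical stand-in for the driver state; slots stay None until set up.
    freq_in: List[Optional[str]] = field(
        default_factory=lambda: [None] * MAX_SLOTS
    )
    nr_freq_in: int = 0  # number of initialized entries in freq_in

    def add_freq_in(self, label: str) -> None:
        # Fill the next free slot and bump the count of valid entries.
        if self.nr_freq_in >= MAX_SLOTS:
            raise ValueError("freq_in is full")
        self.freq_in[self.nr_freq_in] = label
        self.nr_freq_in += 1


def summary_lines(board: Board) -> List[str]:
    # Walk only the initialized prefix instead of all MAX_SLOTS entries,
    # so an unset (None) slot is never dereferenced.
    return [
        f"FREQ{i + 1}: {board.freq_in[i].upper()}"
        for i in range(board.nr_freq_in)
    ]
```

Iterating over range(MAX_SLOTS) instead would call .upper() on None for
a partially populated board, which is the Python analogue of the NULL
pointer dereference the commit describes.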
|
|
Pull NFS client bugfixes from Trond Myklebust:
- NFS: Fix a couple of missed handlers for the ENETDOWN and ENETUNREACH
transport errors
- NFS: Handle Oopsable failure of nfs_get_lock_context in the unlock
path
- NFSv4: Fix a race in nfs_local_open_fh()
- NFSv4/pNFS: Fix a couple of layout segment leaks in layoutreturn
- NFSv4/pNFS: Avoid sharing pNFS DS connections between net namespaces
since IP addresses are not guaranteed to refer to the same nodes
- NFS: Don't flush file data while holding multiple directory locks in
nfs_rename()
* tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Avoid flushing data while holding directory locks in nfs_rename()
NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked()
NFSv4/pnfs: Reset the layout state after a layoutreturn
NFS/localio: Fix a race in nfs_local_open_fh()
nfs: nfs3acl: drop useless assignment in nfs3_get_acl()
nfs: direct: drop useless initializer in nfs_direct_write_completion()
nfs: move the nfs4_data_server_cache into struct nfs_net
nfs: don't share pNFS DS connections between net namespaces
nfs: handle failure of nfs_get_lock_context in unlock path
pNFS/flexfiles: Record the RPC errors in the I/O tracepoints
NFSv4/pnfs: Layoutreturn on close must handle fatal networking errors
NFSv4: Handle fatal ENETDOWN and ENETUNREACH errors
|
|
The Linux client assumes that all filehandles are non-volatile for
renames within the same directory (otherwise sillyrename cannot work).
However, the existence of the Linux 'subtree_check' export option has
meant that nfs_rename() has always assumed it needs to flush writes
before attempting to rename.
Since NFSv4 does allow the client to query whether or not the server
exhibits this behaviour, and since knfsd does actually set the
appropriate flag when 'subtree_check' is enabled on an export, it
should be OK to optimise away the write flushing behaviour in the cases
where it is clearly not needed.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
If there isn't a valid layout, or the layout stateid has changed, the
cleanup after a layout return should clear out the old data.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If there are still layout segments in the layout plh_return_lsegs list
after a layout return, we should be resetting the state to ensure they
eventually get returned as well.
Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
A preparation patch, just open code io_req_cqe_overflow().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It's a fairly obscure feature, and registered credentials would mostly
be a static thing anyway. Don't bother including code to dump the
personality indices.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Rather than wrap fdinfo.c in one big if, handle it on the Makefile
side instead. io_uring.c already conditionally sets fops->fdinfo()
anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Merge in 6.15 io_uring fixes, mostly so that the fdinfo changes can
get easily extended without causing merge conflicts.
* io_uring-6.15:
io_uring/fdinfo: grab ctx->uring_lock around io_uring_show_fdinfo()
io_uring/memmap: don't use page_address() on a highmem page
io_uring/uring_cmd: fix hybrid polling initialization issue
io_uring/sqpoll: Increase task_work submission batch size
io_uring: ensure deferred completions are flushed for multishot
io_uring: always arm linked timeouts prior to issue
io_uring/fdinfo: annotate racy sq/cq head/tail reads
io_uring: fix 'sync' handling of io_fallback_tw()
io_uring: don't duplicate flushing in io_req_post_cqe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"Fix to zone block devices to make the maximum segment count match what
the block layer is capable of"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES buffer
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- fixes for atomic writes (Alan Adamson)
- fixes for polled CQs in nvmet-epf (Damien Le Moal)
- fix for polled CQs in nvme-pci (Keith Busch)
- fix compile on odd configs that need to be forced to inline
(Kees Cook)
- one more quirk (Ilya Guterman)
- Fix for missing allocation of an integrity buffer for some cases
- Fix for a regression with ublk command cancelation
* tag 'block-6.15-20250515' of git://git.kernel.dk/linux:
ublk: fix dead loop when canceling io command
nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro
nvme: all namespaces in a subsystem must adhere to a common atomic write size
nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing
nvmet: pci-epf: remove NVMET_PCI_EPF_Q_IS_SQ
nvmet: pci-epf: improve debug message
nvmet: pci-epf: cleanup nvmet_pci_epf_raise_irq()
nvmet: pci-epf: do not fall back to using INTX if not supported
nvmet: pci-epf: clear completion queue IRQ flag on delete
nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable
nvme-pci: make nvme_pci_npages_prp() __always_inline
block: always allocate integrity buffer when required
|
|
Standalone "nologreplay" mount option has been marked deprecated since
commit 74ef00185eb8 ("btrfs: introduce "rescue=" mount option"), which
dates back to v5.9 (2020).
Furthermore, no other filesystem has a mount option of the same name,
so this one is btrfs specific and we will not hit the same problem as
when removing the "norecovery" mount option.
So let's remove the standalone "nologreplay" mount option.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Pull io_uring fixes from Jens Axboe:
- Fix a regression with highmem and mapping of regions, where
the coalescing code assumes any page is directly mapped
- Fix an issue with HYBRID_IOPOLL and passthrough commands,
where the timer wasn't always set up correctly
- Fix an issue with fdinfo not correctly locking around reading
the rings, which can be an issue if the ring is being resized
at the same time
* tag 'io_uring-6.15-20250515' of git://git.kernel.dk/linux:
io_uring/fdinfo: grab ctx->uring_lock around io_uring_show_fdinfo()
io_uring/memmap: don't use page_address() on a highmem page
io_uring/uring_cmd: fix hybrid polling initialization issue
|
|
Pull xfs fixes from Carlos Maiolino:
"This includes a bug fix for a possible data corruption vector on the
zoned allocator garbage collector"
* tag 'xfs-fixes-6.15-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Fix comment on xfs_trans_ail_update_bulk()
xfs: Fix a comment on xfs_ail_delete
xfs: Fail remount with noattr2 on a v5 with v4 enabled
xfs: fix zoned GC data corruption due to wrong bv_offset
xfs: free up mp->m_free[0].count in error case
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix ACPI PPTT parsing code to address a regression introduced recently
and add more sanity checking of data supplied by the platform firmware
to avoid using invalid data (Jeremy Linton)"
* tag 'acpi-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PPTT: Fix processor subtable walk
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small driver specific fixes, the most substantial one being the
Tegra one which fixes spurious errors with default delays for chip
select hold times"
* tag 'spi-fix-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-sun4i: fix early activation
spi: tegra114: Use value to check for invalid delays
spi: loopback-test: Do not split 1024-byte hexdumps
|
|
They look rather messy right now.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-3-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We're going to extend the coredump code in follow-up patches.
Clean it up so we can do this more easily.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-2-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We're going to extend the coredump code in follow-up patches.
Clean it up so we can do this more easily.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-1-664f3caf2516@kernel.org
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"This fixes an invalid memory access in the MAX20086 driver which could
occur during error handling for failed probe due to a hidden use of
devres in the core DT parsing code"
* tag 'regulator-fix-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max20086: fix invalid memory access
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an interrupt storm on system wake-up in gpio-pca953x
- fix an out-of-bounds write in gpio-virtuser
- update MAINTAINERS with an entry for the sloppy logic analyzer
* tag 'gpio-fixes-for-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: virtuser: fix potential out-of-bound write
gpio: pca953x: fix IRQ storm on system wake up
MAINTAINERS: add me as maintainer for the gpio sloppy logic analyzer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A handful of small fixes. The only significant change is the fix for MIDI
2.0 UMP handling in ALSA sequencer, but as MIDI 2.0 stuff is still new
and rarely used, the impact should be pretty limited.
Other than that, quirks for USB-audio and a few cosmetic fixes and
changes in drivers that should be safe to apply"
* tag 'sound-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add sample rate quirk for Microdia JP001 USB Camera
ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2()
ALSA: sh: SND_AICA should depend on SH_DMA_API
ALSA: usb-audio: Add sample rate quirk for Audioengine D1
ALSA: ump: Fix a typo of snd_ump_stream_msg_device_info
ALSA/hda: intel-sdw-acpi: Correct sdw_intel_acpi_scan() function parameter
ALSA: seq: Fix delivery of UMP events to group ports
|
|
perf_sample_data_init() has already set the sample period, so there is
no need to do it again.
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250506094907.2724-1-changbin.du@huawei.com
|
|
'rcu/torture-for-6.16' into rcu/for-next
|
|
On ARM64, when running with --configs '36*SRCU-P', I noticed that only 1
instance was started instead of 36.
Fix it by checking for Image files, instead of bzImage which ARM does
not seem to have. With this I see all 36 instances running at the same
time in the batch.
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
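
The check described in this commit can be sketched as follows. This is
only an illustration in Python, not the rcutorture script change itself:
pick_kernel_image() and KERNEL_IMAGE_NAMES are hypothetical names, and
the only facts taken from the message are that bzImage may be absent on
ARM and that Image should be accepted as well.

```python
from typing import Iterable, Optional

# Accepted kernel image file names. bzImage and Image are the two names
# mentioned in the commit message; the preference order is an assumption.
KERNEL_IMAGE_NAMES = ("bzImage", "Image")


def pick_kernel_image(build_files: Iterable[str]) -> Optional[str]:
    """Return the first accepted kernel image name found, or None."""
    present = set(build_files)
    for name in KERNEL_IMAGE_NAMES:
        if name in present:
            return name
    return None
```

A caller that tested only for bzImage would get None for an arm64 build
directory and could wrongly conclude that no kernel had been built.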
|
|
Back in the day, rcutorture was about the only thing that tested off-stack
CPU masks, but now any arm64 system with more than 256 CPUs tests it
full time. In fact, it is necessary to hack the kernel to prevent such
a system from testing off-stack CPU masks. This means that there is
no longer much point in rcutorture going out of its way to test this.
And given the differences in how CPUMASK_OFFSTACK is enabled in x86 and
arm64, rcutorture would need to go out of its way.
This commit therefore removes CONFIG_CPUMASK_OFFSTACK=y (and the
CONFIG_MAXSMP=y required to enable it on x86) from TREE01.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
|
|
The TREE01.boot nr_cpus kernel boot parameter has been set to 43 for
more than seven years, but it can cause RCU CPU stall warnings on arm64,
most of the time involving the stop-machine subsystem. This should
not be too surprising, given that this causes 43 vCPUs to spin with
interrupts disabled when there are only eight physical CPUs.
The point of this CPU overcommit is to test the ability of expedited RCU
grace period initialization to handle races with incoming CPUs that have
never previously been online. But limiting to 17 CPUs instead of 43
allows time for this code to be exercised, and eliminates (or at least
greatly reduces) the incidence of RCU CPU stall warnings on arm64.
This commit therefore sets nr_cpus=17 in TREE01.boot.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
|
|
Different architectures capitalize their splats differently. Who knew?
This commit therefore checks for both arm64 "Call trace:" and x86
"Call Trace:".
Reported-by: Joel Fernandes <joelagnelf@nvidia.com>
Closes: https://lore.kernel.org/all/553c33d8-2b51-4772-8aef-97b0163bc78e@nvidia.com/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
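
A minimal sketch of such a check, written in Python for illustration
(the actual change is to the rcutorture scripts; CALL_TRACE_RE and
has_call_trace() are hypothetical names):

```python
import re

# Match both capitalizations named in the commit message:
# arm64 prints "Call trace:" and x86 prints "Call Trace:".
CALL_TRACE_RE = re.compile(r"Call [Tt]race:")


def has_call_trace(console_log: str) -> bool:
    """Return True if the console output contains a splat header."""
    return CALL_TRACE_RE.search(console_log) is not None
```

The character class is deliberately narrower than a case-insensitive
match, so that only the two spellings mentioned above are accepted.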
|
|
Currently, ->gpwrap is not tested (at all, per my testing) due to the
requirement of a large delta between a CPU's rdp->gp_seq and its node's
rnp->gp_seq.
This results in no testing of ->gpwrap being set. This patch by default
adds 5 minutes of testing with ->gpwrap forced by lowering the delta
between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is
configurable, including the active time for the setting and a full
testing cycle.
By default, the first 25 minutes of a test will keep the current
_default_ behavior (a delta of ULONG_MAX / 4). Then for 5 minutes,
we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe
this is reasonable since we at least add a little bit of testing for
usecases where ->gpwrap is set.
[ Apply fix for Dan Carpenter's bug report on init path cleanup. ]
[ Apply kernel doc warning fix from Akira Yokosawa. ]
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
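
The effect of lowering the delta can be sketched with wraparound-safe
counter arithmetic. This is an illustration in Python under stated
assumptions, not the kernel implementation: a 64-bit unsigned long is
assumed, gp_seq_overflowed() and max_lag are hypothetical names, and
max_lag is expressed in raw counter units (how grace periods map onto
counter units is not modeled here).

```python
ULONG_BITS = 64  # assumption: a 64-bit unsigned long
ULONG_MAX = (1 << ULONG_BITS) - 1


def ulong_cmp_lt(a: int, b: int) -> bool:
    """Wraparound-safe 'a < b' for unsigned counters."""
    return ULONG_MAX // 2 < ((a - b) & ULONG_MAX)


def gp_seq_overflowed(rdp_gp_seq: int, rnp_gp_seq: int, max_lag: int) -> bool:
    """True once the CPU's counter trails its node's by more than max_lag."""
    return ulong_cmp_lt((rdp_gp_seq + max_lag) & ULONG_MAX, rnp_gp_seq)
```

With max_lag set to ULONG_MAX // 4, a CPU would have to fall behind by
an enormous number of counter increments before the condition fires,
which is why it is essentially never exercised. With a small max_lag,
a lag of only a few increments is enough.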
|