git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc irqchip driver fixes from Ingo Molnar:
- Remove the MSI_CHIP_FLAG_SET_ACK flag from 5 irqchip drivers
that did not require it
- Fix IRQ handling delays in the riscv-imsic irqchip driver
* tag 'irq-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-imsic: Start local sync timer on correct CPU
irqchip: Drop MSI_CHIP_FLAG_SET_ACK from unsuspecting MSI drivers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix SEV-SNP kdump bugs
- Update the email address of Alexey Makhalov in MAINTAINERS
- Add the CPU feature flag for the Zen6 microarchitecture
- Fix typo in system message
* tag 'x86-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove duplicated word in warning message
x86/CPU/AMD: Add X86_FEATURE_ZEN6
x86/sev: Make sure pages are not skipped during kdump
x86/sev: Do not touch VMSA pages during SNP guest memory kdump
MAINTAINERS: Update Alexey Makhalov's email address
x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL macro
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event fix from Ingo Molnar:
"Fix PEBS-via-PT crash"
* tag 'perf-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix segfault with PEBS-via-PT with sample_freq
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Fix some bugs in kernel-fpu, cpu idle function, hibernation and
uprobes"
* tag 'loongarch-fixes-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: uprobes: Remove redundant code about resume_era
LoongArch: uprobes: Remove user_{en,dis}able_single_step()
LoongArch: Save and restore CSR.CNTC for hibernation
LoongArch: Move __arch_cpu_idle() to .cpuidle.text section
LoongArch: Fix MAX_REG_OFFSET calculation
LoongArch: Prevent cond_resched() occurring within kernel-fpu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
- designware: cleanup properly on probe failure
* tag 'i2c-for-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: Fix an error handling path in i2c_dw_pci_probe()
|
|
Both regs_get_kernel_stack_nth() and regs_get_register() are not
inlined. With the new ftrace funcgraph-args feature they show up in
function graph tracing:
4) | sched_core_idle_cpu(cpu=4) {
4) 0.257 us | regs_get_register(regs=0x37fe00afa10, offset=2);
4) 0.218 us | regs_get_register(regs=0x37fe00afa10, offset=3);
4) 0.225 us | regs_get_register(regs=0x37fe00afa10, offset=4);
4) 0.239 us | regs_get_register(regs=0x37fe00afa10, offset=5);
4) 0.239 us | regs_get_register(regs=0x37fe00afa10, offset=6);
4) 0.245 us | regs_get_kernel_stack_nth(regs=0x37fe00afa10, n=20);
This is suboptimal, since both functions are supposed to be ftrace
internal helper functions. If they appear in ftrace traces this reduces
readability significantly, plus this adds tons of useless extra
entries.
Address this by moving both functions and required helpers to ptrace.h and
always inlining them. This way they don't appear in traces anymore. In
addition the overhead that comes with function calls is also reduced.
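A minimal userspace sketch of the always-inline pattern described above,
assuming GCC or Clang. The struct layout and the names pt_regs_sketch and
regs_get_register_sketch are hypothetical and are not the kernel's
definitions; the index-based lookup is a simplification for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_GPRS_SKETCH 16

/* Hypothetical stand-in for a register save area. */
struct pt_regs_sketch {
	unsigned long gprs[NUM_GPRS_SKETCH];
};

/*
 * Defined in a header and forced inline, so no out-of-line symbol exists
 * that a function tracer could hook into.
 */
static inline __attribute__((always_inline)) unsigned long
regs_get_register_sketch(const struct pt_regs_sketch *regs, unsigned int idx)
{
	assert(regs != NULL);
	if (idx >= NUM_GPRS_SKETCH)
		return 0;	/* out-of-range index yields 0 in this sketch */
	return regs->gprs[idx];
}
```

Because the helper is expanded at every call site, it also avoids the
call/return overhead, at the cost of slightly larger callers.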
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
asm/thread_info.h requires PAGE_SIZE, which is defined in vdso/page.h,
but doesn't need to include asm/lowcore.h or asm/page.h.
Therefore change the includes accordingly and reduce header dependencies.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
When calling the diag for DCSS unload on a non-IPL CPU, the sclp maximum
memory detection on the next IPL would falsely return the end of the
previously loaded DCSS.
This is because of an issue in z/VM, so work around it by always calling
the diag for DCSS unload on IPL CPU 0. That CPU cannot be set offline,
so the dcss_diag() call can directly be scheduled to CPU 0.
The wrong maximum memory value returned by sclp would only affect KASAN
kernels. When a DCSS within the falsely reported extra memory range is
loaded and accessed again, it would result in a kernel crash:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 001c0000a3ffe000 TEID: 001c0000a3ffe803
Fault in home space mode while using kernel ASCE.
AS:000000039955400b R2:00000003fe3b400b R3:000000037a2a8007 S:0000000000000020
Oops: 0010 ilc:3 [#1]SMP
[...]
CPU: 2 UID: 0 PID: 1563 Comm: mount Kdump: loaded Not tainted 6.15.0-rc5-11546-g3ea93fb3d026-dirty #7 NONE
Hardware name: IBM 3931 A01 704 (z/VM 7.4.0)
Krnl PSW : 0704c00180000000 000da6f2b338faf2 (kasan_check_range+0x172/0x310)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000040 001c0000a3ffe000 000000051fff0000 0000000000001000
0000000000000000 000da6f233380ff6 00000000000001f8 0000000000000000
001c0000a3ffe200 0000000000000040 001c0000a3ffe200 0000000000000200
000003ff97a2cfa8 0000000000000000 0000000000000010 000da672b58af070
Krnl Code: 000da6f2b338fae2: 41101008 la %r1,8(%r1)
000da6f2b338fae6: eca100268064 cgrj %r10,%r1,8,000da6f2b338fb32
#000da6f2b338faec: ebe00002000c srlg %r14,%r0,2
>000da6f2b338faf2: e3b010000002 ltg %r11,0(%r1)
000da6f2b338faf8: a77400a8 brc 7,000da6f2b338fc48
000da6f2b338fafc: 41b01008 la %r11,8(%r1)
000da6f2b338fb00: b904001b lgr %r1,%r11
000da6f2b338fb04: e3a0b0000002 ltg %r10,0(%r11)
Call Trace:
[<000da6f2b338faf2>] kasan_check_range+0x172/0x310
[<000da6f2b3390b3c>] __asan_memcpy+0x3c/0x90
[<000da6f233380ff6>] dcssblk_submit_bio+0x3a6/0x620 [dcssblk]
[<000da6f2b3eb403c>] __submit_bio+0x25c/0x4a0
[<000da6f2b3eb43bc>] __submit_bio_noacct+0x13c/0x450
[<000da6f2b3eb4bde>] submit_bio_noacct_nocheck+0x50e/0x620
[<000da6f2b34f4978>] mpage_readahead+0x318/0x3f0
[<000da6f2b31edbe6>] read_pages+0x156/0x740
[<000da6f2b31ee594>] page_cache_ra_unbounded+0x3c4/0x610
[<000da6f2b31ef094>] force_page_cache_ra+0x1f4/0x2d0
[<000da6f2b31d092e>] filemap_get_pages+0x2ce/0xaa0
[<000da6f2b31d1428>] filemap_read+0x328/0x9a0
[<000da6f2b3e9b7e8>] blkdev_read_iter+0x228/0x3b0
[<000da6f2b340f7a6>] vfs_read+0x5b6/0x7f0
[<000da6f2b34110be>] ksys_read+0x10e/0x1e0
[<000da6f2b4e7acb2>] __do_syscall+0x122/0x1f0
[<000da6f2b4e93ffe>] system_call+0x6e/0x90
Last Breaking-Event-Address:
[<000da6f2b338faac>] kasan_check_range+0x12c/0x310
Kernel panic - not syncing: Fatal exception: panic_on_oops
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Harald Freudenberger says:
====================
This is a complete rework of the protected key AES (PAES) implementation.
The goal of this rework is to implement the 4 modes (ecb, cbc, ctr, xts)
in a real asynchronous fashion:
- init(), exit() and setkey() are synchronous and don't allocate any memory.
- the encrypt/decrypt functions first try to do the job in a synchronous
manner. If this fails, for example because the protected key became
invalid due to a guest suspend/resume or guest migration action, the
encrypt/decrypt is transferred to an instance of the crypto engine (see
below) for asynchronous processing.
These postponed requests are then handled by the crypto engine by
invoking the do_one_request() callback but may of course again run into
a still not converted key or a key that has become invalid. If the key is
still not converted, the first thread does the conversion and updates
the key status in the transformation context. The conversion is
invoked via pkey API with a new flag PKEY_XFLAG_NOMEMALLOC.
Note that once there is an active request enqueued to get async
processed via crypto engine, further requests also need to go via
crypto engine to keep the request sequence.
This patch together with the pkey/zcrypt/AP extensions to support
the new PKEY_XFLAG_NOMEMALLOC should toughen the paes crypto algorithms
to truly meet the requirements for in-kernel skcipher implementations
and the usage patterns for the dm-crypt and dm-integrity layers.
The new flag PKEY_XFLAG_NOMEMALLOC tells the PKEY layer (and
subsidiary layers) that it must not allocate any memory causing IO
operations. Note that the patches for these pkey/zcrypt/AP extensions
are currently in the features branch but may be seen in the master
branch with the next merge.
There is still some confusion about how paes treats the key
within the transformation context. The tfm context may be shared by
multiple requests running en/decryption with the same key. So the tfm
context is supposed to be read-only.
The s390 protected key support is in fact an encrypted key with the
wrapping key sitting in the firmware. On each invocation of a
protected key instruction the firmware unwraps the pkey and performs
the operation. Part of the protected key is a hash of the wrapping
key used - so the firmware is able to detect if a protected key
matches the wrapping key or not. If there is a mismatch the cpacf
operation fails with cc 1 (key invalid). Such a situation can occur
for example with a kvm live guest migration to another machine where
the guest simply awakens in a new environment. As the wrapping key is
NOT transferred, after the reawakening all protected key cpacf
operations fail with "key invalid". There exist other situations
where a protected key cpacf operation may run into "key invalid" and
thus the code needs to be prepared for such cpacf failures.
The recovery is simple: via pkey API a new protected key needs to be
generated from the source key material (in real cases this is usually
a secure key bound to an HSM); the new protected key is then wrapped by
the wrapping key of the current firmware.
So the paes tfms hold the source key material to be able to
re-generate the protected key at any time. A naive implementation
would hold the protected key in some kind of running context (for
example the request context) and only the source key would be stored
in the tfm context. But the derivation of the protected key from the
source key is an expensive and time consuming process often involving
interaction with a crypto card. And such a naive implementation would
then for every tfm in use trigger the derivation process individually.
So why not store the protected key in the tfm context and let only the
very first process hitting the "invalid key" cc run the derivation and
update the protected key stored in the tfm? The only really important
thing is that the protected key update and cloning from this value
need to be done in an atomic fashion.
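A rough userspace sketch of such an atomic update/clone pair, using a
C11 atomic_flag as a stand-in for the spinlock. All names and the key
length are hypothetical; the real code keeps additional key state and
uses the kernel's own locking primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define PK_LEN_SKETCH 32

/* Hypothetical shared key slot inside a tfm context. */
struct tfm_key_sketch {
	atomic_flag lock;
	unsigned char pk[PK_LEN_SKETCH];
};

static void key_init(struct tfm_key_sketch *k)
{
	atomic_flag_clear(&k->lock);
	memset(k->pk, 0, sizeof(k->pk));
}

static void key_lock(struct tfm_key_sketch *k)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&k->lock, memory_order_acquire))
		;
}

static void key_unlock(struct tfm_key_sketch *k)
{
	atomic_flag_clear_explicit(&k->lock, memory_order_release);
}

/* Store a freshly derived protected key as one unit. */
static void key_update(struct tfm_key_sketch *k, const unsigned char *fresh)
{
	assert(k != NULL && fresh != NULL);
	key_lock(k);
	memcpy(k->pk, fresh, sizeof(k->pk));
	key_unlock(k);
}

/* Copy the key into a per-request buffer as one unit. */
static void key_clone(struct tfm_key_sketch *k, unsigned char *out)
{
	assert(k != NULL && out != NULL);
	key_lock(k);
	memcpy(out, k->pk, sizeof(k->pk));
	key_unlock(k);
}
```

A request works on its private clone, so a concurrent update never hands
it a half-written key; at worst the clone is outdated and the operation
reports "key invalid" again.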
Please note that there are still race conditions where the protected
key stored in the tfm may get updated by an (outdated) protected key
value. This is not an issue and the code handles this correctly by
again re-deriving the protected key. The only fact that matters is
that the protected key must always be in a state where the cpacf
instructions can figure out if it is valid (the hash part of the
protected key matches the hash of the wrapping key) or invalid
(and refuse the crypto operation with "invalid key").
Changelog:
v1 - first version. Applied and tested on top of the mentioned
pkey/zcrypt/AP changes. Selftests and multithreaded testcases
executed via AP_ALG interface ran successfully and even instrumented
code (with some sleeps to force async paths) ran fine.
Code is good enough for a first code review and collecting feedback.
v2 - A new patch which does a slight rework of the cpacf_pcc() inline
function to return the condition code.
A rework of the paes implementation based on feedback from Herbert
and Ingo:
- the spinlock is now consistently used to protect updates and
changes on the protected key and protected key state within
the transformation context.
- setkey() is now synchronous
- the walk is now held in the request context and thus the
postponing of a request to the engine and later processing
can continue at exactly the same state.
- the param block needed for the cpacf instructions is constructed
once and held in the request context.
- if a request can't get handled synchronously, it is postponed
for async processing via an instance of the crypto engine.
With v2 comes a patch which updates the crypto engine docu
in Documentation/crypto. Feel free to use it or drop it or
do some rework - at least it needs some review.
v2 was only posted internally to collect some feedback within IBM.
v3 - Slight improvements based on feedback from Finn.
v4 - With feedback from Holger and Herbert Xu. Holger gave some good
hints about better readability of the code and I picked nearly
all his suggestions. Herbert noted that once a request goes via
engine to keep the sequence as long as there are requests
enqueued the following requests should also go via engine. This
is now realized via a via_engine_ctr atomic counter in the tfm
context.
Stress tested with lots of debug code to run through all the
failure paths of the code. Looks good.
v5 - Fixed two typos and 1 too long line in the commit message found
by Holger. Added Acked-by and Reviewed-by.
Removed patch #3 which updates the crypto engine docu - this
will go separate. All prepared for picking in the s390 subsystem.
====================
Link: https://lore.kernel.org/r/20250514090955.72370-1-freude@linux.ibm.com/
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This is a complete rework of the protected key AES (PAES) implementation.
The goal of this rework is to implement the 4 modes (ecb, cbc, ctr, xts)
in a real asynchronous fashion:
- init(), exit() and setkey() are synchronous and don't allocate any
memory.
- the encrypt/decrypt functions first try to do the job in a synchronous
manner. If this fails, for example because the protected key became
invalid due to a guest suspend/resume or guest migration action, the
encrypt/decrypt is transferred to an instance of the crypto engine (see
below) for asynchronous processing.
These postponed requests are then handled by the crypto engine by
invoking the do_one_request() callback but may of course again run into
a still not converted key or a key that has become invalid. If the key is
still not converted, the first thread does the conversion and updates
the key status in the transformation context. The conversion is
invoked via pkey API with a new flag PKEY_XFLAG_NOMEMALLOC.
Note that once there is an active request enqueued to get async
processed via crypto engine, further requests also need to go via
crypto engine to keep the request sequence.
This patch together with the pkey/zcrypt/AP extensions to support
the new PKEY_XFLAG_NOMEMALLOC should toughen the paes crypto algorithms
to truly meet the requirements for in-kernel skcipher implementations
and the usage patterns for the dm-crypt and dm-integrity layers.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20250514090955.72370-3-freude@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Some of the pcc sub-functions have a protected key as
input and thus may run into the situation that this
key is invalid, for example due to live guest migration
to different physical hardware.
Rework the inline assembler function cpacf_pcc() to
return the condition code (cc) as return value:
0 - cc code 0 (normal completion)
1 - cc code 1 (prot key wkvp mismatch or src op out of range)
2 - cc code 2 (something invalid, scalar multiply infinity, ...)
Note that cc 3 (partial completion) is handled within the asm code
and never returned.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Link: https://lore.kernel.org/r/20250514090955.72370-2-freude@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
PARAVIRT_XXL is exclusively utilized by XEN_PV, which is only compatible
with 64-bit machines.
Clearly designate PARAVIRT_XXL as 64-bit only and remove ifdefs to
support CONFIG_PGTABLE_LEVELS < 5.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-5-kirill.shutemov@linux.intel.com
|
|
Both Intel and AMD CPUs support 5-level paging, which is expected to
become more widely adopted in the future. All major x86 Linux
distributions have the feature enabled.
Remove CONFIG_X86_5LEVEL and the related #ifdeffery to make the code more readable.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com
|
|
5-level paging only supports SPARSEMEM_VMEMMAP. CONFIG_X86_5LEVEL is
being phased out, making 5-level paging support mandatory.
Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate
any associated conditional statements.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-3-kirill.shutemov@linux.intel.com
|
|
Dynamic memory layout is used by KASLR and 5-level paging.
CONFIG_X86_5LEVEL is going to be removed, making 5-level paging support
unconditional which requires unconditional support of dynamic memory
layout.
Remove CONFIG_DYNAMIC_MEMORY_LAYOUT.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-2-kirill.shutemov@linux.intel.com
|
|
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.15-rc7
- designware: cleanup properly on probe failure
|
|
Add a helper to check if an event is in freq mode to improve readability.
No functional changes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250516182853.2610284-2-kan.liang@linux.intel.com
|
|
Pull smb client fixes from Steve French:
- Fix memory leak in mkdir error path
- Fix max rsize miscalculation after channel reconnect
* tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix zero rsize error messages
smb: client: fix memory leak during error handling for POSIX mkdir
|
|
Stefano Garzarella says:
====================
vsock/test: improve sigpipe test reliability
Running the tests continuously I noticed that sometimes the sigpipe
test would fail due to a race between the control message of the test
and the vsock transport messages.
While I was at it I also improved the test by checking the errno we
expect.
v1: https://lore.kernel.org/20250508142005.135857-1-sgarzare@redhat.com
====================
Link: https://patch.msgid.link/20250514141927.159456-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the sigpipe test, we expect send() to fail, but we do not check if
send() fails with the errno we expect (EPIPE).
Add this check and repeat the send() in case of EINTR as we do in other
tests.
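A small sketch of that check, assuming a Linux userspace environment.
The function name is hypothetical and this is not the test's code; in
particular it passes MSG_NOSIGNAL so that the sketch itself is not
killed by SIGPIPE, whereas the sigpipe test is about that signal.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns true only if send() fails and the failure is EPIPE. */
static bool send_fails_with_epipe(int fd)
{
	ssize_t ret;

	assert(fd >= 0);
	do {
		ret = send(fd, "A", 1, MSG_NOSIGNAL);
	} while (ret < 0 && errno == EINTR);	/* interrupted: try again */

	return ret < 0 && errno == EPIPE;
}
```

Any other errno, or a successful send(), makes the helper return false,
so an unexpected failure mode is not mistaken for the expected one.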
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250514141927.159456-4-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the other peer calls shutdown(SHUT_RD), there is a chance that
the send() call could occur before the message carrying the close
information arrives over the transport. In such cases, the send()
might still succeed. To avoid this race, let's retry the send() call
a few times, ensuring the test is more reliable.
Sleep a little before trying again to avoid flooding the other peer
and filling its receive buffer, which would cause a false negative.
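The retry idea, sketched for a generic stream socket. The function name,
the retry count parameter and the 10 ms pause are assumptions made for
this illustration and are not taken from the test; MSG_NOSIGNAL is used
only so the sketch is not terminated by SIGPIPE.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Try send() up to max_tries times; true once it fails with EPIPE. */
static bool send_until_epipe(int fd, int max_tries)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		ssize_t ret = send(fd, "A", 1, MSG_NOSIGNAL);

		if (ret < 0 && errno == EPIPE)
			return true;	/* peer shutdown has become visible */
		if (ret < 0 && errno != EINTR)
			return false;	/* unexpected error */
		/* Pause so the peer's receive buffer is not flooded. */
		nanosleep(&pause, NULL);
	}
	return false;	/* send() kept succeeding */
}
```

The pause bounds how many bytes reach the peer before the shutdown is
noticed, which is the concern raised above.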
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250514141927.159456-3-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The timeout API uses signals, so we have documented not to use sleep(),
but we can use nanosleep(2) since POSIX.1 explicitly specifies that it
does not interact with signals.
Let's provide timeout_usleep() for that.
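For illustration, a microsecond sleep can be layered on nanosleep(2)
roughly like this. The name usleep_via_nanosleep is hypothetical and the
body is a sketch, not necessarily how timeout_usleep() is written.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <time.h>

/* Sleep for usec microseconds; returns what nanosleep() returns. */
static int usleep_via_nanosleep(unsigned long usec)
{
	struct timespec ts = {
		.tv_sec = (time_t)(usec / 1000000UL),
		.tv_nsec = (long)(usec % 1000000UL) * 1000L,
	};

	assert(ts.tv_nsec < 1000000000L);
	return nanosleep(&ts, NULL);
}
```

If a signal handler runs during the sleep, nanosleep() returns -1 with
errno set to EINTR; this sketch passes that result on to the caller.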
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250514141927.159456-2-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Guoyu Yin reported a splat in the ipmr netns cleanup path:
WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_free_table net/ipv4/ipmr.c:440 [inline]
WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361
Modules linked in:
CPU: 2 UID: 0 PID: 14564 Comm: syz.4.838 Not tainted 6.14.0 #1
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:ipmr_free_table net/ipv4/ipmr.c:440 [inline]
RIP: 0010:ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361
Code: ff df 48 c1 ea 03 80 3c 02 00 75 7d 48 c7 83 60 05 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e e9 71 67 7f 00 e8 4c 2d 8a fd 90 <0f> 0b 90 eb 93 e8 41 2d 8a fd 0f b6 2d 80 54 ea 01 31 ff 89 ee e8
RSP: 0018:ffff888109547c58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888108c12dc0 RCX: ffffffff83e09868
RDX: ffff8881022b3300 RSI: ffffffff83e098d4 RDI: 0000000000000005
RBP: ffff888104288000 R08: 0000000000000000 R09: ffffed10211825c9
R10: 0000000000000001 R11: ffff88801816c4a0 R12: 0000000000000001
R13: ffff888108c13320 R14: ffff888108c12dc0 R15: fffffbfff0b74058
FS: 00007f84f39316c0(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f84f3930f98 CR3: 0000000113b56000 CR4: 0000000000350ef0
Call Trace:
<TASK>
ipmr_net_exit_batch+0x50/0x90 net/ipv4/ipmr.c:3160
ops_exit_list+0x10c/0x160 net/core/net_namespace.c:177
setup_net+0x47d/0x8e0 net/core/net_namespace.c:394
copy_net_ns+0x25d/0x410 net/core/net_namespace.c:516
create_new_namespaces+0x3f6/0xaf0 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc3/0x180 kernel/nsproxy.c:228
ksys_unshare+0x78d/0x9a0 kernel/fork.c:3342
__do_sys_unshare kernel/fork.c:3413 [inline]
__se_sys_unshare kernel/fork.c:3411 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:3411
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xa6/0x1a0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f84f532cc29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f84f3931038 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f84f5615fa0 RCX: 00007f84f532cc29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000400
RBP: 00007f84f53fba18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f84f5615fa0 R15: 00007fff51c5f328
</TASK>
The running kernel has CONFIG_IP_MROUTE_MULTIPLE_TABLES disabled, and
the sanity check for such a build is still too loose.
Address the issue by consolidating the relevant sanity check in a single
helper regardless of the kernel configuration. Also share it between
the ipv4 and ipv6 code.
Reported-by: Guoyu Yin <y04609127@gmail.com>
Fixes: 50b94204446e ("ipmr: tune the ipmr_can_free_table() checks.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/372dc261e1bf12742276e1b984fc5a071b7fc5a8.1747321903.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not recycle the page twice in airoha_qdma_rx_process routine in case
of error. Just run dev_kfree_skb() if the skb has been allocated and marked
for recycling. Run page_pool_put_full_page() directly if the skb has not
been allocated yet.
Moreover, rely on DMA address from queue entry element instead of reading
it from the DMA descriptor for DMA syncing in airoha_qdma_rx_process().
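The ownership rule in the error path can be modelled in userspace as
below. Every type and function here is a hypothetical stand-in (no real
skb or page_pool API is used); the point is only that the page is
released exactly once, either through the skb or directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page with a counter of how often it was released. */
struct page_sketch {
	int put_count;
};

/* Hypothetical skb that owns its page once marked for recycling. */
struct skb_sketch {
	struct page_sketch *page;
};

static void page_put(struct page_sketch *page)
{
	page->put_count++;
}

/* Freeing the skb also releases the page it owns. */
static void skb_free(struct skb_sketch *skb)
{
	page_put(skb->page);
}

/* Error path: release via the skb if it exists, else release the page. */
static void rx_error_cleanup(struct skb_sketch *skb, struct page_sketch *page)
{
	assert(page != NULL);
	if (skb)
		skb_free(skb);
	else
		page_put(page);
}
```

Calling both skb_free() and page_put() for the same buffer would be the
double recycle that the fix removes.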
Fixes: e12182ddb6e71 ("net: airoha: Enable Rx Scatter-Gather")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250515-airoha-fix-rx-process-error-condition-v2-1-657e92c894b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
tools: ynl-gen: support sub-messages and rt-link
Sub-messages are how we express "polymorphism" in YNL. Donald added
the support to specs and Python a while back, support them in C, too.
Sub-message is a nest, but the interpretation of the attribute types
within that nest depends on a value of another attribute. For example
in rt-link the "kind" attribute contains the link type (veth, bonding,
etc.) and based on that the right enum has to be applied to interpret
link-specific attributes.
The last message is probably the most interesting to look at, as it
adds a fairly advanced sample.
This patch only contains enough support for rtnetlink, we will need
a little more complexity to support TC, where sub-messages may contain
fixed headers, and where the selector may be in a different nest than
the submessage.
====================
Link: https://patch.msgid.link/20250515231650.1325372-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a fairly complete example of rt-link usage. If run without any
arguments it simply lists the interfaces and some of their attrs.
If run with an arg it tries to create and delete a netkit device.
1 # ./tools/net/ynl/samples/rt-link 1
2 Trying to create a Netkit interface
3 Testing error message for policy being bad:
4 Kernel error: 'Provided default xmit policy not supported' (bad attribute: .linkinfo.data(netkit).policy)
5 1: lo: mtu 65536
6 2: wlp0s1: mtu 1500
7 3: enp0s13: mtu 1500
8 4: dummy0: mtu 1500 kind dummy altname one two
9 5: nk0: mtu 1500 kind netkit primary 0 policy forward
10 6: nk1: mtu 1500 kind netkit primary 1 policy blackhole
11 Trying to delete a Netkit interface (ifindex 6)
Sample creates the device first, it sets an invalid value for a netkit
attribute to trigger reverse parsing. Line 4 shows the error with the
attribute path correctly generated by YNL.
Then sample fixes the bad attribute and re-issues the request, with
NLM_F_ECHO set. This flag causes the notification to be looped back
to the initiating socket (our socket). Sample parses this notification
to save the ifindex of the created netkit.
Sample then proceeds to list the devices. Line 8 above shows a dummy
device with two alt names. Lines 9 and 10 show the netkit devices
the sample itself created.
The "primary" and "policy" attrs are from inside the netkit submsg.
The string values are auto-generated for the enums by YNL.
To clean up, the sample deletes the interface it created (line 11).
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Switch from including Classic netlink families one by one to excluding.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reverse parsing lets YNL convert bad and missing attr pointers
from extack into a string like "missing attribute nest1.nest2.attr_name".
It's a feature that's unique to YNL C AFAIU (even the Python YNL
can't do nested reverse parsing). Add support for reverse-parsing
of sub-messages.
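The string-building half of reverse parsing can be sketched as joining
the names found at each nesting level. build_attr_path() is a
hypothetical helper written for this illustration; it does not use or
mirror the YNL library API, and it skips the policy lookup that maps
attribute offsets to names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join n attribute names with '.' into out; -1 if out is too small. */
static int build_attr_path(const char *const *names, size_t n,
			   char *out, size_t len)
{
	size_t used = 0;

	assert(out != NULL && len > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w = snprintf(out + used, len - used, "%s%s",
				 i ? "." : "", names[i]);

		if (w < 0 || (size_t)w >= len - used)
			return -1;	/* error or truncated */
		used += (size_t)w;
	}
	return 0;
}
```

With the names "nest1", "nest2" and "attr_name" the result is the path
quoted above, "nest1.nest2.attr_name".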
To simplify the logic and the code annotate the type policies
with extra metadata. Mark the selectors and the messages with
the information we need. We assume that key / selector always
precedes the sub-message while parsing (and also if there are
multiple sub-messages like in rt-link they are interleaved
selector 1 ... submsg 1 ... selector 2 .. submsg 2, not
selector 1 ... selector 2 ... submsg 1 ... submsg 2).
The rt-link sample in a subsequent change shows reverse parsing
of sub-messages in action.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adjust parsing and rendering appropriately to make sub-messages work.
Rendering is pretty trivial, as the submsg -> netlink conversion looks
like rendering a nest in which only one attr was set. Only trick
is that we use the enum value of the sub-message rather than the nest
as the type, and effectively skip one layer of nesting. A real double
nested struct would look like this:
[SELECTOR]
[SUBMSG]
[NEST]
[MSG1-ATTR]
A submsg "is" the nest so by skipping I mean:
[SELECTOR]
[SUBMSG]
[MSG1-ATTR]
There is no extra validation in YNL if caller has set the selector
matching the submsg type (e.g. link type = "macvlan" but the nest
attrs are set to carry "veth"). Let the kernel handle that.
Parsing side is a little more specialized as we need to render and
insert a new kind of function which switches between what to parse
based on the selector. But code isn't too complicated.
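A hand-written sketch of such a selector switch, to show the shape of
the idea. The struct, the attribute number 1 and its per-kind meaning
are all made up for this example; the generated code works on real
netlink attributes and is not reproduced here.

```c
#include <assert.h>
#include <string.h>

enum linkinfo_kind_sketch { KIND_UNKNOWN, KIND_VETH, KIND_NETKIT };

/* Hypothetical parsed result; only the field matching kind is valid. */
struct linkinfo_data_sketch {
	enum linkinfo_kind_sketch kind;
	int netkit_primary;
	int veth_peer;
};

/*
 * Interpret one (type, value) attribute of the sub-message. The same
 * attribute type means different things depending on the selector.
 */
static int parse_linkinfo_data(const char *selector, unsigned short attr_type,
			       int attr_value, struct linkinfo_data_sketch *out)
{
	assert(selector != NULL && out != NULL);

	if (!strcmp(selector, "netkit")) {
		out->kind = KIND_NETKIT;
		if (attr_type == 1)	/* hypothetical: 1 = primary */
			out->netkit_primary = attr_value;
		return 0;
	}
	if (!strcmp(selector, "veth")) {
		out->kind = KIND_VETH;
		if (attr_type == 1)	/* hypothetical: 1 = peer */
			out->veth_peer = attr_value;
		return 0;
	}
	out->kind = KIND_UNKNOWN;
	return -1;	/* no parser for this selector */
}
```

The selector has to be known before the sub-message is parsed, which is
why the ordering assumption (selector precedes sub-message) matters.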
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The easiest (or perhaps only sane) way to support submessages in C
is to treat them as if they were nests. Build fake attributes to
that effect in the codegen. Render the submsg as a big nest of all
possible values.
With this in place the main missing part is to hook in the switch
which selects how to parse based on the key.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hook in handling of sub-messages, for now treat them as ignored attrs.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for constructing Struct() instances which represent
sub-messages rather than nested attributes.
Restructure the code / indentation to more easily insert
a case where nested reference comes from annotation other
than the 'nested-attributes' property. Make sure we don't
construct the Struct() object from scratch in multiple
places as the constructor will soon have more arguments.
This should cause no functional change.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We're about to add some code here for sub-messages.
Factor out the nest-related logic to make the code readable.
No functional change.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add C naming info for OVPN, which was added after I adjusted
the existing attrs. Also add missing reference to a header needed
for a bridge struct.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250515231650.1325372-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver uses the name LAN88xx for PHYs with phy_id = 0x0007c132. But
with this placeholder name no documentation can be found on the net.
Document the fact that these PHYs are built into the LAN7800 and LAN7850
USB/Ethernet controllers.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250515082051.2644450-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since its introduction 6 years ago this function has never had a user.
So remove it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ccbeef28-65ae-4e28-b1db-816c44338dee@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, I'll be honest and say I think this is larger than
I'd prefer at this point, the main blow out point is that xe has two
larger fixes.
One is a fix for active context utilisation reporting, it's for a
reported regression and will end up in stable anyways, so I don't see
any point in holding it up.
The second is a fix for mixed cpu/gpu atomics, which are currently
broken, but are also not something your average desktop/laptop user is
going to hit in normal operation, and having them fixed now is better
than threading them through stable later.
Other than those, it's mostly the usual, a bunch of amdgpu randoms and
a few other minor fixes.
dma-buf:
- Avoid memory reordering in fence handling
meson:
- Avoid integer overflow in mode-clock calculations
panel-mipi-dbi:
- Fix output with drm_client_setup_with_fourcc()
amdgpu:
- Fix CSA unmap
- Fix MALL size reporting on GFX11.5
- AUX fix
- DCN 3.5 fix
- VRR fix
- DP MST fix
- DML 2.1 fixes
- Silence DP AUX spam
- DCN 4.0.1 cursor fix
- VCN 4.0.5 fix
ivpu:
- Fix buffer size in debugfs code
gpuvm:
- Add timeslicing and allocation restriction for SVM
xe:
- Fix shrinker debugfs name
- Add HW workaround to Xe2
- Fix SVM when mixing GPU and CPU atomics
- Fix per client engine utilization due to active contexts not saving
timestamp with lite restore enabled"
* tag 'drm-fixes-2025-05-17' of https://gitlab.freedesktop.org/drm/kernel: (24 commits)
drm/xe: Add WA BB to capture active context utilization
drm/xe: Save the gt pointer in lrc and drop the tile
drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value
drm/xe: Timeslice GPU on atomic SVM fault
drm/gpusvm: Add timeslicing support to GPU SVM
drm/xe: Strict migration policy for atomic SVM faults
drm/gpusvm: Introduce devmem_only flag for allocation
drm/xe/xe2hpg: Add Wa_22021007897
drm/amdgpu: read back register after written for VCN v4.0.5
Revert "drm/amd/display: Hardware cursor changes color when switched to software cursor"
dma-buf: insert memory barrier before updating num_fences
drm/xe: Fix the gem shrinker name
drm/amd/display: Avoid flooding unnecessary info messages
drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp
drm/amd/display: check stream id dml21 wrapper to get plane_id
drm/amd/display: fix link_set_dpms_off multi-display MST corner case
drm/amd/display: Defer BW-optimization-blocked DRR adjustments
Revert: "drm/amd/display: Enable urgent latency adjustment on DCN35"
drm/amd/display: Correct the reply value when AUX write incomplete
drm/amdgpu: fix incorrect MALL size for GFX1151
...
|
|
Currently, when device mtu is updated, vmxnet3 updates netdev mtu, quiesces
the device and then reactivates it for the ESXi to know about the new mtu.
So, technically the OS stack can start using the new mtu before ESXi knows
about the new mtu.
This can lead to issues for TSO packets which use mss as per the new mtu
configured. This patch fixes this issue by moving the mtu write after
device quiesce.
Cc: stable@vger.kernel.org
Fixes: d1a890fa37f2 ("net: VMware virtual Ethernet NIC driver: vmxnet3")
Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com>
Acked-by: Guolin Yang <guolin.yang@broadcom.com>
Changes v1-> v2:
Moved MTU write after destroy of rx rings
Link: https://patch.msgid.link/20250515190457.8597-1-ronak.doshi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
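
The ordering problem can be illustrated with a toy model. This is not the vmxnet3 driver; `toy_nic` and its helpers are hypothetical, and "activate" is assumed to be the point where the host learns the current MTU. The model records whether the stack-visible MTU ever differs from the host's while the device is active.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_nic {
	int stack_mtu;              /* MTU the OS stack sees */
	int host_mtu;               /* MTU the hypervisor was told about */
	bool active;
	bool mismatch_while_active; /* set if the two ever differ while active */
};

static void toy_observe(struct toy_nic *n)
{
	if (n->active && n->stack_mtu != n->host_mtu)
		n->mismatch_while_active = true;
}

static void toy_quiesce(struct toy_nic *n)
{
	n->active = false;
	toy_observe(n);
}

static void toy_activate(struct toy_nic *n)
{
	n->host_mtu = n->stack_mtu; /* host learns the MTU on activation */
	n->active = true;
	toy_observe(n);
}

static void toy_write_mtu(struct toy_nic *n, int mtu)
{
	assert(mtu > 0);
	n->stack_mtu = mtu;
	toy_observe(n);
}

/* Old order: MTU is written while the device is still active. */
static void toy_change_mtu_old(struct toy_nic *n, int mtu)
{
	toy_write_mtu(n, mtu);
	toy_quiesce(n);
	toy_activate(n);
}

/* New order: MTU is written only after the device is quiesced. */
static void toy_change_mtu_new(struct toy_nic *n, int mtu)
{
	toy_quiesce(n);
	toy_write_mtu(n, mtu);
	toy_activate(n);
}
```

With the old order the mismatch flag is set during the window before quiesce; with the new order it stays clear.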
|
|
RFS can exhibit lower performance for workloads using short-lived
flows and a small set of 4-tuples.
This is often the case for load-testers, using a pair of hosts,
if the server has a single listener port.
Typical use case :
Server : tcp_crr -T128 -F1000 -6 -U -l30 -R 14250
Client : tcp_crr -T128 -F1000 -6 -U -l30 -c -H server | grep local_throughput
This is because RFS global hash table contains stale information,
when the same RSS key is recycled for another socket and another cpu.
Make sure to undo the changes and go back to initial state when
a flow is disconnected.
Performance of the above test is increased by 22 %,
going from 372604 transactions per second to 457773.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Octavian Purdila <tavip@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250515100354.3339920-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
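
A toy model of the "go back to initial state" step, assuming a global table indexed by a masked flow hash. This is not the kernel's RFS code; `toy_flow_table`, `TOY_NO_CPU` and the helpers are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NO_CPU     0xffffu /* hypothetical "no CPU recorded" marker */
#define TOY_TABLE_SIZE 16u     /* power of two so the hash can be masked */

struct toy_flow_table {
	uint16_t cpu[TOY_TABLE_SIZE];
};

static void toy_table_init(struct toy_flow_table *t)
{
	for (unsigned int i = 0; i < TOY_TABLE_SIZE; i++)
		t->cpu[i] = TOY_NO_CPU;
}

/* A socket running on @cpu starts consuming the flow with this hash. */
static void toy_record_flow(struct toy_flow_table *t, uint32_t hash,
			    uint16_t cpu)
{
	assert(cpu != TOY_NO_CPU);
	t->cpu[hash & (TOY_TABLE_SIZE - 1)] = cpu;
}

/* The flow is disconnected: undo the change so no stale CPU remains. */
static void toy_delete_flow(struct toy_flow_table *t, uint32_t hash)
{
	t->cpu[hash & (TOY_TABLE_SIZE - 1)] = TOY_NO_CPU;
}

static uint16_t toy_lookup_cpu(const struct toy_flow_table *t, uint32_t hash)
{
	return t->cpu[hash & (TOY_TABLE_SIZE - 1)];
}
```

Without the delete step, a later socket that recycles the same hash on another CPU would still find the previous CPU in the table.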
|
|
This adds support for the 10Gb/s chip RTL8127A.
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250515095303.3138-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When netfilter defrag hooks are loaded (due to the presence of conntrack
rules, for example), fragmented packets entering the bridge will be
defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
ipv4_conntrack_defrag()).
Later on, in the bridge's post-routing hook, the defragged packet will
be fragmented again. If the size of the largest fragment is larger than
what the kernel has determined as the destination MTU (using
ip_skb_dst_mtu()), the defragged packet will be dropped.
Before commit ac6627a28dbf ("net: ipv4: Consolidate ipv4_mtu and
ip_dst_mtu_maybe_forward"), ip_skb_dst_mtu() would return dst_mtu() as
the destination MTU. Assuming the dst entry attached to the packet is
the bridge's fake rtable one, this would simply be the bridge's MTU (see
fake_mtu()).
However, after the above-mentioned commit, ip_skb_dst_mtu() ends up
returning the route's MTU stored in the dst entry's metrics. Ideally, in
case the dst entry is the bridge's fake rtable one, this should be the
bridge's MTU as the bridge takes care of updating this metric when its
MTU changes (see br_change_mtu()).
Unfortunately, the last operation is a no-op given the metrics attached
to the fake rtable entry are marked as read-only. Therefore,
ip_skb_dst_mtu() ends up returning 1500 (the initial MTU value) and
defragged packets are dropped during fragmentation when dealing with
large fragments and high MTU (e.g., 9k).
Fix by moving the fake rtable entry's metrics to be per-bridge (in a
similar fashion to the fake rtable entry itself) and marking them as
writable, thereby allowing MTU changes to be reflected.
Fixes: 62fa8a846d7d ("net: Implement read-only protection and COW'ing of metrics.")
Fixes: 33eb9873a283 ("bridge: initialize fake_rtable metrics")
Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Closes: https://lore.kernel.org/netdev/PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com/
Tested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250515084848.727706-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
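
The effect of read-only versus per-bridge writable metrics can be shown with a toy model. This is not the bridge code; `toy_metrics`, `toy_bridge` and the helpers are hypothetical, and the only assumption carried over from the message is that writes to read-only metrics are silently dropped.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_metrics {
	unsigned int mtu;
	bool read_only;
};

/* Shared by every bridge in the "before" model; never updated. */
static struct toy_metrics toy_shared_metrics = {
	.mtu = 1500,
	.read_only = true,
};

struct toy_bridge {
	unsigned int dev_mtu;
	struct toy_metrics own_metrics; /* used only in the "after" model */
	struct toy_metrics *metrics;    /* what the fake dst entry points at */
};

static void toy_bridge_init(struct toy_bridge *br, bool per_bridge)
{
	br->dev_mtu = 1500;
	br->own_metrics.mtu = 1500;
	br->own_metrics.read_only = false;
	br->metrics = per_bridge ? &br->own_metrics : &toy_shared_metrics;
}

static void toy_bridge_change_mtu(struct toy_bridge *br, unsigned int mtu)
{
	assert(br->metrics);
	br->dev_mtu = mtu;
	if (!br->metrics->read_only) /* read-only: the update is a no-op */
		br->metrics->mtu = mtu;
}

/* Models the MTU lookup that reads the metric stored in the dst entry. */
static unsigned int toy_dst_mtu(const struct toy_bridge *br)
{
	return br->metrics->mtu;
}
```

In the shared, read-only model the lookup keeps returning 1500 after the device MTU is raised to 9000; with per-bridge writable metrics it returns 9000.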
|
|
The pointer arithmetic for accessing the tail tag only works
for linear skbs.
For nonlinear skbs, it reads uninitialized memory inside the
skb headroom, essentially randomizing the tag. I have observed
it gets set to 6 most of the time.
Example where ksz9477_rcv thinks that the packet from port 1 comes from port 6
(which does not exist for the ksz9896 that's in use), dropping the packet.
Debug prints added by me (not included in this patch):
[ 256.645337] ksz9477_rcv:323 tag0=6
[ 256.645349] skb len=47 headroom=78 headlen=0 tailroom=0
mac=(64,14) mac_len=14 net=(78,0) trans=78
shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=1 iif=3
priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 256.645377] dev name=end1 feat=0x0002e10200114bb3
[ 256.645386] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645395] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645403] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645411] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645420] skb headroom: 00000040: ff ff ff ff ff ff 00 1c 19 f2 e2 db 08 06
[ 256.645428] skb frag: 00000000: 00 01 08 00 06 04 00 01 00 1c 19 f2 e2 db 0a 02
[ 256.645436] skb frag: 00000010: 00 83 00 00 00 00 00 00 0a 02 a0 2f 00 00 00 00
[ 256.645444] skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
[ 256.645452] ksz_common_rcv:92 dsa_conduit_find_user returned NULL
Call skb_linearize before trying to access the tag.
This patch fixes ksz9477_rcv which is used by the ksz9896 I have at
hand, and also applies the same fix to ksz8795_rcv which seems to have
the same problem.
Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher@cherry.de>
CC: stable@vger.kernel.org
Fixes: 016e43a26bab ("net: dsa: ksz: Add KSZ8795 tag code")
Fixes: 8b8010fb7876 ("dsa: add support for Microchip KSZ tail tagging")
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20250515072920.2313014-1-jakob.unterwurzacher@cherry.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
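
A toy model of why the tail-tag read needs a linear buffer. This is not skb code; `toy_pkt` and its helpers are hypothetical. The packet has headroom plus linear data in one buffer and, optionally, the remaining bytes in a separate fragment, as in the dump above where headlen=0 and nr_frags=1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUF 64

struct toy_pkt {
	uint8_t buf[TOY_BUF]; /* headroom followed by linear data */
	size_t data_off;      /* start of linear data inside buf */
	size_t headlen;       /* bytes of linear data */
	const uint8_t *frag;  /* optional non-linear fragment */
	size_t fraglen;
};

/*
 * Naive read: last byte before the end of the linear area. Correct only
 * when fraglen == 0; otherwise it returns stale headroom or head bytes.
 */
static uint8_t toy_tail_tag_naive(const struct toy_pkt *p)
{
	assert(p->data_off + p->headlen >= 1);
	return p->buf[p->data_off + p->headlen - 1];
}

/* Copy the fragment behind the linear data so the packet is contiguous. */
static int toy_linearize(struct toy_pkt *p)
{
	if (!p->fraglen)
		return 0;
	if (p->data_off + p->headlen + p->fraglen > TOY_BUF)
		return -1;
	memcpy(p->buf + p->data_off + p->headlen, p->frag, p->fraglen);
	p->headlen += p->fraglen;
	p->frag = NULL;
	p->fraglen = 0;
	return 0;
}

/* Fixed read: linearize first, then take the last byte of the packet. */
static int toy_tail_tag(struct toy_pkt *p, uint8_t *tag)
{
	if (toy_linearize(p) || !p->headlen)
		return -1;
	*tag = p->buf[p->data_off + p->headlen - 1];
	return 0;
}
```

With all payload in the fragment, the naive read returns whatever byte happens to sit in the headroom, while the linearized read returns the real last byte.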
|
|
This patch synchronizes code that is accessed from both user-space
and IRQ contexts. The `get_stats()` function can be called from both
contexts.
`dev->stats.tx_errors` and `dev->stats.collisions` are also updated
in the `tx_errors()` function. Therefore, these fields must also be
protected by synchronization.
There is no code that accesses `dev->stats.tx_errors` between the
previous and updated lines, so the updating point can be moved.
Signed-off-by: Moon Yeounsu <yyyynoom@gmail.com>
Link: https://patch.msgid.link/20250515075333.48290-1-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Syzkaller reports the following issue:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
netdev_lock include/linux/netdevice.h:2751 [inline]
netdev_lock_ops include/net/netdev_lock.h:42 [inline]
dev_set_promiscuity+0x10e/0x260 net/core/dev_api.c:285
bond_set_promiscuity drivers/net/bonding/bond_main.c:922 [inline]
bond_change_rx_flags+0x219/0x690 drivers/net/bonding/bond_main.c:4732
dev_change_rx_flags net/core/dev.c:9145 [inline]
__dev_set_promiscuity+0x3f5/0x590 net/core/dev.c:9189
netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201
dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286
^^ all of the above is under rcu lock
team_change_rx_flags+0x1b3/0x330 drivers/net/team/team_core.c:1785
dev_change_rx_flags net/core/dev.c:9145 [inline]
__dev_set_promiscuity+0x3f5/0x590 net/core/dev.c:9189
netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201
dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286
hsr_del_port+0x25e/0x2d0 net/hsr/hsr_slave.c:233
hsr_netdev_notify+0x827/0xb60 net/hsr/hsr_main.c:104
notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2214 [inline]
call_netdevice_notifiers net/core/dev.c:2228 [inline]
unregister_netdevice_many_notify+0x15d8/0x2330 net/core/dev.c:11970
rtnl_delete_link net/core/rtnetlink.c:3522 [inline]
rtnl_dellink+0x488/0x710 net/core/rtnetlink.c:3564
rtnetlink_rcv_msg+0x7cc/0xb70 net/core/rtnetlink.c:6955
netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x758/0x8d0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
team_change_rx_flags runs under rcu lock which means we can't grab
instance lock for the lower devices. Switch to team->lock, similar
to what we already do for team_set_mac_address and team_change_mtu.
Fixes: 78cd408356fe ("net: add missing instance lock to dev_set_promiscuity")
Reported-by: syzbot+53485086a41dbb43270a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=53485086a41dbb43270a
Link: https://lore.kernel.org/netdev/6822cc81.050a0220.f2294.00e8.GAE@google.com
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Link: https://patch.msgid.link/20250514220319.3505158-1-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
CONFIG_INIT_STACK_ALL_ZERO introduces a performance cost by
zero-initializing all stack variables on function entry. The mlx5 XDP
RX path previously allocated a struct mlx5e_xdp_buff on the stack per
received CQE, resulting in measurable performance degradation under
this config.
This patch reuses a mlx5e_xdp_buff stored in the mlx5e_rq struct,
avoiding per-CQE stack allocations and repeated zeroing.
With this change, XDP_DROP and XDP_TX performance matches that of
kernels built without CONFIG_INIT_STACK_ALL_ZERO.
Performance was measured on a ConnectX-6Dx using a single RX channel
(1 CPU at 100% usage) at ~50 Mpps. The baseline results were taken from
net-next-6.15.
Stack zeroing disabled:
- XDP_DROP:
* baseline: 31.47 Mpps
* baseline + per-RQ allocation: 32.31 Mpps (+2.68%)
- XDP_TX:
* baseline: 12.41 Mpps
* baseline + per-RQ allocation: 12.95 Mpps (+4.30%)
Stack zeroing enabled:
- XDP_DROP:
* baseline: 24.32 Mpps
* baseline + per-RQ allocation: 32.27 Mpps (+32.7%)
- XDP_TX:
* baseline: 11.80 Mpps
* baseline + per-RQ allocation: 12.24 Mpps (+3.72%)
Reported-by: Sebastiano Miano <mianosebastiano@gmail.com>
Reported-by: Samuel Dobron <sdobron@redhat.com>
Link: https://lore.kernel.org/all/CAMENy5pb8ea+piKLg5q5yRTMZacQqYWAoVLE1FE9WhQPq92E0g@mail.gmail.com/
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/1747253032-663457-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
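
A rough sketch of the two allocation strategies, not the mlx5 code. `toy_xdp_buff`, `toy_rq` and the handlers are hypothetical; the explicit memset() stands in for the zeroing the compiler would insert under CONFIG_INIT_STACK_ALL_ZERO, and `zeroed_bytes` exists only to make that cost visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_xdp_buff {
	const uint8_t *data;
	size_t len;
	uint8_t scratch[192]; /* stands in for the rest of the struct */
};

struct toy_rq {
	struct toy_xdp_buff xdp;    /* per-RQ buffer, reused for every CQE */
	unsigned long zeroed_bytes; /* bookkeeping for this sketch only */
};

/* Before: a fresh on-stack buffer per CQE, zeroed on every call. */
static size_t toy_handle_cqe_stack(struct toy_rq *rq,
				   const uint8_t *data, size_t len)
{
	struct toy_xdp_buff xdp;

	assert(rq);
	memset(&xdp, 0, sizeof(xdp)); /* models automatic stack zeroing */
	rq->zeroed_bytes += sizeof(xdp);
	xdp.data = data;
	xdp.len = len;
	return xdp.len;
}

/* After: reuse the buffer stored in the RQ, set only the needed fields. */
static size_t toy_handle_cqe_per_rq(struct toy_rq *rq,
				    const uint8_t *data, size_t len)
{
	struct toy_xdp_buff *xdp;

	assert(rq);
	xdp = &rq->xdp;
	xdp->data = data;
	xdp->len = len;
	return xdp->len;
}
```

Both handlers see the same packet; only the first pays the per-CQE zeroing cost.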
|
|
The debugfs summary output could access uninitialized elements in
the freq_in[] and signal_out[] arrays, causing NULL pointer
dereferences and triggering a kernel Oops (page_fault_oops).
This patch adds u8 fields (nr_freq_in, nr_signal_out) to track the
number of initialized elements, with a maximum of 4 per array.
The summary output functions are updated to respect these limits,
preventing out-of-bounds access and ensuring safe array handling.
Widen the label variables because the change confuses GCC about
max length of the strings.
Fixes: ef61f5528fca ("ptp: ocp: add Adva timecard support")
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250514073541.35817-1-maimon.sagi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
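
A minimal sketch of the counting approach, not the ptp_ocp driver. `toy_card`, `TOY_MAX_SLOTS`, the output format and `toy_summary()` are hypothetical; only the idea of a u8 count bounding the loop over a 4-element array is taken from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_MAX_SLOTS 4

struct toy_card {
	const char *freq_in[TOY_MAX_SLOTS]; /* unused slots stay NULL */
	uint8_t nr_freq_in;                 /* number of initialized slots */
};

/*
 * Render one line per initialized slot into @out. Returns the number of
 * lines written, or -1 if @out is too small. Slots at or beyond
 * nr_freq_in are never dereferenced.
 */
static int toy_summary(const struct toy_card *c, char *out, size_t outsz)
{
	size_t used = 0;
	int i;

	assert(c->nr_freq_in <= TOY_MAX_SLOTS);
	if (outsz)
		out[0] = '\0';

	for (i = 0; i < c->nr_freq_in; i++) {
		int n = snprintf(out + used, outsz - used, "FREQ%d:%s\n",
				 i + 1, c->freq_in[i]);

		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
	}
	return i;
}
```

A card with two populated inputs produces two lines, and the NULL entries in the remaining slots are never touched.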
|
|
Current implementation requires syscon compatible for pio property
which is used for driving the switch leds on mt7988.
Replace syscon_regmap_lookup_by_phandle with of_parse_phandle and
device_node_to_regmap to get the regmap already assigned by pinctrl
driver.
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Link: https://patch.msgid.link/20250510174933.154589-1-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Three out of four callers to ath12k_mac_get_tx_arvif() have
link_conf pointer already set for other operations. Pass it
as a parameter. Modify ath12k_control_beaconing() to set
link_conf first.
Signed-off-by: Aloka Dixit <aloka.dixit@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250408184501.3715887-4-aloka.dixit@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Pull NFS client bugfixes from Trond Myklebust:
- NFS: Fix a couple of missed handlers for the ENETDOWN and ENETUNREACH
transport errors
- NFS: Handle Oopsable failure of nfs_get_lock_context in the unlock
path
- NFSv4: Fix a race in nfs_local_open_fh()
- NFSv4/pNFS: Fix a couple of layout segment leaks in layoutreturn
- NFSv4/pNFS: Avoid sharing pNFS DS connections between net namespaces
since IP addresses are not guaranteed to refer to the same nodes
- NFS: Don't flush file data while holding multiple directory locks in
nfs_rename()
* tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Avoid flushing data while holding directory locks in nfs_rename()
NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked()
NFSv4/pnfs: Reset the layout state after a layoutreturn
NFS/localio: Fix a race in nfs_local_open_fh()
nfs: nfs3acl: drop useless assignment in nfs3_get_acl()
nfs: direct: drop useless initializer in nfs_direct_write_completion()
nfs: move the nfs4_data_server_cache into struct nfs_net
nfs: don't share pNFS DS connections between net namespaces
nfs: handle failure of nfs_get_lock_context in unlock path
pNFS/flexfiles: Record the RPC errors in the I/O tracepoints
NFSv4/pnfs: Layoutreturn on close must handle fatal networking errors
NFSv4: Handle fatal ENETDOWN and ENETUNREACH errors
|