Age | Commit message (Collapse) | Author |
|
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.
This patch makes use of prefetch.w to prefetch cachelines for write
prior to lr/sc loops when using the xchg_small atomic routine.
This patch is inspired by commit 0ea366f5e1b6 ("arm64: atomics:
prefetch the destination word for write prior to stxr").
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231231082955.16516-4-guoren@kernel.org
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-5-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Enable Linux prefetch and prefetchw primitives using Zicbop.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231231082955.16516-3-guoren@kernel.org
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-4-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Zicbop introduces cache blocks prefetching instructions, add the
necessary support for the kernel to use it in the coming commits.
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-3-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The S-type instructions are first introduced and then used to define the
encoding of the Zicbop prefetching instructions.
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-2-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Apparently sec_base doesn't mean relocated symbol value, which seems a
copy-pasting error in the comment. Assigned with the address of section
indexed by sym->st_shndx, it should represent base address of the
relevant section. Let's fix the comment to avoid possible confusion.
Fixes: 838b3e28488f ("RISC-V: Load purgatory in kexec_file")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250326073450.57648-2-ziyao@disroot.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This patch creates image_kexec_ops to load Image binary file
for kexec_file_load() syscall.
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250409193004.643839-3-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This is the preparative patch for kexec_file_load Image support.
It separates the elf_kexec_load() as two parts:
- the first part loads the vmlinux (or Image)
- the second part loads other segments (e.g. initrd,fdt,purgatory)
And the second part is exported as the load_extra_segments() function
which would be used in both kexec-elf.c and kexec-image.c.
No functional change intended.
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250409193004.643839-2-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Add a section in cmodx to describe how dynamic ftrace works on riscv,
limitations, and assumptions.
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-12-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
jump to FTRACE_ADDR if distance is out of reach
Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-11-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The idea and most of the implementation is taken from the ARM64's
implementation of the same feature. The idea is to place a pointer to
the ftrace_ops as a literal at a fixed offset from the function entry
point, which can be recovered by the common ftrace trampoline.
We use -fpatchable-function-entry to reserve 8 bytes above the function
entry by emitting 2 4 byte or 4 2 byte nops depending on the presence of
CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer
to the associated ftrace_ops for that callsite. Functions are aligned to
8 bytes to make sure that the accesses to this literal are atomic.
This approach allows for directly invoking ftrace_ops::func even for
ftrace_ops which are dynamically-allocated (or part of a module),
without going via ftrace_ops_list_func.
We've benchamrked this with the ftrace_ops sample module on Spacemit K1
Jupiter:
Without this patch:
baseline (Linux rivos 6.14.0-09584-g7d06015d936c #3 SMP Sat Mar 29
+-----------------------+-----------------+----------------------------+
| Number of tracers | Total time (ns) | Per-call average time |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
| 0 | 0 | 1357958 | 13 | - |
| 0 | 1 | 1302375 | 13 | - |
| 0 | 2 | 1302375 | 13 | - |
| 0 | 10 | 1379084 | 13 | - |
| 0 | 100 | 1302458 | 13 | - |
| 0 | 200 | 1302333 | 13 | - |
|----------+------------+-----------------+------------+---------------|
| 1 | 0 | 13677833 | 136 | 123 |
| 1 | 1 | 18500916 | 185 | 172 |
| 1 | 2 | 22856459 | 228 | 215 |
| 1 | 10 | 58824709 | 588 | 575 |
| 1 | 100 | 505141584 | 5051 | 5038 |
| 1 | 200 | 1580473126 | 15804 | 15791 |
|----------+------------+-----------------+------------+---------------|
| 1 | 0 | 13561000 | 135 | 122 |
| 2 | 0 | 19707292 | 197 | 184 |
| 10 | 0 | 67774750 | 677 | 664 |
| 100 | 0 | 714123125 | 7141 | 7128 |
| 200 | 0 | 1918065668 | 19180 | 19167 |
+----------+------------+-----------------+------------+---------------+
Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.
With this patch:
v4-rc4 (Linux rivos 6.14.0-09598-gd75747611c93 #4 SMP Sat Mar 29
+-----------------------+-----------------+----------------------------+
| Number of tracers | Total time (ns) | Per-call average time |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
| 0 | 0 | 1459917 | 14 | - |
| 0 | 1 | 1408000 | 14 | - |
| 0 | 2 | 1383792 | 13 | - |
| 0 | 10 | 1430709 | 14 | - |
| 0 | 100 | 1383791 | 13 | - |
| 0 | 200 | 1383750 | 13 | - |
|----------+------------+-----------------+------------+---------------|
| 1 | 0 | 5238041 | 52 | 38 |
| 1 | 1 | 5228542 | 52 | 38 |
| 1 | 2 | 5325917 | 53 | 40 |
| 1 | 10 | 5299667 | 52 | 38 |
| 1 | 100 | 5245250 | 52 | 39 |
| 1 | 200 | 5238459 | 52 | 39 |
|----------+------------+-----------------+------------+---------------|
| 1 | 0 | 5239083 | 52 | 38 |
| 2 | 0 | 19449417 | 194 | 181 |
| 10 | 0 | 67718584 | 677 | 663 |
| 100 | 0 | 709840708 | 7098 | 7085 |
| 200 | 0 | 2203580626 | 22035 | 22022 |
+----------+------------+-----------------+------------+---------------+
Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/3 of the
overhead prior to this series (from 122ns to 38ns). This is largely
due to permitting calls to dynamically-allocated ftrace_ops without
going through ftrace_ops_list_func.
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
[update kconfig, asm, refactor]
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-10-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Now, we can safely enable dynamic ftrace with kernel preemption.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-9-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
RISC-V spec explicitly calls out that a local fence.i is not enough for
the code modification to be visble from a remote hart. In fact, it
states:
To make a store to instruction memory visible to all RISC-V harts, the
writing hart also has to execute a data FENCE before requesting that all
remote RISC-V harts execute a FENCE.I.
Although current riscv drivers for IPI use ordered MMIO when sending IPIs
in order to synchronize the action between previous csd writes, riscv
does not restrict itself to any particular flavor of IPI. Any driver or
firmware implementation that does not order data writes before the IPI
may pose a risk for code-modifying race.
Thus, add a fence here to order data writes before making the IPI.
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-8-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Each function entry implies a call to ftrace infrastructure. And it may
call into schedule in some cases. So, it is possible for preemptible
kernel-mode Vector to implicitly call into schedule. Since all V-regs
are caller-saved, it is possible to drop all V context when a thread
voluntarily call schedule(). Besides, we currently don't pass argument
through vector register, so we don't have to save/restore V-regs in
ftrace trampoline.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-7-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Now it is safe to remove dependency from stop_machine() for us to patch
code in ftrace.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-6-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
instruction fetch can break down to 4 byte at a time, it is impossible
to update two instructions without a race. In order to mitigate it, we
initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
patching can change NOP4 to JALR to eable/disable ftrcae from a
function. This limits the reach of each ftrace entry to +-2KB displacing
from ftrace_caller.
Starting from the trampoline, we add a level of indirection for it to
reach ftrace caller target. Now, it loads the target address from a
memory location, then perform the jump. This enable the kernel to update
the target atomically.
The new don't-stop-the-world text patching on change only one RISC-V
instruction:
| -8: &ftrace_ops of the associated tracer function.
| <ftrace enable>:
| 0: auipc t0, hi(ftrace_caller)
| 4: jalr t0, lo(ftrace_caller)
|
| -8: &ftrace_nop_ops
| <ftrace disable>:
| 0: auipc t0, hi(ftrace_caller)
| 4: nop
This means that f+0x0 is fixed, and should not be claimed by ftrace,
e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
offset and MCOUNT_INSN_SIZE accordingly.
[ alex: Fix build errors with !CONFIG_DYNAMIC_FTRACE ]
Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-5-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The following ftrace patch for riscv uses a data store to update ftrace
function. Therefore, a romote fence is required to order it against
function_trace_op updates. The mechanism is similar to the fence between
function_trace_op and update_ftrace_func in the generic ftrace, so we
leverage the same ftrace_sync_ipi function.
[ alex: Fix build warning when !CONFIG_DYNAMIC_FTRACE ]
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-4-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
We are changing ftrace code patching in order to remove dependency from
stop_machine() and enable kernel preemption. This requires us to align
functions entry at a 4-B align address.
However, -falign-functions on older versions of GCC alone was not strong
enoungh to align all functions. In fact, cold functions are not aligned
after turning on optimizations. We consider this is a bug in GCC and
turn off guess-branch-probility as a workaround to align all functions.
GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
The option -fmin-function-alignment is able to align all functions
properly on newer versions of gcc. So, we add a cc-option to test if
the toolchain supports it.
Suggested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-3-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
DYNAMIC_FTRACE selects DYNAMIC_FTRACE_WITH_ARGS and mcount-dyn.S in
riscv, so we can remove ifdef jargons of WITH_ARG when it is known that
DYNAMIC_FTRACE is true.
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-2-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Some caller-saved registers which are not defined as function arguments
in the ABI can still be passed as arguments when the kernel is compiled
with Clang. As a result, we must save and restore those registers to
prevent ftrace from clobbering them.
- [1]: https://reviews.llvm.org/D68559
Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
Fixes: 7caa9765465f ("ftrace: riscv: move from REGS to ARGS")
Acked-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
There is unmatched xe_vm_unlock() in the __xe_exec_queue_init().
Leftover from commit fbeaad071a98 ("drm/xe: Create LRC BO without VM")
Fixes: 2b0a0ce0c20b ("drm/xe: Create LRC BO without VM")
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://lore.kernel.org/r/20250530135627.2821612-1-maciej.patelczyk@intel.com
(cherry picked from commit 28b996ce73982a44fa86736ca0e3684cb1ae8b24)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Specifying VM during lrc->bo creation requires VM's reference
to be held for the lifetime of lrc->bo as it will use VM's dma
reservation object. Using VM's dma reservation object for
lrc->bo doesn't provide any advantage. Hence do not pass VM
while creating lrc->bo.
v2: Use xe_bo_unpin_map_no_vm (Matthew Brost)
Fixes: 264eecdba211 ("drm/xe: Decouple xe_exec_queue and xe_lrc")
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250529052031.2429120-2-niranjana.vishwanathapura@intel.com
(cherry picked from commit fbeaad071a98fef87deccee81d564de1c8e8e16d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Daniele noticed that the fix in commit 2d2be279f1ca ("drm/xe: fix UAF
around queue destruction") looks to have been unintentionally removed as
part of handling a conflict in some past merge commit. Add it back.
Fixes: ac44ff7cec33 ("Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes")
Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250603174213.1543579-2-matthew.auld@intel.com
(cherry picked from commit 9d9fca62dc49d96f97045b6d8e7402a95f8cf92a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.
Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).
v2: Minor rework of the doc (Rodrigo)
Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
(cherry picked from commit 21784ca96025b62d95b670b7639ad70ddafa69b8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The define of the extension type was accidentally used instead of the
one of the property itself. They're both zero, so no functional issue,
but we should use the correct define for code correctness.
Fixes: 41a97c4a1294 ("drm/xe/pxp/uapi: Add API to mark a BO as using PXP")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-6-daniele.ceraolospurio@intel.com
(cherry picked from commit 1d891ee820fd0fbb4101eacb0d922b5050a24933)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Customer is reporting a really subtle issue where we get random DMAR
faults, hangs and other nasties for kernel migration jobs when stressing
stuff like s2idle/s3/s4. The explosions seems to happen somewhere
after resuming the system with splats looking something like:
PM: suspend exit
rfkill: input handler disabled
xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0
xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1]
xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out
The likely cause appears to be a race between suspend cancelling the
worker that processes the free_job()'s, such that we still have pending
jobs to be freed after the cancel. Following from this, on resume the
pending_list will now contain at least one already complete job, but it
looks like we call drm_sched_resubmit_jobs(), which will then call
run_job() on everything still on the pending_list. But if the job was
already complete, then all the resources tied to the job, like the bb
itself, any memory that is being accessed, the iommu mappings etc. might
be long gone since those are usually tied to the fence signalling.
This scenario can be seen in ftrace when running a slightly modified
xe_pm IGT (kernel was only modified to inject artificial latency into
free_job to make the race easier to hit):
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=0, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 1:0x1, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x3
xe_exec_queue_resubmit: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
.....
xe_exec_queue_memory_cat_error: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x3, flags=0x13
So the job_run() is clearly triggered twice for the same job, even
though the first must have already signalled to completion during
suspend. We can also see a CAT error after the re-submit.
To prevent this only resubmit jobs on the pending_list that have not yet
signalled.
v2:
- Make sure to re-arm the fence callbacks with sched_start().
v3 (Matt B):
- Stop using drm_sched_resubmit_jobs(), which appears to be deprecated
and just open-code a simple loop such that we skip calling run_job()
on anything already signalled.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: William Tseng <william.tseng@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com
(cherry picked from commit 38fafa9f392f3110d2de431432d43f4eef99cd1b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
For preempt_fence mode VM's we're rejecting eviction of
shared bos during VM_BIND. However, since we do this in the
move() callback, we're getting an eviction failure warning from
TTM. The TTM callback intended for these things is
eviction_valuable().
However, the latter doesn't pass in the struct ttm_operation_ctx
needed to determine whether the caller needs this.
Instead, attach the needed information to the vm under the
vm->resv, until we've been able to update TTM to provide the
needed information. And add sufficient lockdep checks to prevent
misuse and races.
v2:
- Fix a copy-paste error in xe_vm_clear_validating()
v3:
- Fix kerneldoc errors.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 0af944f0e308 ("drm/xe: Reject BO eviction if BO is bound to current VM")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250528164105.234718-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 9d5558649f68e2e84a87a909631b30e15ca0f8ec)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The XE driver can be built with or without VSEC support, but fails to link as
built-in if vsec is in a loadable module:
x86_64-linux-ld: vmlinux.o: in function `xe_vsec_init':
(.text+0x1e83e16): undefined reference to `intel_vsec_register'
The normal fix for this is to add a 'depends on INTEL_VSEC || !INTEL_VSEC',
forcing XE to be a loadable module as well, but that causes a circular
dependency:
symbol DRM_XE depends on INTEL_VSEC
symbol INTEL_VSEC depends on X86_PLATFORM_DEVICES
symbol X86_PLATFORM_DEVICES is selected by DRM_XE
The problem here is selecting a symbol from another subsystem, so change
that as well and rephrase the 'select' into the corresponding dependency.
Since X86_PLATFORM_DEVICES is 'default y', there is no change to
defconfig builds here.
Fixes: 0c45e76fcc62 ("drm/xe/vsec: Support BMG devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250529172355.2395634-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit e4931f8be347ec5f19df4d6d33aea37145378c42)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The result of integer comparison already evaluates to bool. No need for
explicit conversion.
No functional impact.
Fixes: 0e414bf7ad01 ("drm/xe: Expose PCIe link downgrade attributes")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505292205.MoljmkjQ-lkp@intel.com/
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250529160937.490147-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 61761a6b57f2818983466d24aab60baab471ba21)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Move power2/curr2_crit to channel 1 i.e power1/curr1_crit as this
represents the entire card critical power/current.
v2: Update the date of curr1_crit also in hwmon documentation.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 345dadc4f68b ("drm/xe/hwmon: Add infra to support card power and energy attributes")
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-3-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 25e963a09e059ffdb15c09cc79cfded855b43668)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Add support to manage power limits using pcode mailbox commands
for supported platforms.
v2:
- Address review comments. (Badal)
- Use mailbox commands instead of registers to manage power limits
for BMG.
- Clamp the maximum power limit to GPU firmware default value.
v3:
- Clamp power limit in write also for platforms with mailbox support.
v4:
- Remove unnecessary debug prints. (Badal)
v5:
- Update description of variable pl1_on_boot to fix kernel-doc error.
v6:
- Improve commit message, refer to BIOS as GPU firmware.
- Change macro READ_PL_FROM_BIOS to READ_PL_FROM_FW.
- Rectify drm_warn to drm_info.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: e90f7a58e659 ("drm/xe/hwmon: Add HWMON support for BMG")
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-2-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 7596d839f6228757fe17a810da2d1c5f3305078c)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
In xe_vm_close_and_put() we need to be able to call xe_svm_fini(),
however during vm creation we can call this on the error path, before
having actually initialised the svm state, leading to various splats
followed by a fatal NPD.
Fixes: 6fd979c2f331 ("drm/xe: Add SVM init / close / fini to faulting VMs")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4967
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250514152424.149591-4-matthew.auld@intel.com
(cherry picked from commit 4f296d77cf49fcb5f90b4674123ad7f3a0676165)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
In xe_vm_close_and_put() we need to be able to call
flush_work(rebind_work), however during vm creation we can call this on
the error path, before having actually set up the worker, leading to a
splat from flush_work().
It looks like we can simply move the worker init step earlier to fix
this.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250514152424.149591-3-matthew.auld@intel.com
(cherry picked from commit 96af397aa1a2d1032a6e28ff3f4bc0ab4be40e1d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There are two new drivers this cycle. There is also support for a
negative offset for RTCs that have been shipped with a date set using
an epoch that is before 1970. This unfortunately happens with some
products that ship with a vendor kernel and an out of tree driver.
Core:
- support negative offsets for RTCs that have shipped with an epoch
earlier than 1970
New drivers:
- NXP S32G2/S32G3
- Sophgo CV1800
Drivers:
- loongson: fix missing alarm notifications for ACPI
- m41t80: kickstart ocillator upon failure
- mt6359: mt6357 support
- pcf8563: fix wrong alarm register
- sh: cleanups"
* tag 'rtc-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (39 commits)
rtc: mt6359: Add mt6357 support
rtc: test: Test date conversion for dates starting in 1900
rtc: test: Also test time and wday outcome of rtc_time64_to_tm()
rtc: test: Emit the seconds-since-1970 value instead of days-since-1970
rtc: Fix offset calculation for .start_secs < 0
rtc: Make rtc_time64_to_tm() support dates before 1970
rtc: pcf8563: fix wrong alarm register
rtc: rzn1: support input frequencies other than 32768Hz
rtc: rzn1: Disable controller before initialization
dt-bindings: rtc: rzn1: add optional second clock
rtc: m41t80: reduce verbosity
rtc: m41t80: kickstart ocillator upon failure
rtc: s32g: add NXP S32G2/S32G3 SoC support
dt-bindings: rtc: add schema for NXP S32G2/S32G3 SoCs
dt-bindings: at91rm9260-rtt: add microchip,sama7d65-rtt
dt-bindings: rtc: at91rm9200: add microchip,sama7d65-rtc
rtc: loongson: Add missing alarm notifications for ACPI RTC events
rtc: sophgo: add rtc support for Sophgo CV1800 SoC
rtc: stm32: drop unused module alias
rtc: s3c: drop unused module alias
...
|
|
Use the helper screen_info_video_type() to get the framebuffer
type from struct screen_info. Handle supported values in sorted
switch statement.
Reading orig_video_isVGA is unreliable. On most systems it is a
VIDEO_TYPE_ constant. On some systems with VGA it is simply set
to 1 to signal the presence of a VGA output. See vga_probe() for
an example. Retrieving the screen_info type with the helper
screen_info_video_type() detects these cases and returns the
appropriate VIDEO_TYPE_ constant. For VGA, sysfb creates a device
named "vga-framebuffer".
The sysfb code has been taken from vga16fb, where it likely didn't
work correctly either. With this bugfix applied, vga16fb loads for
compatible vga-framebuffer devices.
Fixes: 0db5b61e0dc0 ("fbdev/vga16fb: Create EGA/VGA devices in sysfb code")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Tzung-Bi Shih <tzungbi@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: "Uwe Kleine-König" <u.kleine-koenig@baylibre.com>
Cc: Zsolt Kajtar <soci@c64.rulez.org>
Cc: <stable@vger.kernel.org> # v6.1+
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20250603154838.401882-1-tzimmermann@suse.de
|
|
Apply PCI host-bridge window offsets to screen_info framebuffers. Fixes
invalid access to I/O memory.
Resources behind a PCI host bridge can be relocated by a certain offset
in the kernel's CPU address range used for I/O. The framebuffer memory
range stored in screen_info refers to the CPU addresses as seen during
boot (where the offset is 0). During boot up, firmware may assign a
different memory offset to the PCI host bridge and thereby relocating
the framebuffer address of the PCI graphics device as seen by the kernel.
The information in screen_info must be updated as well.
The helper pcibios_bus_to_resource() performs the relocation of the
screen_info's framebuffer resource (given in PCI bus addresses). The
result matches the I/O-memory resource of the PCI graphics device (given
in CPU addresses). As before, we store away the information necessary to
later update the information in screen_info itself.
Commit 78aa89d1dfba ("firmware/sysfb: Update screen_info for relocated
EFI framebuffers") added the code for updating screen_info. It is based
on similar functionality that pre-existed in efifb. Efifb uses a pointer
to the PCI resource, while the newer code does a memcpy of the region.
Hence efifb sees any updates to the PCI resource and avoids the issue.
v3:
- Only use struct pci_bus_region for PCI bus addresses (Bjorn)
- Clarify address semantics in commit messages and comments (Bjorn)
v2:
- Fixed tags (Takashi, Ivan)
- Updated information on efifb
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reported-by: "Ivan T. Ivanov" <iivanov@suse.de>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1240696
Tested-by: "Ivan T. Ivanov" <iivanov@suse.de>
Fixes: 78aa89d1dfba ("firmware/sysfb: Update screen_info for relocated EFI framebuffers")
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.9+
Link: https://lore.kernel.org/r/20250528080234.7380-1-tzimmermann@suse.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"A fairly small update for the dmaengine subsystem. This has a new ARM
dmaengine driver and couple of new device support and few driver
changes:
New support:
- Renesas RZ/V2H(P) dma support for r9a09g057
- Arm DMA-350 driver
- Tegra Tegra264 ADMA support
Updates:
- AMD ptdma driver code removal and optimizations
- Freescale edma error interrupt handler support"
* tag 'dmaengine-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (27 commits)
dmaengine: idxd: Remove unused pointer and macro
arm64: dts: renesas: r9a09g057: Add DMAC nodes
dmaengine: sh: rz-dmac: Add RZ/V2H(P) support
dmaengine: sh: rz-dmac: Allow for multiple DMACs
irqchip/renesas-rzv2h: Add rzv2h_icu_register_dma_req()
dt-bindings: dma: rz-dmac: Document RZ/V2H(P) family of SoCs
dt-bindings: dma: rz-dmac: Restrict properties for RZ/A1H
dmaengine: idxd: Narrow the restriction on BATCH to ver. 1 only
dmaengine: ti: Add NULL check in udma_probe()
fsldma: Set correct dma_mask based on hw capability
dmaengine: idxd: Check availability of workqueue allocated by idxd wq driver before using
dmaengine: xilinx_dma: Set dma_device directions
dmaengine: tegra210-adma: Add Tegra264 support
dt-bindings: Document Tegra264 ADMA support
dmaengine: dw-edma: Add HDMA NATIVE map check
dmaegnine: fsl-edma: add edma error interrupt handler
dt-bindings: dma: fsl-edma: increase maxItems of interrupts and interrupt-names
dmaengine: ARM_DMA350 should depend on ARM/ARM64
dt-bindings: dma: qcom,bam: Document dma-coherent property
dmaengine: Add Arm DMA-350 driver
...
|
|
to 2.55
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Update my email address in MAINTAINERS and .mailmap files.
Signed-off-by: Paulo Alcantara <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Document steps to use SMB over RDMA using the linux SMB client and
KSMBD server
Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"As usual featuring couple of new driver and bunch of new device
support and some driver changes to Freescale, rockchip driver along
with couple of yaml binding conversions.
New Support:
- Qualcomm IPQ5424 qusb2 support, IPQ5018 uniphy-pcie driver
- Rockchip usb2 support for RK3562, RK3036 usb2 phy support
- Samsung exynos2200 eusb2 phy support and driver refactoring for
this support, exynos7870 USBDRD support
- Mediatek MT7988 xs-phy support
- Broadcom BCM74110 usb phy support
- Renesas RZ/V2H(P) usb2 phy support
Updates:
- Freescale phy rate claculation updates, i.MX95 tuning support
- Better error handling for amlogic pcie phy
- Rockchip color depth configuration and management support
- Yaml binding conversion for RK3399 Type-C and PCIe Phy"
* tag 'phy-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (77 commits)
phy: tegra: p2u: Broaden architecture dependency
phy: rockchip: inno-usb2: Add usb2 phy support for rk3562
dt-bindings: phy: rockchip,inno-usb2phy: add rk3562
phy: rockchip: inno-usb2: add phy definition for rk3036
dt-bindings: phy: rockchip,inno-usb2phy: add rk3036 compatible
phy: freescale: fsl-samsung-hdmi: Improve LUT search for best clock
phy: freescale: fsl-samsung-hdmi: Refactor finding PHY settings
phy: freescale: fsl-samsung-hdmi: Rename phy_clk_round_rate
phy: renesas: phy-rcar-gen3-usb2: Add USB2.0 PHY support for RZ/V2H(P)
phy: renesas: phy-rcar-gen3-usb2: Sort compatible entries by SoC part number
dt-bindings: phy: renesas,usb2-phy: Document RZ/V2H(P) SoC
dt-bindings: phy: renesas,usb2-phy: Add clock constraint for RZ/G2L family
phy: exynos5-usbdrd: support Exynos USBDRD 3.2 4nm controller
phy: phy-snps-eusb2: add support for exynos2200
phy: phy-snps-eusb2: refactor reference clock init
phy: phy-snps-eusb2: make reset control optional
phy: phy-snps-eusb2: make repeater optional
phy: phy-snps-eusb2: split phy init code
phy: phy-snps-eusb2: refactor constructs names
phy: move phy-qcom-snps-eusb2 out of its vendor sub-directory
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
"A couple of small core changes and an Intel driver change:
- sdw_assign_device_num() logic simplification, using internal slave
id for irqs and optimizing computing of port params in specific
stream states
- Intel driver updates for ACE3+ microphone privacy status reporting
and enabling the status in HDA Intel driver"
* tag 'soundwire-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: only compute port params in specific stream states
ASoC: SOF: Intel: hda: Set the mic_privacy flag for soundwire with ACE3+
soundwire: intel: Add awareness of ACE3+ microphone privacy
soundwire: bus: Add internal slave ID and use for IRQs
soundwire: bus: Simplify sdw_assign_device_num()
|
|
syzbot reported that a recent patch forgot to unlock rcu
in the error path.
Adopt the convention that netlbl_conn_setattr() is already using.
Fixes: 6e9f2df1c550 ("calipso: Don't call calipso functions for AF_INET sk.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20250604133826.1667664-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kernel currently validates that the length of the provided nexthop
address does not exceed the specified length. This can lead to the
kernel reading uninitialized memory if user space provided a shorter
length than the specified one.
Fix by validating that the provided length exactly matches the specified
one.
Fixes: d1df6fd8a1d2 ("ipv6: sr: define core operations for seg6local lightweight tunnel")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250604113252.371528-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
At the time rtnl_create_link() is running, dev->netdev_ops is NULL,
we must not use netdev_lock_ops() or risk a NULL deref if
CONFIG_NET_SHAPER is defined.
Use netif_set_group() instead of dev_set_group().
RIP: 0010:netdev_need_ops_lock include/net/netdev_lock.h:33 [inline]
RIP: 0010:netdev_lock_ops include/net/netdev_lock.h:41 [inline]
RIP: 0010:dev_set_group+0xc0/0x230 net/core/dev_api.c:82
Call Trace:
<TASK>
rtnl_create_link+0x748/0xd10 net/core/rtnetlink.c:3674
rtnl_newlink_create+0x25c/0xb00 net/core/rtnetlink.c:3813
__rtnl_newlink net/core/rtnetlink.c:3940 [inline]
rtnl_newlink+0x16d6/0x1c70 net/core/rtnetlink.c:4055
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6944
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2534
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x75b/0x8d0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:712 [inline]
Reported-by: syzbot+9fc858ba0312b42b577e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6840265f.a00a0220.d4325.0009.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250604105815.1516973-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
from_cleanup_net() reads cleanup_net_task locklessly.
Add READ_ONCE()/WRITE_ONCE() annotations to avoid
a potential KCSAN warning, even if the race is harmless.
Fixes: 0734d7c3d93c ("net: expedite synchronize_net() for cleanup_net()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250604093928.1323333-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 846742f7e32f ("selftests: drv-net: add a warning for
bkg + shell + terminate") added a warning for bkg() used
with terminate=True. The tso test was missed as we didn't
have it running anywhere in NIPA. Add exit_wait=True, to avoid:
# Warning: combining shell and terminate is risky!
# SIGTERM may not reach the child on zsh/ksh!
getting printed twice for every variant.
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604012055.891431-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The device type for IPv4 GRE is "gre" not "ipgre",
unlike for IPv6 which uses "ip6gre".
Not sure how I missed this when writing the test, perhaps
because all HW I have access to is on an IPv6-only network.
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604012031.891242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add missing config options for the tso.py test, specifically
to make sure the kernel is built with vxlan and gre tunnels.
I noticed this while adding a TSO-capable device QEMU to the CI.
Previously we only run virtio tests and it doesn't report LSO
stats on the QEMU we have.
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604001653.853008-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
iavf: get rid of the crit lock
Przemek Kitszel says:
Fix some deadlocks in iavf, and make it less error prone for the future.
Patch 1 is simple and independent from the rest.
Patches 2, 3, 4 are strictly a refactor, but it enables the last patch
to be much smaller.
(Technically Jake given his RB tags not knowing I will send it to -net).
Patch 5 just adds annotations, this also helps prove last patch to be correct.
Patch 6 removes the crit lock, with its unusual try_lock()s.
I have more refactoring for scheduling done for -next, to be sent soon.
There is a simple test:
add VF; decrease number of queueus; remove VF
that was way too hard to pass without this series :)
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: get rid of the crit lock
iavf: sprinkle netdev_assert_locked() annotations
iavf: extract iavf_watchdog_step() out of iavf_watchdog_task()
iavf: simplify watchdog_task in terms of adminq task scheduling
iavf: centralize watchdog requeueing itself
iavf: iavf_suspend(): take RTNL before netdev_lock()
====================
Link: https://patch.msgid.link/20250603171710.2336151-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host. This affects any
kind of multi-tunnel deployment, regardless of whether the tunnels share
the same endpoints or not (i.e., a VPN concentrator type of gateway
would also be affected).
The problem is caused by the fact that, in case of a traffic surge that
involves multiple tunnels at the same time, the polling of the NAPI
instance of all these wg devices tends to converge onto the same core,
causing underutilization of the CPU and bottlenecking performance.
This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.
As a result, once two or more tunnel softirqs have been scheduled on
the same core, they remain pinned there until the surge ends.
In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.
The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.
On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.
We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.
More details are available in our Netdev paper.
Link: https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf
Signed-off-by: Mirco Barone <mirco.barone@polito.it>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250605120616.2808744-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|