Pull clang-format updates from Miguel Ojeda:
"A routine update of the 'for_each' macro list"
* tag 'clang-format-6.12' of https://github.com/ojeda/linux:
clang-format: Update with v6.11-rc1's `for_each` macro list
|
|
Currently the Rust support is gated on not having MODVERSIONS enabled,
and as a result an "allmodconfig" build will disable Rust build tests.
While MODVERSIONS configurations are worth build testing, the feature is
not actually meaningful unless you run the result, and I'd rather get
build coverage of Rust than MODVERSIONS. So let's disable MODVERSIONS
for build testing until the Rust side clears up.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up
objtool warnings), teach objtool about 'noreturn' Rust symbols and
mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we
should be objtool-warning-free, so enable it to run for all Rust
object files.
- KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support.
- Support 'RUSTC_VERSION', including re-config and re-build on
change.
- Split helpers file into several files in a folder, to avoid
conflicts in it. Eventually those files will be moved to the right
places with the new build system. In addition, remove the need to
manually export the symbols defined there, reusing existing
machinery for that.
- Relax restriction on configurations with Rust + GCC plugins to just
the RANDSTRUCT plugin.
'kernel' crate:
- New 'list' module: doubly-linked list for use with reference
counted values, which is heavily used by the upcoming Rust Binder.
This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed
unique for the given ID), 'AtomicTracker' (tracks whether a
'ListArc' exists using an atomic), 'ListLinks' (the prev/next
pointers for an item in a linked list), 'List' (the linked list
itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor
into a 'List' that allows removing elements), 'ListArcField' (a
field exclusively owned by a 'ListArc'), as well as support for
heterogeneous lists.
- New 'rbtree' module: red-black tree abstractions used by the
upcoming Rust Binder.
This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a
node), 'RBTreeNodeReservation' (a memory reservation for a node),
'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor'
(a bidirectional cursor that allows removing elements), as well as
an entry API similar to the Rust standard library one.
- 'init' module: add 'write_[pin_]init' methods and the
'InPlaceWrite' trait. Add the 'assert_pinned!' macro.
- 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by
introducing an associated type in the trait.
- 'alloc' module: add 'drop_contents' method to 'BoxExt'.
- 'types' module: implement the 'ForeignOwnable' trait for
'Pin<Box<T>>' and improve the trait's documentation. In addition,
add the 'into_raw' method to the 'ARef' type.
- 'error' module: in preparation for the upcoming Rust support for
32-bit architectures, like arm, locally allow a Clippy lint for
those.
Documentation:
- https://rust.docs.kernel.org has been announced, so link to it.
- Enable rustdoc's "jump to definition" feature, making its output a
bit closer to the experience in a cross-referencer.
- Debian Testing now also provides recent Rust releases (outside of
the freeze period), so add it to the list.
MAINTAINERS:
- Trevor is joining as reviewer of the "RUST" entry.
And a few other small bits"
* tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits)
kasan: rust: Add KASAN smoke test via UAF
kbuild: rust: Enable KASAN support
rust: kasan: Rust does not support KHWASAN
kbuild: rust: Define probing macros for rustc
kasan: simplify and clarify Makefile
rust: cfi: add support for CFI_CLANG with Rust
cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS
rust: support for shadow call stack sanitizer
docs: rust: include other expressions in conditional compilation section
kbuild: rust: replace proc macros dependency on `core.o` with the version text
kbuild: rust: rebuild if the version text changes
kbuild: rust: re-run Kconfig if the version text changes
kbuild: rust: add `CONFIG_RUSTC_VERSION`
rust: avoid `box_uninit_write` feature
MAINTAINERS: add Trevor Gross as Rust reviewer
rust: rbtree: add `RBTree::entry`
rust: rbtree: add cursor
rust: rbtree: add mutable iterator
rust: rbtree: add iterator
rust: rbtree: add red-black tree implementation backed by the C version
...
|
|
Program SDMAx_QUEUEx_SCHEDULE_CNTL for context switches due to
quantum in KFD for GFX12.
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
|
|
On VCN v4_0_6, AV1 is supported on both instances.
Remove the cs IB parse code since explicit handling of the AV1
schedule is not required.
v1 - remove cs parse code (Christian)
Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Make CU occupancy calculations work on GFX 9.4.3 by
updating the logic to handle multiple XCCs correctly.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Currently, the code uses the IH_VMID_X_LUT register to map a
queue's VMID to the corresponding PASID. This logic is racy since
the CP can update the VMID-PASID mapping at any time, especially
when there are more processes than VMIDs. Update the logic to
calculate CU occupancy by matching the doorbell offset of each
queue with valid wave counts against the process's queues.
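A sketch of the matching described above (the struct and helper
names here are hypothetical, not the exact amdkfd code):
```
struct wave_info {              /* hypothetical per-wavefront sample */
        u32 wave_cnt;
        u32 doorbell_off;
};

/* Attribute wave counts to the process by doorbell offset instead
 * of trusting the racy VMID->PASID LUT. */
static int count_process_waves(struct kfd_process_device *pdd,
                               const struct wave_info *waves, int n)
{
        struct queue *q;
        int i, occupancy = 0;

        for (i = 0; i < n; i++) {
                if (!waves[i].wave_cnt)
                        continue;
                list_for_each_entry(q, &pdd->qpd.queues_list, list)
                        if (q->properties.doorbell_off ==
                            waves[i].doorbell_off)
                                occupancy += waves[i].wave_cnt;
        }
        return occupancy;
}
```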
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
VF FLR will be triggered by the host driver before the job timeout,
hence the GPU's error status gets cleared. Performing a coredump
here is unnecessary.
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch addresses the basic problem that we also need to sync to
the KFD fences of the BO, because otherwise we may clear PTEs while
the KFD queues are still running.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
enable_level_process_quantum_check is required to enable
process-quantum-based scheduling.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
|
|
Expose allowed group priorities with a new device query.
This new uAPI will be used in Mesa to properly report what priorities a
user can use for EGL_IMG_context_priority.
Since this extends the uAPI and userland needs a way to advertise
priorities accordingly, this also bumps the driver minor version.
v2:
- Remove drm_panthor_group_allow_priority_flags definition
- Document that allowed_mask is a bitmask of drm_panthor_group_priority
v3:
- Use BIT macro in panthor_query_group_priorities_info
- Add r-b from Steven Price and Boris Brezillon
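From userspace, the query might be consumed roughly as follows (a
hedged sketch; the struct, enum and ioctl names follow the panthor
uAPI as described in this patch, and BIT() is redefined for
userspace):
```
#include <stdbool.h>
#include <stdint.h>
#include <xf86drm.h>
#include "panthor_drm.h"

#define BIT(n) (1u << (n))      /* userspace stand-in for the kernel macro */

static bool panthor_can_use_high_priority(int fd)
{
        struct drm_panthor_group_priorities_info info = {0};
        struct drm_panthor_dev_query query = {
                .type = DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
                .size = sizeof(info),
                .pointer = (uint64_t)(uintptr_t)&info,
        };

        if (drmIoctl(fd, DRM_IOCTL_PANTHOR_DEV_QUERY, &query))
                return false;

        /* allowed_mask is a bitmask of drm_panthor_group_priority. */
        return info.allowed_mask & BIT(PANTHOR_GROUP_PRIORITY_HIGH);
}
```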
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240909064820.34982-4-mary.guillemard@collabora.com
|
|
This adds a new value to drm_panthor_group_priority exposing the
realtime priority to userspace.
This is required to implement NV_context_priority_realtime in Mesa.
v2:
- Add Steven Price r-b
v3:
- Add Boris Brezillon r-b
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240909064820.34982-3-mary.guillemard@collabora.com
|
|
Add a test case for tracepoint events on modules. This checks that
such events can be added and removed correctly.
Link: https://lore.kernel.org/all/172397781494.286558.7581515061075998225.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Support raw tracepoint events on future-loaded (currently unloaded)
modules. This allows users to create raw tracepoint events which can
be used from a module's __init functions.
Note: since the kernel does not have any information about tracepoints
in unloaded modules, fprobe events cannot check whether the tracepoint
exists, nor extend the BTF-based arguments.
Link: https://lore.kernel.org/all/172397780593.286558.18360375226968537828.stgit@devnote2/
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Support raw tracepoint events on modules via fprobe events.
Since fprobe events only use for_each_kernel_tracepoint() to find
tracepoints, tracepoints on modules are not handled. Thus, if a user
specifies a tracepoint on a module, an error is shown.
This adds a new for_each_module_tracepoint() API to the tracepoint
subsystem, and uses it to find tracepoints on modules.
Link: https://lore.kernel.org/all/172397779651.286558.15903703620679186867.stgit@devnote2/
Reported-by: don <zds100@gmail.com>
Closes: https://lore.kernel.org/all/20240530215718.aeec973a1d0bf058d39cb1e3@kernel.org/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Add a for_each_tracepoint_in_module() function to iterate over the
tracepoints in a module. This API is needed for handling tracepoints
in a loading module from a tracepoint_module_notifier callback
function.
This also updates for_each_module_tracepoint() to pass the module to
the callback function so that it can identify the module easily.
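A sketch of how a tracepoint_module_notifier callback might consume
the new API (the callback body is illustrative):
```
static void probe_tp(struct tracepoint *tp, struct module *mod, void *priv)
{
        pr_info("%s: found tracepoint %s\n", mod->name, tp->name);
}

static int tp_module_notify(struct notifier_block *nb,
                            unsigned long val, void *data)
{
        struct tp_module *tp_mod = data;

        if (val == MODULE_STATE_COMING)
                /* Walk only this module's tracepoints. */
                for_each_tracepoint_in_module(tp_mod->mod, probe_tp, NULL);
        return NOTIFY_OK;
}
```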
Link: https://lore.kernel.org/all/172397778740.286558.15781131277732977643.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Add for_each_module_tracepoint() for iterating over tracepoints
on modules. This is similar to the for_each_kernel_tracepoint()
but only for the tracepoints on modules (not including kernel
built-in tracepoints).
Link: https://lore.kernel.org/all/172397777800.286558.14554748203446214056.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The PVH entry point is 32bit. For a 64bit kernel, the entry point must
switch to 64bit mode, which requires a set of page tables. In the past,
PVH used init_top_pgt.
This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the
page tables are prebuilt for this address. If the kernel is loaded at a
different address, they need to be adjusted.
__startup_64() adjusts the prebuilt page tables for the physical load
address, but it is 64bit code. The 32bit PVH entry code can't call it
to adjust the page tables, so it can't readily be re-used.
64bit PVH entry needs page tables set up for the identity map, the
kernel high map and the direct map. pvh_start_xen() is entered
identity-mapped.
Inside xen_prepare_pvh(), it jumps through a pv_ops function pointer
into the highmap. The direct map is used for __va() on the initramfs
and other guest physical addresses.
Add a dedicated set of prebuilt page tables for PVH entry. They are
adjusted in assembly before loading.
Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation
along with the kernel's loading constraints. The maximum load address,
KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt
page. It could be larger with more pages.
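Conceptually, the fixup rebases the prebuilt entries by the load
delta, roughly like this C-for-clarity sketch (the real adjustment
runs in 32bit assembly before paging is enabled; table and macro
names are taken from the patch description):
```
/* Rebase each present entry by how far the kernel actually moved
 * from its compiled-in load address. */
uint64_t delta = actual_load_addr - LOAD_PHYSICAL_ADDR;
int i;

for (i = 0; i < PTRS_PER_PMD; i++)
        if (pvh_level2_ident_pgt[i] & _PAGE_PRESENT)
                pvh_level2_ident_pgt[i] += delta;
```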
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-6-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
The PVH entry point will need an additional set of prebuilt page tables.
Move the macros and defines to pgtable_64.h, so they can be re-used.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Message-ID: <20240823193630.2583107-5-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Alfred Agrell found that TOMOYO cannot handle execveat(AT_EMPTY_PATH)
inside a chroot environment where /dev and /proc are not mounted,
because commit 51f39a1f0cea ("syscalls: implement execveat() system
call") missed that TOMOYO tries to canonicalize argv[0], while the
filename fed to the executed program as argv[0] may be supplied using
a potentially nonexistent pathname.
Since "/dev/fd/<fd>" has already lost the symlink information used for
obtaining that <fd>, it is too late to reconstruct the symlink's
pathname. Although the <filename> part of "/dev/fd/<fd>/<filename>"
might not be canonicalized, TOMOYO cannot use
tomoyo_realpath_nofollow() when /dev or /proc is not mounted.
Therefore, fall back to tomoyo_realpath_from_path() when
tomoyo_realpath_nofollow() fails.
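In other words, the lookup becomes (a simplified sketch of the
fallback, not the verbatim TOMOYO code):
```
buf = tomoyo_realpath_nofollow(original_name);
if (!buf)
        /* /dev or /proc unavailable: use the canonicalized realpath. */
        buf = tomoyo_realpath_from_path(&bprm->file->f_path);
```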
Reported-by: Alfred Agrell <blubban@gmail.com>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1082001
Fixes: 51f39a1f0cea ("syscalls: implement execveat() system call")
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
|
|
default regs values
CDC_RX_BCL_VBAT_RF_PROC1 is listed twice, and its default value of
0x2a is overwritten by its next occurrence in rx_defaults[]. The
second occurrence should instead be the missing
CDC_RX_BCL_VBAT_RF_PROC2, whose expected default value is 0x0.
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Link: https://patch.msgid.link/20240925043823.520218-2-alexey.klimov@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
phys_base needs to be set for __pa() to work in xen_pvh_init() when
finding the hypercall page. Set it before calling into
xen_prepare_pvh(), which calls xen_pvh_init(). Clear it afterward to
avoid __startup_64() adding to it and creating an incorrect value.
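For reference, this is roughly how __pa() depends on phys_base for
kernel-image addresses (simplified from
arch/x86/include/asm/page_64.h; treat it as a sketch rather than the
exact source):
```
unsigned long __phys_addr_nodebug(unsigned long x)
{
        unsigned long y = x - __START_KERNEL_map;

        /* Kernel image map uses phys_base; the direct map uses the
         * fixed offset. Until phys_base holds the real load offset,
         * __pa() on image addresses gives wrong results. */
        x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));

        return x;
}
```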
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-4-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
The PVH entrypoint is 32bit non-PIC code running the uncompressed
vmlinux at its load address, CONFIG_PHYSICAL_START (default
0x1000000, i.e. 16MB). The kernel is loaded at that physical address
inside the VM by the VMM software (Xen/QEMU).
When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1
into the PVH container. There exist system firmwares (Coreboot/EDK2)
with reserved memory at 16MB. This creates a conflict where the PVH
kernel cannot be loaded at that address.
Modify the PVH entrypoint to be position-independent to allow
flexibility in the load address. Only the 64bit entry path is
converted. A 32bit kernel is not PIC, so calls into other parts of
the kernel, like xen_prepare_pvh() and mk_pgtable_32(), don't work
properly when relocated.
This makes the code PIC, but the page tables need to be updated as well
to handle running from the kernel high map.
The UNWIND_HINT_END_OF_STACK is to silence:
vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction
after the lret into 64bit code.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-3-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Sync Xen's elfnote.h header from xen.git to pull in the
XEN_ELFNOTE_PHYS32_RELOC define.
xen commit dfc9fab00378 ("x86/PVH: Support relocatable dom0 kernels")
This is a copy except for the removal of the emacs editor config at the
end of the file.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-2-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Add a new V3D parameter to expose support for Super Pages to
userspace. Userspace might want to know this information to
apply optimizations that are specific to kernels with Super Pages
enabled.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-12-mcanal@igalia.com
|
|
Add a modparam for turning off Big/Super Pages, so that a user who
doesn't want Big/Super Pages enabled can disable them by setting
the modparam to false.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-11-mcanal@igalia.com
|
|
Although Big/Super Pages could appear naturally, it would be quite
hard for 1MB or 64KB of memory to be allocated contiguously by
chance. Therefore, we can force the creation of contiguously
allocated large pages by using a mountpoint with "huge=within_size"
enabled.
As V3D has a mountpoint with "huge=within_size" (if the user has THP
enabled), use this mountpoint for BO creation if available. This
will allow us to create contiguously allocated large pages and make
use of Big/Super Pages.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-10-mcanal@igalia.com
|
|
The V3D MMU also supports 64KB and 1MB pages, called big and super pages,
respectively. In order to set a 64KB page or 1MB page in the MMU, we need
to make sure that the page table entries for all 4KB pages within a
big/super page are correctly configured.
In order to create a big/super page, we need a contiguous memory region.
That's why we use a separate mountpoint with THP enabled. In order to
place the page table entries in the MMU, we iterate over the 16 4KB pages
(for big pages) or 256 4KB pages (for super pages) and insert the PTE.
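An illustrative sketch of the PTE setup for a 64KB big page (the
attribute bit and helper name are assumptions, not the verbatim v3d
code):
```
#define V3D_PTE_BIGPAGE BIT(30)         /* assumed attribute bit */

/* All 16 4KB PTEs covered by one big page carry the big-page bit
 * and point at the corresponding 4KB frame of the contiguous
 * region, so the MMU can treat them as a single 64KB mapping. */
static void v3d_insert_bigpage(u32 *pt, u32 page, u32 pfn)
{
        int i;

        for (i = 0; i < 16; i++)        /* 16 x 4KB = 64KB */
                pt[page + i] = V3D_PTE_BIGPAGE | (pfn + i);
}
```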
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-9-mcanal@igalia.com
|
|
Currently, we are using an alignment of 128 KB to insert a node, which
ends up wasting memory as we perform plenty of small BO allocations
(<= 4 KB). We require that allocations are aligned to 128 KB, so for
any allocation smaller than that, we waste the difference.
This implies that we cannot effectively use the whole 4 GB address space
available for the GPU in the RPi 4. Currently, we can allocate up to
32000 BOs of 4 KB (~140 MB) and 3000 BOs of 400 KB (~1.3 GB). This can be
quite limiting for applications that have high memory requirements, such
as vkoverhead [1].
By reducing the page alignment to 4 KB, we can allocate up to 1000000 BOs
of 4 KB (~4 GB) and 10000 BOs of 400 KB (~4 GB). Moreover, through
benchmarks, we were able to verify that reducing the page alignment to
4 KB provides a general performance improvement in OpenGL
applications (e.g. glmark2).
Therefore, this patch reduces the alignment of the node allocation to
4 KB, which allows RPi users to use the whole 4 GB virtual
address space provided by the hardware. It also allows users to
fully run vkoverhead on the RPi 4/5, solving the issue reported in [1].
[1] https://github.com/zmike/vkoverhead/issues/14
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-8-mcanal@igalia.com
|
|
Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that takes a mountpoint as an argument. This
function creates a shmem GEM object in the given tmpfs mountpoint.
It will be useful for drivers that have a special mountpoint with
flags enabled.
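A hedged usage sketch from a driver's point of view (`v3d->gemfs` is
an assumed field holding the driver's private mount):
```
struct drm_gem_shmem_object *shmem_obj;

/* Back the BO with the driver's own tmpfs mount when one exists;
 * plain drm_gem_shmem_create() keeps using the default shm_mnt. */
shmem_obj = drm_gem_shmem_create_with_mnt(dev, size, v3d->gemfs);
if (IS_ERR(shmem_obj))
        return ERR_CAST(shmem_obj);
```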
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-7-mcanal@igalia.com
|
|
Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and gives the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.
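A simplified sketch of creating such a private mount (following the
i915 gemfs precedent; `v3d_gemfs_init()` and the `gemfs` field are
assumed names):
```
static int v3d_gemfs_init(struct v3d_dev *v3d)
{
        struct file_system_type *type;
        struct vfsmount *gemfs;

        type = get_fs_type("tmpfs");
        if (!type)
                return -ENODEV;

        /* "huge=within_size" asks shmem to use THP up to the size
         * of the object being allocated. */
        gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name,
                               "huge=within_size");
        if (IS_ERR(gemfs))
                return PTR_ERR(gemfs);

        v3d->gemfs = gemfs;     /* assumed field for the private mount */
        return 0;
}
```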
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-6-mcanal@igalia.com
|
|
For some applications, such as those that use huge pages, we might
want a different mountpoint, for which we pass mount flags that
better match our use case.
Therefore, create a new function `drm_gem_object_init_with_mnt()` that
allows us to define the tmpfs mountpoint where the GEM object will be
created. If this parameter is NULL, we fall back to `shmem_file_setup()`.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-5-mcanal@igalia.com
|
|
If the scheduler initialization fails, GEM initialization must fail as
well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
allocated and return the error value in `v3d_gem_init()`.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-4-mcanal@igalia.com
|
|
We must ensure that the MMU is flushed before we supply more memory to
the binner, otherwise we might end up with invalid MMU accesses by the
GPU.
Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-3-mcanal@igalia.com
|
|
We must first flush the MMU cache and then flush the TLB, not the other
way around. Currently, we can see a race condition between the MMU cache
and the TLB when running multiple rendering processes at the same time.
This is evidenced by MMU errors triggered by the IRQ.
Fix the MMU flush order by flushing the MMU cache and then the TLB.
Also, in order to address the race condition, wait for the MMU cache
flush to finish before starting the TLB flush.
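A simplified sketch of the corrected ordering (register and field
names follow the v3d driver; wait_for() is the driver's polling
helper):
```
static int v3d_mmu_flush_all(struct v3d_dev *v3d)
{
        int ret;

        /* 1) Flush the MMU cache... */
        V3D_WRITE(V3D_MMUC_CONTROL, V3D_MMUC_CONTROL_FLUSH |
                                    V3D_MMUC_CONTROL_ENABLE);

        /* 2) ...and wait for that flush to finish... */
        ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
                         V3D_MMUC_CONTROL_FLUSHING), 100);
        if (ret)
                return ret;

        /* 3) ...before clearing the TLB. */
        V3D_WRITE(V3D_MMU_CTRL, V3D_READ(V3D_MMU_CTRL) |
                                V3D_MMU_CTRL_TLB_CLEAR);

        return wait_for(!(V3D_READ(V3D_MMU_CTRL) &
                          V3D_MMU_CTRL_TLB_CLEARING), 100);
}
```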
Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240923141348.2422499-2-mcanal@igalia.com
|
|
init_test_probes() has been removed since
commit e44e81c5b90f ("kprobes: convert tests to kunit"), so its
leftover declaration is now useless; remove it.
Link: https://lore.kernel.org/all/20240826032552.4016314-1-cuigaosheng1@huawei.com/
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The trace_uprobe->nhit counter is not incremented atomically, so its
value is questionable when a uprobe is hit on multiple CPUs
simultaneously.
Also, doing this shared counter increment across many CPUs causes heavy
cache line bouncing, limiting uprobe/uretprobe performance scaling with
the number of CPUs.
Solve both problems by making it a per-CPU counter.
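A minimal sketch of the per-CPU counter pattern used here (the
variable and helper names are illustrative, not the actual
trace_uprobe code):
```
#include <linux/percpu.h>

/* Each CPU increments only its own slot: no atomics, no bouncing. */
static unsigned long __percpu *nhits;

static int counter_init(void)
{
        nhits = alloc_percpu(unsigned long);
        return nhits ? 0 : -ENOMEM;
}

static void counter_hit(void)
{
        this_cpu_inc(*nhits);
}

/* Readers sum all per-CPU slots to get the total hit count. */
static unsigned long counter_read(void)
{
        unsigned long sum = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                sum += *per_cpu_ptr(nhits, cpu);
        return sum;
}
```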
Link: https://lore.kernel.org/all/20240813203409.3985398-1-andrii@kernel.org/
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
When the driver needs to send new packets to the device, it always
queues the new sk_buffs into an intermediate queue (send_pkt_queue)
and schedules a worker (send_pkt_work) to then queue them into the
virtqueue exposed to the device.
This increases the chance of batching, but also introduces a lot of
latency into the communication. So we can optimize this path by
adding a fast path to be taken when there is no element in the
intermediate queue, there is space available in the virtqueue, and
no other process is sending packets (tx_lock held).
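In rough pseudocode, the fast path looks like this (the helper names
and the virtio_transport_send_skb() signature are assumptions, not
the exact code):
```
static bool virtio_transport_send_skb_fast_path(struct virtio_vsock *vsock,
                                                struct sk_buff *skb)
{
        bool sent = false;

        /* Only bypass the worker when nothing is already queued... */
        if (!skb_queue_empty_lockless(&vsock->send_pkt_queue))
                return false;

        /* ...and no other sender holds tx_lock. */
        if (mutex_trylock(&vsock->tx_lock)) {
                /* Enqueue directly into the TX virtqueue if there is
                 * space; on failure the caller falls back to
                 * send_pkt_work. (Assumed signature.) */
                sent = virtio_transport_send_skb(vsock, skb);
                mutex_unlock(&vsock->tx_lock);
        }

        return sent;
}
```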
The following benchmarks were run to check improvements in latency and
throughput. The test bed is a host with Intel i7-10700KF CPU @ 3.80GHz
and L1 guest running on QEMU/KVM with vhost process and all vCPUs
pinned individually to pCPUs.
- Latency
Tool: Fio version 3.37-56
Mode: pingpong (h-g-h)
Test runs: 50
Runtime-per-test: 50s
Type: SOCK_STREAM
In the following fio benchmark (pingpong mode), the host sends
a payload to the guest and waits for the same payload back.
The fio process is pinned both inside the host and the guest system.
Before: Linux 6.9.8
Payload 64B:
           1st perc.  overall  99th perc.
  Before     12.91     16.78     42.24     us
  After       9.77     13.57     39.17     us

Payload 512B:
           1st perc.  overall  99th perc.
  Before     13.35     17.35     41.52     us
  After      10.25     14.11     39.58     us

Payload 4K:
           1st perc.  overall  99th perc.
  Before     14.71     19.87     41.52     us
  After      10.51     14.96     40.81     us
- Throughput
Tool: iperf-vsock
The size represents the buffer length (-l) to read/write
P represents the number of parallel streams
P=1
          4K     64K    128K
  Before  6.87   29.3   29.5   Gb/s
  After   10.5   39.4   39.9   Gb/s

P=2
          4K     64K    128K
  Before  10.5   32.8   33.2   Gb/s
  After   17.8   47.7   48.5   Gb/s

P=4
          4K     64K    128K
  Before  12.7   33.6   34.2   Gb/s
  After   16.9   48.1   50.5   Gb/s
The performance improvement is related to this optimization: I used
an eBPF kretprobe on virtio_transport_send_skb to verify that each
packet was sent directly to the virtqueue.
Co-developed-by: Marco Pinna <marco.pinn95@gmail.com>
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Message-Id: <20240730-pinna-v4-2-5c9179164db5@outlook.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
|
|
Preliminary patch to introduce an optimization to the
enqueue system.
All the code used to enqueue a packet into the virtqueue
is removed from virtio_transport_send_pkt_work()
and moved to the new virtio_transport_send_skb() function.
Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240730-pinna-v4-1-5c9179164db5@outlook.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This 'struct kobj_type' is not modified. It is only used in
kobject_init_and_add(), which takes a 'const struct kobj_type *ktype'
parameter.
Constifying this structure moves it to a read-only section, which
can increase overall security.
```
[Before]
   text    data   bss    dec   hex  filename
   5974    1008    96   7078  1ba6  drivers/firmware/qemu_fw_cfg.o

[After]
   text    data   bss    dec   hex  filename
   6038     944    96   7078  1ba6  drivers/firmware/qemu_fw_cfg.o
```
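The resulting pattern looks like this (a sketch based on
qemu_fw_cfg.c; member values abridged):
```
/* Now const: the ktype can live in .rodata. */
static const struct kobj_type fw_cfg_sysfs_entry_ktype = {
        .default_groups = fw_cfg_sysfs_entry_groups,
        .sysfs_ops      = &fw_cfg_sysfs_attr_ops,
        .release        = fw_cfg_sysfs_release_entry,
};

/* kobject_init_and_add() already takes a const pointer, so the call
 * site needs no other change: */
err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
                           fw_cfg_sel_ko, "%d", entry->select);
```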
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Message-Id: <20240904011743.2010319-1-lihongbo22@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Currently, when a new MR is set up, the old MR is deleted. MR deletion
takes about 30-40% of the time of MR creation. As deleting the old MR
is not important for the process of setting up the new MR, this
operation can be postponed.
This series adds a workqueue that does MR garbage collection at a later
point. If the MR lock is taken, the handler will back off and
reschedule. The exception is during shutdown: then the handler must
not postpone the work.
Note that this is only a speculative optimization: if some mapping
operation is triggered while the garbage collector handler holds the
lock, that operation will have to wait for the handler to finish.
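A sketch of the deferred-teardown pattern described above (all names
are illustrative, not the exact mlx5_vdpa code):
```
struct mr_resources {                   /* illustrative container */
        struct mutex lock;
        struct workqueue_struct *wq;
        struct delayed_work gc_dwork;
};

#define MR_GC_DELAY msecs_to_jiffies(500)       /* assumed interval */

static void mr_gc_handler(struct work_struct *work)
{
        struct mr_resources *mres =
                container_of(work, struct mr_resources, gc_dwork.work);

        if (!mutex_trylock(&mres->lock)) {
                /* Lock contended: back off and reschedule (except at
                 * shutdown, where the work must run to completion). */
                queue_delayed_work(mres->wq, &mres->gc_dwork, MR_GC_DELAY);
                return;
        }

        destroy_stale_mrs(mres);        /* hypothetical teardown helper */
        mutex_unlock(&mres->lock);
}
```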
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Message-Id: <20240830105838.2666587-9-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
There's currently not a lot happening during the init/destroy of MR
resources, but more will be added in the upcoming patches.
As the MR mutex lock init/destroy has been moved to these new
functions, the lifetime has now shifted away from
mlx5_vdpa_alloc_resources() / mlx5_vdpa_free_resources()
into these new functions. However, the lifetime at the
outer scope remains the same:
mlx5_vdpa_dev_add() / mlx5_vdpa_dev_free()
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Message-Id: <20240830105838.2666587-8-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Now that the mr resources have their own namespace in the
struct, give the lock a clearer name.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20240830105838.2666587-7-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Group all mapping-related resources into their own structure.
Upcoming patches will add more members in this new structure.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20240830105838.2666587-6-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
A followup patch will use this name for something else.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Message-Id: <20240830105838.2666587-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Use the async interface to issue MTT MKEY deletion.
This makes destroy_user_mr() on average 8x faster. The number also
depends on the size of the MR being deleted.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20240830105838.2666587-4-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Use the async interface to issue MTT MKEY creation.
Extra care is taken when allocating the FW input commands, as the
MTT tables have variable sizes depending on the MR.
The indirect MKEY is still created synchronously at the
end as the direct MKEYs need to be filled in.
This makes create_user_mr() 3-5x faster, depending on
the size of the MR.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Message-Id: <20240830105838.2666587-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The virtio-vsock driver is already under VM SOCKETS (AF_VSOCK),
managed principally through the net tree, and VIRTIO AND VHOST
VSOCK DRIVER. However, changes that only affect the virtio part
usually go through Michael's tree, so let's also put the driver in
the VIRTIO CORE section to have its maintainers CC'ed on changes
to the virtio-vsock driver.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240829143757.85844-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
Introduce sysfs entries to provide visibility to the multiple queues
used by the Virtio FS device. This enhancement allows users to query
information about these queues.
Specifically, add two sysfs entries:
1. Queue name: Provides the name of each queue (e.g. hiprio/requests.8).
2. CPU list: Shows the list of CPUs that can process requests for each
queue.
The CPU list feature is inspired by similar functionality in the block
MQ layer, which provides analogous sysfs entries for block devices.
These new sysfs entries will improve observability and aid in debugging
and performance tuning of Virtio FS devices.
Reviewed-by: Idan Zach <izach@nvidia.com>
Reviewed-by: Shai Malin <smalin@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20240825130716.9506-2-mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Introduce a new helper function, virtio_fs_put_locked, to encapsulate
the common pattern of releasing a virtio_fs reference while holding a
lock. The existing virtio_fs_put helper will be used to release a
virtio_fs reference while not holding the lock.
Also add an assertion for the case where the lock is not held when it
should be.
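A simplified sketch of the described split (the reference drop shown
via kobject_put() is an assumption about the internals):
```
static void virtio_fs_put_locked(struct virtio_fs *fs)
{
        /* Complain (via lockdep) if the caller doesn't hold the lock. */
        lockdep_assert_held(&virtio_fs_mutex);

        kobject_put(&fs->kobj);
}

static void virtio_fs_put(struct virtio_fs *fs)
{
        mutex_lock(&virtio_fs_mutex);
        virtio_fs_put_locked(fs);
        mutex_unlock(&virtio_fs_mutex);
}
```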
Reviewed-by: Idan Zach <izach@nvidia.com>
Reviewed-by: Shai Malin <smalin@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20240825130716.9506-1-mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
|