Age | Commit message (Collapse) | Author |
|
Initialise ki_complete during request prep stage, we'll depend on it not
being reset during issue in the following patch.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/817624086bd5f0448b08c80623399919fda82f34.1739919038.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We want to avoid checking ->ki_complete directly in the io_uring
completion path. Fortunately we have only two callback the selection
of which depend on the ring constant flags, i.e. IOPOLL, so use that
to infer the function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4eb4bdab8cbcf5bc87083f7047edc81e920ab83c.1739919038.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
At the moment we can't sanely handle queuing an async request from a
multishot context, so disable them. It shouldn't matter as pollable
files / socekts don't normally do async.
Patching it in __io_read() is not the cleanest way, but it's simpler
than other options, so let's fix it there and clean up on top.
Cc: stable@vger.kernel.org
Reported-by: chase xd <sl1589472800@gmail.com>
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7d51732c125159d17db4fe16f51ec41b936973f8.1739919038.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
IO_NODE_ALLOC_CACHE_MAX has been unused since commit fbbb8e991d86
("io_uring/rsrc: get rid of io_rsrc_node allocation cache") removed the
rsrc_node_cache.
IO_RSRC_TAG_TABLE_SHIFT and IO_RSRC_TAG_TABLE_MASK have been unused
since commit 7029acd8a950 ("io_uring/rsrc: get rid of per-ring
io_rsrc_node list") removed the separate tag table for registered nodes.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/20250219033444.2020136-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is obviously not that important, but when changes are synced back
from the kernel to liburing, the codespell CI ends up erroring because
of this misspelling. Let's just correct it and avoid this biting us
again on an import.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_uring_create() computes ctx->lockless_cq as:
ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL)
So use it to simplify that expression in io_req_complete_post().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/20250212005119.3433005-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
8e5b3b89ecaf ("io_uring: remove struct io_tw_state::locked") removed the
only field of io_tw_state but kept it as a task work callback argument
to "forc[e] users not to invoke them carelessly out of a wrong context".
Passing the struct io_tw_state * argument adds a few instructions to all
callers that can't inline the functions and see the argument is unused.
So pass struct io_tw_state by value instead. Since it's a 0-sized value,
it can be passed without any instructions needed to initialize it.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250217022511.1150145-2-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation for changing how io_tw_state is passed, introduce a type
alias io_tw_token_t for struct io_tw_state *. This allows for changing
the representation in one place, without having to update the many
functions that just forward their struct io_tw_state * argument.
Also add a comment to struct io_tw_state to explain its purpose.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250217022511.1150145-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Most callers of io_put_rsrc_node() already check that node is non-NULL:
- io_rsrc_data_free()
- io_sqe_buffer_register()
- io_reset_rsrc_node()
- io_req_put_rsrc_nodes() (REQ_F_BUF_NODE indicates non-NULL buf_node)
Only io_splice_cleanup() can call io_put_rsrc_node() with a NULL node.
So move the NULL check there.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250216225900.1075446-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_init_req_drain() takes a struct io_kiocb *req argument but only uses
it to get struct io_ring_ctx *ctx. The caller already knows the ctx, so
pass it instead.
Drop "req" from the function name since it operates on the ctx rather
than a specific req.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250212164807.3681036-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace the 2 instances of REQ_F_LINK | REQ_F_HARDLINK with
the more commonly used IO_REQ_LINK_FLAGS.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250211202002.3316324-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Current recv bundles are only supported for multishot receives, and
additionally they also always post at least 2 CQEs if more data is
available than what a buffer will hold. This happens because the initial
bundle recv will do a single buffer, and then do the rest of what is in
the socket as a followup receive. As shown in a test program, if 1k
buffers are available and 32k is available to receive in the socket,
you'd get the following completions:
bundle=1, mshot=0
cqe res 1024
cqe res 1024
[...]
cqe res 1024
bundle=1, mshot=1
cqe res 1024
cqe res 31744
where bundle=1 && mshot=0 will post 32 1k completions, and bundle=1 &&
mshot=1 will post a 1k completion and then a 31k completion.
To support bundle recv without multishot, it's possible to simply retry
the recv immediately and post a single completion, rather than split it
into two completions. With the below patch, the same test looks as
follows:
bundle=1, mshot=0
cqe res 32768
bundle=1, mshot=1
cqe res 32768
where mshot=0 works fine for bundles, and both of them post just a
single 32k completion rather than split it into separate completions.
Posting fewer completions is always a nice win, and not needing
multishot for proper bundle efficiency is nice for cases that can't
necessarily use multishot.
Reported-by: Norman Maurer <norman_maurer@apple.com>
Link: https://lore.kernel.org/r/184f9f92-a682-4205-a15d-89e18f664502@kernel.dk
Fixes: 2f9c9515bdfd ("io_uring/net: support bundles for recv")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't implement our own loop rolling and checking, just use the generic
helper to find and cancel requests.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't implement our own loop rolling and checking, just use the generic
helper to find and cancel requests.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Any opcode that is cancelable ends up defining its own cancel helper
for finding and canceling a specific request. Add a generic helper that
can be used for this purpose.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the generic helper for cancelations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the generic helper for cancelations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Any opcode that is cancelable ends up defining its own remove all
helper, which iterates the pending list and cancels matches. Add a
generic helper for it, which can be used by them.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__io_put_kbufs() and other helper functions are too large to be inlined,
compilers would normally refuse to do so. Uninline it and move together
with io_kbuf_commit into kbuf.c.
io_kbuf_commitSigned-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3dade7f55ad590e811aff83b1ec55c9c04e17b2b.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_kbuf_drop() is only used for legacy provided buffers, and so
__io_put_kbuf_list() is never called for REQ_F_BUFFER_RING. Remove the
dead branch out of __io_put_kbuf_list(), rename it into
io_kbuf_drop_legacy() and use it directly instead of io_kbuf_drop().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c8cc73e2272f09a86ecbdad9ebdd8304f8e583c0.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__io_put_kbuf() is a trivial wrapper, open code it into
__io_put_kbufs().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9dc17380272b48d56c95992c6f9eaacd5546e1d3.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove all struct io_buffer caches. It makes it a fair bit simpler.
Apart from from killing a bunch of lines and juggling between lists,
__io_put_kbuf_list() doesn't need ->completion_lock locking now.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/18287217466ee2576ea0b1e72daccf7b22c7e856.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As a preparation step remove an optimisation from __io_put_kbuf() trying
to use the locked cache. With that __io_put_kbuf_list() is only used
with ->io_buffers_comp, and we remove the explicit list argument.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1b7f1394ec4afc7f96b35a61f5992e27c49fd067.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move the burden of locking out of the caller into io_kbuf_drop(), that
will help with furher refactoring.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/530f0cf1f06963029399f819a9a58b1a34bebef3.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the kmem cache used by legacy provided buffers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8195c207d8524d94e972c0c82de99282289f7f5c.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Legacy provided buffers are slow and discouraged in favour of the ring
variant. Remove the bulk allocation to keep it simpler as we don't care
about performance.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a064d70370e590efed8076e9501ae4cfc20fe0ca.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Do all struct io_uring_params validation early on before allocating the
context. That makes initialisation easier, especially by having fewer
places where we need to care about partial de-initialisation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/363ba90b83ff78eefdc88b60e1b2c4a39d182247.1738344646.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
alloc_workqueue() can fail even during init in io_uring_init(), check
the result and panic if anything went wrong.
Fixes: 73eaa2b583493 ("io_uring: use private workqueue for exit work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3a046063902f888f66151f89fa42f84063b9727b.1738343083.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a function that frees all ring caches since we already have two
spots repeating the same thing and it's easy to miss it and change only
one of them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b6b0125677c58bdff99eda91ab320137406e8562.1738342562.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The only caller has already determined this pointer, so let's skip
the redundant dereference.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-7-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Previously, the `hash` variable was initialized with `-1` and only
updated by io_get_next_work() if the current work was hashed. Commit
60cf46ae6054 ("io-wq: hash dependent work") changed this to always
call io_get_work_hash() even if the work was not hashed. This caused
the `hash != -1U` check to always be true, adding some overhead for
the `hash->wait` code.
This patch fixes the regression by checking the `IO_WQ_WORK_HASHED`
flag.
Perf diff for a flood of `IORING_OP_NOP` with `IOSQE_ASYNC`:
38.55% -1.57% [kernel.kallsyms] [k] queued_spin_lock_slowpath
6.86% -0.72% [kernel.kallsyms] [k] io_worker_handle_work
0.10% +0.67% [kernel.kallsyms] [k] put_prev_entity
1.96% +0.59% [kernel.kallsyms] [k] io_nop_prep
3.31% -0.51% [kernel.kallsyms] [k] try_to_wake_up
7.18% -0.47% [kernel.kallsyms] [k] io_wq_free_work
Fixes: 60cf46ae6054 ("io-wq: hash dependent work")
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-6-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This eliminates several redundant atomic reads and therefore reduces
the duration the surrounding spinlocks are held.
In several io_uring benchmarks, this reduced the CPU time spent in
queued_spin_lock_slowpath() considerably:
io_uring benchmark with a flood of `IORING_OP_NOP` and `IOSQE_ASYNC`:
38.86% -1.49% [kernel.kallsyms] [k] queued_spin_lock_slowpath
6.75% +0.36% [kernel.kallsyms] [k] io_worker_handle_work
2.60% +0.19% [kernel.kallsyms] [k] io_nop
3.92% +0.18% [kernel.kallsyms] [k] io_req_task_complete
6.34% -0.18% [kernel.kallsyms] [k] io_wq_submit_work
HTTP server, static file:
42.79% -2.77% [kernel.kallsyms] [k] queued_spin_lock_slowpath
2.08% +0.23% [kernel.kallsyms] [k] io_wq_submit_work
1.19% +0.20% [kernel.kallsyms] [k] amd_iommu_iotlb_sync_map
1.46% +0.15% [kernel.kallsyms] [k] ep_poll_callback
1.80% +0.15% [kernel.kallsyms] [k] io_worker_handle_work
HTTP server, PHP:
35.03% -1.80% [kernel.kallsyms] [k] queued_spin_lock_slowpath
0.84% +0.21% [kernel.kallsyms] [k] amd_iommu_iotlb_sync_map
1.39% +0.12% [kernel.kallsyms] [k] _copy_to_iter
0.21% +0.10% [kernel.kallsyms] [k] update_sd_lb_stats
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-5-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Have separate linked lists for bounded and unbounded workers. This
way, io_acct_activate_free_worker() sees only workers relevant to it
and doesn't need to skip irrelevant ones. This speeds up the
linked list traversal (under acct->lock).
The `io_wq.lock` field is moved to `io_wq_acct.workers_lock`. It did
not actually protect "access to elements below", that is, not all of
them; it only protected access to the worker lists. By having two
locks instead of one, contention on this lock is reduced.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-4-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This replaces the `IO_WORKER_F_BOUND` flag. All code that checks this
flag is not interested in knowing whether this is a "bound" worker;
all it does with this flag is determine the `io_wq_acct` pointer. At
the cost of an extra pointer field, we can eliminate some fragile
pointer arithmetic. In turn, the `create_index` and `index` fields
are not needed anymore.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-3-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of calling io_work_get_acct() again, pass acct to
io_wq_insert_work() and io_wq_remove_pending().
This atomic access in io_work_get_acct() was done under the
`acct->lock`, and optimizing it away reduces lock contention a bit.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-2-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix annoying logs when building tools in parallel
- Fix the Debian linux-headers package build again
- Fix the target triple detection for userspace programs on Clang
* tag 'kbuild-fixes-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
modpost: Fix a few typos in a comment
kbuild: userprogs: fix bitsize and target detection on clang
kbuild: fix linux-headers package build when $(CC) cannot link userspace
tools: fix annoying "mkdir -p ..." logs when building tools in parallel
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core api addition from Greg KH:
"Here is a driver core new api for 6.14-rc3 that is being added to
allow platform devices from stop being abused.
It adds a new 'faux_device' structure and bus and api to allow almost
a straight or simpler conversion from platform devices that were not
really a platform device. It also comes with a binding for rust, with
an example driver in rust showing how it's used.
I'm adding this now so that the patches that convert the different
drivers and subsystems can all start flowing into linux-next now
through their different development trees, in time for 6.15-rc1.
We have a number that are already reviewed and tested, but adding
those conversions now doesn't seem right. For now, no one is using
this, and it passes all build tests from 0-day and linux-next, so all
should be good"
* tag 'driver-core-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
rust/kernel: Add faux device bindings
driver core: add a faux bus for use when a simple device/bus is needed
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for some reported problems.
Nothing major, just:
- sc16is7xx irq check fix
- 8250 fifo underflow fix
- serial_port and 8250 iotype fixes
Most of these have been in linux-next already, and all have passed
0-day testing"
* tag 'tty-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250: Fix fifo underflow on flush
serial: 8250_pnp: Remove unneeded ->iotype assignment
serial: 8250_platform: Remove unneeded ->iotype assignment
serial: 8250_of: Remove unneeded ->iotype assignment
serial: port: Make ->iotype validation global in __uart_read_properties()
serial: port: Always update ->iotype in __uart_read_properties()
serial: port: Assign ->iotype correctly when ->iobase is set
serial: sc16is7xx: Fix IRQ number check behavior
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes, and new device ids, for
6.14-rc3. Lots of tiny stuff for reported problems, including:
- new device ids and quirks
- usb hub crash fix found by syzbot
- dwc2 driver fix
- dwc3 driver fixes
- uvc gadget driver fix
- cdc-acm driver fixes for a variety of different issues
- other tiny bugfixes
Almost all of these have been in linux-next this week, and all have
passed 0-day testing"
* tag 'usb-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
usb: typec: tcpm: PSSourceOffTimer timeout in PR_Swap enters ERROR_RECOVERY
usb: roles: set switch registered flag early on
usb: gadget: uvc: Fix unstarted kthread worker
USB: quirks: add USB_QUIRK_NO_LPM quirk for Teclast dist
usb: gadget: core: flush gadget workqueue after device removal
USB: gadget: f_midi: f_midi_complete to call queue_work
usb: core: fix pipe creation for get_bMaxPacketSize0
usb: dwc3: Fix timeout issue during controller enter/exit from halt state
USB: Add USB_QUIRK_NO_LPM quirk for sony xperia xz1 smartphone
USB: cdc-acm: Fill in Renesas R-Car D3 USB Download mode quirk
usb: cdc-acm: Fix handling of oversized fragments
usb: cdc-acm: Check control transfer buffer size before access
usb: xhci: Restore xhci_pci support for Renesas HCs
USB: pci-quirks: Fix HCCPARAMS register error for LS7A EHCI
USB: serial: option: drop MeiG Smart defines
USB: serial: option: fix Telit Cinterion FN990A name
USB: serial: option: add Telit Cinterion FN990B compositions
USB: serial: option: add MeiG Smart SLM828
usb: gadget: f_midi: fix MIDI Streaming descriptor lengths
usb: dwc2: gadget: remove of_node reference upon udc_stop
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq Kconfig cleanup from Borislav Petkov:
- Remove an unused config item GENERIC_PENDING_IRQ_CHIPFLAGS
* tag 'irq_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Remove unused CONFIG_GENERIC_PENDING_IRQ_CHIPFLAGS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Borislav Petkov:
- Explicitly clear DEBUGCTL.LBR to prevent LBRs continuing being
enabled after handoff to the OS
- Check CPUID(0x23) leaf and subleafs presence properly
- Remove the PEBS-via-PT feature from being supported on hybrid systems
- Fix perf record/top default commands on systems without a raw PMU
registered
* tag 'perf_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Ensure LBRs are disabled when a CPU is starting
perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF
perf/x86/intel: Clean up PEBS-via-PT on hybrid
perf/x86/rapl: Fix the error checking order
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
- Clarify what happens when a task is woken up from the wake queue and
make clear its removal from that queue is atomic
* tag 'sched_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Clarify wake_up_q()'s write to task->wake_q.next
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Borislav Petkov:
- Move a warning about a lld.ld breakage into the verbose setting as
said breakage has been fixed in the meantime
- Teach objtool to ignore dangling jump table entries added by Clang
* tag 'objtool_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Move dodgy linker warn to verbose
objtool: Ignore dangling jump table entries
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Large set of fixes for vector handling, especially in the
interactions between host and guest state.
This fixes a number of bugs affecting actual deployments, and
greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland
for dealing with this thankless task.
- Fix an ugly race between vcpu and vgic creation/init, resulting in
unexpected behaviours
- Fix use of kernel VAs at EL2 when emulating timers with nVHE
- Small set of pKVM improvements and cleanups
x86:
- Fix broken SNP support with KVM module built-in, ensuring the PSP
module is initialized before KVM even when the module
infrastructure cannot be used to order initcalls
- Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being
emulated by KVM to fix a NULL pointer dereference
- Enter guest mode (L2) from KVM's perspective before initializing
the vCPU's nested NPT MMU so that the MMU is properly tagged for
L2, not L1
- Load the guest's DR6 outside of the innermost .vcpu_run() loop, as
the guest's value may be stale if a VM-Exit is handled in the
fastpath"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
x86/sev: Fix broken SNP support with KVM module built-in
KVM: SVM: Ensure PSP module is initialized if KVM module is built-in
crypto: ccp: Add external API interface for PSP module initialization
KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic()
KVM: arm64: timer: Drop warning on failed interrupt signalling
KVM: arm64: Fix alignment of kvm_hyp_memcache allocations
KVM: arm64: Convert timer offset VA when accessed in HYP code
KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp()
KVM: arm64: Eagerly switch ZCR_EL{1,2}
KVM: arm64: Mark some header functions as inline
KVM: arm64: Refactor exit handlers
KVM: arm64: Refactor CPTR trap deactivation
KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN
KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN
KVM: arm64: Remove host FPSIMD saving for non-protected KVM
KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state
KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop
KVM: nSVM: Enter guest mode before initializing nested NPT MMU
KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC
KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
"Fix for o32 ptrace/get_syscall_info"
* tag 'mips-fixes_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: fix mips_get_syscall_arg() for o32
MIPS: Export syscall stack arguments properly for remote use
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Add bindings for QCom QCS8300 clocks, QCom SAR2130P qfprom, and
powertip,{st7272|hx8238a} displays
- Fix compatible for TI am62a7 dss
- Add a kunit test for __of_address_resource_bounds()
* tag 'devicetree-fixes-for-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: display: Add powertip,{st7272|hx8238a} as DT Schema description
dt-bindings: nvmem: qcom,qfprom: Add SAR2130P compatible
dt-bindings: display: ti: Fix compatible for am62a7 dss
of: address: Add kunit test for __of_address_resource_bounds()
dt-bindings: clock: qcom: Add QCS8300 video clock controller
dt-bindings: clock: qcom: Add CAMCC clocks for QCS8300
dt-bindings: clock: qcom: Add GPU clocks for QCS8300
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML fixes from Richard Weinberger:
- Align signal stack correctly
- Convert to raw spinlocks where needed (irq and virtio)
- FPU related fixes
* tag 'uml-for-linus-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: convert irq_lock to raw spinlock
um: virtio_uml: use raw spinlock
um: virt-pci: don't use kmalloc()
um: fix execve stub execution on old host OSs
um: properly align signal stack on x86_64
um: avoid copying FP state from init_task
um: add back support for FXSAVE registers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace ring buffer fixes from Steven Rostedt:
- Enable resize on mmap() error
When a process mmaps a ring buffer, its size is locked and resizing
is disabled. But if the user passes in a wrong parameter, the mmap()
can fail after the resize was disabled and the mmap() exits with
error without reenabling the ring buffer resize. This prevents the
ring buffer from ever being resized after that. Reenable resizing of
the ring buffer on mmap() error.
- Have resizing return proper error and not always -ENOMEM
If the ring buffer is mmapped by one task and another task tries to
resize the buffer it will error with -ENOMEM. This is confusing to
the user as there may be plenty of memory available. Have it return
the error that actually happens (in this case -EBUSY) where the user
can understand why the resize failed.
- Test the sub-buffer array to validate persistent memory buffer
On boot up, the initialization of the persistent memory buffer will
do a validation check to see if the content of the data is valid, and
if so, it will use the memory as is, otherwise it re-initializes it.
There's meta data in this persistent memory that keeps track of which
sub-buffer is the reader page and an array that states the order of
the sub-buffers. The values in this array are indexes into the
sub-buffers. The validator checks to make sure that all the entries
in the array are within the sub-buffer list index, but it does not
check for duplications.
While working on this code, the array got corrupted and had
duplicates, where not all the sub-buffers were accounted for. This
passed the validator as all entries were valid, but the link list was
incorrect and could have caused a crash. The corruption only produced
incorrect data, but it could have been more severe. To fix this,
create a bitmask that covers all the sub-buffer indexes and set it to
all zeros. While iterating the array checking the values of the array
content, have it set a bit corresponding to the index in the array.
If the bit was already set, then it is a duplicate and mark the
buffer as invalid and reset it.
- Prevent mmap()ing persistent ring buffer
The persistent ring buffer uses vmap() to map the persistent memory.
Currently, the mmap() logic only uses virt_to_page() to get the page
from the ring buffer memory and use that to map to user space. This
works because a normal ring buffer uses alloc_page() to allocate its
memory. But because the persistent ring buffer use vmap() it causes a
kernel crash.
Fixing this to work with vmap() is not hard, but since mmap() on
persistent memory buffers never worked, just have the mmap() return
-ENODEV (what was returned before mmap() for persistent memory ring
buffers, as they never supported mmap. Normal buffers will still
allow mmap(). Implementing mmap() for persistent memory ring buffers
can wait till the next merge window.
- Fix polling on persistent ring buffers
There's a "buffer_percent" option (default set to 50), that is used
to have reads of the ring buffer binary data block until the buffer
fills to that percentage. The field "pages_touched" is incremented
every time a new sub-buffer has content added to it. This field is
used in the calculations to determine the amount of content is in the
buffer and if it exceeds the "buffer_percent" then it will wake the
task polling on the buffer.
As persistent ring buffers can be created by the content from a
previous boot, the "pages_touched" field was not updated. This means
that if a task were to poll on the persistent buffer, it would block
even if the buffer was completely full. It would block even if the
"buffer_percent" was zero, because with "pages_touched" as zero, it
would be calculated as the buffer having no content. Update
pages_touched when initializing the persistent ring buffer from a
previous boot.
* tag 'trace-ring-buffer-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Update pages_touched to reflect persistent buffer content
tracing: Do not allow mmap() of persistent ring buffer
ring-buffer: Validate the persistent meta data subbuf array
tracing: Have the error of __tracing_resize_ring_buffer() passed to user
ring-buffer: Unlock resize on mmap error
|
|
The pages_touched field represents the number of subbuffers in the ring
buffer that have content that can be read. This is used in accounting of
"dirty_pages" and "buffer_percent" to allow the user to wait for the
buffer to be filled to a certain amount before it reads the buffer in
blocking mode.
The persistent buffer never updated this value so it was set to zero, and
this accounting would take it as it had no content. This would cause user
space to wait for content even though there's enough content in the ring
buffer that satisfies the buffer_percent.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/20250214123512.0631436e@gandalf.local.home
Fixes: 5f3b6e839f3ce ("ring-buffer: Validate boot range memory events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|