Age | Commit message (Collapse) | Author |
|
The rcuc-starvation output from print_cpu_stall_info() might overflow the
buffer if there is a huge difference in jiffies difference. The situation
might seem improbable, but computers sometimes get very confused about
time, which can result in full-sized integers, and, in this case,
buffer overflow.
Also, the unsigned jiffies difference is printed using %ld, which is
normally for signed integers. This is intentional for debugging purposes,
but it is not obvious from the code.
This commit therefore changes sprintf() to snprintf() and adds a
clarifying comment about intention of %ld format.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 245a62982502 ("rcu: Dump rcuc kthread status for CPUs not reporting quiescent state")
Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
When using "guard(rcu)();" sparse will complain, because even
though it now understands the cleanup attribute, it doesn't
evaluate the calls from it at function exit, and thus doesn't
count the context correctly.
Given that there's a conditional in the resulting code:
static inline void class_rcu_destructor(class_rcu_t *_T)
{
if (_T->lock) {
rcu_read_unlock();
}
}
it seems that even trying to teach sparse to evalulate the
cleanup attribute function it'd still be difficult to really
make it understand the full context here.
Suppress the sparse warning by just releasing the context in
the acquisition part of the function, after all we know it's
safe with the guard, that's the whole point of it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
There is a possibility of buffer overflow in
show_rcu_tasks_trace_gp_kthread() if counters, passed
to sprintf() are huge. Counter numbers, needed for this
are unrealistically high, but buffer overflow is still
possible.
Use snprintf() with buffer size instead of sprintf().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: edf3775f0ad6 ("rcu-tasks: Add count for idle tasks on offline CPUs")
Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
The synchronize_srcu() has been removed by commit("rcu-tasks: Eliminate
deadlocks involving do_exit() and RCU tasks") in rcu_tasks_postscan.
This commit therefore fixes the tasks_rcu_exit_srcu_stall_timer comment.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
Because the Tasks RCU ->rtp_exit_list is initialized at rcu_init()
time while there is only one CPU running with interrupts disabled, it
is not possible for an exiting task to encounter an uninitialized list.
This commit therefore replaces the conditional initialization with
a WARN_ON_ONCE().
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/all/ZdiNXmO3wRvmzPsr@lothringen/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
This is part of a larger series cleaning up the semantics of
should_be_locked and adding assertions around it; if we don't need an
iterator/path anymore, it clearly doesn't need to be locked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In __bch2_read_super(), if kstrdup() fails, it needs to release memory
in sb->holder, fix to call bch2_free_super() in the error path.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"A fix to kselftest harness to prevent infinite loop triggered in an
assert in FIXTURE_TEARDOWN and a fix to a problem seen in being able
to stop subsystem-enable tests when sched events are being traced"
* tag 'linux_kselftest-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN
selftests/ftrace: Limit length in subsystem-enable tests
|
|
Currently normal HugeTLB fault ends up crashing the kernel, as p4dp derived
from p4d_offset() is an invalid address when PGTABLE_LEVEL = 5. A p4d level
entry needs to be allocated when not available while walking the page table
during HugeTLB faults. Let's call p4d_alloc() to allocate such entries when
required instead of current p4d_offset().
Unable to handle kernel paging request at virtual address ffffffff80000000
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000081da9000
[ffffffff80000000] pgd=1000000082cec003, p4d=0000000082c32003, pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 108 Comm: high_addr_hugep Not tainted 6.9.0-rc4 #48
Hardware name: Foundation-v8A (DT)
pstate: 01402005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : huge_pte_alloc+0xd4/0x334
lr : hugetlb_fault+0x1b8/0xc68
sp : ffff8000833bbc20
x29: ffff8000833bbc20 x28: fff000080080cb58 x27: ffff800082a7cc58
x26: 0000000000000000 x25: fff0000800378e40 x24: fff00008008d6c60
x23: 00000000de9dbf07 x22: fff0000800378e40 x21: 0004000000000000
x20: 0004000000000000 x19: ffffffff80000000 x18: 1ffe00010011d7a1
x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000001
x14: 0000000000000000 x13: ffff8000816120d0 x12: ffffffffffffffff
x11: 0000000000000000 x10: fff00008008ebd0c x9 : 0004000000000000
x8 : 0000000000001255 x7 : fff00008003e2000 x6 : 00000000061d54b0
x5 : 0000000000001000 x4 : ffffffff80000000 x3 : 0000000000200000
x2 : 0000000000000004 x1 : 0000000080000000 x0 : 0000000000000000
Call trace:
huge_pte_alloc+0xd4/0x334
hugetlb_fault+0x1b8/0xc68
handle_mm_fault+0x260/0x29c
do_page_fault+0xfc/0x47c
do_translation_fault+0x68/0x74
do_mem_abort+0x44/0x94
el0_da+0x2c/0x9c
el0t_64_sync_handler+0x70/0xc4
el0t_64_sync+0x190/0x194
Code: aa000084 cb010084 b24c2c84 8b130c93 (f9400260)
---[ end trace 0000000000000000 ]---
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Fixes: a6bbf5d4d9d1 ("arm64: mm: Add definitions to support 5 levels of paging")
Reported-by: Dev Jain <dev.jain@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20240415094003.1812018-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The #if condition controlling the rcu_preempt_sleep_check() definition
has a redundant check for CONFIG_PREEMPT_RCU, which is already checked
for by an enclosing #ifndef. This commit therefore removes this redundant
condition from the inner #if.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists.
Therefore, rcu_trc_cmpxchg_need_qs() emulates one using field substitution
and a four-byte cmpxchg(), such that the other three bytes are always
atomically updated to their old values. This works, but results in
false-positive KCSAN failures because as far as KCSAN knows, this
cmpxchg() operation is updating all four bytes.
This commit therefore encloses the cmpxchg() in a data_race() and adds
a single-byte instrument_atomic_read_write(), thus telling KCSAN exactly
what is going on so as to avoid the false positives.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
The FF-A uses the notification pending interrupt to inform the receiver
that it has a pending notification. This is a virtual interrupt and is
used by the following type of receivers:
1. A guest/VM running under a hypervisor.
2. An S-EL1 SP running under a S-EL2 SPMC.
The rules that govern the properties of the NPI are the same as the
rules for the SRI with couple of exceptions. Both SRI and NPI can be
supported simultaneously.
The handling of NPI is also same as the handling of notification for the
self/primary VM with ID 0 except the absence of global notification.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Link: https://lore.kernel.org/r/20240411-ffa_npi_support-v2-3-927a670254e6@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
In preparation to support handling of Notification Pending Interrupt(NPI)
in addition to the existing support for Schedule Receiver Interrupt(SRI),
refactor the code around SRI handling so that NPI support can reuse some
of it. This change shouldn't have any functionality impact. It neither
adds the support for NPIs nor changes any SRI support.
Tested-by: Jens Wiklander <jens.wiklander@linaro.org>
Link: https://lore.kernel.org/r/20240411-ffa_npi_support-v2-2-927a670254e6@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
When the FF-A driver is running inside a guest VM under an hypervisor,
the driver/guest VM doesn't have the permission/capability to request
the creation of notification bitmaps. For a VM, the hypervisor reserves
memory for its VM and hypervisor framework notification bitmaps and the
SPMC reserves memory for its SP and SPMC framework notification bitmaps
before the hypervisor initializes it.
The hypervisor does not initialize a VM if memory cannot be reserved
for all its notification bitmaps. So the creation of all the necessary
bitmaps are already done when the driver initialises and hence it can be
skipped. We rely on FFA_FEATURES(FFA_NOTIFICATION_BITMAP_CREATE) to fail
when running in the guest to handle this in the driver.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Link: https://lore.kernel.org/r/20240411-ffa_npi_support-v2-1-927a670254e6@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
The table of primary plane formats wasn't sorted at all, leading to
applications picking our least desirable formats by defaults.
Sort the primary plane formats according to our order of preference.
Nice side-effect of this change is that it makes IGT's kms_atomic
plane-invalid-params pass because the test picks the first format
which for vmwgfx was DRM_FORMAT_XRGB1555 and uses fb's with odd sizes
which make Pixman, which IGT depends on assert due to the fact that our
16bpp formats aren't 32 bit aligned like Pixman requires all formats
to be.
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Fixes: 36cc79bc9077 ("drm/vmwgfx: Add universal plane support")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.12+
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412025511.78553-6-zack.rusin@broadcom.com
|
|
The conditional was supposed to prevent enabling of a crtc state
without a set primary plane. Accidently it also prevented disabling
crtc state with a set primary plane. Neither is correct.
Fix the conditional and just driver-warn when a crtc state has been
enabled without a primary plane which will help debug broken userspace.
Fixes IGT's kms_atomic_interruptible and kms_atomic_transition tests.
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Fixes: 06ec41909e31 ("drm/vmwgfx: Add and connect CRTC helper functions")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Ian Forbes <ian.forbes@broadcom.com>
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412025511.78553-5-zack.rusin@broadcom.com
|
|
vmwgfx never supported prime import of external buffers. Furthermore the
driver exposes two different objects to userspace: vmw_surface's and
gem buffers but prime import/export only worked with vmw_surfaces.
Because gem buffers are used through the dumb_buffer interface this meant
that the driver created buffers couldn't have been prime exported or
imported.
Fix prime import/export. Makes IGT's kms_prime pass.
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Fixes: 8afa13a0583f ("drm/vmwgfx: Implement DRIVER_GEM")
Cc: <stable@vger.kernel.org> # v6.6+
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412025511.78553-4-zack.rusin@broadcom.com
|
|
Rearrange the instructions so that readers facing a regression within a
stable or longterm series first test its latest release before testing
mainline. This is less scary for some people. It also reduces the chance
that something goes sideways for readers that compile their first
kernel, as mainline can cause slightly more trouble.
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/efd3cb9c68db450091021326bf9c334553df0ec2.1712647788.git.linux@leemhuis.info
|
|
Describe how to build kernels on another system (with and without
cross-compiling), as building locally can be quite painfully on some
slow systems. This is done in an add-on section, as it would make the
step-by-step guide to complicated if this special case would be
described there.
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/288160cb4769e46a3280250ca71da0abc4aa002d.1712647788.git.linux@leemhuis.info
|
|
Rename 'Supplementary tasks' to 'Complementary tasks' while introducing
a section 'Optional tasks: test reverts, patches, or later versions':
the latter is something readers occasionally will have to do after
reporting a bug and thus is best covered here.
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/dacf26a4c48e9e8f04ecbc77e0a74c9b2a6a1103.1712647788.git.linux@leemhuis.info
|
|
Various small improvements and fixes:
* Separate ref links from their target with a space for better
readability.
* Add a proper heading for the note at the end of the step-by-step
guide.
* Use proper 3rd and 4th level headlines in the reference section and
add short intros for the 2nd level headlines that lacked one.
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/f59f0f235a2192ed93899a7338153e4cb71075f0.1712647788.git.linux@leemhuis.info
|
|
Add and fetch all required stable branches ahead of time. This fixes a
bug, as readers that wanted to bisect a regression within a stable or
longterm series otherwise did not have them available at the right time.
This way also matches the flow somewhat better and avoids some "if you
haven't already added it" phrases that otherwise become necessary in
future changes.
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/57dcf312959476abe6151bf3d35eb79e3e9a83d1.1712647788.git.linux@leemhuis.info
|
|
Various small improvements and fixes:
* Use the more modern 'git switch' instead of 'git checkout', which
makes it more obvious what's happening (among others due to the
--discard-changes parameter that is more clear than --force).
* Provide a hint how a mainline version number and one from a stable
series look like.
* When trying to validate the bisection result with a revert, add a
special tag to facilitate the identification.
* Sync version numbers used in various examples for consistency: stick
to 6.0.13, 6.0.15, and 6.1.5.
* Fix a few typos and oddities.
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/85029aa004447b0eeb5043fb014630f2acafacec.1712647788.git.linux@leemhuis.info
|
|
Allow the power-domains property to the PWM_DISP block as on some SoCs
this does need at most one power domain.
Fixes: b09b179bac0a ("dt-bindings: pwm: Convert pwm-mtk-disp.txt to mediatek,pwm-disp.yaml format")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240404081808.92199-1-angelogioacchino.delregno@collabora.com
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
|
|
With 16 channel pwm support, we're registering two instances of pwm_chip
with 8 channels each. We need to update PM functions to use both instances
of pwm_chip during power state transitions.
Introduce struct dwc_pwm_drvdata and use it as driver_data, which will
maintain both instances of pwm_chip along with dwc_pwm_info and allow us
to use them inside suspend/resume handles.
Fixes: ebf2c89eb95e ("pwm: dwc: Add 16 channel support for Intel Elkhart Lake")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240415074051.14681-1-raag.jadav@intel.com
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
|
|
We only pool write combined and uncached allocations because they
require extra overhead on allocation and release.
If we also pool cached NUMA it not only means some extra unnecessary
overhead, but also that under memory pressure it can happen that
pages from the wrong NUMA node enters the pool and are re-used
over and over again.
This can lead to performance reduction after running into memory
pressure.
v2: restructure and cleanup the code a bit from the internal hack to
test this.
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 4482d3c94d7f ("drm/ttm: add NUMA node id to the pool")
CC: stable@vger.kernel.org
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415134821.1919-1-christian.koenig@amd.com
|
|
No swap support -- no swapfiles possible.
Signed-off-by: Alexey Dobriyan (Yandex) <adobriyan@gmail.com>
Link: https://lore.kernel.org/r/2391c7f5-0f83-4188-ae56-4ec7ccbf2576@p183
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now the __GFP_COMP is set only if the higher-order is not 0. However,
__GFP_COMP flag can be set unconditionally because compound page can
not be created in the order-0 case. And this can also simplify the code
a bit (no need to check the order is 0 or not).
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Currently, there are rcu_data structure fields named ->rcu_onl_gp_seq
and ->rcu_ofl_gp_seq that track the rcu_state.gp_flags field at the
time of the corresponding CPU's last online or offline operation,
respectively. However, this information is not particularly useful.
It would be better to instead track the grace period state kept
in rcu_state.gp_state. This would also be consistent with the
initialization in rcu_boot_init_percpu_data(), which is to RCU_GP_CLEANED
(an rcu_state.gp_state value), and also with the diagnostics in
rcu_implicit_dynticks_qs(), whose format is consistent with an integer,
not a bitmask.
This commit therefore makes this change and changes the names to
->rcu_onl_gp_flags and ->rcu_ofl_gp_flags, respectively.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
The rcu_state.n_online_cpus value is only ever updated by CPU-hotplug
operations, which are serialized. However, this value is read locklessly.
This commit therefore marks those reads. While in the area, it also
adds ASSERT_EXCLUSIVE_WRITER() calls just in case parallel CPU hotplug
becomes a thing.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
The rcu_sync structure's ->gp_count field is updated under the protection
of ->rss_lock, but read locklessly, and KCSAN noted the data race.
This commit therefore uses WRITE_ONCE() to do this update to clearly
document its racy nature.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
This commit adds READ_ONCE() to a lockless diagnostic read from
rcu_state.gp_flags to avoid giving the compiler any chance whatsoever
of confusing the diagnostic state printed.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
|
blkdev_dio_unaligned() is called from __blkdev_direct_IO(),
__blkdev_direct_IO_simple(), and __blkdev_direct_IO_async(), and all these
are only called from blkdev_direct_IO().
Move the blkdev_dio_unaligned() call to the common callsite,
blkdev_direct_IO().
Pass those functions the bdev pointer from blkdev_direct_IO(), as it is
non-trivial to look up.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240415122020.1541594-1-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
->account_pages is the number of pages we account against the user
derived from unsigned len, it definitely fits into unsigned, which saves
some space in struct io_notif_data.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19f2687fcb36daa74d86f4a27bfb3d35cffec318.1713185320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We don't need ctx in the hottest path, i.e. registered buffers,
let's get it only when we need it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e7345e268ffaeaf79b4c8f3a5d019d6a87a3d1f1.1713185320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Flip the dec_and_test "if", that makes the function extension easier in
the future.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/43939e2b04dff03bff5d7227c98afedf951227b3.1713185320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A previous consolidation cleanup missed handling the case where the ring
is dying, and __io_cqring_overflow_flush() doesn't flush entries if the
CQ ring is already full. This is fine for the normal CQE overflow
flushing, but if the ring is going away, we need to flush everything,
even if it means simply freeing the overflown entries.
Fixes: 6c948ec44b29 ("io_uring: consolidate overflow flushing")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In the __io_timeout_prep function, the io_timeout list is initialized
twice, removing the meaningless second initialization.
Signed-off-by: Ruyi Zhang <ruyi.zhang@samsung.com>
Link: https://lore.kernel.org/r/20240411055953.2029218-1-ruyi.zhang@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Consolidate __io_cqring_overflow_flush and io_cqring_overflow_kill()
into a single function as it once was, it's easier to work with it this
way.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/986b42c35e76a6be7aa0cdcda0a236a2222da3a7.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Conditional locking is never great, in case of
__io_cqring_overflow_flush(), which is a slow path, it's not justified.
Don't handle IOPOLL separately, always grab uring_lock for overflow
flushing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/162947df299aa12693ac4b305dacedab32ec7976.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is only one caller of io_cqring_overflow_flush(), open code it
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a1fecd56d9dba923ed8d4d159727fa939d3baa2a.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
c1edbf5f081be ("io_uring: flag SQPOLL busy condition to userspace")
added an extra overflowed CQE flush in the SQPOLL submission path due to
backpressure, which was later removed. Remove the flush and let
io_cqring_wait() / iopoll handle it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2a83b0724ca6ca9d16c7d79a51b77c81876b2e39.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are no users of io_req_cqe_overflow() apart from io_uring.c, make
it static.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f4295eb2f9eb98d5db38c0578f57f0b86bfe0d8c.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We're exporting some io_uring bits to networking, e.g. for implementing
a net callback for io_uring cmds, but we don't want to expose more than
needed. Add a separate header for networking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20240409210554.1878789-1-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We can set MSG_ZEROCOPY at the preparation step, do it so we don't have
to care about it later in the issue callback.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c2c22aaa577624977f045979a6db2b9fb2e5648c.1712534031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_notif_complete_tw_ext() can be removed and combined with
io_notif_complete_tw to make it simpler without sacrificing
anything.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/025a124a5e20e2474a57e2f04f16c422eb83063c.1712534031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Splitting io_tx_ubuf_callback_ext from io_tx_ubuf_callback is a pre
mature optimisation that doesn't give us much. Merge the functions into
one and reclaim some simplicity back.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d44d68f6f7add33a0dcf0b7fd7b73c2dc543604f.1712534031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The only caller doesn't handle the return value of io_put_kbuf_comp(), so
change its return type into void.
Also follow Jens's suggestion to rename it as io_put_kbuf_drop().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240407132759.4056167-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_req_put_rsrc_locked() is a weird shim function around
io_req_put_rsrc(). All calls to io_req_put_rsrc() require holding
->uring_lock, so we can just use it directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a195bc78ac3d2c6fbaea72976e982fe51e50ecdd.1712331455.git.asml.silence@gmail.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_req_complete_post() was a sole user of ->locked_free_list, but
since we just gutted the function, the cache is not used anymore and
can be removed.
->locked_free_list served as an asynhronous counterpart of the main
request (i.e. struct io_kiocb) cache for all unlocked cases like io-wq.
Now they're all forced to be completed into the main cache directly,
off of the normal completion path or via io_free_req().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7bffccd213e370abd4de480e739d8b08ab6c1326.1712331455.git.asml.silence@gmail.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|