Age | Commit message (Collapse) | Author |
|
subpage_calc_start_bit()
Inside the macro, subpage_calc_start_bit(), we need to calculate the
offset to the beginning of the folio.
But we're using offset_in_page(), on systems with 4K page size and 4K fs
block size, this means we will always return offset 0 for a large folio,
causing all kinds of errors.
Fix it by using offset_in_folio() instead.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The panthor_gpu_coherency_init() call has been moved around, but the
error path hasn't been adjusted accordingly. Make sure we undo what
has been done before this call in case of failure.
Fixes: 7d5a3b22f5b5 ("drm/panthor: Call panthor_gpu_coherency_init() after PM resume()")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/4da470aa-4f84-460e-aff8-dabc8cc4da15@stanley.mountain/T/#t
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://lore.kernel.org/r/20250414130120.581274-1-boris.brezillon@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
|
|
The blamed commit exposes a possible issue with flow_offload_teardown():
We might remove the offload bit of a conntrack entry that has been
offloaded again.
1. conntrack entry c1 is offloaded via flow f1 (f1->ct == c1).
2. f1 times out and is pushed back to slowpath, c1 offload bit is
removed. Due to bug, f1 is not unlinked from rhashtable right away.
3. a new packet arrives for the flow and re-offload is triggered, i.e.
f2->ct == c1. This is because lookup in flowtable skip entries with
teardown bit set.
4. Next flowtable gc cycle finds f1 again
5. flow_offload_teardown() is called again for f1 and c1 offload bit is
removed again, even though we have f2 referencing the same entry.
This is harmless, but clearly not correct.
Fix the bug that exposes this: set 'teardown = true' to have the gc
callback unlink the flowtable entry from the table right away instead of
the unintentional defer to the next round.
Also prevent flow_offload_teardown() from fixing up the ct state more than
once: We could also be called from the data path or a notifier, not only
from the flowtable gc callback.
NF_FLOW_TEARDOWN can never be unset, so we can use it as synchronization
point: if we observe did not see a 0 -> 1 transition, then another CPU
is already doing the ct state fixups for us.
Fixes: 03428ca5cee9 ("netfilter: conntrack: rework offload nf_conn timeout extension logic")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fix and improve documentation related to landlock_restrict_self(2)'s
flags. Update the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF
documentation according to the current semantic.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Fix, deduplicate, and improve rendering of landlock_restrict_self(2)'s
flags documentation.
The flags are now rendered like the syscall's parameters and
description.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Move and fix the flags documentation, and improve formatting.
It makes more sense and it eases maintenance to document syscall flags
in landlock.h, where they are defined. This is already the case for
landlock_restrict_self(2)'s flags.
The flags are now rendered like the syscall's parameters and
description.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Clean up unused platform check macros from compat i915_drv.h. Display no
longer uses any of the IS_*() platform checks. The remaining users are
part of the soc/ code. Note that in a comment.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/2f09b3c60223d9426049a28d3d06a3ec2c6ec348.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Nobody uses the irq chip data. Stop setting it, and as a bonus get rid
of another struct drm_i915_private * reference.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/d75ec986093c912de67a42782aa5a49357a9f8e5.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Prefer display->platform.dgfx based platform detection over the old
IS_DGFX() macro.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/99de7f8f26156afbddcdac850088e6a96d322c55.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Prefer display->platform based platform detection over the old IS_*()
macros.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/02659f1144180f328167734f7e31499833749c8d.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Prefer display->platform based platform detection over the old IS_*()
macros.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/7a69d2ffa15306da899b98e0d6af09b4df1b7ec3.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Prefer display->platform based platform detection over the old IS_*()
macros.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/36166cd0cfdb88df4c0322c4edea69fad5ad7177.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Prefer display->platform based platform detection over the old IS_*()
macros.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/a5cde717001eb2843344beb21ca8907ab2e43d4f.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Prefer display->platform based platform detection over the old IS_*()
macros.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/eda2b6cd285ec76d57d91ea3fe33158852aaec22.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Prefer display->platform based platform detection over the old IS_*()
macros.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/83980c1ae53157ef5d65d7ce99b294889622faa8.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert intel_frontbuffer.[ch] to struct intel_display.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/ef0860583b7d6ad141959f84c25657e0c102d6d2.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert intel_sprite_uapi.c to struct intel_display.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/d4f71c2976a1a28b4e74c2fc1097090fe7f78743.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert intel_modeset_verify.[ch] to struct intel_display.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/b01a3ef3dbb2ffdaa6b5e9ebec14f91efcca3049.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert intel_modeset_setup.[ch] to struct intel_display.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/21d51387a36f027313a0687d09a14586eb8f71a6.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
intel_display
Going forward, struct intel_display is the main display device data
pointer. Convert intel_fbdev.[ch] and as much as possible of
intel_fbdev_fb.[ch] to struct intel_display.
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Link: https://lore.kernel.org/r/49651754f3716041f97984e47c15d331851870a5.1744222449.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Return -ENOMEM if udl_alloc_urb_list() fails. Don't return success.
Fixes: fb10144ba426 ("drm/udl: Support adapters without firmware descriptor")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/Z_-P4N4_U-xTC9-O@stanley.mountain
|
|
Currently bdex_statx is only called from the very high-level
vfs_statx_path function, and thus bypassing it for in-kernel calls
to vfs_getattr or vfs_getattr_nosec.
This breaks querying the block ѕize of the underlying device in the
loop driver and also is a pitfall for any other new kernel caller.
Move the call into the lowest level helper to ensure all callers get
the right results.
Fixes: 2d985f8c6b91 ("vfs: support STATX_DIOALIGN on block devices")
Fixes: f4774e92aab8 ("loop: take the file system minimum dio alignment into account")
Reported-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250417064042.712140-1-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
GCC 15's new -Wunterminated-string-initialization notices that the
character lookup tables "fscache_cache_states" and "fscache_cookie_states"
(which are not used as a C-String) need to be marked as "nonstring":
fs/netfs/fscache_cache.c:375:67: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (6 chars into 5 available) [-Wunterminated-string-initialization]
375 | static const char fscache_cache_states[NR__FSCACHE_CACHE_STATE] = "-PAEW";
| ^~~~~~~
fs/netfs/fscache_cookie.c:32:69: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (11 chars into 10 available) [-Wunterminated-string-initialization]
32 | static const char fscache_cookie_states[FSCACHE_COOKIE_STATE__NR] = "-LCAIFUWRD";
| ^~~~~~~~~~~~
Annotate the arrays.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/20250416221654.work.028-kees@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Commit 3f66425a4fc8 ("cpufreq: Enable COMPILE_TEST on Arm drivers")
enabled compile testing of most Arm CPUFreq drivers but left the
existing default values unchanged so that many drivers are enabled by
default whenever COMPILE_TEST is selected.
This specifically results in the S3C64XX CPUFreq driver being enabled
and initialised during boot of non-S3C64XX platforms with the following
error logged:
cpufreq: Unable to obtain ARMCLK: -2
Commit d4f610a9bafd ("cpufreq: Do not enable by default during compile
testing") recently fixed most of the default values, but two entries
were missed and two could use a more specific default condition.
Fix the default values for drivers that can be compile tested and that
should be enabled by default when not compile testing.
Fixes: 3f66425a4fc8 ("cpufreq: Enable COMPILE_TEST on Arm drivers")
Cc: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Fix an off-by-one error when setting the vblanking start in
<VBLKSTR>. Commit d6460bd52c27 ("drm/mgag200: Add dedicated
variables for blanking fields") switched the value from
crtc_vdisplay to crtc_vblank_start, which DRM helpers copy
from the former. The commit missed to subtract one though.
Reported-by: Wakko Warner <wakko@animx.eu.org>
Closes: https://lore.kernel.org/dri-devel/CAMwc25rKPKooaSp85zDq2eh-9q4UPZD=RqSDBRp1fAagDnmRmA@mail.gmail.com/
Reported-by: Сергей <afmerlord@gmail.com>
Closes: https://lore.kernel.org/all/5b193b75-40b1-4342-a16a-ae9fc62f245a@gmail.com/
Closes: https://bbs.archlinux.org/viewtopic.php?id=303819
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: d6460bd52c27 ("drm/mgag200: Add dedicated variables for blanking fields")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Tested-by: Wakko Warner <wakko@animx.eu.org>
Link: https://lore.kernel.org/r/20250416083847.51764-1-tzimmermann@suse.de
|
|
Avoid an edge case where epoll_wait arms a timer and calls schedule()
even if the timer will expire immediately.
For example: if the user has specified an epoll busy poll usecs which is
equal or larger than the epoll_wait/epoll_pwait2 timeout, it is
unnecessary to call schedule_hrtimeout_range; the busy poll usecs have
consumed the entire timeout duration so it is unnecessary to induce
scheduling latency by calling schedule() (via schedule_hrtimeout_range).
This can be measured using a simple bpftrace script:
tracepoint:sched:sched_switch
/ args->prev_pid == $1 /
{
print(kstack());
print(ustack());
}
Before this patch is applied:
Testing an epoll_wait app with busy poll usecs set to 1000, and
epoll_wait timeout set to 1ms using the script above shows:
__traceiter_sched_switch+69
__schedule+1495
schedule+32
schedule_hrtimeout_range+159
do_epoll_wait+1424
__x64_sys_epoll_wait+97
do_syscall_64+95
entry_SYSCALL_64_after_hwframe+118
epoll_wait+82
Which is unexpected; the busy poll usecs should have consumed the
entire timeout and there should be no reason to arm a timer.
After this patch is applied: the same test scenario does not generate a
call to schedule() in the above edge case. If the busy poll usecs are
reduced (for example usecs: 100, epoll_wait timeout 1ms) the timer is
armed as expected.
Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/20250416185826.26375-1-jdamato@fastly.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Test struct drm_gem_object.import_attach to detect imported objects.
During object clenanup, the dma_buf field might be NULL. Testing it in
an object's free callback then incorrectly does a cleanup as for native
objects. Happens for calls to drm_mode_destroy_dumb_ioctl() that
clears the dma_buf field in drm_gem_object_exported_dma_buf_free().
v3:
- only test for import_attach (Boris)
v2:
- use import_attach.dmabuf instead of dma_buf (Christian)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: b57aa47d39e9 ("drm/gem: Test for imported GEM buffers with helper")
Reported-by: Andy Yan <andyshrk@163.com>
Closes: https://lore.kernel.org/dri-devel/38d09d34.4354.196379aa560.Coremail.andyshrk@163.com/
Tested-by: Andy Yan <andyshrk@163.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Anusha Srivatsa <asrivats@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://lore.kernel.org/r/20250416065820.26076-1-tzimmermann@suse.de
|
|
When EINT support for multiple addresses was introduced, the driver
library for the older generations (pinctrl-mtk-common) was not fixed
together. This resulted in invalid pointer accesses.
Fix up the filled in |struct mtk_eint| in pinctrl-mtk-common to match
what is now expected by the mtk-eint library.
Reported-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Tested-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Closes: https://lore.kernel.org/all/43nd5jxpk7b7fv46frqlfjnqfh5jlpqsemeoakqzd4wdi3df6y@w7ycd3k5ezvn/
Fixes: 3ef9f710efcb ("pinctrl: mediatek: Add EINT support for multiple addresses")
Cc: Hao Chang <ot_chhao.chang@mediatek.com>
Cc: Qingliang Li <qingliang.li@mediatek.com>
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/20250415112339.2385454-1-wenst@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Document the lifetime, nolifetime and max_open_zones mount options
added for zoned rt file systems.
Also add documentation describing the max_open_zones sysfs attribute
exposed in /sys/fs/xfs/<dev>/zoned/
Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When CONFIG_NEED_DMA_MAP_STATE is not defined, dma-mapping clients might
report unused data compilation warnings for dma_unmap_*() calls
arguments. Redefine macros for those calls to let compiler to notice that
it is okay when the provided arguments are not used.
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250415075659.428549-1-m.szyprowski@samsung.com
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.15
Pull MD fixes from Yu:
"- fix raid10 missing discard IO accounting (Yu Kuai)
- fix bitmap stats for bitmap file (Zheng Qixing)
- fix oops while reading all member disks failed during check/repair
(Meir Elisha)"
* tag 'md-6.15-20250416' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
md/raid1: Add check for missing source disk in process_checks()
md/md-bitmap: fix stats collection for external bitmaps
md/raid10: fix missing discard IO accounting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"31 hotfixes.
9 are cc:stable and the remainder address post-6.15 issues or aren't
considered necessary for -stable kernels.
22 patches are for MM, 9 are otherwise"
* tag 'mm-hotfixes-stable-2025-04-16-19-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
MAINTAINERS: update HUGETLB reviewers
mm: fix apply_to_existing_page_range()
selftests/mm: fix compiler -Wmaybe-uninitialized warning
alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate
mailmap: add entry for Jean-Michel Hautbois
mm: (un)track_pfn_copy() fix + doc improvements
mm: fix filemap_get_folios_contig returning batches of identical folios
mm/hugetlb: add a line break at the end of the format string
selftests: mincore: fix tmpfs mincore test failure
mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
mm/cma: report base address of single range correctly
mm: page_alloc: speed up fallbacks in rmqueue_bulk()
kunit: slub: add module description
mm/kasan: add module decription
ucs2_string: add module description
zlib: add module description
fpga: tests: add module descriptions
samples/livepatch: add module descriptions
ASN.1: add module description
mm/vma: add give_up_on_oom option on modify/merge, use in uffd release
...
|
|
Add one simple fault inject target, and verify if an application using ublk
device sees an I/O error quickly after the ublk server dies.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now ublk_abort_queue() is moved to ublk char device release handler,
meantime our request queue is "quiesced" because either ->canceling was
set from uring_cmd cancel function or all IOs are inflight and can't be
completed by ublk server, things becomes easy much:
- all uring_cmd are done, so we needn't to mark io as UBLK_IO_FLAG_ABORTED
for handling completion from uring_cmd
- ublk char device is closed, no one can hold IO request reference any more,
so we can simply complete this request or requeue it for ublk_nosrv_should_reissue_outstanding.
Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove __ublk_quiesce_dev() and open code for updating device state as
QUIESCED.
We needn't to drain inflight requests in __ublk_quiesce_dev() any more,
because all inflight requests are aborted in ublk char device release
handler.
Also we needn't to set ->canceling in __ublk_quiesce_dev() any more
because it is done unconditionally now in ublk_ch_release().
Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are currently two ways in which ublk server exit is detected by
ublk_drv:
1. uring_cmd cancellation. If there are any outstanding uring_cmds which
have not been completed to the ublk server when it exits, io_uring
calls the uring_cmd callback with a special cancellation flag as the
issuing task is exiting.
2. I/O timeout. This is needed in addition to the above to handle the
"saturated queue" case, when all I/Os for a given queue are in the
ublk server, and therefore there are no outstanding uring_cmds to
cancel when the ublk server exits.
There are a couple of issues with this approach:
- It is complex and inelegant to have two methods to detect the same
condition
- The second method detects ublk server exit only after a long delay
(~30s, the default timeout assigned by the block layer). This delays
the nosrv behavior from kicking in and potential subsequent recovery
of the device.
The second issue is brought to light with the new test_generic_06 which
will be added in following patch. It fails before this fix:
selftests: ublk: test_generic_06.sh
dev id is 0
dd: error writing '/dev/ublkb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 30.0611 s, 0.0 kB/s
DEAD
dd took 31 seconds to exit (>= 5s tolerance)!
generic_06 : [FAIL]
Fix this by instead detecting and handling ublk server exit in the
character file release callback. This has several advantages:
- This one place can handle both saturated and unsaturated queues. Thus,
it replaces both preexisting methods of detecting ublk server exit.
- It runs quickly on ublk server exit - there is no 30s delay.
- It starts the process of removing task references in ublk_drv. This is
needed if we want to relax restrictions in the driver like letting
only one thread serve each queue
There is also the disadvantage that the character file release callback
can also be triggered by intentional close of the file, which is a
significant behavior change. Preexisting ublk servers (libublksrv) are
dependent on the ability to open/close the file multiple times. To
address this, only transition to a nosrv state if the file is released
while the ublk device is live. This allows for programs to open/close
the file multiple times during setup. It is still a behavior change if a
ublk server decides to close/reopen the file while the device is LIVE
(i.e. while it is responsible for serving I/O), but that would be highly
unusual. This behavior is in line with what is done by FUSE, which is
very similar to ublk in that a userspace daemon is providing services
traditionally provided by the kernel.
With this change in, the new test (and all other selftests, and all
ublksrv tests) pass:
selftests: ublk: test_generic_06.sh
dev id is 0
dd: error writing '/dev/ublkb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 0.0376731 s, 0.0 kB/s
DEAD
generic_04 : [PASS]
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ublk_ch_release() is called after ublk char device is closed, when all
uring_cmd are done, so it is perfect fine to move ublk device reset to
ublk_ch_release() from ublk_ctrl_start_recovery().
This way can avoid to grab the exiting daemon task_struct too long.
However, reset of the following ublk IO flags has to be moved until ublk
io_uring queues are ready:
- ubq->canceling
For requeuing IO in case of ublk_nosrv_dev_should_queue_io() before device
is recovered
- ubq->fail_io
For failing IO in case of UBLK_F_USER_RECOVERY_FAIL_IO before device is
recovered
- ublk_io->flags
For preventing using io->cmd
With this way, recovery is simplified a lot.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now ublk deals with ublk_nosrv_dev_should_queue_io() by keeping request
queue as quiesced. This way is fragile because queue quiesce crosses syscalls
or process contexts.
Switch to rely on ubq->canceling for dealing with
ublk_nosrv_dev_should_queue_io(), because it has been used for this purpose
during io_uring context exiting, and it can be reused before recovering too.
In ublk_queue_rq(), the request will be added to requeue list without
kicking off requeue in case of ubq->canceling, and finally requests added in
requeue list will be dispatched from either ublk_stop_dev() or
ublk_ctrl_end_recovery().
Meantime we have to move reset of ubq->canceling from ublk_ctrl_start_recovery()
to ublk_ctrl_end_recovery(), when IO handling can be recovered completely.
Then blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() are always used
in same context.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Link: https://lore.kernel.org/r/20250416035444.99569-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add ublk_force_abort_dev() for handling ublk_nosrv_dev_should_queue_io()
in ublk_stop_dev(). Then queue quiesce and unquiesce can be paired in
single function.
Meantime not change device state to QUIESCED any more, since the disk is
going to be removed soon.
Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Most uring_cmds issued against ublk character devices are serialized
because each command affects only one queue, and there is an early check
which only allows a single task (the queue's ubq_daemon) to issue
uring_cmds against that queue. However, this mechanism does not work for
FETCH_REQs, since they are expected before ubq_daemon is set. Since
FETCH_REQs are only used at initialization and not in the fast path,
serialize them using the per-ublk-device mutex. This fixes a number of
data races that were previously possible if a badly behaved ublk server
decided to issue multiple FETCH_REQs against the same qid/tag
concurrently.
Reported-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
test may exit early because of missing program or not having required
feature before calling _prep_test(), then $UBLK_TMP isn't cleaned.
Fix it by moving creating $UBLK_TMP into _prep_test(), any resources
created since _prep_test() will be cleaned by _cleanup_test().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-14-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add test_stress_05.sh for covering removing device with recovery
enabled.
io-hang has been observed with the following patch:
https://lore.kernel.org/linux-block/20250403-ublk_timeout-v3-1-aa09f76c7451@purestorage.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-13-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add user recovery feature.
Meantime add user recovery test: generic_04 and generic_05(zero copy)
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-12-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Support target specific command line for making related command line code
handling more readable & clean.
Also helps for adding new features.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-11-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Increase max nr_queues to 32, and queue depth to 1024.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-10-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In NUMA machine, ublk IO performance is very sensitive with queue
pthread's affinity setting.
Retrieve queue's affinity and select the 1st cpu as queue thread's sched
affinity, and it is observed that single cpu task affinity can get
stable & good performance if client application is put on proper cpu.
Dump this info when adding one ublk device. Use shmem to communicate
queue's tid between parent and daemon.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN
It is observed that this way is more efficient for fast nvme backing
file.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add stress_03 & stress_04 for covering zero copy feature.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Run stress tests in parallel, meantime add shell local function to
simplify the two stress tests.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Detach ublk daemon from the starting process completely by double-fork and
clearing its process group, so that `_add_ublk_dev` can return from sub-shell.
Then it is more friendly for writing shell test script for adding/recovering
ublk device.
Prepare for running ublk test in parallel.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|