Age | Commit message (Collapse) | Author |
|
Supporting this mode allows creating and merging multi-segment metadata
requests that wouldn't be possible otherwise. It also allows directly
using user space requests that straddle physically discontiguous pages.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The code currently uses list_for_each_entry_rcu() while holding an SRCU
lock, triggering false positive warnings with CONFIG_PROVE_RCU=y
enabled:
drivers/nvme/host/multipath.c:168 RCU-list traversed in non-reader section!!
drivers/nvme/host/multipath.c:227 RCU-list traversed in non-reader section!!
drivers/nvme/host/multipath.c:260 RCU-list traversed in non-reader section!!
While the list is properly protected by SRCU lock, the code uses the
wrong list traversal primitive. Replace list_for_each_entry_rcu() with
list_for_each_entry_srcu() to correctly indicate SRCU-based protection
and eliminate the false warning.
Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: be647e2c76b2 ("nvme: use srcu for iterating namespace list")
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into for-6.13/block
Pull MD fixes from Song:
"This set contains a fix for a W=1 warning, by John Garry, and a
MAINTAINERS update."
* tag 'md-6.13-20241115' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
MAINTAINERS: Update git tree for mdraid subsystem
md/raid5: Increase r5conf.cache_name size
|
|
Add requests to the tail of the list instead of the front so that they
are queued up in submission order.
Remove the re-reordering in blk_mq_dispatch_plug_list, virtio_queue_rqs
and nvme_queue_rqs now that the list is ordered as expected.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241113152050.157179-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace the semi-open coded request list helpers with a proper rq_list
type that mirrors the bio_list and has head and tail pointers. Besides
better type safety this actually allows to insert at the tail of the
list, which will be useful soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241113152050.157179-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
blk_mq_flush_plug_list submits requests in the reverse order that they
were submitted, which leads to a rather suboptimal I/O pattern
especially in rotational devices. Fix this by rewriting virtio_queue_rqs
so that it always pops the requests from the passed in request list, and
then adds them to the head of a local submit list. This actually
simplifies the code a bit as it removes the complicated list splicing,
at the cost of extra updates of the rq_next pointer. As that should be
cache hot anyway it should be an easy price to pay.
Fixes: 0e9911fa768f ("virtio-blk: support mq_ops->queue_rqs()")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241113152050.157179-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
blk_mq_flush_plug_list submits requests in the reverse order that they
were submitted, which leads to a rather suboptimal I/O pattern especially
in rotational devices. Fix this by rewriting nvme_queue_rqs so that it
always pops the requests from the passed in request list, and then adds
them to the head of a local submit list. This actually simplifies the
code a bit as it removes the complicated list splicing, at the cost of
extra updates of the rq_next pointer. As that should be cache hot
anyway it should be an easy price to pay.
Fixes: d62cbcf62f2f ("nvme: add support for mq_ops->queue_rqs()")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241113152050.157179-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
for-6.13/block
Pull NVMe updates from Keith:
"nvme updates for Linux 6.13
- Use uring_cmd helper (Pavel)
- Host Memory Buffer allocation enhancements (Christoph)
- Target persistent reservation support (Guixin)
- Persistent reservation tracing (Guixen)
- NVMe 2.1 specification support (Keith)
- Rotational Meta Support (Matias, Wang, Keith)
- Volatile cache detection enhancment (Guixen)"
* tag 'nvme-6.13-2024-11-13' of git://git.infradead.org/nvme: (22 commits)
nvmet: add tracing of reservation commands
nvme: parse reservation commands's action and rtype to string
nvmet: report ns's vwc not present
nvme: check ns's volatile write cache not present
nvme: add rotational support
nvme: use command set independent id ns if available
nvmet: support for csi identify ns
nvmet: implement rotational media information log
nvmet: implement endurance groups
nvmet: declare 2.1 version compliance
nvmet: implement crto property
nvmet: implement supported features log
nvmet: implement supported log pages
nvmet: implement active command set ns list
nvmet: implement id ns for nvm command set
nvmet: support reservation feature
nvme: add reservation command's defines
nvme-core: remove repeated wq flags
nvmet: make nvmet_wq visible in sysfs
nvme-pci: use dma_alloc_noncontigous if possible
...
|
|
Add tracing of reservation commands, including register, acquire,
release and report, and also parse the action and rtype to string
to make the trace log more human-readable.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Parse reservation commands's action(including rrega, racqa and rrela)
and rtype to string to make the trace log more human-readable.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Currently, we report that controller has vwc even though the ns may
not have vwc. Report ns's vwc not present when not buffered_io or
backdev doesn't have vwc.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
For compiling with W=1, the following warning can be seen:
drivers/md/raid5.c: In function ‘setup_conf’:
drivers/md/raid5.c:2423:12: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size between 16 and 26 [-Werror=format-truncation=]
"raid%d-%s", conf->level, mdname(conf->mddev));
^~
drivers/md/raid5.c:2422:3: note: ‘snprintf’ output between 7 and 48 bytes into a destination of size 32
snprintf(conf->cache_name[0], namelen,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"raid%d-%s", conf->level, mdname(conf->mddev));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Increase the array size to avoid this warning.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241112161019.4154616-2-john.g.garry@oracle.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
The write_hint is only used for read/write requests, which must have a
bio attached to them. Just use the bio field instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241112170050.1612998-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When the VWC of a namespace does not exist, the BLK_FEAT_WRITE_CACHE
flag should not be set when registering the block device, regardless
of whether the controller supports VWC.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Rotational devices, such as hard-drives, can be detected using
the rotational bit in the namespace independent identify namespace
data structure. Make the bit visible to the block layer through the
rotational queue setting.
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The NVMe 2.0 specification adds an independent identify namespace
data structure that contains generic attributes that apply to all
namespace types. Some attributes carry over from the NVM command set
identify namespace data structure, and others are new.
Currently, the data structure only considered when CRIMS is enabled or
when the namespace type is key-value.
However, the independent namespace data structure is mandatory for
devices that implement features from the 2.0+ specification. Therefore,
we can check this data structure first. If unavailable, retrieve the
generic attributes from the NVM command set identify namespace data
structure.
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Implements reporting the I/O Command Set Independent Identify Namespace
command.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Most of the information is stubbed. Supporting these commands is a
requirement for supporting rotational media.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Most of the returned information is just stubbed data. The target must
support these in order to report rotational media. Since this driver
doesn't know any better, each namespace is its own endurance group with
the engid value matching the nsid.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The target driver implements all the mandatory logs, identifications,
features, and properties up to nvme sepcification 2.1.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
This property is required for nvme 2.1. The target only supports ready
with media, so this is just the same value as CAP.TO.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
This log is required for nvme 2.1.
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
This log is required for nvme 2.1.
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
This is required for nvme 2.1 for targets that support multiple command
sets. We support NVM and ZNS, so are required to support this
identification.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
We don't report anything here, but it's a mandatory identification for
nvme 2.1.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
This patch implements the reservation feature, including:
1. reservation register(register, unregister and replace).
2. reservation acquire(acquire, preempt, preempt and abort).
3. reservation release(release and clear).
4. reservation report.
5. set feature and get feature of reservation notify mask.
6. get log page of reservation event.
Not supported:
1. persistent reservation through power loss.
Test cases:
Use nvme-cli and fio to test all implemented sub features:
1. use nvme resv-register to register host a registrant or
unregister or replace a new key.
2. use nvme resv-acquire to set host to the holder, and use fio
to send read and write io in all reservation type. And also
test preempt and "preempt and abort".
3. use nvme resv-report to show all registrants and reservation
status.
4. use nvme resv-release to release all registrants.
5. use nvme get-log to get events generated by the preceding
operations.
In addition, make reservation configurable, one can set ns to
support reservation before enable ns. The default of resv_enable
is false.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The limits stacking now properly zeroes it if at least one of the
underlying limits clears it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241108154657.845768-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
max_zone_append_sectors differs from all other queue limits in that the
final value used is not stored in the queue_limits but needs to be
obtained using queue_limits_max_zone_append_sectors helper. This not
only adds (tiny) extra overhead to the I/O path, but also can be easily
forgotten in file system code.
Add a new max_hw_zone_append_sectors value to queue_limits which is
set by the driver, and calculate max_zone_append_sectors from that and
the other inputs in blk_validate_zoned_limits, similar to how
max_sectors is calculated to fix this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241108154657.845768-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add proper bio_split() error handling. For any error, call
raid_end_bio_io() and return. Except for discard, where we end the bio
directly.
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241111112150.3756529-7-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add proper bio_split() error handling. For any error, call
raid_end_bio_io() and return.
For the case of an in the write path, we need to undo the increment in
the rdev pending count and NULLify the r1_bio->bios[] pointers.
For read path failure, we need to undo rdev pending count increment from
the earlier read_balance() call.
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241111112150.3756529-6-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add proper bio_split() error handling. For any error, set bi_status, end
the bio, and return.
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241111112150.3756529-5-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In ublk_ch_mmap(), queue id is calculated in the following way:
(vma->vm_pgoff << PAGE_SHIFT) / `max_cmd_buf_size`
'max_cmd_buf_size' is equal to
`UBLK_MAX_QUEUE_DEPTH * sizeof(struct ublksrv_io_desc)`
and UBLK_MAX_QUEUE_DEPTH is 4096 and part of UAPI, so 'max_cmd_buf_size'
is always page aligned in 4K page size kernel. However, it isn't true in
64K page size kernel.
Fixes the issue by always rounding up 'max_cmd_buf_size' with PAGE_SIZE.
Cc: stable@vger.kernel.org
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241111110718.1394001-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix typo in comment:
requeust->request,
Removve->Remove,
notthing->nothing.
Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20241108133913.3068782-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In case of an early failure in dasd_init, dasd_proc_init is never
called and /proc/dasd* files are never created. That can happen, for
example, if an incompatible or incorrect argument is provided to the
dasd_mod.dasd= kernel parameter.
However, the attempted removal of /proc/dasd* files causes 8 warnings
and backtraces in this case. 4 on the error path within dasd_init and
4 when the dasd module is unloaded. Notice the "removing permanent
/proc entry 'devices'" message that is caused by the dasd_proc_exit
function trying to remove /proc/devices instead of /proc/dasd/devices
since dasd_proc_root_entry is NULL and /proc/devices is indeed
permanent. Example:
------------[ cut here ]------------
removing permanent /proc entry 'devices'
WARNING: CPU: 6 PID: 557 at fs/proc/generic.c:701 remove_proc_entry+0x22e/0x240
CPU: 6 PID: 557 Comm: modprobe Not tainted 6.10.5-1-default #1
openSUSE Tumbleweed f6917bfd6e5a5c7a7e900e0e3b517786fb5c6301
Hardware name: QEMU 8561 QEMU (KVM/Linux)
Krnl PSW : 0704c00180000000 000003fffed0e9f2 (remove_proc_entry+0x232/0x240)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 000003ff00000027 000003ff00000023 0000000000000028 000002f200000000
000002f3f05bec20 0000037ffecfb7d0 000003ffffdabab0 000003ff7ee4ec72
000003ff7ee4ec72 0000000000000007 000002f280e22600 000002f280e22688
000003ffa252cfa0 0000000000010000 000003fffed0e9ee 0000037ffecfba38
Krnl Code: 000003fffed0e9e2: c020004e7017 larl %r2,000003ffff6dca10
000003fffed0e9e8: c0e5ffdfad24 brasl %r14,000003fffe904430
#000003fffed0e9ee: af000000 mc 0,0
>000003fffed0e9f2: a7f4ff4c brc 15,000003fffed0e88a
000003fffed0e9f6: 0707 bcr 0,%r7
000003fffed0e9f8: 0707 bcr 0,%r7
000003fffed0e9fa: 0707 bcr 0,%r7
000003fffed0e9fc: 0707 bcr 0,%r7
Call Trace:
[<000003fffed0e9f2>] remove_proc_entry+0x232/0x240
([<000003fffed0e9ee>] remove_proc_entry+0x22e/0x240)
[<000003ff7ef5a084>] dasd_proc_exit+0x34/0x60 [dasd_mod]
[<000003ff7ef560c2>] dasd_exit+0x22/0xc0 [dasd_mod]
[<000003ff7ee5a26e>] dasd_init+0x26e/0x280 [dasd_mod]
[<000003fffe8ac9d0>] do_one_initcall+0x40/0x220
[<000003fffe9bc758>] do_init_module+0x78/0x260
[<000003fffe9bf3a6>] __do_sys_init_module+0x216/0x250
[<000003ffff37ac9e>] __do_syscall+0x24e/0x2d0
[<000003ffff38cca8>] system_call+0x70/0x98
Last Breaking-Event-Address:
[<000003fffef7ea20>] __s390_indirect_jump_r14+0x0/0x10
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
While the cause is a user failure, the dasd module should handle the
situation more gracefully. One of the simplest solutions is to make
removal of the /proc/dasd* entries idempotent.
Signed-off-by: Miroslav Franc <mfranc@suse.cz>
[ sth: shortened if clause ]
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20241108133913.3068782-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.13/block
Pull raid5 fix from Song.
* tag 'md-6.13-20241107' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
MAINTAINERS: Make Yu Kuai co-maintainer of md/raid subsystem
md/raid5: Wait sync io to finish before changing group cnt
|
|
PAGE_SIZE may be 64K, and the max block size can be PAGE_SIZE, so any
variable for holding block size can't be defined as 'unsigned short'.
Unfortunately commit 473516b36193 ("loop: use the atomic queue limits
update API") passes 'bsize' with type of 'unsigned short' to
loop_reconfigure_limits(), and causes LTP/ioctl_loop06 test failure:
12 ioctl_loop06.c:76: TINFO: Using LOOP_SET_BLOCK_SIZE with arg > PAGE_SIZE
13 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly
...
18 ioctl_loop06.c:76: TINFO: Using LOOP_CONFIGURE with block_size > PAGE_SIZE
19 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly
Fixes the issue by defining 'block size' variable with 'unsigned int', which is
aligned with block layer's definition.
(improve commit log & add fixes tag)
Fixes: 473516b36193 ("loop: use the atomic queue limits update API")
Cc: John Garry <john.g.garry@oracle.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Li Wang <liwang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241109022744.1126003-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
One customer reports a bug: raid5 is hung when changing thread cnt
while resync is running. The stripes are all in conf->handle_list
and new threads can't handle them.
Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
pers->quiesce from mddev_suspend/resume. Before this patch, mddev_suspend
needs to wait for all ios including sync io to finish. Now it's used
to only wait normal io.
Fix this by calling raid5_quiesce from raid5_store_group_thread_cnt
directly to wait all sync requests to finish before changing the group
cnt.
Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
Cc: stable@vger.kernel.org
Signed-off-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20241106095124.74577-1-xni@redhat.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
Unfreeze queue after returning from blk_mark_disk_dead(), this way at
least allows us to verify queue freeze correctly with lockdep.
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241031133723.303835-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This causes issue on, at least, nvme-mpath where my boot fails with:
WARNING: CPU: 354 PID: 2729 at block/blk-settings.c:75 blk_validate_limits+0x356/0x380
Modules linked in: tg3(+) nvme usbcore scsi_mod ptp i2c_piix4 libphy nvme_core crc32c_intel scsi_common usb_common pps_core i2c_smbus
CPU: 354 UID: 0 PID: 2729 Comm: kworker/u2061:1 Not tainted 6.12.0-rc6+ #181
Hardware name: Dell Inc. PowerEdge R7625/06444F, BIOS 1.8.3 04/02/2024
Workqueue: async async_run_entry_fn
RIP: 0010:blk_validate_limits+0x356/0x380
Code: f6 47 01 04 75 28 83 bf 94 00 00 00 00 75 39 83 bf 98 00 00 00 00 75 34 83 7f 68 00 75 32 31 c0 83 7f 5c 00 0f 84 9b fd ff ff <0f> 0b eb 13 0f 0b eb 0f 48 c7 c0 74 12 58 92 48 89 c7 e8 13 76 46
RSP: 0018:ffffa8a1dfb93b30 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff9232829c8388 RCX: 0000000000000088
RDX: 0000000000000080 RSI: 0000000000000200 RDI: ffffa8a1dfb93c38
RBP: 000000000000000c R08: 00000000ffffffff R09: 000000000000ffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9232829b9000
R13: ffff9232829b9010 R14: ffffa8a1dfb93c38 R15: ffffa8a1dfb93c38
FS: 0000000000000000(0000) GS:ffff923867c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c1b92480a8 CR3: 0000002484ff0002 CR4: 0000000000370ef0
Call Trace:
<TASK>
? __warn+0xca/0x1a0
? blk_validate_limits+0x356/0x380
? report_bug+0x11a/0x1a0
? handle_bug+0x5e/0x90
? exc_invalid_op+0x16/0x40
? asm_exc_invalid_op+0x16/0x20
? blk_validate_limits+0x356/0x380
blk_alloc_queue+0x7a/0x250
__blk_alloc_disk+0x39/0x80
nvme_mpath_alloc_disk+0x13d/0x1b0 [nvme_core]
nvme_scan_ns+0xcc7/0x1010 [nvme_core]
async_run_entry_fn+0x27/0x120
process_scheduled_works+0x1a0/0x360
worker_thread+0x2bc/0x350
? pr_cont_work+0x1b0/0x1b0
kthread+0x111/0x120
? kthread_unuse_mm+0x90/0x90
ret_from_fork+0x30/0x40
? kthread_unuse_mm+0x90/0x90
ret_from_fork_asm+0x11/0x20
</TASK>
---[ end trace 0000000000000000 ]---
presumably due to max_zone_append_sectors not being cleared to zero,
resulting in blk_validate_zoned_limits() complaining and failing.
This reverts commit 2a8f6153e1c2db06a537a5c9d61102eb591776f1.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.13/block
Pull MD changes from Song:
"1. Enhance handling of faulty and blocked devices, by Yu Kuai.
2. raid5-ppl atomic improvement, by Uros Bizjak.
3. md-bitmap fix, by Yuan Can."
* tag 'md-6.13-20241105' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/md-bitmap: Add missing destroy_work_on_stack()
md/raid5: don't set Faulty rdev for blocked_rdev
md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
md/raid1: factor out helper to handle blocked rdev from raid1_write_request()
md: don't record new badblocks for faulty rdev
md: don't wait faulty rdev in md_wait_for_blocked_rdev()
md: add a new helper rdev_blocked()
md/raid5-ppl: Use atomic64_inc_return() in ppl_new_iounit()
|
|
pcim_iomap_table() and pcim_request_regions() have been deprecated in
commit e354bb84a4c1 ("PCI: Deprecate pcim_iomap_table(),
pcim_iomap_regions_request_all()") and commit d140f80f60358 ("PCI:
Deprecate pcim_iomap_regions() in favor of pcim_iomap_region()"),
respectively.
Replace these functions with pcim_iomap_region().
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Link: https://lore.kernel.org/r/20241106145249.108996-2-pstanner@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This commit add missed destroy_work_on_stack() operations for
unplug_work.work in bitmap_unplug_async().
Fixes: a022325ab970 ("md/md-bitmap: add a new helper to unplug bitmap asynchrously")
Cc: stable@vger.kernel.org
Signed-off-by: Yuan Can <yuancan@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20241105130105.127336-1-yuancan@huawei.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
Faulty rdev should never be accessed anymore, hence there is no point to
wait for bad block to be acknowledged in this case while handling write
request.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/r/20241031033114.3845582-8-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
Faulty rdev should never be accessed anymore, hence there is no point to
wait for bad block to be acknowledged in this case while handling write
request.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/r/20241031033114.3845582-7-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
Faulty rdev should never be accessed anymore, hence there is no point to
wait for bad block to be acknowledged in this case while handling write
request.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/r/20241031033114.3845582-6-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
Currently raid1 is preparing IO for underlying disks while checking if
any disk is blocked, if so allocated resources must be released, then
waiting for rdev to be unblocked and try to prepare IO again.
Make code cleaner by checking blocked rdev first, it doesn't matter if
rdev is blocked while issuing IO, the IO will wait for rdev to be
unblocked or not.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/r/20241031033114.3845582-5-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
Faulty will be checked before issuing IO to the rdev, however, rdev can
be faulty at any time, hence it's possible that rdev_set_badblocks()
will be called for faulty rdev. In this case, mddev->sb_flags will be
set and some other path can be blocked by updating super block.
Since faulty rdev will not be accesed anymore, there is no need to
record new babblocks for faulty rdev and forcing updating super block.
Noted this is not a bugfix, just prevent updating superblock in some
corner cases, and will help to slice a bug related to external
metadata[1], testing also shows that devices are removed faster in the
case IO error.
[1] https://lore.kernel.org/all/f34452df-810b-48b2-a9b4-7f925699a9e7@linux.intel.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/r/20241031033114.3845582-4-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
md_wait_for_blocked_rdev() is called for write IO while rdev is
blocked, howerver, rdev can be faulty after choosing this rdev to write,
and faulty rdev should never be accessed anymore, hence there is no point
to wait for faulty rdev to be unblocked.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/r/20241031033114.3845582-3-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
The helper will be used in later patches for raid1/raid10/raid5, the
difference is that Faulty rdev with unacknowledged bad block will not
be considered blocked.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/r/20241031033114.3845582-2-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref)
to use optimized implementation and ease register pressure around
the primitive for targets that implement optimized variant.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Link: https://lore.kernel.org/r/20241007084831.48067-1-ubizjak@gmail.com
Signed-off-by: Song Liu <song@kernel.org>
|