md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone

Currently md raid0/linear have no mechanism to detect that an array
member got removed or failed. The driver keeps sending BIOs regardless
of the state of the array members, and the kernel shows state 'clean'
in the 'array_state' sysfs attribute. This leads to the following
situation: if a raid0/linear array member is removed while the array is
mounted, a user writing to this array won't realize that errors are
happening unless they check dmesg or perform one fsync per written
file. Despite udev signaling that the member device is gone, 'mdadm'
cannot issue the STOP_ARRAY ioctl successfully, given that the array is
mounted.

In other words, no -EIO is returned and writes (except direct ones)
appear normal. The user might think the written data is correctly
stored in the array, but instead garbage was written, given that raid0
does striping (and so requires all its members to be working in order
not to corrupt data). For md/linear, writes to the available members
will work fine, but writes that go to the missing member(s) cause file
corruption, since the portion of the writes destined for the missing
devices is never actually written.

This patch changes this behavior: we check if the block device's
gendisk is UP when submitting the BIO to the array member, and if it
isn't, we flag the md device as MD_BROKEN and fail subsequent I/Os to
that device; a read request to the array requiring data from a valid
member is still completed. While flagging the device as MD_BROKEN, we
also show a rate-limited warning in the kernel log.

A new array state 'broken' was added too: it mimics the state 'clean'
in every aspect, being useful only to distinguish if the array has some
member missing. We rely on the MD_BROKEN flag to put the array in the
'broken' state. This state cannot be written to 'array_state': since it
only indicates that one or more members of the array are missing and
otherwise acts like 'clean', writing it wouldn't make sense.

With this patch, the filesystem reacts much faster to the event of a
missing array member: after some I/O errors, ext4 for instance aborts
the journal and prevents corruption. Without this change, we're able to
keep writing to the disk, and after a machine reboot e2fsck shows some
severe fs errors that demand fixing. This patch was tested with the
ext4 and xfs filesystems, and requires a 'mdadm' counterpart to handle
the 'broken' state.

Cc: Song Liu <songliubraving@fb.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
author: Guilherme G. Piccoli <gpiccoli@canonical.com> 2019-09-03 16:49:00 -0300
committer: Song Liu <songliubraving@fb.com> 2019-09-03 14:49:28 -0700
commit: 62f7b1989c02feed9274131b2fd5e990de4aba6f (patch)
tree: d6bf6a9b8e10bc8c6aa2e1cd27dd82b463bd0f18 /drivers/md/raid0.c
parent: a22a9602b88fabf10847f238ff81fde5f906fef7 (diff)
1 files changed, 6 insertions, 0 deletions
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index bf5cf184a260..bc422eae2c95 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -586,6 +586,12 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 
 	zone = find_zone(mddev->private, &sector);
 	tmp_dev = map_sector(mddev, zone, sector, &sector);
+
+	if (unlikely(is_mddev_broken(tmp_dev, "raid0"))) {
+		bio_io_error(bio);
+		return true;
+	}
+
 	bio_set_dev(bio, tmp_dev->bdev);
 	bio->bi_iter.bi_sector = sector + zone->dev_start +
 		tmp_dev->data_offset;
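
The behavior described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: every name in it (`model_member`, `model_array`, `model_is_broken`, `model_submit`, `model_array_state`) is hypothetical, the `disk_up` field stands in for the "gendisk is UP" check, and the warning is printed once when the flag is first set, which simplifies the rate-limited warning the patch describes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct model_member {
	bool disk_up;	/* models "the block device's gendisk is UP" */
};

struct model_array {
	bool broken;	/* models the MD_BROKEN flag */
	int warnings;	/* how many times the warning was emitted */
};

/*
 * Returns true if the member is gone. On the first detection the array
 * is flagged as broken and a single warning is printed (assumption:
 * "once per flag transition" replaces the kernel's rate limiting).
 */
static bool model_is_broken(struct model_array *arr,
			    const struct model_member *m,
			    const char *md_type)
{
	if (m->disk_up)
		return false;

	if (!arr->broken) {
		arr->broken = true;
		arr->warnings++;
		fprintf(stderr, "md: %s array has a missing/failed member\n",
			md_type);
	}
	return true;
}

/*
 * Models the check added to the request path: I/O mapped to a missing
 * member fails with -EIO, while I/O mapped to a valid member still
 * succeeds even after the array has been flagged.
 */
static int model_submit(struct model_array *arr, const struct model_member *m)
{
	if (model_is_broken(arr, m, "raid0"))
		return -EIO;
	return 0;
}

/* Models the read-only 'broken' value of the array_state attribute. */
static const char *model_array_state(const struct model_array *arr)
{
	return arr->broken ? "broken" : "clean";
}
```

In this model, calling `model_submit()` with a member whose `disk_up` is false returns `-EIO` and flips `model_array_state()` from "clean" to "broken", while later requests mapped to a healthy member keep returning 0, mirroring the statement that reads requiring data from a valid member are still completed.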