summaryrefslogtreecommitdiff
path: root/drivers/md/raid1.c
AgeCommit message (Collapse)Author
2021-10-18md/raid1: use rdev in raid1_write_request directlyGuoqing Jiang
We already get rdev from conf->mirrors[i].rdev at the beginning of the loop, so just use it. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18md/raid1: only allocate write behind bio for WriteMostly deviceGuoqing Jiang
Commit 6607cd319b6b91bff94e90f798a61c031650b514 ("raid1: ensure write behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the size of behind bio is not bigger than BIO_MAX_VECS sectors. Unfortunately the same calltrace still could happen since an array could enable write-behind without write mostly device. To match the manpage of mdadm (which says "write-behind is only attempted on drives marked as write-mostly"), we need to check WriteMostly flag to avoid such unexpected behavior. [1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25 Cc: stable@vger.kernel.org # v5.12+ Cc: Jens Stutte <jens@chianterastutte.eu> Reported-by: Jens Stutte <jens@chianterastutte.eu> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-30Merge tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "Sitting on top of the core block changes, here are the driver changes for the 5.15 merge window: - NVMe updates via Christoph: - suspend improvements for devices with an HMB (Keith Busch) - handle double completions more gacefull (Sagi Grimberg) - cleanup the selects for the nvme core code a bit (Sagi Grimberg) - don't update queue count when failing to set io queues (Ruozhu Li) - various nvmet connect fixes (Amit Engel) - cleanup lightnvm leftovers (Keith Busch, me) - small cleanups (Colin Ian King, Hou Pu) - add tracing for the Set Features command (Hou Pu) - CMB sysfs cleanups (Keith Busch) - add a mutex_destroy call (Keith Busch) - remove lightnvm subsystem. It's served its purpose and ultimately led to zoned nvme support, we no longer need it (Christoph) - revert floppy O_NDELAY fix (Denis) - nbd fixes (Hou, Pavel, Baokun) - nbd locking fixes (Tetsuo) - nbd device removal fixes (Christoph) - raid10 rcu warning fix (Xiao) - raid1 write behind fix (Guoqing) - rnbd fixes (Gioh, Md Haris) - misc fixes (Colin)" * tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block: (42 commits) Revert "floppy: reintroduce O_NDELAY fix" raid1: ensure write behind bio has less than BIO_MAX_VECS sectors md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard nbd: remove nbd->destroy_complete nbd: only return usable devices from nbd_find_unused nbd: set nbd->index before releasing nbd_index_mutex nbd: prevent IDR lookups from finding partially initialized devices nbd: reset NBD to NULL when restarting in nbd_genl_connect nbd: add missing locking to the nbd_dev_add error path nvme: remove the unused NVME_NS_* enum nvme: remove nvm_ndev from ns nvme: Have NVME_FABRICS select NVME_CORE instead of transport drivers block: nbd: add sanity check for first_minor nvmet: check that host sqsize does not exceed ctrl MQES nvmet: avoid duplicate qid in connect cmd nvmet: pass back cntlid on successful completion nvme-rdma: don't update queue count when failing to set io queues nvme-tcp: don't update queue count when failing to set io queues nvme-tcp: pair send_mutex init with destroy nvme: allow user toggling hmb usage ...
2021-08-27raid1: ensure write behind bio has less than BIO_MAX_VECS sectorsGuoqing Jiang
We can't split write behind bio with more than BIO_MAX_VECS sectors, otherwise the below call trace was triggered because we could allocate oversized write behind bio later. [ 8.097936] bvec_alloc+0x90/0xc0 [ 8.098934] bio_alloc_bioset+0x1b3/0x260 [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1] [ 8.100988] ? __bio_clone_fast+0xa8/0xe0 [ 8.102008] md_handle_request+0x158/0x1d0 [md_mod] [ 8.103050] md_submit_bio+0xcd/0x110 [md_mod] [ 8.104084] submit_bio_noacct+0x139/0x530 [ 8.105127] submit_bio+0x78/0x1d0 [ 8.106163] ext4_io_submit+0x48/0x60 [ext4] [ 8.107242] ext4_writepages+0x652/0x1170 [ext4] [ 8.108300] ? do_writepages+0x41/0x100 [ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4] [ 8.110406] do_writepages+0x41/0x100 [ 8.111450] __filemap_fdatawrite_range+0xc5/0x100 [ 8.112513] file_write_and_wait_range+0x61/0xb0 [ 8.113564] ext4_sync_file+0x73/0x370 [ext4] [ 8.114607] __x64_sys_fsync+0x33/0x60 [ 8.115635] do_syscall_64+0x33/0x40 [ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae Thanks for the comment from Christoph. [1]. https://bugs.archlinux.org/task/70992 Cc: stable@vger.kernel.org # v5.12+ Reported-by: Jens Stutte <jens@chianterastutte.eu> Tested-by: Jens Stutte <jens@chianterastutte.eu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-08-04Merge branch 'md-fixes' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.14 * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md/raid10: properly indicate failure when ending a failed write request
2021-07-23md/raid10: properly indicate failure when ending a failed write requestWei Shuyu
Similar to [1], this patch fixes the same bug in raid10. Also cleanup the comments. [1] commit 2417b9869b81 ("md/raid1: properly indicate failure when ending a failed write request") Cc: stable@vger.kernel.org Fixes: 7cee6d4e6035 ("md/raid10: end bio when the device faulty") Signed-off-by: Wei Shuyu <wsy@dogben.com> Acked-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid1: enable io accountingGuoqing Jiang
For raid1, we record the start time between split bio and clone bio, and finish the accounting in the final endio. Also introduce start_time in r1bio accordingly. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid1: rename print_msg with r1bio_existedGuoqing Jiang
The caller of raid1_read_request could pass NULL or a valid pointer for "struct r1bio *r1_bio", so it actually means whether r1_bio is existed or not. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-04-23md/raid1: properly indicate failure when ending a failed write requestPaul Clements
This patch addresses a data corruption bug in raid1 arrays using bitmaps. Without this fix, the bitmap bits for the failed I/O end up being cleared. Since we are in the failure leg of raid1_end_write_request, the request either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded). Fixes: eeba6809d8d5 ("md/raid1: end bio when the device faulty") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Paul Clements <paul.clements@us.sios.com> Signed-off-by: Song Liu <song@kernel.org>
2021-01-27md: remove bio_alloc_mddevChristoph Hellwig
bio_alloc_mddev is never called with a NULL mddev, and ->bio_set is initialized in md_run, so it always must be initialized as well. Just open code the remaining call to bio_alloc_bioset. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24block: store a block_device pointer in struct bioChristoph Hellwig
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04block: remove the request_queue argument to the block_bio_remap tracepointChristoph Hellwig
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08writeback: remove bdi->congested_fnChristoph Hellwig
Except for pktdvd, the only places setting congested bits are file systems that allocate their own backing_dev_info structures. And pktdvd is a deprecated driver that isn't useful in stack setup either. So remove the dead congested_fn stacking infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [axboe: fixup unused variables in bcache/request.c] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01block: rename generic_make_request to submit_bio_noacctChristoph Hellwig
generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-13md/raid1: release pending accounting for an I/O only after write-behind is ↵David Jeffery
also finished When using RAID1 and write-behind, md can deadlock when errors occur. With write-behind, r1bio structs can be accounted by raid1 as queued but not counted as pending. The pending count is dropped when the original bio is returned complete but write-behind for the r1bio may still be active. This breaks the accounting used in some conditions to know when the raid1 md device has reached an idle state. It can result in calls to freeze_array deadlocking. freeze_array will never complete from a negative "unqueued" value being calculated due to a queued count larger than the pending count. To properly account for write-behind, move the call to allow_barrier from call_bio_endio to raid_end_bio_io. When using write-behind, md can call call_bio_endio before all write-behind I/O is complete. Using raid_end_bio_io for the point to call allow_barrier will release the pending count at a point where all I/O for an r1bio, even write-behind, is done. Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13md/raid1: introduce wait_for_serializationGuoqing Jiang
Previously, we call check_and_add_serial when serialization is enabled for write IO, but it could allocate and free memory back and forth. Now, let's just get an element from memory pool with the new function, then insert node to rb tree if no collision happens. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13md/raid1: use bucket based mechanism for IO serializationGuoqing Jiang
Since raid1 had already used bucket based mechanism to reduce the conflict between write IO and resync IO, it is possible to speed up performance for io serialization with refer to the same mechanism. To align with the barrier bucket mechanism, we created arrays (with the same number of BARRIER_BUCKETS_NR) for spinlock, rb tree and waitqueue. Then we can reduce lock competition with multiple spinlocks, boost search performance with multiple rb trees and also reduce thundering herd problem with multiple waitqueues. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13md: introduce a new struct for IO serializationGuoqing Jiang
Obviously, IO serialization could cause the degradation of performance a lot. In order to reduce the degradation, so a rb interval tree is added in raid1 to speed up the check of collision. So, a rb root is needed in md_rdev, then abstract all the serialize related members to a new struct (serial_in_rdev), embed it into md_rdev. Of course, we need to free the struct if it is not needed anymore, so rdev/rdevs_uninit_serial are added accordingly. And they should be called when destroty memory pool or can't alloc memory. And we need to consider to call mddev_destroy_serial_pool in case serialize_policy/write-behind is disabled, bitmap is destroyed or in __md_stop_writes. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13raid1: serialize the overlap writeGuoqing Jiang
Before dispatch write bio, raid1 array which enables serialize_policy need to check if overlap exists between this bio and previous on-flying bios. If there is overlap, then it has to wait until the collision is disappeared. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13md: rename wb stuffsGuoqing Jiang
Previously, wb_info_pool and wb_list stuffs are introduced to address potential data inconsistence issue for write behind device. Now rename them to serial related name, since the same mechanism will be used to address reorder overlap write issue for raid1. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-12-11md: raid1: check rdev before reference in raid1_sync_request funcZhiqiang Liu
In raid1_sync_request func, rdev should be checked before reference. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-11-11md/raid1: avoid soft lockup under high loadHannes Reinecke
As all I/O is being pushed through a kernel thread the softlockup watchdog might be triggered under high load. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-10-24md: improve handling of bio with REQ_PREFLUSH in md_flush_request()David Jeffery
If pers->make_request fails in md_flush_request(), the bio is lost. To fix this, pass back a bool to indicate if the original make_request call should continue to handle the I/O and instead of assuming the flush logic will push it to completion. Convert md_flush_request to return a bool and no longer calls the raid driver's make_request function. If the return is true, then the md flush logic has or will complete the bio and the md make_request call is done. If false, then the md make_request function needs to keep processing like it is a normal bio. Let the original call to md_handle_request handle any need to retry sending the bio to the raid driver's make_request function should it be needed. Also mark md_flush_request and the make_request function pointer as __must_check to issue warnings should these critical return values be ignored. Fixes: 2bc13b83e629 ("md: batch flush requests.") Cc: stable@vger.kernel.org # # v4.19+ Cc: NeilBrown <neilb@suse.com> Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-09-03md/raid1: fail run raid1 array when active disk less than oneYufen Yu
When run test case: mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] --assume-clean --bitmap=internal mdadm -S /dev/md1 mdadm -A /dev/md1 /dev/sd[b-c] --run --force mdadm --zero /dev/sda mdadm /dev/md1 -a /dev/sda echo offline > /sys/block/sdc/device/state echo offline > /sys/block/sdb/device/state sleep 5 mdadm -S /dev/md1 echo running > /sys/block/sdb/device/state echo running > /sys/block/sdc/device/state mdadm -A /dev/md1 /dev/sd[a-c] --run --force mdadm run fail with kernel message as follow: [ 172.986064] md: kicking non-fresh sdb from array! [ 173.004210] md: kicking non-fresh sdc from array! [ 173.022383] md/raid1:md1: active with 0 out of 4 mirrors [ 173.022406] md1: failed to create bitmap (-5) In fact, when active disk in raid1 array less than one, we need to return fail in raid1_run(). Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-08-07raid1: factor out a common routine to handle the completion of sync writeHou Tao
It's just code clean-up. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-08-07md: allow last device to be forcibly removed from RAID1/RAID10.Guoqing Jiang
When the 'last' device in a RAID1 or RAID10 reports an error, we do not mark it as failed. This would serve little purpose as there is no risk of losing data beyond that which is obviously lost (as there is with RAID5), and there could be other sectors on the device which are readable, and only readable from this device. This in general this maximises access to data. However the current implementation also stops an admin from removing the last device by direct action. This is rarely useful, but in many case is not harmful and can make automation easier by removing special cases. Also, if an attempt to write metadata fails the device must be marked as faulty, else an infinite loop will result, attempting to update the metadata on all non-faulty devices. So add 'fail_last_dev' member to 'struct mddev', then we can bypasses the 'last disk' checks for RAID1 and RAID10, and control the behavior per array by change sysfs node. Signed-off-by: NeilBrown <neilb@suse.de> [add sysfs node for fail_last_dev by Guoqing] Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-08-07md/raid1: end bio when the device faultyYufen Yu
When write bio return error, it would be added to conf->retry_list and wait for raid1d thread to retry write and acknowledge badblocks. In narrow_write_error(), the error bio will be split in the unit of badblock shift (such as one sector) and raid1d thread issues them one by one. Until all of the splited bio has finished, raid1d thread can go on processing other things, which is time consuming. But, there is a scene for error handling that is not necessary. When the device has been set faulty, flush_bio_list() may end bios in pending_bio_list with error status. Since these bios has not been issued to the device actually, error handlding to retry write and acknowledge badblocks make no sense. Even without that scene, when the device is faulty, badblocks info can not be written out to the device. Thus, we also no need to handle the error IO. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-08-07raid1: use an int as the return value of raise_barrier()Hou Tao
Using a sector_t as the return value is misleading, because raise_barrier() only return 0 or -EINTR. Also add comments for the return values of raise_barrier(). Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-06-26md/raid1: Fix a warning message in remove_wb()Dan Carpenter
The WARN_ON() macro doesn't take an error message, it just takes a condition. I've changed this to use WARN(1, "...") instead. Fixes: 3e148a320979 ("md/raid1: fix potential data inconsistency issue with write behind device") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-06-20md/raid1: fix potential data inconsistency issue with write behind deviceGuoqing Jiang
For write-behind mode, we think write IO is complete once it has reached all the non-writemostly devices. It works fine for single queue devices. But for multiqueue device, if there are lots of IOs come from upper layer, then the write-behind device could issue those IOs to different queues, depends on the each queue's delay, so there is no guarantee that those IOs can arrive in order. To address the issue, we need to check the collision among write behind IOs, we can only continue without collision, otherwise wait for the completion of previous collisioned IO. And WBCollision is introduced for multiqueue device which is worked under write-behind mode. But this patch doesn't handle below cases which could have the data inconsistency issue as well, these cases will be handled in later patches. 1. modify max_write_behind by write backlog node. 2. add or remove array's bitmap dynamically. 3. the change of member disk. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-06-15md: raid1-10: Unify r{1,10}bio_pool_freeMarcos Paulo de Souza
Avoiding duplicated code, since they just execute a kfree. Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-15md/raid1: get rid of extra blank line and spaceYufen Yu
This patch get rid of extra blank line and space, and add necessary space for code. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-15drivers: md: Unify common definitions of raid1 and raid10Marcos Paulo de Souza
These definitions are being moved to raid1-10.c. Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 47Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version you should have received a copy of the gnu general public license for example usr src linux copying if not write to the free software foundation inc 675 mass ave cambridge ma 02139 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 20 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170858.552543146@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-30block: remove the i argument to bio_for_each_segment_allChristoph Hellwig
We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-08Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "Not a huge amount of changes in this round, the biggest one is that we finally have Mings multi-page bvec support merged. Apart from that, this pull request contains: - Small series that avoids quiescing the queue for sysfs changes that match what we currently have (Aleksei) - Series of bcache fixes (via Coly) - Series of lightnvm fixes (via Mathias) - NVMe pull request from Christoph. Nothing major, just SPDX/license cleanups, RR mp policy (Hannes), and little fixes (Bart, Chaitanya). - BFQ series (Paolo) - Save blk-mq cpu -> hw queue mapping, removing a pointer indirection for the fast path (Jianchao) - fops->iopoll() added for async IO polling, this is a feature that the upcoming io_uring interface will use (Christoph, me) - Partition scan loop fixes (Dongli) - mtip32xx conversion from managed resource API (Christoph) - cdrom registration race fix (Guenter) - MD pull from Song, two minor fixes. - Various documentation fixes (Marcos) - Multi-page bvec feature. This brings a lot of nice improvements with it, like more efficient splitting, larger IOs can be supported without growing the bvec table size, and so on. (Ming) - Various little fixes to core and drivers" * tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits) block: fix updating bio's front segment size block: Replace function name in string with __func__ nbd: propagate genlmsg_reply return code floppy: remove set but not used variable 'q' null_blk: fix checking for REQ_FUA block: fix NULL pointer dereference in register_disk fs: fix guard_bio_eod to check for real EOD errors blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map block: optimize bvec iteration in bvec_iter_advance block: introduce mp_bvec_for_each_page() for iterating over page block: optimize blk_bio_segment_split for single-page bvec block: optimize __blk_segment_map_sg() for single-page bvec block: introduce bvec_nth_page() iomap: wire up the iopoll method block: add bio_set_polled() helper block: wire up block device iopoll method fs: add an iopoll method to struct file_operations loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() loop: do not print warn message if partition scan is successful block: bounce: make sure that bvec table is updated ...
2019-02-15block: allow bio_for_each_segment_all() to iterate over multi-page bvecMing Lei
This patch introduces one extra iterator variable to bio_for_each_segment_all(), then we can allow bio_for_each_segment_all() to iterate over multi-page bvec. Given it is just one mechannical & simple change on all bio_for_each_segment_all() users, this patch does tree-wide change in one single patch, so that we can avoid to use a temporary helper for this conversion. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-12md/raid1: don't clear bitmap bits on interrupted recovery.Nate Dailey
sync_request_write no longer submits writes to a Faulty device. This has the unfortunate side effect that bitmap bits can be incorrectly cleared if a recovery is interrupted (previously, end_sync_write would have prevented this). This means the next recovery may not copy everything it should, potentially corrupting data. Add a function for doing the proper md_bitmap_end_sync, called from end_sync_write and the Faulty case in sync_request_write. backport note to 4.14: s/md_bitmap_end_sync/bitmap_end_sync Cc: stable@vger.kernel.org 4.14+ Fixes: 0c9d5b127f69 ("md/raid1: avoid reusing a resync bio after error handling.") Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com> Tested-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-02-04raid1: simplify raid1_error functionYufen Yu
Remove redundance set_bit and let code simplify. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2018-10-14MD: fix invalid stored role for a disk - try2Shaohua Li
Commit d595567dc4f0 (MD: fix invalid stored role for a disk) broke linear hotadd. Let's only fix the role for disks in raid1/10. Based on Guoqing's original patch. Reported-by: kernel test robot <rong.a.chen@intel.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-08-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - a new driver for Rohm BU21029 touch controller - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free - updates to Atmel, eeti. pxrc and iforce drivers - assorted driver cleanups and fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits) MAINTAINERS: Add PhoenixRC Flight Controller Adapter Input: do not use WARN() in input_alloc_absinfo() Input: mark expected switch fall-throughs Input: raydium_i2c_ts - use true and false for boolean values Input: evdev - switch to bitmap API Input: gpio-keys - switch to bitmap_zalloc() Input: elan_i2c_smbus - cast sizeof to int for comparison bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free() md: Avoid namespace collision with bitmap API dm: Avoid namespace collision with bitmap API Input: pm8941-pwrkey - add resin entry Input: pm8941-pwrkey - abstract register offsets and event code Input: iforce - reorganize joystick configuration lists Input: atmel_mxt_ts - move completion to after config crc is updated Input: atmel_mxt_ts - don't report zero pressure from T9 Input: atmel_mxt_ts - zero terminate config firmware file Input: atmel_mxt_ts - refactor config update code to add context struct Input: atmel_mxt_ts - config CRC may start at T71 Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE ...
2018-08-01md: Avoid namespace collision with bitmap APIAndy Shevchenko
bitmap API (include/linux/bitmap.h) has 'bitmap' prefix for its methods. On the other hand MD bitmap API is special case. Adding 'md' prefix to it to avoid name space collision. No functional changes intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Shaohua Li <shli@kernel.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-09Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds
Pull MD updates from Shaohua Li: "A few fixes of MD for this merge window. Mostly bug fixes: - raid5 stripe batch fix from Amy - Read error handling for raid1 FailFast device from Gioh - raid10 recovery NULL pointer dereference fix from Guoqing - Support write hint for raid5 stripe cache from Mariusz - Fixes for device hot add/remove from Neil and Yufen - Improve flush bio scalability from Xiao" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: MD: fix lock contention for flush bios md/raid5: Assigning NULL to sh->batch_head before testing bit R5_Overlap of a stripe md/raid1: add error handling of read error from FailFast device md: fix NULL dereference of mddev->pers in remove_and_add_spares() raid5: copy write hint from origin bio to stripe md: fix two problems with setting the "re-add" device state. raid10: check bio in r10buf_pool_free to void NULL pointer dereference md: fix an error code format and remove unsed bio_sector
2018-05-30md: convert to bioset_init()/mempool_init()Kent Overstreet
Convert md to embedded bio sets. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-17md/raid1: add error handling of read error from FailFast deviceGioh Kim
Current handle_read_error() function calls fix_read_error() only if md device is RW and rdev does not include FailFast flag. It does not handle a read error from a RW device including FailFast flag. I am not sure it is intended. But I found that write IO error sets rdev faulty. The md module should handle the read IO error and write IO error equally. So I think read IO error should set rdev faulty. Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-05-01md: fix an error code format and remove unsed bio_sectorYufen Yu
Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-04-09raid1: copy write hint from master bio to behind bioMariusz Dabrowski
Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-04-09md/raid1: exit sync request if MD_RECOVERY_INTR is setYufen Yu
We met a sync thread stuck as follows: raid1_sync_request+0x2c9/0xb50 md_do_sync+0x983/0xfa0 md_thread+0x11c/0x160 kthread+0x111/0x130 ret_from_fork+0x35/0x40 0xffffffffffffffff At the same time, there is a stuck mdadm thread (mdadm --manage /dev/md2 --add /dev/sda). It is trying to stop the sync thread: kthread_stop+0x42/0xf0 md_unregister_thread+0x3a/0x70 md_reap_sync_thread+0x15/0x160 action_store+0x142/0x2a0 md_attr_store+0x6c/0xb0 kernfs_fop_write+0x102/0x180 __vfs_write+0x33/0x170 vfs_write+0xad/0x1a0 SyS_write+0x52/0xc0 do_syscall_64+0x6e/0x190 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Debug tools show that the sync thread is waiting in raise_barrier(), until raid1d() end all normal IO bios into bio_end_io_list(introduced in commit 55ce74d4bfe1). But, raid1d() cannot end these bios if MD_CHANGE_PENDING bit is set. It needs to get mddev->reconfig_mutex lock and then clear the bit in md_check_recovery(). However, the lock is holding by mdadm in action_store(). Thus, there is a loop: mdadm waiting for sync thread to stop, sync thread waiting for raid1d() to end bios, raid1d() waiting for mdadm to release mddev->reconfig_mutex lock and then it can end bios. Fix this by checking MD_RECOVERY_INTR while waiting in raise_barrier(), so that sync thread can exit while mdadm is stoping the sync thread. Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.") Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>