summaryrefslogtreecommitdiff
path: root/drivers/block/drbd/drbd_main.c
AgeCommit message (Collapse)Author
2017-07-05Merge branch 'work.misc-set_fs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc user access cleanups from Al Viro: "The first pile is assorted getting rid of cargo-culted access_ok(), cargo-culted set_fs() and field-by-field copyouts. The same description applies to a lot of stuff in other branches - this is just the stuff that didn't fit into a more specific topical branch" * 'work.misc-set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Switch flock copyin/copyout primitives to copy_{from,to}_user() fs/fcntl: return -ESRCH in f_setown when pid/pgid can't be found fs/fcntl: f_setown, avoid undefined behaviour fs/fcntl: f_setown, allow returning error lpfc debugfs: get rid of pointless access_ok() adb: get rid of pointless access_ok() isdn: get rid of pointless access_ok() compat statfs: switch to copy_to_user() fs/locks: don't mess with the address limit in compat_fcntl64 nfsd_readlink(): switch to vfs_get_link() drbd: ->sendpage() never needed set_fs() fs/locks: pass kernel struct flock to fcntl_getlk/setlk fs: locks: Fix some troubles at kernel-doc comments
2017-06-27block: don't bother with bounce limits for make_request driversChristoph Hellwig
We only call blk_queue_bounce for request-based drivers, so stop messing with it for make_request based drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18drbd: use bio_clone_fast() instead of bio_clone()NeilBrown
drbd does not modify the bi_io_vec of the cloned bio, so there is no need to clone that part. So bio_clone_fast() is the better choice. For bio_clone_fast() we need to specify a bio_set. We could use fs_bio_set, which bio_clone() uses, or drbd_md_io_bio_set, which drbd uses for metadata, but it is generally best to avoid sharing bio_sets unless you can be certain that there are no interdependencies. So create a new bio_set, drbd_io_bio_set, and use bio_clone_fast(). Also remove a "XXX cannot fail ???" comment because it definitely cannot fail - bio_clone_fast() doesn't fail if the GFP flags allow for sleeping. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18blk: make the bioset rescue_workqueue optional.NeilBrown
This patch converts bioset_create() to not create a workqueue by default, so alloctions will never trigger punt_bios_to_rescuer(). It also introduces a new flag BIOSET_NEED_RESCUER which tells bioset_create() to preserve the old behavior. All callers of bioset_create() that are inside block device drivers, are given the BIOSET_NEED_RESCUER flag. biosets used by filesystems or other top-level users do not need rescuing as the bio can never be queued behind other bios. This includes fs_bio_set, blkdev_dio_pool, btrfs_bioset, xfs_ioend_bioset, and one allocated by target_core_iblock.c. biosets used by md/raid do not need rescuing as their usage was recently audited and revised to never risk deadlock. It is hoped that most, if not all, of the remaining biosets can end up being the non-rescued version. Reviewed-by: Christoph Hellwig <hch@lst.de> Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes) Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18blk: replace bioset_create_nobvec() with a flags arg to bioset_create()NeilBrown
"flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-05-27drbd: ->sendpage() never needed set_fs()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-08block: remove the discard_zeroes_data flagChristoph Hellwig
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can kill this hack. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08drbd: implement REQ_OP_WRITE_ZEROESChristoph Hellwig
It seems like DRBD assumes its on the wire TRIM request always zeroes data. Use that fact to implement REQ_OP_WRITE_ZEROES. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-03Merge branch 'WIP.sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sched.h split-up from Ingo Molnar: "The point of these changes is to significantly reduce the <linux/sched.h> header footprint, to speed up the kernel build and to have a cleaner header structure. After these changes the new <linux/sched.h>'s typical preprocessed size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K lines), which is around 40% faster to build on typical configs. Not much changed from the last version (-v2) posted three weeks ago: I eliminated quirks, backmerged fixes plus I rebased it to an upstream SHA1 from yesterday that includes most changes queued up in -next plus all sched.h changes that were pending from Andrew. I've re-tested the series both on x86 and on cross-arch defconfigs, and did a bisectability test at a number of random points. I tried to test as many build configurations as possible, but some build breakage is probably still left - but it should be mostly limited to architectures that have no cross-compiler binaries available on kernel.org, and non-default configurations" * 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits) sched/headers: Clean up <linux/sched.h> sched/headers: Remove #ifdefs from <linux/sched.h> sched/headers: Remove the <linux/topology.h> include from <linux/sched.h> sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h> sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h> sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h> sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h> sched/headers: Remove <linux/sched.h> from <linux/sched/init.h> sched/core: Remove unused prefetch_stack() sched/headers: Remove <linux/rculist.h> from <linux/sched.h> sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h> sched/headers: Remove <linux/signal.h> from <linux/sched.h> sched/headers: Remove <linux/rwsem.h> from <linux/sched.h> sched/headers: Remove the runqueue_is_locked() prototype sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h> sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h> sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h> sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h> sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h> sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h> ...
2017-03-02Merge branch 'work.sendmsg' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs sendmsg updates from Al Viro: "More sendmsg work. This is a fairly separate isolated stuff (there's a continuation around lustre, but that one was too late to soak in -next), thus the separate pull request" * 'work.sendmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ncpfs: switch to sock_sendmsg() ncpfs: don't mess with manually advancing iovec on send ncpfs: sendmsg does *not* bugger iovec these days ceph_tcp_sendpage(): use ITER_BVEC sendmsg afs_send_pages(): use ITER_BVEC rds: remove dead code ceph: switch to sock_recvmsg() usbip_recv(): switch to sock_recvmsg() iscsi_target: deal with short writes on the tx side [nbd] pass iov_iter to nbd_xmit() [nbd] switch sock_xmit() to sock_{send,recv}msg() [drbd] use sock_sendmsg()
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from ↵Ingo Molnar
<linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-28Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds
Pull IDR rewrite from Matthew Wilcox: "The most significant part of the following is the patch to rewrite the IDR & IDA to be clients of the radix tree. But there's much more, including an enhancement of the IDA to be significantly more space efficient, an IDR & IDA test suite, some improvements to the IDR API (and driver changes to take advantage of those improvements), several improvements to the radix tree test suite and RCU annotations. The IDR & IDA rewrite had a good spin in linux-next and Andrew's tree for most of the last cycle. Coupled with the IDR test suite, I feel pretty confident that any remaining bugs are quite hard to hit. 0-day did a great job of watching my git tree and pointing out problems; as it hit them, I added new test-cases to be sure not to be caught the same way twice" Willy goes on to expand a bit on the IDR rewrite rationale: "The radix tree and the IDR use very similar data structures. Merging the two codebases lets us share the memory allocation pools, and results in a net deletion of 500 lines of code. It also opens up the possibility of exposing more of the features of the radix tree to users of the IDR (and I have some interesting patches along those lines waiting for 4.12) It also shrinks the size of the 'struct idr' from 40 bytes to 24 which will shrink a fair few data structures that embed an IDR" * 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax: (32 commits) radix tree test suite: Add config option for map shift idr: Add missing __rcu annotations radix-tree: Fix __rcu annotations radix-tree: Add rcu_dereference and rcu_assign_pointer calls radix tree test suite: Run iteration tests for longer radix tree test suite: Fix split/join memory leaks radix tree test suite: Fix leaks in regression2.c radix tree test suite: Fix leaky tests radix tree test suite: Enable address sanitizer radix_tree_iter_resume: Fix out of bounds error radix-tree: Store a pointer to the root in each node radix-tree: Chain preallocated nodes through ->parent radix tree test suite: Dial down verbosity with -v radix tree test suite: Introduce kmalloc_verbose idr: Return the deleted entry from idr_remove radix tree test suite: Build separate binaries for some tests ida: Use exceptional entries for small IDAs ida: Move ida_bitmap to a percpu variable Reimplement IDR and IDA using the radix tree radix-tree: Add radix_tree_iter_delete ...
2017-02-21Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: - blk-mq scheduling framework from me and Omar, with a port of the deadline scheduler for this framework. A port of BFQ from Paolo is in the works, and should be ready for 4.12. - Various fixups and improvements to the above scheduling framework from Omar, Paolo, Bart, me, others. - Cleanup of the exported sysfs blk-mq data into debugfs, from Omar. This allows us to export more information that helps debug hangs or performance issues, without cluttering or abusing the sysfs API. - Fixes for the sbitmap code, the scalable bitmap code that was migrated from blk-mq, from Omar. - Removal of the BLOCK_PC support in struct request, and refactoring of carrying SCSI payloads in the block layer. This cleans up the code nicely, and enables us to kill the SCSI specific parts of struct request, shrinking it down nicely. From Christoph mainly, with help from Hannes. - Support for ranged discard requests and discard merging, also from Christoph. - Support for OPAL in the block layer, and for NVMe as well. Mainly from Scott Bauer, with fixes/updates from various others folks. - Error code fixup for gdrom from Christophe. - cciss pci irq allocation cleanup from Christoph. - Making the cdrom device operations read only, from Kees Cook. - Fixes for duplicate bdi registrations and bdi/queue life time problems from Jan and Dan. - Set of fixes and updates for lightnvm, from Matias and Javier. - A few fixes for nbd from Josef, using idr to name devices and a workqueue deadlock fix on receive. Also marks Josef as the current maintainer of nbd. - Fix from Josef, overwriting queue settings when the number of hardware queues is updated for a blk-mq device. - NVMe fix from Keith, ensuring that we don't repeatedly mark and IO aborted, if we didn't end up aborting it. - SG gap merging fix from Ming Lei for block. - Loop fix also from Ming, fixing a race and crash between setting loop status and IO. - Two block race fixes from Tahsin, fixing request list iteration and fixing a race between device registration and udev device add notifiations. - Double free fix from cgroup writeback, from Tejun. - Another double free fix in blkcg, from Hou Tao. - Partition overflow fix for EFI from Alden Tondettar. * tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits) nvme: Check for Security send/recv support before issuing commands. block/sed-opal: allocate struct opal_dev dynamically block/sed-opal: tone down not supported warnings block: don't defer flushes on blk-mq + scheduling blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers blk-mq: don't special case flush inserts for blk-mq-sched blk-mq-sched: don't add flushes to the head of requeue queue blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not block: do not allow updates through sysfs until registration completes lightnvm: set default lun range when no luns are specified lightnvm: fix off-by-one error on target initialization Maintainers: Modify SED list from nvme to block Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN uapi: sed-opal fix IOW for activate lsp to use correct struct cdrom: Make device operations read-only elevator: fix loading wrong elevator type for blk-mq devices cciss: switch to pci_irq_alloc_vectors block/loop: fix race between I/O and set_status blk-mq-sched: don't hold queue_lock when calling exit_icq block: set make_request_fn manually in blk_mq_update_nr_hw_queues ...
2017-02-13idr: Return the deleted entry from idr_removeMatthew Wilcox
It is a relatively common idiom (8 instances) to first look up an IDR entry, and then remove it from the tree if it is found, possibly doing further operations upon the entry afterwards. If we change idr_remove() to return the removed object, all of these users can save themselves a walk of the IDR tree. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-02block: Use pointer to backing_dev_info from request_queueJan Kara
We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-14locking/atomic, kref: Kill kref_sub()Peter Zijlstra
By general sentiment kref_sub() is a bad interface, make it go away. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-26[drbd] use sock_sendmsg()Al Viro
... and keep ->msg_iter through the loop Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-09drbd: Fix kernel_sendmsg() usage - potential NULL derefRichard Weinberger
Don't pass a size larger than iov_len to kernel_sendmsg(). Otherwise it will cause a NULL pointer deref when kernel_sendmsg() returns with rv < size. DRBD as external module has been around in the kernel 2.4 days already. We used to be compatible to 2.4 and very early 2.6 kernels, we used to use rv = sock_sendmsg(sock, &msg, iov.iov_len); then later changed to rv = kernel_sendmsg(sock, &msg, &iov, 1, size); when we should have used rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len); tcp_sendmsg() used to totally ignore the size parameter. 57be5bd ip: convert tcp_sendmsg() to iov_iter primitives changes that, and exposes our long standing error. Even with this error exposed, to trigger the bug, we would need to have an environment (config or otherwise) causing us to not use sendpage() for larger transfers, a failing connection, and have it fail "just at the right time". Apparently that was unlikely enough for most, so this went unnoticed for years. Still, it is known to trigger at least some of these, and suspected for the others: [0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html [1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html [2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546 [3] https://ubuntuforums.org/showthread.php?t=2336150 [4] http://e2.howsolveproblem.com/i/1175162/ This should go into 4.9, and into all stable branches since and including v4.0, which is the first to contain the exposing change. It is correct for all stable branches older than that as well (which contain the DRBD driver; which is 2.6.33 and up). It requires a small "conflict" resolution for v4.4 and earlier, with v4.5 we dropped the comment block immediately preceding the kernel_sendmsg(). Fixes: b411b3637fa7 ("The DRBD driver") Cc: <stable@vger.kernel.org> # 2.6.33.x- Cc: viro@zeniv.linux.org.uk Cc: christoph.lechleitner@iteg.at Cc: wolfgang.glas@iteg.at Reported-by: Christoph Lechleitner <christoph.lechleitner@iteg.at> Tested-by: Christoph Lechleitner <christoph.lechleitner@iteg.at> Signed-off-by: Richard Weinberger <richard@nod.at> [changed oneliner to be "obvious" without context; more verbose message] Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07block: rename bio bi_rw to bi_opfJens Axboe
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: code cleanups without semantic changesFabian Frederick
This contains various cosmetic fixes ranging from simple typos to const-ifying, and using booleans properly. Original commit messages from Fabian's patch set: drbd: debugfs: constify drbd_version_fops drbd: use seq_put instead of seq_print where possible drbd: include linux/uaccess.h instead of asm/uaccess.h drbd: use const char * const for drbd strings drbd: kerneldoc warning fix in w_e_end_data_req() drbd: use unsigned for one bit fields drbd: use bool for peer is_ states drbd: fix typo drbd: use | for bitmask combination drbd: use true/false for bool drbd: fix drbd_bm_init() comments drbd: introduce peer state union drbd: fix maybe_pull_ahead() locking comments drbd: use bool for growing drbd: remove redundant declarations drbd: replace if/BUG by BUG_ON Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: introduce WRITE_SAME supportLars Ellenberg
We will support WRITE_SAME, if * all peers support WRITE_SAME (both in kernel and DRBD version), * all peer devices support WRITE_SAME * logical_block_size is identical on all peers. We may at some point introduce a fallback on the receiving side for devices/kernels that do not support WRITE_SAME, by open-coding a submit loop. But not yet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: adjust assert in w_bitmap_io to account for BM_LOCKED_CHANGE_ALLOWEDLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: Implement handling of thinly provisioned storage on resync target nodesPhilipp Reisner
If during resync we read only zeroes for a range of sectors assume that these secotors can be discarded on the sync target node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: bitmap bulk IO: do not always suspend IOLars Ellenberg
The intention was to only suspend IO if some normal bitmap operation is supposed to be locked out, not always. If the bulk operation is flaged as BM_LOCKED_CHANGE_ALLOWED, we do not need to suspend IO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSHMike Christie
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07drbd: use bio op accessorsMike Christie
Separate the op from the rq_flag_bits and have drbd set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-12drbd: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-01-27drbd: Use shash and ahashHerbert Xu
This patch replaces uses of the long obsolete hash interface with either shash (for non-SG users) or ahash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-11-25drbd: don't block forever in disconnect during resync if fencing=r-a-stonithLars Ellenberg
Disconnect should wait for pending bitmap IO. But if that bitmap IO is not happening, because it is waiting for pending application IO, and there is no progress, because the fencing policy suspended application IO because of the disconnect, then we deadlock. The bitmap writeout in this case does not care for concurrent application IO, so there is no point waiting for it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: make drbd known to lsblk: use bd_link_disk_holderLars Ellenberg
lsblk should be able to pick up stacking device driver relations involving DRBD conveniently. Even though upstream kernel since 2011 says "DON'T USE THIS UNLESS YOU'RE ALREADY USING IT." a new user has been added since (bcache), which sets the precedences for us to use it as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: fix spurious alert level printkLars Ellenberg
When accessing out meta data area on disk, we double check the plausibility of the requested sector offsets, and are very noisy about it if they look suspicious. During initial read of our "superblock", for "external" meta data, this triggered because the range estimate returned by drbd_md_last_sector() was still wrong. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: use resource name in workqueueLars Ellenberg
Since kernel 3.3, we can use snprintf-style arguments to create a workqueue. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: Create a dedicated workqueue for sending acks on the control connectionPhilipp Reisner
The intention is to reduce CPU utilization. Recent measurements unveiled that the current performance bottleneck is CPU utilization on the receiving node. The asender thread became CPU limited. One of the main points is to eliminate the idr_for_each_entry() loop from the sending acks code path. One exception in that is sending back ping_acks. These stay in the ack-receiver thread. Otherwise the logic becomes too complicated for no added value. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: Rename asender to ack_receiverPhilipp Reisner
This prepares the next patch where the sending on the meta (or control) socket is moved to a dedicated workqueue. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: Fix locking across all resourcesAndreas Gruenbacher
Instead of using a rwlock for synchronizing state changes across resources, take the request locks of all resources for global state changes. Use resources_mutex to serialize global state changes. This means that taking the request lock of a resource is now enough to prevent changes of that resource. (Previously, a read lock on the global state lock was needed as well.) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: Move enum write_ordering_e to drbd.hAndreas Gruenbacher
Also change the enum values to all-capital letters. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-13block: kill merge_bvec_fn() completelyKent Overstreet
As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02writeback: move backing_dev_info->state into bdi_writebackTejun Heo
Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->state into wb. * enum bdi_state is renamed to wb_state and the prefix of all enums is changed from BDI_ to WB_. * Explicit zeroing of bdi->state is removed without adding zeoring of wb->state as the whole data structure is zeroed on init anyway. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: drbd-dev@lists.linbit.com Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-03-24block, drbd: use mempool_create_slab_pool()David Rientjes
Mempools created for slab caches should use mempool_create_slab_pool(). Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: Only use drbd_msg_put_info() in drbd_nl.cAndreas Gruenbacher
Avoid generic netlink calls in other parts of the code base. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: Minor cleanupsAndreas Gruenbacher
. Update comments . drbd_set_{in,out_of}_sync(): Remove unused parameters . Move common code into adm_del_resource() . Redefine ERR_MINOR_EXISTS -> ERR_MINOR_OR_VOLUME_EXISTS Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Use better variable namesAndreas Gruenbacher
Rename local variable 'ds' to 'disk_state' or 'data_size'. 'dgs' to 'digest_size' Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-07-10drbd: implicitly truncate cpu-maskLars Ellenberg
Don't error out with misleading "out of memory" if the cpu-mask has more bits set than there are CPUs. Just truncate to nr_cpu_ids implicitly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: Add in_flight_summaryLars Ellenberg
* Add details about pending meta data operations to in_flight_summary. * Report number of requests waiting for activity log transactions. * timing details of peer_requests to in_flight_summary. * FLUSH details DRBD devides the incoming request stream into "epochs", in which peers are allowed to re-order writes independendly. These epochs are separated by P_BARRIER on the replication link. Such barrier packets, depending on configuration, may cause the receiving side to drain the lower level device request queues and call blkdev_issue_flush(). This is known to be an other major source of latency in DRBD. Track timing details of calls to blkdev_issue_flush(), and add them to in_flight_summary. * data socket stats To be able to diagnose bottlenecks and root causes of "slow" IO on DRBD, it is useful to see network buffer stats along with the timing details of requests, peer requests, and meta data IO. * pending bitmap IO timing details to in_flight_summary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: add basic hierarchyLars Ellenberg
Add new debugfs hierarchy /sys/kernel/debug/ drbd/ resources/ $resource_name/connections/peer/$volume_number/ $resource_name/volumes/$volume_number/ minors/$minor_number -> ../resources/$resource_name/volumes/$volume_number/ Followup commits will populate this hierarchy with files containing statistics, diagnostic information and some attribute data. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: track details of bitmap IOLars Ellenberg
Track start and submit time of bitmap operations, and add pending bitmap IO contexts to a new pending_bitmap_io list. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: improve throttling decisions of background resynchronisationLars Ellenberg
Background resynchronisation does some "side-stepping", or throttles itself, if it detects application IO activity, and the current resync rate estimate is above the configured "cmin-rate". What was not detected: if there is no application IO, because it blocks on activity log transactions. Introduce a new atomic_t ap_actlog_cnt, tracking such blocked requests, and count non-zero as application IO activity. This counter is exposed at proc_details level 2 and above. Also make sure to release the currently locked resync extent if we side-step due to such voluntary throttling. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: add lists to find oldest pending requestsLars Ellenberg
Adding requests to per-device fifo lists as soon as possible after allocating them leaves a simple list_first_entry_or_null() to find the oldest request, regardless what it is still waiting for. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: gather detailed timing statistics for drbd_requestsLars Ellenberg
Record (in jiffies) how much time a request spends in which stages. Followup commits will use and present this additional timing information so we can better locate and tackle the root causes of latency spikes, or present the backlog for asynchronous replication. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: track meta data IO intent, start and submit timeLars Ellenberg
For diagnostic purposes, track intent, start time and latest submit time of meta data IO. Move separate members from struct drbd_device into the embeded struct drbd_md_io. s/md_io_(page|in_use)/md_io.\1/ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>