summaryrefslogtreecommitdiff
path: root/drivers/block/drbd
AgeCommit message (Collapse)Author
2014-07-10drbd: get rid of drbd_queue_work_frontLars Ellenberg
The last user was al_write_transaction, if called with "delegate", and the last user to call it with "delegate = true" was the receiver thread, which has no need to delegate, but can call it himself. Finally drop the delegate parameter, drop the extra w_al_write_transaction callback, and drop drbd_queue_work_front. Do not (yet) change dequeue_work_item to dequeue_work_batch, though. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: use drbd_device_post_work() in more placesLars Ellenberg
This replaces the md_sync_work member of struct drbd_device by a new MD_SYNC "work bit" in device->flags. This replaces the resync_start_work member of struct drbd_device by a new RS_START "work bit" in device->flags. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: make sure disk cleanup happens in worker contextLars Ellenberg
The recent fix to put_ldev() (correct ordering of access to local_cnt and state.disk; memory barrier in __drbd_set_state) guarantees that the cleanup happens exactly once. However it does not yet guarantee that the cleanup happens from worker context, the last put_ldev() may still happen from atomic context, which must not happen: blkdev_put() may sleep. Fix this by scheduling the cleanup to the worker instead, using a couple more bits in device->flags and a new helper, drbd_device_post_work(). Generalized the "resync progress" work to cover these new work bits. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: close race when detaching from diskLars Ellenberg
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: bd_release+0x21/0x70 Process drbd_w_t7146 Call Trace: close_bdev_exclusive drbd_free_ldev [drbd] drbd_ldev_destroy [drbd] w_after_state_ch [drbd] Race probably went like this: state.disk = D_FAILED ... first one to hit zero during D_FAILED: put_ldev() /* ----------------> 0 */ i = atomic_dec_return() if (i == 0) if (state.disk == D_FAILED) schedule_work(go_diskless) /* 1 <------ */ get_ldev_if_state() go_diskless() do_some_pre_cleanup() corresponding put_ldev(): force_state(D_DISKLESS) /* 0 <------ */ i = atomic_dec_return() if (i == 0) atomic_inc() /* ---------> 1 */ state.disk = D_DISKLESS schedule_work(after_state_ch) /* execution pre-empted by IRQ ? */ after_state_ch() put_ldev() i = atomic_dec_return() /* 0 */ if (i == 0) if (state.disk == D_DISKLESS) if (state.disk == D_DISKLESS) drbd_ldev_destroy() drbd_ldev_destroy(); Trying to fix this by checking the disk state *before* the atomic_dec_return(), which implies memory barriers, and by inserting extra memory barriers around the state assignment in __drbd_set_state(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: explicitly submit meta data requests with REQ_NOIDLELars Ellenberg
For some reason we have assumed NOIDLE was implied by one of the other flags we set. It is not (anymore?). Explicitly set REQ_NOIDLE for synchronous meta data updates, or we can seriously starve random writes when using CFQ. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: move set_disk_ro() to after we persisted the new roleLars Ellenberg
This probably does not have any real life impact, but we should first persist any potentially new UUID and other meta data flags, as well as our new role, before we allow/disallow write access. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: trigger tcp_push_pending_frames() for PING and PING_ACKLars Ellenberg
This should reduce latency for such in-DRBD-protocol "pings", and may help reduce spurious disconnect/reconnect cycles due to "PingAck did not arrive in time." Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: re-add lost conf_mutex protection in drbd_set_roleLars Ellenberg
The conf_update mutex used to be held while clearing the net_conf->discard_my_data flag inside drbd_set_role. It was moved into drbd_adm_set_role with drbd: allow parallel promote/demote actions but then replaced at that location by the newly introduced adm_mutex with drbd: Fix a potential deadlock in drbdsetup, introduce resource->adm_mutex And I simply forgot to put it back in at the original location. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: stop the meta data sync timer before open coded meta data syncLars Ellenberg
If we re-write all meta data due to resize, we have open-coded write-out of our meta data super block. Stop the md_sync_timer, it would just trigger scary but in this case spurious "timer expired" messages. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: fix resync finished detectionLars Ellenberg
This fixes one recent regresion, and one long existing bug. The bug: drbd_try_clear_on_disk_bm() assumed that all "count" bits have to be accounted in the resync extent corresponding to the start sector. Since we allow application requests to cross our "extent" boundaries, this assumption is no longer true, resulting in possible misaccounting, scary messages ("BAD! sector=12345s enr=6 rs_left=-7 rs_failed=0 count=58 cstate=..."), and potentially, if the last bit to be cleared during resync would reside in previously misaccounted resync extent, the resync would never be recognized as finished, but would be "stalled" forever, even though all blocks are in sync again and all bits have been cleared... The regression was introduced by drbd: get rid of atomic update on disk bitmap works For an "empty" resync (rs_total == 0), we must not "finish" the resync on the SyncSource before the SyncTarget knows all relevant information (sync uuid). We need to wait for the full round-trip, the SyncTarget will then explicitly notify us. Also for normal, non-empty resyncs (rs_total > 0), the resync-finished condition needs to be tested before the schedule() in wait_for_work, or it is likely to be missed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: fix a race stopping the worker threadLars Ellenberg
We may implicitly call drbd_send() from inside wait_for_work(), via maybe_send_barrier(). If the "stop" signal was send just before that, drbd_send() would call flush_signals(), and we would run an unbounded schedule() afterwards. Fix: check for thread_state == RUNNING before we schedule() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: get rid of atomic update on disk bitmap worksLars Ellenberg
Just trigger the occasional lazy bitmap write-out during resync from the central wait_for_work() helper. Previously, during resync, bitmap pages would be written out separately, synchronously, one at a time, at least 8 times each (every 512 bytes worth of bitmap cleared). Now we trigger "merge friendly" bulk write out of all cleared pages every two seconds during resync, and once the resync is finished. Most pages will be written out only once. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: allow write-ordering policy to be bumped up againLars Ellenberg
Previously, once you disabled flushes as a means of enforcing write-ordering, you'd need to detach/re-attach to enable them again. Allow drbdsetup disk-options to re-enable previously disabled write-ordering policy options at runtime. While at it fix RCU in drbd_bump_write_ordering() max_allowed_wo() uses rcu_dereference, therefore it must be called within rcu_read_lock()/rcu_read_unlock() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: refactor use of first_peer_device()Lars Ellenberg
Reduce the number of calls to first_peer_device(). Instead, call first_peer_device() just once to assign a local variable peer_device. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: reduce number of spinlock drop/re-aquire cyclesLars Ellenberg
Instead of dropping and re-aquiring the spinlock around the submit, just remember that we want to submit, and do that only once we have dropped the spinlock for good. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: rename drbd_free_bc() to drbd_free_ldev()Philipp Reisner
Since the member of drbd_device is called ldev Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: device->ldev is not guaranteed on an D_ATTACHING diskPhilipp Reisner
Some parts of the code assumed that get_ldev_if_state(device, D_ATTACHING) is sufficient to access the ldev member of the device object. That was wrong. ldev may not be there or might be freed at any time if the device has a disk state of D_ATTACHING. bm_rw() Documented that drbd_bm_read() is only called from drbd_adm_attach. drbd_bm_write() is only called when a reference is held, and it is documented that a caller has to hold a reference before calling drbd_bm_write() drbd_bm_write_page() Use get_ldev() instead of get_ldev_if_state(device, D_ATTACHING) drbd_bmio_set_n_write() No longer use get_ldev_if_state(device, D_ATTACHING). All callers hold a reference to ldev now. drbd_bmio_clear_n_write() All callers where holding a reference of ldev anyways. Remove the misleading get_ldev_if_state(device, D_ATTACHING) drbd_reconsider_max_bio_size() Removed the get_ldev_if_state(device, D_ATTACHING). All callers now pass a struct drbd_backing_dev* when they have a proper reference, or a NULL pointer. Before this fix, the receiver could trigger a NULL pointer deref when in drbd_reconsider_max_bio_size() drbd_bump_write_ordering() Used get_ldev_if_state(device, D_ATTACHING) with the wrong assumption. Remove it, and allow the caller to pass in a struct drbd_backing_dev* when the caller knows that accessing this bdev is safe. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: Move write_ordering from connection to resourcePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: fix regression 'out of mem, failed to invoke fence-peer helper'Lars Ellenberg
Since linux kernel 3.13, kthread_run() internally uses wait_for_completion_killable(). We sometimes may use kthread_run() while we still have a signal pending, which we used to kick our threads out of potentially blocking network functions, causing kthread_run() to mistake that as a new fatal signal and fail. Fix: flush_signals() before kthread_run(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-06-25drbd: fix NULL pointer deref in blk_add_request_payloadLars Ellenberg
Discards don't have any payload. But the scsi layer still expects a bio_vec it can use internally, see sd_setup_discard_cmnd() and blk_add_request_payload(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: use list_first_entry_or_null in first_peer_device/first_connectionLars Ellenberg
If there are no peer_devices or connections, I'd rather have NULL than some "arbitrary" address pretending to point to a struct. Helps to avoid hard to debug symptoms, in case we ever try to use and dereference a drbd_connection or drbd_peer_device where we in fact don't have any connection at all. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Allow attaching of a newly created device to any backing devicePhilipp Reisner
A newly created device was never exposed before, i.e. has a exposed_data_uuid of 0. Then it is valid to attach to any current_uuid of a backing device (of course also to a newly created one (4)) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Test cstate while holding req_lockPhilipp Reisner
In case a connection transitions into C_TIMEOUT within the timer function (request_timer_fn()) we need to make sure that the receiver thread (potentially running on a different CPU) sees the updated cstate later on. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: use blk_set_stacking_limits()Philipp Reisner
...instead directly assigning to q->limits.discard_zeroes_data Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: evaluate disk and network timeout on different requestsLars Ellenberg
Just because it is the oldest not yet completed request does not make it the oldest request waiting for disk. Or waiting for the peer. And we completely missed already completed requests that would still hold references to activity log extents, waiting only for the barrier ack. Find two oldest not yet completely processed requests, one that is still waiting for local completion, and one that is still waiting for some response from the peer. These may or may not be the same request object. Then separately apply the network and disk timeouts, respectively. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Fix a hole in the challange-response connection authenticationPhilipp Reisner
In the implementation as it was, the two peers sent each other a challenge, and expects the challenge hashed with the shared secret back. A attacker could simply wait for the challenge of the peer, and send the same challenge back. Then it waits for the response, and sends the same response back. Prevent this by not accepting a challenge from the peer that is the same as the challenge sent to the peer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: always implicitly close last epoch when idleLars Ellenberg
Once our sender thread needs to wait_for_work(), and actually needs to schedule(), just before we do that, we already check if it is useful to implicitly close the last epoch. The condition was too strict: only implicitly close the epoch, if there have been no new (write) requests at all. The assumption was that if there were new requests, they would always be communicated one way or another, and would send necessary epoch separating barriers explicitly. This is not always true, e.g. when becoming diskless, or while explicitly starting a full resync. The last communicated epoch could stay open for a long time, locking down corresponding activity log extents. It is safe to always implicitly send that last barrier, as soon as we determin that there cannot be more requests in the last communicated epoch, even if there have been (uncommunicated) new requests in new epochs meanwhile. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: add back some fairness to AL transactionsLars Ellenberg
When batching more updates to the activity log into single transactions, we lost the ability for new requests to force themselves into the active set: all preparation steps became non-blocking, and if all currently hot extents keep busy, they could starve out new incoming requests to cold extents for quite a while. This can only happen if your IO backend accepts more IO operations per average DRBD replication round trip time than you have al-extents configured. If we have incoming requests to cold extents, at least do one blocking update per transaction. In an artificial worst-case workload on SSD with an asynchronous 600 ms replication link, with al-extents = 7 (the minimum we allow), and concurrent full resynch, without this patch, some write requests have been observed to be starved for 40 seconds. With this patch, application observed a worst case latency of twice the replication round trip time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: keep max-bio size during detach/attach on disconnected primaryLars Ellenberg
We want to store in persistent meta data what the peer DRBD can handle, which, due to spreading requests to multiple bios, may be more than its backing device can handle. Otherwise, if a disconnected Primary temporarily loses access to its local data as well, we may accidentally shrink the max-bio setting, portentially causing already assembled, but not yet processed, application bios to be spuriously failed due to device limits. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: fix a race between start_resync and send_and_submitLars Ellenberg
In the drbd make request function, specifically in drbd_send_and_submit(), we decide whether we want to send the actual write request, or only a "set this block out of sync" information. We do so based on the current connection state, while holding the req_lock. The connection state is not supposed to change while holding the req_lock. But in drbd_start_resync, we did change that state anyways, while only holding the global_state_lock, which is enough to change sync-after dependencies (paused vs active resync), but not good enough to change the connection state. Fix: in drbd_start_resync, first grab the req_lock to serialize with drbd_send_and_submit(), before grabbing the global_state_lock to be able to evaluate the sync-after dependencies. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Enable QUEUE_FLAG_DISCARD only if the peer can recieve P_TRIMLars Ellenberg
Allow the user of REQ_DISCARD. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: prepare sending side for REQ_DISCARDLars Ellenberg
Note that I do NOT call __drbd_chk_io_error for failed REQ_DISCARD. That may be wrong, though, or needs to differ between EOPNOTSUPP and other errors... Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: prepare receiving side for REQ_DISCARDLars Ellenberg
If the receiver needs to serve a discard request on a queue that does not announce to be discard cabable, it falls back to do synchronous blkdev_issue_zeroout(). We expect only "reasonably" large (up to one activity log extent?) discard requests. We do this to not to not block the receiver for too long in this fallback code path, and to not set/clear too many bits inside one spinlock_irq_save() in drbd_set_in_sync/drbd_set_out_of_sync, Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: allow parallel promote/demote actionsLars Ellenberg
We plan to use genl_family->parallel_ops = true in the future, but need to review all possible interactions first. For now, only selectively drop genl_lock() in drbd_set_role(), instead serializing on our own internal resource->conf_update mutex. We now can be promoted/demoted on many resources in parallel, which may significantly improve cluster failover times when fencing is required. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: perpare for genetlink parallel_opsLars Ellenberg
Because all administrative requests via genetlink have been globally serialized via genl_lock(), we used to have one static struct drbd_config_context "admin context". Move this on-stack to the respective callback functions. This will allow us to selectively drop the genl_lock() (or use genl_family->parallel_ops) in the future. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Do not BUG() when connection breaks in a special wayPhilipp Reisner
When a 'cluster wide' disconnect executes, the result comes back from the peer, and immediately after that the connection breaks then _conn_rq_cond() reported back SS_CW_SUCCESS. Therefore _conn_request_state() calls conn_set_state(), which has a BUG() in it. The BUG() is hit because conn_is_valid_transition() does not like the transaction. Which goes back to is_valid_soft_transition() returning SS_OUTDATE_WO_CONN. This fix is to consider an error reported by is_valid_soft_transition() even when the peer agreed to the transaction. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: don't let application IO pre-empt resync too oftenLars Ellenberg
Before, application IO could pre-empt resync activity for up to hardcoded 20 seconds per resync request. A very busy server could throttle the effective resync bandwidth down to one request per 20 seconds. Now, we only let application IO pre-empt resync traffic while the current resync rate estimate is above c-min-rate. If you disable the c-min-rate throttle feature (set c-min-rate = 0), application IO will no longer pre-empt resync traffic at all. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: fix potential distributed deadlock during verify or resyncLars Ellenberg
If max-buffers and socket buffer sizes are "too small" for the chosen resync rate, this could lead potentially lead to a distributed deadlock, which may or may not resolve itself via the "ko-count" and request timeout mechanism, or could be resolved by forced disconnect. One option to deal with this is proper configuration: use larger max-buffer and socket buffers settings, or reduce the resync rate. But even with bad configuration we should not deadlock, but "gracefully" recover. The issue is avoided by using only up to max-buffers/2 for resync requests, and by using max-buffers not as a hard limit for data buffer allocations, but as a throttle threshold only. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: resync: fix too large bursts for very slow ratesLars Ellenberg
While merging adjacent dirty blocks into resync requests, the resync rate throttle was disregarded. For very low resync rates, the effective rate may have exceeded the intended rate by a larger margin. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: fix stalled resync detection in /proc/drbdLars Ellenberg
If we don't make resync or verify progress for "too long", we want to flag it as "stalled". Since 2010, "use rolling marks for resync speed calculation" this "too long" was wrong by a factor of HZ. With HZ 250, it would have been flagged as stalled after 100 minutes. Hardcode 3 minutes instead. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Allow online layout change of AL while peer is not connectedPhilipp Reisner
If a user forces the operation he takes the blame in case the peer does not have enough space. No reason to dey this... Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Remove drbd_wrappers.hPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Leave IO suspended if the fence handler find the peer primaryPhilipp Reisner
Actually we are clearing the susp_fen flag if we are not going to call a fencing handler. For setting the susp_fen flag needs to be edge-triggerd, and not level triggered. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Break a deadlock while concurrent fencing and establishing a connectionPhilipp Reisner
When we need to outdate the peer while being promoted to primary, and the connection gets established at the same time, we deadlock in drbd_try_outdate_peer() when trying to clear the susp_fen bit. Fix this by setting the STATE_SENT bit while holding the mutex. Using drbd_change_state(.. , CS_HARD, ..) which does not block until STATE_SENT is cleared, is only for clearness. It does not contribute anything to the fix. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it *does* shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...
2014-04-01drbd: don't open-code kernel_recvmsg()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-21drbd: Fix future possible NULL pointer dereferenceAndreas Gruenbacher
Right now every resource has exactly one connection. But we are preparing for dynamic connections. I.e. in the future thre can be resources without connections. However smatch points this out as 'variable dereferenced before check', which is correct. This issue was introduced in drbd: get_one_status(): Iterate over resource->devices instead of connection->peer_devices Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-17drbd: Add drbd_thread->resource and make drbd_thread->connection optionalAndreas Gruenbacher
In the drbd_thread "infrastructure" functions, only use the resource instead of the connection. Make the connection field of drbd_thread optional. This will allow to introduce threads which are not associated with a connection. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drbd: Use the right peer deviceAndreas Gruenbacher
in w_e_ (peer request) callbacks and in peer request I/O completion handlers Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drbd: Remove unused parameter of wire_flags_to_bio()Andreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>