summaryrefslogtreecommitdiff
path: root/fs/gfs2/glock.c
AgeCommit message (Collapse)Author
2021-10-25gfs2: Remove 'first' trace_gfs2_promote argumentAndreas Gruenbacher
Remove the 'first' argument of trace_gfs2_promote: with GL_SKIP, the 'first' holder isn't the one that instantiates the glock (gl_instantiate), which is what the 'first' flag was apparently supposed to indicate. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: change go_lock to go_instantiateBob Peterson
Before this patch, the go_lock glock operations (glops) did not do any actual locking. They were used to instantiate objects, like reading in dinodes and rgrps from the media. This patch renames the functions to go_instantiate for clarity. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Save ip from gfs2_glock_nq_initAndreas Gruenbacher
Before this patch, when a glock was locked by function gfs2_glock_nq_init, it initialized the holder gh_ip (return address) as gfs2_glock_nq_init. That made it extremely difficult to track down problems because many functions call gfs2_glock_nq_init. This patch changes the function so that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes it easy to backtrack which holder took the lock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-10-25gfs2: move GL_SKIP check from glops to do_promoteBob Peterson
Before this patch, each individual "go_lock" glock operation (glop) checked the GL_SKIP flag, and if set, would skip further processing. This patch changes the logic so the go_lock caller, function go_promote, checks the GL_SKIP flag before calling the go_lock op in the first place. This avoids having to unnecessarily unlock gl_lockref.lock only to re-lock it again. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Add GL_SKIP holder flag to dump_holderBob Peterson
Somehow, the GL_SKIP flag was missed when dumping glock holders. This patch adds it to function hflags2str. I added it at the end because I wanted Holder and Skip flags together to read "Hs" rather than "sH" to avoid confusion with "Shared" ("SH") holder state. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20gfs2: Introduce flag for glock holder auto-demotionBob Peterson
This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that will allow glocks to be demoted automatically on locking conflicts. When a locking request comes in that isn't compatible with the locking state of an active holder and that holder has the HIF_MAY_DEMOTE flag set, the holder will be demoted before the incoming locking request is granted. Note that this mechanism demotes active holders (with the HIF_HOLDER flag set), while before we were only demoting glocks without any active holders. This allows processes to keep hold of locks that may form a cyclic locking dependency; the core glock logic will then break those dependencies in case a conflicting locking request occurs. We'll use this to avoid giving up the inode glock proactively before faulting in pages. Processes that allow a glock holder to be taken away indicate this by calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag. Later, they call gfs2_holder_disallow_demote() to clear the flag again, and then they check if their holder is still queued: if it is, they are still holding the glock; if it isn't, they can re-acquire the glock (or abort). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20gfs2: Clean up function may_grantAndreas Gruenbacher
Pass the first current glock holder into function may_grant and deobfuscate the logic there. While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant. To make that build cleanly, de-constify the may_grant arguments. We're now using function find_first_holder in do_promote, so move the function's definition above do_promote. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-08-20gfs2: Remove redundant check from gfs2_glock_dqBob Peterson
In function gfs2_glock_dq, it checks to see if this is the fast path. Before this patch, it checked both "find_first_holder(gl) == NULL" and list_empty(&gl->gl_holders), which is redundant. If gl_holders is empty then find_first_holder must return NULL. This patch removes the redundancy. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-08-20gfs2: Eliminate vestigial HIF_FIRSTBob Peterson
Holder flag HIF_FIRST is no longer used or needed, so remove it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-06-28gfs2: Use list_move_tail instead of list_del/list_add_tailBaokun Li
Using list_move_tail() instead of list_del() + list_add_tail(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-05-31gfs2: Fix use-after-free in gfs2_glock_shrink_scanHillf Danton
The GLF_LRU flag is checked under lru_lock in gfs2_glock_remove_from_lru() to remove the glock from the lru list in __gfs2_glock_put(). On the shrink scan path, the same flag is cleared under lru_lock but because of cond_resched_lock(&lru_lock) in gfs2_dispose_glock_lru(), progress on the put side can be made without deleting the glock from the lru list. Keep GLF_LRU across the race window opened by cond_resched_lock(&lru_lock) to ensure correct behavior on both sides - clear GLF_LRU after list_del under lru_lock. Reported-by: syzbot <syzbot+34ba7ddbf3021981a228@syzkaller.appspotmail.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-05-20gfs2: fix a deadlock on withdraw-during-mountBob Peterson
Before this patch, gfs2 would deadlock because of the following sequence during mount: mount gfs2_fill_super gfs2_make_fs_rw <--- Detects IO error with glock kthread_stop(sdp->sd_quotad_process); <--- Blocked waiting for quotad to finish logd Detects IO error and the need to withdraw calls gfs2_withdraw gfs2_make_fs_ro kthread_stop(sdp->sd_quotad_process); <--- Blocked waiting for quotad to finish gfs2_quotad gfs2_statfs_sync gfs2_glock_wait <---- Blocked waiting for statfs glock to be granted glock_work_func do_xmote <---Detects IO error, can't release glock: blocked on withdraw glops->go_inval glock_blocked_by_withdraw requeue glock work & exit <--- work requeued, blocked by withdraw This patch makes a special exception for the statfs system inode glock, which allows the statfs glock UNLOCK to proceed normally. That allows the quotad daemon to exit during the withdraw, which allows the logd daemon to exit during the withdraw, which allows the mount to exit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-05-20gfs2: fix scheduling while atomic bug in glocksBob Peterson
Before this patch, in the unlikely event that gfs2_glock_dq encountered a withdraw, it would do a wait_on_bit to wait for its journal to be recovered, but it never released the glock's spin_lock, which caused a scheduling-while-atomic error. This patch unlocks the lockref spin_lock before waiting for recovery. Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-05-05mm: introduce and use mapping_empty()Matthew Wilcox (Oracle)
Patch series "Remove nrexceptional tracking", v2. We actually use nrexceptional for very little these days. It's a minor pain to keep in sync with nrpages, but the pain becomes much bigger with the THP patches because we don't know how many indices a shadow entry occupies. It's easier to just remove it than keep it accurate. Also, we save 8 bytes per inode which is nothing to sneeze at; on my laptop, it would improve shmem_inode_cache from 22 to 23 objects per 16kB, and inode_cache from 26 to 27 objects. Combined, that saves a megabyte of memory from a combined usage of 25MB for both caches. Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save any memory for ext4. This patch (of 4): Instead of checking the two counters (nrpages and nrexceptional), we can just check whether i_pages is empty. Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-29Merge tag 'gfs2-for-5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix some compiler and kernel-doc warnings - Various minor cleanups and optimizations - Add a new sysfs gfs2 status file with some filesystem wide information * tag 'gfs2-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix fall-through warnings for Clang gfs2: Fix a number of kernel-doc warnings gfs2: Make gfs2_setattr_simple static gfs2: Add new sysfs file for gfs2 status gfs2: Silence possible null pointer dereference warning gfs2: Turn gfs2_meta_indirect_buffer into gfs2_meta_buffer gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extent gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extent gfs2: Add new gfs2_iomap_get helper gfs2: Remove unused variable sb_format gfs2: Fix dir.c function parameter descriptions gfs2: Eliminate gh parameter from go_xmote_bh func gfs2: don't create empty buffers for NO_CREATE
2021-04-09gfs2: Fix a number of kernel-doc warningsLee Jones
Building the kernel with W=1 results in a number of kernel-doc warnings like incorrect function names and parameter descriptions. Fix those, mostly by adding missing parameter descriptions, removing left-over descriptions, and demoting some less important kernel-doc comments into regular comments. Originally proposed by Lee Jones; improved and combined into a single patch by Andreas. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-04-08treewide: Change list_sort to use const pointersSami Tolvanen
list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-04-03gfs2: Eliminate gh parameter from go_xmote_bh funcBob Peterson
The only glock that uses go_xmote_bh glops function is the freeze glock which uses freeze_go_xmote_bh. It does not use its gh parameter, so this patch eliminates the unneeded parameter. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-17gfs2: Allow node-wide exclusive glock sharingBob Peterson
Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag set, the exclusive lock is shared among all local processes who are holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set. From the point of view of other nodes, the lock is still held exclusively. A future patch will start using this flag to improve performance with rgrp sharing. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-01Revert "GFS2: Prevent delete work from occurring on glocks used for create"Andreas Gruenbacher
Since commit a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work"), we're cancelling any pending delete work of an iopen glock before attaching a new inode to that glock in gfs2_create_inode. This means that delete_work_func can no longer be queued or running when attaching the iopen glock to the new inode, and we can revert commit a4923865ea07 ("GFS2: Prevent delete work from occurring on glocks used for create"), which tried to achieve the same but in a racy way. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-24gfs2: set lockdep subclass for iopen glocksAlexander Aring
This patch introduce a new globs attribute to define the subclass of the glock lockref spinlock. This avoid the following lockdep warning, which occurs when we lock an inode lock while an iopen lock is held: ============================================ WARNING: possible recursive locking detected 5.10.0-rc3+ #4990 Not tainted -------------------------------------------- kworker/0:1/12 is trying to acquire lock: ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20 but task is already holding lock: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&gl->gl_lockref.lock); lock(&gl->gl_lockref.lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/12: #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540 #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540 #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260 stack backtrace: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: delete_workqueue delete_work_func Call Trace: dump_stack+0x8b/0xb0 __lock_acquire.cold+0x19e/0x2e3 lock_acquire+0x150/0x410 ? lockref_get+0x9/0x20 _raw_spin_lock+0x27/0x40 ? lockref_get+0x9/0x20 lockref_get+0x9/0x20 delete_work_func+0x188/0x260 process_one_work+0x237/0x540 worker_thread+0x4d/0x3b0 ? process_one_work+0x540/0x540 kthread+0x127/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-03gfs2: Wake up when sd_glock_disposal becomes zeroAlexander Aring
Commit fc0e38dae645 ("GFS2: Fix glock deallocation race") fixed a sd_glock_disposal accounting bug by adding a missing atomic_dec statement, but it failed to wake up sd_glock_wait when that decrement causes sd_glock_disposal to reach zero. As a consequence, gfs2_gl_hash_clear can now run into a 10-minute timeout instead of being woken up. Add the missing wakeup. Fixes: fc0e38dae645 ("GFS2: Fix glock deallocation race") Cc: stable@vger.kernel.org # v2.6.39+ Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20gfs2: Only access gl_delete for iopen glocksBob Peterson
Only initialize gl_delete for iopen glocks, but more importantly, only access it for iopen glocks in flush_delete_work: flush_delete_work is called for different types of glocks including rgrp glocks, and those use gl_vm which is in a union with gl_delete. Without this fix, we'll end up clobbering gl_vm, which results in general memory corruption. Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20gfs2: Fix comments to glock_hash_walkBob Peterson
The comments before function glock_hash_walk had the wrong name and an extra parameter. This simply fixes the comments. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-15gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)Bob Peterson
Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated when a glock had a holder queued. It was only checked for inode glocks, although set and cleared by all glocks, and it was only used to determine whether the glock should be held for the minimum hold time before releasing. The problem is that the flag is not accurate at all. If a process holds the glock, the flag is set. When they dequeue the glock, it only cleared the flag in cases when the state actually changed. So if the state doesn't change, the flag may still be set, even when nothing is queued. This happens to iopen glocks often: the get held in SH, then the file is closed, but the glock remains in SH mode. We don't need a special flag to indicate this: we can simply tell whether the glock has any items queued to the holders queue. It's a waste of cpu time to maintain it. This patch eliminates the flag in favor of simply checking list_empty on the glock holders. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14gfs2: call truncate_inode_pages_final for address space glocksBob Peterson
Before this patch, we were not calling truncate_inode_pages_final for the address space for glocks, which left the possibility of a leak. We now take care of the problem instead of complaining, and we do it during glock tear-down.. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macroLiu Shixin
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-03gfs2: Fix refcount leak in gfs2_glock_pokeAndreas Gruenbacher
In gfs2_glock_poke, make sure gfs2_holder_uninit is called on the local glock holder. Without that, we're leaking a glock and a pid reference. Fixes: 9e8990dea926 ("gfs2: Smarter iopen glock waiting") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-03gfs2: Add some flags missing from glock outputBob Peterson
Before this patch, three flags were not represented in the glock output. This patch adds them in: c - GLF_INODE_CREATING P - GLF_PENDING_DELETE x - GLF_FREEING (both f and F are already used) Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-30gfs2: Don't sleep during glock hash walkAndreas Gruenbacher
In flush_delete_work, instead of flushing each individual pending delayed work item, cancel and re-queue them for immediate execution. The waiting isn't needed here because we're already waiting for all queued work items to complete in gfs2_flush_delete_work. This makes the code more efficient, but more importantly, it avoids sleeping during a rhashtable walk, inside rcu_read_lock(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05Merge branch 'gfs2-iopen' into for-nextAndreas Gruenbacher
2020-06-05gfs2: Smarter iopen glock waitingAndreas Gruenbacher
When trying to upgrade the iopen glock from a shared to an exclusive lock in gfs2_evict_inode, abort the wait if there is contention on the corresponding inode glock: in that case, the inode must still be in active use on another node, and we're not guaranteed to get the iopen glock anytime soon. To make this work even better, when we notice contention on the iopen glock and we can't evict the corresponsing inode and release the iopen glock immediately, poke the inode glock. The other node(s) trying to acquire the lock can then abort instead of timing out. Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous version of this patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Wake up when setting GLF_DEMOTEAndreas Gruenbacher
Wake up the sdp->sd_async_glock_wait wait queue when setting the GLF_DEMOTE flag. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Check inode generation number in delete_work_funcAndreas Gruenbacher
In delete_work_func, if the iopen glock still has an inode attached, limit the inode lookup to that specific generation number: in the likely case that the inode was deleted on the node on which the inode's link count dropped to zero, we can skip verifying the on-disk block type and reading in the inode. The same applies if another node that had the inode open managed to delete the inode before us. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Minor gfs2_lookup_by_inum cleanupAndreas Gruenbacher
Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode generation number will qualify: a valid inode never has a zero no_formal_ino. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Give up the iopen glock on contentionAndreas Gruenbacher
When there's contention on the iopen glock, it means that the link count of the corresponding inode has dropped to zero on a remote node which is now trying to delete the inode. In that case, try to evict the inode so that the iopen glock will be released, which will allow the remote node to do its job. When the inode is still open locally, the inode's reference count won't drop to zero and so we'll keep holding the inode and its iopen glock. The remote node will time out its request to grab the iopen glock, and when the inode is finally closed locally, we'll try to delete it ourself. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Turn gl_delete into a delayed workAndreas Gruenbacher
This requires flushing delayed work items in gfs2_make_fs_ro (which is called before unmounting a filesystem). When inodes are deleted and then recreated, pending gl_delete work items would have no effect because the inode generations will have changed, so we can cancel any pending gl_delete works before reusing iopen glocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Keep track of deleted inode generations in LVBsAndreas Gruenbacher
When deleting an inode, keep track of the generation of the deleted inode in the inode glock Lock Value Block (LVB). When trying to delete an inode remotely, check the last-known inode generation against the deleted inode generation to skip duplicate remote deletes. This avoids taking the resource group glock in order to verify the block type. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Allow ASPACE glocks to also have an lvbBob Peterson
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: introduce new gfs2_glock_assert_withdrawBob Peterson
Before this patch, asserts based on glocks did not print the glock with the error. This patch introduces a new macro, gfs2_glock_assert_withdraw which first prints the glock, then takes the assert. This also changes a few glock asserts to the new macro. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: print mapping->nrpages in glock dump for address space glocksBob Peterson
This patch makes the glock dumps in debugfs print the number of pages (nrpages) for address space glocks. This will aid in debugging. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08Revert "gfs2: Don't demote a glock until its revokes are written"Bob Peterson
This reverts commit df5db5f9ee112e76b5202fbc331f990a0fc316d6. This patch fixes a regression: patch df5db5f9ee112 allowed function run_queue() to bypass its call to do_xmote() if revokes were queued for the glock. That's wrong because its call to do_xmote() is what is responsible for calling the go_sync() glops functions to sync both the ail list and any revokes queued for it. By bypassing the call, gfs2 could get into a stand-off where the glock could not be demoted until its revokes are written back, but the revokes would not be written back because do_xmote() was never called. It "sort of" works, however, because there are other mechanisms like the log flush daemon (logd) that can sync the ail items and revokes, if it deems it necessary. The problem is: without file system pressure, it might never deem it necessary. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-05-08gfs2: If go_sync returns error, withdraw but skip invalidateBob Peterson
Before this patch, if the go_sync operation returned an error during the do_xmote process (such as unable to sync metadata to the journal) the code did goto out. That kept the glock locked, so it could not be given away, which correctly avoids file system corruption. However, it never set the withdraw bit or requeueing the glock work. So it would hang forever, unable to ever demote the glock. This patch changes to goto to a new label, skip_inval, so that errors from go_sync are treated the same way as errors from go_inval: The delayed withdraw bit is set and the work is requeued. That way, the logd should eventually figure out there's a problem and withdraw properly there. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08gfs2: Fix error exit in do_xmoteBob Peterson
Before this patch, if an error was detected from glock function go_sync by function do_xmote, it would return. But the function had temporarily unlocked the gl_lockref spin_lock, and it never re-locked it. When the caller of do_xmote tried to unlock it again, it was already unlocked, which resulted in a corrupted spin_lock value. This patch makes sure the gl_lockref spin_lock is re-locked after it is unlocked. Thanks to Wu Bo <wubo40@huawei.com> for reporting this problem. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-03-27gfs2: Switch to list_{first,last}_entryAndreas Gruenbacher
Replace open-coded versions of list_first_entry and list_last_entry with those functions. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-27gfs2: Do proper error checking for go_sync family of glops functionsBob Peterson
Before this patch, function do_xmote would try to sync out the glock dirty data by calling the appropriate glops function XXX_go_sync() but it did not check for a good return code. If the sync was not possible due to an io error or whatever, do_xmote would continue on and call go_inval and release the glock to other cluster nodes. When those nodes go to replay the journal, they may already be holding glocks for the journal records that should have been synced, but were not due to the ignored error. This patch introduces proper error code checking to the go_sync family of glops functions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Don't demote a glock until its revokes are writtenBob Peterson
Before this patch, run_queue would demote glocks based on whether there are any more holders. But if the glock has pending revokes that haven't been written to the media, giving up the glock might end in file system corruption if the revokes never get written due to io errors, node crashes and fences, etc. In that case, another node will replay the metadata blocks associated with the glock, but because the revoke was never written, it could replay that block even though the glock had since been granted to another node who might have made changes. This patch changes the logic in run_queue so that it never demotes a glock until its count of pending revokes reaches zero. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Check for log write errors before telling dlm to unlockBob Peterson
Before this patch, function do_xmote just assumed all the writes submitted to the journal were finished and successful, and it called the go_unlock function to release the dlm lock. But if they're not, and a revoke failed to make its way to the journal, a journal replay on another node will cause corruption if we let the go_inval function continue and tell dlm to release the glock to another node. This patch adds a couple checks for errors in do_xmote after the calls to go_sync and go_inval. If an error is found, we cannot withdraw yet, because the withdraw itself uses glocks to make the file system read-only. Instead, we flag the error. Later, asserts should cause another node to replay the journal before continuing, thus protecting rgrp and dinode glocks and maintaining the integrity of the metadata. Note that we only need to do this for journaled glocks. System glocks should be able to progress even under withdrawn conditions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: fix infinite loop when checking ail item count before go_invalBob Peterson
Before this patch, the rgrp_go_inval and inode_go_inval functions each checked if there were any items left on the ail count (by way of a count), and if so, did a withdraw. But the withdraw code now uses glocks when changing the file system to read-only status. So we can not have glock functions withdrawing or a hang will likely result: The glocks can't be serviced by the work_func if the work_func is busy doing its own withdraw. This patch removes the checks from the go_inval functions and adds a centralized check in do_xmote to warn about the problem and not withdraw, but flag the error so it's eventually caught when the logd daemon eventually runs. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Force withdraw to replay journals and wait for it to finishBob Peterson
When a node withdraws from a file system, it often leaves its journal in an incomplete state. This is especially true when the withdraw is caused by io errors writing to the journal. Before this patch, a withdraw would try to write a "shutdown" record to the journal, tell dlm it's done with the file system, and none of the other nodes know about the problem. Later, when the problem is fixed and the withdrawn node is rebooted, it would then discover that its own journal was incomplete, and replay it. However, replaying it at this point is almost guaranteed to introduce corruption because the other nodes are likely to have used affected resource groups that appeared in the journal since the time of the withdraw. Replaying the journal later will overwrite any changes made, and not through any fault of dlm, which was instructed during the withdraw to release those resources. This patch makes file system withdraws seen by the entire cluster. Withdrawing nodes dequeue their journal glock to allow recovery. The remaining nodes check all the journals to see if they are clean or in need of replay. They try to replay dirty journals, but only the journals of withdrawn nodes will be "not busy" and therefore available for replay. Until the journal replay is complete, no i/o related glocks may be given out, to ensure that the replay does not cause the aforementioned corruption: We cannot allow any journal replay to overwrite blocks associated with a glock once it is held. The "live" glock which is now used to signal when a withdraw occurs. When a withdraw occurs, the node signals its withdraw by dequeueing the "live" glock and trying to enqueue it in EX mode, thus forcing the other nodes to all see a demote request, by way of a "1CB" (one callback) try lock. The "live" glock is not granted in EX; the callback is only just used to indicate a withdraw has occurred. Note that all nodes in the cluster must wait for the recovering node to finish replaying the withdrawing node's journal before continuing. To this end, it checks that the journals are clean multiple times in a retry loop. Also note that the withdraw function may be called from a wide variety of situations, and therefore, we need to take extra precautions to make sure pointers are valid before using them in many circumstances. We also need to take care when glocks decide to withdraw, since the withdraw code now uses glocks. Also, before this patch, if a process encountered an error and decided to withdraw, if another process was already withdrawing, the second withdraw would be silently ignored, which set it free to unlock its glocks. That's correct behavior if the original withdrawer encounters further errors down the road. But if secondary waiters don't wait for the journal replay, unlocking glocks will allow other nodes to use them, despite the fact that the journal containing those blocks is being replayed. The replay needs to finish before our glocks are released to other nodes. IOW, secondary withdraws need to wait for the first withdraw to finish. For example, if an rgrp glock is unlocked by a process that didn't wait for the first withdraw, a journal replay could introduce file system corruption by replaying a rgrp block that has already been granted to a different cluster node. Signed-off-by: Bob Peterson <rpeterso@redhat.com>