summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-10-26btrfs: defrag: remove the old infrastructureQu Wenruo
Now the old infrastructure can all be removed, defrag Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()Qu Wenruo
The function defrag_one_cluster() is able to defrag one range well enough, we only need to do preparation for it, including: - Clamp and align the defrag range - Exclude invalid cases - Proper inode locking The old infrastructures will not be removed in this patch, as it would be too noisy to review. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: introduce helper to defrag one clusterQu Wenruo
This new helper, defrag_one_cluster(), will defrag one cluster (at most 256K): - Collect all initial targets - Kick in readahead when possible - Call defrag_one_range() on each initial target With some extra range clamping. - Update @sectors_defragged parameter This involves one behavior change, the defragged sectors accounting is no longer as accurate as old behavior, as the initial targets are not consistent. We can have new holes punched inside the initial target, and we will skip such holes later. But the defragged sectors accounting doesn't need to be that accurate anyway, thus I don't want to pass those extra accounting burden into defrag_one_range(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: introduce helper to defrag a rangeQu Wenruo
A new helper, defrag_one_range(), is introduced to defrag one range. This function will mostly prepare the needed pages and extent status for defrag_one_locked_target(). As we can only have a consistent view of extent map with page and extent bits locked, we need to re-check the range passed in to get a real target list for defrag_one_locked_target(). Since defrag_collect_targets() will call defrag_lookup_extent() and lock extent range, we also need to teach those two functions to skip extent lock. Thus new parameter, @locked, is introduced to skip extent lock if the caller has already locked the range. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: introduce helper to defrag a contiguous prepared rangeQu Wenruo
A new helper, defrag_one_locked_target(), introduced to do the real part of defrag. The caller needs to ensure both page and extents bits are locked, and no ordered extent exists for the range, and all writeback is finished. The core defrag part is pretty straight-forward: - Reserve space - Set extent bits to defrag - Update involved pages to be dirty Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: introduce helper to collect target file extentsQu Wenruo
Introduce a helper, defrag_collect_targets(), to collect all possible targets to be defragged. This function will not consider things like max_sectors_to_defrag, thus caller should be responsible to ensure we don't exceed the limit. This function will be the first stage of later defrag rework. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: factor out page preparation into a helperQu Wenruo
In cluster_pages_for_defrag(), we have complex code block inside one for() loop. The code block is to prepare one page for defrag, this will ensure: - The page is locked and set up properly. - No ordered extent exists in the page range. - The page is uptodate. This behavior is pretty common and will be reused by later defrag rework. So factor out the code into its own helper, defrag_prepare_one_page(), for later usage, and cleanup the code by a little. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: replace hard coded PAGE_SIZE with sectorsizeQu Wenruo
When testing subpage defrag support, I always find some strange inode nbytes error, after a lot of debugging, it turns out that defrag_lookup_extent() is using PAGE_SIZE as size for lookup_extent_mapping(). Since lookup_extent_mapping() is calling __lookup_extent_mapping() with @strict == 1, this means any extent map smaller than one page will be ignored, prevent subpage defrag to grab a correct extent map. There are quite some PAGE_SIZE usage in ioctl.c, but most of them are correct usages, and can be one of the following cases: - ioctl structure size check We want ioctl structure to be contained inside one page. - real page operations The remaining cases in defrag_lookup_extent() and check_defrag_in_cache() will be addressed in this patch. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: also check PagePrivate for subpage cases in ↵Qu Wenruo
cluster_pages_for_defrag() In function cluster_pages_for_defrag() we have a window where we unlock page, either start the ordered range or read the content from disk. When we re-lock the page, we need to make sure it still has the correct page->private for subpage. Thus add the extra PagePrivate check here to handle subpage cases properly. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: defrag: pass file_ra_state instead of file to btrfs_defrag_file()Qu Wenruo
Currently btrfs_defrag_file() accepts both "struct inode" and "struct file" as parameter. We can easily grab "struct inode" from "struct file" using file_inode() helper. The reason why we need "struct file" is just to re-use its f_ra. Change this to pass "struct file_ra_state" parameter, so that it's more clear what we really want. Since we're here, also add some comments on the function btrfs_defrag_file(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: rename and switch to bool btrfs_chunk_readonlyAnand Jain
btrfs_chunk_readonly() checks if the given chunk is writeable. It returns 1 for readonly, and 0 for writeable. So the return argument type bool shall suffice instead of the current type int. Also, rename btrfs_chunk_readonly() to btrfs_chunk_writeable() as we check if the bg is writeable, and helps to keep the logic at the parent function simpler to understand. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: reflink: initialize return value to 0 in btrfs_extent_same()Sidong Yang
Fix a warning reported by smatch that ret could be returned without initialized. The dedupe operations are supposed to to return 0 for a 0 length range but the caller does not pass olen == 0. To keep this behaviour and also fix the warning initialize ret to 0. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: subpage: pack all subpage bitmaps into a larger bitmapQu Wenruo
Currently we use u16 bitmap to make 4k sectorsize work for 64K page size. But this u16 bitmap is not large enough to contain larger page size like 128K, nor is space efficient for 16K page size. To handle both cases, here we pack all subpage bitmaps into a larger bitmap, now btrfs_subpage::bitmaps[] will be the ultimate bitmap for subpage usage. Each sub-bitmap will has its start bit number recorded in btrfs_subpage_info::*_start, and its bitmap length will be recorded in btrfs_subpage_info::bitmap_nr_bits. All subpage bitmap operations will be converted from using direct u16 operations to bitmap operations, with above *_start calculated. For 64K page size with 4K sectorsize, this should not cause much difference. While for 16K page size, we will only need 1 unsigned long (u32) to store all the bitmaps, which saves quite some space. Furthermore, this allows us to support larger page size like 128K and 258K. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26fs: remove leftover comments from mandatory locking removalJeff Layton
Stragglers from commit f7e33bdbd6d1 ("fs: remove mandatory file locking support"). Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-10-25fscrypt: improve a few commentsEric Biggers
Improve a few comments. These were extracted from the patch "fscrypt: add support for hardware-wrapped keys" (https://lore.kernel.org/r/20211021181608.54127-4-ebiggers@kernel.org). Link: https://lore.kernel.org/r/20211026021042.6581-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-10-25btrfs: subpage: introduce btrfs_subpage_bitmap_infoQu Wenruo
Currently we use fixed size u16 bitmap for subpage bitmap. This is fine for 4K sectorsize with 64K page size. But for 4K sectorsize and larger page size, the bitmap is too small, while for smaller page size like 16K, u16 bitmaps waste too much space. Here we introduce a new helper structure, btrfs_subpage_bitmap_info, to record the proper bitmap size, and where each bitmap should start at. By this, we can later compact all subpage bitmaps into one u32 bitmap. This patch is the first step. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25btrfs: subpage: make btrfs_alloc_subpage() return btrfs_subpage directlyQu Wenruo
The existing calling convention of btrfs_alloc_subpage() is pretty awful. Change it to a more common pattern by returning struct btrfs_subpage directly and let the caller to determine if the call succeeded. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25btrfs: subpage: only call btrfs_alloc_subpage() when sectorsize is smaller ↵Qu Wenruo
than PAGE_SIZE There are two call sites of btrfs_alloc_subpage(): - btrfs_attach_subpage() We have ensured sectorsize is smaller than PAGE_SIZE - alloc_extent_buffer() We call btrfs_alloc_subpage() unconditionally. The alloc_extent_buffer() forces us to check the sectorsize size against page size inside btrfs_alloc_subpage(). Since the function name, btrfs_alloc_subpage(), already indicates it should only get called for subpage cases, do the check in alloc_extent_buffer() and add an ASSERT() in btrfs_alloc_subpage(). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25btrfs: update comment for fs_devices::seed_list in btrfs_rm_deviceSu Yue
Update it since commit 944d3f9fac61 ("btrfs: switch seed device to list api") did conversion from fs_devices::seed to fs_devices::seed_list. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25btrfs: drop unnecessary ret in ioctl_quota_rescan_statusAnand Jain
There is no need for the variable ret after d66105cfa873 ("btrfs: allocate btrfs_ioctl_quota_rescan_args on stack"), remove it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25btrfs: send: simplify send_create_inode_if_neededMarcos Paulo de Souza
The out label is being overused, we can simply return if the condition permits. No functional changes. Reviewed-by: Su Yue <l@damenly.su> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25btrfs: rename btrfs_alloc_chunk to btrfs_create_chunkNikolay Borisov
The user facing function used to allocate new chunks is btrfs_chunk_alloc, unfortunately there is yet another similar sounding function - btrfs_alloc_chunk. This creates confusion, especially since the latter function can be considered "private" in the sense that it implements the first stage of chunk creation and as such is called by btrfs_chunk_alloc. To avoid the awkwardness that comes with having similarly named but distinctly different in their purpose function rename btrfs_alloc_chunk to btrfs_create_chunk, given that the main purpose of this function is to orchestrate the whole process of allocating a chunk - reserving space into devices, deciding on characteristics of the stripe size and creating the in-memory structures. Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25fs: get rid of the res2 iocb->ki_complete argumentJens Axboe
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clusterise ki_flags access in rw_prepPavel Begunkov
ioprio setup doesn't depend on other fields that are modified in io_prep_rw() and we can move it down in the function without worrying about performance. It's useful as it makes iocb->ki_flags accesses/modifications closer together, so it's more likely the compiler will cache it in a register and avoid extra reloads. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8ee98779c06f1b59f6039b1e292db4332efd664b.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: kill unused param from io_file_supports_nowaitPavel Begunkov
io_file_supports_nowait() doesn't use rw argument anymore, remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4bd6709fc573d70c866ea656cb7a7dbe94be8026.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean up timeout async_data allocationPavel Begunkov
opcode prep functions are one of the first things that are called, we can't have ->async_data allocated at this point and it's certainly a bug. Reflect this assumption in io_timeout_prep() and add a WARN_ONCE just in case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/75a28ca7dbcc5af8b6cd9092819e8384c24dedd4.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: don't try io-wq polling if not supportedPavel Begunkov
If an opcode doesn't support polling, just let it be executed synchronously in iowq, otherwise it will do a nonblock attempt just to fail in io_arm_poll_handler() and return back to blocking execution. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6401256db01b88f448f15fcd241439cb76f5b940.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: check if opcode needs poll first on armingPavel Begunkov
->pollout or ->pollin are set only for opcodes that need a file, so if io_arm_poll_handler() tests them first we can be sure that the request has file set and the ->file check can be removed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9adfe4f543d984875e516fce6da35348aab48668.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean iowq submit work cancellationPavel Begunkov
If we've got IO_WQ_WORK_CANCEL in io_wq_submit_work(), handle the error on the same lines as the check instead of having a weird code flow. The main loop doesn't change but goes one indention left. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff4a09cf41f7a22bbb294b6f1faea721e21fe615.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean io_wq_submit_work()'s main loopPavel Begunkov
Do a bit of cleaning for the main loop of io_wq_submit_work(). Get rid of switch, just replace it with a single if as we're retrying in both other cases. Kill issue_sqe label, Get rid of needs_poll nesting and disambiguate a bit the comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ed12ce0c64e051f9a6b8a37a24f8ea554d299c29.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25gfs2: Fix unused value warning in do_gfs2_set_flags()Tim Gardner
Coverity complains of an unused value: CID 119623 (#1 of 1): Unused value (UNUSED_VALUE) assigned_value: Assigning value -1 to error here, but that stored value is overwritten before it can be used. 237 error = -EPERM; Fix it by removing the assignment. Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: check context in gfs2_glock_putAlexander Aring
Add a might_sleep call into gfs2_glock_put which can sleep in DLM when the last reference is released. This will show problems earlier, and not only when the last reference is put. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Fix glock_hash_walk bugsAndreas Gruenbacher
So far, glock_hash_walk took a reference on each glock it iterated over, and it was the examiner's responsibility to drop those references. Dropping the final reference to a glock can sleep and the examiners are called in a RCU critical section with spin locks held, so examiners that didn't need the extra reference had to drop it asynchronously via gfs2_glock_queue_put or similar. This wasn't done correctly in thaw_glock which did call gfs2_glock_put, and not at all in dump_glock_func. Change glock_hash_walk to not take glock references at all. That way, the examiners that don't need them won't have to bother with slow asynchronous puts, and the examiners that do need references can take them themselves. Reported-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Cancel remote delete work asynchronouslyAndreas Gruenbacher
In gfs2_inode_lookup and gfs2_create_inode, we're calling gfs2_cancel_delete_work which currently cancels any remote delete work (delete_work_func) synchronously. This means that if the work is currently running, it will wait for it to finish. We're doing this to pevent a previous instance of an inode from having any influence on the next instance. However, delete_work_func uses gfs2_inode_lookup internally, and we can end up in a deadlock when delete_work_func gets interrupted at the wrong time. For example, (1) An inode's iopen glock has delete work queued, but the inode itself has been evicted from the inode cache. (2) The delete work is preempted before reaching gfs2_inode_lookup. (3) Another process recreates the inode (gfs2_create_inode). It tries to cancel any outstanding delete work, which blocks waiting for the ongoing delete work to finish. (4) The delete work calls gfs2_inode_lookup, which blocks waiting for gfs2_create_inode to instantiate and unlock the new inode => deadlock. It turns out that when the delete work notices that its inode has been re-instantiated, it will do nothing. This means that it's safe to cancel the delete work asynchronously. This prevents the kind of deadlock described above. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-10-25gfs2: set glock object after nqBob Peterson
Before this patch, function gfs2_create_inode called glock_set_object to set the gl_object for inode and iopen glocks before the glock was locked. That's wrong because other competing processes like evict may be blocked waiting for the glock and still have gl_object set before the actual eviction can take place. This patch moves the call to glock_set_object until after the glock is acquire in function gfs2_create_inode, so it waits for possibly competing evicts to finish their processing first. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: remove RDF_UPTODATE flagBob Peterson
The new GLF_INSTANTIATE_NEEDED flag obsoletes the old rgrp flag GFS2_RDF_UPTODATE, so this patch replaces it like we did with inodes. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Eliminate GIF_INVALID flagBob Peterson
With the addition of the new GLF_INSTANTIATE_NEEDED flag, the GIF_INVALID flag is now redundant. This patch removes it. Since inode_instantiate is only called when instantiation is needed, the check in inode_instantiate is removed too. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: fix GL_SKIP node_scope problemsBob Peterson
Before this patch, when a glock was locked, the very first holder on the queue would unlock the lockref and call the go_instantiate glops function (if one existed), unless GL_SKIP was specified. When we introduced the new node-scope concept, we allowed multiple holders to lock glocks in EX mode and share the lock. But node-scope introduced a new problem: if the first holder has GL_SKIP and the next one does NOT, since it is not the first holder on the queue, the go_instantiate op was not called. Eventually the GL_SKIP holder may call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was still a window of time in which another non-GL_SKIP holder assumes the instantiate function had been called by the first holder. In the case of rgrp glocks, this led to a NULL pointer dereference on the buffer_heads. This patch tries to fix the problem by introducing two new glock flags: GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function needs to be called to "fill in" or "read in" the object before it is referenced. GLF_INSTANTIATE_IN_PROG which is used to determine when a process is in the process of reading in the object. Whenever a function needs to reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate" function. As before, the gl_lockref spin_lock is unlocked during the IO operation, which may take a relatively long amount of time to complete. While unlocked, if another process determines go_instantiate is still needed, it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared, it needs to check GLF_INSTANTIATE_NEEDED again because the other process's go_instantiate operation may not have been successful. Functions that previously called the instantiate sub-functions now call directly into gfs2_instantiate so the new bits are managed properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: split glock instantiation off from do_promoteBob Peterson
Before this patch, function do_promote had a section of code that did the actual instantiation. This patch splits that off into its own function, gfs2_instantiate, which prepares us for the next patch that will use that function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: further simplify do_promoteBob Peterson
This patch further simplifies function do_promote by eliminating some redundant code in favor of using a lock_released flag. This is just prep work for a future patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: re-factor function do_promoteBob Peterson
This patch simply re-factors function do_promote to reduce the indents. The logic should be unchanged. This makes future patches more readable. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Remove 'first' trace_gfs2_promote argumentAndreas Gruenbacher
Remove the 'first' argument of trace_gfs2_promote: with GL_SKIP, the 'first' holder isn't the one that instantiates the glock (gl_instantiate), which is what the 'first' flag was apparently supposed to indicate. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: change go_lock to go_instantiateBob Peterson
Before this patch, the go_lock glock operations (glops) did not do any actual locking. They were used to instantiate objects, like reading in dinodes and rgrps from the media. This patch renames the functions to go_instantiate for clarity. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: dump glocks from gfs2_consist_OBJ_iBob Peterson
Before this patch, failed consistency checks printed out the object that failed, but not the object's glock. This patch makes it also print out the object glock so we can see the glock's holders and flags to aid with debugging. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: dequeue iopen holder in gfs2_inode_lookup errorBob Peterson
Before this patch, if function gfs2_inode_lookup encountered an error after it had locked the iopen glock, it never unlocked it, relying on the evict code to do the cleanup. The evict code then took the inode glock while holding the iopen glock, which violates the locking order. For example, (1) node A does a gfs2_inode_lookup that fails, leaving the iopen glock locked. (2) node B calls delete_work_func -> gfs2_lookup_by_inum -> gfs2_inode_lookup. It locks the inode glock and blocks trying to lock the iopen glock, which is held by node A. (3) node A eventually calls gfs2_evict_inode -> evict_should_delete. It blocks trying to lock the inode glock, which is now held by node B. This patch introduces error handling to function gfs2_inode_lookup so it properly dequeues held iopen glocks on errors. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Save ip from gfs2_glock_nq_initAndreas Gruenbacher
Before this patch, when a glock was locked by function gfs2_glock_nq_init, it initialized the holder gh_ip (return address) as gfs2_glock_nq_init. That made it extremely difficult to track down problems because many functions call gfs2_glock_nq_init. This patch changes the function so that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes it easy to backtrack which holder took the lock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-10-25gfs2: Allow append and immutable bits to coexistBob Peterson
Before this patch, function do_gfs2_set_flags checked if the append and immutable flags were being set while already set. If so, error -EPERM was given. There's no reason why these two flags should be mutually exclusive, and if you set them separately, you will, in essence, set one while it is already set. For example: chattr +a /mnt/gfs2/file1 chattr +i /mnt/gfs2/file1 The first command sets the append-only flag. Since they are additive, the second command sets the immutable flag AND append-only flag, since they both coexist in i_diskflags. So the second command should not return an error. This bug caused xfstests generic/545 to fail. This patch simply removes the invalid checks. I also eliminated an unused parm from do_gfs2_set_flags. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debugBob Peterson
In rgrp.c, there are several places where it does BUG_ON. This tells us the call stack but nothing more, which is not very helpful. This patch switches them to GLOCK_BUG_ON which also prints the glock, its holders, and many of the rgrp values, which will help us debug problems in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: move GL_SKIP check from glops to do_promoteBob Peterson
Before this patch, each individual "go_lock" glock operation (glop) checked the GL_SKIP flag, and if set, would skip further processing. This patch changes the logic so the go_lock caller, function go_promote, checks the GL_SKIP flag before calling the go_lock op in the first place. This avoids having to unnecessarily unlock gl_lockref.lock only to re-lock it again. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25gfs2: Add GL_SKIP holder flag to dump_holderBob Peterson
Somehow, the GL_SKIP flag was missed when dumping glock holders. This patch adds it to function hflags2str. I added it at the end because I wanted Holder and Skip flags together to read "Hs" rather than "sH" to avoid confusion with "Shared" ("SH") holder state. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>