summaryrefslogtreecommitdiff
path: root/fs/gfs2
AgeCommit message (Collapse)Author
2020-02-27gfs2: don't allow releasepage to free bd still used for revokesBob Peterson
Before this patch, function gfs2_releasepage would free any bd elements that had been used for the page being released. However, those bd elements may still be queued to the sd_log_revokes list, in which case we cannot free them until the revoke has been issued. This patch adds additional checks for bds that are still being used for revokes. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-27gfs2: flesh out delayed withdraw for gfs2_log_flushBob Peterson
Function gfs2_log_flush() had a few places where it tried to withdraw from the file system when errors were encountered. The problem is, it should delay those withdraws until the log flush lock is no longer held. This patch creates a new function just for delayed withdraws for situations like this. If errors=panic was specified on mount, we still want to do it the old fashioned way because the panic it does not help to delay in that situation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Do proper error checking for go_sync family of glops functionsBob Peterson
Before this patch, function do_xmote would try to sync out the glock dirty data by calling the appropriate glops function XXX_go_sync() but it did not check for a good return code. If the sync was not possible due to an io error or whatever, do_xmote would continue on and call go_inval and release the glock to other cluster nodes. When those nodes go to replay the journal, they may already be holding glocks for the journal records that should have been synced, but were not due to the ignored error. This patch introduces proper error code checking to the go_sync family of glops functions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Don't demote a glock until its revokes are writtenBob Peterson
Before this patch, run_queue would demote glocks based on whether there are any more holders. But if the glock has pending revokes that haven't been written to the media, giving up the glock might end in file system corruption if the revokes never get written due to io errors, node crashes and fences, etc. In that case, another node will replay the metadata blocks associated with the glock, but because the revoke was never written, it could replay that block even though the glock had since been granted to another node who might have made changes. This patch changes the logic in run_queue so that it never demotes a glock until its count of pending revokes reaches zero. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: drain the ail2 list after io errorsBob Peterson
Before this patch, gfs2_logd continually tried to flush its journal log, after the file system is withdrawn. We don't want to write anything to the journal, lest we add corruption. Best course of action is to drain the ail1 into the ail2 list (via gfs2_ail1_empty) then drain the ail2 list with a new function, ail2_drain. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Withdraw in gfs2_ail1_flush if write_cache_pages failsBob Peterson
Before this patch, function gfs2_ail1_start_one would return any errors it received from write_cache_pages (except -EBUSY) but it did not withdraw. Since function gfs2_ail1_flush just checks for the bad return code and loops, the loop might potentially never end. This patch adds some logic to allow it to exit the loop and withdraw properly when errors are received from write_cache_pages. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is emptyBob Peterson
Before this patch, if gfs2_ail_empty_gl saw there was nothing on the ail list, it would return and not flush the log. The problem is that there could still be a revoke for the rgrp sitting on the sd_log_le_revoke list that's been recently taken off the ail list. But that revoke still needs to be written, and the rgrp_go_inval still needs to call log_flush_wait to ensure the revokes are all properly written to the journal before we relinquish control of the glock to another node. If we give the glock to another node before we have this knowledge, the node might crash and its journal replayed, in which case the missing revoke would allow the journal replay to replay the rgrp over top of the rgrp we already gave to another node, thus overwriting its changes and corrupting the file system. This patch makes gfs2_ail_empty_gl still call gfs2_log_flush rather than returning. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Check for log write errors before telling dlm to unlockBob Peterson
Before this patch, function do_xmote just assumed all the writes submitted to the journal were finished and successful, and it called the go_unlock function to release the dlm lock. But if they're not, and a revoke failed to make its way to the journal, a journal replay on another node will cause corruption if we let the go_inval function continue and tell dlm to release the glock to another node. This patch adds a couple checks for errors in do_xmote after the calls to go_sync and go_inval. If an error is found, we cannot withdraw yet, because the withdraw itself uses glocks to make the file system read-only. Instead, we flag the error. Later, asserts should cause another node to replay the journal before continuing, thus protecting rgrp and dinode glocks and maintaining the integrity of the metadata. Note that we only need to do this for journaled glocks. System glocks should be able to progress even under withdrawn conditions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Prepare to withdraw as soon as an IO error occurs in log writeBob Peterson
Before this patch, function gfs2_end_log_write would detect any IO errors writing to the journal and put out an appropriate message, but it never set a withdrawing condition. Eventually, the log daemon would see the error and determine it was time to withdraw, but in the meantime, other processes could continue running as if nothing bad ever happened. The biggest consequence is that __gfs2_glock_put would BUG() when it saw that there were still unwritten items. This patch sets the WITHDRAWING status as soon as an IO error is detected, and that way, the BUG will be avoided so the file system can be properly withdrawn and unmounted. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Issue revokes more intelligentlyBob Peterson
Before this patch, function gfs2_write_revokes would call gfs2_ail1_empty, then traverse the sd_ail1_list looking for transactions that had bds which were no longer queued to a glock. And if it found some, it would try to issue revokes for them, up to a predetermined maximum. There were two problems with how it did this. First was the fact that gfs2_ail1_empty moves transactions which have nothing remaining on the ail1 list from the sd_ail1_list to the sd_ail2_list, thus making its traversal of sd_ail1_list miss them completely, and therefore, never issue revokes for them. Second was the fact that there were three traversals (or partial traversals) of the sd_ail1_list, each of which took and then released the sd_ail_lock lock: First inside gfs2_ail1_empty, second to determine if there are any revokes to be issued, and third to actually issue them. All this taking and releasing of the sd_ail_lock meant other processes could modify the lists and the conditions in which we're working. This patch simplies the whole process by adding a new parameter to function gfs2_ail1_empty, max_revokes. For normal calls, this is passed in as 0, meaning we don't want to issue any revokes. For function gfs2_write_revokes, we pass in the maximum number of revokes we can, thus allowing gfs2_ail1_empty to add the revokes where needed. This simplies the code, allows for a single holding of the sd_ail_lock, and allows gfs2_ail1_empty to add revokes for all the necessary bd items without missing any. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Add verbose option to check_journal_cleanBob Peterson
Before this patch, function check_journal_clean would give messages related to journal recovery. That's fine for mount time, but when a node withdraws and forces replay that way, we don't want all those distracting and misleading messages. This patch adds a new parameter to make those messages optional. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: fix infinite loop when checking ail item count before go_invalBob Peterson
Before this patch, the rgrp_go_inval and inode_go_inval functions each checked if there were any items left on the ail count (by way of a count), and if so, did a withdraw. But the withdraw code now uses glocks when changing the file system to read-only status. So we can not have glock functions withdrawing or a hang will likely result: The glocks can't be serviced by the work_func if the work_func is busy doing its own withdraw. This patch removes the checks from the go_inval functions and adds a centralized check in do_xmote to warn about the problem and not withdraw, but flag the error so it's eventually caught when the logd daemon eventually runs. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27gfs2: Force withdraw to replay journals and wait for it to finishBob Peterson
When a node withdraws from a file system, it often leaves its journal in an incomplete state. This is especially true when the withdraw is caused by io errors writing to the journal. Before this patch, a withdraw would try to write a "shutdown" record to the journal, tell dlm it's done with the file system, and none of the other nodes know about the problem. Later, when the problem is fixed and the withdrawn node is rebooted, it would then discover that its own journal was incomplete, and replay it. However, replaying it at this point is almost guaranteed to introduce corruption because the other nodes are likely to have used affected resource groups that appeared in the journal since the time of the withdraw. Replaying the journal later will overwrite any changes made, and not through any fault of dlm, which was instructed during the withdraw to release those resources. This patch makes file system withdraws seen by the entire cluster. Withdrawing nodes dequeue their journal glock to allow recovery. The remaining nodes check all the journals to see if they are clean or in need of replay. They try to replay dirty journals, but only the journals of withdrawn nodes will be "not busy" and therefore available for replay. Until the journal replay is complete, no i/o related glocks may be given out, to ensure that the replay does not cause the aforementioned corruption: We cannot allow any journal replay to overwrite blocks associated with a glock once it is held. The "live" glock which is now used to signal when a withdraw occurs. When a withdraw occurs, the node signals its withdraw by dequeueing the "live" glock and trying to enqueue it in EX mode, thus forcing the other nodes to all see a demote request, by way of a "1CB" (one callback) try lock. The "live" glock is not granted in EX; the callback is only just used to indicate a withdraw has occurred. Note that all nodes in the cluster must wait for the recovering node to finish replaying the withdrawing node's journal before continuing. To this end, it checks that the journals are clean multiple times in a retry loop. Also note that the withdraw function may be called from a wide variety of situations, and therefore, we need to take extra precautions to make sure pointers are valid before using them in many circumstances. We also need to take care when glocks decide to withdraw, since the withdraw code now uses glocks. Also, before this patch, if a process encountered an error and decided to withdraw, if another process was already withdrawing, the second withdraw would be silently ignored, which set it free to unlock its glocks. That's correct behavior if the original withdrawer encounters further errors down the road. But if secondary waiters don't wait for the journal replay, unlocking glocks will allow other nodes to use them, despite the fact that the journal containing those blocks is being replayed. The replay needs to finish before our glocks are released to other nodes. IOW, secondary withdraws need to wait for the first withdraw to finish. For example, if an rgrp glock is unlocked by a process that didn't wait for the first withdraw, a journal replay could introduce file system corruption by replaying a rgrp block that has already been granted to a different cluster node. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-20gfs2: Allow some glocks to be used during withdrawBob Peterson
We need to allow some glocks to be enqueued, dequeued, promoted, and demoted when we're withdrawn. For example, to maintain metadata integrity, we should disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like iopen or the transaction glocks may be safely used because none of their metadata goes through the journal. So in general, we should disallow all glocks with an address space, and allow all the others. One exception is: we need to allow our active journal to be demoted so others may recover it. Allowing glocks after withdraw gives us the ability to take appropriate action (in a following patch) to have our journal properly replayed by another node rather than just abandoning the current transactions and pretending nothing bad happened, leaving the other nodes free to modify the blocks we had in our journal, which may result in file system corruption. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: move check_journal_clean to util.c for future useBob Peterson
Before this patch function check_journal_clean was in ops_fstype.c. This patch moves it to util.c so we can make use of it elsewhere in a future patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Ignore dlm recovery requests if gfs2 is withdrawnBob Peterson
When a node fails, user space informs dlm of the node failure, and dlm instructs gfs2 on the surviving nodes to perform journal recovery. It does this by calling various callback functions in lock_dlm.c. To mark its progress, it keeps generation numbers and recover bits in a dlm "control" lock lvb, which is seen by all nodes to determine which journals need to be replayed. The gfs2 on all nodes get the same recovery requests from dlm, so they all try to do the recovery, but only one will be granted the exclusive lock on the journal. The others fail with a "Busy" message on their "try lock." However, when a node is withdrawn, it cannot safely do any recovery or replay any journals. To make matters worse, gfs2 might withdraw as a result of attempting recovery. For example, this might happen if the device goes offline, or if an hba fails. But in today's gfs2 code, it doesn't check for being withdrawn at any step in the recovery process. What's worse is that these callbacks from dlm have no return code, so there is no way to indicate failure back to dlm. We can send a "Recovery failed" uevent eventually, but that tells user space what happened, not dlm's kernel code. Before this patch, lock_dlm would perform its recovery steps but ignore the result, and eventually it would still update its generation number in the lvb, despite the fact that it may have withdrawn or encountered an error. The other nodes would then see the newer generation number in the lvb and conclude that they don't need to do recovery because the generation number is newer than the last one they saw. They think a different node has already recovered the journal. This patch adds checks to several of the callbacks used by dlm in its recovery state machine so that the functions are ignored and skipped if an io error has occurred or if the file system is withdrawn. That prevents the lvb bits from being updated, and therefore dlm and user space still see the need for recovery to take place. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Only complain the first time an io error occurs in quota or logBob Peterson
Before this patch, all io errors received by the quota daemon or the logd daemon would cause a complaint message to be issued, such as: gfs2: fsid=dm-13.0: Error 10 writing to journal, jid=0 This patch changes it so that the error message is only issued the first time the error is encountered. Also, before this patch function gfs2_end_log_write did not set the sd_log_error value, so log errors would not cause the file system to be withdrawn. This patch sets the error code so the file system is properly withdrawn if an io error is encountered writing to the journal. WARNING: This change in function breaks check xfstests generic/441 and causes it to fail: io errors writing to the log should cause a file system to be withdrawn, and no further operations are tolerated. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: log error reformBob Peterson
Before this patch, gfs2 kept track of journal io errors in two places sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags. This patch consolidates the two into sd_log_error so that it reflects the first error encountered writing to the journal. In future patches, we will take advantage of this by checking this value rather than having to check both when reacting to io errors. In addition, this fixes a tight loop in unmount: If buffers get on the ail1 list and an io error occurs elsewhere, the ail1 list would never be cleared because they were always busy. So unmount would hang, waiting for the ail1 list to empty. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Rework how rgrp buffer_heads are managedBob Peterson
Before this patch, the rgrp code had a serious problem related to how it managed buffer_heads for resource groups. The problem caused file system corruption, especially in cases of journal replay. When an rgrp glock was demoted to transfer ownership to a different cluster node, do_xmote() first calls rgrp_go_sync and then rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called gfs2_rgrp_brelse() that dropped the buffer_head reference count. In most cases, the reference count went to zero, which is right. However, there were other places where the buffers are handled differently. After rgrp_go_sync, do_xmote called rgrp_go_inval which called gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to truncate_inode_pages_range would get rid of the pages in memory, but only if the reference count drops to 0. Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL. So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer to the buffer_heads in cases where the reference count was still 1. Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time, it failed the check for "if (bi->bi_bh)" and thus failed to call brelse a second time. Because of that, the reference count on those buffers sometimes failed to drop from 1 to 0. And that caused function truncate_inode_pages_range to keep the pages in page cache rather than freeing them. The next time the rgrp glock was acquired, the metadata read of the rgrp buffers re-used the pages in memory, which were now wrong because they were likely modified by the other node who acquired the glock in EX (which is why we demoted the glock). This re-use of the page cache caused corruption because changes made by the other nodes were never seen, so the bitmaps were inaccurate. For some reason, the problem became most apparent when journal replay forced the replay of rgrps in memory, which caused newer rgrp data to be overwritten by the older in-core pages. A big part of the problem was that the rgrp buffer were released in multiple places: The go_unlock function would release them when the glock was released rather than when the glock is demoted, which is clearly wrong because our intent was to cache them until the glock is demoted from SH or EX. This patch attempts to clean up the mess and make one consistent and centralized mechanism for managing the rgrp buffer_heads by implementing several changes: 1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync. We don't want to release the buffers or zero the pointers when syncing for the reasons stated above. It only makes sense to release them when the glock is actually invalidated (go_inval). And when we do, then we set the bh pointers to NULL. 2. The go_unlock function (which was only used for rgrps) is eliminated, as we've talked about doing many times before. The go_unlock function was called too early in the glock dq process, and should not happen until the glock is invalidated. 3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd. That will now happen automatically when the rgrp glocks are demoted, and shouldn't happen any sooner or later than that. Instead, function gfs2_clear_rgrpd has been modified to demote the rgrp glocks, and therefore, free those pages, before the remaining glocks are culled by gfs2_gl_hash_clear. This prevents the gl_object from hanging around when the glocks are culled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: clear ail1 list when gfs2 withdrawsBob Peterson
This patch fixes a bug in which function gfs2_log_flush can get into an infinite loop when a gfs2 file system is withdrawn. The problem is the infinite loop "for (;;)" in gfs2_log_flush which would never finish because the io error and subsequent withdraw prevented the items from being taken off the ail list. This patch tries to clean up the mess by allowing withdraw situations to move not-in-flight buffer_heads to the ail2 list, where they will be dealt with later. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Introduce concept of a pending withdrawBob Peterson
File system withdraws can be delayed when inconsistencies are discovered when we cannot withdraw immediately, for example, when critical spin_locks are held. But delaying the withdraw can cause gfs2 to ignore the error and keep running for a short period of time. For example, an rgrp glock may be dequeued and demoted while there are still buffers that haven't been properly revoked, due to io errors writing to the journal. This patch introduces a new concept of a pending withdraw, which means an inconsistency has been discovered and we need to withdraw at the earliest possible opportunity. In these cases, we aren't quite withdrawn yet, but we still need to not dequeue glocks and other critical things. If we dequeue the glocks and the withdraw results in our journal being replayed, the replay could overwrite data that's been modified by a different node that acquired the glock in the meantime. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Return bool from gfs2_assert functionsAndreas Gruenbacher
The gfs2_assert functions only print messages when the filesystem hasn't been withdrawn yet, and they indicate whether or not they've printed something in their return value. However, none of the callers use that information, so simply return whether or not the assert has failed. (The gfs2_assert functions are still backwards; they return false when an assertion is true.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Turn gfs2_consist into void functionsAndreas Gruenbacher
Change the various gfs2_consist functions to return void. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Remove usused cluster_wide arguments of gfs2_consist functionsAndreas Gruenbacher
These arguments are always passed as 0, and they are never evaluated. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Report errors before withdrawAndreas Gruenbacher
In gfs2_rgrp_verify and compute_bitstructs, make sure to report errors before withdrawing the filesystem: otherwise, when we withdraw first and withdraw is configured to panic, we'll never get to the error reporting. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Split gfs2_lm_withdraw into two functionsAndreas Gruenbacher
Split gfs2_lm_withdraw into a function that prints an error message and a function that withdraws the filesystem. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-08Merge branch 'merge.nfs-fs_parse.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs file system parameter updates from Al Viro: "Saner fs_parser.c guts and data structures. The system-wide registry of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is the horror switch() in fs_parse() that would have to grow another case every time something got added to that system-wide registry. New syntax types can be added by filesystems easily now, and their namespace is that of functions - not of system-wide enum members. IOW, they can be shared or kept private and if some turn out to be widely useful, we can make them common library helpers, etc., without having to do anything whatsoever to fs_parse() itself. And we already get that kind of requests - the thing that finally pushed me into doing that was "oh, and let's add one for timeouts - things like 15s or 2h". If some filesystem really wants that, let them do it. Without somebody having to play gatekeeper for the variants blessed by direct support in fs_parse(), TYVM. Quite a bit of boilerplate is gone. And IMO the data structures make a lot more sense now. -200LoC, while we are at it" * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits) tmpfs: switch to use of invalfc() cgroup1: switch to use of errorfc() et.al. procfs: switch to use of invalfc() hugetlbfs: switch to use of invalfc() cramfs: switch to use of errofc() et.al. gfs2: switch to use of errorfc() et.al. fuse: switch to use errorfc() et.al. ceph: use errorfc() and friends instead of spelling the prefix out prefix-handling analogues of errorf() and friends turn fs_param_is_... into functions fs_parse: handle optional arguments sanely fs_parse: fold fs_parameter_desc/fs_parameter_spec fs_parser: remove fs_parameter_description name field add prefix to fs_context->log ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log new primitive: __fs_parse() switch rbd and libceph to p_log-based primitives struct p_log, variants of warnf() et.al. taking that one instead teach logfc() to handle prefices, give it saner calling conventions get rid of cg_invalf() ...
2020-02-07gfs2: switch to use of errorfc() et.al.Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07fs_parse: handle optional arguments sanelyAl Viro
Don't bother with "mixed" options that would allow both the form with and without argument (i.e. both -o foo and -o foo=bar). Rather than trying to shove both into a single fs_parameter_spec, allow having with-argument and no-argument specs with the same name and teach fs_parse to handle that. There are very few options of that sort, and they are actually easier to handle that way - callers end up with less postprocessing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07fs_parse: fold fs_parameter_desc/fs_parameter_specAl Viro
The former contains nothing but a pointer to an array of the latter... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07fs_parser: remove fs_parameter_description name fieldEric Sandeen
Unused now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07fold struct fs_parameter_enum into struct constant_tableAl Viro
no real difference now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07fs_parse: get rid of ->enumsAl Viro
Don't do a single array; attach them to fsparam_enum() entry instead. And don't bother trying to embed the names into those - it actually loses memory, with no real speedup worth mentioning. Simplifies validation as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-06gfs2: fix O_SYNC write handlingAndreas Gruenbacher
In gfs2_file_write_iter, for direct writes, the error checking in the buffered write fallback case is incomplete. This can cause inode write errors to go undetected. Fix and clean up gfs2_file_write_iter along the way. Based on a proposed fix by Christoph Hellwig <hch@lst.de>. Fixes: 967bcc91b044 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-06gfs2: move setting current->backing_dev_infoChristoph Hellwig
Set current->backing_dev_info just around the buffered write calls to prepare for the next fix. Fixes: 967bcc91b044 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-06gfs2: fix gfs2_find_jhead that returns uninitialized jhead with seq 0Abhi Das
When the first log header in a journal happens to have a sequence number of 0, a bug in gfs2_find_jhead() causes it to prematurely exit, and return an uninitialized jhead with seq 0. This can cause failures in the caller. For instance, a mount fails in one test case. The correct behavior is for it to continue searching through the journal to find the correct journal head with the highest sequence number. Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-28Revert "gfs2: eliminate tr_num_revoke_rm"Bob Peterson
This reverts commit e955537e3262de8e56f070b13817f525f472fa00. Before patch e955537e32, tr_num_revoke tracked the number of revokes added to the transaction, and tr_num_revoke_rm tracked how many revokes were removed. But since revokes are queued off the sdp (superblock) pointer, some transactions could remove more revokes than they added. (e.g. revokes added by a different process). Commit e955537e32 eliminated transaction variable tr_num_revoke_rm, but in order to do so, it changed the accounting to always use tr_num_revoke for its math. Since you can remove more revokes than you add, tr_num_revoke could now become a negative value. This negative value broke the assert in function gfs2_trans_end: if (gfs2_assert_withdraw(sdp, (nbuf <=3D tr->tr_blocks) && (tr->tr_num_revoke <=3D tr->tr_revokes))) One way to fix this is to simply remove the tr_num_revoke clause from the assert and allow the value to become negative. Andreas didn't like that idea, so instead, we decided to revert e955537e32. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-21gfs2: remove unused LBIT macrosAlex Shi
Since commit 223b2b889f37 ("GFS2: Fix alignment issue and tidy gfs2_bitfit"), these 3 macros aren't used anymore, so remove them. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-21fs/gfs2: remove unused IS_DINODE and IS_LEAF macrosAlex Shi
Since commit 1579343a73e3 ("GFS2: Remove dirent_first() function"), these macros aren't used any more, so remove them. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-20gfs2: Remove GFS2_MIN_LVB_SIZE defineAndreas Gruenbacher
The dlm lockspace is set up to have lock value blocks of GDLM_LVB_SIZE bytes, and dlm is the only lock manager we support, so there is no point in claiming that the lock value block could have any other size. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-20gfs2: Fix incorrect variable nameAndreas Gruenbacher
Rename sd_log_commited_revoke to sd_log_committed_revoke. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-15gfs2: Avoid access time thrashing in gfs2_inode_lookupAndreas Gruenbacher
In gfs2_inode_lookup, we initialize inode->i_atime to the lowest possibly value after gfs2_inode_refresh may already have been called. This should be the other way around, but we didn't notice because usually the inode type is known from the directory entry and so gfs2_inode_lookup won't call gfs2_inode_refresh. In addition, only initialize ip->i_no_formal_ino from no_formal_ino when actually needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-08gfs2: minor cleanup: remove unneeded variable ret in gfs2_jdata_writepageBob Peterson
This patch simply removes variable ret, which is used to store the return code of its call to __gfs2_jdata_writepage, in favor of just returning the result directly. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-01-07gfs2: eliminate ssize parameter from gfs2_struct2blkBob Peterson
Every caller of function gfs2_struct2blk specified sizeof(u64). This patch eliminates the unnecessary parameter and replaces the size calculation with a new superblock variable that is computed to be the maximum number of block pointers we can fit inside a log descriptor, as is done for pointers per dinode and indirect block. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-07gfs2: Another gfs2_find_jhead fixAndreas Gruenbacher
On filesystems with a block size smaller than the page size, gfs2_find_jhead can split a page across two bios (for example, when blocks are not allocated consecutively). When that happens, the first bio that completes will unlock the page in its bi_end_io handler even though the page hasn't been read completely yet. Fix that by using a chained bio for the rest of the page. While at it, clean up the sector calculation logic in gfs2_log_alloc_bio. In gfs2_find_jhead, simplify the disk block and offset calculation logic and fix a variable name. Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-12-05Merge tag 'gfs2-for-5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Andreas Gruenbacher: "Bob's extensive filesystem withdrawal and recovery testing: - don't write log headers after file system withdraw - clean up iopen glock mess in gfs2_create_inode - close timing window with GLF_INVALIDATE_IN_PROGRESS - abort gfs2_freeze if io error is seen - don't loop forever in gfs2_freeze if withdrawn - fix infinite loop in gfs2_ail1_flush on io error - introduce function gfs2_withdrawn - fix glock reference problem in gfs2_trans_remove_revoke Filesystems with a block size smaller than the page size: - fix end-of-file handling in gfs2_page_mkwrite - improve mmap write vs. punch_hole consistency Other: - remove active journal side effect from gfs2_write_log_header - multi-block allocations in gfs2_page_mkwrite Minor cleanups and coding style fixes: - remove duplicate call from gfs2_create_inode - make gfs2_log_shutdown static - make gfs2_fs_parameters static - some whitespace cleanups - removed unnecessary semicolon" * tag 'gfs2-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Don't write log headers after file system withdraw gfs2: Remove duplicate call from gfs2_create_inode gfs2: clean up iopen glock mess in gfs2_create_inode gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS gfs2: Abort gfs2_freeze if io error is seen gfs2: Don't loop forever in gfs2_freeze if withdrawn gfs2: fix infinite loop in gfs2_ail1_flush on io error gfs2: Introduce function gfs2_withdrawn gfs2: fix glock reference problem in gfs2_trans_remove_revoke gfs2: make gfs2_log_shutdown static gfs2: Remove active journal side effect from gfs2_write_log_header gfs2: Fix end-of-file handling in gfs2_page_mkwrite gfs2: Multi-block allocations in gfs2_page_mkwrite gfs2: Improve mmap write vs. punch_hole consistency gfs2: make gfs2_fs_parameters static gfs2: Some whitespace cleanups gfs2: removed unnecessary semicolon
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-30Merge tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap updates from Darrick Wong: "In this release, we hoisted as much of XFS' writeback code into iomap as was practicable, refactored the unshare file data function, added the ability to perform buffered io copy on write, and tweaked various parts of the directio implementation as needed to port ext4's directio code (that will be a separate pull). Summary: - Make iomap_dio_rw callers explicitly tell us if they want us to wait - Port the xfs writeback code to iomap to complete the buffered io library functions - Refactor the unshare code to share common pieces - Add support for performing copy on write with buffered writes - Other minor fixes - Fix unchecked return in iomap_bmap - Fix a type casting bug in a ternary statement in iomap_dio_bio_actor - Improve tracepoints for easier diagnostic ability - Fix pipe page leakage in directio reads" * tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits) iomap: Fix pipe page leakage during splicing iomap: trace iomap_appply results iomap: fix return value of iomap_dio_bio_actor on 32bit systems iomap: iomap_bmap should check iomap_apply return value iomap: Fix overflow in iomap_page_mkwrite fs/iomap: remove redundant check in iomap_dio_rw() iomap: use a srcmap for a read-modify-write I/O iomap: renumber IOMAP_HOLE to 0 iomap: use write_begin to read pages to unshare iomap: move the zeroing case out of iomap_read_page_sync iomap: ignore non-shared or non-data blocks in xfs_file_dirty iomap: always use AOP_FLAG_NOFS in iomap_write_begin iomap: remove the unused iomap argument to __iomap_write_end iomap: better document the IOMAP_F_* flags iomap: enhance writeback error message iomap: pass a struct page to iomap_finish_page_writeback iomap: cleanup iomap_ioend_compare iomap: move struct iomap_page out of iomap.h iomap: warn on inline maps in iomap_writepage_map iomap: lift the xfs writeback code to iomap ...
2019-11-21gfs2: Don't write log headers after file system withdrawBob Peterson
Before this patch, when a node withdrew a gfs2 file system, it wrote a (clean) unmount log header. That's wrong. You don't want to write anything to the journal once you're withdrawn because that's acknowledging that the transaction is complete and the journal is in good shape, neither of which may be a valid assumption when the file system is withdrawn. This is especially true if the withdraw was caused due to io errors writing to the journal in the first place. The best course of action is to leave the journal "as is" until it may be safely replayed during journal recovery, regardless of whether it's done by this node or another. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-11-21gfs2: Remove duplicate call from gfs2_create_inodeAndreas Gruenbacher
In gfs2_create_inode, gfs2_set_inode_blocks is called twice for no good reason. Remove the unnecessary call. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>