path: root/fs/gfs2/trans.c
Age  Commit message  Author
2020-01-28  Revert "gfs2: eliminate tr_num_revoke_rm"  Bob Peterson
This reverts commit e955537e3262de8e56f070b13817f525f472fa00. Before patch e955537e32, tr_num_revoke tracked the number of revokes added to the transaction, and tr_num_revoke_rm tracked how many revokes were removed. But since revokes are queued off the sdp (superblock) pointer, some transactions could remove more revokes than they added. (e.g. revokes added by a different process). Commit e955537e32 eliminated transaction variable tr_num_revoke_rm, but in order to do so, it changed the accounting to always use tr_num_revoke for its math. Since you can remove more revokes than you add, tr_num_revoke could now become a negative value. This negative value broke the assert in function gfs2_trans_end: if (gfs2_assert_withdraw(sdp, (nbuf <= tr->tr_blocks) && (tr->tr_num_revoke <= tr->tr_revokes))) One way to fix this is to simply remove the tr_num_revoke clause from the assert and allow the value to become negative. Andreas didn't like that idea, so instead, we decided to revert e955537e32. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
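The accounting problem described in this revert can be illustrated with a small, self-contained C sketch. The struct below is a hypothetical, simplified stand-in for the revoke fields of the transaction (it is not the kernel's struct gfs2_trans), and the helper names are invented for illustration. It only shows why folding removals into one counter can produce a negative value when a transaction removes revokes it did not add.

```c
/* Hypothetical, simplified stand-in for the revoke fields of a transaction. */
struct trans_sketch {
	unsigned int tr_revokes;       /* revokes reserved for this transaction */
	unsigned int tr_num_revoke;    /* revokes this transaction added */
	unsigned int tr_num_revoke_rm; /* revokes this transaction removed */
};

/* Single-counter scheme: removals are subtracted from additions. */
static int combined_revoke_count(const struct trans_sketch *tr)
{
	return (int)tr->tr_num_revoke - (int)tr->tr_num_revoke_rm;
}

/* Two-counter scheme: only the additions are checked against the reservation. */
static int revokes_within_reservation(const struct trans_sketch *tr)
{
	return tr->tr_num_revoke <= tr->tr_revokes;
}
```

With one revoke added and three removed (two of them queued by another process), the combined count is negative, while the two-counter check still compares a sensible value against the reservation.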
2020-01-07  gfs2: eliminate ssize parameter from gfs2_struct2blk  Bob Peterson
Every caller of function gfs2_struct2blk specified sizeof(u64). This patch eliminates the unnecessary parameter and replaces the size calculation with a new superblock variable that is computed to be the maximum number of block pointers we can fit inside a log descriptor, as is done for pointers per dinode and indirect block. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
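A minimal sketch of the kind of calculation involved, assuming the first log block holds a descriptor plus a fixed number of block pointers and any overflow goes into continuation blocks with their own capacity. The struct, its field names, and the continuation-block capacity are assumptions made for illustration; only the idea of a precomputed "pointers per log descriptor" value comes from the commit message.

```c
#include <assert.h>

/* Hypothetical per-block capacities, precomputed once per superblock. */
struct sb_sketch {
	unsigned int ptrs_first; /* pointers that fit after a log descriptor */
	unsigned int ptrs_later; /* pointers that fit in each continuation block */
};

/* Number of log blocks needed to describe nstruct block pointers. */
static unsigned int struct2blk_sketch(const struct sb_sketch *sb,
				      unsigned int nstruct)
{
	unsigned int blks = 1; /* the descriptor block itself */

	assert(sb->ptrs_first > 0 && sb->ptrs_later > 0);
	if (nstruct > sb->ptrs_first) {
		unsigned int rest = nstruct - sb->ptrs_first;

		/* Round up to whole continuation blocks. */
		blks += (rest + sb->ptrs_later - 1) / sb->ptrs_later;
	}
	return blks;
}
```

Because the capacities live in the superblock sketch, callers no longer pass an element size, which mirrors the simplification the patch describes.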
2019-11-14  gfs2: fix glock reference problem in gfs2_trans_remove_revoke  Bob Peterson
Commit 9287c6452d2b fixed a situation in which gfs2 could use a glock after it had been freed. To do that, it temporarily added a new glock reference by calling gfs2_glock_hold in function gfs2_add_revoke. However, if the bd element was removed by gfs2_trans_remove_revoke, it failed to drop the additional reference. This patch adds logic to gfs2_trans_remove_revoke to properly drop the additional glock reference. Fixes: 9287c6452d2b ("gfs2: Fix occasional glock use-after-free") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27  gfs2: eliminate tr_num_revoke_rm  Bob Peterson
For its journal processing, gfs2 kept track of the number of buffers added and removed on a per-transaction basis. These values are used to calculate space needed in the journal. But while these calculations make sense for the number of buffers, they make no sense for revokes. Revokes are managed in their own list, linked from the superblock. So it's entirely unnecessary to keep separate per-transaction counts for revokes added and removed. A single count will do the same job. Therefore, this patch combines the transaction revokes into a single count. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-05  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398  Thomas Gleixner
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-07  gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}  Andreas Gruenbacher
Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no such thing as an "unrevoke" object; all this function does is remove existing revoke objects plus some bookkeeping. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07  gfs2: Rename sd_log_le_{revoke,ordered}  Andreas Gruenbacher
Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to sd_log_ordered: not sure what le stands for here, but it doesn't add clarity, and if it stands for list entry, it's actually confusing as those are both list heads but not list entries. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-11  gfs2: Remove vestigial bd_ops  Bob Peterson
Field bd_ops was set but never used, so I removed it, and all code supporting it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-05  gfs2: Use fs_* functions instead of pr_* function where we can  Bob Peterson
Before this patch, various errors and messages were reported using the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does not tell you which gfs2 mount had the problem, which is often vital to debugging. This patch changes the calls from pr_* to fs_* in most of the messages so that the file system id is printed along with the message. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-04  gfs2: Remove ordered write mode handling from gfs2_trans_add_data  Andreas Gruenbacher
In journaled data mode, we need to add each buffer head to the current transaction. In ordered write mode, we only need to add the inode to the ordered inode list. So far, both cases are handled in gfs2_trans_add_data. This makes the code look misleading and is inefficient for small block sizes as well. Handle both cases separately instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-23  GFS2: Log the reason for log flushes in every log header  Bob Peterson
This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23  GFS2: Introduce new gfs2_log_header_v2  Bob Peterson
This patch adds a new structure called gfs2_log_header_v2 which is used to store expanded fields into previously unused areas of the log headers (i.e., this change is backwards compatible). Some of these are used for debug purposes so we can backtrack when problems occur. Others are reserved for future expansion. This patch is based on a prototype from Steve Whitehouse. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-17  gfs2: Remove pointless BUG_ON  Andreas Gruenbacher
The current transaction is being dereferenced before asserting that it is not NULL; that isn't going to help. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27  Rename superblock flags (MS_xyz -> SB_xyz)  Linus Torvalds
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was:
    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"
    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
    for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-31  gfs2: Fix a harmless typo  Andreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-01-30  GFS2: Reduce contention on gfs2_log_lock  Bob Peterson
This patch modifies functions gfs2_trans_add_meta and _data so that they check whether the buffer_head is already in a transaction, and if so, avoid taking the gfs2_log_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-01-27  GFS2: Inline function meta_lo_add  Bob Peterson
This patch simply combines function meta_lo_add with its only caller, trans_add_meta. This makes the code easier to read and will make it easier to reduce contention on gfs2_log_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-01-27  GFS2: Switch tr_touched to flag in transaction  Bob Peterson
This patch eliminates the int variable tr_touched in favor of a new flag in the transaction. This is a step toward reducing contention on the gfs2_log_lock spin_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
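As a rough illustration of replacing an int used as a boolean with a bit in a flags word, here is a minimal sketch. The flag name, struct, and helpers are hypothetical, and the sketch uses plain (non-atomic) bit operations; whether the kernel code uses atomic bit helpers is not stated in the commit message above.

```c
/* Hypothetical flag bit; stands in for a "touched" marker. */
#define TRF_TOUCHED_SKETCH (1UL << 0)

/* Hypothetical, reduced transaction with a single flags word. */
struct trans_flags_sketch {
	unsigned long tr_flags;
};

/* Mark the transaction as having modified something. */
static void trans_mark_touched(struct trans_flags_sketch *tr)
{
	tr->tr_flags |= TRF_TOUCHED_SKETCH;
}

/* Query the marker without needing a separate int field. */
static int trans_is_touched(const struct trans_flags_sketch *tr)
{
	return (tr->tr_flags & TRF_TOUCHED_SKETCH) != 0;
}
```

Further boolean state can be added as additional bits in the same word, so the structure does not grow with each new marker.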
2015-10-01  gfs2: Add missing else in trans_add_meta/data  Bob Peterson
This patch fixes a timing window that causes a segfault. The problem is that bd can remain NULL throughout the function and then reference that NULL pointer if the bh->b_private starts out NULL, then someone sets it to non-NULL inside the locking. In that case, bd still needs to be set. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
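The shape of this bug can be modelled in a few lines of single-threaded C. Everything here is a hypothetical stand-in: the structs are reduced to one field, and the `window` callback simulates another task running while no lock is held. The point is only the `else` branch, which makes sure `bd` is set when someone else attached a bufdata between the first check and the second.

```c
#include <assert.h>
#include <stddef.h>

struct bufdata_sketch { int tag; };
struct bh_sketch { struct bufdata_sketch *b_private; };

/* Hypothetical hook standing in for "another task runs in the unlocked window". */
typedef void (*race_hook)(struct bh_sketch *bh);

static struct bufdata_sketch racer_bd = { 2 };

/* A racing task that attaches its own bufdata during the window. */
static void racer_attaches(struct bh_sketch *bh)
{
	bh->b_private = &racer_bd;
}

static struct bufdata_sketch *lookup_or_attach(struct bh_sketch *bh,
					       struct bufdata_sketch *fresh,
					       race_hook window)
{
	struct bufdata_sketch *bd;

	assert(bh != NULL && fresh != NULL);
	bd = bh->b_private;               /* first look */
	if (bd == NULL) {
		if (window)
			window(bh);       /* locks are not held here in this model */
		if (bh->b_private == NULL) {
			bh->b_private = fresh;
			bd = fresh;
		} else {
			bd = bh->b_private; /* without this branch, bd stays NULL */
		}
	}
	return bd;
}
```

Calling `lookup_or_attach` with `racer_attaches` as the hook returns the racer's bufdata rather than NULL, which is the case the missing `else` left uncovered.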
2015-09-03  GFS2: Move glock superblock pointer to field gl_name  Bob Peterson
What uniquely identifies a glock in the glock hash table is not gl_name, but gl_name and its superblock pointer. This patch makes the gl_name field correspond to a unique glock identifier. That will allow us to simplify hashing with a future patch, since the hash algorithm can then take the gl_name and hash its components in one operation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2014-11-17  GFS2: update freeze code to use freeze/thaw_super on all nodes  Benjamin Marzinski
The current gfs2 freezing code is considerably more complicated than it should be because it doesn't use the vfs freezing code on any node except the one that begins the freeze. This is because it needs to acquire a cluster glock before calling the vfs code to prevent a deadlock, and without the new freeze_super and thaw_super hooks, that was impossible. To deal with the issue, gfs2 had to do some hacky locking tricks to make sure that a frozen node couldn't be holding on to a lock it needed to do the unfreeze ioctl. This patch makes use of the new hooks to simplify the gfs2 locking code. Now, all the nodes in the cluster freeze and thaw in exactly the same way. Every node in the cluster caches the freeze glock in the shared state. The new freeze_super hook allows the freezing node to grab this freeze glock in the exclusive state without first calling the vfs freeze_super function. All the nodes in the cluster see this lock change, and call the vfs freeze_super function. The vfs locking code guarantees that the nodes can't get stuck holding the glocks necessary to unfreeze the system. To unfreeze, the freezing node uses the new thaw_super hook to drop the freeze glock. Again, all the nodes notice this, reacquire the glock in shared mode and call the vfs thaw_super function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-10-08  GFS2: use _RET_IP_ instead of (unsigned long)__builtin_return_address(0)  Fabian Frederick
use macro definition Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
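For readers unfamiliar with the expression being replaced, the following sketch shows what it evaluates to: the address the current function will return to, cast to an integer. The macro and function names are local to this example (chosen to avoid clashing with the kernel's own `_RET_IP_`), and the code relies on the GCC/Clang builtin `__builtin_return_address`.

```c
/* Local spelling of the expression named in the commit title. */
#define RET_IP_SKETCH ((unsigned long)__builtin_return_address(0))

/* noinline keeps a real call frame so the return address is the caller's. */
__attribute__((noinline))
static unsigned long record_caller(void)
{
	return RET_IP_SKETCH;
}
```

A transaction start function can store such a value so that later diagnostics can report which call site began the transaction.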
2014-05-14  GFS2: remove transaction glock  Benjamin Marzinski
GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In its place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery. When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesystem simply involves dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time. However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-07  GFS2: Use pr_<level> more consistently  Joe Perches
Add pr_fmt, remove embedded "GFS2: " prefixes. This now consistently emits lower case "gfs2: " for each message. Other miscellanea around these changes:
o Add missing newlines
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-06  GFS2: global conversion to pr_foo()  Fabian Frederick
- All printk(KERN_foo converted to pr_foo().
- Messages updated to fit in 80 columns.
- fs_macros converted as well.
- fs_printk removed.
Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-24  GFS2: Move log buffer accounting to transaction  Steven Whitehouse
Now we have a master transaction into which other transactions are merged, the accounting can be done using this master transaction. We no longer require the superblock fields which were being used for this function. In addition, this allows for a clean up in calc_reserved() making it rather easier to understand. Also, by reducing the number of variables used to track the buffers being added and removed from the journal, a number of error checks are now no longer required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-24  GFS2: Move log buffer lists into transaction  Steven Whitehouse
Over time, we hope to be able to improve the concurrency available in the log code. This is one small step towards that, by moving the buffer lists from the super block, and into the transaction structure, so that each transaction builds its own buffer lists. At transaction commit time, the buffer lists are merged into the currently accumulating transaction. That transaction then is passed into the before and after commit functions at journal flush time. Thus there should be no change in overall behaviour yet. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-21  GFS2: Reduce struct gfs2_trans in size  Steven Whitehouse
A couple of "int" fields were being used as boolean values so we can make them bitfields of one bit, and put them in what might otherwise be a hole in the structure with 64 bit alignment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
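The size effect of turning int booleans into one-bit bitfields can be seen with a pair of hypothetical structs. The field names and layout below are invented for illustration and do not reproduce struct gfs2_trans; the exact sizes depend on the ABI, so the sketch only computes the difference rather than asserting specific numbers.

```c
#include <stddef.h>

/* Hypothetical layout with two ints used as booleans. */
struct trans_wide_sketch {
	unsigned long tr_ip;
	unsigned int tr_blocks;
	unsigned int tr_revokes;
	unsigned int tr_reserved;
	int tr_touched;
	int tr_attached;
};

/* Same fields, with the booleans packed into single bits. */
struct trans_packed_sketch {
	unsigned long tr_ip;
	unsigned int tr_blocks;
	unsigned int tr_revokes;
	unsigned int tr_reserved;
	unsigned int tr_touched:1;
	unsigned int tr_attached:1;
};

/* Bytes saved by the bitfield layout on the current ABI. */
static size_t bitfield_bytes_saved(void)
{
	return sizeof(struct trans_wide_sketch) -
	       sizeof(struct trans_packed_sketch);
}
```

On ABIs where the structure is padded to pointer alignment, the two bits can land in space that would otherwise be padding, which is the "hole" the commit message refers to.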
2013-06-19  GFS2: fix warning message  Benjamin Marzinski
This patch fixes a warning message introduced in the recent "GFS2: aggressively issue revokes in gfs2_log_flush" patch. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-19  GFS2: aggressively issue revokes in gfs2_log_flush  Benjamin Marzinski
This patch looks at all the outstanding blocks in all the transactions on the log, and moves the completed ones to the ail2 list. Then it issues revokes for these blocks. This will hopefully speed things up in situations where there is a lot of contention for glocks, especially if they are acquired serially. revoke_lo_before_commit will issue at most one log block full of these preemptive revokes. The amount of reserved log space that gfs2_log_reserve() ignores has been incremented to allow for this extra block. This patch also consolidates the common revoke instructions into one function, gfs2_add_revoke(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-30  Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw  Linus Torvalds
Pull GFS2 updates from Steven Whitehouse: "There is not a whole lot of change this time - there are some further changes which are in the works, but those will be held over until next time. Here there are some clean ups to inode creation, the addition of an origin (local or remote) indicator to glock demote requests, removal of one of the remaining GFP_NOFAIL allocations during log flushes, one minor clean up, and a one liner bug fix."
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Flush work queue before clearing glock hash tables
  GFS2: Add origin indicator to glock demote tracing
  GFS2: Add origin indicator to glock callbacks
  GFS2: replace gfs2_ail structure with gfs2_trans
  GFS2: Remove vestigial parameter ip from function rs_deltree
  GFS2: Use gfs2_dinode_out() in the inode create path
  GFS2: Remove gfs2_refresh_inode from inode creation path
  GFS2: Clean up inode creation path
2013-04-29  gfs2: Convert print_symbol to %pSR  Joe Perches
Use the new vsprintf extension to avoid any possible message interleaving. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-04-08  GFS2: replace gfs2_ail structure with gfs2_trans  Benjamin Marzinski
In order to allow transactions and log flushes to happen at the same time, gfs2 needs to move the transaction accounting and active items list code into the gfs2_trans structure. As a first step toward this, this patch removes the gfs2_ail structure, and handles the active items list in the gfs2_trans structure. This keeps gfs2 from allocating an ail structure on log flushes, and gives us a structure that can later be used to store the transaction accounting outside of the gfs2 superblock structure. With this patch, at the end of a transaction, gfs2 will add the gfs2_trans structure to the superblock if there is not one already. This structure now has the active items fields that were previously in gfs2_ail. This is not necessary in the case where the transaction was simply used to add revokes, since these are never written outside of the journal, and thus, don't need an active items list. Also, in order to make sure that the transaction structure is not removed while it's still in use by gfs2_trans_end, unlocking the sd_log_flush_lock has to happen slightly later in ending the transaction. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29  GFS2: Use ->writepages for ordered writes  Steven Whitehouse
Instead of using a list of buffers to write ahead of the journal flush, this now uses a list of inodes and calls ->writepages via filemap_fdatawrite() in order to achieve the same thing. For most use cases this results in a shorter ordered write list, as well as much larger i/os being issued. The ordered write list is sorted by inode number before writing in order to retain the disk block ordering between inodes as per the previous code. The previous ordered write code used to conflict in its assumptions about how to write out the disk blocks with mpage_writepages() so that with this updated version we can also use mpage_writepages() for GFS2's ordered write, writepages implementation. So we will also send larger i/os from writeback too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
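The "sorted by inode number before writing" step can be sketched in userspace C with `qsort`. This is only an illustration of the ordering idea: the struct is hypothetical, and an array plus `qsort` is used here as a stand-in, which is an assumption and not a description of how the kernel sorts its list.

```c
#include <stdlib.h>

/* Hypothetical entry on an ordered-write list. */
struct ordered_inode_sketch {
	unsigned long long ino;
};

/* Compare by inode number without risking overflow from subtraction. */
static int cmp_ino(const void *a, const void *b)
{
	const struct ordered_inode_sketch *x = a;
	const struct ordered_inode_sketch *y = b;

	return (x->ino > y->ino) - (x->ino < y->ino);
}

/* Sort the pending inodes so writeback proceeds in ascending inode order. */
static void sort_ordered_inodes(struct ordered_inode_sketch *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_ino);
}
```

After sorting, writeback would be issued for each inode in turn, so that I/O between inodes tends to follow disk block order as the commit message describes.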
2013-01-29  GFS2: Merge gfs2_attach_bufdata() into trans.c  Steven Whitehouse
The locking in gfs2_attach_bufdata() was type specific (data/meta) which made the function rather confusing. This patch moves the core of gfs2_attach_bufdata() into trans.c, renaming it gfs2_alloc_bufdata() and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta(). As a result all of the locking related to adding data and metadata to the journal is now in these two functions. This should help to clarify what is going on, and give us some opportunities to simplify in some cases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29  GFS2: Copy gfs2_trans_add_bh into new data/meta functions  Steven Whitehouse
This patch copies the body of gfs2_trans_add_bh into the two newly added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can then move the .lo_add functions from lops.c into trans.c and call them directly. As a result of this, we no longer need to use the .lo_add functions at all, so that is removed from the log operations structure. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29  GFS2: Split gfs2_trans_add_bh() into two  Steven Whitehouse
There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29  GFS2: Merge revoke adding functions  Steven Whitehouse
This moves the lo_add function for revokes into trans.c, removing a function call and making the code easier to read. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07  GFS2: Test bufdata with buffer locked and gfs2_log_lock held  Benjamin Marzinski
In gfs2_trans_add_bh(), gfs2 was testing if there was a bd attached to the buffer without having the gfs2_log_lock held. It was then assuming it would stay attached for the rest of the function. However, without either the log lock being held or the buffer locked, __gfs2_ail_flush() could detach bd at any time. This patch moves the locking before the test. If there isn't a bd already attached, gfs2 can safely allocate one and attach it before locking. There is no way that the newly allocated bd could be on the ail list, and thus no way for __gfs2_ail_flush() to detach it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-07-31  gfs2: Convert to new freezing mechanism  Jan Kara
We update gfs2_page_mkwrite() to use new freeze protection and the transaction code to use freeze protection while the transaction is running. That is needed to stop iput() of unlinked file from modifying the filesystem. The rest is handled by the generic code. CC: cluster-devel@redhat.com CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-02  GFS2: eliminate log elements and simplify  Bob Peterson
This patch eliminates the gfs2_log_element data structure and rolls its two components into the gfs2_bufdata. This makes the code easier to understand and makes it easier to migrate to a rbtree to keep the list sorted. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-24  GFS2: Remove bd_list_tr  Steven Whitehouse
This is another clean up in the logging code. This per-transaction list was largely unused. Its main function was to ensure that the number of buffers in a transaction was correct, however that counter was only used to check the number of buffers in the bd_list_tr, plus an assert at the end of each transaction. With the assert now changed to use the calculated buffer counts, we can remove both bd_list_tr and its associated counter. This should make the code easier to understand as well as shrinking a couple of structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21  GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme  Bob Peterson
Here is an update of Bob's original rbtree patch which, in addition, also resolves the rather strange ref counting that was being done relating to the bitmap blocks. Originally we had a dual system for journaling resource groups. The metadata blocks were journaled and also the rgrp itself was added to a list. The reason for adding the rgrp to the list in the journal was so that the "repolish clones" code could be run to update the free space, and potentially send any discard requests when the log was flushed. This was done by comparing the "cloned" bitmap with what had been written back on disk during the transaction commit. Due to this, there was a requirement to hang on to the rgrps' bitmap buffers until the journal had been flushed. For that reason, there was a rather complicated set up in the ->go_lock ->go_unlock functions for rgrps involving both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference count on the buffers. However, the journal maintains a reference count on the buffers anyway, since they are being journaled as metadata buffers. So by moving the code which deals with the post-journal accounting for bitmap blocks to the metadata journaling code, we can entirely dispense with the rather strange buffer ref counting scheme and also the requirement to journal the rgrps. The net result of all this is that the ->sd_rindex_spin is left to do exactly one job, and that is to look after the rbtree of rgrps. This patch is designed to be a stepping stone towards using RCU for the rbtree of resource groups, however the reduction in the number of uses of the ->sd_rindex_spin is likely to have benefits for multi-threaded workloads, anyway. The patch retains ->go_lock and ->go_unlock for rgrps, however these may also be removed in future in favour of calling the functions directly where required in the code. That will allow locking of resource groups without needing to actually read them in - something that could be useful in speeding up statfs. In the mean time though it is valid to dereference ->bi_bh only when the rgrp is locked. This is basically the same rule as before, modulo the references not being valid until the following journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Benjamin Marzinski <bmarzins@redhat.com>
2010-05-05  GFS2: Various gfs2_logd improvements  Benjamin Marzinski
This patch contains various tweaks to how log flushes and active item writeback work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reserve now waits for gfs2_logd to do the log flushing. Multiple functions were rewritten to remove the need to call gfs2_log_lock(). Instead of using one test to see if gfs2_logd had work to do, there are now separate tests to check if there are too many buffers in the incore log or if there are too many items on the active items list. This patch is a port of a patch Steve Whitehouse wrote about a year ago, with some minor changes. Since gfs2_ail1_start always submits all the active items, it no longer needs to keep track of the first ai submitted, so this has been removed. In gfs2_log_reserve(), the order of the calls to prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has been switched. If it called wake_up first there was a small window for a race, where logd could run and return before gfs2_log_reserve was ready to get woken up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve() would be left waiting for gfs2_logd to eventually run because it timed out. Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times out, and flushes the log, can now be set on mount with ar_commit. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-05-13  GFS2: Move journal live test at transaction start  Steven Whitehouse
There seems little point grabbing the transaction glock only to have to release it again if the journal isn't live. This moves the test earlier to avoid grabbing the lock when we don't need it in the first place. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24  GFS2: Fix deadlock on journal flush  Steven Whitehouse
This patch fixes a deadlock when the journal is flushed and there are dirty inodes other than the one which caused the journal flush. Originally the journal flushing code was trying to obtain the transaction glock while running the flush code for an inode glock. We no longer require the transaction glock at this point in time since we know that any attempt to get the transaction glock from another node will result in a journal flush. So if we are flushing the journal, we can be sure that the transaction lock is still cached from when the transaction was started. By inlining a version of gfs2_trans_begin() (minus the bit which gets the transaction glock) we can avoid the deadlock problems caused if there is a demote request queued up on the transaction glock. In addition I've also moved the umount rwsem so that it covers the glock workqueue, since all demotions are done by this workqueue now. That fixes a bug on umount which I came across while fixing the original problem. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24  GFS2: Merge lock_dlm module into GFS2  Steven Whitehouse
This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as:
o Reducing overhead by eliminating duplicated fields between structures
o Simplification of the code (reduces the code size by a fair bit)
o The locking interface is now the DLM interface itself as proposed some time ago.
o Fewer lookups of glocks when processing replies from the DLM
o Fewer memory allocations/deallocations for each glock
o Scope to do further optimisations in the future (but this patch is more than big enough for now!)
Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem without requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before it's merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off for the last three months and it's passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31  [GFS2] Update gfs2_trans_add_unrevoke to accept extents  Steven Whitehouse
By adding an extra argument to gfs2_trans_add_unrevoke we can now specify an extent length of blocks to unrevoke. This means that we only need to make one pass through the list for each extent rather than each block. Currently the only extent length which is used is 1, but that will change in the future. Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta since it's the only difference between this and gfs2_alloc_data which is left. This will allow a future patch to merge these two functions into one (i.e. one call to allocate both data and metadata in a single extent in the future). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
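The "one pass per extent" idea can be sketched as a single sweep that drops every pending revoke whose block number falls inside the extent. The array representation and the function name are assumptions for illustration; the commit message only says that the revokes are kept on a list and that an extent length is now accepted.

```c
#include <stddef.h>

/*
 * Remove every revoke in [blkno, blkno + len) from the array in one pass.
 * Returns the number of revokes that remain, compacted at the front.
 */
static size_t remove_revoke_extent(unsigned long long *revokes, size_t count,
				   unsigned long long blkno, unsigned int len)
{
	size_t kept = 0;

	for (size_t i = 0; i < count; i++) {
		/* Subtract instead of adding so blkno + len cannot overflow. */
		int in_extent = revokes[i] >= blkno &&
				revokes[i] - blkno < len;

		if (!in_extent)
			revokes[kept++] = revokes[i];
	}
	return kept;
}
```

Calling this once with `len` set to the extent size replaces `len` separate scans of the same list, one per block.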
2008-01-25  [GFS2] Don't add glocks to the journal  Steven Whitehouse
The only reason for adding glocks to the journal was to keep track of which locks required a log flush prior to release. We add a flag to the glock to allow this check to be made in a simpler way. This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64) and means that we can avoid extra work during the journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10  [GFS2] Clean up gfs2_trans_add_revoke()  Steven Whitehouse
The following alters gfs2_trans_add_revoke() to take a struct gfs2_bufdata as an argument. This eliminates the memory allocation which was previously required by making use of the already existing struct gfs2_bufdata. It makes some sanity checks to ensure that the gfs2_bufdata has been removed from all the lists before it's recycled as a revoke structure. This saves one memory allocation and one free per revoke structure. Also as a result, and to simplify the locking, since there is no longer any blocking code in gfs2_trans_add_revoke() we must hold the log lock whenever this function is called. This reduces the number of times we take and unlock the log lock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>