Age | Commit message | Author |
|
Function delete_work_func() was previously assuming that the
GLF_TRY_TO_EVICT and GLF_VERIFY_DELETE flags won't both be set at the
same time, but there probably are races in which that can happen, so
handle that case correctly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move those definitions into the scope in which they are used.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In case an iopen glock cannot be upgraded, function
gfs2_upgrade_iopen_glock() needs to communicate to gfs2_evict_inode()
whether deleting the inode should be deferred or skipped altogether.
Change the function to return the appropriate enum evict_behavior value
to indicate that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename enum dinode_demise to evict_behavior and its items
SHOULD_DELETE_DINODE to EVICT_SHOULD_DELETE,
SHOULD_NOT_DELETE_DINODE to EVICT_SHOULD_SKIP_DELETE, and
SHOULD_DEFER_EVICTION to EVICT_SHOULD_DEFER_DELETE.
In gfs2_evict_inode(), add a separate variable of type enum
evict_behavior instead of implicitly casting to int.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
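
For reference, a minimal sketch of the renamed values (only the names
are taken from this commit message; the actual definition in the gfs2
sources may differ in placement and comments):

    enum evict_behavior {
            EVICT_SHOULD_DELETE,
            EVICT_SHOULD_SKIP_DELETE,
            EVICT_SHOULD_DEFER_DELETE,
    };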
|
|
The GIF_DEFERRED_DELETE flag indicates an action that gfs2_evict_inode()
should take, so rename the flag to GIF_DEFER_DELETE to clarify.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move function needs_demote() to glock.h and rename it to
glock_needs_demote(). In handle_callback(), wake up the glock when
setting the GLF_PENDING_DEMOTE flag as well. (Setting the GLF_DEMOTE
flag already triggered a wake-up.)
With that, check for glock_needs_demote() in gfs2_upgrade_iopen_glock()
to wake up when either of those flags is set for the inode glock: the
faster we can react to contention, the better.
The GLF_PENDING_DEMOTE flag is only used for inode glocks (see
gfs2_glock_cb()) so it's okay to only check for the GLF_DEMOTE flag in
gfs2_drop_inode(). Still, using glock_needs_demote() there as well
makes the code a little easier to read.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
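
A minimal, self-contained sketch of the check described above, using
stand-in types and flag masks rather than the real struct gfs2_glock
and GLF_* bit numbers; it only illustrates that the helper reports a
pending demote when either flag is set:

    #include <stdbool.h>

    /* Stand-in flag masks and glock structure, for illustration only. */
    #define DEMO_GLF_DEMOTE          (1UL << 0)
    #define DEMO_GLF_PENDING_DEMOTE  (1UL << 1)

    struct demo_glock {
            unsigned long flags;
    };

    /* A glock needs demoting when either demote flag is set. */
    static inline bool glock_needs_demote(const struct demo_glock *gl)
    {
            return gl->flags & (DEMO_GLF_DEMOTE | DEMO_GLF_PENDING_DEMOTE);
    }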
|
|
When mounting of a corrupted disk image fails, the error message printed
can reference uninitialized inode fields. To prevent that from happening,
always initialize those fields.
Reported-by: syzbot+aa0730b0a42646eb1359@syzkaller.appspotmail.com
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Once upon a time, the predecessors of these primitives used to do file
lookup without bumping a refcount, provided that the caller held
rcu_read_lock() across the lookup and across whatever it wanted to read
from the struct file found. When struct file allocation switched to
SLAB_TYPESAFE_BY_RCU, that stopped being feasible and these primitives
started to bump the file refcount for the lookup result, requiring the
caller to call fput() afterwards.
But that turned them pointless - e.g.
    rcu_read_lock();
    file = lookup_fdget_rcu(fd);
    rcu_read_unlock();
is equivalent to
    file = fget_raw(fd);
and all callers of lookup_fdget_rcu() are of that form. Similarly,
task_lookup_fdget_rcu() calls can be replaced with calling fget_task().
task_lookup_next_fdget_rcu() doesn't have direct counterparts, but
its callers would be happier if we replaced it with an analogue that
deals with RCU internally.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Benjamin Coddington <bcodding@redhat.com> says:
Last year both GFS2 and OCFS2 had some work done to make their locking more
robust when exported over NFS. Unfortunately, part of that work caused both
NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send
lock notifications to clients.
This in itself is not a huge problem because most NFS clients will still
poll the server in order to acquire a conflicted lock, but now that I've
noticed it I can't help but try to fix it because there are big advantages
for setups that might depend on timely lock notifications, and we've
supported that as a feature for a long time.
It's important for NLM and kNFSD that they do not block their kernel
threads inside a filesystem's file_lock implementation because that can
produce deadlocks. We used to make sure of this by only trusting that
posix_lock_file() can correctly handle blocking lock calls
asynchronously, so the lock managers would only set up their file_lock
requests for async callbacks if the filesystem did not define its own
lock() file operation.
However, when GFS2 and OCFS2 grew the capability to correctly
handle blocking lock requests asynchronously, they started signalling this
behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting
posix_lock_file() was inadvertently dropped, so now most filesystems no
longer produce lock notifications when exported over NFS.
I tried to fix this by simply including the old check for lock(), but the
resulting include mess and layering violations were more than I could
accept.
There's a much cleaner way presented here using an fop_flag, which,
while potentially flag-greedy, greatly simplifies the problem and paves
the way for future uses by both filesystems and lock managers alike.
* patches from https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com:
exportfs: Remove EXPORT_OP_ASYNC_LOCK
NLM/NFSD: Fix lock notifications for async-capable filesystems
gfs2/ocfs2: set FOP_ASYNC_LOCK
fs: Introduce FOP_ASYNC_LOCK
NFS: trace: show TIMEDOUT instead of 0x6e
nfsd: use system_unbound_wq for nfsd_file_gc_worker()
nfsd: count nfsd_file allocations
nfsd: fix refcount leak when file is unhashed after being found
nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire
nfsd: add list_head nf_gc to struct nfsd_file
Link: https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
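
To illustrate the decision described in the cover letter, here is a
hedged, self-contained sketch with stand-in types (the real code lives
in NLM/kNFSD and uses the kernel's struct file_operations and
FOP_ASYNC_LOCK; the names below are illustrative only): asynchronous
blocking-lock callbacks are safe when the filesystem either defines no
->lock() operation, so posix_lock_file() handles the request, or
explicitly advertises async support.

    #include <stdbool.h>

    /* Stand-ins for struct file_operations and the new fop flag. */
    #define DEMO_FOP_ASYNC_LOCK  (1U << 0)

    struct demo_file_operations {
            unsigned int fop_flags;
            int (*lock)(void);      /* non-NULL if the fs defines ->lock() */
    };

    /* Async blocking-lock callbacks are safe if posix_lock_file() will
     * handle the request (no ->lock() defined) or the filesystem has
     * opted in via the flag. */
    static bool demo_can_async_lock(const struct demo_file_operations *fop)
    {
            return !fop->lock || (fop->fop_flags & DEMO_FOP_ASYNC_LOCK);
    }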
|
|
Now that GFS2 and OCFS2 are signalling async ->lock() support with
FOP_ASYNC_LOCK and checks for support are converted, we can remove
EXPORT_OP_ASYNC_LOCK.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/0a114db814fec3086f937ae3d44a086f13b8de26.1726083391.git.bcodding@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Before commit f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete"
work"), function delete_work_func() was used to trigger the eviction of
in-memory inodes upon remote request as well as the deferred deletion of
unlinked inodes. When that work was split in two, the two places in the
code where deferred deletion of inodes is required accidentally ended up
queuing the wrong kind of work. This
caused unlinked inodes to be left behind, which could in the worst case
fill up filesystems and require a filesystem check to recover.
Fix that by queuing the right kind of work in try_rgrp_unlink() and
gfs2_drop_inode().
Fixes: f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Add an argument to gfs2_queue_verify_delete() that allows it to queue
GLF_VERIFY_DELETE work for immediate execution. This is used in the
next patch.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Set gl_no_formal_ino of the iopen glock to the generation of the
associated inode (ip->i_no_formal_ino) as soon as that value is known.
This saves us from setting it later, possibly repeatedly, when queuing
GLF_VERIFY_DELETE work.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename the GLF_VERIFY_EVICT flag to GLF_VERIFY_DELETE: that flag
indicates that we want to delete an inode / verify that it has been
deleted.
To match, rename gfs2_queue_verify_evict() to
gfs2_queue_verify_delete().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 update from Andreas Gruenbacher:
- Convert the writepage address space operation to writepages (Matthew
Wilcox)
- A syzkaller fix (by Julian Sun) and a minor cleanup (Andreas
Gruenbacher)
* tag 'gfs2-v6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Remove gfs2_aspace_writepage()
gfs2: Remove gfs2_jdata_writepage()
gfs2: Remove __gfs2_writepage()
gfs2: Add gfs2_aspace_writepages()
gfs2: fix double destroy_workqueue error
gfs2: Minor gfs2_glock_cb cleanup
|
|
Both GFS2 and OCFS2 use DLM locking, which will allow async lock requests.
Signal this support by setting FOP_ASYNC_LOCK.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/fc4163dbbf33c58e5a8b8ee8cb8c57e555f53ce5.1726083391.git.bcodding@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In order to switch fuse over to using iomap for buffered writes, we need
access to the struct file for the original write in case we have to read
in the page to make it uptodate. Handle this by using the existing
private field in the iomap_iter, and add the argument to
iomap_file_buffered_write. This will allow us to pass the file in
through the iomap buffered write path, and is flexible for any other
file system's needs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/7f55c7c32275004ba00cddf862d970e6e633f750.1724755651.git.josef@toxicpanda.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There are no remaining callers of gfs2_aspace_writepage() other than
vmscan, which is known to do more harm than good.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
There are no remaining callers of gfs2_jdata_writepage() other than
vmscan, which is known to do more harm than good.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Call aops->writepages() instead of using write_cache_pages() to call
aops->writepage. Change the handling of -ENODATA to not set the
persistent error on the block device.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This saves one indirect function call per folio and gets us closer to
removing aops->writepage.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When gfs2_fill_super() fails, destroy_workqueue() is called within
gfs2_gl_hash_clear(), and the subsequent code path calls
destroy_workqueue() on the same work queue again.
This issue can be fixed by setting the work queue pointer to NULL after
the first destroy_workqueue() call and checking for a NULL pointer
before attempting to destroy the work queue again.
Reported-by: syzbot+d34c2a269ed512c531b0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d34c2a269ed512c531b0
Fixes: 30e388d57367 ("gfs2: Switch to a per-filesystem glock workqueue")
Cc: stable@vger.kernel.org
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
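
A self-contained userspace sketch of the fix pattern described above;
the structure and field names are stand-ins, and free() stands in for
destroy_workqueue() on the per-filesystem glock workqueue:

    #include <stdlib.h>

    struct demo_sbd {
            void *glock_wq;         /* stand-in for the workqueue pointer */
    };

    /* Destroy at most once: clear the pointer after the first destroy and
     * check it before destroying again, so overlapping error paths stay
     * safe. */
    static void demo_destroy_glock_wq(struct demo_sbd *sdp)
    {
            if (sdp->glock_wq) {
                    free(sdp->glock_wq);  /* stands in for destroy_workqueue() */
                    sdp->glock_wq = NULL;
            }
    }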
|
|
In gfs2_glock_cb(), we only need to calculate the glock hold time for
inode glocks; the value is unused otherwise.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The logic for determining when to demote a glock in glock_work_func(),
introduced in commit 7cf8dcd3b68a ("GFS2: Automatically adjust glock min
hold time"), doesn't make sense: inode glocks have a minimum hold time
that delays demotion, while all other glocks are expected to be demoted
immediately. However, instead of demoting non-inode glocks right away,
glock_work_func() schedules additional glock work to demote them.
Get rid of that unnecessary indirection.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Since the previous commit, function gfs2_quota_sync() no longer causes
the sync generation to creep forward by one every time the function is
called; this helps keep things a bit more tidy. We also don't care that
this function allocates a page of memory every time it is called, so
there is no good reason to keep qd_changed() anymore, which just
duplicates qd_grab_sync().
This reverts commit 06aa6fd31a5f402b055e12ea53bb7b086359d3c8.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The quota sync generation is only ever updated under sd_quota_sync_mutex
by gfs2_quota_sync(), but its current value is fetched outside of that
mutex, so use WRITE_ONCE() and READ_ONCE() when accessing it without
holding that mutex.
Pass the current sync generation to do_sync() from its callers to ensure
that we're not recording the wrong generation when the syncing is
done. Also, make sure that qd->qd_sync_gen only ever moves forward.
In gfs2_quota_sync(), only write the new sync generation when we know
that there are changes. This eliminates the need for function
qd_changed(), which we will remove in the next commit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
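
A hedged, self-contained sketch of the "only move forward" rule
described above, with stand-in types; the kernel code additionally uses
READ_ONCE()/WRITE_ONCE() for the superblock-wide generation and holds
the appropriate lock when updating qd->qd_sync_gen:

    #include <stdint.h>

    struct demo_qd {
            uint64_t sync_gen;      /* stand-in for qd->qd_sync_gen */
    };

    /* Record the generation a sync ran against, but never move the per-qd
     * value backwards if a newer sync already completed. */
    static void demo_update_sync_gen(struct demo_qd *qd, uint64_t sync_gen)
    {
            if (sync_gen > qd->sync_gen)
                    qd->sync_gen = sync_gen;
    }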
|
|
With the locking the previous patch has introduced for each struct
gfs2_quota_data object, sd_quota_mutex has become largely irrelevant.
By waiting on the buffer head instead of waiting on the mutex in
bh_get(), it becomes completely irrelevant and can be removed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The quota code is missing some locking between local quota changes and
syncing those quota changes to the global quota file (gfs2_quota_sync);
in particular, qd->qd_change needs to be kept in sync with the
QDF_CHANGE change flag and the number of references held. Use the
qd->qd_lockref.lock spinlock for that.
With the qd->qd_lockref.lock spinlock held, we can no longer call
lockref_get(), so turn qd_hold() into a variant that assumes that the
lock is held. This function is really supposed to take an additional
reference when one or more references are already held, so check for
that instead of checking if the lockref is dead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
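
A minimal stand-in sketch of that locked variant (illustrative only;
the real code operates on qd->qd_lockref with its spinlock held and
uses the filesystem's own assertion helpers):

    #include <assert.h>

    struct demo_lockref {
            int count;      /* the lockref's spinlock is assumed held */
    };

    /* Take an extra reference; callers must already hold at least one, so
     * assert on that instead of checking whether the lockref is dead. */
    static void demo_qd_hold_locked(struct demo_lockref *ref)
    {
            assert(ref->count > 0);
            ref->count++;
    }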
|
|
The split between qd_fish() and gfs2_quota_sync() is rather unfortunate
as qd_fish() is repeatedly called to scan sdp->sd_quota_list only to
find the next object that needs syncing; if there are multiple
objects on the list that need syncing, it makes more sense to grab them
all in one go. This is relatively easy to do when qd_fish() is folded
into gfs2_quota_sync().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename variable 'value' to 'change' as it stores a change in value.
Add new 'value' and 'limit' variables for the current value and limit.
Only fetch the tuning parameters when we need them.
Get rid of unnecessary nesting.
No change in functionality.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function do_qc() is supposed to be conceptually simple: it alters the
current in-memory and on-disk quota change values for a given uid/gid by
a given delta. If the on-disk record isn't defined yet, a new record is
created. If the on-disk record exists and the resulting change value is
zero, there no longer is a need for that record and so the record is
deleted. On top of that, some reference counting is involved when
creating and deleting records.
Currently, instead of doing the above, do_qc() alters the on-disk value
and then it sets the in-memory value to the on-disk value. This is
incorrect when the on-disk value differs from the in-memory value. The
two values are allowed to differ when quota changes are synced to the
global quota file. Fix by changing both values by the same amount.
In addition, do_qc() currently gets confused when the delta value is 0.
It isn't supposed to be called that way, but that assumption isn't
mentioned and it makes the code harder to read. Make the code more
explicit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
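
A small, self-contained example of the difference described above, with
made-up numbers; after a sync, the in-memory and on-disk change values
may legitimately differ, so copying the on-disk result over the
in-memory value loses that difference:

    #include <stdio.h>

    int main(void)
    {
            long long ondisk = 5, inmem = 3, delta = -2;

            /* old behaviour: alter the on-disk value, then overwrite the
             * in-memory value with the result */
            long long old_ondisk = ondisk + delta;  /* 3 */
            long long old_inmem  = old_ondisk;      /* 3 -- wrong */

            /* fixed behaviour: apply the same delta to both values */
            long long new_ondisk = ondisk + delta;  /* 3 */
            long long new_inmem  = inmem + delta;   /* 1 -- correct */

            printf("old: %lld/%lld, fixed: %lld/%lld\n",
                   old_inmem, old_ondisk, new_inmem, new_ondisk);
            return 0;
    }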
|
|
Commit 432928c93779 ("gfs2: Add quota_change type") makes the incorrect
assertion that function do_qc() should behave differently in the two
contexts it is used in, but that isn't actually true. In all cases,
do_qc() grabs a "reference" when it starts using a slot in the per-node
quota changes file, and it releases that "reference" when no more
residual changes remain. Revert that broken commit.
There are some remaining issues with function do_qc() which are
addressed in the next commit.
This reverts commit 432928c9377959684c748a9bc6553ed2d3c2ea4f.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Commit 4c6a08125f22 ("gfs2: ignore negated quota changes") skips quota
changes with qd_change == 0 instead of writing them back, which leaves
behind non-zero qd_change values in the affected slots. The kernel then
assumes that those slots are unused, while the qd_change values on disk
indicate that they are indeed still in use. The next time the
filesystem is mounted, those invalid slots are read in from disk, which
will cause inconsistencies.
Revert that commit to avoid filesystem corruption.
This reverts commit 4c6a08125f2249531ec01783a5f4317d7342add5.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename qd_check_sync() to qd_grab_sync() and make it return a bool.
Turn the sync_gen pointer into a regular u64 and pass in U64_MAX instead
of a NULL pointer when sync generation checking isn't needed.
Introduce a new qd_ungrab_sync() helper for undoing the effects of
qd_grab_sync() if the subsequent bh_get() on the qd object fails.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The qd_bh_get_or_undo() helper introduced by that commit doesn't improve
the code much, so revert it and clean things up in a more useful way in
the next commit.
This reverts commit 7dbc6ae60dd7089d8ed42892b6a66c138f0aa7a0.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In gfs2_quota_init(), make sure that the per-node "quota_change%u" file
doesn't contain duplicate uids/gids. Those duplicates would cause us to
acquire the glock corresponding to those ids repeatedly, which the glock
code doesn't allow.
When finding inconsistencies, we wipe them out and ignore them. The
resulting quotas will likely be inconsistent, and running quotacheck(1)
is advised.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Add a fail_brelse label and use it where useful. Move variable bh out
of the loop to extend its visibility to the new label. No functional
change.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The demote_ok glock operation is only still used to prevent the inode
glocks of the "jindex" and "rindex" directories from getting recycled
while they are still referenced by sdp->sd_jindex and sdp->sd_rindex.
However, the LRU walking code will no longer recycle glocks which are
referenced, so the demote_ok glock operation is obsolete and can be
removed.
Each of a glock's holders in the gl_holders list is holding a reference
on the glock, so when the list of holders isn't empty in demote_ok(),
the existing reference count check will already prevent the glock from
getting released. This means that demote_ok() is obsolete as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This reverts commit e7ccaf5fe1590667b3fa2f8df5c5ec9ba0dc5b85.
Before commit e7ccaf5fe159, every time a resource group glock was
dequeued by gfs2_glock_dq(), it was added to the glock LRU list even
though the glock was still referenced by the resource group and could
never be evicted, anyway. Commit e7ccaf5fe159 added a GLOF_LRU hack to
avoid that overhead for resource group glocks, and that hack was since
adopted for some other types of glocks as well.
We now no longer add glocks to the glock LRU list while they are still
referenced. This solves the underlying problem, and obsoletes the
GLOF_LRU hack.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 3e5257c810cba91e274d07f3db5cf013c7c830be)
|
|
In the current glock reference counting model, a bias of one is added to
a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED).
A glock is removed from the lru_list when it is enqueued, and added back
when it is dequeued. This isn't a very appropriate model because most
glocks are held for long periods of time (for example, the inode "owns"
references to its inode and iopen glocks as long as the inode is cached
even when the glock state changes to LM_ST_UNLOCKED), and they can only
be freed when they are no longer referenced, anyway.
Fix this by getting rid of the refcount bias for locked glocks. That
way, we can use lockref_put_or_lock() to efficiently drop all but the
last glock reference, and put the glock onto the lru_list when the last
reference is dropped. When find_insert_glock() returns a reference to a
cached glock, it removes the glock from the lru_list.
Dumping the "glocks" and "glstats" debugfs files also takes glock
references, but instead of removing the glocks from the lru_list in that
case as well, we leave them on the list. This ensures that dumping
those files won't perturb the order of the glocks on the lru_list.
In addition, when the last reference to an *unlocked* glock is dropped,
we immediately free it; this preserves the preexisting behavior. If it
later turns out that caching unlocked glocks is useful in some
situations, we can change the caching strategy.
It is currently unclear if a glock that has no active references can
have the GLF_LFLUSH flag set. To make sure that such a glock won't
accidentally be evicted due to memory pressure, we add a GLF_LFLUSH
check to gfs2_dispose_glock_lru().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
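
A rough userspace sketch of the new reference-dropping model described
above, with stand-in types; the kernel uses lockref_put_or_lock() on
gl->gl_lockref, and "park on the LRU list or free" stands in for the
real glock teardown:

    #include <stdbool.h>
    #include <stdlib.h>

    struct demo_glock {
            int refcount;   /* stand-in for gl->gl_lockref */
            bool locked;    /* stand-in for gl_state != LM_ST_UNLOCKED */
            bool on_lru;
    };

    /* Drop a reference. All but the last are a plain decrement; dropping
     * the last reference either parks the still-locked glock on the LRU
     * list or frees it immediately if it is already unlocked. */
    static void demo_glock_put(struct demo_glock *gl)
    {
            if (--gl->refcount > 0)
                    return;
            if (gl->locked)
                    gl->on_lru = true;      /* cached for later eviction */
            else
                    free(gl);               /* unlocked: freed right away */
    }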
|
|
Switch to a per-filesystem glock workqueue. Additional workqueues are
cheap nowadays, and keeping separate workqueues makes it possible to
flush the work of each filesystem without affecting the others.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When glocks cannot be freed for a long time, avoid the "task blocked for
more than N seconds" messages and report how many glocks are still
outstanding, instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Clean up the messy code in gfs2_glock_get(). No change in
functionality.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Invert the meaning of the GLF_INITIAL flag: right now, when GLF_INITIAL
is set, a DLM lock exists and we have a valid identifier for it; when
GLF_INITIAL is cleared, no DLM lock exists (yet). This is confusing.
In addition, it makes more sense to highlight the exceptional case
(i.e., no DLM lock exists yet) in glock dumps and trace points than to
highlight the common case.
To avoid confusion between the "old" and the "new" meaning of the flag,
use 'a' instead of 'I' to represent the flag.
For improved code consistency, check if the GLF_INITIAL flag is cleared
to determine whether a DLM lock exists instead of checking if the lock
identifier is non-zero.
Document what the flag is used for.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function handle_callback() is used to request a glock demote. This
often happens in response to a conflicting remote locking request and
subsequent bast callback from DLM, but there are other reasons for
triggering a demote request as well, such as when trying to release a
glock in response to memory pressure. To clarify that, rename the
function to request_demote().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The GLF_FROZEN flag indicates that a reply to a DLM locking request has
been received, but should not be processed at this time. To clarify
that meaning, rename the flag to GLF_HAVE_FROZEN_REPLY.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The GLF_REPLY_PENDING flag indicates to glock_work_func() that in
response to a locking request, DLM has sent a reply that needs to be
processed. A flag with that name could as well indicate that we are
waiting on a reply from DLM, however. To disambiguate these two cases,
rename the flag to GLF_HAVE_REPLY.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename the GLF_FREEING flag to GLF_UNLOCKED, and the ->go_free glock
operation to ->go_unlocked. This mechanism is used to wait for the
underlying DLM lock to be unlocked; being able to free the glock is a
consequence of the DLM lock being unlocked.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The return statement at the end of run_queue() is useless.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|