Age | Commit message (Collapse) | Author |
|
Both callers have a folio, so pass it in and use it throughout.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20241025190822.1319162-4-willy@infradead.org
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Remove the conversion to a struct page. Removes a few hidden calls to
compound_head(). Use 'err' instead of 'rc' for clarity.
Also remove the unnecessary call to ClearPageUptodate(); the uptodate
flag is already clear if this function is being called. That lets us
switch to folio_end_read() which does one atomic flag operation instead
of two.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20241025190822.1319162-3-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
By adding a ->migrate_folio implementation, theree is no need to keep
the ->writepage implementation. The new writepages removes the
unnecessary call to SetPageUptodate(); the folio should already be
uptodate at this point.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20241025190822.1319162-2-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
All fuse requests use folios instead of pages for transferring data.
Remove pages from the requests and exclusively use folios.
No functional changes.
[SzM: rename back folio_descs -> descs, etc.]
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
To make it possible to put struct bin_attribute into read-only memory,
the sysfs core has to stop passing mutable pointers to the read() and
write() callbacks.
As there are numerous implementors of these callbacks throughout the
tree it's not possible to change all of them at once.
To enable a step-by-step transition, add new variants of the read() and
write() callbacks which differ only in the constness of the struct
bin_attribute argument.
As most binary attributes are defined through macros, extend these
macros to transparently handle both variants of callbacks to minimize
the churn during the transition.
As soon as all handlers are switch to the const variant, the non-const
one can be removed together with the transition machinery.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-9-71110628844c@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Several drivers need to dynamically calculate the size of an binary
attribute. Currently this is done by assigning attr->size from the
is_bin_visible() callback.
This has drawbacks:
* It is not documented.
* A single attribute can be instantiated multiple times, overwriting the
shared size field.
* It prevents the structure to be moved to read-only memory.
Introduce a new dedicated callback to calculate the size of the
attribute.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-2-71110628844c@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Upcoming changes to the sysfs core require the size of the created file
to be overridable by the caller.
Add a parameter to enable this.
For now keep using attr->size in all cases.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-1-71110628844c@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Shared the regular buffered write iomap_ops with the page fault path
and just check for the IOMAP_FAULT flag to skip delalloc punching.
This keeps the delalloc punching checks in one place, and will make it
easier to convert iomap to an iter model where the begin and end
handlers are merged into a single callback.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_filemap_huge_fault only ever serves DAX faults, so hard code the
call to xfs_dax_read_fault and open code __xfs_filemap_fault in the
only remaining caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Only two of the callers of __xfs_filemap_fault every handle read faults.
Split the write_fault handling out of __xfs_filemap_fault so that all
callers call that directly either conditionally or unconditionally and
only leave the read fault handling in __xfs_filemap_fault.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Split the xfs_filemap_fault trace event into separate ones for read and
write faults and move them into the applicable locations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
It's just read in from the superblock and used without doing any
validity checks at all on the value.
Fixes: fb4f2b4e5a82 ("xfs: add sparse inode chunk alignment superblock field")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_zero_extent does some really odd gymnstics to calculate the block
layer sectors numbers passed to blkdev_issue_zeroout. This is because it
used to call sb_issue_zeroout and the calculations in that helper got
open coded here in the rather misleadingly named commit 3dc29161070a
("dax: use sb_issue_zerout instead of calling dax_clear_sectors").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
There are two invocations of xfs_alloc_log_agf in xfs_alloc_put_freelist.
The AGF does not change between the two calls. Although this does not pose
any practical problems, it seems like a small mistake. Therefore, fix it
by removing the first xfs_alloc_log_agf invocation.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The DLM_LKF_QUECVT flag needs to be set for "upward" lock conversions to
ensure fairness, but setting it for "downward" lock conversions will
lead to a failure. The flag is currently set based on the GLF_BLOCKING
flag and it's not immediately obvious why this is correct. Simplify
things by figuring out if a lock conversion is "upward" by looking at
the before and after locking modes instead of relying on the
GLF_BLOCKING flag.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When function evict_should_delete() returns SHOULD_DEFER_EVICTION, gh is
never initialized, but that isn't obvious; if it did initialize gh and
then return SHOULD_DEFER_EVICTION, gfs2_evict_inode() would fail to
release it. To clarify the code, change gfs2_evict_inode() to always
check if gh needs to be released, no matter what evict_should_delete()
returns.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function gfs2_inode_refresh() is only used in fs/gfs2/glops.c.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Use get_random_u32() instead of get_random_bytes() to remove the last
remaining call to get_random_bytes().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Randomize the delay of GLF_VERIFY_DELETE work. This avoids thundering
herd problems when multiple nodes schedule that kind of work in response
to an inode being unlinked remotely.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In the unlikely case that we're trying to queue GLF_TRY_TO_EVICT work
for an inode that already has GLF_VERIFY_DELETE work queued, we want to
make sure that the GLF_TRY_TO_EVICT work gets scheduled immediately
instead of waiting for the delayed work timer to expire.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Try to be a bit more clear and remove some duplications. We cannot
actually get rid of the verification step eventually, so remove the
comment saying so.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move calls to gfs2_queue_verify_delete() into gfs2_evict_inode().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function delete_work_func() was previously assuming that the
GLF_TRY_TO_EVICT and GLF_VERIFY_DELETE flags won't both be set at the
same time, but there probably are races in which that can happen, so
handle that case correctly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move those definitions into the the scope in which they are used.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In case an iopen glock cannot be upgraded, function
gfs2_upgrade_iopen_glock() needs to communicate to gfs2_evict_inode()
whether deleting the inode should be deferred or skipped altogether.
Change the function to return the appropriate enum evict_behavior value
to indicate that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename enum dinode_demise to evict_behavior and its items
SHOULD_DELETE_DINODE to EVICT_SHOULD_DELETE,
SHOULD_NOT_DELETE_DINODE to EVICT_SHOULD_SKIP_DELETE, and
SHOULD_DEFER_EVICTION to EVICT_SHOULD_DEFER_DELETE.
In gfs2_evict_inode(), add a separate variable of type enum
evict_behavior instead of implicitly casting to int.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The GIF_DEFERRED_DELETE flag indicates an action that gfs2_evict_inode()
should take, so rename the flag to GIF_DEFER_DELETE to clarify.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move function needs_demote() to glock.h and rename it to
glock_needs_demote(). In handle_callback(), wake up the glock when
setting the GLF_PENDING_DEMOTE flag as well. (Setting the GLF_DEMOTE
flag already triggered a wake-up.)
With that, check for glock_needs_demote() in gfs2_upgrade_iopen_glock()
to wake up when either of those flags is set for the inode glock: the
faster we can react to contention, the better.
The GLF_PENDING_DEMOTE flag is only used for inode glocks (see
gfs2_glock_cb()) so it's okay to only check for the GLF_DEMOTE flag in
gfs2_drop_inode(). Still, using glock_needs_demote() there as well
makes the code a little easier to read.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Convert direct io requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert writeback requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert retrieve requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert ioctl requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert non-writeback write requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert read requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert readdir requests to use a folio instead of a page.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert readlink requests to use a folio instead of a page.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert cuse requests to use a folio instead of a page.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Until all requests have been converted to use folios instead of pages,
virtio will need to support both types. Once all requests have been
converted, then virtio will support just folios.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This adds support in struct fuse_args_pages and fuse_copy_pages() for
using folios instead of pages for transferring data. Both folios and
pages must be supported right now in struct fuse_args_pages and
fuse_copy_pages() until all request types have been converted to use
folios. Once all have been converted, then
struct fuse_args_pages and fuse_copy_pages() will only support folios.
Right now in fuse, all folios are one page (large folios are not yet
supported). As such, copying folio->page is sufficient for copying
the entire folio in fuse_copy_pages().
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If Client send simultaneous SMB operations to ksmbd, It exhausts too much
memory through the "ksmbd_work_cache”. It will cause OOM issue.
ksmbd has a credit mechanism but it can't handle this problem. This patch
add the check if it exceeds max credits to prevent this problem by assuming
that one smb request consumes at least one credit.
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Norbert Szetei <norbert@doyensec.com>
Tested-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
ksmbd_user_session_put should be called under smb3_preauth_hash_rsp().
It will avoid freeing session before calling smb3_preauth_hash_rsp().
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Norbert Szetei <norbert@doyensec.com>
Tested-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There is a race condition between ksmbd_smb2_session_create and
ksmbd_expire_session. This patch add missing sessions_table_lock
while adding/deleting session from global session table.
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Norbert Szetei <norbert@doyensec.com>
Tested-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Set FMODE_CAN_ATOMIC_WRITE flag if we can atomic write for that inode.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> #On ppc64
|
|
Validate that an atomic write adheres to length/offset rules. Currently
we can only write a single FS block.
For an IOCB with IOCB_ATOMIC set to get as far as xfs_file_write_iter(),
FMODE_CAN_ATOMIC_WRITE will need to be set for the file; for this,
ATOMICWRITES flags would also need to be set for the inode.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Support providing info on atomic write unit min and max for an inode.
For simplicity, currently we limit the min at the FS block size. As for
max, we limit also at FS block size, as there is no current method to
guarantee extent alignment or granularity for regular files.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
flag set.
Initially FSes (XFS) should only support writing a single FS block
atomically.
As with any atomic write, we should produce a single bio which covers the
complete write length.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
[djwong: clarify a couple of things in the docs]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The XFS code will need this.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Multi-threaded buffered reads to the same file exposed significant
inode spinlock contention in nfs_clear_invalid_mapping().
Eliminate this spinlock contention by checking flags without locking,
instead using smp_rmb and smp_load_acquire accordingly, but then take
spinlock and double-check these inode flags.
Also refactor nfs_set_cache_invalid() slightly to use
smp_store_release() to pair with nfs_clear_invalid_mapping()'s
smp_load_acquire().
While this fix is beneficial for all multi-threaded buffered reads
issued by an NFS client, this issue was identified in the context of
surprisingly low LOCALIO performance with 4K multi-threaded buffered
read IO. This fix dramatically speeds up LOCALIO performance:
before: read: IOPS=1583k, BW=6182MiB/s (6482MB/s)(121GiB/20002msec)
after: read: IOPS=3046k, BW=11.6GiB/s (12.5GB/s)(232GiB/20001msec)
Fixes: 17dfeb911339 ("NFS: Fix races in nfs_revalidate_mapping")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Fix the possibility of racing nfs_local_probe() resulting in:
list_add double add: new=ffff8b99707f9f58, prev=ffff8b99707f9f58, next=ffffffffc0f30000.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:35!
Add nfs_uuid_init() to properly initialize all nfs_uuid_t members
(particularly its list_head).
Switch to returning bool from nfs_uuid_begin(), returns false if
nfs_uuid_t is already in-use (its list_head is on a list). Update
nfs_local_probe() to return early if the nfs_client's cl_uuid
(nfs_uuid_t) is in-use.
Also, switch nfs_uuid_begin() from using list_add_tail_rcu() to
list_add_tail() -- rculist was used in an earlier version of the
localio code that had a lockless nfs_uuid_lookup interface.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
When asked to set both an atime and an mtime to the current system time,
ensure that the setting is atomic by calling inode_update_timestamps()
only once with the appropriate flags.
Fixes: e12912d94137 ("NFSv4: Add support for delegated atime and mtime attributes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|