Age | Commit message (Collapse) | Author |
|
Pull vfs fixlet from Al Viro:
"Fix bogus default y in Kconfig (VALIDATE_FS_PARSER)
That thing should not be turned on by default, especially since it's
not quiet in case it finds no problems. Geert has sent the obvious fix
quite a few times, but it fell through the cracks"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: VALIDATE_FS_PARSER should default to n
|
|
Pull nfsd fixes from Bruce Fields:
"Two more quick bugfixes for nfsd: fixing a regression causing mount
failures on high-memory machines and fixing the DRC over RDMA"
* tag 'nfsd-5.2-2' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix overflow causing non-working mounts on 1 TB machines
svcrdma: Ignore source port when computing DRC hash
|
|
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The xattr scrubber functions use the temporary memory buffer either for
storing bitmaps or for testing if attribute value extraction works. The
bitmap code always zeroes what it needs and the value extraction sets
the buffer contents, so it's not necessary to waste CPU time zeroing on
allocation.
Note that while we never read the contents that the attr value
extraction function sets, we do need to call it to check the remote
attribute header and CRCs to check for corruption.
A flame graph analysis showed that we were spending 7% of a xfs_scrub
run (the whole program, not just the attr scrubber itself) allocating
and zeroing 64k segments needlessly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
In examining a flame graph of time spent running xfs_scrub on various
filesystems, I noticed that we spent nearly 7% of the total runtime on
allocating a zeroed 65k buffer for every SCRUB_TYPE_XATTR invocation.
We do this even if none of the attribute values were anywhere near 64k
in size, even if there were no attribute blocks to check space on, and
even if it just turns out there are no attributes at all.
Therefore, rearrange the xattr buffer setup code to support reallocating
with a bigger buffer and redistribute the callers of that function so
that we only allocate memory just prior to needing it, and only allocate
as much as we need. If we can't get memory with the ILOCK held we'll
bail out with EDEADLOCK which will allocate the maximum memory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Move the code that allocates memory buffers for the extended attribute
scrub code into a separate function so we can reduce memory allocations
in the next patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Replace the open-coded attribute buffer pointer calculations with helper
functions to make it more obvious what we're doing with our freeform
memory allocation w.r.t. either storing xattr values or computing btree
block free space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
When we're iterating all the attributes using the built-in xattr
iterator, we can use the seen_enough variable to pass error codes back
to the main scrub function instead of flattening them into 0/1. This
will be used in a more exciting fashion in upcoming patches.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Currently if the allocation of roots or tmp_ulist fails the error handling
does not free up the allocation of path causing a memory leak. Fix this and
other similar leaks by moving the call of btrfs_free_path from label out
to label out_free_ulist.
Kudos to David Sterba for spotting the issue in my original fix and suggesting
the correct way to fix the leak and Anand Jain for spotting a double free
issue.
Addresses-Coverity: ("Resource leak")
Fixes: 5911c8fe05c5 ("btrfs: fiemap: preallocate ulists for btrfs_check_shared")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
CONFIG_VALIDATE_FS_PARSER is a debugging tool to check that the parser
tables are vaguely sane. It was set to default to 'Y' for the moment to
catch errors in upcoming fs conversion development.
Make sure it is not enabled by default in the final release of v5.1.
Fixes: 31d921c7fb969172 ("vfs: Add configuration parser helpers")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Merge more fixes from Andrew Morton:
"5 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
swap_readpage(): avoid blk_wake_io_task() if !synchronous
devres: allow const resource arguments
mm/vmscan.c: prevent useless kswapd loops
fs/userfaultfd.c: disable irqs for fault_pending and event locks
mm/page_alloc.c: fix regression with deferred struct page init
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax fix from Dan Williams:
"A single dax fix that has been soaking awaiting other fixes under
discussion to join it. As it is getting late in the cycle lets proceed
with this fix and save follow-on changes for post-v5.3-rc1.
- Fix xarray entry association for mixed mappings"
* tag 'dax-fix-5.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Fix xarray entry association for mixed mappings
|
|
Pull do_move_mount() fix from Al Viro:
"Regression fix"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: move_mount: reject moving kernel internal mounts
|
|
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.
This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
userfaultfd_ctx_read(), which in turn can be waiting for
userfaultfd_ctx::fault_pending_wqh.lock or
userfaultfd_ctx::event_wqh.lock.
But elsewhere the fault_pending_wqh and event_wqh locks are taken with
IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep
reports that a deadlock is possible.
Fix it by always disabling IRQs when taking the fault_pending_wqh and
event_wqh locks.
Commit ae62c16e105a ("userfaultfd: disable irqs when taking the
waitqueue lock") didn't fix this because it only accounted for the
fd_wqh lock, not the other locks nested inside it.
Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
No point having two call sites (earlier in init_rootfs() from
mnt_init() in case we are going to use shmem-style rootfs,
later from do_basic_setup() unconditionally), along with the
logics in shmem_init() itself to make the second call a no-op...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
init_mount_tree() can get to rootfs_fs_type directly and that simplifies
a lot of things. We don't need to register it, we don't need to look
it up *and* we don't need to bother with preventing subsequent userland
mounts. That's the way we should've done that from the very beginning.
There is a user-visible change, namely the disappearance of "rootfs"
from /proc/filesystems. Note that it's been unmountable all along
and it didn't show up in /proc/mounts; however, it *is* a user-visible
change and theoretically some script might've been using its presence
in /proc/filesystems to tell 2.4.11+ from earlier kernels.
*IF* any complaints about behaviour change do show up, we could fake
it in /proc/filesystems. I very much doubt we'll have to, though.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
the only thing done by the latter is making ramfs visible
to mount(2); we don't need it there - rootfs is separate
and, in fact, made visible to mount(2) in the same init_rootfs().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Convert the openpromfs filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.
See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Convert the efivarfs filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.
[AV: get rid of efivarfs_sb nonsense - it has never been used]
See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Garrett <matthew.garrett@nebula.com>
cc: Jeremy Kerr <jk@ozlabs.org>
cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
cc: linux-efi@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Convert the configfs filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.
See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Joel Becker <jlbec@evilplan.org>
cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Convert the binfmt_misc filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.
See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
counterpart of mount_single(); switch fusectl to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
counterpart of mount_nodev(). Switch hugetlb and pseudo to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
make unhash_mnt() return the mountpoint to be dropped, let callers
deal with it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... not since 1e9c75fb9c47 ("mnt: fix __detach_mounts infinite loop")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This is just two functions, put it in root-tree.c since it involves root
items.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We have code for data and metadata reservations for delalloc. There's
quite a bit of code here, and it's used in a lot of places so I've
separated it out to it's own file. inode.c and file.c are already
pretty large, and this code is complicated enough to live in its own
space.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Move this into transaction.c with the rest of the transaction related
code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
These belong with the delayed refs related code, not in extent-tree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Simplification. No point passing the tree variable when it can be
evaluated from inode. The tests now use the io_tree from btrfs_inode as
opposed to creating one.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Remove the unused flags argument of gfs2_iomap_alloc.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Create a new bulk ireq flag that enables userspace to ask us for a
special inode number instead of interpreting @ino as a literal inode
number. This enables us to query the root inode easily.
The reason for adding the ability to query specifically the root
directory inode is that certain programs (xfsdump and xfsrestore) want
to confirm when they've been pointed to the root directory. The
userspace code assumes the root directory is always the first result
from calling bulkstat with lastino == 0, but this isn't true if the
(initial btree roots + initial AGFL + inode alignment padding) is itself
long enough to be allocated to new inodes if all of those blocks should
happen to be free at the same time. Rather than make userspace guess
at internal filesystem state, we provide a direct query.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Add a new xfs_bulk_ireq flag to constrain the iteration to a single AG.
If the passed-in startino value is zero then we start with the first
inode in the AG that the user passes in; otherwise, we iterate only
within the same AG as the passed-in inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Stephen writes:
After merging the driver-core tree, today's linux-next build (x86_64
allmodconfig) produced this warning:
fs/orangefs/orangefs-debugfs.c: In function 'orangefs_debugfs_init':
fs/orangefs/orangefs-debugfs.c:193:1: warning: label 'out' defined but not used [-Wunused-label]
out:
^~~
fs/orangefs/orangefs-debugfs.c: In function 'orangefs_kernel_debug_init':
fs/orangefs/orangefs-debugfs.c:204:17: warning: unused variable 'ret' [-Wunused-variable]
struct dentry *ret;
^~~
Fix this up and change the return type of the function to void as it can
not fail, which cleans up some more code and variables as well.
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Martin Brandenburg <martin@omnibond.com>
Cc: devel@lists.orangefs.org
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: f095adba36bb ("orangefs: no need to check return value of debugfs_create functions")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Stephen writes:
After merging the driver-core tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:
fs/ubifs/debug.c: In function 'dbg_debugfs_init_fs':
fs/ubifs/debug.c:2812:6: warning: unused variable 'err' [-Wunused-variable]
int err, n;
^~~
So fix this up properly.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Richard Weinberger <richard@nod.at>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Wire up the v5 INUMBERS ioctl.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Wire up the new v5 BULKSTAT ioctl.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Introduce a new "v5" inode group structure that fixes the alignment
and padding problems of the existing structure.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Introduce a new version of the in-core bulkstat structure that supports
our new v5 format features. This structure also fills the gaps in the
previous structure. We leave wiring up the ioctls for the next patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Rename the bulkstat functions to 'fsbulkstat' so that they match the
ioctl names. We will be introducing a new set of bulkstat/inumbers
ioctls soon, and it will be important to keep the names straight.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Remove xfs_bstat_t, xfs_fsop_bulkreq_t, xfs_inogrp_t, and similarly
named compat typedefs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Decode the implementation ID and display in nfsd/clients/#/info. It may
be help identify the client. It won't be used otherwise.
(When this went into the protocol, I thought the implementation ID would
be a slippery slope towards implementation-specific workarounds as with
the http user-agent. But I guess I was wrong, the risk seems pretty low
now.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Move some repeated code to a common helper. No change in behavior.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
NFSv4 clients are automatically expired and all their locks removed if
they don't contact the server for a certain amount of time (the lease
period, 90 seconds by default).
There can still be situations where that's not enough, so allow
userspace to force expiry by writing "expire\n" to the new
nfsd/client/#/ctl file.
(The generic "ctl" name is because I expect we may want to allow other
operations on clients in the future.)
The write will not return until the client is expired and all of its
locks and other state removed.
The fault injection code also provides a way of expiring clients, but it
fails if there are any in-progress RPC's referencing the client. Also,
its method of selecting a client to expire is a little more
primitive--it uses an IP address, which can't always uniquely specify an
NFSv4 client.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Factor our some common code. No change in behavior.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
These are also minimal for now, I'm not sure what information would be
useful.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|