Age | Commit message (Collapse) | Author |
|
In future patches we'll be taking and relying on Fx caps. Add proper
refcounting for them.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When the unsafe reply to a request comes in, the request is put on the
r_unsafe_dir inode's list. In future patches, we're going to need to
wait on requests that may not have gotten an unsafe reply yet.
Change __register_request to put the entry on the dir inode's list when
the pointer is set in the request, and don't check the
CEPH_MDS_R_GOT_UNSAFE flag when unregistering it.
The only place that uses this list today is fsync codepath, and with
the coming changes, we'll want to wait on all operations whether it has
gotten an unsafe reply or not.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Clang warns:
fs/notify/fanotify/fanotify.c:28:23: warning: self-comparison always
evaluates to true [-Wtautological-compare]
return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1];
^
fs/notify/fanotify/fanotify.c:28:57: warning: self-comparison always
evaluates to true [-Wtautological-compare]
return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1];
^
2 warnings generated.
The intention was clearly to compare val[0] and val[1] in the two
different fsid structs. Fix it otherwise this function always returns
true.
Fixes: afc894c784c8 ("fanotify: Store fanotify handles differently")
Link: https://github.com/ClangBuiltLinux/linux/issues/952
Link: https://lore.kernel.org/r/20200327171030.30625-1-natechancellor@gmail.com
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Minor comment conflict in mac80211.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To 2.26
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When encryption is used, smb2_transform_hdr is defined on the stack and is
passed to the transport. This doesn't work with RDMA as the buffer needs to
be DMA'ed.
Fix it by using kmalloc.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When a RDMA packet is received and server is extending send credits, we should
check and unblock senders immediately in IRQ context. Doing it in a worker
queue causes unnecessary delay and doesn't save much CPU on the receive path.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SMBDirect send/receive
The packet size needs to take account of SMB2 header size and possible
encryption header size. This is only done when signing is used and it is for
RDMA send/receive, not read/write.
Also remove the dead SMBD code in smb2_negotiate_r(w)size.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Improve readability and maintainability by replacing a hardcoded string
allocation and formatting by the use of the kasprintf() helper.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
ext4_fill_super doublechecks the number of groups before mounting; if
that check fails, the resulting error message prints the group count
from the ext4_sb_info sbi, which hasn't been set yet. Print the freshly
computed group count instead (which at that point has just been computed
in "blocks_count").
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Fixes: 4ec1102813798 ("ext4: Add sanity checks for the superblock before mounting the filesystem")
Link: https://lore.kernel.org/r/8b957cd1513fcc4550fe675c10bcce2175c33a49.1585431964.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If ext4_fill_super detects an invalid number of inodes per group, the
resulting error message printed the number of blocks per group, rather
than the number of inodes per group. Fix it to print the correct value.
Fixes: cd6bb35bf7f6d ("ext4: use more strict checks for inodes_per_block on mount")
Link: https://lore.kernel.org/r/8be03355983a08e5d4eed480944613454d7e2550.1585434649.git.josh@joshtriplett.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently on calling echo 3 > drop_caches on host machine, we see
FS corruption in the guest. This happens on Power machine where
blocksize < pagesize.
So as a temporary workaound don't enable dioread_nolock by default
for blocksize < pagesize until we identify the root cause.
Also emit a warning msg in case if this mount option is manually
enabled for blocksize < pagesize.
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200327200744.12473-1-riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the inode buffer backing a particular inode is locked,
xfs_iflush() returns -EAGAIN and xfs_inode_item_push() skips the
inode. It still returns success to xfsaild, however, which bypasses
the xfsaild backoff heuristic. Update xfs_inode_item_push() to
return locked status if the inode buffer couldn't be locked.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
A dquot flush currently blocks on the buffer lock for the underlying
dquot buffer. In turn, this causes xfsaild to block rather than
continue processing other items in the meantime. Update
xfs_qm_dqflush() to trylock the buffer, similar to how inode buffers
are handled, and return -EAGAIN if the lock fails. Fix up any
callers that don't currently handle the error properly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Since the "no-allocation" reservations for file creations has
been removed, the resblks value should be larger than zero, so
remove unnecessary ternary conditional.
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: s/judgment/ternary/]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
If the FLUSH_SYNC flag is set, nfs_initiate_pgio() will currently
wait for completion, and then return the status of the I/O operation.
What we actually want to report in nfs_pageio_doio() is whether or
not the RPC call was launched successfully, whereas actual I/O
status is intended handled in the reply callbacks.
Since FLUSH_SYNC is never set by any of the callers anyway, let's
just remove that code altogether.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT.
PREEMPT_RT replaced the bit spinlocks with regular spinlocks to avoid this
problem. The replacement was done conditionaly at compile time, but
Christoph requested to do an unconditional conversion.
Jan suggested to move the spinlock into a existing padding hole which
avoids a size increase of struct buffer_head on production kernels.
As a benefit the lock gains lockdep coverage.
[ bigeasy: Remove the wrapper and use always spinlock_t and move it into
the padding hole ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Link: https://lkml.kernel.org/r/20191118132824.rclhrbujqh4b4g4d@linutronix.de
|
|
Move from requesting only full file layout segments, to requesting
layout segments that match our I/O size. This means the server is
still free to return a full file layout, but we will no longer
error out if it does not.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Remove the requirement that the server always sends whole file
layouts.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When starting to read or write with a layout segment, check that the
range matches our request.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Check that the number of mirrors, and the mirror information matches
before deciding to merge layout segments in pNFS/flexfiles.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Fix up pnfs_layout_mark_request_commit() to alway reschedule the write
if the layout segment is invalid. Also minor cleanup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Move the pNFS commit related operations into a separate structure
that can be carried by the pnfs_ds_commit_info.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Remove the unused bucket array in struct pnfs_ds_commit_info.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Lift filelayout_search_commit_reqs() into the generic pnfs/nfs code,
and add support for commit arrays.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Enable adding and lookup of per-layout segment commits in filelayout
and flexfilelayout.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Ensure that both the file and flexfiles layout types clean up when
freeing the layout segments.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add support for scanning the full list of per-layout segment commit
arrays to nfs_clear_pnfs_ds_commit_verifiers().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Instead of trying to save the commit verifiers and checking them against
previous writes, adopt the same strategy as for buffered writes, of
just checking the verifiers at commit time.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Fix the O_DIRECT code to avoid retries if the COMMIT fails with a fatal
error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add a pNFS callback to allow the O_DIRECT code to release the DS
commitinfo when freeing the dreq.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_commit_pagelist().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_recover_commit_reqs().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_scan_commit_lists()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When we have multiple layout segments with different lists of mirrored
data, we need to track the commits on a per layout segment basis.
This patch adds a list to support this tracking in struct
pnfs_ds_commit_info.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Function gfs2_recover_func grabs the sd_log_flush_lock rw_semaphore in
write mode. This is unnecessary because we only need to prevent log flush
from using sd_log_bio bio while it does. Therefore, a read lock will be
enough. This is a small step in cleaning up log flush.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, if the ail1 flush got stuck for some reason, there
were no clues as to why. This patch introduces a check for getting
stuck for more than a minute, and if it happens, it dumps the items
still remaining on the ail1 list.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
In function try_rgrp_unlink, we added a temporary lock of the
sd_log_flush_lock while searching the bitmaps. This protected us from
problems in which dinodes being freed were still in a state of flux
because the rgrp was in an active transaction. It was a kludge.
Now that we've straightened out the code for inode eviction, deletes,
and all the recovery mess, we no longer need this kludge.
This patch removes it, and should improve performance.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
We now get the quota data structure when opening a file writable and put it
when closing that writable file descriptor, so there no longer is a need for
gfs2_qa_{get,put} while we're holding a writable file descriptor.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Keeping reservations and quotas separate helps reviewing the code.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, multiple users called gfs2_qa_alloc which allocated
a qadata structure to the inode, if quotas are turned on. Later, in
file close or evict, the structure was deleted with gfs2_qa_delete.
But there can be several competing processes who need access to the
structure. There were races between file close (release) and the others.
Thus, a release could delete the structure out from under a process
that relied upon its existence. For example, chown.
This patch changes the management of the qadata structures to be
a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get
and if the structure is allocated, the count essentially starts out at
1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the
last guy to decrement the count to 0 frees the memory.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, multiple callers called gfs2_rsqa_alloc to force
the existence of a reservations structure and a quota data structure
if needed. However, now the reservations are handled separately, so
the quota data is only the quota data. So we eliminate the one in
favor of just calling gfs2_qa_alloc directly.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Replace open-coded versions of list_first_entry and list_last_entry with those
functions.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
When allocating a new inode, mark the iopen glock holder as uninitialized to
make sure gfs2_evict_inode won't fail after an incomplete create or lookup. In
gfs2_evict_inode, allow the inode glock to be NULL and remove the duplicate
iopen glock teardown code. In gfs2_inode_lookup, don't tear down things that
gfs2_evict_inode will already tear down.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
It clarifies the code slightly to use SMB2_SIGNATURE_SIZE
define rather than 16.
Suggested-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
So far, with xino=auto, we only enable xino if we know that all
underlying filesystem use 32bit inode numbers.
When users configure overlay with xino=auto, they already declare that
they are ready to handle 64bit inode number from overlay.
It is a very common case, that underlying filesystem uses 64bit ino,
but rarely or never uses the high inode number bits (e.g. tmpfs, xfs).
Leaving it for the users to declare high ino bits are unused with
xino=on is not a recipe for many users to enjoy the benefits of xino.
There appears to be very little reason not to enable xino when users
declare xino=auto even if we do not know how many bits underlying
filesystem uses for inode numbers.
In the worst case of xino bits overflow by real inode number, we
already fall back to the non-xino behavior - real inode number with
unique pseudo dev or to non persistent inode number and overlay st_dev
(for directories).
The only annoyance from auto enabling xino is that xino bits overflow
emits a warning to kmsg. Suppress those warnings unless users explicitly
asked for xino=on, suggesting that they expected high ino bits to be
unused by underlying filesystem.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
When xino feature is enabled and a real directory inode number overflows
the lower xino bits, we cannot map this directory inode number to a unique
and persistent inode number and we fall back to the real inode st_ino and
overlay st_dev.
The real inode st_ino with high bits may collide with a lower inode number
on overlay st_dev that was mapped using xino.
To avoid possible collision with legitimate xino values, map a non
persistent inode number to a dedicated range in the xino address space.
The dedicated range is created by adding one more bit to the number of
reserved high xino bits. We could have added just one more fsid, but that
would have had the undesired effect of changing persistent overlay inode
numbers on kernel or require more complex xino mapping code.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
There is no reason to deplete the system's global get_next_ino() pool for
overlay non-persistent inode numbers and there is no reason at all to
allocate non-persistent inode numbers for non-directories.
For non-directories, it is much better to leave i_ino the same as real
i_ino, to be consistent with st_ino/d_ino.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Changes to underlying layers should not cause WARN_ON(), but this repro
does:
mkdir w l u mnt
sudo mount -t overlay -o workdir=w,lowerdir=l,upperdir=u overlay mnt
touch mnt/h
ln u/h u/k
rm -rf mnt/k
rm -rf mnt/h
dmesg
------------[ cut here ]------------
WARNING: CPU: 1 PID: 116244 at fs/inode.c:302 drop_nlink+0x28/0x40
After upper hardlinks were added while overlay is mounted, unlinking all
overlay hardlinks drops overlay nlink to zero before all upper inodes
are unlinked.
After unlink/rename prevent i_nlink from going to zero if there are still
hashed aliases (i.e. cached hard links to the victim) remaining.
Reported-by: Phasip <phasip@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|