Age | Commit message (Collapse) | Author |
|
req->r_timeout is only used during mounting, so this error will
be more accurate.
URL: https://tracker.ceph.com/issues/44215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This patch re-organizes copy_file_range, trying to fix a few issues in the
error handling. Here's the summary:
- Abort copy if initial do_splice_direct() returns fewer bytes than
requested.
- Move the 'size' initialization (with i_size_read()) further down in the
code, after the initial call to do_splice_direct(). This avoids issues
with a possibly stale value if a manual copy is done.
- Move the object copy loop into a separate function. This makes it
easier to handle errors (e.g, dirtying caps and updating the MDS
metadata if only some objects have been copied before an error has
occurred).
- Added calls to ceph_oloc_destroy() to avoid leaking memory with src_oloc
and dst_oloc
- After the object copy loop, the new file size to be reported to the MDS
(if there's file size change) is now the actual file size, and not the
size after an eventual extra manual copy.
- Added a few dout() to show the number of bytes copied in the two manual
copies and in the object copy loop.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
On my machine (x86_64) this struct is 952 bytes, which gets rounded up
to 1024 by kmalloc. Move this to a dedicated slabcache, so we can
allocate them without the extra 72 bytes of overhead per.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This shrinks the struct size by 16 bytes.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Use the "page has been truncated" logic in page_mkwrite_check_truncate
instead of reimplementing it here. Other than with the existing code,
fail with -EFAULT / VM_FAULT_NOPAGE when page_offset(page) == size here
as well, as should be expected.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When a process exits, kernel closes its files. locks_remove_file()
is called to remove file locks on these files. locks_remove_file()
tries unlocking files even there is no file lock.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Since these helpers are only used by ceph.ko, move them there and
rename them with _sync_ qualifiers.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
CephFS doesn't set this bit to begin with, so there should be no need
to clear it.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Although CEPH_DEFINE_SHOW_FUNC is much older, it now duplicates
DEFINE_SHOW_ATTRIBUTE from linux/seq_file.h.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
These bits will have new meaning for directory inodes.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In future patches we'll be taking and relying on Fx caps. Add proper
refcounting for them.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When the unsafe reply to a request comes in, the request is put on the
r_unsafe_dir inode's list. In future patches, we're going to need to
wait on requests that may not have gotten an unsafe reply yet.
Change __register_request to put the entry on the dir inode's list when
the pointer is set in the request, and don't check the
CEPH_MDS_R_GOT_UNSAFE flag when unregistering it.
The only place that uses this list today is fsync codepath, and with
the coming changes, we'll want to wait on all operations whether it has
gotten an unsafe reply or not.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Clang warns:
fs/notify/fanotify/fanotify.c:28:23: warning: self-comparison always
evaluates to true [-Wtautological-compare]
return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1];
^
fs/notify/fanotify/fanotify.c:28:57: warning: self-comparison always
evaluates to true [-Wtautological-compare]
return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1];
^
2 warnings generated.
The intention was clearly to compare val[0] and val[1] in the two
different fsid structs. Fix it otherwise this function always returns
true.
Fixes: afc894c784c8 ("fanotify: Store fanotify handles differently")
Link: https://github.com/ClangBuiltLinux/linux/issues/952
Link: https://lore.kernel.org/r/20200327171030.30625-1-natechancellor@gmail.com
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Minor comment conflict in mac80211.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To 2.26
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When encryption is used, smb2_transform_hdr is defined on the stack and is
passed to the transport. This doesn't work with RDMA as the buffer needs to
be DMA'ed.
Fix it by using kmalloc.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When a RDMA packet is received and server is extending send credits, we should
check and unblock senders immediately in IRQ context. Doing it in a worker
queue causes unnecessary delay and doesn't save much CPU on the receive path.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SMBDirect send/receive
The packet size needs to take account of SMB2 header size and possible
encryption header size. This is only done when signing is used and it is for
RDMA send/receive, not read/write.
Also remove the dead SMBD code in smb2_negotiate_r(w)size.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Improve readability and maintainability by replacing a hardcoded string
allocation and formatting by the use of the kasprintf() helper.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
ext4_fill_super doublechecks the number of groups before mounting; if
that check fails, the resulting error message prints the group count
from the ext4_sb_info sbi, which hasn't been set yet. Print the freshly
computed group count instead (which at that point has just been computed
in "blocks_count").
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Fixes: 4ec1102813798 ("ext4: Add sanity checks for the superblock before mounting the filesystem")
Link: https://lore.kernel.org/r/8b957cd1513fcc4550fe675c10bcce2175c33a49.1585431964.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If ext4_fill_super detects an invalid number of inodes per group, the
resulting error message printed the number of blocks per group, rather
than the number of inodes per group. Fix it to print the correct value.
Fixes: cd6bb35bf7f6d ("ext4: use more strict checks for inodes_per_block on mount")
Link: https://lore.kernel.org/r/8be03355983a08e5d4eed480944613454d7e2550.1585434649.git.josh@joshtriplett.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently on calling echo 3 > drop_caches on host machine, we see
FS corruption in the guest. This happens on Power machine where
blocksize < pagesize.
So as a temporary workaound don't enable dioread_nolock by default
for blocksize < pagesize until we identify the root cause.
Also emit a warning msg in case if this mount option is manually
enabled for blocksize < pagesize.
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200327200744.12473-1-riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the inode buffer backing a particular inode is locked,
xfs_iflush() returns -EAGAIN and xfs_inode_item_push() skips the
inode. It still returns success to xfsaild, however, which bypasses
the xfsaild backoff heuristic. Update xfs_inode_item_push() to
return locked status if the inode buffer couldn't be locked.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
A dquot flush currently blocks on the buffer lock for the underlying
dquot buffer. In turn, this causes xfsaild to block rather than
continue processing other items in the meantime. Update
xfs_qm_dqflush() to trylock the buffer, similar to how inode buffers
are handled, and return -EAGAIN if the lock fails. Fix up any
callers that don't currently handle the error properly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Since the "no-allocation" reservations for file creations has
been removed, the resblks value should be larger than zero, so
remove unnecessary ternary conditional.
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: s/judgment/ternary/]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
If the FLUSH_SYNC flag is set, nfs_initiate_pgio() will currently
wait for completion, and then return the status of the I/O operation.
What we actually want to report in nfs_pageio_doio() is whether or
not the RPC call was launched successfully, whereas actual I/O
status is intended handled in the reply callbacks.
Since FLUSH_SYNC is never set by any of the callers anyway, let's
just remove that code altogether.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT.
PREEMPT_RT replaced the bit spinlocks with regular spinlocks to avoid this
problem. The replacement was done conditionaly at compile time, but
Christoph requested to do an unconditional conversion.
Jan suggested to move the spinlock into a existing padding hole which
avoids a size increase of struct buffer_head on production kernels.
As a benefit the lock gains lockdep coverage.
[ bigeasy: Remove the wrapper and use always spinlock_t and move it into
the padding hole ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Link: https://lkml.kernel.org/r/20191118132824.rclhrbujqh4b4g4d@linutronix.de
|
|
Move from requesting only full file layout segments, to requesting
layout segments that match our I/O size. This means the server is
still free to return a full file layout, but we will no longer
error out if it does not.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Remove the requirement that the server always sends whole file
layouts.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When starting to read or write with a layout segment, check that the
range matches our request.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Check that the number of mirrors, and the mirror information matches
before deciding to merge layout segments in pNFS/flexfiles.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Fix up pnfs_layout_mark_request_commit() to alway reschedule the write
if the layout segment is invalid. Also minor cleanup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Move the pNFS commit related operations into a separate structure
that can be carried by the pnfs_ds_commit_info.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Remove the unused bucket array in struct pnfs_ds_commit_info.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Lift filelayout_search_commit_reqs() into the generic pnfs/nfs code,
and add support for commit arrays.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Enable adding and lookup of per-layout segment commits in filelayout
and flexfilelayout.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Ensure that both the file and flexfiles layout types clean up when
freeing the layout segments.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add support for scanning the full list of per-layout segment commit
arrays to nfs_clear_pnfs_ds_commit_verifiers().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Instead of trying to save the commit verifiers and checking them against
previous writes, adopt the same strategy as for buffered writes, of
just checking the verifiers at commit time.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Fix the O_DIRECT code to avoid retries if the COMMIT fails with a fatal
error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add a pNFS callback to allow the O_DIRECT code to release the DS
commitinfo when freeing the dreq.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_commit_pagelist().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_recover_commit_reqs().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_scan_commit_lists()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When we have multiple layout segments with different lists of mirrored
data, we need to track the commits on a per layout segment basis.
This patch adds a list to support this tracking in struct
pnfs_ds_commit_info.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Function gfs2_recover_func grabs the sd_log_flush_lock rw_semaphore in
write mode. This is unnecessary because we only need to prevent log flush
from using sd_log_bio bio while it does. Therefore, a read lock will be
enough. This is a small step in cleaning up log flush.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, if the ail1 flush got stuck for some reason, there
were no clues as to why. This patch introduces a check for getting
stuck for more than a minute, and if it happens, it dumps the items
still remaining on the ail1 list.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
In function try_rgrp_unlink, we added a temporary lock of the
sd_log_flush_lock while searching the bitmaps. This protected us from
problems in which dinodes being freed were still in a state of flux
because the rgrp was in an active transaction. It was a kludge.
Now that we've straightened out the code for inode eviction, deletes,
and all the recovery mess, we no longer need this kludge.
This patch removes it, and should improve performance.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|