Age | Commit message (Collapse) | Author |
|
I am generally against the "one big header file" approach, and
everything in the client includes this file. Let's move all the NFS v3
declarations into a v3-only header file.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The goal is to create a generic NFS module with code that does not
depend on what versions of NFS are enabled.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This code has been around for a while, but never was enabled, although
it is in a working shape.
Note that we implement NOTIFY_DEVICEID4_CHANGE identical to
NOTIFY_DEVICEID4_DELETE. Given that in either case we can't do anything
but preventing further lookups of a given device ID there isn't much difference
in semantics for the two. For the delete case the server MUST ensure that
there are no outstanding layouts, while for the change case it doesn't, but
that has little relevance to the client.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This patches moves parsing of the GETDEVICEINFO XDR to kernel space, as well
as the management of complex devices. The reason for that is we might have
multiple outstanding complex devices after a NOTIFY_DEVICEID4_CHANGE, which
device mapper or md can't handle as they claim devices exclusively.
But as is turns out simple striping / concatenation is fairly trivial to
implement anyway, so we make our life simpler by reducing the reliance
on blkmapd. For now we still use blkmapd by feeding it synthetic SIMPLE
device XDR to translate device signatures to device numbers, but in the
long runs I have plans to eliminate it entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Create a file to house all the rpc_pipefs boilerplate code instead of
sprinkling it over a few files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Factor out a helper for all per-extent work, and merge the now trivial
functions for lseg allocation and parsing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This isn't device(id) related, so move it into the main file. Simple move
for now, the next commit will clean it up a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Instead of overflowing the XDR send buffer with our extent list allocate
pages and pre-encode the layoutupdate payload into them. We optimistically
allocate a single page use alloc_page and only switch to vmalloc when we
have more extents outstanding. Currently there is only a single testcase
(xfstests generic/113) which can reproduce large enough extent lists for
this to occur.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The current GETDEVICELIST implementation is buggy in that it doesn't handle
cursors correctly, and in that it returns an error if the server returns
NFSERR_NOTSUPP. Given that there is no actual need for GETDEVICELIST,
it has various issues and might get removed for NFSv4.2 stop using it in
the blocklayout driver, and thus the Linux NFS client as whole.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The kbuild test robot complained about a new sparse warning in
objio_alloc_deviceid_node, but it turns out that this was just a moved
reference to an existing variable. Fix it to have the right big endian
annotated type.
Note that there are some other endianess issues in this file that I didn't
bother to sort out as they involve global headers.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The kbuild test robot complained that we got the printk format wrong.
Let's just kill these printks instead of fixing them as there is not
point after the initial tree algorithm debugging.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
commit 4fa2c54b5198d09607a534e2fd436581064587ed
NFS: nfs4_do_open should add negative results to the dcache.
used "d_drop(); d_add();" to ensure that a dentry was hashed
as a negative cached entry.
This is not safe if the dentry has an non-NULL ->d_inode.
It will trigger a BUG_ON in d_instantiate().
In that case, d_delete() is needed.
Also, only d_add if the dentry is currently unhashed, it seems
pointless removed and re-adding it unchanged.
Reported-by: Christoph Hellwig <hch@infradead.org>
Fixes: 4fa2c54b5198d09607a534e2fd436581064587ed
Cc: Jeff Layton <jeff.layton@primarydata.com>
Link: http://lkml.kernel.org/r/20140908144525.GB19811@infradead.org
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If inline->extent conversion fails (most probably due to ENOSPC) and
we release the temporary page that we allocated to transfer the file
contents, don't keep using the page pointer after releasing the page.
This occasionally leads to complaints about evicting locked pages or
hangs when blocksize > pagesize, because it's possible for the page to
get reallocated elsewhere in the meantime.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Tao Ma <tm@tao.ma>
|
|
If the external journal device has metadata_csum enabled, verify
that the superblock checksum matches the block before we try to
mount.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Clear all three journal checksum feature flags before turning on
whichever journal checksum options we want. Rearrange the error
checking so that newer flags get complained about first.
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
A bit of churn on the for-linus side that would be nice to have
in the core bits for 3.18, so pull it in to catch us up and make
forward progress easier.
Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts:
block/scsi_ioctl.c
|
|
Currently sysfs feature files uses ext4_attr_ops as the file operations
to show/store data. However the feature files is not supposed to contain
any data at all, the sole existence of the file means that the module
support the feature. Moreover, none of the sysfs feature attributes
actually register show/store functions so that would not be a problem.
However if a sysfs feature attribute register a show or store function
we might be in trouble because the kobject in this case is _not_ embedded
in the ext4_sb_info structure as ext4_attr_show/store expect.
So just to be safe, provide separate empty sysfs_ops to use in
ext4_feat_ktype. This might safe us from potential problems in the
future. As a bonus we can "store" something more descriptive than
nothing in the files, so let it contain "enabled" to make it clear that
the feature is really present in the module.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently there is no easy way to tell that the mounted file system
contains errors other than checking for log messages, or reading the
information directly from superblock.
This patch adds new sysfs entries:
errors_count (number of fs errors we encounter)
first_error_time (unix timestamp for the first error we see)
last_error_time (unix timestamp for the last error we see)
If the file system is not marked as containing errors then any of the
file will return 0. Otherwise it will contain valid information. More
details about the errors should as always be found in the logs.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
MAXQUOTAS value defines maximum number of quota types VFS supports.
This isn't necessarily the number of types ext4 supports. Although
ext4 will support project quotas, use ext4 private definition for
consistency with other filesystems.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This fixes a failure in xfstests generic/313 because nfs doesn't update
mtime on a truncate. The protocol requires this to be done implicity
for a size changing setattr.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
MAXQUOTAS value defines maximum number of quota types VFS supports.
This isn't necessarily the number of types gfs2 supports and with
addition of project quotas these two numbers stop matching. So make gfs2
use its private definition.
CC: cluster-devel@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Fix a regression introduced by:
6d4ade986f9c8df31e68 GFS2: Add atomic_open support
where an early return misses d_splice_alias() which had been
adding the negative dentry.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch is to remove lengthy name by adding a new variable.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Merge misc fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/notify: don't show f_handle if exportfs_encode_inode_fh failed
fsnotify/fdinfo: use named constants instead of hardcoded values
kcmp: fix standard comparison bug
mm/mmap.c: use pr_emerg when printing BUG related information
shm: add memfd.h to UAPI export list
checkpatch: allow commit descriptions on separate line from commit id
sh: get_user_pages_fast() must flush cache
eventpoll: fix uninitialized variable in epoll_ctl
kernel/printk/printk.c: fix faulty logic in the case of recursive printk
mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
|
|
Currently we handle only ENOSPC. In case of other errors the file_handle
variable isn't filled properly and we will show a part of stack.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
bytes, so exportfs_encode_inode_fh can return an error.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
not initialized but ep_take_care_of_epollwakeup reads its event field.
When this unintialized field has EPOLLWAKEUP bit set, a capability check
is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup. This
produces unexpected messages in the audit log, such as (on a system
running SELinux):
type=AVC msg=audit(1408212798.866:410): avc: denied
{ block_suspend } for pid=7754 comm="dbus-daemon" capability=36
scontext=unconfined_u:unconfined_r:unconfined_t
tcontext=unconfined_u:unconfined_r:unconfined_t
tclass=capability2 permissive=1
type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
exe="/usr/bin/dbus-daemon"
subj=unconfined_u:unconfined_r:unconfined_t key=(null)
("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")
Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.
Fixes: 4d7e30d98939 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes from Jan Kara:
"Fixes for UDF handling of NFS handles and one fix for proper handling
of corrupted media"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: saner calling conventions for udf_new_inode()
udf: fix the udf_iget() vs. udf_new_inode() races
udf: merge the pieces inserting a new non-directory object into directory
udf: Set i_generation field
udf: Properly detect stale inodes
udf: Make udf_read_inode() and udf_iget() return error
udf: Avoid infinite loop when processing indirect ICBs
udf: Fold udf_fill_inode() into __udf_read_inode()
udf: Avoid dir link count to go negative
|
|
To make sparse happy...
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
sparse says:
fs/nfs/file.c:543:60: warning: incorrect type in argument 1 (different address spaces)
fs/nfs/file.c:543:60: expected struct rpc_xprt *xprt
fs/nfs/file.c:543:60: got struct rpc_xprt [noderef] <asn:4>*cl_xprt
fs/nfs/file.c:548:53: warning: incorrect type in argument 1 (different address spaces)
fs/nfs/file.c:548:53: expected struct rpc_xprt *xprt
fs/nfs/file.c:548:53: got struct rpc_xprt [noderef] <asn:4>*cl_xprt
cl_xprt is RCU-managed, so we need to take care to dereference and use
it while holding the RCU read lock.
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The VFS never calls setattr with ATTR_SIZE on anything but regular
files. Remove the if check and turn it into an assert similar to
what some other file systems do.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This will be used by the block layout driver when splitting extents.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
At a simple helper to issue a GETDEVICELIST operation and pre-load
the device id cache based on the result.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Add support to the common pNFS core to issue GETDEVICEINFO calls on
a device ID cache miss. The code is taken from the well debugged
file layout implementation and calls out to the layoutdriver through
a new alloc_deviceid_node method. The calling conventions for
nfs4_find_get_deviceid are changed so that all information needed to
send a GETDEVICEINFO request is passed to the common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This speads up truncate-heavy workloads like fsx by multiple orders of
magnitude.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This allows removing extents from the extent tree especially on truncate
operations, and thus fixing reads from truncated and re-extended that
previously returned stale data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Currently the block layout driver tracks extents in three separate
data structures:
- the two list of pnfs_block_extent structures returned by the server
- the list of sectors that were in invalid state but have been written to
- a list of pnfs_block_short_extent structures for LAYOUTCOMMIT
All of these share the property that they are not only highly inefficient
data structures, but also that operations on them are even more inefficient
than nessecary.
In addition there are various implementation defects like:
- using an int to track sectors, causing corruption for large offsets
- incorrect normalization of page or block granularity ranges
- insufficient error handling
- incorrect synchronization as extents can be modified while they are in
use
This patch replace all three data with a single unified rbtree structure
tracking all extents, as well as their in-memory state, although we still
need to instance for read-only and read-write extent due to the arcane
client side COW feature in the block layouts spec.
To fix the problem of extent possibly being modified while in use we make
sure to return a copy of the extent for use in the write path - the
extent can only be invalidated by a layout recall or return which has
to wait until the I/O operations finished due to refcounts on the layout
segment.
The new extent tree work similar to the schemes used by block based
filesystems like XFS or ext4.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The core nfs code handles setting pages uptodate on reads, no need to mess
with the pageflags outselves. Also remove a debug function to dump page
flags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Use the new PNFS_READ_WHOLE_PAGE flag to offload read-modify-write
handling to core nfs code, and remove a huge chunk of deadlock prone
mess from the block layout writeback path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If a layout driver keeps per-inode state outside of the layout segments it
needs to be notified of any layout returns or recalls on an inode, and not
just about the freeing of layout segments. Add a method to acomplish this,
which will allow the block layout driver to handle the case of truncated
and re-expanded files properly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Like all block based filesystems, the pNFS block layout driver can't read
or write at a byte granularity and thus has to perform read-modify-write
cycles on writes smaller than this granularity.
Add a flag so that the core NFS code always reads a whole page when
starting a smaller write, so that we can do it in the place where the VFS
expects it instead of doing in very deadlock prone way in the writeback
handler.
Note that in theory we could do less than page size reads here for disks
that have a smaller sector size which are served by a server with a smaller
pnfs block size. But so far that doesn't seem like a worthwhile
optimization.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Expedite layout recall processing by forcing a layout commit when
we see busy segments. Without it the layout recall might have to wait
until the VM decided to start writeback for the file, which can introduce
long delays.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
gcc reports:
linux/fs/nfs/write.c: In function ‘nfs_page_find_head_request_locked.isra.17’:
linux/fs/nfs/write.c:121:64: warning: ‘cinfo.mds’ may be used uninitialized in this function [-Wmaybe-uninitialized]
list_for_each_entry_safe(freq, t, &cinfo.mds->list, wb_list) {
^
linux/fs/nfs/write.c:110:25: note: ‘cinfo.mds’ was declared here
struct nfs_commit_info cinfo;
Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When we do non-page sized reads we can underflow the extent_length variable
and read incorrect data. Fix the extent_length calculation and change to
defensive <= checks for the extent length in the read and write path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Make sure the block queue is plugged when performing pNFS blocklayout I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Tell userspace what stage of GETDEVICEINFO failed so that there is a chance
to debug it, especially with the userspace daemon clusterf***k in the block
layout driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The Linux VM subsystem can't support block sizes larger than page size
for block based filesystems very well. While this can be hacked around
to some extent for simple filesystems the read-modify-write cycles
required for pnfs block invalid extents are extremly deadlock prone
when operating on multiple pages. Reject this case early on instead
of pretending to support it (badly).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Currently there is no XDR buffer space allocated for the per-layout driver
layoutcommit payload, which leads to server buffer overflows in the
blocklayout driver even under simple workloads. As we can't do per-layout
sizes for XDR operations we'll have to splice a previously encoded list
of pages into the XDR stream, similar to how we handle ACL buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|