summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2018-04-10NFS: More fine grained attribute trackingTrond Myklebust
Currently, if the NFS_INO_INVALID_ATTR flag is set, for instance by a call to nfs_post_op_update_inode_locked(), then it will not be cleared until all the attributes have been revalidated. This means, for instance, that NFSv4 writes will always force a full attribute revalidation. Track the ctime, mtime, size and change attribute separately from the other attributes so that we can have nfs_post_op_update_inode_locked() set them correctly, and later have the cache consistency bitmask be able to clear them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Don't force unnecessary cache invalidation in nfs_update_inode()Trond Myklebust
If we managed to revalidate all the attributes, then there is no reason to mark them as invalid again. We do, however want to ensure that we set nfsi->attrtimeo correctly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Don't redirty the attribute cache in nfs_wcc_update_inode()Trond Myklebust
If we received weak cache consistency data from the server, then those attributes are up to date, and there is no reason to mark them as dirty in the attribute cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Don't force a revalidation of all attributes if change is missingTrond Myklebust
Even if the change attribute is missing, it is still OK to mark the other attributes as being up to date. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFSv4: Don't return the delegation when not needed by NFSv4.x (x>0)Trond Myklebust
Starting with NFSv4.1, the server is able to deduce the client id from the SEQUENCE op which means it can always figure out whether or not the client is holding a delegation on a file that is being changed. For that reason, RFC5661 does not require a delegation to be unconditionally recalled on operations such as SETATTR, RENAME, or REMOVE. Note that for now, we continue to return READ delegations since that is still expected by the Linux knfsd server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Remove the unused return_delegation() callbackTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Move the delegation return down into _nfs4_do_setattr()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Add a delegation return into nfs4_proc_unlink_setup()Trond Myklebust
Ensure that when we do finally delete the file, then we return the delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Move delegation recall into the NFSv4 callback for rename_setup()Trond Myklebust
Move the delegation recall out of the generic code, and into the NFSv4 specific callback. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Move the delegation return down into nfs4_proc_remove()Trond Myklebust
Move the delegation return out of generic code and down into the NFSv4 specific unlink code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Move the delegation return down into nfs4_proc_link()Trond Myklebust
Move the delegation return out of generic code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFSv4: Fix nfs4_return_incompatible_delegationTrond Myklebust
The 'fmode' argument can take an FMODE_EXEC value, which we want to filter out before comparing to the delegation type. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10nfs4: wake any lock waiters on successful RECLAIM_COMPLETEJeff Layton
If we have a RECLAIM_COMPLETE with a populated cl_lock_waitq, then that implies that a reconnect has occurred. Since we can't expect a CB_NOTIFY_LOCK callback at that point, just wake up the entire queue so that all the tasks can re-poll for their locks. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10nfs4: don't compare clientid in nfs4_wake_lock_waiterJeff Layton
The task is expected to sleep for a while here, and it's possible that a new EXCHANGE_ID has occurred in the interim, and we were assigned a new clientid. Since this is a per-client list, there isn't a lot of value in vetting the clientid on the incoming request. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10nfs4: always reset notified flag to false before repolling for lockJeff Layton
We may get a notification and lose the race to another client. Ensure that we wait again for a notification in that case. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-02-22NFS: make struct nlmclnt_fl_close_lock_ops staticColin Ian King
The structure nlmclnt_fl_close_lock_ops s local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: fs/nfs/nfs3proc.c:876:33: warning: symbol 'nlmclnt_fl_close_lock_ops' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-22nfs: system crashes after NFS4ERR_MOVED recoveryBill.Baker@oracle.com
nfs4_update_server unconditionally releases the nfs_client for the source server. If migration fails, this can cause the source server's nfs_client struct to be left with a low reference count, resulting in use-after-free. Also, adjust reference count handling for ELOOP. NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6 WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]() nfs_put_client+0xfa/0x110 [nfs] nfs4_run_state_manager+0x30/0x40 [nfsv4] kthread+0xd8/0xf0 BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8 nfs4_xdr_enc_write+0x6b/0x160 [nfsv4] rpcauth_wrap_req+0xac/0xf0 [sunrpc] call_transmit+0x18c/0x2c0 [sunrpc] __rpc_execute+0xa6/0x490 [sunrpc] rpc_async_schedule+0x15/0x20 [sunrpc] process_one_work+0x160/0x470 worker_thread+0x112/0x540 ? rescuer_thread+0x3f0/0x3f0 kthread+0xd8/0xf0 This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"), but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops") Reported-by: Helen Chao <helen.chao@oracle.com> Fixes: 52442f9b11b7 ("NFS4: Avoid migration loops") Signed-off-by: Bill Baker <bill.baker@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-21NFSv4: Fix broken cast in nfs4_callback_recallany()Trond Myklebust
Passing a pointer to a unsigned integer to test_bit() is broken. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-01iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}Goffredo Baroncelli
The function inode_cmp_iversion{+raw} is counter-intuitive, because it returns true when the counters are different and false when these are equal. Rename it to inode_eq_iversion{+raw}, which will returns true when the counters are equal and false otherwise. Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-31Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of misc stuff, without any unifying topic, from various people. Neil's d_anon patch, several bugfixes, introduction of kvmalloc analogue of kmemdup_user(), extending bitfield.h to deal with fixed-endians, assorted cleanups all over the place..." * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) alpha: osf_sys.c: use timespec64 where appropriate alpha: osf_sys.c: fix put_tv32 regression jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path dcache: delete unused d_hash_mask dcache: subtract d_hash_shift from 32 in advance fs/buffer.c: fold init_buffer() into init_page_buffers() fs: fold __inode_permission() into inode_permission() fs: add RWF_APPEND sctp: use vmemdup_user() rather than badly open-coding memdup_user() snd_ctl_elem_init_enum_names(): switch to vmemdup_user() replace_user_tlv(): switch to vmemdup_user() new primitive: vmemdup_user() memdup_user(): switch to GFP_USER eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget() eventfd: fold eventfd_ctx_read() into eventfd_read() eventfd: convert to use anon_inode_getfd() nfs4file: get rid of pointless include of btrfs.h uvc_v4l2: clean copyin/copyout up vme_user: don't use __copy_..._user() usx2y: don't bother with memdup_user() for 16-byte structure ...
2018-01-30Merge tag 'nfs-for-4.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - Fix breakages in the nfsstat utility due to the inclusion of the NFSv4 LOOKUPP operation - Fix a NULL pointer dereference in nfs_idmap_prepare_pipe_upcall() due to nfs_idmap_legacy_upcall() being called without an 'aux' parameter - Fix a refcount leak in the standard O_DIRECT error path - Fix a refcount leak in the pNFS O_DIRECT fallback to MDS path - Fix CPU latency issues with nfs_commit_release_pages() - Fix the LAYOUTUNAVAILABLE error case in the file layout type - NFS: Fix a race between mmap() and O_DIRECT Features: - Support the statx() mask and query flags to enable optimisations when the user is requesting only attributes that are already up to date in the inode cache, or is specifying the AT_STATX_DONT_SYNC flag - Add a module alias for the SCSI pNFS layout type Bugfixes: - Automounting when resolving a NFSv4 referral should preserve the RDMA transport protocol settings - Various other RDMA bugfixes from Chuck - pNFS block layout fixes - Always set NFS_LOCK_LOST when a lock is lost" * tag 'nfs-for-4.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits) NFS: Fix a race between mmap() and O_DIRECT NFS: Remove a redundant call to unmap_mapping_range() pnfs/blocklayout: Ensure disk address in block device map pnfs/blocklayout: pnfs_block_dev_map uses bytes, not sectors lockd: Fix server refcounting SUNRPC: Fix null rpc_clnt dereference in rpc_task_queued tracepoint SUNRPC: Micro-optimize __rpc_execute SUNRPC: task_run_action should display tk_callback sunrpc: Format RPC events consistently for display SUNRPC: Trace xprt_timer events xprtrdma: Correct some documenting comments xprtrdma: Fix "bytes registered" accounting xprtrdma: Instrument allocation/release of rpcrdma_req/rep objects xprtrdma: Add trace points to instrument QP and CQ access upcalls xprtrdma: Add trace points in the client-side backchannel code paths xprtrdma: Add trace points for connect events xprtrdma: Add trace points to instrument MR allocation and recovery xprtrdma: Add trace points to instrument memory invalidation xprtrdma: Add trace points in reply decoder path xprtrdma: Add trace points to instrument memory registration ..
2018-01-29Merge tag 'iversion-v4.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull inode->i_version rework from Jeff Layton: "This pile of patches is a rework of the inode->i_version field. We have traditionally incremented that field on every inode data or metadata change. Typically this increment needs to be logged on disk even when nothing else has changed, which is rather expensive. It turns out though that none of the consumers of that field actually require this behavior. The only real requirement for all of them is that it be different iff the inode has changed since the last time the field was checked. Given that, we can optimize away most of the i_version increments and avoid dirtying inode metadata when the only change is to the i_version and no one is querying it. Queries of the i_version field are rather rare, so we can help write performance under many common workloads. This patch series converts existing accesses of the i_version field to a new API, and then converts all of the in-kernel filesystems to use it. The last patch in the series then converts the backend implementation to a scheme that optimizes away a large portion of the metadata updates when no one is looking at it. In my own testing this series significantly helps performance with small I/O sizes. I also got this email for Christmas this year from the kernel test robot (a 244% r/w bandwidth improvement with XFS over DAX, with 4k writes): https://lkml.org/lkml/2017/12/25/8 A few of the earlier patches in this pile are also flowing to you via other trees (mm, integrity, and nfsd trees in particular)". * tag 'iversion-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: (22 commits) fs: handle inode->i_version more efficiently btrfs: only dirty the inode in btrfs_update_time if something was changed xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing fs: only set S_VERSION when updating times if necessary IMA: switch IMA over to new i_version API xfs: convert to new i_version API ufs: use new i_version API ocfs2: convert to new i_version API nfsd: convert to new i_version API nfs: convert to new i_version API ext4: convert to new i_version API ext2: convert to new i_version API exofs: switch to new i_version API btrfs: convert to new i_version API afs: convert to new i_version API affs: convert to new i_version API fat: convert to new i_version API fs: don't take the i_lock in inode_inc_iversion fs: new API for handling inode->i_version ntfs: remove i_version handling ...
2018-01-29nfs: convert to new i_version APIJeff Layton
For NFS, we just use the "raw" API since the i_version is mostly managed by the server. The exception there is when the client holds a write delegation, but we only need to bump it once there anyway to handle CB_GETATTR. Tested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-28NFS: Fix a race between mmap() and O_DIRECTTrond Myklebust
When locking the file in order to do O_DIRECT on it, we must unmap any mmapped ranges on the pagecache so that we can flush out the dirty data. Fixes: a5864c999de67 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.8+
2018-01-28NFS: Remove a redundant call to unmap_mapping_range()Trond Myklebust
We don't need to call unmap_mapping_range() prior to calling nfs_sync_mapping(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-25pnfs/blocklayout: Ensure disk address in block device mapBenjamin Coddington
It's possible that the device map is smaller than the offset into the device for the I/O we're adding. Add a check for it and bail out, otherwise we risk botching the bio calculations that follow. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trondmy@gmail.com>
2018-01-25pnfs/blocklayout: pnfs_block_dev_map uses bytes, not sectorsBenjamin Coddington
Fixup the field types to match their use. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trondmy@gmail.com>
2018-01-22NFS: reject request for id_legacy key without auxdataEric Biggers
nfs_idmap_legacy_upcall() is supposed to be called with 'aux' pointing to a 'struct idmap', via the call to request_key_with_auxdata() in nfs_idmap_request_key(). However it can also be reached via the request_key() system call in which case 'aux' will be NULL, causing a NULL pointer dereference in nfs_idmap_prepare_pipe_upcall(), assuming that the key description is valid enough to get that far. Fix this by making nfs_idmap_legacy_upcall() negate the key if no auxdata is provided. As usual, this bug was found by syzkaller. A simple reproducer using the command-line keyctl program is: keyctl request2 id_legacy uid:0 '' @s Fixes: 57e62324e469 ("NFS: Store the legacy idmapper result in the keyring") Reported-by: syzbot+5dfdbcf7b3eb5912abbb@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Trond Myklebust <trondmy@gmail.com>
2018-01-18nfs: Do not convert nfs_idmap_cache_timeout to jiffiesJan Chochol
Since commit 57e62324e469 ("NFS: Store the legacy idmapper result in the keyring") nfs_idmap_cache_timeout changed units from jiffies to seconds. Unfortunately sysctl interface was not updated accordingly. As a effect updating /proc/sys/fs/nfs/idmap_cache_timeout with some value will incorrectly multiply this value by HZ. Also reading /proc/sys/fs/nfs/idmap_cache_timeout will show real value divided by HZ. Fixes: 57e62324e469 ("NFS: Store the legacy idmapper result in the keyring") Signed-off-by: Jan Chochol <jan@chochol.info> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-18nfs: Use proper enum definitions for nfs_show_stableChuck Lever
Commit 8224b2734ab1 ("NFS: Add static NFS I/O tracepoints") had a hack to work around some odd behavior observed with __print_symbolic. I couldn't ever get it to display NFS_FILE_SYNC when using TRACE_DEFINE_ENUM macros to set up the enum values. I tracked down the actual bug that forced me to add the workaround. That issue will be addressed soon, so replace the hack with a proper implementation. Fixes: 8224b2734ab1 ("NFS: Add static NFS I/O tracepoints") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-18nfs41: do not return ENOMEM on LAYOUTUNAVAILABLETigran Mkrtchyan
A pNFS server may return LAYOUTUNAVAILABLE error on LAYOUTGET for files which don't have any layout. In this situation pnfs_update_layout currently returns NULL. As this NULL is converted into ENOMEM, IO requests fails instead of falling back to MDS. Do not return ENOMEM on LAYOUTUNAVAILABLE and let client retry through MDS. Fixes 8d40b0f14846f. I will suggest to backport this fix to affected stable branches. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> [trondmy: Use IS_ERR_OR_NULL()] Fixes: 8d40b0f14846 ("NFS filelayout:call GETDEVICEINFO after...") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-16NFS: commit direct writes even if they fail partiallyJ. Bruce Fields
If some of the WRITE calls making up an O_DIRECT write syscall fail, we neglect to commit, even if some of the WRITEs succeed. We also depend on the commit code to free the reference count on the nfs_page taken in the "if (request_commit)" case at the end of nfs_direct_write_completion(). The problem was originally noticed because ENOSPC's encountered partway through a write would result in a closed file being sillyrenamed when it should have been unlinked. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-16nfs: remove unused label in nfs_encode_fh()Arnd Bergmann
The only reference to the label got removed, so we now get a harmless compiler warning: fs/nfs/export.c: In function 'nfs_encode_fh': fs/nfs/export.c:58:1: error: label 'out' defined but not used [-Werror=unused-label] Fixes: aaa150089465 ("nfs: remove dead code from nfs_encode_fh()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14nfs: Update server port after referral or migrationChuck Lever
After traversing a referral or recovering from a migration event, ensure that the server port reported in /proc/mounts is updated to the correct port setting for the new submount. Reported-by: Helen Chao <helen.chao@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14nfs: Referrals should use the same proto setting as their parentChuck Lever
Helen Chao <helen.chao@oracle.com> noticed that when a user traverses a referral on an NFS/RDMA mount, the resulting submount always uses TCP. This behavior does not match the vers= setting when traversing a referral (vers=4.1 is preserved). It also does not match the behavior of crossing from the pseudofs into a real filesystem (proto=rdma is preserved in that case). The Linux NFS client does not currently support the fs_locations_info attribute. The situation is similar for all NFSv4 servers I know of. Therefore until the community has broad support for fs_locations_info, when following a referral: - First try to connect with RPC-over-RDMA. This will fail quickly if the client has no RDMA-capable interfaces. - If connecting with RPC-over-RDMA fails, or the RPC-over-RDMA transport is not available, use TCP. Reported-by: Helen Chao <helen.chao@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mdsScott Mayhew
Currently when falling back to doing I/O through the MDS (via pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header without releasing the reference taken on the dreq via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init -> nfs_direct_pgio_init. It then takes another reference on the dreq via nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and as a result the requester will become stuck in inode_dio_wait. Once that happens, other processes accessing the inode will become stuck as well. Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean up correctly by calling hdr->completion_ops->completion() instead of calling hdr->release() directly. This can be reproduced (sometimes) by performing "storage failover takeover" commands on NetApp filer while doing direct I/O from a client. This can also be reproduced using SystemTap to simulate a failure while doing direct I/O from a client (from Dave Wysochanski <dwysocha@redhat.com>): stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }' Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Fixes: 1ca018d28d ("pNFS: Fix a memory leak when attempted pnfs fails") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14pnfs/blocklayout: handle transient devicesBenjamin Coddington
PNFS block/SCSI layouts should gracefully handle cases where block devices are not available when a layout is retrieved, or the block devices are removed while the client holds a layout. While setting up a layout segment, keep a record of an unavailable or un-parsable block device in cache with a flag so that subsequent layouts do not spam the server with GETDEVINFO. We can reuse the current NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing the device, we will discard it and send a fresh GETDEVINFO after the timeout, since the lookup and validation of the device occurs within the GETDEVINFO response handling. A lookup of a layout segment that references an unavailable device will return a segment with the NFS_LSEG_UNAVAILABLE flag set. This will allow the pgio layer to mark the layout with the appropriate fail bit, which forces subsequent IO to the MDS, and prevents spamming the server with LAYOUTGET, LAYOUTRETURN. Finally, when IO to a block device fails, look up the block device(s) referenced by the pgio header, and mark them as unavailable. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERRORBenjamin Coddington
If there's an error doing I/O to block device, and the client resends the I/O to the MDS, the MDS must recall the layout from the client before processing the I/O. Let's preempt that exchange by returning the layout before falling back to the MDS when there's an error. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14pnfs/blocklayout: Add module alias for LAYOUT4_SCSIBenjamin Coddington
The blocklayout module contains the client support for both block and SCSI layouts. Add a module alias for the SCSI layout type so that the module will be loaded for SCSI layouts. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14NFS: remove unused offset arg in nfs_pgio_rpcsetupBenjamin Coddington
nfs_pgio_rpcsetup() is always called with an offset of 0, so we should be able to drop the arguement altogether. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14NFSv4: always set NFS_LOCK_LOST when a lock is lost.NeilBrown
There are 2 comments in the NFSv4 code which suggest that SIGLOST should possibly be sent to a process. In these cases a lock has been lost. The current practice is to set NFS_LOCK_LOST so that read/write returns EIO when a lock is lost. So change these comments to code when sets NFS_LOCK_LOST. One case is when lock recovery after apparent server restart fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or NFS4ERRO_RECLAIM_CONFLICT. The other case is when a lock attempt as part of lease recovery fails with NFS4ERR_DENIED. In an ideal world, these should not happen. However I have a packet trace showing an NFSv4.1 session getting NFS4ERR_BADSESSION after an extended network parition. The NFSv4.1 client treats this like server reboot until/unless it get NFS4ERR_NO_GRACE, in which case it switches over to "nograce" recovery mode. In this network trace, the client attempts to recover a lock and the server (incorrectly) reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This leads to the ineffective comment and the client then continues to write using the OPEN stateid. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14nfs: remove dead code from nfs_encode_fh()NeilBrown
This code can never be used as the IS_AUTOMOUNT(inode) case has already been handled. So remove it to avoid confusion. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14Support statx() mask and query flags parametersTrond Myklebust
Support the query flags AT_STATX_FORCE_SYNC by forcing an attribute revalidation, and AT_STATX_DONT_SYNC by returning cached attributes only. Use the mask to optimise away server revalidation for attributes that are not being requested by the user. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14NFS: Fix nfsstat breakage due to LOOKUPPTrond Myklebust
The LOOKUPP operation was inserted into the nfs4_procedures array rather than being appended, which put /proc/net/rpc/nfs out of whack, and broke the nfsstat utility. Fix by moving the LOOKUPP operation to the end of the array, and by ensuring that it keeps the same length whether or not NFSV4.1 and NFSv4.2 are compiled in. Fixes: 5b5faaf6df734 ("nfs4: add NFSv4 LOOKUPP handlers") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14NFSv4: Convert LOCKU to use nfs4_async_handle_exception()Trond Myklebust
Convert CLOSE so that it specifies the correct stateid and inode for the error handling. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14NFSv4: Convert DELEGRETURN to use nfs4_handle_exception()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14NFSv4: Convert CLOSE to use nfs4_async_handle_exception()Trond Myklebust
Convert CLOSE so that it specifies the correct stateid, state and inode for the error handling. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14NFS: Add a cond_resched() to nfs_commit_release_pages()Trond Myklebust
The commit list can get very large, and so we need a cond_resched() in nfs_commit_release_pages() in order to ensure we don't hog the CPU for excessive periods of time. Reported-by: Mike Galbraith <efault@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-12-30nfs4file: get rid of pointless include of btrfs.hAl Viro
should've been killed by "vfs: pull btrfs clone API to vfs layer"... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-12-16Merge tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "This has two stable bugfixes, one to fix a BUG_ON() when nfs_commit_inode() is called with no outstanding commit requests and another to fix a race in the SUNRPC receive codepath. Additionally, there are also fixes for an NFS client deadlock and an xprtrdma performance regression. Summary: Stable bugfixes: - NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a commit in the case that there were no commit requests. - SUNRPC: Fix a race in the receive code path Other fixes: - NFS: Fix a deadlock in nfs client initialization - xprtrdma: Fix a performance regression for small IOs" * tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Fix a race in the receive code path nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests xprtrdma: Spread reply processing over more CPUs nfs: fix a deadlock in nfs client initialization