Age | Commit message (Collapse) | Author |
|
Pull file locking updates from Jeff Layton:
"The largest series of changes is from Ben who offered up a set to add
a new helper function for setting locks based on the type set in
fl_flags. Dmitry also send in a fix for a potential race that he
found with KTSAN"
* tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linux:
locks: cleanup posix_lock_inode_wait and flock_lock_inode_wait
Move locks API users to locks_lock_inode_wait()
locks: introduce locks_lock_inode_wait()
locks: Use more file_inode and fix a comment
fs: fix data races on inode->i_flctx
locks: change tracepoint for generic_add_lease
|
|
When decoding GETATTR replies, the client checks the attribute bitmap
for which attributes the server has sent. It misses bits at the word
boundaries, though; fix that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The arguments passed around for getacl and setacl xdr encoding, struct
nfs_setaclargs and struct nfs_getaclargs, both contain an array of
pages, an offset into the first page, and the length of the page data.
The offset is unused as it is always zero; remove it.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
As new_valid_dev always returns 1, so !new_valid_dev check is not
needed, remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
NFS: NFSoRDMA Client Side Changes
In addition to a variety of bugfixes, these patches are mostly geared at
enabling both swap and backchannel support to the NFS over RDMA client.
Signed-off-by: Anna Schumake <Anna.Schumaker@Netapp.com>
|
|
Forechannel transports get their own "bc_up" method to create an
endpoint for the backchannel service.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Anna Schumaker: Add forward declaration of struct net to xprt.h]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
For loosely coupled pNFS/flexfiles systems, there is often no advantage
at all in going through the MDS for I/O, since the MDS is subject to
the same limitations as all other clients when talking to DSes. If a
DS is unresponsive, I/O through the MDS will fail.
For such systems, the only scalable solution is to have the pNFS clients
retry doing pNFS, and so the protocol now provides a flag that allows
the pNFS server to signal this.
If LAYOUTGET returns FF_FLAGS_NO_IO_THRU_MDS, then we should assume that
the MDS wants the client to retry using these devices, even if they were
previously marked as being unavailable. To do so, we add a helper,
ff_layout_mark_devices_valid() that will be called from layoutget.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If the pNFS/flexfiles file is mirrored, and a read to one mirror fails,
then we should bump the mirror index, so that we retry to a different
mirror. Once we've iterated through all mirrors and all failed, we can
return the layout and issue a new LAYOUTGET.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait(). This
allows for some later cleanup.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
|
|
* bugfixes:
NFSv4.1/pnfs: Retry through MDS when getting bad length of data
nfs/blocklayout: Fix bad using of page offset in bl_read_pagelist
NFS: Return directly if encode_sessionid fail
NFS: Fix bad checking of max taglen in callback request
NFS: Fix bad defines of callback response maxsize
NFS: Use NFS4_MAX_SESSIONID_LEN directly for decode/encode sessionid
NFS: Remove unneeded NFS_DEBUG checking before define NFSDBG_FACILITY
NFS: Remove the left function defines in callback.h
NFS: Remove the left global variable nfs_callback_tcpport
NFS: Get rid of the unneeded addr stored in callback arguments
nfsroot: make nfsroot to accept the 1024 bytes long directory name
|
|
If non rpc-based layout driver return bad length of data, nfs retries
by calling rpc_restart_call_prepare() that cause an NULL reference panic.
This patch lets nfs retry through MDS for non rpc-based layout driver
return bad length of data.
[13034.883329] BUG: unable to handle kernel NULL pointer dereference at (null)
[13034.884902] IP: [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
[13034.886558] PGD 0
[13034.888126] Oops: 0000 [#1] KASAN
[13034.889710] Modules linked in: blocklayoutdriver(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c coretemp btrfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon auth_rpcgss shpchp nfs_acl lockd vmw_vmci parport_pc xor raid6_pq grace parport sunrpc i2c_piix4 vmwgfx drm_kms_helper ttm drm mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
[13034.898260] CPU: 0 PID: 10112 Comm: kworker/0:1 Tainted: G OE 4.3.0-rc5+ #279
[13034.899932] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[13034.903342] Workqueue: events bl_read_cleanup [blocklayoutdriver]
[13034.905059] task: ffff88006a9148c0 ti: ffff880035e90000 task.ti: ffff880035e90000
[13034.906827] RIP: 0010:[<ffffffffa00db372>] [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
[13034.910522] RSP: 0018:ffff880035e97b58 EFLAGS: 00010282
[13034.912378] RAX: fffffbfff04a5a94 RBX: ffff880068fe4858 RCX: 0000000000000003
[13034.914339] RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 0000000000000282
[13034.916236] RBP: ffff880035e97b68 R08: 0000000000000001 R09: 0000000000000001
[13034.918229] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[13034.920007] R13: ffff880068fe4858 R14: ffff880068fe4a60 R15: 0000000000001000
[13034.921845] FS: 0000000000000000(0000) GS:ffffffff82247000(0000) knlGS:0000000000000000
[13034.923645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13034.925525] CR2: 0000000000000000 CR3: 00000000063dd000 CR4: 00000000001406f0
[13034.932808] Stack:
[13034.934813] ffff880068fe4780 0000000000001000 ffff880035e97ba8 ffffffffa08800d2
[13034.936675] ffffffffa088029d ffff880068fe4780 ffff880068fe4858 ffffffffa089c0a0
[13034.938593] ffff880068fe47e0 ffff88005d59faf0 ffff880035e97be0 ffffffffa087e08f
[13034.940454] Call Trace:
[13034.942388] [<ffffffffa08800d2>] nfs_readpage_result+0x112/0x200 [nfs]
[13034.944317] [<ffffffffa088029d>] ? nfs_readpage_done+0xdd/0x160 [nfs]
[13034.946267] [<ffffffffa087e08f>] nfs_pgio_result+0x9f/0x120 [nfs]
[13034.948166] [<ffffffffa09266cc>] pnfs_ld_read_done+0x7c/0x1e0 [nfsv4]
[13034.950247] [<ffffffffa03b07ee>] bl_read_cleanup+0x2e/0x60 [blocklayoutdriver]
[13034.952156] [<ffffffff810ebf62>] process_one_work+0x412/0x870
[13034.954102] [<ffffffff810ebe84>] ? process_one_work+0x334/0x870
[13034.955949] [<ffffffff810ebb50>] ? queue_delayed_work_on+0x40/0x40
[13034.957985] [<ffffffff810ec441>] worker_thread+0x81/0x6a0
[13034.959817] [<ffffffff810ec3c0>] ? process_one_work+0x870/0x870
[13034.961785] [<ffffffff810f43bd>] kthread+0x17d/0x1a0
[13034.963544] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
[13034.965479] [<ffffffff81100428>] ? finish_task_switch+0x88/0x220
[13034.967223] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
[13034.968929] [<ffffffff81b6ae5f>] ret_from_fork+0x3f/0x70
[13034.970534] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
[13034.972176] Code: c7 43 50 40 84 0d a0 e8 3d fe 1c e1 48 8d 7b 58 c7 83 e4 00 00 00 00 00 00 00 e8 ca fe 1c e1 4c 8b 63 58 4c 89 e7 e8 be fe 1c e1 <49> 83 3c 24 00 74 12 48 c7 43 50 f0 a2 0e a0 b8 01 00 00 00 5b
[13034.977148] RIP [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
[13034.978780] RSP <ffff880035e97b58>
[13034.980399] CR2: 0000000000000000
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Blocklayout uses file offset for the read-back page's offset of first writing,
it's definitely wrong, it writes data to bad address of page that cause userspace
application segment fault. It must be the page base stored in header->args.pgbase.
Also, the pg_offset has no influence with isect and extent length.
Note: The offset of the non-first page is always zero.
Ps: A test program will segment fault at read() as,
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
int main(int argc, char **argv)
{
char buf[2049];
char *filename = NULL;
int fd = -1;
if (argc < 2) {
printf("Usage: %s filename\n", argv[0]);
return 0;
}
filename = argv[1];
fd = open(filename, O_RDONLY | O_DIRECT);
if (fd < 0) {
printf("Open %s fail: %m\n", filename);
return 1;
}
lseek(fd, 2048, SEEK_SET);
if (read(fd, buf, sizeof(buf) - 1) != (sizeof(buf) - 1))
printf("Read 4096 bityes data from %s fail: %m\n", filename);
out:
close(fd);
return 0;
}
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
encode_sessionid() may return error, nfs needs process the return value.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The taglen should be checked with CB_OP_TAGLEN_MAXSZ directly.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
As CB_OP_TAGLEN_MAXSZ, all XXX_MAXSZ should be defined as bit.
Each operation should not cantains CB_OP_TAGLEN_MAXSZ.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
It's no need to define a temporary variables for NFS4_MAX_SESSIONID_LEN.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
It's not needed to checking NFS_DEBUG before define NFSDBG_FACILITY, remove it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit 778be232a207 "NFS do not find client in NFSv4 pg_authenticate" has remove
the define and using of nfs4_set_callback_sessionid(), and
commit 36281caa839f "NFSv4: Further clean-ups of delegation stateid validation"
has update the checking of stateid, and move the code to nfs4proc.c.
This patch remove those function defines left in callback.h
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit bbe0a3aa4e22 "NFS: make nfs_callback_tcpport per network context" has
make nfs_callback_tcpport per network, but left the global nfs_callback_tcpport,
remove it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit c36fca52f5 "NFS refactor nfs_find_client and reference client
across callback processing" has store clp in cb_process_state
which is set in cb_sequence.
So that, it's unneeded to store address pointer in any callback arguments.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
although NFS_MAXPATHLEN is defined to 1024, nfs client hopes to accept
a 1024 byte path, but nfs_root_parms is limited to 256, and the nfs path
will truncated when a user inputs nfs path from kernel cmdline
enlarge nfs_root_parms to 1024, to make it accept the 1024 bytes long
directory name, since nfs_root_parms is defined as _initdata, it will
be released after system bootup
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
* nfsclone:
nfs: add missing linux/types.h
NFS: Fix an 'unused variable' complaint when #ifndef CONFIG_NFS_V4_2
nfs42: add NFS_IOC_CLONE_RANGE ioctl
nfs42: respect clone_blksize
nfs: get clone_blksize when probing fsinfo
nfs42: add NFS_IOC_CLONE ioctl
nfs42: add CLONE proc functions
nfs42: add CLONE xdr functions
|
|
Merge the type-specific data with the payload data into one four-word chunk
as it seems pointless to keep them separate.
Use user_key_payload() for accessing the payloads of overloaded
user-defined keys.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cifs@vger.kernel.org
cc: ecryptfs@vger.kernel.org
cc: linux-ext4@vger.kernel.org
cc: linux-f2fs-devel@lists.sourceforge.net
cc: linux-nfs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-ima-devel@lists.sourceforge.net
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
It follows btrfs BTRFS_IOC_CLONE_RANGE lead on ioctl number and
arguments.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
draft-ietf-nfsv4-minorversion2-38.txt says:
Both cl_src_offset and
cl_dst_offset must be aligned to the clone block size Section 12.2.1.
The number of bytes to be cloned must be a multiple of the clone
block size, except in the case in which cl_src_offset plus the number
of bytes to be cloned is equal to the source file size.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
NFSv42 CLONE operation is supposed to respect it.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
It can be called by user space to CLONE two files.
Follow btrfs lead and define NFS_IOC_CLONE same as BTRFS_IOC_CLONE.
Thus we don't mess up userspace with too many ioctls.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
xdr definitions per draft-ietf-nfsv4-minorversion2-38.txt
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
They now only differ in the way we handle waiting, so let's unify.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The code needs to be able to work from inside an asynchronous context.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
For symmetry with the synchronous handler, and so that we can potentially
handle errors such as NFS4ERR_BADNAME.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Currently, we only do so for asynchronous delays.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Prepare for unification of the synchronous and asynchronous error
handling.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Running xfstest generic/013 with the tracepoint nfs:nfs4_open_file
enabled produces a NULL-pointer dereference when calculating fileid and
filehandle of the opened file. Fix this by checking if state is NULL
before trying to use the inode pointer.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When the client goes to return a delegation, it should always update any
nfs4_state currently set up to use that delegation stateid to instead
use the open stateid. It already does do this in some cases,
particularly in the state recovery code, but not currently when the
delegation is voluntarily returned (e.g. in advance of a RENAME). This
causes the client to try to continue using the delegation stateid after
the DELEGRETURN, e.g. in LAYOUTGET.
Set the nfs4_state back to using the open stateid in
nfs4_open_delegation_recall, just before clearing the
NFS_DELEGATED_STATE bit.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Since commit 5cae02f42793130e1387f4ec09c4d07056ce9fa5 an OPEN_CONFIRM should
have a privileged sequence in the recovery case to allow nograce recovery to
proceed for NFSv4.0.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We need to warn against broken NFSv4.1 servers that try to hand out
delegations in response to NFS4_OPEN_CLAIM_DELEG_CUR_FH.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Currently, we don't test if the state owner is in use before we try to
recover it. The problem is that if the refcount is zero, then the
state owner will be waiting on the lru list for garbage collection.
The expectation in that case is that if you bump the refcount, then
you must also remove the state owner from the lru list. Otherwise
the call to nfs4_put_state_owner will corrupt that list by trying
to add our state owner a second time.
Avoid the whole problem by just skipping state owners that hold no
state.
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If all other conditions in nfs_can_extend_write() are met, and there
are no locks, then we should be able to assume close-to-open semantics
and the ability to extend our write to cover the whole page.
With this patch, the xfstests generic/074 test completes in 242s instead
of >1400s on my test rig.
Fixes: bd61e0a9c852 ("locks: convert posix locks to file_lock_context")
Cc: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Currently, we are crediting all the calls to nfs_writepages_callback()
(i.e. the nfs_writepages() callback) to nfs_writepage(). Aside from
being inconsistent with the behaviour of the equivalent readpage/readpages
accounting, this also means that we cannot distinguish between bulk writes
and single page writebacks (which confuses the 'nfsiostat -p' tool).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
is no need to do that again from its callers. Drop it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
If we send a layoutreturn asynchronously before close, the close
might reach server first and layoutreturn would fail with BADSTATEID
because there is nothing keeping the layout stateid alive.
Also do not pretend sending layoutreturn if we are not.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When lseg's commit_through_mds is set, pnfs client always WARN once
in nfs_direct_select_verf after checking ds_cinfo.nbuckets.
nfs should use the DS verf except commit_through_mds is set for
layout segment where nbuckets is zero.
[17844.666094] ------------[ cut here ]------------
[17844.667071] WARNING: CPU: 0 PID: 21758 at /root/source/linux-pnfs/fs/nfs/direct.c:174 nfs_direct_select_verf+0x5a/0x70 [nfs]()
[17844.668650] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c btrfs ppdev coretemp crct10dif_pclmul auth_rpcgss crc32_pclmul crc32c_intel nfs_acl ghash_clmulni_intel lockd vmw_balloon xor vmw_vmci grace raid6_pq shpchp sunrpc parport_pc i2c_piix4 parport vmwgfx drm_kms_helper ttm drm serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
[17844.686676] CPU: 0 PID: 21758 Comm: kworker/0:1 Tainted: G W OE 4.3.0-rc1-pnfs+ #245
[17844.687352] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[17844.698502] Workqueue: nfsiod rpc_async_release [sunrpc]
[17844.699212] 0000000000000009 0000000043e58010 ffff8800454fbc10 ffffffff813680c4
[17844.699990] ffff8800454fbc48 ffffffff8108b49d ffff88004eb20000 ffff88004eb20000
[17844.700844] ffff880062e26000 0000000000000000 0000000000000001 ffff8800454fbc58
[17844.701637] Call Trace:
[17844.725252] [<ffffffff813680c4>] dump_stack+0x19/0x25
[17844.732693] [<ffffffff8108b49d>] warn_slowpath_common+0x7d/0xb0
[17844.733855] [<ffffffff8108b5da>] warn_slowpath_null+0x1a/0x20
[17844.735015] [<ffffffffa04a27ca>] nfs_direct_select_verf+0x5a/0x70 [nfs]
[17844.735999] [<ffffffffa04a2b83>] nfs_direct_set_hdr_verf+0x23/0x90 [nfs]
[17844.736846] [<ffffffffa04a2e17>] nfs_direct_write_completion+0x227/0x260 [nfs]
[17844.737782] [<ffffffffa04a433c>] nfs_pgio_release+0x1c/0x20 [nfs]
[17844.738597] [<ffffffffa0502df3>] pnfs_generic_rw_release+0x23/0x30 [nfsv4]
[17844.739486] [<ffffffffa01cbbea>] rpc_free_task+0x2a/0x70 [sunrpc]
[17844.740326] [<ffffffffa01cbcd5>] rpc_async_release+0x15/0x20 [sunrpc]
[17844.741173] [<ffffffff810a387c>] process_one_work+0x21c/0x4c0
[17844.741984] [<ffffffff810a37cd>] ? process_one_work+0x16d/0x4c0
[17844.742837] [<ffffffff810a3b6a>] worker_thread+0x4a/0x440
[17844.743639] [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
[17844.744399] [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
[17844.745176] [<ffffffff810a8d75>] kthread+0xf5/0x110
[17844.745927] [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
[17844.747105] [<ffffffff8172ce1f>] ret_from_fork+0x3f/0x70
[17844.747856] [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
[17844.748642] ---[ end trace 336a2845d42b83f0 ]---
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If the current open or layout stateid doesn't match the stateid used
in the layoutget RPC call, then don't try to recover it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When a read delegation is being recalled, and we're reclaiming the
cached opens, we need to make sure that we only reclaim read-only
modes.
A previous attempt to do this, relied on retrieving the delegation
type from the nfs4_opendata structure. Unfortunately, as Kinglong
pointed out, this field can only be set when performing reboot recovery.
Furthermore, if we call nfs4_open_recover(), then we end up clobbering
the state->flags for all modes that we're not recovering...
The fix is to have the delegation recall code pass this information
to the recovery call, and then refactor the recovery code so that
nfs4_open_delegation_recall() does not need to call nfs4_open_recover().
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 39f897fdbd46 ("NFSv4: When returning a delegation, don't...")
Tested-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If layouget fail with BAD_STATEID, restart should not using the old stateid.
But, nfs client choose the layout stateid at first, and then the open stateid.
To avoid the infinite loop of using bad stateid for layoutget,
this patch sets the layout flag'ss NFS_LAYOUT_INVALID_STID bit to
skip choosing the bad layout stateid.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
There is a reference leak of layout segment after resetting
pageio read/write to mds.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If filelayout_decode_layout fail, _filelayout_free_lseg will causes
a double freeing of fh_array.
[ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.281010] PGD 0
[ 1179.281443] Oops: 0000 [#1]
[ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
[ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G OE 4.3.0-rc1-pnfs+ #244
[ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
[ 1179.285668] RIP: 0010:[<ffffffffa027222d>] [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.286612] RSP: 0018:ffff88003e3c77f8 EFLAGS: 00010202
[ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
[ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
[ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
[ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
[ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
[ 1179.290791] FS: 00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
[ 1179.291580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
[ 1179.292731] Stack:
[ 1179.293195] ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
[ 1179.293676] ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
[ 1179.294151] 00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
[ 1179.294623] Call Trace:
[ 1179.295092] [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
[ 1179.295625] [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0
[ 1179.296133] [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4]
[ 1179.296632] [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
[ 1179.297134] [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4]
[ 1179.297632] [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs]
[ 1179.298158] [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
[ 1179.298834] [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs]
[ 1179.299385] [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
[ 1179.299872] [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
[ 1179.300362] [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs]
[ 1179.300907] [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0
[ 1179.301391] [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs]
[ 1179.301867] [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs]
[ 1179.302330] [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280
[ 1179.302784] [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280
[ 1179.303413] [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0
[ 1179.303855] [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50
[ 1179.304286] [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0
[ 1179.304711] [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
[ 1179.305132] [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs]
[ 1179.305540] [<ffffffff811e343c>] __vfs_read+0xcc/0x100
[ 1179.305936] [<ffffffff811e3d15>] vfs_read+0x85/0x130
[ 1179.306326] [<ffffffff811e4a98>] SyS_read+0x58/0xd0
[ 1179.306708] [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
[ 1179.308357] RIP [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.309177] RSP <ffff88003e3c77f8>
[ 1179.309582] CR2: 0000000000000000
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|