summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-12-28pNFS/flexfiles: Ensure we record layoutstats even if RPC is terminated earlyTrond Myklebust
Currently, we will only record the layoutstats correctly if the RPC call successfully obtains a slot. If we exit before that happens, then we may find ourselves starting the busy timer through the call in ff_layout_(read|write)_prepare_layoutstats, but never stopping it. The same thing happens if we're doing DA-DS. The fix is to ensure that we catch these cases in the rpc_release() callback. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS: Add flag to track if we've called nfs4_ff_layout_stat_io_start_read/writeTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS/flexfiles: Fix a statistics gathering imbalanceTrond Myklebust
When we replay a failed read, write or commit to the dataserver, we need to ensure that we call ff_layout_read_prepare_v3(), ff_layout_write_prepare_v3 or ff_layout_commit_prepare_v3() so that we reset the statistics. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS/flexfiles: Don't mark the entire layout as failed, when returning itTrond Myklebust
In pNFS/flexfiles, we want to return the layout without necessarily marking it as having completely failed. We therefore move the call to pnfs_layout_io_set_failed() out of pnfs_error_mark_layout_for_return(), and then ensura that pNFS/files layout calls it separately. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGETTrond Myklebust
Fix a bug in which flexfiles clients are falling back to I/O through the MDS even when the FF_FLAGS_NO_IO_THRU_MDS flag is set. The flexfiles client will always report errors through the LAYOUTRETURN and/or LAYOUTERROR mechanisms, so it should normally be safe for it to retry the LAYOUTGET until it fails or succeeds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pnfs/flexfiles: count io stat in rpc_count_stats callbackPeng Tao
If client ever restarts IO due to some errors, we'll endup mis-counting IO stats if we do the counting in .rpc_done callback. Move it to .rpc_count_stats callback that is only called when releasing RPC. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pnfs/flexfiles: do not mark delay-like status as DS failurePeng Tao
We just need to delay and retry in these cases. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFS41: map NFS4ERR_LAYOUTUNAVAILABLE to ENODATAPeng Tao
Instead of mapping it to EIO that is a fatal error and fails application. We'll go inband after getting NFS4ERR_LAYOUTUNAVAILABLE. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: only remove page from mapping if launder_page failsPeng Tao
Instead of dropping pages when write fails, only do it when we get fatal failure in launder_page write back. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: handle request add failure properlyPeng Tao
When we fail to queue a read page to IO descriptor, we need to clean it up otherwise it is hanging around preventing nfs module from being removed. When we fail to queue a write page to IO descriptor, we need to clean it up and also save the failure status to open context. Then at file close, we can try to write pages back again and drop the page if it fails to writeback in .launder_page, which will be done in the next patch. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: centralize pgio error cleanupPeng Tao
In case we fail during setting things up for read/write IO, set pg_error in IO descriptor and do the cleanup in nfs_pageio_add_request, where we clean up all pages that are still hanging around on the IO descriptor. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: clean up rest of reqs when failing to add onePeng Tao
If we fail to set up things before sending anything over wire, we need to clean up the reqs that are still attached to the IO descriptor. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFS41: pop some layoutget errors to applicationPeng Tao
For ERESTARTSYS/EIO/EROFS/ENOSPC/E2BIG in layoutget, we should just bail out instead of hiding the error and retrying inband IO. Change all the call sites to pop the error all the way up. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS/flexfiles: Support server-supplied layoutstats sampling periodTrond Myklebust
Some servers want to be able to control the frequency with which clients report layoutstats, for instance, in order to monitor QoS for a particular file or set of file. In order to support this, the flexfiles layout allows the server to pass this info as a hint in the layout payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFS: Flush reclaim writes using FLUSH_COND_STABLETrond Myklebust
If there are already writes queued up for commit, then don't flush just this page even if it is a reclaim issue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFS: Background flush should not be low priorityTrond Myklebust
Background flush is needed in order to satisfy the global page limits. Don't subvert by reducing the priority. This should also address a write starvation issue that was reported by Neil Brown. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturnTrond Myklebust
Since commit 2d8ae84fbc32, nothing is bumping lo->plh_block_lgets in the layoutreturn path, so it should not be touched in nfs4_layoutreturn_release either. Fixes: 2d8ae84fbc32 ("NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets...") Cc: stable@vger.kernel.org # 4.3+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFSv4: Don't perform cached access checks before we've OPENed the fileTrond Myklebust
Donald Buczek reports that a nfs4 client incorrectly denies execute access based on outdated file mode (missing 'x' bit). After the mode on the server is 'fixed' (chmod +x) further execution attempts continue to fail, because the nfs ACCESS call updates the access parameter but not the mode parameter or the mode in the inode. The root cause is ultimately that the VFS is calling may_open() before the NFS client has a chance to OPEN the file and hence revalidate the access and attribute caches. Al Viro suggests: >>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know >>> that things will be caught by server anyway? >> >> That can work as long as we're guaranteed that everything that calls >> inode_permission() with MAY_OPEN on a regular file will also follow up >> with a vfs_open() or dentry_open() on success. Is this always the >> case? > > 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since > it doesn't have ->tmpfile() instance anyway) > > 2) in atomic_open(), after the call of ->atomic_open() has succeeded. > > 3) in do_last(), followed on success by vfs_open() > > That's all. All calls of inode_permission() that get MAY_OPEN come from > may_open(), and there's no other callers of that puppy. Reported-by: Donald Buczek <buczek@molgen.mpg.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771 Link: http://lkml.kernel.org/r/1451046656-26319-1-git-send-email-buczek@molgen.mpg.de Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28orangefs: fix typo in ornagefs_inode_lockArnd Bergmann
Orangefs fails to build on 32-bit SMP configurations due to a simple misspelling, this does the obvious fix. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 575e946125f7 ("Orangefs: change pvfs2 filenames to orangefs") Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-12-28Orangefs: use kzalloc for kmalloc + memset 0Nicholas Mc Guire
This is an API consolidation only. The use of kmalloc + memset to 0 should be equivalent to kzalloc in this case. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-12-28nfs: machine credential support for additional operationsAndrew Elble
Allow LAYOUTRETURN and DELEGRETURN to use machine credentials if the server supports it. Add request for OPEN_DOWNGRADE as the close path also uses that. Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: do not initialise statics to 0Wei Tang
This patch fixes the checkpatch.pl error to nfs4sysctl.c: ERROR: do not initialise statics to 0 Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28pNFS: Modify pnfs_update_layout tracepoints to use layout stateidTrond Myklebust
Instead of displaying a layout segment pointer in these tracepoints, let's use the layout stateid, now that Olga gave us a set of tools for displaying them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFSv4: Fix unused variable warnings in nfs4_init_*_client_string()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: add new tracepoint for pnfs_update_layoutJeff Layton
pnfs_update_layout is really the "nexus" of layout handling. If it returns NULL then we end up going through the MDS. This patch adds some tracepoints to that function that allow us to determine the cause when we end up going through the MDS unexpectedly. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28Adding tracepoint to cached openOlga Kornievskaia
Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28Adding stateid information to tracepointsOlga Kornievskaia
Operations to which stateid information is added: close, delegreturn, open, read, setattr, layoutget, layoutcommit, test_stateid, write, lock, locku, lockt Format is "stateid=<seqid>:<crc32 hash stateid.other>", also "openstateid=", "layoutstateid=", and "lockstateid=" for open_file, layoutget, set_lock tracepoints. New function is added to internal.h, nfs_stateid_hash(), to compute the hash trace_nfs4_setattr() is moved from nfs4_do_setattr() to _nfs4_do_setattr() to get access to stateid. trace_nfs4_setattr and trace_nfs4_delegreturn are changed from INODE_EVENT to new event type, INODE_STATEID_EVENT which is same as INODE_EVENT but adds stateid information for locking tracepoints, moved trace_nfs4_set_lock() into _nfs4_do_setlk() to get access to stateid information, and removed trace_nfs4_lock_reclaim(), trace_nfs4_lock_expired() as they call into _nfs4_do_setlk() and both were previously same LOCK_EVENT type. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28NFS: Allow the combination pNFS and labeled NFSTrond Myklebust
Fix the nfs4_pnfs_open_bitmap so that it also allows for labeled NFS. Signed-off-by: Trond Myklebust <trond,myklebust@primarydata.com>
2015-12-28NFS42: handle layoutstats stateid errorPeng Tao
When server returns layoutstats stateid error, we should invalidate client's layout so that next IO can trigger new layoutget. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: Fix race in __update_open_stateid()Andrew Elble
We've seen this in a packet capture - I've intermixed what I think was going on. The fix here is to grab the so_lock sooner. 1964379 -> #1 open (for write) reply seqid=1 1964393 -> #2 open (for read) reply seqid=2 __nfs4_close(), state->n_wronly-- nfs4_state_set_mode_locked(), changes state->state = [R] state->flags is [RW] state->state is [R], state->n_wronly == 0, state->n_rdonly == 1 1964398 -> #3 open (for write) call -> because close is already running 1964399 -> downgrade (to read) call seqid=2 (close of #1) 1964402 -> #3 open (for write) reply seqid=3 __update_open_stateid() nfs_set_open_stateid_locked(), changes state->flags state->flags is [RW] state->state is [R], state->n_wronly == 0, state->n_rdonly == 1 new sequence number is exposed now via nfs4_stateid_copy() next step would be update_open_stateflags(), pending so_lock 1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1) nfs4_close_prepare() gets so_lock and recalcs flags -> send close 1964405 -> downgrade (to read) call seqid=3 (close of #1 retry) __update_open_stateid() gets so_lock * update_open_stateflags() updates state->n_wronly. nfs4_state_set_mode_locked() updates state->state state->flags is [RW] state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1 * should have suppressed the preceding nfs4_close_prepare() from sending open_downgrade 1964406 -> write call 1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry) nfs_clear_open_stateid_locked() state->flags is [R] state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1 1964409 -> write reply (fails, openmode) Signed-off-by: Andrew Elble <aweits@rit.edu> Cc: stable@vger,kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: fix missing assignment in nfs4_sequence_done tracepointAndrew Elble
status_flags not set Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-24gfs2: Invalid security labels of inodes when they go invalidAndreas Gruenbacher
When gfs2 releases the glock of an inode, it must invalidate all information cached for that inode, including the page cache and acls. Use the new security_inode_invalidate_secctx hook to also invalidate security labels in that case. These items will be reread from disk when needed after reacquiring the glock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Cc: cluster-devel@redhat.com [PM: fixed spelling errors and description line lengths] Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-12-23btrfs: fix warning on uninit variable in btrfs_finish_chunk_allocChris Mason
map->num_stripes really can't be zero, but just in case. Signed-off-by: Chris Mason <clm@fb.com>
2015-12-23Merge branch 'freespace-4.5' into for-linus-4.5Chris Mason
2015-12-23Merge branch 'for-chris-4.5' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5
2015-12-23Merge branch 'dev/simplify-set-bit' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 Signed-off-by: Chris Mason <clm@fb.com>
2015-12-23Merge branch 'dev/gfp-flags' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
2015-12-23Merge branch 'cleanup/misc-simplify' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
2015-12-23udf: Fix lost indirect extent blockJan Kara
When inode ends with empty indirect extent block and we extended that file, udf_do_extend_file() ended up just overwriting pointer to it with another extent and thus effectively leaking the block and also corruptiong length of allocation descriptors. Fix the problem by properly following into next indirect extent when it is present. Signed-off-by: Jan Kara <jack@suse.cz>
2015-12-23udf: Factor out code for creating indirect extentJan Kara
Factor out code for creating indirect extent from udf_add_aext(). It was mostly duplicated in two places. Also remove some opencoded versions of udf_write_aext(). Signed-off-by: Jan Kara <jack@suse.cz>
2015-12-23new helpers: no_seek_end_llseek{,_size}()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-23lockd: Register callbacks on the inetaddr_chain and inet6addr_chainScott Mayhew
Register callbacks on inetaddr_chain and inet6addr_chain to trigger cleanup of lockd transport sockets when an ip address is deleted. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-12-23nfsd: Register callbacks on the inetaddr_chain and inet6addr_chainScott Mayhew
Register callbacks on inetaddr_chain and inet6addr_chain to trigger cleanup of nfsd transport sockets when an ip address is deleted. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-12-23nfsd: don't base cl_cb_status on stale informationJ. Bruce Fields
The rpc client we use to send callbacks may change occasionally. (In the 4.0 case, the client can use setclientid/setclientid_confirm to update the callback parameters. In the 4.1+ case, sessions and connections can come and go.) The update is done from the callback thread by nfsd4_process_cb_update, which shuts down the old rpc client and then creates a new one. The client shutdown kills any ongoing rpc calls. There won't be any new ones till the new one's created and the callback thread moves on. When an rpc encounters a problem that may suggest callback rpc's aren't working any longer, it normally sets NFSD4_CB_DOWN, so the server can tell the client something's wrong. But if the rpc notices CB_UPDATE is set, then the failure may just be a normal result of shutting down the callback client. Or it could just be a coincidence, but in any case, it means we're runing with the old about-to-be-discarded client, so the failure's not interesting. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-12-23udf: limit the maximum number of indirect extents in a rowVegard Nossum
udf_next_aext() just follows extent pointers while extents are marked as indirect. This can loop forever for corrupted filesystem. Limit number the of indirect extents we are willing to follow in a row. [JK: Updated changelog, limit, style] Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: stable@vger.kernel.org Cc: Jan Kara <jack@suse.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
2015-12-22Merge tag 'nfsd-4.4-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fix from Bruce Fields: "Just one fix for a NFSv4 callback bug introduced in 4.4" * tag 'nfsd-4.4-1' of git://linux-nfs.org/~bfields/linux: nfsd: don't hold ls_mutex across a layout recall
2015-12-22f2fs: use atomic variable for total_extent_treeJaegeuk Kim
It would be better to use atomic variable for total_extent_tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-22gfs2: fix flock panic issueJunxiao Bi
Commit 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()") moved flock/posix lock identify code to locks_lock_inode_wait(), but missed to set fl_flags to FL_FLOCK which will cause kernel panic in locks_lock_inode_wait(). Fixes: 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-21Btrfs: fix unprotected list operations at btrfs_write_dirty_block_groupsFilipe Manana
We call btrfs_write_dirty_block_groups() in the critical section of a transaction's commit, when no other tasks can join the transaction and add more block groups to the transaction's list of dirty block groups, so we not taking the dirty block groups spinlock when checking for the list's emptyness, grabbing its first element or deleting elements from it. However there's a special and rare case where we can have a concurrent task adding elements to this list. We trigger writeback for space caches before at btrfs_start_dirty_block_groups() and in past iterations of the loop at btrfs_write_dirty_block_groups(), this means that when the writeback finishes (which happens asynchronously) it creates a task for the endio free space work queue that executes btrfs_finish_ordered_io() - this function is able to join the transaction, through btrfs_join_transaction_nolock(), and update the free space cache's inode item in the root tree, which can result in COWing nodes of this tree and therefore allocation of a new block group can happen, which gets added to the transaction's list of dirty block groups while the transaction commit task is operating on it concurrently. So fix this by taking the dirty block groups spinlock before doing operations on the dirty block groups list at btrfs_write_dirty_block_groups(). Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-12-20fs: configfs: Add unlocked version of configfs_depend_item()Krzysztof Opasiak
This change is necessary for the SCSI target usb gadget composed with configfs. In this case configfs will be used for two different purposes: to compose a usb gadget and to configure the target part. If an instance of tcm function is created in $CONFIGFS_ROOT/usb_gadget/<gadget>/functions a tpg can be created in $CONFIGFS_ROOT/target/usb_gadget/<wwn>/, but after a tpg is created the tcm function must not be removed until its corresponding tpg is gone. While the configfs_depend/undepend_item() are meant exactly for creating this kind of dependencies, they are not suitable if the other kernel subsystem happens to be another subsystem in configfs, so this patch adds unlocked versions meant for configfs callbacks. Above description has been provided by: Andrzej Pietrasiewicz <andrzej.p@samsung.com> In configfs_depend_item() we have to consider two possible cases: 1) When we are called to depend another item in the same subsystem as caller In this case we should skip locking configfs root as we know that configfs is in valid state and our subsystem will not be unregistered during this call. 2) When we are called to depend item in different subsystem than our caller In this case we are also sure that configfs is in valid state but we have to lock root of configfs to avoid unregistration of target's subsystem. As it is other than caller's subsystem, there may be nothing what protects us against unregistration of that subsystem. Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>