summaryrefslogtreecommitdiff
path: root/fs/nfs/pnfs_nfs.c
AgeCommit message (Collapse)Author
2021-10-20NFS: Fix up commit deadlocksTrond Myklebust
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensure that O_DIRECT calls nfs_commit_end(). Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_dsBaptiste Lepers
_nfs4_pnfs_v3/v4_ds_connect do some work smp_wmb ds->ds_clp = clp; And nfs4_ff_layout_prepare_ds currently does smp_rmb if(ds->ds_clp) ... This patch places the smp_rmb after the if. This ensures that following reads only happen once nfs4_ff_layout_prepare_ds has checked that data has been properly initialized. Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3Trond Myklebust
Currently we fail to return an error if the NFSv3 module failed to load when we're trying to connect to a pNFS data server. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple timesTrond Myklebust
After we grab the lock in nfs4_pnfs_ds_connect(), there is no check for whether or not ds->ds_clp has already been initialised, so we can end up adding the same transports multiple times. Fixes: fc821d59209d ("pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()Trond Myklebust
We must ensure that we pass a layout segment to nfs_retry_commit() when we're cleaning up after pnfs_bucket_alloc_ds_commits(). Otherwise, requests that should be committed to the DS will get committed to the MDS. Do so by ensuring that pnfs_bucket_get_committing() always tries to return a layout segment when it returns a non-empty page list. Fixes: c84bea59449a ("NFS/pNFS: Simplify bucket layout segment reference counting") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the requestTrond Myklebust
In pnfs_generic_clear_request_commit(), we try calling pnfs_free_bucket_lseg() before we remove the request from the DS bucket. That will always fail, since the point is to test for whether or not that bucket is empty. Fixes: c84bea59449a ("NFS/pNFS: Simplify bucket layout segment reference counting") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02pNFS: Clean up open coded xdr string decodingTrond Myklebust
Use the existing xdr_stream_decode_string_dup() to safely decode into kmalloced strings. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02NFSv4/pNFS: Store the transport type in struct nfs4_pnfs_ds_addrTrond Myklebust
We want to enable RDMA and UDP as valid transport methods if a GETDEVICEINFO call specifies it. Do so by adding a parser for the netid that translates it to an appropriate argument for the RPC transport layer. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02pNFS: Add helpers for allocation/free of struct nfs4_pnfs_ds_addrTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02NFSv4/pNFS: Use connections to a DS that are all of the same protocol familyTrond Myklebust
If the pNFS metadata server advertises multiple addresses for the same data server, we should try to connect to just one protocol family and transport type on the assumption that homogeneity will improve performance. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-13NFS/pnfs: Don't use RPC_TASK_CRED_NOREF with pnfsTrond Myklebust
When we're doing pnfs then the credential being used for the RPC call is not necessarily the same as the one used in the open context, so don't use RPC_TASK_CRED_NOREF. Fixes: 612965072020 ("NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-11pNFS: Fix RCU lock leakageTrond Myklebust
Another brown paper bag moment. pnfs_alloc_ds_commits_list() is leaking the RCU lock. Fixes: a9901899b649 ("pNFS: Add infrastructure for cleaning up per-layout commit structures") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27NFS/pNFS: Fix pnfs_layout_mark_request_commit() invalid layout segment handlingTrond Myklebust
Fix up pnfs_layout_mark_request_commit() to alway reschedule the write if the layout segment is invalid. Also minor cleanup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27NFS/pNFS: Simplify bucket layout segment reference countingTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27NFS/pNFS: Clean up pNFS commit operationsTrond Myklebust
Move the pNFS commit related operations into a separate structure that can be carried by the pnfs_ds_commit_info. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27NFS: Remove bucket array from struct pnfs_ds_commit_infoTrond Myklebust
Remove the unused bucket array in struct pnfs_ds_commit_info. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27NFS/pNFS: Add a helper pnfs_generic_search_commit_reqs()Trond Myklebust
Lift filelayout_search_commit_reqs() into the generic pnfs/nfs code, and add support for commit arrays. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27pNFS: Enable per-layout segment commit structuresTrond Myklebust
Enable adding and lookup of per-layout segment commits in filelayout and flexfilelayout. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27pNFS: Add infrastructure for cleaning up per-layout commit structuresTrond Myklebust
Ensure that both the file and flexfiles layout types clean up when freeing the layout segments. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27pNFS: Support per-layout segment commits in pnfs_generic_commit_pagelist()Trond Myklebust
Add support for scanning the full list of per-layout segment commit arrays to pnfs_generic_commit_pagelist(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27pNFS: Support per-layout segment commits in pnfs_generic_recover_commit_reqs()Trond Myklebust
Add support for scanning the full list of per-layout segment commit arrays to pnfs_generic_recover_commit_reqs(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27NFSv4/pNFS: Scan the full list of commit arrays when committingTrond Myklebust
Add support for scanning the full list of per-layout segment commit arrays to pnfs_generic_scan_commit_lists() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-26pNFS: Add a helper to allocate the array of bucketsTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-26NFS/pNFS: Refactor pnfs_generic_commit_pagelist()Trond Myklebust
Refactor pnfs_generic_commit_pagelist() to simplify the conversion to layout segment based commit lists. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-01-15NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes()Trond Myklebust
Instead of making assumptions about the commit verifier contents, change the commit code to ensure we always check that the verifier was set by the XDR code. Fixes: f54bcf2ecee9 ("pnfs: Prepare for flexfiles by pulling out common code") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-26pNFS/flexfiles: Turn off soft RPC callsTrond Myklebust
The pNFS/flexfiles I/O requests are sent with the SOFTCONN flag set, so they automatically time out if the connection breaks. It should therefore not be necessary to have the soft flag set in addition. Fixes: 5f01d9539496 ("nfs41: create NFSv3 DS connection if specified") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.NeilBrown
SUNRPC has two sorts of credentials, both of which appear as "struct rpc_cred". There are "generic credentials" which are supplied by clients such as NFS and passed in 'struct rpc_message' to indicate which user should be used to authorize the request, and there are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS which describe the credential to be sent over the wires. This patch replaces all the generic credentials by 'struct cred' pointers - the credential structure used throughout Linux. For machine credentials, there is a special 'struct cred *' pointer which is statically allocated and recognized where needed as having a special meaning. A look-up of a low-level cred will map this to a machine credential. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-08-15NFSv4: Fix locking in pnfs_generic_recover_commit_reqsTrond Myklebust
The use of the inode->i_lock was converted to a mutex, but we forgot to remove the old inode unlock/lock() pair that allowed the layout segment to be put inside the loop. Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com> Fixes: e824f99adaaf1 ("NFSv4: Use a mutex to protect the per-inode commit...") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-03-20sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-17fs, nfs: convert nfs4_pnfs_ds.ds_count from atomic_t to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_pnfs_ds.ds_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-09-09NFS: Remove pnfs_generic_transfer_commit_list()Trond Myklebust
It's pretty much a duplicate of nfs_scan_commit_list() that also clears the PG_COMMIT_TO_DS flag. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-09NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlockTrond Myklebust
Since the commit list is not ordered, it is possible for nfs_scan_commit_list to hold a request that nfs_lock_and_join_requests() is waiting for, while at the same time trying to grab a request that nfs_lock_and_join_requests already holds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15NFS: Wait for requests that are locked on the commit listTrond Myklebust
If a request is on the commit list, but is locked, we will currently skip it, which can lead to livelocking when the commit count doesn't reduce to zero. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg()Trond Myklebust
Now that we no longer hold the inode->i_lock when manipulating the commit lists, it is safe to call pnfs_put_lseg() again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15NFSv4: Use a mutex to protect the per-inode commit listsTrond Myklebust
The commit lists can get very large, so using the inode->i_lock can end up affecting general metadata performance. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-07-19Revert commit 722f0b891198 ("pNFS: Don't send COMMITs to the DSes if...")Trond Myklebust
Doing the test without taking any locks is racy, and so really it makes more sense to do it in the flexfiles code (which is the only case that cares). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-19NFS: Fix another COMMIT race in pNFSTrond Myklebust
We must make sure that cinfo->ds->ncommitting is in sync with the commit list, since it is checked as part of pnfs_commit_list(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-19NFS: Fix a COMMIT race in pNFSTrond Myklebust
We must make sure that cinfo->ds->nwritten is in sync with the commit list, since it is checked as part of pnfs_scan_commit_lists(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-05-03pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commitsFred Isaman
Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-02pNFS: Fix a typo in pnfs_generic_alloc_ds_commitsTrond Myklebust
If the layout segment is invalid, we want to just resend the remaining writes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29pNFS: Don't send COMMITs to the DSes if the server invalidated our layoutTrond Myklebust
If the layout was invalidated, then assume we should requeue all the pending writes for the DS in question. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20pNFS: unexport nfs4_pnfs_v3_ds_connect_unloadTrond Myklebust
It is not used outside the NFSv4 module. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20NFS: fix usage of mempools.NeilBrown
When passed GFP flags that allow sleeping (such as GFP_NOIO), mempool_alloc() will never return NULL, it will wait until memory is available. This means that we don't need to handle failure, but that we do need to ensure one thread doesn't call mempool_alloc() twice on the one pool without queuing or freeing the first allocation. If multiple threads did this during times of high memory pressure, the pool could be exhausted and a deadlock could result. pnfs_generic_alloc_ds_commits() attempts to allocate from the nfs_commit_mempool while already holding an allocation from that pool. This is not safe. So change nfs_commitdata_alloc() to take a flag that indicates whether failure is acceptable. In pnfs_generic_alloc_ds_commits(), accept failure and handle it as we currently do. Else where, do not accept failure, and do not handle it. Even when failure is acceptable, we want to succeed if possible. That means both - using an entry from the pool if there is one - waiting for direct reclaim is there isn't. We call mempool_alloc(GFP_NOWAIT) to achieve the first, then kmem_cache_alloc(GFP_NOIO|__GFP_NORETRY) to achieve the second. Each of these can fail, but together they do the best they can without blocking indefinitely. The objects returned by kmem_cache_alloc() will still be freed by mempool_free(). This is safe as mempool_alloc() uses exactly the same function to allocate objects (since the mempool was created with mempool_create_slab_pool()). The object returned by mempool_alloc() and kmem_cache_alloc() are indistinguishable so mempool_free() will handle both identically, either adding to the pool or calling kmem_cache_free(). Also, don't test for failure when allocating from nfs_wdata_mempool. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-03-17pNFS/flexfiles: never nfs4_mark_deviceid_unavailableWeston Andros Adamson
The flexfiles layout should never mark a device unavailable. Move nfs4_mark_deviceid_unavailable out of nfs4_pnfs_ds_connect and call directly from files layout where it's still needed. The flexfiles driver still handles marked devices in error paths, but will now print a rate limited warning. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-03-17pNFS: return status from nfs4_pnfs_ds_connectWeston Andros Adamson
The nfs4_pnfs_ds_connect path can call rpc_create which can fail or it can wait on another context to reach the same failure. This checks that the rpc_create succeeded and returns the error to the caller. When an error is returned, both the files and flexfiles layouts will return NULL from _prepare_ds(). The flexfiles layout will also return the layout with the error NFS4ERR_NXIO. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-12-01NFS: Remove unused authflavour parameter from nfs_get_client()Anna Schumaker
This parameter hasn't been used since f8407299 (Linux 3.11-rc2), so let's remove it from this function and callers. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-09-19NFS pnfs data server multipath session trunkingAndy Adamson
Try all multipath addresses for a data server. The first address that successfully connects and creates a session is the DS mount address. All subsequent addresses are tested for session trunking and added as aliases. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-24Merge branch 'writeback'Trond Myklebust
2016-07-19nfs4: flexfiles: respect noresvport when establishing connections to DSesTigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>