summaryrefslogtreecommitdiff
path: root/fs/nfsd
AgeCommit message (Collapse)Author
2019-07-03nfsd: allow forced expiration of NFSv4 clientsJ. Bruce Fields
NFSv4 clients are automatically expired and all their locks removed if they don't contact the server for a certain amount of time (the lease period, 90 seconds by default). There can still be situations where that's not enough, so allow userspace to force expiry by writing "expire\n" to the new nfsd/client/#/ctl file. (The generic "ctl" name is because I expect we may want to allow other operations on clients in the future.) The write will not return until the client is expired and all of its locks and other state removed. The fault injection code also provides a way of expiring clients, but it fails if there are any in-progress RPC's referencing the client. Also, its method of selecting a client to expire is a little more primitive--it uses an IP address, which can't always uniquely specify an NFSv4 client. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: create get_nfsdfs_clp helperJ. Bruce Fields
Factor our some common code. No change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: show layout stateidsJ. Bruce Fields
These are also minimal for now, I'm not sure what information would be useful. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: show lock and deleg stateidsJ. Bruce Fields
These entries are pretty minimal for now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: add file to display list of client's opensJ. Bruce Fields
Add a nfsd/clients/#/opens file to list some information about all the opens held by the given client, including open modes, device numbers, inode numbers, and open owners. Open owners are totally opaque but seem to sometimes have some useful ascii strings included, so passing through printable ascii characters and escaping the rest seems useful while still being machine-readable. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: add more information to client info fileJ. Bruce Fields
Add ip address, full client-provided identifier, and minor version. There's much more that could possibly be useful but this is a start. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: copy client's address including port number to cl_addrJ. Bruce Fields
rpc_copy_addr() copies only the IP address and misses any port numbers. It seems potentially useful to keep the port number around too. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: add a client info fileJ. Bruce Fields
Add a new nfsd/clients/#/info file with some basic information about each NFSv4 client. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: make client/ directory names small intsJ. Bruce Fields
We want clientid's on the wire to be randomized for reasons explained in ebd7c72c63ac "nfsd: randomize SETCLIENTID reply to help distinguish servers". But I'd rather have mostly small integers for the clients/ directory. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: add nfsd/clients directoryJ. Bruce Fields
I plan to expose some information about nfsv4 clients here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: use reference count to free clientJ. Bruce Fields
Keep a second reference count which is what is really used to decide when to free the client's memory. Next I'm going to add an nfsd/clients/ directory with a subdirectory for each NFSv4 client. File objects under nfsd/clients/ will hold these references. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: rename cl_refcountJ. Bruce Fields
Rename this to a more descriptive name: it counts the number of in-progress rpc's referencing this client. Next I'm going to add a second refcount with a slightly different use. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: persist nfsd filesystem across mountsJ. Bruce Fields
Keep around one internal mount of the nfsd filesystem so that we can add stuff to it when clients come and go, regardless of whether anyone has it mounted. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: fix cleanup of nfsd_reply_cache_init on failureJ. Bruce Fields
The failure to unregister the shrinker results will result in corruption when the nfsd_net is freed. Also clean up the drc_slab while we're here. Reported-by: syzbot+83a43746cebef3508b49@syzkaller.appspotmail.com Fixes: db17b61765c2 ("nfsd4: drc containerization") Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: remove outdated nfsd4_decode_time commentJ. Bruce Fields
Commit bf8d909705e "nfsd: Decode and send 64bit time values" fixed the code without updating the comment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: use 64-bit seconds fields in nfsd v4 codeJ. Bruce Fields
After commit 95582b008388 "vfs: change inode times to use struct timespec64" there are spots in the NFSv4 decoding where we decode the protocol into a struct timeval and then convert that into a timeval64. That's unnecesary in the NFSv4 case since the on-the-wire protocol also uses 64-bit values. So just fix up our code to use timeval64 everywhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: Spelling s/EACCESS/EACCES/Geert Uytterhoeven
The correct spelling is EACCES: include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: note inadequate stats lockingJ. Bruce Fields
After 89a26b3d295d "nfsd: split DRC global spinlock into per-bucket locks", there is no longer a single global spinlock to protect these stats. So, really we need to fix that. For now, at least fix the comment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: drc containerizationJ. Bruce Fields
The nfsd duplicate reply cache should not be shared between network namespaces. The most straightforward way to fix this is just to move every global in the code to per-net-namespace memory, so that's what we do. Still todo: sort out which members of nfsd_stats should be global and which per-net-namespace. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: don't call nfsd_reply_cache_shutdown twiceJ. Bruce Fields
The caller is cleaning up on ENOMEM, don't try to do it here too. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: Fix overflow causing non-working mounts on 1 TB machinesPaul Menzel
Since commit 10a68cdf10 (nfsd: fix performance-limiting session calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with 1 TB of memory cannot be mounted anymore. The mount just hangs on the client. The gist of commit 10a68cdf10 is the change below. -avail = clamp_t(int, avail, slotsize, avail/3); +avail = clamp_t(int, avail, slotsize, total_avail/3); Here are the macros. #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts the values to `int`, which for 32-bit integers can only hold values −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1). `avail` (in the function signature) is just 65536, so that no overflow was happening. Before the commit the assignment would result in 21845, and `num = 4`. When using `total_avail`, it is causing the assignment to be 18446744072226137429 (printed as %lu), and `num` is then 4164608182. My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the server thinks there is no memory available any more for this client. Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long` fixes the issue. Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but `num = 4` remains the same. Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation) Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20190612152603.GB18440@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25vfs: Convert nfsctl to use the new mount APIDavid Howells
Convert the nfsctl filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: "J. Bruce Fields" <bfields@fieldses.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-nfs@vger.kernel.org
2019-05-21treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 1Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin street fifth floor boston ma 02110 1301 usa this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option [no]_[pad]_[ctrl] any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin street fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 176 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154040.652910950@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-15Merge tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "This consists mostly of nfsd container work: Scott Mayhew revived an old api that communicates with a userspace daemon to manage some on-disk state that's used to track clients across server reboots. We've been using a usermode_helper upcall for that, but it's tough to run those with the right namespaces, so a daemon is much friendlier to container use cases. Trond fixed nfsd's handling of user credentials in user namespaces. He also contributed patches that allow containers to support different sets of NFS protocol versions. The only remaining container bug I'm aware of is that the NFS reply cache is shared between all containers. If anyone's aware of other gaps in our container support, let me know. The rest of this is miscellaneous bugfixes" * tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits) nfsd: update callback done processing locks: move checks from locks_free_lock() to locks_release_private() nfsd: fh_drop_write in nfsd_unlink nfsd: allow fh_want_write to be called twice nfsd: knfsd must use the container user namespace SUNRPC: rsi_parse() should use the current user namespace SUNRPC: Fix the server AUTH_UNIX userspace mappings lockd: Pass the user cred from knfsd when starting the lockd server SUNRPC: Temporary sockets should inherit the cred from their parent SUNRPC: Cache the process user cred in the RPC server listener nfsd: Allow containers to set supported nfs versions nfsd: Add custom rpcbind callbacks for knfsd SUNRPC: Allow further customisation of RPC program registration SUNRPC: Clean up generic dispatcher code SUNRPC: Add a callback to initialise server requests SUNRPC/nfs: Fix return value for nfs4_callback_compound() nfsd: handle legacy client tracking records sent by nfsdcld nfsd: re-order client tracking method selection nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld nfsd: un-deprecate nfsdcld ...
2019-05-09Merge tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - Fall back to MDS if no deviceid is found rather than aborting # v4.11+ - NFS4: Fix v4.0 client state corruption when mount Features: - Much improved handling of soft mounts with NFS v4.0: - Reduce risk of false positive timeouts - Faster failover of reads and writes after a timeout - Added a "softerr" mount option to return ETIMEDOUT instead of EIO to the application after a timeout - Increase number of xprtrdma backchannel requests - Add additional xprtrdma tracepoints - Improved send completion batching for xprtrdma Other bugfixes and cleanups: - Return -EINVAL when NFS v4.2 is passed an invalid dedup mode - Reduce usage of GFP_ATOMIC pages in SUNRPC - Various minor NFS over RDMA cleanups and bugfixes - Use the correct container namespace for upcalls - Don't share superblocks between user namespaces - Various other container fixes - Make nfs_match_client() killable to prevent soft lockups - Don't mark all open state for recovery when handling recallable state revoked flag" * tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (69 commits) SUNRPC: Rebalance a kref in auth_gss.c NFS: Fix a double unlock from nfs_match,get_client nfs: pass the correct prototype to read_cache_page NFSv4: don't mark all open state for recovery when handling recallable state revoked flag SUNRPC: Fix an error code in gss_alloc_msg() SUNRPC: task should be exit if encode return EKEYEXPIRED more times NFS4: Fix v4.0 client state corruption when mount PNFS fallback to MDS if no deviceid found NFS: make nfs_match_client killable lockd: Store the lockd client credential in struct nlm_host NFS: When mounting, don't share filesystems between different user namespaces NFS: Convert NFSv2 to use the container user namespace NFSv4: Convert the NFS client idmapper to use the container user namespace NFS: Convert NFSv3 to use the container user namespace SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall SUNRPC: Use the client user namespace when encoding creds NFS: Store the credential of the mount process in the nfs_server SUNRPC: Cache cred of process creating the rpc_client xprtrdma: Remove stale comment xprtrdma: Update comments that reference ib_drain_qp ...
2019-05-07Merge tag 'Wimplicit-fallthrough-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva: "Mark switch cases where we are expecting to fall through. This is part of the ongoing efforts to enable -Wimplicit-fallthrough. Most of them have been baking in linux-next for a whole development cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails going out for newly introduced code that triggers -Wimplicit-fallthrough to avoid gaining more of these cases while we work to remove the ones that are already present. We are getting close to completing this work. Currently, there are only 32 of 2311 of these cases left to be addressed in linux-next. I'm auditing every case; I take a look into the code and analyze it in order to determine if I'm dealing with an actual bug or a false positive, as explained here: https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/ While working on this, I've found and fixed the several missing break/return bugs, some of them introduced more than 5 years ago. Once this work is finished, we'll be able to universally enable "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from entering the kernel again" * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) memstick: mark expected switch fall-throughs drm/nouveau/nvkm: mark expected switch fall-throughs NFC: st21nfca: Fix fall-through warnings NFC: pn533: mark expected switch fall-throughs block: Mark expected switch fall-throughs ASN.1: mark expected switch fall-through lib/cmdline.c: mark expected switch fall-throughs lib: zstd: Mark expected switch fall-throughs scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs scsi: ppa: mark expected switch fall-through scsi: osst: mark expected switch fall-throughs scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs scsi: lpfc: lpfc_nvme: Mark expected switch fall-through scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs scsi: lpfc: lpfc_els: Mark expected switch fall-throughs scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs scsi: imm: mark expected switch fall-throughs scsi: csiostor: csio_wr: mark expected switch fall-through ...
2019-05-06Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add support for AEAD in simd - Add fuzz testing to testmgr - Add panic_on_fail module parameter to testmgr - Use per-CPU struct instead multiple variables in scompress - Change verify API for akcipher Algorithms: - Convert x86 AEAD algorithms over to simd - Forbid 2-key 3DES in FIPS mode - Add EC-RDSA (GOST 34.10) algorithm Drivers: - Set output IV with ctr-aes in crypto4xx - Set output IV in rockchip - Fix potential length overflow with hashing in sun4i-ss - Fix computation error with ctr in vmx - Add SM4 protected keys support in ccree - Remove long-broken mxc-scc driver - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits) crypto: ccree - use a proper le32 type for le32 val crypto: ccree - remove set but not used variable 'du_size' crypto: ccree - Make cc_sec_disable static crypto: ccree - fix spelling mistake "protedcted" -> "protected" crypto: caam/qi2 - generate hash keys in-place crypto: caam/qi2 - fix DMA mapping of stack memory crypto: caam/qi2 - fix zero-length buffer DMA mapping crypto: stm32/cryp - update to return iv_out crypto: stm32/cryp - remove request mutex protection crypto: stm32/cryp - add weak key check for DES crypto: atmel - remove set but not used variable 'alg_name' crypto: picoxcell - Use dev_get_drvdata() crypto: crypto4xx - get rid of redundant using_sd variable crypto: crypto4xx - use sync skcipher for fallback crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues crypto: crypto4xx - fix ctr-aes missing output IV crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o crypto: ccree - handle tee fips error during power management resume crypto: ccree - add function to handle cryptocell tee fips error ...
2019-05-03nfsd: update callback done processingScott Mayhew
Instead of having the convention where individual nfsd4_callback_ops->done operations return -1 to indicate the callback path is down, move the check to nfsd4_cb_done. Only mark the callback path down on transport-level errors, not NFS-level errors. The existing logic causes the server to set SEQ4_STATUS_CB_PATH_DOWN just because the client returned an error to a CB_RECALL for a delegation that the client had already done a FREE_STATEID for. But clearly that error doesn't mean that there's anything wrong with the backchannel. Additionally, handle NFS4ERR_DELAY in nfsd4_cb_recall_done. The client returns NFS4ERR_DELAY if it is already in the process of returning the delegation. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-26SUNRPC: Cache cred of process creating the rpc_clientTrond Myklebust
When converting kuids to AUTH_UNIX creds, etc we will want to use the same user namespace as the process that created the rpc client. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25SUNRPC: Fix up task signallingTrond Myklebust
The RPC_TASK_KILLED flag should really not be set from another context because it can clobber data in the struct task when task->tk_flags is changed non-atomically. Let's therefore swap out RPC_TASK_KILLED with an atomic flag, and add a function to set that flag and safely wake up the task. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25crypto: shash - remove shash_desc::flagsEric Biggers
The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algorithm ever sleeps, making this flag a no-op. With this being the case, inevitably some users who can't sleep wrongly pass MAY_SLEEP. These would all need to be fixed if any shash algorithm actually started sleeping. For example, the shash_ahash_*() functions, which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP from the ahash API to the shash API. However, the shash functions are called under kmap_atomic(), so actually they're assumed to never sleep. Even if it turns out that some users do need preemption points while hashing large buffers, we could easily provide a helper function crypto_shash_update_large() which divides the data into smaller chunks and calls crypto_shash_update() and cond_resched() for each chunk. It's not necessary to have a flag in 'struct shash_desc', nor is it necessary to make individual shash algorithms aware of this at all. Therefore, remove shash_desc::flags, and document that the crypto_shash_*() functions can be called from any context. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-24nfsd: fh_drop_write in nfsd_unlinkJ. Bruce Fields
fh_want_write() can now be called twice, but I'm also fixing up the callers not to do that. Other cases include setattr and create. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: allow fh_want_write to be called twiceJ. Bruce Fields
A fuzzer recently triggered lockdep warnings about potential sb_writers deadlocks caused by fh_want_write(). Looks like we aren't careful to pair each fh_want_write() with an fh_drop_write(). It's not normally a problem since fh_put() will call fh_drop_write() for us. And was OK for NFSv3 where we'd do one operation that might call fh_want_write(), and then put the filehandle. But an NFSv4 protocol fuzzer can do weird things like call unlink twice in a compound, and then we get into trouble. I'm a little worried about this approach of just leaving everything to fh_put(). But I think there are probably a lot of fh_want_write()/fh_drop_write() imbalances so for now I think we need it to be more forgiving. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: knfsd must use the container user namespaceTrond Myklebust
Convert knfsd to use the user namespace of the container that started the server processes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24lockd: Pass the user cred from knfsd when starting the lockd serverTrond Myklebust
When starting up a new knfsd server, pass the user cred to the supporting lockd server. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24SUNRPC: Cache the process user cred in the RPC server listenerTrond Myklebust
In order to be able to interpret uids and gids correctly in knfsd, we should cache the user namespace of the process that created the RPC server's listener. To do so, we refcount the credential of that process. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: Allow containers to set supported nfs versionsTrond Myklebust
Support use of the --nfs-version/--no-nfs-version arguments to rpc.nfsd in containers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: Add custom rpcbind callbacks for knfsdTrond Myklebust
Add custom rpcbind callbacks in preparation for the knfsd per-container version feature. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24SUNRPC: Allow further customisation of RPC program registrationTrond Myklebust
Add a callback to allow customisation of the rpcbind registration. When clients have the ability to turn on and off version support, we want to allow them to also prevent registration of those versions with the rpc portmapper. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24SUNRPC: Add a callback to initialise server requestsTrond Myklebust
Add a callback to help initialise server requests before they are processed. This will allow us to clean up the NFS server version support, and to make it container safe. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: handle legacy client tracking records sent by nfsdcldScott Mayhew
The new nfsdcld will do a one-time "upgrade" where it searches for records from nfsdcltrack and the legacy tracking during startup. Legacy records will be prefixed with the string "hash:", which we need to strip off before adding to the reclaim_str_hashtbl. When legacy records are encountered, set the new cn_has_legacy flag in the cld_net. When this flag is set, if the search for a reclaim record based on the client name string fails, then do a second search based on the hash of the name string. Note that if there are legacy records then the grace period will not be lifted early via the tracking of RECLAIM_COMPLETEs. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: re-order client tracking method selectionScott Mayhew
The new order is first nfsdcld, then the UMH upcall, and finally the legacy tracking method. Added some printk's to the tracking initialization functions so it's clear which tracking method was ultimately selected. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcldScott Mayhew
When using nfsdcld for NFSv4 client tracking, track the number of RECLAIM_COMPLETE operations we receive from "known" clients to help in deciding if we can lift the grace period early (or whether we need to start a v4 grace period at all). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: un-deprecate nfsdcldScott Mayhew
When nfsdcld was released, it was quickly deprecated in favor of the nfsdcltrack usermodehelper, so as to not require another running daemon. That prevents NFSv4 clients from reclaiming locks from nfsd's running in containers, since neither nfsdcltrack nor the legacy client tracking code work in containers. This commit un-deprecates the use of nfsdcld, with one twist: we will populate the reclaim_str_hashtbl on startup. During client tracking initialization, do an upcall ("GraceStart") to nfsdcld to get a list of clients from the database. nfsdcld will do one downcall with a status of -EINPROGRESS for each client record in the database, which in turn will cause an nfs4_client_reclaim to be added to the reclaim_str_hashtbl. When complete, nfsdcld will do a final downcall with a status of 0. This will save nfsd from having to do an upcall to the daemon during nfs4_check_open_reclaim() processing. Even though nfsdcld was quickly deprecated, there is a very small chance of old nfsdcld daemons running in the wild. These will respond to the new "GraceStart" upcall with -EOPNOTSUPP, in which case we will log a message and fall back to the original nfsdcld tracking ops (now called nfsd4_cld_tracking_ops_v0). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: make nfs4_client_reclaim use an xdr_netobj instead of a fixed char arrayScott Mayhew
This will allow the reclaim_str_hashtbl to store either the recovery directory names used by the legacy client tracking code or the full client strings used by the nfsdcld client tracking code. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: avoid uninitialized variable warningArnd Bergmann
clang warns that 'contextlen' may be accessed without an initialization: fs/nfsd/nfs4xdr.c:2911:9: error: variable 'contextlen' is uninitialized when used here [-Werror,-Wuninitialized] contextlen); ^~~~~~~~~~ fs/nfsd/nfs4xdr.c:2424:16: note: initialize the variable 'contextlen' to silence this warning int contextlen; ^ = 0 Presumably this cannot happen, as FATTR4_WORD2_SECURITY_LABEL is set if CONFIG_NFSD_V4_SECURITY_LABEL is enabled. Adding another #ifdef like the other two in this function avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-22nfsd: wake blocked file lock waiters before sending callbackJeff Layton
When a blocked NFS lock is "awoken" we send a callback to the server and then wake any hosts waiting on it. If a client attempts to get a lock and then drops off the net, we could end up waiting for a long time until we end up waking locks blocked on that request. So, wake any other waiting lock requests before sending the callback. Do this by calling locks_delete_block in a new "prepare" phase for CB_NOTIFY_LOCK callbacks. URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363 Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.") Reported-by: Slawomir Pryczek <slawek1211@gmail.com> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>