summaryrefslogtreecommitdiff
path: root/fs/lockd/mon.c
AgeCommit message (Collapse)Author
2018-01-14lockd: convert nsm_handle.sm_count from atomic_t to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nsm_handle.sm_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nsm_handle.sm_count it might make a difference in following places: - nsm_release(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No change for the spin lock guarantees. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-11-27lockd: remove net pointer from messagesVasily Averin
Publishing of net pointer is not safe, use net->ns.inum as net ID in debug messages [ 171.757678] lockd_up_net: per-net data created; net=f00001e7 [ 171.767188] NFSD: starting 90-second grace period (net f00001e7) [ 300.653313] lockd: nuking all hosts in net f00001e7... [ 300.653641] lockd: host garbage collection for net f00001e7 [ 300.653968] lockd: nlmsvc_mark_resources for net f00001e7 [ 300.711483] lockd_down_net: per-net data destroyed; net=f00001e7 [ 300.711847] lockd: nuking all hosts in net 0... [ 300.711847] lockd: host garbage collection for net 0 [ 300.711848] lockd: nlmsvc_mark_resources for net 0 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-15sunrpc: mark all struct rpc_procinfo instances as constChristoph Hellwig
struct rpc_procinfo contains function pointers, and marking it as constant avoids it being able to be used as an attach vector for code injections. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-15sunrpc: move p_count out of struct rpc_procinfoChristoph Hellwig
p_count is the only writeable memeber of struct rpc_procinfo, which is a good candidate to be const-ified as it contains function pointers. This patch moves it into out out struct rpc_procinfo, and into a separate writable array that is pointed to by struct rpc_version and indexed by p_statidx. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15lockd: fix some weird indentationChristoph Hellwig
Remove double indentation of a few struct rpc_version and struct rpc_program instance. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-15lockd: fix decoder callback prototypesChristoph Hellwig
Declare the p_decode callbacks with the proper prototype instead of casting to kxdrdproc_t and losing all type safety. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-15lockd: fix encoder callback prototypesChristoph Hellwig
Declare the p_encode callbacks with the proper prototype instead of casting to kxdreproc_t and losing all type safety. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-23lockd: get rid of reference-counted NSM RPC clientsAndrey Ryabinin
Currently we have reference-counted per-net NSM RPC client which created on the first monitor request and destroyed after the last unmonitor request. It's needed because RPC client need to know 'utsname()->nodename', but utsname() might be NULL when nsm_unmonitor() called. So instead of holding the rpc client we could just save nodename in struct nlm_host and pass it to the rpc_create(). Thus ther is no need in keeping rpc client until last unmonitor request. We could create separate RPC clients for each monitor/unmonitor requests. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-12lockd: create NSM handles per net namespaceAndrey Ryabinin
Commit cb7323fffa85 ("lockd: create and use per-net NSM RPC clients on MON/UNMON requests") introduced per-net NSM RPC clients. Unfortunately this doesn't make any sense without per-net nsm_handle. E.g. the following scenario could happen Two hosts (X and Y) in different namespaces (A and B) share the same nsm struct. 1. nsm_monitor(host_X) called => NSM rpc client created, nsm->sm_monitored bit set. 2. nsm_mointor(host-Y) called => nsm->sm_monitored already set, we just exit. Thus in namespace B ln->nsm_clnt == NULL. 3. host X destroyed => nsm->sm_count decremented to 1 4. host Y destroyed => nsm_unmonitor() => nsm_mon_unmon() => NULL-ptr dereference of *ln->nsm_clnt So this could be fixed by making per-net nsm_handles list, instead of global. Thus different net namespaces will not be able share the same nsm_handle. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-02-03SUNRPC: NULL utsname dereference on NFS umount during namespace cleanupTrond Myklebust
Fix an Oopsable condition when nsm_mon_unmon is called as part of the namespace cleanup, which now apparently happens after the utsname has been freed. Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: stable@vger.kernel.org # 3.18 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-06lockd: ratelimit "lockd: cannot monitor" messagesJeff Layton
When lockd can't talk to a remote statd, it'll spew a warning message to the ring buffer. If the application is really hammering on locks however, it's possible for that message to spam the logs. Ratelimit it to minimize the potential for harm. Reported-by: Ian Collier <imc@cs.ox.ac.uk> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-24lockd: Try to reconnect if statd has movedBenjamin Coddington
If rpc.statd is restarted, upcalls to monitor hosts can fail with ECONNREFUSED. In that case force a lookup of statd's new port and retry the upcall. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-07-23fs: lockd: Use ktime_get_ns()Thomas Gleixner
Replace the ever recurring: ts = ktime_get_ts(); ns = timespec_to_ns(&ts); with ns = ktime_get_ns(); Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-02-05sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to ↵Jeff Layton
addr.h These routines are used by server and client code, so having them in a separate header would be best. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-04lockd: Remove trivial BUG_ON()s from the NSM codeTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-24LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zeroTrond Myklebust
The current code is clearing it in all cases _except_ when zero. Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2012-10-24LOCKD: fix races in nsm_client_getTrond Myklebust
Commit e9406db20fecbfcab646bad157b4cfdc7cadddfb (lockd: per-net NSM client creation and destruction helpers introduced) contains a nasty race on initialisation of the per-net NSM client because it doesn't check whether or not the client is set after grabbing the nsm_create_mutex. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2012-10-01lockd: create and use per-net NSM RPC clients on MON/UNMON requestsStanislav Kinsbursky
NSM RPC client can be required on NFSv3 umount, when child reaper is dying (and destroying it's mount namespace). It means, that current nsproxy is set to NULL already, but creation of RPC client requires UTS namespace for gaining hostname string. This patch creates reference-counted per-net NSM client on first monitor request and destroys it after last unmonitor request. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01lockd: use rpc client's cl_nodename for id encodingStanislav Kinsbursky
Taking hostname from uts namespace if not safe, because this cuold be performind during umount operation on child reaper death. And in this case current->nsproxy is NULL already. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01lockd: per-net NSM client creation and destruction helpers introducedStanislav Kinsbursky
NSM RPC client can be required on NFSv3 umount, when child reaper is dying (and destroying it's mount namespace). It means, that current nsproxy is set to NULL already, but creation of RPC client requires UTS namespace for gaining hostname string. This patch introduces reference counted NFS RPC clients creation and destruction helpers (similar to RPCBIND RPC clients). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15LockD: make NSM network namespace awareStanislav Kinsbursky
NLM host is network namespace aware now. So NSM have to take it into account. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31SUNRPC: constify the rpc_programTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-13module_param: make bool parameters really bool (drivers & misc)Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-12-16SUNRPC: New xdr_streams XDR decoder APIChuck Lever
Now that all client-side XDR decoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst *, __be32 *, RPC res *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each decoder function. This is a refactoring change. It should not cause different behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16SUNRPC: New xdr_streams XDR encoder APIChuck Lever
Now that all client-side XDR encoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst *, __be32 *, RPC arg *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each encoder function. Also, all the client-side encoder functions return 0 now, making a return value superfluous. Take this opportunity to convert them to return void instead. This is a refactoring change. It should not cause different behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16NSM: Avoid return code checking in NSM XDR encoder functionsChuck Lever
Clean up. The trend in the other XDR encoder functions is to BUG() when encoding problems occur, since a problem here is always due to a local coding error. Then, instead of a status, zero is unconditionally returned. Update the NSM XDR encoders to behave this way. To finish the update, use the new-style be32_to_cpup() and cpu_to_be32() macros, and compute the buffer sizes using raw integers instead of sizeof(). This matches the conventions used in other XDR functions Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-01sunrpc: Add net to rpc_create_argsPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-02-08lockd: don't clear sm_monitored on nsm_reboot_lookupJeff Layton
When lockd gets a notify downcall from statd, it'll search its hosts cache and then clear the sm_monitored bit on the host it finds. The idea is apparently to make lockd redo a SM_MON on the next lock request. This is unnecessary and causes the kernel's NSM cache to go out of sync with statd. statd doesn't stop monitoring a host when it gets a SM_NOTIFY and there's no guarantee that another lock will occur after the reclaim and before the unmount. In that event, no SM_UNMON will occur. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-21sunrpc: add routine for comparing addressesJeff Layton
lockd needs these sort of routines, as does the NFSv4 callback code. Move lockd's routines into common code and rename them so that they can be used by others. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-09lockd: Replace nsm_display_address() with rpc_ntop()Chuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17lockd: Don't bother with RPC ping for NSM upcallsChuck Lever
Cut NSM upcall RPC traffic in half -- don't do a NULL call first. The cases where a ping would be helpful are rare. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17lockd: Update NSM state from SM_MON repliesChuck Lever
When rpc.statd starts up in user space at boot time, it attempts to write the latest NSM local state number into /proc/sys/fs/nfs/nsm_local_state. If lockd.ko isn't loaded yet (as is the case in most configurations), that file doesn't exist, thus the kernel's NSM state remains set to its initial value of zero during lockd operation. This is a problem because rpc.statd and lockd use the NSM state number to prevent repeated lock recovery on rebooted hosts. If lockd sends a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state number is received, there is no way for lockd or rpc.statd to distinguish that stale SM_NOTIFY from an actual reboot. Thus lock recovery could be performed after the rebooted host has already started reclaiming locks, and those locks will be lost. We could change /etc/init.d/nfslock so it always modprobes lockd.ko before starting rpc.statd. However, if lockd.ko is ever unloaded and reloaded, we are back at square one, since the NSM state is not preserved across an unload/reload cycle. This may happen frequently on clients that use automounter. A period of NFS inactivity causes lockd.ko to be unloaded, and the kernel loses its NSM state setting. Instead, let's use the fact that rpc.statd plants the local system's NSM state in every SM_MON (and SM_UNMON) reply. lockd performs a synchronous SM_MON upcall to the local rpc.statd _before_ sending its first NLM request to a new remote. This would permit rpc.statd to provide the current NSM state to lockd, even after lockd.ko had been unloaded and reloaded. Note that NLMPROC_LOCK arguments are constructed before the nsm_monitor() call, so we have to rearrange argument construction very slightly to make this all work out. And, the kernel appears to treat NSM state as a u32 (see struct nlm_args and nsm_res). Make nsm_local_state a u32 as well, to ensure we don't get bogus comparison results. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-04-01NSM: Fix unaligned accesses in nsm_init_private()Mans Rullgard
This fixes unaligned accesses in nsm_init_private() when creating nlm_reboot keys. Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-01-06NSM: Move nsm_create()Chuck Lever
Clean up: one last thing... relocate nsm_create() to eliminate the forward declaration and group it near the only function that actually uses it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Move nsm_use_hostnames to mon.cChuck Lever
Clean up. Treat the nsm_use_hostnames global variable like nsm_local_state. Note that the default value of nsm_use_hostnames is still zero. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Move nsm_addr() to fs/lockd/mon.cChuck Lever
Clean up: nsm_addr_in() is no longer used, and nsm_addr() is used only in fs/lockd/mon.c, so move it there. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Remove include/linux/lockd/sm_inter.hChuck Lever
Clean up: The include/linux/lockd/sm_inter.h header is nearly empty now. Remove it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Replace IP address as our nlm_reboot lookup keyChuck Lever
NLM provides file locking services for NFS files. Part of this service includes a second protocol, known as NSM, which is a reboot notification service. NLM uses this service to determine when to reclaim locks or enter a grace period after a client or server reboots. The NLM service (implemented by lockd in the Linux kernel) contacts the local NSM service (implemented by rpc.statd in Linux user space) via NSM protocol upcalls to register a callback when a particular remote peer reboots. To match the callback to the correct remote peer, the NLM service constructs a cookie that it passes in the request. The NSM service passes that cookie back to the NLM service when it is notified that the given remote peer has indeed rebooted. Currently on Linux, the cookie is the raw 32-bit IPv4 address of the remote peer. To support IPv6 addresses, which are larger, we could use all 16 bytes of the cookie to represent a full IPv6 address, although we still can't represent an IPv6 address with a scope ID in just 16 bytes. Instead, to avoid the need for future changes to support additional address types, we'll use a manufactured value for the cookie, and use that to find the corresponding nsm_handle struct in the kernel during the NLMPROC_SM_NOTIFY callback. This should provide complete support in the kernel's NSM implementation for IPv6 hosts, while remaining backwards compatible with older rpc.statd implementations. Note we also deal with another case where nsm_use_hostnames can change while there are outstanding notifications, possibly resulting in the loss of reboot notifications. After this patch, the priv cookie is always used to lookup rebooted hosts in the kernel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: More clean up of nsm_get_handle()Chuck Lever
Clean up: refactor nsm_get_handle() so it is organized the same way that nsm_reboot_lookup() is. There is an additional micro-optimization here. This change moves the "hostname & nsm_use_hostnames" test out of the list_for_each_entry() clause in nsm_get_handle(), since it is loop-invariant. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Refactor nsm_handle creation into a helper functionChuck Lever
Clean up. Refactor the creation of nsm_handles into a helper. Fields are initialized in increasing address order to make efficient use of CPU caches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NLM: Remove "create" argument from nsm_find()Chuck Lever
Clean up: nsm_find() now has only one caller, and that caller unconditionally sets the @create argument. Thus the @create argument is no longer needed. Since nsm_find() now has a more specific purpose, pick a more appropriate name for it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Add nsm_lookup() functionChuck Lever
Introduce a new API to fs/lockd/mon.c that allows nlm_host_rebooted() to lookup up nsm_handles via the contents of an nlm_reboot struct. The new function is equivalent to calling nsm_find() with @create set to zero, but it takes a struct nlm_reboot instead of separate arguments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Encode the new "priv" cookie for NSMPROC_MON requestsChuck Lever
Pass the new "priv" cookie to NSMPROC_MON's XDR encoder, instead of creating the "priv" argument in the encoder at call time. This patch should not cause a behavioral change: the contents of the cookie remain the same for the time being. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Generate NSMPROC_MON's "priv" argument when nsm_handle is createdChuck Lever
Introduce a new data type, used by both the in-kernel NLM and NSM implementations, that is used to manage the opaque "priv" argument for the NSMPROC_MON and NLMPROC_SM_NOTIFY calls. Construct the "priv" cookie when the nsm_handle is created. The nsm_init_private() function may look a little strange, but it is roughly equivalent to how the XDR encoder formed the "priv" argument. It's going to go away soon. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Remove !nsm check from nsm_release()Chuck Lever
The nsm_release() function should never be called with a NULL handle point. If it is, that's a bug. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Remove NULL pointer check from nsm_find()Chuck Lever
The nsm_find() function should never be called with a NULL IP address pointer. If it is, that's a bug. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Add dprintk() calls in nsm_find and nsm_releaseChuck Lever
Introduce some dprintk() calls in fs/lockd/mon.c that are enabled by the NLMDBG_MONITOR flag. These report when we find, create, and release nsm_handles. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06NSM: Move nsm_find() to fs/lockd/mon.cChuck Lever
The nsm_find() function sets up fresh nsm_handle entries. This is where we will store the "priv" cookie used to lookup nsm_handles during reboot recovery. The cookie will be constructed when nsm_find() creates a new nsm_handle. As much as possible, I would like to keep everything that handles a "priv" cookie in fs/lockd/mon.c so that all the smarts are in one source file. That organization should make it pretty simple to see how all this works. To me, it makes more sense than the current arrangement to keep nsm_find() with nsm_monitor() and nsm_unmonitor(). So, start reorganizing by moving nsm_find() into fs/lockd/mon.c. The nsm_release() function comes along too, since it shares the nsm_lock global variable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>