summaryrefslogtreecommitdiff
path: root/net/sunrpc
AgeCommit message (Collapse)Author
2019-02-20SUNRPC: Use poll() to fix up the socket requeue racesTrond Myklebust
Because we clear XPRT_SOCK_DATA_READY before reading, we can end up with a situation where new data arrives, causing xs_data_ready() to queue up a second receive worker job for the same socket, which then immediately gets stuck waiting on the transport receive mutex. The fix is to only clear XPRT_SOCK_DATA_READY once we're done reading, and then to use poll() to check if we might need to queue up a new job in order to deal with any new data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobsTrond Myklebust
Set memalloc_nofs_save() on all the rpciod/xprtiod jobs so that we ensure memory allocations for asynchronous rpc calls don't ever end up recursing back to the NFS layer for memory reclaim. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-16Merge tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull more nfsd fixes from Bruce Fields: "Two small fixes, one for crashes using nfs/krb5 with older enctypes, one that could prevent clients from reclaiming state after a kernel upgrade" * tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linux: sunrpc: fix 4 more call sites that were using stack memory with a scatterlist Revert "nfsd4: return default lease period"
2019-02-16Merge tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull more NFS client fixes from Anna Schumaker: "Three fixes this time. Nicolas's is for xprtrdma completion vector allocation on single-core systems. Greg's adds an error check when allocating a debugfs dentry. And Ben's is an additional fix for nfs_page_async_flush() to prevent pages from accidentally getting truncated. Summary: - Make sure Send CQ is allocated on an existing compvec - Properly check debugfs dentry before using it - Don't use page_file_mapping() after removing a page" * tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Don't use page_file_mapping after removing the page rpc: properly check debugfs dentry before using it xprtrdma: Make sure Send CQ is allocated on an existing compvec
2019-02-15sunrpc: fix 4 more call sites that were using stack memory with a scatterlistScott Mayhew
While trying to reproduce a reported kernel panic on arm64, I discovered that AUTH_GSS basically doesn't work at all with older enctypes on arm64 systems with CONFIG_VMAP_STACK enabled. It turns out there still a few places using stack memory with scatterlists, causing krb5_encrypt() and krb5_decrypt() to produce incorrect results (or a BUG if CONFIG_DEBUG_SG is enabled). Tested with cthon on v4.0/v4.1/v4.2 with krb5/krb5i/krb5p using des3-cbc-sha1 and arcfour-hmac-md5. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-14SUNRPC: Use au_rslack when computing reply buffer sizeChuck Lever
au_rslack is significantly smaller than (au_cslack << 2). Using that value results in smaller receive buffers. In some cases this eliminates an extra segment in Reply chunks (RPC/RDMA). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14SUNRPC: Add rpc_auth::au_ralign fieldChuck Lever
Currently rpc_inline_rcv_pages() uses au_rslack to estimate the size of the upper layer reply header. This is fine for auth flavors where au_verfsize == au_rslack. However, some auth flavors have more going on. krb5i for example has two more words after the verifier, and another blob following the RPC message. The calculation involving au_rslack pushes the upper layer reply header too far into the rcv_buf. au_rslack is still valuable: it's the amount of buffer space needed for the reply, and is used when allocating the reply buffer. We'll keep that. But, add a new field that can be used to properly estimate the location of the upper layer header in each RPC reply, based on the auth flavor in use. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14SUNRPC: Make AUTH_SYS and AUTH_NULL set au_verfsizeChuck Lever
au_verfsize will be needed for a non-flavor-specific computation in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14NFS: Account for XDR pad of buf->pagesChuck Lever
Certain NFS results (eg. READLINK) might expect a data payload that is not an exact multiple of 4 bytes. In this case, XDR encoding is required to pad that payload so its length on the wire is a multiple of 4 bytes. The constants that define the maximum size of each NFS result do not appear to account for this extra word. In each case where the data payload is to be received into pages: - 1 word is added to the size of the receive buffer allocated by call_allocate - rpc_inline_rcv_pages subtracts 1 word from @hdrsize so that the extra buffer space falls into the rcv_buf's tail iovec - If buf->pagelen is word-aligned, an XDR pad is not needed and is thus removed from the tail Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14SUNRPC: Introduce rpc_prepare_reply_pages()Chuck Lever
prepare_reply_buffer() and its NFSv4 equivalents expose the details of the RPC header and the auth slack values to upper layer consumers, creating a layering violation, and duplicating code. Remedy these issues by adding a new RPC client API that hides those details from upper layers in a common helper function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14SUNRPC: Add SPDX IDs to some net/sunrpc/auth_gss/ filesChuck Lever
Files under net/sunrpc/auth_gss/ do not yet have SPDX ID tags. This directory is somewhat complicated because most of these files have license boilerplate that is not strictly GPL 2.0. In this patch I add ID tags where there is an obvious match. The less recognizable licenses are still under research. For reference, SPDX IDs added in this patch correspond to the following license text: GPL-2.0 https://spdx.org/licenses/GPL-2.0.html GPL-2.0+ https://spdx.org/licenses/GPL-2.0+.html BSD-3-Clause https://spdx.org/licenses/BSD-3-Clause.html Cc: Simo Sorce <simo@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14SUNRPC: Remove xdr_buf_trim()Chuck Lever
The key action of xdr_buf_trim() is that it shortens buf->len, the length of the xdr_buf's content. The other actions -- shortening the head, pages, and tail components -- are actually not necessary. In particular, changing the size of those components can corrupt the RPC message contained in the buffer. This is an accident waiting to happen rather than a current bug, as far as we know. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14SUNRPC: Introduce trace points in rpc_auth_gss.koChuck Lever
Add infrastructure for trace points in the RPC_AUTH_GSS kernel module, and add a few sample trace points. These report exceptional or unexpected events, and observe the assignment of GSS sequence numbers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14SUNRPC: Use struct xdr_stream when decoding RPC Reply headerChuck Lever
Modernize and harden the code path that parses an RPC Reply message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Clean up rpc_verify_header()Chuck Lever
- Recover some instruction count because I'm about to introduce a few xdr_inline_decode call sites - Replace dprintk() call sites with trace points - Reduce the hot path so it fits in fewer cachelines I've also renamed it rpc_decode_header() to match everything else in the RPC client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Use struct xdr_stream when constructing RPC Call headerChuck Lever
Modernize and harden the code path that constructs each RPC Call message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Add build option to disable support for insecure enctypesChuck Lever
Enable distributions to enforce the rejection of ancient and insecure Kerberos enctypes in the kernel's RPCSEC_GSS implementation. These are the single-DES encryption types that were deprecated in 2012 by RFC 6649. Enctypes that were deprecated more recently (by RFC 8429) remain fully supported for now because they are still likely to be widely used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Simo Sorce <simo@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Remove rpc_xprt::tsh_sizeChuck Lever
tsh_size was added to accommodate transports that send a pre-amble before each RPC message. However, this assumes the pre-amble is fixed in size, which isn't true for some transports. That makes tsh_size not very generic. Also I'd like to make the estimation of RPC send and receive buffer sizes more precise. tsh_size doesn't currently appear to be accounted for at all by call_allocate. Therefore let's just remove the tsh_size concept, and make the only transports that have a non-zero tsh_size employ a direct approach. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Remove some dprintk() call sites from auth functionsChuck Lever
Clean up: Reduce dprintk noise by removing dprintk() call sites from hot path that do not report exceptions. These are usually replaceable with function graph tracing. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Add trace event that reports reply page vector alignmentChuck Lever
We don't want READ payloads that are partially in the head iovec and in the page buffer because this requires pull-up, which can be expensive. The NFS/RPC client tries hard to predict the size of the head iovec so that the incoming READ data payload lands only in the page vector, but it doesn't always get it right. To help diagnose such problems, add a trace point in the logic that decodes READ-like operations that reports whether pull-up is being done. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Add XDR overflow trace eventChuck Lever
This can help field troubleshooting without needing the overhead of a full network capture (ie, tcpdump). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13SUNRPC: Add xdr_stream::rqst fieldChuck Lever
Having access to the controlling rpc_rqst means a trace point in the XDR code can report: - the XID - the task ID and client ID - the p_name of RPC being processed Subsequent patches will introduce such trace points. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13xprtrdma: Reduce the doorbell rate (Receive)Chuck Lever
Post RECV WRs in batches to reduce the hardware doorbell rate per transport. This helps the RPC-over-RDMA client scale better in number of transports. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13xprtrdma: Check inline size before providing a Write chunkChuck Lever
In very rare cases, an NFS READ operation might predict that the non-payload part of the RPC Call is large. For instance, an NFSv4 COMPOUND with a large GETATTR result, in combination with a large Kerberos credential, could push the non-payload part to be several kilobytes. If the non-payload part is larger than the connection's inline threshold, the client is required to provision a Reply chunk. The current Linux client does not check for this case. There are two obvious ways to handle it: a. Provision a Write chunk for the payload and a Reply chunk for the non-payload part b. Provision a Reply chunk for the whole RPC Reply Some testing at a recent NFS bake-a-thon showed that servers can mostly handle a. but there are some corner cases that do not work yet. b. already works (it has to, to handle krb5i/p), but could be somewhat less efficient. However, I expect this scenario to be very rare -- no-one has reported a problem yet. So I'm going to implement b. Sometime later I will provide some patches to help make b. a little more efficient by more carefully choosing the Reply chunk's segment sizes to ensure the payload is optimally aligned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13xprtrdma: Fix sparse warningsChuck Lever
linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: warning: incorrect type in argument 5 (different base types) linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: expected unsigned int [usertype] xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: got restricted __be32 [usertype] rq_xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: warning: incorrect type in argument 5 (different base types) linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: expected unsigned int [usertype] xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: got restricted __be32 [usertype] rq_xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: warning: incorrect type in argument 5 (different base types) linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: expected unsigned int [usertype] xid linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: got restricted __be32 [usertype] rq_xid Fixes: 0a93fbcb16e6 ("xprtrdma: Plant XID in on-the-wire RDMA ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-12rpc: properly check debugfs dentry before using itGreg Kroah-Hartman
debugfs can now report an error code if something went wrong instead of just NULL. So if the return value is to be used as a "real" dentry, it needs to be checked if it is an error before dereferencing it. This is now happening because of ff9fb72bc077 ("debugfs: return error values, not NULL"), but why debugfs files are not being created properly is an older issue, probably one that has always been there and should probably be looked at... Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: linux-nfs@vger.kernel.org Cc: netdev@vger.kernel.org Reported-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-12xprtrdma: Make sure Send CQ is allocated on an existing compvecNicolas Morey-Chaisemartin
Make sure the device has at least 2 completion vectors before allocating to compvec#1 Fixes: a4699f5647f3 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode) Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-06svcrdma: Remove syslog warnings in work completion handlersChuck Lever
These can result in a lot of log noise, and are able to be triggered by client misbehavior. Since there are trace points in these handlers now, there's no need to spam the log. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrdma: Squelch compiler warning when SUNRPC_DEBUG is disabledChuck Lever
CC [M] net/sunrpc/xprtrdma/svc_rdma_transport.o linux/net/sunrpc/xprtrdma/svc_rdma_transport.c: In function ‘svc_rdma_accept’: linux/net/sunrpc/xprtrdma/svc_rdma_transport.c:452:19: warning: variable ‘sap’ set but not used [-Wunused-but-set-variable] struct sockaddr *sap; ^ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrdma: Use struct_size() in kmalloc()Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kmalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrpc: fix unlikely races preventing queueing of socketsJ. Bruce Fields
In the rpc server, When something happens that might be reason to wake up a thread to do something, what we do is - modify xpt_flags, sk_sock->flags, xpt_reserved, or xpt_nr_rqsts to indicate the new situation - call svc_xprt_enqueue() to decide whether to wake up a thread. svc_xprt_enqueue may require multiple conditions to be true before queueing up a thread to handle the xprt. In the SMP case, one of the other CPU's may have set another required condition, and in that case, although both CPUs run svc_xprt_enqueue(), it's possible that neither call sees the writes done by the other CPU in time, and neither one recognizes that all the required conditions have been set. A socket could therefore be ignored indefinitely. Add memory barries to ensure that any svc_xprt_enqueue() call will always see the conditions changed by other CPUs before deciding to ignore a socket. I've never seen this race reported. In the unlikely event it happens, another event will usually come along and the problem will fix itself. So I don't think this is worth backporting to stable. Chuck tried this patch and said "I don't see any performance regressions, but my server has only a single last-level CPU cache." Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrpc: svc_xprt_has_something_to_do seems a little longJ. Bruce Fields
The long name seemed cute till I wanted to refer to it somewhere else. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()Trond Myklebust
Use READ_ONCE() to tell the compiler to not optimse away the read of xprt->xpt_flags in svc_xprt_release_slot(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrdma: Remove max_sge check at connect timeChuck Lever
Two and a half years ago, the client was changed to use gathered Send for larger inline messages, in commit 655fec6987b ("xprtrdma: Use gathered Send for large inline messages"). Several fixes were required because there are a few in-kernel device drivers whose max_sge is 3, and these were broken by the change. Apparently my memory is going, because some time later, I submitted commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma: Reduce max_send_sges"). These too incorrectly assumed in-kernel device drivers would have more than a few Send SGEs available. The fix for the server side is not the same. This is because the fundamental problem on the server is that, whether or not the client has provisioned a chunk for the RPC reply, the server must squeeze even the most complex RPC replies into a single RDMA Send. Failing in the send path because of Send SGE exhaustion should never be an option. Therefore, instead of failing when the send path runs out of SGEs, switch to using a bounce buffer mechanism to handle RPC replies that are too complex for the device to send directly. That allows us to remove the max_sge check to enable drivers with small max_sge to work again. Reported-by: Don Dutile <ddutile@redhat.com> Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...") Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-01-15SUNRPC: Address Kerberos performance/behavior regressionChuck Lever
When using Kerberos with v4.20, I've observed frequent connection loss on heavy workloads. I traced it down to the client underrunning the GSS sequence number window -- NFS servers are required to drop the RPC with the low sequence number, and also drop the connection to signal that an RPC was dropped. Bisected to commit 918f3c1fe83c ("SUNRPC: Improve latency for interactive tasks"). I've got a one-line workaround for this issue, which is easy to backport to v4.20 while a more permanent solution is being derived. Essentially, tk_owner-based sorting is disabled for RPCs that carry a GSS sequence number. Fixes: 918f3c1fe83c ("SUNRPC: Improve latency for interactive ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-15SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limitTrond Myklebust
According to RFC2203, the RPCSEC_GSS sequence numbers are bounded to an upper limit of MAXSEQ = 0x80000000. Ensure that we handle that correctly. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-15SUNRPC: Ensure rq_bytes_sent is reset before request transmissionTrond Myklebust
When we resend a request, ensure that the 'rq_bytes_sent' is reset to zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-09sunrpc: kernel BUG at kernel/cred.c:825!Santosh kumar pradhan
Init missing debug member magic with CRED_MAGIC. Signed-off-by: Santosh kumar pradhan <santoshkumar.pradhan@wdc.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-08SUNRPC: Fix TCP receive code on archs with flush_dcache_page()Trond Myklebust
After receiving data into the page cache, we need to call flush_dcache_page() for the architectures that define it. Fixes: 277e4ab7d530b ("SUNRPC: Simplify TCP receive code by switching...") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20 Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-08xprtrdma: Double free in rpcrdma_sendctxs_create()Dan Carpenter
The clean up is handled by the caller, rpcrdma_buffer_create(), so this call to rpcrdma_sendctxs_destroy() leads to a double free. Fixes: ae72950abf99 ("xprtrdma: Add data structure to manage RDMA Send arguments") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-08xprtrdma: Fix error code in rpcrdma_buffer_create()Dan Carpenter
This should return -ENOMEM if __alloc_workqueue_key() fails, but it returns success. Fixes: 6d2d0ee27c7a ("xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueue") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Several fixes here. Basically split down the line between newly introduced regressions and long existing problems: 1) Double free in tipc_enable_bearer(), from Cong Wang. 2) Many fixes to nf_conncount, from Florian Westphal. 3) op->get_regs_len() can throw an error, check it, from Yunsheng Lin. 4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman driver, from Scott Wood. 5) Inifnite loop in fib_empty_table(), from Yue Haibing. 6) Use after free in ax25_fillin_cb(), from Cong Wang. 7) Fix socket locking in nr_find_socket(), also from Cong Wang. 8) Fix WoL wakeup enable in r8169, from Heiner Kallweit. 9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani. 10) Fix ptr_ring wrap during queue swap, from Cong Wang. 11) Missing shutdown callback in hinic driver, from Xue Chaojing. 12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano Brivio. 13) BPF out of bounds speculation fixes from Daniel Borkmann" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits) ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: Fix dump of specific table with strict checking bpf: add various test cases to selftests bpf: prevent out of bounds speculation on pointer arithmetic bpf: fix check_map_access smin_value test when pointer contains offset bpf: restrict unknown scalars of mixed signed bounds for unprivileged bpf: restrict stack pointer arithmetic for unprivileged bpf: restrict map value pointer arithmetic for unprivileged bpf: enable access to ax register also from verifier rewrite bpf: move tmp variable into ax register in interpreter bpf: move {prev_,}insn_idx into verifier env isdn: fix kernel-infoleak in capi_unlocked_ioctl ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error net/hamradio/6pack: use mod_timer() to rearm timers net-next/hinic:add shutdown callback net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT ip: validate header length on virtual device xmit tap: call skb_probe_transport_header after setting skb->dev ptr_ring: wrap back ->producer in __ptr_ring_swap_queue() net: rds: remove unnecessary NULL check ...
2019-01-02Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Yet another double DMA-unmap # v4.20 Features: - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG - Per-xprt rdma receive workqueues - Drop support for FMR memory registration - Make port= mount option optional for RDMA mounts Other bugfixes and cleanups: - Remove unused nfs4_xdev_fs_type declaration - Fix comments for behavior that has changed - Remove generic RPC credentials by switching to 'struct cred' - Fix crossing mountpoints with different auth flavors - Various xprtrdma fixes from testing and auditing the close code - Fixes for disconnect issues when using xprtrdma with krb5 - Clean up and improve xprtrdma trace points - Fix NFS v4.2 async copy reboot recovery" * tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) sunrpc: convert to DEFINE_SHOW_ATTRIBUTE sunrpc: Add xprt after nfs4_test_session_trunk() sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS sunrpc: handle ENOMEM in rpcb_getport_async NFS: remove unnecessary test for IS_ERR(cred) xprtrdma: Prevent leak of rpcrdma_rep objects NFSv4.2 fix async copy reboot recovery xprtrdma: Don't leak freed MRs xprtrdma: Add documenting comment for rpcrdma_buffer_destroy xprtrdma: Replace outdated comment for rpcrdma_ep_post xprtrdma: Update comments in frwr_op_send SUNRPC: Fix some kernel doc complaints SUNRPC: Simplify defining common RPC trace events NFS: Fix NFSv4 symbolic trace point output xprtrdma: Trace mapping, alloc, and dereg failures xprtrdma: Add trace points for calls to transport switch methods xprtrdma: Relocate the xprtrdma_mr_map trace points xprtrdma: Clean up of xprtrdma chunk trace points xprtrdma: Remove unused fields from rpcrdma_ia xprtrdma: Cull dprintk() call sites ...
2019-01-02Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Thanks to Vasily Averin for fixing a use-after-free in the containerized NFSv4.2 client, and cleaning up some convoluted backchannel server code in the process. Otherwise, miscellaneous smaller bugfixes and cleanup" * tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits) nfs: fixed broken compilation in nfs_callback_up_net() nfs: minor typo in nfs4_callback_up_net() sunrpc: fix debug message in svc_create_xprt() sunrpc: make visible processing error in bc_svc_process() sunrpc: remove unused xpo_prep_reply_hdr callback sunrpc: remove svc_rdma_bc_class sunrpc: remove svc_tcp_bc_class sunrpc: remove unused bc_up operation from rpc_xprt_ops sunrpc: replace svc_serv->sv_bc_xprt by boolean flag sunrpc: use-after-free in svc_process_common() sunrpc: use SVC_NET() in svcauth_gss_* functions nfsd: drop useless LIST_HEAD lockd: Show pid of lockd for remote locks NFSD remove OP_CACHEME from 4.2 op_flags nfsd: Return EPERM, not EACCES, in some SETATTR cases sunrpc: fix cache_head leak due to queued request nfsd: clean up indentation, increase indentation in switch statement svcrdma: Optimize the logic that selects the R_key to invalidate nfsd: fix a warning in __cld_pipe_upcall() nfsd4: fix crash on writing v4_end_grace before nfsd startup ...
2019-01-02sunrpc: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: Add xprt after nfs4_test_session_trunk()Santosh kumar pradhan
Multipathing: In case of NFSv3, rpc_clnt_test_and_add_xprt() adds the xprt to xprt switch (i.e. xps) if rpc_call_null_helper() returns success. But in case of NFSv4.1, it needs to do EXCHANGEID to verify the path along with check for session trunking. Add the xprt in nfs4_test_session_trunk() only when nfs4_detect_session_trunking() returns success. Also release refcount hold by rpc_clnt_setup_test_and_add_xprt(). Signed-off-by: Santosh kumar pradhan <santoshkumar.pradhan@wdc.com> Tested-by: Suresh Jayaraman <suresh.jayaraman@wdc.com> Reported-by: Aditya Agnihotri <aditya.agnihotri@wdc.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFSJ. Bruce Fields
It's OK to sleep here, we just don't want to recurse into the filesystem as a writeout could be waiting on this. Future work: the documentation for GFP_NOFS says "Please try to avoid using this flag directly and instead use memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't recurse into the FS layer with a short explanation why. All allocation requests will inherit GFP_NOFS implicitly." But I'm not sure where to do this. Should the workqueue be arranging that for us in the case of workqueues created with WQ_MEM_RECLAIM? Reported-by: Trond Myklebust <trondmy@hammer.space> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: handle ENOMEM in rpcb_getport_asyncJ. Bruce Fields
If we ignore the error we'll hit a null dereference a little later. Reported-by: syzbot+4b98281f2401ab849f4b@syzkaller.appspotmail.com Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Prevent leak of rpcrdma_rep objectsChuck Lever
If a reply has been processed but the RPC is later retransmitted anyway, the req->rl_reply field still contains the only pointer to the old rpcrdma rep. When the next reply comes in, the reply handler will stomp on the rl_reply field, leaking the old rep. A trace event is added to capture such leaks. This problem seems to be worsened by the restructuring of the RPC Call path in v4.20. Fully addressing this issue will require at least a re-architecture of the disconnect logic, which is not appropriate during -rc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>