summaryrefslogtreecommitdiff
path: root/lib/iov_iter.c
AgeCommit message (Collapse)Author
2020-10-06x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()Dan Williams
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
2020-10-03iov_iter: transparently handle compat iovecs in import_iovecChristoph Hellwig
Use in compat_syscall to import either native or the compat iovecs, and remove the now superflous compat_import_iovec. This removes the need for special compat logic in most callers, and the remaining ones can still be simplified by using __import_iovec with a bool compat parameter. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-03iov_iter: refactor rw_copy_check_uvector and import_iovecChristoph Hellwig
Split rw_copy_check_uvector into two new helpers with more sensible calling conventions: - iovec_from_user copies a iovec from userspace either into the provided stack buffer if it fits, or allocates a new buffer for it. Returns the actually used iovec. It also verifies that iov_len does fit a signed type, and handles compat iovecs if the compat flag is set. - __import_iovec consolidates the native and compat versions of import_iovec. It calls iovec_from_user, then validates each iovec actually points to user addresses, and ensures the total length doesn't overflow. This has two major implications: - the access_process_vm case loses the total lenght checking, which wasn't required anyway, given that each call receives two iovecs for the local and remote side of the operation, and it verifies the total length on the local side already. - instead of a single loop there now are two loops over the iovecs. Given that the iovecs are cache hot this doesn't make a major difference Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-25iov_iter: move rw_copy_check_uvector() into lib/iov_iter.cDavid Laight
This lets the compiler inline it into import_iovec() generating much better code. Signed-off-by: David Laight <david.laight@aculab.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-20saner calling conventions for csum_and_copy_..._user()Al Viro
All callers of these primitives will * discard anything we might've copied in case of error * ignore the csum value in case of error * always pass 0xffffffff as the initial sum, so the resulting csum value (in case of success, that is) will never be 0. That suggest the following calling conventions: * don't pass err_ptr - just return 0 on error. * don't bother with zeroing destination, etc. in case of error * don't pass the initial sum - just use 0xffffffff. This commit does the minimal conversion in the instances of csum_and_copy_...(); the changes of actual asm code behind them are done later in the series. Note that this asm code is often shared with csum_partial_copy_nocheck(); the difference is that csum_partial_copy_nocheck() passes 0 for initial sum while csum_and_copy_..._user() pass 0xffffffff. Fortunately, we are free to pass 0xffffffff in all cases and subsequent patches will use that freedom without any special comments. A part that could be split off: parisc and uml/i386 claimed to have csum_and_copy_to_user() instances of their own, but those were identical to the generic one, so we simply drop them. Not sure if it's worth a separate commit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-20csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sumAl Viro
Preparation for the change of calling conventions; right now all callers pass 0 as initial sum. Passing 0xffffffff instead yields the values comparable mod 0xffff and guarantees that 0 will not be returned on success. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-20csum_partial_copy_nocheck(): drop the last argumentAl Viro
It's always 0. Note that we theoretically could use ~0U as well - result will be the same modulo 0xffff, _if_ the damn thing did the right thing for any value of initial sum; later we'll make use of that when convenient. However, unlike csum_and_copy_..._user(), there are instances that did not work for arbitrary initial sums; c6x is one such. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-30iov_iter: Move unnecessary inclusion of crypto/hash.hHerbert Xu
The header file linux/uio.h includes crypto/hash.h which pulls in most of the Crypto API. Since linux/uio.h is used throughout the kernel this means that every tiny bit of change to the Crypto API causes the entire kernel to get rebuilt. This patch fixes this by moving it into lib/iov_iter.c instead where it is actually used. This patch also fixes the ifdef to use CRYPTO_HASH instead of just CRYPTO which does not guarantee the existence of ahash. Unfortunately a number of drivers were relying on linux/uio.h to provide access to linux/slab.h. This patch adds inclusions of linux/slab.h as detected by build failures. Also skbuff.h was relying on this to provide a declaration for ahash_request. This patch adds a forward declaration instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-21iov_iter: Use generic instrumented.hMarco Elver
This replaces the kasan instrumentation with generic instrumentation, implicitly adding KCSAN instrumentation support. For KASAN no functional change is intended. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-16pipe: Fix bogus dereference in iov_iter_alignment()Jan Kara
We cannot look at 'i->pipe' unless we know the iter is a pipe. Move the ring_size load to a branch in iov_iter_alignment() where we've already checked the iter is a pipe to avoid bogus dereference. Reported-by: syzbot+bea68382bae9490e7dd6@syzkaller.appspotmail.com Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-15pipe: Allow pipes to have kernel-reserved slotsDavid Howells
Split pipe->ring_size into two numbers: (1) pipe->ring_size - indicates the hard size of the pipe ring. (2) pipe->max_usage - indicates the maximum number of pipe ring slots that userspace orchestrated events can fill. This allows for a pipe that is both writable by the general kernel notification facility and by userspace, allowing plenty of ring space for notifications to be added whilst preventing userspace from being able to pin too much unswappable kernel space. Signed-off-by: David Howells <dhowells@redhat.com>
2019-10-31pipe: Use head and tail pointers for the ring, not cursor and lengthDavid Howells
Convert pipes to use head and tail pointers for the buffer ring rather than pointer and length as the latter requires two atomic ops to update (or a combined op) whereas the former only requires one. (1) The head pointer is the point at which production occurs and points to the slot in which the next buffer will be placed. This is equivalent to pipe->curbuf + pipe->nrbufs. The head pointer belongs to the write-side. (2) The tail pointer is the point at which consumption occurs. It points to the next slot to be consumed. This is equivalent to pipe->curbuf. The tail pointer belongs to the read-side. (3) head and tail are allowed to run to UINT_MAX and wrap naturally. They are only masked off when the array is being accessed, e.g.: pipe->bufs[head & mask] This means that it is not necessary to have a dead slot in the ring as head == tail isn't ambiguous. (4) The ring is empty if "head == tail". A helper, pipe_empty(), is provided for this. (5) The occupancy of the ring is "head - tail". A helper, pipe_occupancy(), is provided for this. (6) The number of free slots in the ring is "pipe->ring_size - occupancy". A helper, pipe_space_for_user() is provided to indicate how many slots userspace may use. (7) The ring is full if "head - tail >= pipe->ring_size". A helper, pipe_full(), is provided for this. Signed-off-by: David Howells <dhowells@redhat.com>
2019-10-23compat_ioctl: reimplement SG_IO handlingArnd Bergmann
There are two code locations that implement the SG_IO ioctl: the old sg.c driver, and the generic scsi_ioctl helper that is in turn used by multiple drivers. To eradicate the old compat_ioctl conversion handler for the SG_IO command, I implement a readable pair of put_sg_io_hdr() /get_sg_io_hdr() helper functions that can be used for both compat and native mode, and then I call this from both drivers. For the iovec handling, there is already a compat_import_iovec() function that can simply be called in place of import_iovec(). To avoid having to pass the compat/native state through multiple indirections, I mark the SG_IO command itself as compatible in fs/compat_ioctl.c and use in_compat_syscall() to figure out where we are called from. As a side-effect of this, the sg.c driver now also accepts the 32-bit sg_io_hdr format in compat mode using the read/write interface, not just ioctl. This should improve compatiblity with old 32-bit binaries, but it would break if any application intentionally passes the 64-bit data structure in compat mode here. Steffen Maier helped debug an issue in an earlier version of this patch. Cc: Steffen Maier <maier@linux.ibm.com> Cc: linux-scsi@vger.kernel.org Cc: Doug Gilbert <dgilbert@interlog.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-09-24mm: introduce page_size()Matthew Wilcox (Oracle)
Patch series "Make working with compound pages easier", v2. These three patches add three helpers and convert the appropriate places to use them. This patch (of 3): It's unnecessarily hard to find out the size of a potentially huge page. Replace 'PAGE_SIZE << compound_order(page)' with page_size(page). Link: http://lkml.kernel.org/r/20190721104612.19120-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-31uio: make import_iovec()/compat_import_iovec() return bytes on successJens Axboe
Currently these functions return < 0 on error, and 0 for success. Change that so that we return < 0 on error, but number of bytes for success. Some callers already treat the return value that way, others need a slight tweak. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny
To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-03iov_iter: Fix build error without CONFIG_CRYPTOYueHaibing
If CONFIG_CRYPTO is not set or set to m, gcc building warn this: lib/iov_iter.o: In function `hash_and_copy_to_iter': iov_iter.c:(.text+0x9129): undefined reference to `crypto_stats_get' iov_iter.c:(.text+0x9152): undefined reference to `crypto_stats_ahash_update' Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: d05f443554b3 ("iov_iter: introduce hash_and_copy_to_iter helper") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-26iov_iter: optimize page_copy_sane()Eric Dumazet
Avoid cache line miss dereferencing struct page if we can. page_copy_sane() mostly deals with order-0 pages. Extra cache line miss is visible on TCP recvmsg() calls dealing with GRO packets (typically 45 page frags are attached to one skb). Bringing the 45 struct pages into cpu cache while copying the data is not free, since the freeing of the skb (and associated page frags put_page()) can happen after cache lines have been evicted. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull trivial vfs updates from Al Viro: "A few cleanups + Neil's namespace_unlock() optimization" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: exec: make prepare_bprm_creds static genheaders: %-<width>s had been there since v6; %-*s - since v7 VFS: use synchronize_rcu_expedited() in namespace_unlock() iov_iter: reduce code duplication
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-13iov_iter: introduce hash_and_copy_to_iter helperSagi Grimberg
Allow consumers that want to use iov iterator helpers and also update a predefined hash calculation online when copying data. This is useful when copying incoming network buffers to a local iterator and calculate a digest on the incoming stream. nvme-tcp host driver that will be introduced in following patches is the first consumer via skb_copy_and_hash_datagram_iter. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13iov_iter: pass void csum pointer to csum_and_copy_to_iterSagi Grimberg
The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram and we are trying to unite its logic with skb_copy_datagram_iter by passing a callback to the copy function that we want to apply. Thus, we need to make the checksum pointer private to the function. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-11-27iov_iter: reduce code duplicationAl Viro
The same combination of csum_partial_copy_nocheck() with csum_add_block() is used in a bunch of places. Add a helper doing just that and use it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-25iov_iter: teach csum_and_copy_to_iter() to handle pipe-backed onesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-10-24iov_iter: Add I/O discard iteratorDavid Howells
Add a new iterator, ITER_DISCARD, that can only be used in READ mode and just discards any data copied to it. This is useful in a network filesystem for discarding any unwanted data sent by a server. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24iov_iter: Separate type from direction and use accessor functionsDavid Howells
In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most places. Convert a bunch of places to use switch-statements to access them rather then chains of bitwise-AND statements. This makes it easier to add further iterator types. Also, this can be more efficient as to implement a switch of small contiguous integers, the compiler can use ~50% fewer compare instructions than it has to use bitwise-and instructions. Further, cease passing the iterator type into the iterator setup function. The iterator function can set that itself. Only the direction is required. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24iov_iter: Use accessor functionDavid Howells
Use accessor functions to access an iterator's type and direction. This allows for the possibility of using some other method of determining the type of iterator than if-chains with bitwise-AND conditions. Signed-off-by: David Howells <dhowells@redhat.com>
2018-07-16lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()Dan Williams
By mistake the ITER_PIPE early-exit / warning from copy_from_iter() was cargo-culted in _copy_to_iter_mcsafe() rather than a machine-check-safe version of copy_to_iter_pipe(). Implement copy_pipe_to_iter_mcsafe() being careful to return the indication of short copies due to a CPU exception. Without this regression-fix all splice reads to dax-mode files fail. Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()") Link: http://lkml.kernel.org/r/153108277278.37979.3327916996902264102.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-16lib/iov_iter: Document _copy_to_iter_flushcache()Dan Williams
Add some theory of operation documentation to _copy_to_iter_flushcache(). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/153108276767.37979.9462477994086841699.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-16lib/iov_iter: Document _copy_to_iter_mcsafe()Dan Williams
Add some theory of operation documentation to _copy_to_iter_mcsafe(). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/153108276256.37979.1689794213845539316.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-04Merge branch 'x86-dax-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 dax updates from Ingo Molnar: "This contains x86 memcpy_mcsafe() fault handling improvements the nvdimm tree would like to make more use of" * 'x86-dax-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe() x86/asm/memcpy_mcsafe: Add write-protection-fault handling x86/asm/memcpy_mcsafe: Return bytes remaining x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling x86/asm/memcpy_mcsafe: Remove loop unrolling
2018-05-15x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()Dan Williams
Use the updated memcpy_mcsafe() implementation to define copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant difference from typical copy_to_iter() is that the ITER_KVEC and ITER_BVEC iterator types can fail to complete a full transfer. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: hch@lst.de Cc: linux-fsdevel@vger.kernel.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-02iov_iter: fix memory leak in pipe_get_pages_alloc()Ilya Dryomov
Make n signed to avoid leaking the pages array if __pipe_get_pages() fails to allocate any pages. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-02iov_iter: fix return type of __pipe_get_pages()Ilya Dryomov
It returns -EFAULT and happens to be a helper for pipe_get_pages() whose return type is ssize_t. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-11new primitive: iov_iter_for_each_range()Al Viro
For kvec and bvec: feeds segments to given callback as long as it returns 0. For iovec and pipe: fails. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-20iov_iter: fix page_copy_sane for compound pagesPetar Penkov
Issue is that if the data crosses a page boundary inside a compound page, this check will incorrectly trigger a WARN_ON. To fix this, compute the order using the head of the compound page and adjust the offset to be relative to that head. Fixes: 72e809ed81ed ("iov_iter: sanity checks for copy to/from page primitives") Signed-off-by: Petar Penkov <ppenkov@google.com> CC: Al Viro <viro@zeniv.linux.org.uk> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-07Merge branch 'uaccess-work.iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter hardening from Al Viro: "This is the iov_iter/uaccess/hardening pile. For one thing, it trims the inline part of copy_to_user/copy_from_user to the minimum that *does* need to be inlined - object size checks, basically. For another, it sanitizes the checks for iov_iter primitives. There are 4 groups of checks: access_ok(), might_fault(), object size and KASAN. - access_ok() had been verified by whoever had set the iov_iter up. However, that has happened in a function far away, so proving that there's no path to actual copying bypassing those checks is hard and proving that iov_iter has not been buggered in the meanwhile is also not pleasant. So we want those redone in actual copyin/copyout. - might_fault() is better off consolidated - we know whether it needs to be checked as soon as we enter iov_iter primitive and observe the iov_iter flavour. No need to wait until the copyin/copyout. The call chains are short enough to make sure we won't miss anything - in fact, it's more robust that way, since there are cases where we do e.g. forced fault-in before getting to copyin/copyout. It's not quite what we need to check (in particular, combination of iovec-backed and set_fs(KERNEL_DS) is almost certainly a bug, not a cause to skip checks), but that's for later series. For now let's keep might_fault(). - KASAN checks belong in copyin/copyout - at the same level where other iov_iter flavours would've hit them in memcpy(). - object size checks should apply to *all* iov_iter flavours, not just iovec-backed ones. There are two groups of primitives - one gets the kernel object described as pointer + size (copy_to_iter(), etc.) while another gets it as page + offset + size (copy_page_to_iter(), etc.) For the first group the checks are best done where we actually have a chance to find the object size. In other words, those belong in inline wrappers in uio.h, before calling into iov_iter.c. Same kind as we have for inlined part of copy_to_user(). For the second group there is no object to look at - offset in page is just a number, it bears no type information. So we do them in the common helper called by iov_iter.c primitives of that kind. All it currently does is checking that we are not trying to access outside of the compound page; eventually we might want to add some sanity checks on the page involved. So the things we need in copyin/copyout part of iov_iter.c do not quite match anything in uaccess.h (we want no zeroing, we *do* want access_ok() and KASAN and we want no might_fault() or object size checks done on that level). OTOH, these needs are simple enough to provide a couple of helpers (static in iov_iter.c) doing just what we need..." * 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: iov_iter: saner checks on copyin/copyout iov_iter: sanity checks for copy to/from page primitives iov_iter/hardening: move object size checks to inlined part copy_{to,from}_user(): consolidate object size checks copy_{from,to}_user(): move kasan checks and might_fault() out-of-line
2017-07-07iov_iter: saner checks on copyin/copyoutAl Viro
* might_fault() is better checked in caller (and e.g. fault-in + kmap_atomic codepath also needs might_fault() coverage) * we have already done object size checks * we have *NOT* done access_ok() recently enough; we rely upon the iovec array having passed sanity checks back when it had been created and not nothing having buggered it since. However, that's very much non-local, so we'd better recheck that. So the thing we want does not match anything in uaccess - we need access_ok + kasan checks + raw copy without any zeroing. Just define such helpers and use them here. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-29iov_iter: sanity checks for copy to/from page primitivesAl Viro
for now - just that we don't attempt to cross out of compound page Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-29iov_iter/hardening: move object size checks to inlined partAl Viro
There we actually have useful information about object sizes. Note: this patch has them done for all iov_iter flavours. Right now we do them twice in iovec case, but that'll change very shortly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass ↵Dan Williams
operations The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence". Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and memcpy_flushcache, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache() and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol, and fallback to copy_from_iter_nocache() and plain memcpy() otherwise. This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2]. The first consumer of this interface is a new 'copy_from_iter' dax operation so that pmem can inject cache maintenance operations without imposing this overhead on other dax-capable drivers. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-09Merge branch 'work.iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fix from Al Viro: "Braino fix for iov_iter_revert() misuse" * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix braino in generic_file_read_iter()
2017-05-08treewide: use kv[mz]alloc* rather than opencoded variantsMichal Hocko
There are many code paths opencoding kvmalloc. Let's use the helper instead. The main difference to kvmalloc is that those users are usually not considering all the aspects of the memory allocator. E.g. allocation requests <= 32kB (with 4kB pages) are basically never failing and invoke OOM killer to satisfy the allocation. This sounds too disruptive for something that has a reasonable fallback - the vmalloc. On the other hand those requests might fallback to vmalloc even when the memory allocator would succeed after several more reclaim/compaction attempts previously. There is no guarantee something like that happens though. This patch converts many of those places to kv[mz]alloc* helpers because they are more conservative. Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390 Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim Acked-by: David Sterba <dsterba@suse.com> # btrfs Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4 Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5 Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Santosh Raspatur <santosh@chelsio.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Yishai Hadas <yishaih@mellanox.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: "Yan, Zheng" <zyan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08fix braino in generic_file_read_iter()Al Viro
Wrong sign of iov_iter_revert() argument. Unfortunately, slipped through the testing, since most of the time we don't do anything to the iterator afterwards and potential oops on walking the iter->iov too far backwards is too infrequent to be easily triggered. Add a sanity check in iov_iter_revert() to catch bugs like this one; fortunately, the same braino hadn't happened in other callers, but we'd better have a warning if such thing crops up. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-01Merge branch 'work.uaccess' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess unification updates from Al Viro: "This is the uaccess unification pile. It's _not_ the end of uaccess work, but the next batch of that will go into the next cycle. This one mostly takes copy_from_user() and friends out of arch/* and gets the zero-padding behaviour in sync for all architectures. Dealing with the nocache/writethrough mess is for the next cycle; fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am sold on access_ok() in there, BTW; just not in this pile), same for reducing __copy_... callsites, strn*... stuff, etc. - there will be a pile about as large as this one in the next merge window. This one sat in -next for weeks. -3KLoC" * 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits) HAVE_ARCH_HARDENED_USERCOPY is unconditional now CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now m32r: switch to RAW_COPY_USER hexagon: switch to RAW_COPY_USER microblaze: switch to RAW_COPY_USER get rid of padding, switch to RAW_COPY_USER ia64: get rid of copy_in_user() ia64: sanitize __access_ok() ia64: get rid of 'segment' argument of __do_{get,put}_user() ia64: get rid of 'segment' argument of __{get,put}_user_check() ia64: add extable.h powerpc: get rid of zeroing, switch to RAW_COPY_USER esas2r: don't open-code memdup_user() alpha: fix stack smashing in old_adjtimex(2) don't open-code kernel_setsockopt() mips: switch to RAW_COPY_USER mips: get rid of tail-zeroing in primitives mips: make copy_from_user() zero tail explicitly mips: clean and reorder the forest of macros... mips: consolidate __invoke_... wrappers ...
2017-04-29fix a braino in ITER_PIPE iov_iter_revert()Al Viro
Fixes: 27c0e3748e41 Tested-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-02[iov_iter] new privimitive: iov_iter_revert()Al Viro
opposite to iov_iter_advance(); the caller is responsible for never using it to move back past the initial position. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28kill __copy_from_user_nocache()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>