summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-08-05afs_try_auto_mntpt(): return NULL instead of ERR_PTR(-ENOENT)Al Viro
simpler logics in callers that way Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-05afs_lookup(): switch to d_splice_alias()Al Viro
->lookup() methods can (and should) use d_splice_alias() instead of d_add(). Even if they are not going to be hit by open_by_handle(), code does get copied around; besides, d_splice_alias() has better calling conventions for use in ->lookup(), so the code gets simpler. Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-05afs: switch dynroot lookups to d_splice_alias()Al Viro
->lookup() methods can (and should) use d_splice_alias() instead of d_add(). Even if they are not going to be hit by open_by_handle(), code does get copied around... Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-04Merge tag 'usercopy-fix-v4.18-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull usercopy whitelisting fix from Kees Cook: "Bart Massey discovered that the usercopy whitelist for JFS was incomplete: the inline inode data may intentionally "overflow" into the neighboring "extended area", so the size of the whitelist needed to be raised to include the neighboring field" * tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: jfs: Fix usercopy whitelist for inline inode data
2018-08-04Merge tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs bugfix from Darrick Wong: "One more patch for 4.18 to fix a coding error in the iomap_bmap() function introduced in -rc1: fix incorrect shifting" * tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: fix iomap_bmap position calculation
2018-08-04ext4: remove unneeded variable "err" in ext4_mb_release_inode_pa()zhong jiang
The err is not used after initalization. So just remove the variable. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-08-04jfs: Fix usercopy whitelist for inline inode dataKees Cook
Bart Massey reported what turned out to be a usercopy whitelist false positive in JFS when symlink contents exceeded 128 bytes. The inline inode data (i_inline) is actually designed to overflow into the "extended area" following it (i_inline_ea) when needed. So the whitelist needed to be expanded to include both i_inline and i_inline_ea (the whole size of which is calculated internally using IDATASIZE, 256, instead of sizeof(i_inline), 128). $ cd /mnt/jfs $ touch $(perl -e 'print "B" x 250') $ ln -s B* b $ ls -l >/dev/null [ 249.436410] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'jfs_ip' (offset 616, size 250)! Reported-by: Bart Massey <bart.massey@gmail.com> Fixes: 8d2704d382a9 ("jfs: Define usercopy region in jfs_ip slab cache") Cc: Dave Kleikamp <shaggy@kernel.org> Cc: jfs-discussion@lists.sourceforge.net Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-03jfs: don't bother with make_bad_inode() in ialloc()Al Viro
We hit that when inumber allocation has failed. In that case the in-core inode is not hashed and since its ->i_nlink is 1 the only place where jfs checks is_bad_inode() won't be reached. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03adfs: don't put inodes into icacheAl Viro
We never look them up in there; inode_fake_hash() will make them appear hashed for mark_inode_dirty() purposes. And don't leave them around until memory pressure kicks them out - we never look them up again. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03new helper: inode_fake_hash()Al Viro
open-coded in a quite a few places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03vfs: don't evict uninitialized inodeMiklos Szeredi
iput() ends up calling ->evict() on new inode, which is not yet initialized by owning fs. So use destroy_inode() instead. Add to sb->s_inodes list only if inode is not in I_CREATING state (meaning that it wasn't allocated with new_inode(), which already does the insertion). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 80ea09a002bf ("vfs: factor out inode_insert5()") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03jfs: switch to discard_new_inode()Al Viro
we don't want open-by-handle to pick an in-core inode that has failed setup halfway through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03ext2: make sure that partially set up inodes won't be returned by ext2_iget()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03udf: switch to discard_new_inode()Al Viro
we don't want open-by-handle to pick an in-core inode that has failed setup halfway through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03ufs: switch to discard_new_inode()Al Viro
we don't want open-by-handle to pick an in-core inode that has failed setup halfway through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03btrfs: switch to discard_new_inode()Al Viro
Make sure that no partially set up inodes can be returned by open-by-handle. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03new primitive: discard_new_inode()Al Viro
We don't want open-by-handle picking half-set-up in-core struct inode from e.g. mkdir() having failed halfway through. In other words, we don't want such inodes returned by iget_locked() on their way to extinction. However, we can't just have them unhashed - otherwise open-by-handle immediately *after* that would've ended up creating a new in-core inode over the on-disk one that is in process of being freed right under us. Solution: new flag (I_CREATING) set by insert_inode_locked() and removed by unlock_new_inode() and a new primitive (discard_new_inode()) to be used by such halfway-through-setup failure exits instead of unlock_new_inode() / iput() combinations. That primitive unlocks new inode, but leaves I_CREATING in place. iget_locked() treats finding an I_CREATING inode as failure (-ESTALE, once we sort out the error propagation). insert_inode_locked() treats the same as instant -EBUSY. ilookup() treats those as icache miss. [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03Merge tag 'nfs-for-4.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfix from Trond Myklebust: "Fix a NFSv4 file locking regression" * tag 'nfs-for-4.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix _nfs4_do_setlk()
2018-08-02userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK failsMike Rapoport
The fix in commit 0cbb4b4f4c44 ("userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails") cleared the vma->vm_userfaultfd_ctx but kept userfaultfd flags in vma->vm_flags that were copied from the parent process VMA. As the result, there is an inconsistency between the values of vma->vm_userfaultfd_ctx.ctx and vma->vm_flags which triggers BUG_ON in userfaultfd_release(). Clearing the uffd flags from vma->vm_flags in case of UFFD_EVENT_FORK failure resolves the issue. Link: http://lkml.kernel.org/r/1532931975-25473-1-git-send-email-rppt@linux.vnet.ibm.com Fixes: 0cbb4b4f4c44 ("userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails") Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reported-by: syzbot+121be635a7a35ddb7dcb@syzkaller.appspotmail.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02fs: fix iomap_bmap position calculationEric Sandeen
The position calculation in iomap_bmap() shifts bno the wrong way, so we don't progress properly and end up re-mapping block zero over and over, yielding an unchanging physical block range as the logical block advances: # filefrag -Be file ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 0: 21.. 21: 1: merged 1: 1.. 1: 21.. 21: 1: 22: merged Discontinuity: Block 1 is at 21 (was 22) 2: 2.. 2: 21.. 21: 1: 22: merged Discontinuity: Block 2 is at 21 (was 22) 3: 3.. 3: 21.. 21: 1: 22: merged This breaks the FIBMAP interface for anyone using it (XFS), which in turn breaks LILO, zipl, etc. Bug-actually-spotted-by: Darrick J. Wong <darrick.wong@oracle.com> Fixes: 89eb1906a953 ("iomap: add an iomap-based bmap implementation") Cc: stable@vger.kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02Squashfs: Compute expected length from inode size rather than block lengthPhillip Lougher
Previously in squashfs_readpage() when copying data into the page cache, it used the length of the datablock read from the filesystem (after decompression). However, if the filesystem has been corrupted this data block may be short, which will leave pages unfilled. The fix for this is to compute the expected number of bytes to copy from the inode size, and use this to detect if the block is short. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Tested-by: Willy Tarreau <w@1wt.eu> Cc: Анатолий Тросиненко <anatoly.trosinenko@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02squashfs: more metadata hardeningLinus Torvalds
The squashfs fragment reading code doesn't actually verify that the fragment is inside the fragment table. The end result _is_ verified to be inside the image when actually reading the fragment data, but before that is done, we may end up taking a page fault because the fragment table itself might not even exist. Another report from Anatoly and his endless squashfs image fuzzing. Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com> Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>, Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02ext4: improve code readability in ext4_iget()Liu Song
Merge the duplicated complex conditions to improve code readability. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
2018-08-02ext4: fix spectre gadget in ext4_mb_regular_allocator()Jeremy Cline
'ac->ac_g_ex.fe_len' is a user-controlled value which is used in the derivation of 'ac->ac_2order'. 'ac->ac_2order', in turn, is used to index arrays which makes it a potential spectre gadget. Fix this by sanitizing the value assigned to 'ac->ac2_order'. This covers the following accesses found with the help of smatch: * fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential spectre issue 'grp->bb_counters' [w] (local cap) * fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue 'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap) * fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue 'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap) Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-08-01kill d_instantiate_no_diralias()Al Viro
The only user is fuse_create_new_entry(), and there it's used to mitigate the same mkdir/open-by-handle race as in nfs_mkdir(). The same solution applies - unhash the mkdir argument, then call d_splice_alias() and if that returns a reference to preexisting alias, dput() and report success. ->mkdir() argument left unhashed negative with the preexisting alias moved in the right place is just fine from the ->mkdir() callers point of view. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-01NFSv4: Fix _nfs4_do_setlk()Trond Myklebust
The patch to fix the case where a lock request was interrupted ended up changing default handling of errors such as NFS4ERR_DENIED and caused the client to immediately resend the lock request. Let's do a partial revert of that request so that the default is now to exit, but change the way we handle resends to take into account the fact that the user may have interrupted the request. Reported-by: Kenneth Johansson <ken@kenjo.org> Fixes: a3cf9bca2ace ("NFSv4: Don't add a new lock on an interrupted wait..") Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2018-08-01squashfs metadata 2: electric boogalooLinus Torvalds
Anatoly continues to find issues with fuzzed squashfs images. This time, corrupt, missing, or undersized data for the page filling wasn't checked for, because the squashfs_{copy,read}_cache() functions did the squashfs_copy_data() call without checking the resulting data size. Which could result in the page cache pages being incompletely filled in, and no error indication to the user space reading garbage data. So make a helper function for the "fill in pages" case, because the exact same incomplete sequence existed in two places. [ I should have made a squashfs branch for these things, but I didn't intend to start doing them in the first place. My historical connection through cramfs is why I got into looking at these issues at all, and every time I (continue to) think it's a one-off. Because _this_ time is always the last time. Right? - Linus ] Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Tested-by: Willy Tarreau <w@1wt.eu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01ext4: check for NUL characters in extended attribute's nameTheodore Ts'o
Extended attribute names are defined to be NUL-terminated, so the name must not contain a NUL character. This is important because there are places when remove extended attribute, the code uses strlen to determine the length of the entry. That should probably be fixed at some point, but code is currently really messy, so the simplest fix for now is to simply validate that the extended attributes are sane. https://bugzilla.kernel.org/show_bug.cgi?id=200401 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-08-01ext4: use ext4_warning() for sb_getblk failureWang Shilong
Out of memory should not be considered as critical errors; so replace ext4_error() with ext4_warnig(). Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-07-30squashfs: more metadata hardeningLinus Torvalds
Anatoly reports another squashfs fuzzing issue, where the decompression parameters themselves are in a compressed block. This causes squashfs_read_data() to be called in order to read the decompression options before the decompression stream having been set up, making squashfs go sideways. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Acked-by: Phillip Lougher <phillip.lougher@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-29ext4: fix race when setting the bitmap corrupted flagWang Shilong
Whenever we hit block or inode bitmap corruptions we set bit and then reduce this block group free inode/clusters counter to expose right available space. However some of ext4_mark_group_bitmap_corrupted() is called inside group spinlock, some are not, this could make it happen that we double reduce one block group free counters from system. Always hold group spinlock for it could fix it, but it looks a little heavy, we could use test_and_set_bit() to fix race problems here. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-07-29ext4: reset error code in ext4_find_entry in fallbackEric Sandeen
When ext4_find_entry() falls back to "searching the old fashioned way" due to a corrupt dx dir, it needs to reset the error code to NULL so that the nonstandard ERR_BAD_DX_DIR code isn't returned to userspace. https://bugzilla.kernel.org/show_bug.cgi?id=199947 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@yandex.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-07-29ext4: handle layout changes to pinned DAX mappingsRoss Zwisler
Follow the lead of xfs_break_dax_layouts() and add synchronization between operations in ext4 which remove blocks from an inode (hole punch, truncate down, etc.) and pages which are pinned due to DAX DMA operations. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2018-07-29dax: dax_layout_busy_page() warn on !exceptionalRoss Zwisler
Inodes using DAX should only ever have exceptional entries in their page caches. Make this clear by warning if the iteration in dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a comment for the pagevec_release() call which only deals with struct page pointers. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2018-07-29Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Some miscellaneous ext4 fixes for 4.18; one fix is for a regression introduced in 4.18-rc4. Sorry for the late-breaking pull. I was originally going to wait for the next merge window, but Eric Whitney found a regression introduced in 4.18-rc4, so I decided to push out the regression plus the other fixes now. (The other commits have been baking in linux-next since early July)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix check to prevent initializing reserved inodes ext4: check for allocation block validity with block group locked ext4: fix inline data updates with checksums enabled ext4: clear mmp sequence number when remounting read-only ext4: fix false negatives *and* false positives in ext4_check_descriptors()
2018-07-29ext4: use swap macro in mext_page_double_lockGustavo A. R. Silva
Make use of the swap macro and remove unnecessary variable *tmp*. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: check allocation failure when duplicating "data" in ext4_remount()Chengguang Xu
There is no check for allocation failure when duplicating "data" in ext4_remount(). Check for failure and return error -ENOMEM in this case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2018-07-29ext4: fix warning message in ext4_enable_quotas()Junichi Uekawa
Output the warning message before we clobber type and be -1 all the time. The error message would now be [ 1.519791] EXT4-fs warning (device vdb): ext4_enable_quotas:5402: Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix. Signed-off-by: Junichi Uekawa <uekawa@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2018-07-29ext4: super: extend timestamps to 40 bitsArnd Bergmann
The inode timestamps use 34 bits in ext4, but the various timestamps in the superblock are limited to 32 bits. If every user accesses these as 'unsigned', then this is good until year 2106, but it seems better to extend this a bit further in the process of removing the deprecated get_seconds() function. This adds another byte for each timestamp in the superblock, making them long enough to store timestamps beyond what is in the inodes, which seems good enough here (in ocfs2, they are already 64-bit wide, which is appropriate for a new layout). I did not modify e2fsprogs, which obviously needs the same change to actually interpret future timestamps correctly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29jbd2: replace current_kernel_time64 with ktime equivalentArnd Bergmann
jbd2 is one of the few callers of current_kernel_time64(), which is a wrapper around ktime_get_coarse_real_ts64(). This calls the latter directly for consistency with the rest of the kernel that is moving to the ktime_get_ family of time accessors. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: use timespec64 for all inode timesArnd Bergmann
This is the last missing piece for the inode times on 32-bit systems: now that VFS interfaces use timespec64, we just need to stop truncating the tv_sec values for y2038 compatibililty. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: use ktime_get_real_seconds for i_dtimeArnd Bergmann
We only care about the low 32-bit for i_dtime as explained in commit b5f515735bea ("ext4: avoid Y2038 overflow in recently_deleted()"), so the use of get_seconds() is correct here, but that function is getting removed in the process of the y2038 fixes, so let's use the modern ktime_get_real_seconds() here. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: use 64-bit timestamps for mmp_timeArnd Bergmann
The mmp_time field is 64 bits wide, which is good, but calling get_seconds() results in a 32-bit value on 32-bit architectures. Using ktime_get_real_seconds() instead returns 64 bits everywhere. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: sysfs: print ext4_super_block fields as little-endianArnd Bergmann
While working on extended rand for last_error/first_error timestamps, I noticed that the endianess is wrong; we access the little-endian fields in struct ext4_super_block as native-endian when we print them. This adds a special case in ext4_attr_show() and ext4_attr_store() to byteswap the superblock fields if needed. In older kernels, this code was part of super.c, it got moved to sysfs.c in linux-4.4. Cc: stable@vger.kernel.org Fixes: 52c198c6820f ("ext4: add sysfs entry showing whether the fs contains errors") Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29squashfs: be more careful about metadata corruptionLinus Torvalds
Anatoly Trosinenko reports that a corrupted squashfs image can cause a kernel oops. It turns out that squashfs can end up being confused about negative fragment lengths. The regular squashfs_read_data() does check for negative lengths, but squashfs_read_metadata() did not, and the fragment size code just blindly trusted the on-disk value. Fix both the fragment parsing and the metadata reading code. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-29ext4: fix check to prevent initializing reserved inodesTheodore Ts'o
Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is valid" will complain if block group zero does not have the EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct, since a freshly created file system has this flag cleared. It gets almost immediately after the file system is mounted read-write --- but the following somewhat unlikely sequence will end up triggering a false positive report of a corrupted file system: mkfs.ext4 /dev/vdc mount -o ro /dev/vdc /vdc mount -o remount,rw /dev/vdc Instead, when initializing the inode table for block group zero, test to make sure that itable_unused count is not too large, since that is the case that will result in some or all of the reserved inodes getting cleared. This fixes the failures reported by Eric Whiteney when running generic/230 and generic/231 in the the nojournal test case. Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid") Reported-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-27Merge tag 'for-linus-20180727' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Bigger than usual at this time, mostly due to the O_DIRECT corruption issue and the fact that I was on vacation last week. This contains: - NVMe pull request with two fixes for the FC code, and two target fixes (Christoph) - a DIF bio reset iteration fix (Greg Edwards) - two nbd reply and requeue fixes (Josef) - SCSI timeout fixup (Keith) - a small series that fixes an issue with bio_iov_iter_get_pages(), which ended up causing corruption for larger sized O_DIRECT writes that ended up racing with buffered writes (Martin Wilck)" * tag 'for-linus-20180727' of git://git.kernel.dk/linux-block: block: reset bi_iter.bi_done after splitting bio block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs blkdev: __blkdev_direct_IO_simple: fix leak in error case block: bio_iov_iter_get_pages: fix size of last iovec nvmet: only check for filebacking on -ENOTBLK nvmet: fixup crash on NULL device path scsi: set timed out out mq requests to complete blk-mq: export setting request completion state nvme: if_ready checks to fail io to deleting controller nvmet-fc: fix target sgl list on large transfers nbd: handle unexpected replies better nbd: don't requeue the same request twice.
2018-07-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kvm, mm: account shadow page tables to kmemcg zswap: re-check zswap_is_full() after do zswap_shrink() include/linux/eventfd.h: include linux/errno.h mm: fix vma_is_anonymous() false-positives mm: use vma_init() to initialize VMAs on stack and data segments mm: introduce vma_init() mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL ipc/sem.c: prevent queue.status tearing in semop mm: disallow mappings that conflict for devm_memremap_pages() kasan: only select SLUB_DEBUG with SYSFS=y delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
2018-07-27Merge tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: - Fix some uninitialized variable errors - Fix an incorrect check in metadata verifiers * tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: properly handle free inodes in extent hint validators xfs: Initialize variables in xfs_alloc_get_rec before using them
2018-07-26mm: fix vma_is_anonymous() false-positivesKirill A. Shutemov
vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous VMA. This is unreliable as ->mmap may not set ->vm_ops. False-positive vma_is_anonymous() may lead to crashes: next ffff8801ce5e7040 prev ffff8801d20eca50 mm ffff88019c1e13c0 prot 27 anon_vma ffff88019680cdd8 vm_ops 0000000000000000 pgoff 0 file ffff8801b2ec2d00 private_data 0000000000000000 flags: 0xff(read|write|exec|shared|mayread|maywrite|mayexec|mayshare) ------------[ cut here ]------------ kernel BUG at mm/memory.c:1422! invalid opcode: 0000 [#1] SMP KASAN CPU: 0 PID: 18486 Comm: syz-executor3 Not tainted 4.18.0-rc3+ #136 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:zap_pmd_range mm/memory.c:1421 [inline] RIP: 0010:zap_pud_range mm/memory.c:1466 [inline] RIP: 0010:zap_p4d_range mm/memory.c:1487 [inline] RIP: 0010:unmap_page_range+0x1c18/0x2220 mm/memory.c:1508 Call Trace: unmap_single_vma+0x1a0/0x310 mm/memory.c:1553 zap_page_range_single+0x3cc/0x580 mm/memory.c:1644 unmap_mapping_range_vma mm/memory.c:2792 [inline] unmap_mapping_range_tree mm/memory.c:2813 [inline] unmap_mapping_pages+0x3a7/0x5b0 mm/memory.c:2845 unmap_mapping_range+0x48/0x60 mm/memory.c:2880 truncate_pagecache+0x54/0x90 mm/truncate.c:800 truncate_setsize+0x70/0xb0 mm/truncate.c:826 simple_setattr+0xe9/0x110 fs/libfs.c:409 notify_change+0xf13/0x10f0 fs/attr.c:335 do_truncate+0x1ac/0x2b0 fs/open.c:63 do_sys_ftruncate+0x492/0x560 fs/open.c:205 __do_sys_ftruncate fs/open.c:215 [inline] __se_sys_ftruncate fs/open.c:213 [inline] __x64_sys_ftruncate+0x59/0x80 fs/open.c:213 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reproducer: #include <stdio.h> #include <stddef.h> #include <stdint.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <unistd.h> #include <fcntl.h> #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long) #define KCOV_ENABLE _IO('c', 100) #define KCOV_DISABLE _IO('c', 101) #define COVER_SIZE (1024<<10) #define KCOV_TRACE_PC 0 #define KCOV_TRACE_CMP 1 int main(int argc, char **argv) { int fd; unsigned long *cover; system("mount -t debugfs none /sys/kernel/debug"); fd = open("/sys/kernel/debug/kcov", O_RDWR); ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE); cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); munmap(cover, COVER_SIZE * sizeof(unsigned long)); cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); memset(cover, 0, COVER_SIZE * sizeof(unsigned long)); ftruncate(fd, 3UL << 20); return 0; } This can be fixed by assigning anonymous VMAs own vm_ops and not relying on it being NULL. If ->mmap() failed to set ->vm_ops, mmap_region() will set it to dummy_vm_ops. This way we will have non-NULL ->vm_ops for all VMAs. Link: http://lkml.kernel.org/r/20180724121139.62570-4-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: syzbot+3f84280d52be9b7083cc@syzkaller.appspotmail.com Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>