summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-08-21Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Improvements to ext4's block allocator performance for very large file systems, especially when the file system or files which are highly fragmented. There is a new mount option, prefetch_block_bitmaps which will pull in the block bitmaps and set up the in-memory buddy bitmaps when the file system is initially mounted. Beyond that, a lot of bug fixes and cleanups. In particular, a number of changes to make ext4 more robust in the face of write errors or file system corruptions" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits) ext4: limit the length of per-inode prealloc list ext4: reorganize if statement of ext4_mb_release_context() ext4: add mb_debug logging when there are lost chunks ext4: Fix comment typo "the the". jbd2: clean up checksum verification in do_one_pass() ext4: change to use fallthrough macro ext4: remove unused parameter of ext4_generic_delete_entry function mballoc: replace seq_printf with seq_puts ext4: optimize the implementation of ext4_mb_good_group() ext4: delete invalid comments near ext4_mb_check_limits() ext4: fix typos in ext4_mb_regular_allocator() comment ext4: fix checking of directory entry validity for inline directories fs: prevent BUG_ON in submit_bh_wbc() ext4: correctly restore system zone info when remount fails ext4: handle add_system_zone() failure in ext4_setup_system_zone() ext4: fold ext4_data_block_valid_rcu() into the caller ext4: check journal inode extents more carefully ext4: don't allow overlapping system zones ext4: handle error of ext4_setup_system_zone() on remount ext4: delete the invalid BUGON in ext4_mb_load_buddy_gfp() ...
2020-08-21afs: Fix NULL deref in afs_dynroot_depopulate()David Howells
If an error occurs during the construction of an afs superblock, it's possible that an error occurs after a superblock is created, but before we've created the root dentry. If the superblock has a dynamic root (ie. what's normally mounted on /afs), the afs_kill_super() will call afs_dynroot_depopulate() to unpin any created dentries - but this will oops if the root hasn't been created yet. Fix this by skipping that bit of code if there is no root dentry. This leads to an oops looking like: general protection fault, ... KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f] ... RIP: 0010:afs_dynroot_depopulate+0x25f/0x529 fs/afs/dynroot.c:385 ... Call Trace: afs_kill_super+0x13b/0x180 fs/afs/super.c:535 deactivate_locked_super+0x94/0x160 fs/super.c:335 afs_get_tree+0x1124/0x1460 fs/afs/super.c:598 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x2070 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 which is oopsing on this line: inode_lock(root->d_inode); presumably because sb->s_root was NULL. Fixes: 0da0b7fd73e4 ("afs: Display manually added cells in dynamic root mount") Reported-by: syzbot+c1eff8205244ae7e11a6@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-20afs: Fix key ref leak in afs_put_operation()David Howells
The afs_put_operation() function needs to put the reference to the key that's authenticating the operation. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-19ext4: limit the length of per-inode prealloc listbrookxu
In the scenario of writing sparse files, the per-inode prealloc list may be very long, resulting in high overhead for ext4_mb_use_preallocated(). To circumvent this problem, we limit the maximum length of per-inode prealloc list to 512 and allow users to modify it. After patching, we observed that the sys ratio of cpu has dropped, and the system throughput has increased significantly. We created a process to write the sparse file, and the running time of the process on the fixed kernel was significantly reduced, as follows: Running time on unfixed kernel: [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat real 0m2.051s user 0m0.008s sys 0m2.026s Running time on fixed kernel: [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat real 0m0.471s user 0m0.004s sys 0m0.395s Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: reorganize if statement of ext4_mb_release_context()brookxu
Reorganize the if statement of ext4_mb_release_context(), make it easier to read. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/5439ac6f-db79-ad68-76c1-a4dda9aa0cc3@gmail.com Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: add mb_debug logging when there are lost chunksbrookxu
Lost chunks are when some other process raced with the current thread to grab a particular block allocation. Add mb_debug log for developers who wants to see how often this is happening for a particular workload. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/0a165ac0-1912-aebd-8a0d-b42e7cd1aea1@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: Fix comment typo "the the".kyoungho koo
I have found double typed comments "the the". So i modified it to one "the" Signed-off-by: kyoungho koo <rnrudgh@gmail.com> Link: https://lore.kernel.org/r/20200424171620.GA11943@koo-Z370-HD3 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19jbd2: clean up checksum verification in do_one_pass()Shijie Luo
Remove the unnecessary chksum_err and checksum_seen variables as well as some redundant code to make the function easier to understand. [ With changes suggested by jack@ and tytso@ ] Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: change to use fallthrough macroShijie Luo
Change to use fallthrough macro in switch case. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200810114435.24182-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: remove unused parameter of ext4_generic_delete_entry functionKyoungho Koo
The ext4_generic_delete_entry function does not use the parameter handle, so it can be removed. Signed-off-by: Kyoungho Koo <rnrudgh@gmail.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200810080701.GA14160@koo-Z370-HD3 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18mballoc: replace seq_printf with seq_putsXu Wang
seq_puts is a lot cheaper than seq_printf, so use that to print literal strings. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200810022158.9167-1-vulab@iscas.ac.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: optimize the implementation of ext4_mb_good_group()brookxu
It might be better to adjust the code in two places: 1. Determine whether grp is currupt or not should be placed first. 2. (cr<=2 && free <ac->ac_g_ex.fe_len)should may belong to the crx strategy, and it may be more appropriate to put it in the subsequent switch statement block. For cr1, cr2, the conditions in switch potentially realize the above judgment. For cr0, we should add (free <ac->ac_g_ex.fe_len) judgment, and then delete (free / fragments) >= ac->ac_g_ex.fe_len), because cr0 returns true by default. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/e20b2d8f-1154-adb7-3831-a9e11ba842e9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: delete invalid comments near ext4_mb_check_limits()brookxu
These comments do not seem to be related to ext4_mb_check_limits(), it may be invalid. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c49faf0c-d5d5-9c51-6911-9e0ff57c6bfa@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: fix typos in ext4_mb_regular_allocator() commentbrookxu
Fix typos in ext4_mb_regular_allocator() comment Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/d6514145-73b3-808b-ec5a-a8be27c51f9c@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-16Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "A few differerent things in here. Seems like syzbot got some more io_uring bits wired up, and we got a handful of reports and the associated fixes are in here. General fixes too, and a lot of them marked for stable. Lastly, a bit of fallout from the async buffered reads, where we now more easily trigger short reads. Some applications don't really like that, so the io_read() code now handles short reads internally, and got a cleanup along the way so that it's now easier to read (and documented). We're now passing tests that failed before" * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block: io_uring: short circuit -EAGAIN for blocking read attempt io_uring: sanitize double poll handling io_uring: internally retry short reads io_uring: retain iov_iter state over io_read/io_write calls task_work: only grab task signal lock when needed io_uring: enable lookup of links holding inflight files io_uring: fail poll arm on queue proc failure io_uring: hold 'ctx' reference around task_work queue + execute fs: RWF_NOWAIT should imply IOCB_NOIO io_uring: defer file table grabbing request cleanup for locked requests io_uring: add missing REQ_F_COMP_LOCKED for nested requests io_uring: fix recursive completion locking on oveflow flush io_uring: use TWA_SIGNAL for task_work uncondtionally io_uring: account locked memory before potential error case io_uring: set ctx sq/cq entry count earlier io_uring: Fix NULL pointer dereference in loop_rw_iter() io_uring: add comments on how the async buffered read retry works io_uring: io_async_buf_func() need not test page bit
2020-08-15io_uring: short circuit -EAGAIN for blocking read attemptJens Axboe
One case was missed in the short IO retry handling, and that's hitting -EAGAIN on a blocking attempt read (eg from io-wq context). This is a problem on sockets that are marked as non-blocking when created, they don't carry any REQ_F_NOWAIT information to help us terminate them instead of perpetually retrying. Fixes: 227c0c9673d8 ("io_uring: internally retry short reads") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15io_uring: sanitize double poll handlingJens Axboe
There's a bit of confusion on the matching pairs of poll vs double poll, depending on if the request is a pure poll (IORING_OP_POLL_ADD) or poll driven retry. Add io_poll_get_double() that returns the double poll waitqueue, if any, and io_poll_get_single() that returns the original poll waitqueue. With that, remove the argument to io_poll_remove_double(). Finally ensure that wait->private is cleared once the double poll handler has run, so that remove knows it's already been seen. Cc: stable@vger.kernel.org # v5.8 Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15Merge tag '9p-for-5.9-rc1' of git://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: - some code cleanup - a couple of static analysis fixes - setattr: try to pick a fid associated with the file rather than the dentry, which might sometimes matter * tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux: 9p: Remove unneeded cast from memory allocation 9p: remove unused code in 9p net/9p: Fix sparse endian warning in trans_fd.c 9p: Fix memory leak in v9fs_mount 9p: retrieve fid from file when file instance exist.
2020-08-15Merge tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small cifs/smb3 fixes, one for stable fixing mkdir path with the 'idsfromsid' mount option" * tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: SMB3: Fix mkdir when idsfromsid configured on mount cifs: Convert to use the fallthrough macro cifs: Fix an error pointer dereference in cifs_mount()
2020-08-15Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Stable fixes: - pNFS: Don't return layout segments that are being used for I/O - pNFS: Don't move layout segments off the active list when being used for I/O Features: - NFS: Add support for user xattrs through the NFSv4.2 protocol - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC - NFSv4.0 allow nconnect for v4.0 Bugfixes and cleanups: - nfs: ensure correct writeback errors are returned on close() - nfs: nfs_file_write() should check for writeback errors - nfs: Fix getxattr kernel panic and memory overflow - NFS: Fix the pNFS/flexfiles mirrored read failover code - SUNRPC: dont update timeout value on connection reset - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS - sunrpc: destroy rpc_inode_cachep after unregister_filesystem" * tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits) NFS: Fix flexfiles read failover fs: nfs: delete repeated words in comments rpc_pipefs: convert comma to semicolon nfs: Fix getxattr kernel panic and memory overflow NFS: Don't return layout segments that are in use NFS: Don't move layouts to plh_return_segs list while in use NFS: Add layout segment info to pnfs read/write/commit tracepoints NFS: Add tracepoints for layouterror and layoutstats. NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close() SUNRPC dont update timeout value on connection reset nfs: nfs_file_write() should check for writeback errors nfs: ensure correct writeback errors are returned on close() NFSv4.2: xattr cache: get rid of cache discard work queue NFS: remove redundant initialization of variable result NFSv4.0 allow nconnect for v4.0 freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS sunrpc: destroy rpc_inode_cachep after unregister_filesystem NFSv4.2: add client side xattr caching. NFSv4.2: hook in the user extended attribute handlers NFSv4.2: add the extended attribute proc functions. ...
2020-08-14fs: autofs: delete repeated words in commentsRandy Dunlap
Drop duplicated words {the, at} in comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ian Kent <raven@themaw.net> Link: http://lkml.kernel.org/r/20200811021817.24982-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14exec: restore EACCES of S_ISDIR execve()Kees Cook
Patch series "Fix S_ISDIR execve() errno". Fix an errno change for execve() of directories, noticed by Marc Zyngier. Along with the fix, include a regression test to avoid seeing this return in the future. This patch (of 2): The return code for attempting to execute a directory has always been EACCES. Adjust the S_ISDIR exec test to reflect the old errno instead of the general EISDIR for other kinds of "open" attempts on directories. Fixes: 633fb6ac3980 ("exec: move S_ISREG() check earlier") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Greg Kroah-Hartman <gregkh@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@google.com> Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-13SMB3: Fix mkdir when idsfromsid configured on mountSteve French
mkdir uses a compounded create operation which was not setting the security descriptor on create of a directory. Fix so mkdir now sets the mode and owner info properly when idsfromsid and modefromsid are configured on the mount. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> # v5.8 Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-08-13io_uring: internally retry short readsJens Axboe
We've had a few application cases of not handling short reads properly, and it is understandable as short reads aren't really expected if the application isn't doing non-blocking IO. Now that we retain the iov_iter over retries, we can implement internal retry pretty trivially. This ensures that we don't return a short read, even for buffered reads on page cache conflicts. Cleanup the deep nesting and hard to read nature of io_read() as well, it's much more straight forward now to read and understand. Added a few comments explaining the logic as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-13io_uring: retain iov_iter state over io_read/io_write callsJens Axboe
Instead of maintaining (and setting/remembering) iov_iter size and segment counts, just put the iov_iter in the async part of the IO structure. This is mostly a preparation patch for doing appropriate internal retries for short reads, but it also cleans up the state handling nicely and simplifies it quite a bit. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-13Merge tag 'for-5.9-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull more btrfs updates from David Sterba: "One minor update, the rest are fixes that have arrived a bit late for the first batch. There are also some recent fixes for bugs that were discovered during the merge window and pop up during testing. User visible change: - show correct subvolume path in /proc/mounts for bind mounts Fixes: - fix compression messages when remounting with different level or compression algorithm - tree-log: fix some memory leaks on error handling paths - restore I_VERSION on remount - fix return values and error code mixups - fix umount crash with quotas enabled when removing sysfs files - fix trim range on a shrunk device" * tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: trim: fix underflow in trim length to prevent access beyond device boundary btrfs: fix return value mixup in btrfs_get_extent btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups() btrfs: check correct variable after allocation in btrfs_backref_iter_alloc btrfs: make sure SB_I_VERSION doesn't get unset by remount btrfs: fix memory leaks after failure to lookup checksums during inode logging btrfs: don't show full path of bind mounts in subvol= btrfs: fix messages after changing compression level by remount btrfs: only search for left_info if there is no right_info in try_merge_free_space btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
2020-08-13Merge tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Two small fixes that have come in during the past week: - Fix duplicated words in comments - Fix an ubsan complaint about null pointer arithmetic" * tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init xfs: delete duplicated words + other fixes
2020-08-13Merge tag 'exfat-for-5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - don't clear MediaFailure and VolumeDirty bit in volume flags if these were already set before mounting - write multiple dirty buffers at once in sync mode - remove unneeded EXFAT_SB_DIRTY bit set * tag 'exfat-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: retain 'VolumeFlags' properly exfat: optimize exfat_zeroed_cluster() exfat: add error check when updating dir-entries exfat: write multiple sectors at once exfat: remove EXFAT_SB_DIRTY flag
2020-08-12io_uring: enable lookup of links holding inflight filesJens Axboe
When a process exits, we cancel whatever requests it has pending that are referencing the file table. However, if a link is holding a reference, then we cannot find it by simply looking at the inflight list. Enable checking of the poll and timeout list to find the link, and cancel it appropriately. Cc: stable@vger.kernel.org Reported-by: Josef <josef.grieb@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-12Merge tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "Xiubo has completed his work on filesystem client metrics, they are sent to all available MDSes once per second now. Other than that, we have a lot of fixes and cleanups all around the filesystem, including a tweak to cut down on MDS request resends in multi-MDS setups from Yanhu and fixups for SELinux symlink labeling and MClientSession message decoding from Jeff" * tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client: (22 commits) ceph: handle zero-length feature mask in session messages ceph: use frag's MDS in either mode ceph: move sb->wb_pagevec_pool to be a global mempool ceph: set sec_context xattr on symlink creation ceph: remove redundant initialization of variable mds ceph: fix use-after-free for fsc->mdsc ceph: remove unused variables in ceph_mdsmap_decode() ceph: delete repeated words in fs/ceph/ ceph: send client provided metric flags in client metadata ceph: periodically send perf metrics to MDSes ceph: check the sesion state and return false in case it is closed libceph: replace HTTP links with HTTPS ones ceph: remove unnecessary cast in kfree() libceph: just have osd_req_op_init() return a pointer ceph: do not access the kiocb after aio requests ceph: clean up and optimize ceph_check_delayed_caps() ceph: fix potential mdsc use-after-free crash ceph: switch to WARN_ON_ONCE in encode_supported_features() ceph: add global total_caps to count the mdsc's total caps number ceph: add check_session_state() helper and make it global ...
2020-08-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction, mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util, memory-hotplug, cleanups, uaccess, migration, gup, pagemap), - various other subsystems (alpha, misc, sparse, bitmap, lib, bitops, checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump, exec, kdump, rapidio, panic, kcov, kgdb, ipc). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits) mm/gup: remove task_struct pointer for all gup code mm: clean up the last pieces of page fault accountings mm/xtensa: use general page fault accounting mm/x86: use general page fault accounting mm/sparc64: use general page fault accounting mm/sparc32: use general page fault accounting mm/sh: use general page fault accounting mm/s390: use general page fault accounting mm/riscv: use general page fault accounting mm/powerpc: use general page fault accounting mm/parisc: use general page fault accounting mm/openrisc: use general page fault accounting mm/nios2: use general page fault accounting mm/nds32: use general page fault accounting mm/mips: use general page fault accounting mm/microblaze: use general page fault accounting mm/m68k: use general page fault accounting mm/ia64: use general page fault accounting mm/hexagon: use general page fault accounting mm/csky: use general page fault accounting ...
2020-08-12mm/gup: remove task_struct pointer for all gup codePeter Xu
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12exec: move path_noexec() check earlierKees Cook
The path_noexec() check, like the regular file check, was happening too late, letting LSMs see impossible execve()s. Check it earlier as well in may_open() and collect the redundant fs/exec.c path_noexec() test under the same robustness comment as the S_ISREG() check. My notes on the call path, and related arguments, checks, etc: do_open_execat() struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, ... do_filp_open(dfd, filename, open_flags) path_openat(nameidata, open_flags, flags) file = alloc_empty_file(open_flags, current_cred()); do_open(nameidata, file, open_flags) may_open(path, acc_mode, open_flag) /* new location of MAY_EXEC vs path_noexec() test */ inode_permission(inode, MAY_OPEN | acc_mode) security_inode_permission(inode, acc_mode) vfs_open(path, file) do_dentry_open(file, path->dentry->d_inode, open) security_file_open(f) open() /* old location of path_noexec() test */ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12exec: move S_ISREG() check earlierKees Cook
The execve(2)/uselib(2) syscalls have always rejected non-regular files. Recently, it was noticed that a deadlock was introduced when trying to execute pipes, as the S_ISREG() test was happening too late. This was fixed in commit 73601ea5b7b1 ("fs/open.c: allow opening only regular files during execve()"), but it was added after inode_permission() had already run, which meant LSMs could see bogus attempts to execute non-regular files. Move the test into the other inode type checks (which already look for other pathological conditions[1]). Since there is no need to use FMODE_EXEC while we still have access to "acc_mode", also switch the test to MAY_EXEC. Also include a comment with the redundant S_ISREG() checks at the end of execve(2)/uselib(2) to note that they are present to avoid any mistakes. My notes on the call path, and related arguments, checks, etc: do_open_execat() struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, ... do_filp_open(dfd, filename, open_flags) path_openat(nameidata, open_flags, flags) file = alloc_empty_file(open_flags, current_cred()); do_open(nameidata, file, open_flags) may_open(path, acc_mode, open_flag) /* new location of MAY_EXEC vs S_ISREG() test */ inode_permission(inode, MAY_OPEN | acc_mode) security_inode_permission(inode, acc_mode) vfs_open(path, file) do_dentry_open(file, path->dentry->d_inode, open) /* old location of FMODE_EXEC vs S_ISREG() test */ security_file_open(f) open() [1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12exec: change uselib(2) IS_SREG() failure to EACCESKees Cook
Patch series "Relocate execve() sanity checks", v2. While looking at the code paths for the proposed O_MAYEXEC flag, I saw some things that looked like they should be fixed up. exec: Change uselib(2) IS_SREG() failure to EACCES This just regularizes the return code on uselib(2). exec: Move S_ISREG() check earlier This moves the S_ISREG() check even earlier than it was already. exec: Move path_noexec() check earlier This adds the path_noexec() check to the same place as the S_ISREG() check. This patch (of 3): Change uselib(2)' S_ISREG() error return to EACCES instead of EINVAL so the behavior matches execve(2), and the seemingly documented value. The "not a regular file" failure mode of execve(2) is explicitly documented[1], but it is not mentioned in uselib(2)[2] which does, however, say that open(2) and mmap(2) errors may apply. The documentation for open(2) does not include a "not a regular file" error[3], but mmap(2) does[4], and it is EACCES. [1] http://man7.org/linux/man-pages/man2/execve.2.html#ERRORS [2] http://man7.org/linux/man-pages/man2/uselib.2.html#ERRORS [3] http://man7.org/linux/man-pages/man2/open.2.html#ERRORS [4] http://man7.org/linux/man-pages/man2/mmap.2.html#ERRORS Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-1-keescook@chromium.org Link: http://lkml.kernel.org/r/20200605160013.3954297-2-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12coredump: add %f for executable filenameLepton Wu
The document reads "%e" should be "executable filename" while actually it could be changed by things like pr_ctl PR_SET_NAME. People who uses "%e" in core_pattern get surprised when they find out they get thread name instead of executable filename. This is either a bug of document or a bug of code. Since the behavior of "%e" is there for long time, it could bring another surprise for users if we "fix" the code. So we just "fix" the document. And more, for users who really need the "executable filename" in core_pattern, we introduce a new "%f" for the real executable filename. We already have "%E" for executable path in kernel, so just reuse most of its code for the new added "%f" format. Signed-off-by: Lepton Wu <ytht.net@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200701031432.2978761-1-ytht.net@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fs/signalfd.c: fix inconsistent return codes for signalfd4Helge Deller
The kernel signalfd4() syscall returns different error codes when called either in compat or native mode. This behaviour makes correct emulation in qemu and testing programs like LTP more complicated. Fix the code to always return -in both modes- EFAULT for unaccessible user memory, and EINVAL when called with an invalid signal mask. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Laurent Vivier <laurent@vivier.eu> Link: http://lkml.kernel.org/r/20200530100707.GA10159@ls3530.fritz.box Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fat: fix fat_ra_init() for data clusters == 0OGAWA Hirofumi
If data clusters == 0, fat_ra_init() calls the ->ent_blocknr() for the cluster beyond ->max_clusters. This checks the limit before initialization to suppress the warning. Reported-by: syzbot+756199124937b31a9b7e@syzkaller.appspotmail.com Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/87mu462sv4.fsf@mail.parknet.co.jp Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12VFAT/FAT/MSDOS FILESYSTEM: replace HTTP links with HTTPS onesAlexander A. Klimov
Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `xmlns`: For each link, `http://[^# ]*(?:\w|/)`: If neither `gnu\.org/license`, nor `mozilla\.org/MPL`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Link: http://lkml.kernel.org/r/20200708200409.22293-1-grandmaster@al2klimov.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fatfs: switch write_lock to read_lock in fat_ioctl_get_attributesYubo Feng
There is no need to hold write_lock in fat_ioctl_get_attributes. write_lock may make an impact on concurrency of fat_ioctl_get_attributes. Signed-off-by: Yubo Feng <fengyubo3@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Link: http://lkml.kernel.org/r/1593308053-12702-1-git-send-email-fengyubo3@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fs/ufs: avoid potential u32 multiplication overflowColin Ian King
The 64 bit ino is being compared to the product of two u32 values, however, the multiplication is being performed using a 32 bit multiply so there is a potential of an overflow. To be fully safe, cast uspi->s_ncg to a u64 to ensure a 64 bit multiplication occurs to avoid any chance of overflow. Fixes: f3e2a520f5fb ("ufs: NFS support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Alexey Dobriyan <adobriyan@gmail.com> Link: http://lkml.kernel.org/r/20200715170355.1081713-1-colin.king@canonical.com Addresses-Coverity: ("Unintentional integer overflow") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12nilfs2: use a more common logging styleJoe Perches
Add macros for nilfs_<level>(sb, fmt, ...) and convert the uses of 'nilfs_msg(sb, KERN_<LEVEL>, ...)' to 'nilfs_<level>(sb, ...)' so nilfs2 uses a logging style more like the typical kernel logging style. Miscellanea: o Realign arguments for these uses Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1595860111-3920-4-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12nilfs2: convert __nilfs_msg to integrate the level and formatJoe Perches
Reduce object size a bit by removing the KERN_<LEVEL> as a separate argument and adding it to the format string. Reduce overall object size by about ~.5% (x86-64 defconfig w/ nilfs2) old: $ size -t fs/nilfs2/built-in.a | tail -1 191738 8676 44 200458 30f0a (TOTALS) new: $ size -t fs/nilfs2/built-in.a | tail -1 190971 8676 44 199691 30c0b (TOTALS) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1595860111-3920-3-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12nilfs2: only call unlock_new_inode() if I_NEWEric Biggers
Patch series "nilfs2 updates". This patch (of 3): unlock_new_inode() is only meant to be called after a new inode has already been inserted into the hash table. But nilfs_new_inode() can call it even before it has inserted the inode, triggering the WARNING in unlock_new_inode(). Fix this by only calling unlock_new_inode() if the inode has the I_NEW flag set, indicating that it's in the table. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1595860111-3920-1-git-send-email-konishi.ryusuke@gmail.com Link: http://lkml.kernel.org/r/1595860111-3920-2-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fs/minix: remove expected error message in block_to_path()Eric Biggers
When truncating a file to a size within the last allowed logical block, block_to_path() is called with the *next* block. This exceeds the limit, causing the "block %ld too big" error message to be printed. This case isn't actually an error; there are just no more blocks past that point. So, remove this error message. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Link: http://lkml.kernel.org/r/20200628060846.682158-7-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fs/minix: fix block limit check for V1 filesystemsEric Biggers
The minix filesystem reads its maximum file size from its on-disk superblock. This value isn't necessarily a multiple of the block size. When it's not, the V1 block mapping code doesn't allow mapping the last possible block. Commit 6ed6a722f9ab ("minixfs: fix block limit check") fixed this in the V2 mapping code. Fix it in the V1 mapping code too. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Link: http://lkml.kernel.org/r/20200628060846.682158-6-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fs/minix: set s_maxbytes correctlyEric Biggers
The minix filesystem leaves super_block::s_maxbytes at MAX_NON_LFS rather than setting it to the actual filesystem-specific limit. This is broken because it means userspace doesn't see the standard behavior like getting EFBIG and SIGXFSZ when exceeding the maximum file size. Fix this by setting s_maxbytes correctly. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Link: http://lkml.kernel.org/r/20200628060846.682158-5-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fs/minix: reject too-large maximum file sizeEric Biggers
If the minix filesystem tries to map a very large logical block number to its on-disk location, block_to_path() can return offsets that are too large, causing out-of-bounds memory accesses when accessing indirect index blocks. This should be prevented by the check against the maximum file size, but this doesn't work because the maximum file size is read directly from the on-disk superblock and isn't validated itself. Fix this by validating the maximum file size at mount time. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+c7d9ec7a1a7272dd71b3@syzkaller.appspotmail.com Reported-by: syzbot+3b7b03a0c28948054fb5@syzkaller.appspotmail.com Reported-by: syzbot+6e056ee473568865f3e6@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200628060846.682158-4-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fs/minix: don't allow getting deleted inodesEric Biggers
If an inode has no links, we need to mark it bad rather than allowing it to be accessed. This avoids WARNINGs in inc_nlink() and drop_nlink() when doing directory operations on a fuzzed filesystem. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+a9ac3de1b5de5fb10efc@syzkaller.appspotmail.com Reported-by: syzbot+df958cf5688a96ad3287@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200628060846.682158-3-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12fs/minix: check return value of sb_getblk()Eric Biggers
Patch series "fs/minix: fix syzbot bugs and set s_maxbytes". This series fixes all syzbot bugs in the minix filesystem: KASAN: null-ptr-deref Write in get_block KASAN: use-after-free Write in get_block KASAN: use-after-free Read in get_block WARNING in inc_nlink KMSAN: uninit-value in get_block WARNING in drop_nlink It also fixes the minix filesystem to set s_maxbytes correctly, so that userspace sees the correct behavior when exceeding the max file size. This patch (of 6): sb_getblk() can fail, so check its return value. This fixes a NULL pointer dereference. Originally from Qiujun Huang. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+4a88b2b9dc280f47baf4@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qiujun Huang <anenbupt@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200628060846.682158-1-ebiggers@kernel.org Link: http://lkml.kernel.org/r/20200628060846.682158-2-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>