summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-04-02ocfs2: use memalloc_nofs_save instead of memalloc_noio_saveMatthew Wilcox (Oracle)
OCFS2 doesn't mind if memory reclaim makes I/Os happen; it just cares that it won't be reentered, so it can use memalloc_nofs_save() instead of memalloc_noio_save(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200326200214.1102-1-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: use scnprintf() for avoiding potential buffer overflowTakashi Iwai
Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200311093516.25300-1-tiwai@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: roll back the reference count modification of the parent directory if ↵wangjian
an error occurs Under some conditions, the directory cannot be deleted. The specific scenarios are as follows: (for example, /mnt/ocfs2 is the mount point) 1. Create the /mnt/ocfs2/p_dir directory. At this time, the i_nlink corresponding to the inode of the /mnt/ocfs2/p_dir directory is equal to 2. 2. During the process of creating the /mnt/ocfs2/p_dir/s_dir directory, if the call to the inc_nlink function in ocfs2_mknod succeeds, the functions such as ocfs2_init_acl, ocfs2_init_security_set, and ocfs2_dentry_attach_lock fail. At this time, the i_nlink corresponding to the inode of the /mnt/ocfs2/p_dir directory is equal to 3, but /mnt/ocfs2/p_dir/s_dir is not added to the /mnt/ocfs2/p_dir directory entry. 3. Delete the /mnt/ocfs2/p_dir directory (rm -rf /mnt/ocfs2/p_dir). At this time, it is found that the i_nlink corresponding to the inode corresponding to the /mnt/ocfs2/p_dir directory is equal to 3. Therefore, the /mnt/ocfs2/p_dir directory cannot be deleted. Signed-off-by: Jian wang <wangjian161@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/a44f6666-bbc4-405e-0e6c-0f4e922eeef6@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: ocfs2_fs.h: replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!OKPotRhYhHbCG2kibo8Q6_6CuKaa28d_74h1svxyR6rbshrK2L_BdrQpNbvJWBWb40QCkg$ [2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!OKPotRhYhHbCG2kibo8Q6_6CuKaa28d_74h1svxyR6rbshrK2L_BdrQpNbvJWBUhNn9M6g$ [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200309202155.GA8432@embeddedor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: dlm: replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!OVOYL_CouISa5L1Lw-20EEFQntw6cKMx-j8UdY4z78uYgzKBUFcfpn50GaurvbV5v7YiUA$ [2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!OVOYL_CouISa5L1Lw-20EEFQntw6cKMx-j8UdY4z78uYgzKBUFcfpn50GaurvbXs8Eh8eg$ [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200309202016.GA8210@embeddedor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: cluster: replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!NzMr-YRl2zy-K3lwLVVatz7x0uD2z7-ykQag4GrGigxmfWU8TWzDy6xrkTiW3hYl00czlw$ [2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!NzMr-YRl2zy-K3lwLVVatz7x0uD2z7-ykQag4GrGigxmfWU8TWzDy6xrkTiW3hYHG1nAnw$ [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200309201907.GA8005@embeddedor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200213160244.GA6088@embeddedor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: add missing annotations for ocfs2_refcount_cache_lock() and ↵Jules Irenge
ocfs2_refcount_cache_unlock() Sparse reports warnings at ocfs2_refcount_cache_lock() and ocfs2_refcount_cache_unlock() warning: context imbalance in ocfs2_refcount_cache_lock() - wrong count at exit warning: context imbalance in ocfs2_refcount_cache_unlock() - unexpected unlock The root cause is the missing annotation at ocfs2_refcount_cache_lock() and at ocfs2_refcount_cache_unlock() Add the missing __acquires(&rf->rf_lock) annotation to ocfs2_refcount_cache_lock() Add the missing __releases(&rf->rf_lock) annotation to ocfs2_refcount_cache_unlock() Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200224204130.18178-1-jbi.octave@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: remove useless errAlex Shi
We don't need 'err' in these 2 places, better to remove them. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: ChenGang <cg.chen@huawei.com> Cc: Richard Fontana <rfontana@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1579577836-251879-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: correct annotation from "l_next_rec" to "l_next_free_rec"wangyan
Correct annotation from "l_next_rec" to "l_next_free_rec" Signed-off-by: Yan Wang <wangyan122@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jun Piao <piaojun@huawei.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Link: http://lkml.kernel.org/r/5e76c953-3479-1280-023c-ad05e4c75608@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: there is no need to log twice in several functionswangyan
There is no need to log twice in several functions. Signed-off-by: Yan Wang <wangyan122@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jun Piao <piaojun@huawei.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Link: http://lkml.kernel.org/r/77eec86a-f634-5b98-4f7d-0cd15185a37b@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: remove dlm_lock_is_remoteAlex Shi
This macro has been unused since it was introduced. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579578203-254451-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: use OCFS2_SEC_BITS in macroAlex Shi
This macro should be used. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579577840-251956-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: remove unused macrosAlex Shi
O2HB_DEFAULT_BLOCK_BITS/DLM_THREAD_MAX_ASTS/DLM_MIGRATION_RETRY_MS and OCFS2_MAX_RESV_WINDOW_BITS/OCFS2_MIN_RESV_WINDOW_BITS have been unused since commit 66effd3c6812 ("ocfs2/dlm: Do not migrate resource to a node that is leaving the domain"). Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: ChenGang <cg.chen@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Fontana <rfontana@redhat.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579577827-251796-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: remove FS_OCFS2_NMAlex Shi
This macro is unused since commit ab09203e302b ("sysctl fs: Remove dead binary sysctl support"). Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579577812-251572-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02iomap: Handle memory allocation failure in readaheadMatthew Wilcox (Oracle)
bio_alloc() can fail when we use GFP_NORETRY. If it does, allocate a bio large enough for a single page like mpage_readpages() does. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-02xfs: fix inode number overflow in ifree cluster helperBrian Foster
Qian Cai reports seemingly random buffer read verifier errors during filesystem writeback. This was isolated to a recent patch that factored out some inode cluster freeing code and happened to cast an unsigned inode number type to a signed value. If the inode number value overflows, we can skip marking in-core inodes associated with the underlying buffer stale at the time the physical inodes are freed. If such an inode happens to be dirty, xfsaild will eventually attempt to write it back over non-inode blocks. The invalidation of the underlying inode buffer causes writeback to read the buffer from disk. This fails the read verifier (preventing eventual corruption) if the buffer no longer looks like an inode cluster. Analysis by Dave Chinner. Fix up the helper to use the proper type for inode number values. Fixes: 5806165a6663 ("xfs: factor inode lookup from xfs_ifree_cluster") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-03powerpc: Add back __ARCH_WANT_SYS_LLSEEK macroMichal Suchanek
This partially reverts commit caf6f9c8a326 ("asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macro") When CONFIG_COMPAT is disabled on ppc64 the kernel does not build. There is resistance to both removing the llseek syscall from the 64bit syscall tables and building the llseek interface unconditionally. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/lkml/20190828151552.GA16855@infradead.org/ Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/ Link: https://lore.kernel.org/r/dd4575c51e31766e87f7e7fa121d099ab78d3290.1584699455.git.msuchanek@suse.de
2020-04-02lookup_open(): don't bother with fallbacks to lookup+createAl Viro
We fall back to lookup+create (instead of atomic_open) in several cases: 1) we don't have write access to filesystem and O_TRUNC is present in the flags. It's not something we want ->atomic_open() to see - it just might go ahead and truncate the file. However, we can pass it the flags sans O_TRUNC - eventually do_open() will call handle_truncate() anyway. 2) we have O_CREAT | O_EXCL and we can't write to parent. That's going to be an error, of course, but we want to know _which_ error should that be - might be EEXIST (if file exists), might be EACCES or EROFS. Simply stripping O_CREAT (and checking if we see ENOENT) would suffice, if not for O_EXCL. However, we used to have ->atomic_open() fully responsible for rejecting O_CREAT | O_EXCL on existing file and just stripping O_CREAT would've disarmed those checks. With nothing downstream to catch the problem - FMODE_OPENED used to be "don't bother with EEXIST checks, ->atomic_open() has done those". Now EEXIST checks downstream are skipped only if FMODE_CREATED is set - FMODE_OPENED alone is not enough. That has eliminated the need to fall back onto lookup+create path in this case. 3) O_WRONLY or O_RDWR when we have no write access to filesystem, with nothing else objectionable. Fallback is (and had always been) pointless. IOW, we don't really need that fallback; all we need in such cases is to trim O_TRUNC and O_CREAT properly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02atomic_open(): no need to pass struct open_flags anymoreAl Viro
argument had been unused since 1643b43fbd052 (lookup_open(): lift the "fallback to !O_CREAT" logics from atomic_open()) back in 2016 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02open_last_lookups(): move complete_walk() into do_open()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open()Al Viro
Currently path_openat() has "EEXIST on O_EXCL|O_CREAT" checks done on one of the ways out of open_last_lookups(). There are 4 cases: 1) the last component is . or ..; check is not done. 2) we had FMODE_OPENED or FMODE_CREATED set while in lookup_open(); check is not done. 3) symlink to be traversed is found; check is not done (nor should it be) 4) everything else: check done (before complete_walk(), even). In case (1) O_EXCL|O_CREAT ends up failing with -EISDIR - that's open("/tmp/.", O_CREAT|O_EXCL, 0600) Note that in the same conditions open("/tmp", O_CREAT|O_EXCL, 0600) would have yielded EEXIST. Either error is allowed, switching to -EEXIST in these cases would've been more consistent. Case (2) is more subtle; first of all, if we have FMODE_CREATED set, the object hadn't existed prior to the call. The check should not be done in such a case. The rest is problematic, though - we have FMODE_OPENED set (i.e. it went through ->atomic_open() and got successfully opened there) FMODE_CREATED is *NOT* set O_CREAT and O_EXCL are both set. Any such case is a bug - either we failed to set FMODE_CREATED when we had, in fact, created an object (no such instances in the tree) or we have opened a pre-existing file despite having had both O_CREAT and O_EXCL passed. One of those was, in fact caught (and fixed) while sorting out this mess (gfs2 on cold dcache). And in such situations we should fail with EEXIST. Note that for (1) and (4) FMODE_CREATED is not set - for (1) there's nothing in handle_dots() to set it, for (4) we'd explicitly checked that. And (1), (2) and (4) are exactly the cases when we leave the loop in the caller, with do_open() called immediately after that loop. IOW, we can move the check over there, and make it If we have O_CREAT|O_EXCL and after successful pathname resolution FMODE_CREATED is *not* set, we must have run into a preexisting file and should fail with EEXIST. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02open_last_lookups(): don't abuse complete_walk() when all we want is unlazyAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02open_last_lookups(): consolidate fsnotify_create() callsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02take post-lookup part of do_last() out of loopAl Viro
now we can have open_last_lookups() directly from the loop in path_openat() - the rest of do_last() never returns a symlink to follow, so we can bloody well leave the loop first. Rename the rest of that thing from do_last() to do_open() and make it return an int. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02link_path_walk(): sample parent's i_uid and i_mode for the last componentAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02__nd_alloc_stack(): make it return boolAl Viro
... and adjust the caller (reserve_stack()). Rename to nd_alloc_stack(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02reserve_stack(): switch to __nd_alloc_stack()Al Viro
expand the call of nd_alloc_stack() into it (and don't recheck the depth on the second call) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02pick_link(): take reserving space on stack into a new helperAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02pick_link(): more straightforward handling of allocation failuresAl Viro
pick_link() needs to push onto stack; we start with using two-element array embedded into struct nameidata and the first time we need more than that we switch to separately allocated array. Allocation can fail, of course, and handling of that would be simple enough - we need to drop 'link' and bugger off. However, the things get more complicated in RCU mode. There we must do GFP_ATOMIC allocation. If that fails, we try to switch to non-RCU mode and repeat the allocation. To switch to non-RCU mode we need to grab references to 'link' and to everything in nameidata. The latter done by unlazy_walk(); the former - legitimize_path(). 'link' must go first - after unlazy_walk() we are out of RCU-critical period and it's too late to call legitimize_path() since the references in link->mnt and link->dentry might be pointing to freed and reused memory. So we do legitimize_path(), then unlazy_walk(). And that's where it gets too subtle: what to do if the former fails? We MUST do path_put(link) to avoid leaks. And we can't do that under rcu_read_lock(). Solution in mainline was to empty then nameidata manually, drop out of RCU mode and then do put_path(). In effect, we open-code the things eventual terminate_walk() would've done on error in RCU mode. That looks badly out of place and confusing. We could add a comment along the lines of the explanation above, but... there's a simpler solution. Call unlazy_walk() even if legitimaze_path() fails. It will take us out of RCU mode, so we'll be able to do path_put(link). Yes, it will do unnecessary work - attempt to grab references on the stuff in nameidata, only to have them dropped as soon as we return the error to upper layer and get terminate_walk() called there. So what? We are thoroughly off the fast path by that point - we had GFP_ATOMIC allocation fail, we had ->d_seq or mount_lock mismatch and we are about to try walking the same path from scratch in non-RCU mode. Which will need to do the same allocation, this time with GFP_KERNEL, so it will be able to apply memory pressure for blocking stuff. Compared to that the cost of several lockref_get_not_dead() is noise. And the logics become much easier to understand that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02fold path_to_nameidata() into its only remaining callerAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02pick_link(): pass it struct path already with normal refcounting rulesAl Viro
step_into() tries to avoid grabbing and dropping mount references on the steps that do not involve crossing mountpoints (which is obviously the majority of cases). So it uses a local struct path with unusual refcounting rules - path.mnt is pinned if and only if it's not equal to nd->path.mnt. We used to have similar beasts all over the place and we had quite a few bugs crop up in their handling - it's easy to get confused when changing e.g. cleanup on failure exits (or adding a new check, etc.) Now that's mostly gone - the step_into() instance (which is what we need them for) is the only one left. It is exposed to mount traversal and it's (shortly) seen by pick_link(). Since pick_link() needs to store it in link stack, where the normal rules apply, it has to make sure that mount is pinned regardless of nd->path.mnt value. That's done on all calls of pick_link() and very early in those. Let's do that in the caller (step_into()) instead - that way the fewer places need to be aware of such struct path instances. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02fs/namei.c: kill follow_mount()Al Viro
The only remaining caller (path_pts()) should be using follow_down() anyway. And clean path_pts() a bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02non-RCU analogue of the previous commitAl Viro
new helper: choose_mountpoint(). Wrapper around choose_mountpoint_rcu(), similar to lookup_mnt() vs. __lookup_mnt(). follow_dotdot() switched to it. Now we don't grab mount_lock exclusive anymore; note that the primitive used non-RCU mount traversals in other direction (lookup_mnt()) doesn't bother with that either - it uses mount_lock seqcount instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02helper for mount rootwards traversalAl Viro
The loops in follow_dotdot{_rcu()} are doing the same thing: we have a mount and we want to find out how far up the chain of mounts do we need to go. We follow the chain of mount until we find one that is not directly overmounting the root of another mount. If such a mount is found, we want the location it's mounted upon. If we run out of chain (i.e. get to a mount that is not mounted on anything else) or run into process' root, we report failure. On success, we want (in RCU case) d_seq of resulting location sampled or (in non-RCU case) references to that location acquired. This commit introduces such primitive for RCU case and switches follow_dotdot_rcu() to it; non-RCU case will be go in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02follow_dotdot(): be lazy about changing nd->pathAl Viro
Change nd->path only after the loop is done and only in case we hadn't ended up finding ourselves in root. Same for NO_XDEV check. That separates the "check how far back do we need to go through the mount stack" logics from the rest of .. traversal. NOTE: path_get/path_put introduced here are temporary. They will go away later in the series. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02follow_dotdot_rcu(): be lazy about changing nd->pathAl Viro
Change nd->path only after the loop is done and only in case we hadn't ended up finding ourselves in root. Same for NO_XDEV check. Don't recheck mount_lock on each step either. That separates the "check how far back do we need to go through the mount stack" logics from the rest of .. traversal. Note that the sequence for d_seq/d_inode here is * sample mount_lock seqcount ... * sample d_seq * fetch d_inode * verify mount_lock seqcount The last step makes sure that d_inode value we'd got matches d_seq - it dentry is guaranteed to have been a mountpoint through the entire thing, so its d_inode must have been stable. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02follow_dotdot{,_rcu}(): massage loopsAl Viro
The logics in both of them is the same: while true if in process' root // uncommon break if *not* in mount root // normal case find the parent return if at absolute root // very uncommon break move to underlying mountpoint report that we are in root Pull the common path out of the loop: if in process' root // uncommon goto in_root if unlikely(in mount root) while true if at absolute root goto in_root move to underlying mountpoint if in process' root goto in_root if in mount root break; find the parent // we are not in mount root return in_root: report that we are in root The reason for that transformation is that we get to keep the common path straight *and* get a separate block for "move through underlying mountpoints", which will allow to sanitize NO_XDEV handling there. What's more, the pared-down loops will be easier to deal with - in particular, non-RCU case has no need to grab mount_lock and rewriting it to the form that wouldn't do that is a non-trivial change. Better do that with less stuff getting in the way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-02lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcuAl Viro
lift step_into() into handle_dots() (where they merge with each other); have follow_... return dentry and pass inode/seq to the caller. [braino fix folded; kudos to Qian Cai <cai@lca.pw> for reporting it] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-01Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Fix out-of-sync IVs in self-test for IPsec AEAD algorithms Algorithms: - Use formally verified implementation of x86/curve25519 Drivers: - Enhance hwrng support in caam - Use crypto_engine for skcipher/aead/rsa/hash in caam - Add Xilinx AES driver - Add uacce driver - Register zip engine to uacce in hisilicon - Add support for OCTEON TX CPT engine in marvell" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits) crypto: af_alg - bool type cosmetics crypto: arm[64]/poly1305 - add artifact to .gitignore files crypto: caam - limit single JD RNG output to maximum of 16 bytes crypto: caam - enable prediction resistance in HRWNG bus: fsl-mc: add api to retrieve mc version crypto: caam - invalidate entropy register during RNG initialization crypto: caam - check if RNG job failed crypto: caam - simplify RNG implementation crypto: caam - drop global context pointer and init_done crypto: caam - use struct hwrng's .init for initialization crypto: caam - allocate RNG instantiation descriptor with GFP_DMA crypto: ccree - remove duplicated include from cc_aead.c crypto: chelsio - remove set but not used variable 'adap' crypto: marvell - enable OcteonTX cpt options for build crypto: marvell - add the Virtual Function driver for CPT crypto: marvell - add support for OCTEON TX CPT engine crypto: marvell - create common Kconfig and Makefile for Marvell crypto: arm/neon - memzero_explicit aes-cbc key crypto: bcm - Use scnprintf() for avoiding potential buffer overflow crypto: atmel-i2c - Fix wakeup fail ...
2020-04-01ext4: save all error info in save_error_info() and drop ext4_set_errno()Theodore Ts'o
Using a separate function, ext4_set_errno() to set the errno is problematic because it doesn't do the right thing once s_last_error_errorcode is non-zero. It's also less racy to set all of the error information all at once. (Also, as a bonus, it shrinks code size slightly.) Link: https://lore.kernel.org/r/20200329020404.686965-1-tytso@mit.edu Fixes: 878520ac45f9 ("ext4: save the error code which triggered...") Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-04-01NFS: Try to join page groups before an O_DIRECT retransmissionTrond Myklebust
If we have to retransmit requests, try to join their page groups first. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01NFS: Refactor nfs_lock_and_join_requests()Trond Myklebust
Refactor nfs_lock_and_join_requests() in order to separate out the subrequest merging into its own function nfs_lock_and_join_group() that can be used by O_DIRECT. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01NFS: Reverse the submission order of requests in __nfs_pageio_add_request()Trond Myklebust
If we have to split the request up into subrequests, we have to submit the request pointed to by the function call parameter last, in case there is an error or other issue that causes us to exit before the last request is submitted. The reason is that the caller is expected to perform cleanup in those cases. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01NFS: Clean up nfs_lock_and_join_requests()Trond Myklebust
Clean up nfs_lock_and_join_requests() to simplify the calculation of the range covered by the page group, taking into account the presence of mirrors. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01NFS: Remove the redundant function nfs_pgio_has_mirroring()Trond Myklebust
We need to trust that desc->pg_mirror_idx is set correctly, whether or not mirroring is enabled. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01NFS: Fix memory leaks in nfs_pageio_stop_mirroring()Trond Myklebust
If we just set the mirror count to 1 without first clearing out the mirrors, we can leak queued up requests. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()Trond Myklebust
nfs_direct_write_scan_commit_list() will lock the request and bump the reference count, but we also need to account for the reference that was taken when we initially added the request to the commit list. Fixes: fb5f7f20cdb9 ("NFS: commit errors should be fatal") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01NFS: Fix use-after-free issues in nfs_pageio_add_request()Trond Myklebust
We need to ensure that we create the mirror requests before calling nfs_pageio_add_request_mirror() on the request we are adding. Otherwise, we can end up with a use-after-free if the call to nfs_pageio_add_request_mirror() triggers I/O. Fixes: c917cfaf9bbe ("NFS: Fix up NFS I/O subrequest creation") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()Trond Myklebust
When a subrequest is being detached from the subgroup, we want to ensure that it is not holding the group lock, or in the process of waiting for the group lock. Fixes: 5b2b5187fa85 ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>