summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2020-06-03ext4: mballoc: make mb_debug() implementation to use pr_debug()Ritesh Harjani
mb_debug() msg had only 1 control level for all type of msgs. And if we enable mballoc_debug then all of those msgs would be enabled. Instead of adding multiple debug levels for mb_debug() msgs, use pr_debug() with which we could have finer control to print msgs at all of different levels (i.e. at file, func, line no.). Also add process name/pid, superblk id, and other info in mb_debug() msg. This also kills the mballoc_debug module parameter, since it is not needed any more. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/f0c660cbde9e2edbe95c67942ca9ad80dd2231eb.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: replace EXT_DEBUG with __maybe_unused in ↵Ritesh Harjani
ext4_ext_handle_unwritten_extents() Replace EXT_DEBUG with __maybe_unused from inside ext4_ext_handle_unwritten_extents() function. There should be no functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/ae335b94506cd9db9d2648c1f4dd25a80f9f3ce2.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: improve ext_debug() msg in case of block allocation failureRitesh Harjani
ext4_map_blocks() has ext_debug msg early at the start of function. We also get ext_debug msg if we could allocate a block from ext4_ext_map_blocks(). But there is no ext_debug() msg in case of block allocation failure. So add one along with error code. Also add more info in ext_debug() msg like how many blocks were allocated v/s how many were requested in ext4_ext_map_blocks(). Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/1610ec2aa932396be00f9d552fe29da473ead176.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: use BIT() macro for BH_** state bitsRitesh Harjani
Simply use BIT() macro for all BH_** state bits instead of open coding it. There should be no functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/57667689f51a3f9dba2fcef7d3425187fa3ba69f.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: balloc: use task_pid_nr() helperRitesh Harjani
Use task_pid_nr() function instead of current->pid. There should be no functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/4b58403e15e9c8deb34a1b93deb3fc9cd153ab84.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: fix possible NULL ptr & remove BUG_ONs from DOUBLE_CHECKRitesh Harjani
Make sure to check for e4b->bd_info->bb_bitmap == NULL, in mb_cmp_bitmaps() and return if NULL, to avoid possible NULL ptr dereference. Similar to how we do this in other ifdef DOUBLE_CHECK functions. Also remove the BUG_ON() logic if kmalloc() or ext4_read_block_bitmap() fails. We should simply mark grp->bb_bitmap as NULL if above happens. In fact ext4_read_block_bitmap() may even return an error in case of resize ioctl. Hence remove this BUG_ON logic (fstests ext4/032 may trigger this). Link: https://lore.kernel.org/r/9a54f8a696ff17c057cd571be3d15ac3ec1407f1.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: refactor code inside DOUBLE_CHECK into separate functionRitesh Harjani
This patch implemets mb_group_bb_bitmap_alloc() and mb_group_bb_bitmap_free() function to remove #ifdef DOUBLE_CHECK macro and it's related code from inside ext4_mb_add_groupinfo()/ext4_mb_release(). There should be no functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/8c2095d74b779f0254a19b24982490dc6f07c4f9.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: make ext4_mb_use_preallocated() return type as boolRitesh Harjani
Change return type of function ext4_mb_use_preallocated() to bool to better reflect what this function can return. There should be no functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/7880cb6ef911465beafefcd7e9c3ea214688744b.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: simplify error handling in ext4_init_mballoc()Ritesh Harjani
This patch simplifies error handling logic in ext4_init_mballoc(), by adding all the cleanups at one place at the end of that function. There should be no functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/8621a7bc68f7107a9ac4292afeb784515333bd25.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: fix few other format specifier in mb_debug()Ritesh Harjani
Fix few other format specifiers in mb_debug() msgs. As such no other functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/574fa7f833abf2dbf3b53a2fea3195e71f6cdbd8.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: correct the mb_debug() format specifier for pa_len varRitesh Harjani
pa->pa_len is an integer. Fix all of the format specifier used in mb_debug() for pa_len to %d instead of %u. As such no functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/af4987f643c586f62bcc9961e43f0a67151d5551.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: add more mb_debug() msgsRitesh Harjani
This patch adds some more debugging mb_debug() msgs to help improve mballoc code debugging. Other than adding more mb_debug() msgs at few more places, there should be no other functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/5fc8e7788b924e211fcfa4a4c1d2f8503511661a.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: refactor ext4_mb_show_ac()Ritesh Harjani
This factors out ext4_mb_show_pa() function to show all the group's preallocation info. This could be useful info to be added in later patches. There should be no functionality change in this patch. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/8f07d890b0038dcc935e9c10e6043ec9f3792721.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: mballoc: print bb_free info even when it is 0Ritesh Harjani
Improve the debugging msg by also printing even if bb_free is 0. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c894f1d1d30f86ae38f4e3a861949665b6dc61cd.1589086800.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: avoid ext4_error()'s caused by ENOMEM in the truncate pathTheodore Ts'o
We can't fail in the truncate path without requiring an fsck. Add work around for this by using a combination of retry loops and the __GFP_NOFAIL flag. From: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Anna Pendleton <pendleton@google.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20200507175028.15061-1-pendleton@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: fix race between ext4_sync_parent() and rename()Eric Biggers
'igrab(d_inode(dentry->d_parent))' without holding dentry->d_lock is broken because without d_lock, d_parent can be concurrently changed due to a rename(). Then if the old directory is immediately deleted, old d_parent->inode can be NULL. That causes a NULL dereference in igrab(). To fix this, use dget_parent() to safely grab a reference to the parent dentry, which pins the inode. This also eliminates the need to use d_find_any_alias() other than for the initial inode, as we no longer throw away the dentry at each step. This is an extremely hard race to hit, but it is possible. Adding a udelay() in between the reads of ->d_parent and its ->d_inode makes it reproducible on a no-journal filesystem using the following program: #include <fcntl.h> #include <unistd.h> int main() { if (fork()) { for (;;) { mkdir("dir1", 0700); int fd = open("dir1/file", O_RDWR|O_CREAT|O_SYNC); write(fd, "X", 1); close(fd); } } else { mkdir("dir2", 0700); for (;;) { rename("dir1/file", "dir2/file"); rmdir("dir1"); } } } Fixes: d59729f4e794 ("ext4: fix races in ext4_sync_parent()") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20200506183140.541194-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: fix a typo in a commentChristophe JAILLET
s/extnets/extents/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20200503200647.154701-1-christophe.jaillet@wanadoo.fr Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: clean up ext4_ext_convert_to_initialized() error handlingEric Whitney
If ext4_ext_convert_to_initialized() fails when called within ext4_ext_handle_unwritten_extents(), immediately error out through the exit point at function end. Fix the error handling in the event ext4_ext_convert_to_initialized() returns 0, which it shouldn't do when converting an existing extent. The current code returns the passed in value of allocated (which is likely non-zero) while failing to set m_flags, m_pblk, and m_len. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200430185320.23001-5-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: clean up GET_BLOCKS_PRE_IO error handlingEric Whitney
If the call to ext4_split_convert_extents() fails in the EXT4_GET_BLOCKS_PRE_IO case within ext4_ext_handle_unwritten_extents(), error out through the exit point at function end rather than jumping through an intermediate point. Fix the error handling in the event ext4_split_convert_extents() returns 0, which it shouldn't do when splitting an existing extent. The current code returns the passed in value of allocated (which is likely non-zero) while failing to set m_flags, m_pblk, and m_len. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200430185320.23001-4-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: remove redundant GET_BLOCKS_CONVERT codeEric Whitney
Remove the redundant code assigning values to ext4_map_blocks components in ext4_ext_handle_unwritten_extents() for the EXT4_GET_BLOCKS_CONVERT case, using the code at the function exit instead. Clean up and reorder that code to eliminate more redundancy and improve readability. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200430185320.23001-3-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: remove dead GET_BLOCKS_ZERO codeEric Whitney
There's no call to ext4_map_blocks() in the current ext4 code with a flags argument that combines EXT4_GET_BLOCKS_CONVERT and EXT4_GET_BLOCKS_ZERO. Remove the code that corresponds to this case from ext4_ext_handle_unwritten_extents(). Signed-off-by: Eric Whitney <enwlinux@gmail.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200430185320.23001-2-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: don't ignore return values from ext4_ext_dirty()Harshad Shirwadkar
Don't ignore return values from ext4_ext_dirty, since the errors indicate valid failures below Ext4. In all of the other instances of ext4_ext_dirty calls, the error return value is handled in some way. This patch makes those remaining couple of places to handle ext4_ext_dirty errors as well. In case of ext4_split_extent_at(), the ignorance of return value is intentional. The reason is that we are already in error path and there isn't much we can do if ext4_ext_dirty returns error. This patch adds a comment for that case explaining why we ignore the return value. In the longer run, we probably should make sure that errors from other mark_dirty routines are handled as well. Ran gce-xfstests smoke tests and verified that there were no regressions. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200427013438.219117-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: handle ext4_mark_inode_dirty errorsHarshad Shirwadkar
ext4_mark_inode_dirty() can fail for real reasons. Ignoring its return value may lead ext4 to ignore real failures that would result in corruption / crashes. Harden ext4_mark_inode_dirty error paths to fail as soon as possible and return errors to the caller whenever appropriate. One of the possible scnearios when this bug could affected is that while creating a new inode, its directory entry gets added successfully but while writing the inode itself mark_inode_dirty returns error which is ignored. This would result in inconsistency that the directory entry points to a non-existent inode. Ran gce-xfstests smoke tests and verified that there were no regressions. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20200427013438.219117-1-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: fix error pointer dereferenceJeffle Xu
Don't pass error pointers to brelse(). commit 7159a986b420 ("ext4: fix some error pointer dereferences") has fixed some cases, fix the remaining one case. Once ext4_xattr_block_find()->ext4_sb_bread() failed, error pointer is stored in @bs->bh, which will be passed to brelse() in the cleanup routine of ext4_xattr_set_handle(). This will then cause a NULL panic crash in __brelse(). BUG: unable to handle kernel NULL pointer dereference at 000000000000005b RIP: 0010:__brelse+0x1b/0x50 Call Trace: ext4_xattr_set_handle+0x163/0x5d0 ext4_xattr_set+0x95/0x110 __vfs_setxattr+0x6b/0x80 __vfs_setxattr_noperm+0x68/0x1b0 vfs_setxattr+0xa0/0xb0 setxattr+0x12c/0x1a0 path_setxattr+0x8d/0xc0 __x64_sys_setxattr+0x27/0x30 do_syscall_64+0x60/0x250 entry_SYSCALL_64_after_hwframe+0x49/0xbe In this case, @bs->bh stores '-EIO' actually. Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: stable@kernel.org # 2.6.19 Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1587628004-95123-1-git-send-email-jefflexu@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: Avoid freeing inodes on dirty listJan Kara
When we are evicting inode with journalled data, we may race with transaction commit in the following way: CPU0 CPU1 jbd2_journal_commit_transaction() evict(inode) inode_io_list_del() inode_wait_for_writeback() process BJ_Forget list __jbd2_journal_insert_checkpoint() __jbd2_journal_refile_buffer() __jbd2_journal_unfile_buffer() if (test_clear_buffer_jbddirty(bh)) mark_buffer_dirty(bh) __mark_inode_dirty(inode) ext4_evict_inode(inode) frees the inode This results in use-after-free issues in the writeback code (or the assertion added in the previous commit triggering). Fix the problem by removing inode from writeback lists once all the page cache is evicted and so inode cannot be added to writeback lists again. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200421085445.5731-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: fix buffer_head refcnt leak when ext4_iget() failsXiyu Yang
ext4_orphan_get() invokes ext4_read_inode_bitmap(), which returns a reference of the specified buffer_head object to "bitmap_bh" with increased refcnt. When ext4_orphan_get() returns, local variable "bitmap_bh" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of ext4_orphan_get(). When ext4_iget() fails, the function forgets to decrease the refcnt increased by ext4_read_inode_bitmap(), causing a refcnt leak. Fix this issue by calling brelse() when ext4_iget() fails. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/1587618568-13418-1-git-send-email-xiyuyang19@fudan.edu.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: fix EXT_MAX_EXTENT/INDEX to check for zeroed eh_maxHarshad Shirwadkar
If eh->eh_max is 0, EXT_MAX_EXTENT/INDEX would evaluate to unsigned (-1) resulting in illegal memory accesses. Although there is no consistent repro, we see that generic/019 sometimes crashes because of this bug. Ran gce-xfstests smoke and verified that there were no regressions. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20200421023959.20879-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2020-06-03ext4: remove unnecessary comparisons to boolJason Yan
Fix the following coccicheck warning: fs/ext4/extents_status.c:1057:5-28: WARNING: Comparison to bool fs/ext4/inode.c:2314:18-24: WARNING: Comparison to bool Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200420042918.19459-1-yanaijie@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: remove EXT4_GET_BLOCKS_KEEP_SIZE flagEric Whitney
The eofblocks code was removed in the 5.7 release by "ext4: remove EOFBLOCKS_FL and associated code" (4337ecd1fe99). The ext4_map_blocks() flag used to trigger it can now be removed as well. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200415203140.30349-2-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03ext4: fix a style issue in fs/ext4/acl.cCarlos Guerrero Álvarez
Fixed an if statement where braces were not needed. Link: https://lore.kernel.org/r/20200416141456.1089-1-carlosteniswarrior@gmail.com Signed-off-by: Carlos Guerrero Álvarez <carlosteniswarrior@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
2020-06-02Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "Core block changes that have been queued up for this release: - Remove dead blk-throttle and blk-wbt code (Guoqing) - Include pid in blktrace note traces (Jan) - Don't spew I/O errors on wouldblock termination (me) - Zone append addition (Johannes, Keith, Damien) - IO accounting improvements (Konstantin, Christoph) - blk-mq hardware map update improvements (Ming) - Scheduler dispatch improvement (Salman) - Inline block encryption support (Satya) - Request map fixes and improvements (Weiping) - blk-iocost tweaks (Tejun) - Fix for timeout failing with error injection (Keith) - Queue re-run fixes (Douglas) - CPU hotplug improvements (Christoph) - Queue entry/exit improvements (Christoph) - Move DMA drain handling to the few drivers that use it (Christoph) - Partition handling cleanups (Christoph)" * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits) block: mark bio_wouldblock_error() bio with BIO_QUIET blk-wbt: rename __wbt_update_limits to wbt_update_limits blk-wbt: remove wbt_update_limits blk-throttle: remove tg_drain_bios blk-throttle: remove blk_throtl_drain null_blk: force complete for timeout request blk-mq: drain I/O when all CPUs in a hctx are offline blk-mq: add blk_mq_all_tag_iter blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx blk-mq: use BLK_MQ_NO_TAG in more places blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG blk-mq: move more request initialization to blk_mq_rq_ctx_init blk-mq: simplify the blk_mq_get_request calling convention blk-mq: remove the bio argument to ->prepare_request nvme: force complete cancelled requests blk-mq: blk-mq: provide forced completion method block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds block: blk-crypto-fallback: remove redundant initialization of variable err block: reduce part_stat_lock() scope block: use __this_cpu_add() instead of access by smp_processor_id() ...
2020-06-02ext4: pass the inode to ext4_mpage_readpagesMatthew Wilcox (Oracle)
This function now only uses the mapping argument to look up the inode, and both callers already have the inode, so just pass the inode instead of the mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-22-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02ext4: convert from readpages to readaheadMatthew Wilcox (Oracle)
Use the new readahead operation in ext4 Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-21-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: add page_cache_readahead_unboundedMatthew Wilcox (Oracle)
ext4 and f2fs have duplicated the guts of the readahead code so they can read past i_size. Instead, separate out the guts of the readahead code so they can call it directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-14-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-01fs: ext4: default KUNIT_* fragments to KUNIT_ALL_TESTSAnders Roxell
This makes it easier to enable all KUnit fragments. Adding 'if !KUNIT_ALL_TESTS' so individual tests can not be turned off. Therefore if KUNIT_ALL_TESTS is enabled that will hide the prompt in menuconfig. Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-06-01Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt updates from Eric Biggers: - Add the IV_INO_LBLK_32 encryption policy flag which modifies the encryption to be optimized for eMMC inline encryption hardware. - Make the test_dummy_encryption mount option for ext4 and f2fs support v2 encryption policies. - Fix kerneldoc warnings and some coding style inconsistencies. * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: add support for IV_INO_LBLK_32 policies fscrypt: make test_dummy_encryption use v2 by default fscrypt: support test_dummy_encryption=v2 fscrypt: add fscrypt_add_test_dummy_key() linux/parser.h: add include guards fscrypt: remove unnecessary extern keywords fscrypt: name all function parameters fscrypt: fix all kerneldoc warnings
2020-05-31vfs, afs, ext4: Make the inode hash table RCU searchableDavid Howells
Make the inode hash table RCU searchable so that searches that want to access or modify an inode without taking a ref on that inode can do so without taking the inode hash table lock. The main thing this requires is some RCU annotation on the list manipulation operations. Inodes are already freed by RCU in most cases. Users of this interface must take care as the inode may be still under construction or may be being torn down around them. There are at least three instances where this can be of use: (1) Testing whether the inode number iunique() is going to return is currently unique (the iunique_lock is still held). (2) Ext4 date stamp updating. (3) AFS callback breaking. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> cc: linux-ext4@vger.kernel.org cc: linux-afs@lists.infradead.org
2020-05-28fs/ext4: Introduce DAX inode flagIra Weiny
Add a flag ([EXT4|FS]_DAX_FL) to preserve FS_XFLAG_DAX in the ext4 inode. Set the flag to be user visible and changeable. Set the flag to be inherited. Allow applications to change the flag at any time except if it conflicts with the set of mutually exclusive flags (Currently VERITY, ENCRYPT, JOURNAL_DATA). Furthermore, restrict setting any of the exclusive flags if DAX is set. While conceptually possible, we do not allow setting EXT4_DAX_FL while at the same time clearing exclusion flags (or vice versa) for 2 reasons: 1) The DAX flag does not take effect immediately which introduces quite a bit of complexity 2) There is no clear use case for being this flexible Finally, on regular files, flag the inode to not be cached to facilitate changing S_DAX on the next creation of the inode. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-9-ira.weiny@intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28fs/ext4: Remove jflag variableIra Weiny
The jflag variable serves almost no purpose. Remove it. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-8-ira.weiny@intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28fs/ext4: Make DAX mount option a tri-stateIra Weiny
We add 'always', 'never', and 'inode' (default). '-o dax' continues to operate the same which is equivalent to 'always'. This new functionality is limited to ext4 only. Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set it and EXT4_MOUNT_DAX_ALWAYS appropriately for the mode. We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX. Finally, EXT4_MOUNT2_DAX_INODE is used solely to detect if the user specified that option for printing. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-7-ira.weiny@intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28fs/ext4: Only change S_DAX on inode loadIra Weiny
To prevent complications with in memory inodes we only set S_DAX on inode load. FS_XFLAG_DAX can be changed at any time and S_DAX will change after inode eviction and reload. Add init bool to ext4_set_inode_flags() to indicate if the inode is being newly initialized. Assert that S_DAX is not set on an inode which is just being loaded. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-6-ira.weiny@intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28fs/ext4: Update ext4_should_use_dax()Ira Weiny
S_DAX should only be enabled when the underlying block device supports dax. Cache the underlying support for DAX in the super block and modify ext4_should_use_dax() to check for device support prior to the over riding mount option. While we are at it change the function to ext4_should_enable_dax() as this better reflects the ask as well as matches xfs. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-5-ira.weiny@intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYSIra Weiny
In prep for the new tri-state mount option which then introduces EXT4_MOUNT_DAX_NEVER. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-4-ira.weiny@intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28fs/ext4: Disallow verity if inode is DAXIra Weiny
Verity and DAX are incompatible. Changing the DAX mode due to a verity flag change is wrong without a corresponding address_space_operations update. Make the 2 options mutually exclusive by returning an error if DAX was set first. (Setting DAX is already disabled if Verity is set first.) Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-3-ira.weiny@intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28fs/ext4: Narrow scope of DAX check in setflagsIra Weiny
When preventing DAX and journaling on an inode. Use the effective DAX check rather than the mount option. This will be required to support per inode DAX flags. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-2-ira.weiny@intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-22block: remove the error_sector argument to blkdev_issue_flushChristoph Hellwig
The argument isn't used by any caller, and drivers don't fill out bi_sector for flush requests either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19ext4: fix fiemap size checks for bitmap filesChristoph Hellwig
Add an extra validation of the len parameter, as for ext4 some files might have smaller file size limits than others. This also means the redundant size check in ext4_ioctl_get_es_cache can go away, as all size checking is done in the shared fiemap handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200505154324.3226743-3-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-19ext4: fix EXT4_MAX_LOGICAL_BLOCK macroRitesh Harjani
ext4 supports max number of logical blocks in a file to be 0xffffffff. (This is since ext4_extent's ee_block is __le32). This means that EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting from 0 logical offset). This patch fixes this. The issue was seen when ext4 moved to iomap_fiemap API and when overlayfs was mounted on top of ext4. Since overlayfs was missing filemap_check_ranges(), so it could pass a arbitrary huge length which lead to overflow of map.m_len logic. This patch fixes that. Fixes: d3b6f23f7167 ("ext4: move ext4_fiemap to use iomap framework") Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-18fscrypt: support test_dummy_encryption=v2Eric Biggers
v1 encryption policies are deprecated in favor of v2, and some new features (e.g. encryption+casefolding) are only being added for v2. Therefore, the "test_dummy_encryption" mount option (which is used for encryption I/O testing with xfstests) needs to support v2 policies. To do this, extend its syntax to be "test_dummy_encryption=v1" or "test_dummy_encryption=v2". The existing "test_dummy_encryption" (no argument) also continues to be accepted, to specify the default setting -- currently v1, but the next patch changes it to v2. To cleanly support both v1 and v2 while also making it easy to support specifying other encryption settings in the future (say, accepting "$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a pointer to the dummy fscrypt_context rather than using mount flags. To avoid concurrency issues, don't allow test_dummy_encryption to be set or changed during a remount. (The former restriction is new, but xfstests doesn't run into it, so no one should notice.) Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4, there are two regressions, both of which are test bugs: ext4/023 and ext4/028 fail because they set an xattr and expect it to be stored inline, but the increase in size of the fscrypt_context from 24 to 40 bytes causes this xattr to be spilled into an external block. Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-14ext4: remove redundant variable has_bigalloc in ext4_fill_superKaixu Xia
We can use the ext4_has_feature_bigalloc() function directly to check bigalloc feature and the variable has_bigalloc is reduncant, so remove it. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1586935542-29588-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>