summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2021-05-02Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff all over the place" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: useful constants: struct qstr for ".." hostfs_open(): don't open-code file_dentry() whack-a-mole: kill strlen_user() (again) autofs: should_expire() argument is guaranteed to be positive apparmor:match_mn() - constify devpath argument buffer: a small optimization in grow_buffers get rid of autofs_getpath() constify dentry argument of dentry_path()/dentry_path_raw()
2021-04-30Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New features for ext4 this cycle include support for encrypted casefold, ensure that deleted file names are cleared in directory blocks by zeroing directory entries when they are unlinked or moved as part of a hash tree node split. We also improve the block allocator's performance on a freshly mounted file system by prefetching block bitmaps. There are also the usual cleanups and bug fixes, including fixing a page cache invalidation race when there is mixed buffered and direct I/O and the block size is less than page size, and allow the dax flag to be set and cleared on inline directories" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits) ext4: wipe ext4_dir_entry2 upon file deletion ext4: Fix occasional generic/418 failure fs: fix reporting supported extra file attributes for statx() ext4: allow the dax flag to be set and cleared on inline directories ext4: fix debug format string warning ext4: fix trailing whitespace ext4: fix various seppling typos ext4: fix error return code in ext4_fc_perform_commit() ext4: annotate data race in jbd2_journal_dirty_metadata() ext4: annotate data race in start_this_handle() ext4: fix ext4_error_err save negative errno into superblock ext4: fix error code in ext4_commit_super ext4: always panic when errors=panic is specified ext4: delete redundant uptodate check for buffer ext4: do not set SB_ACTIVE in ext4_orphan_cleanup() ext4: make prefetch_block_bitmaps default ext4: add proc files to monitor new structures ext4: improve cr 0 / cr 1 group scanning ext4: add MB_NUM_ORDERS macro ext4: add mballoc stats proc file ...
2021-04-29Merge tag 'fsnotify_for_v5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: - support for limited fanotify functionality for unpriviledged users - faster merging of fanotify events - a few smaller fsnotify improvements * tag 'fsnotify_for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: shmem: allow reporting fanotify events with file handles on tmpfs fs: introduce a wrapper uuid_to_fsid() fanotify_user: use upper_32_bits() to verify mask fanotify: support limited functionality for unprivileged users fanotify: configurable limits via sysfs fanotify: limit number of event merge attempts fsnotify: use hash table for faster events merge fanotify: mix event info and pid into merge key hash fanotify: reduce event objectid to 29-bit hash fsnotify: allow fsnotify_{peek,remove}_first_event with empty queue
2021-04-27Merge tag 'netfs-lib-20210426' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull network filesystem helper library updates from David Howells: "Here's a set of patches for 5.13 to begin the process of overhauling the local caching API for network filesystems. This set consists of two parts: (1) Add a helper library to handle the new VM readahead interface. This is intended to be used unconditionally by the filesystem (whether or not caching is enabled) and provides a common framework for doing caching, transparent huge pages and, in the future, possibly fscrypt and read bandwidth maximisation. It also allows the netfs and the cache to align, expand and slice up a read request from the VM in various ways; the netfs need only provide a function to read a stretch of data to the pagecache and the helper takes care of the rest. (2) Add an alternative fscache/cachfiles I/O API that uses the kiocb facility to do async DIO to transfer data to/from the netfs's pages, rather than using readpage with wait queue snooping on one side and vfs_write() on the other. It also uses less memory, since it doesn't do buffered I/O on the backing file. Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available to be read from the cache. Whilst this is an improvement from the bmap interface, it still has a problem with regard to a modern extent-based filesystem inserting or removing bridging blocks of zeros. Fixing that requires a much greater overhaul. This is a step towards overhauling the fscache API. The change is opt-in on the part of the network filesystem. A netfs should not try to mix the old and the new API because of conflicting ways of handling pages and the PG_fscache page flag and because it would be mixing DIO with buffered I/O. Further, the helper library can't be used with the old API. This does not change any of the fscache cookie handling APIs or the way invalidation is done at this time. In the near term, I intend to deprecate and remove the old I/O API (fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(), fscache_write_page() and fscache_uncache_page()) and eventually replace most of fscache/cachefiles with something simpler and easier to follow. This patchset contains the following parts: - Some helper patches, including provision of an ITER_XARRAY iov iterator and a function to do readahead expansion. - Patches to add the netfs helper library. - A patch to add the fscache/cachefiles kiocb API. - A pair of patches to fix some review issues in the ITER_XARRAY and read helpers as spotted by Al and Willy. Jeff Layton has patches to add support in Ceph for this that he intends for this merge window. I have a set of patches to support AFS that I will post a separate pull request for. With this, AFS without a cache passes all expected xfstests; with a cache, there's an extra failure, but that's also there before these patches. Fixing that probably requires a greater overhaul. Ceph also passes the expected tests. I also have patches in a separate branch to tidy up the handling of PG_fscache/PG_private_2 and their contribution to page refcounting in the core kernel here, but I haven't included them in this set and will route them separately" Link: https://lore.kernel.org/lkml/3779937.1619478404@warthog.procyon.org.uk/ * tag 'netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: netfs: Miscellaneous fixes iov_iter: Four fixes for ITER_XARRAY fscache, cachefiles: Add alternate API to use kiocb for read/write to cache netfs: Add a tracepoint to log failures that would be otherwise unseen netfs: Define an interface to talk to a cache netfs: Add write_begin helper netfs: Gather stats netfs: Add tracepoints netfs: Provide readahead and readpage netfs helpers netfs, mm: Add set/end/wait_on_page_fscache() aliases netfs, mm: Move PG_fscache helper funcs to linux/netfs.h netfs: Documentation for helper library netfs: Make a netfs helper module mm: Implement readahead_control pageset expansion mm/readahead: Handle ractl nr_pages being modified fs: Document file_ra_state mm/filemap: Pass the file_ra_state in the ractl mm: Add set/end/wait functions for PG_private_2 iov_iter: Add ITER_XARRAY
2021-04-27Merge tag 'fs.idmapped.helpers.v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fs mapping helper updates from Christian Brauner: "This adds kernel-doc to all new idmapping helpers and improves their naming which was triggered by a discussion with some fs developers. Some of the names are based on suggestions by Vivek and Al. Also remove the open-coded permission checking in a few places with simple helpers. Overall this should lead to more clarity and make it easier to maintain" * tag 'fs.idmapped.helpers.v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: fs: introduce two inode i_{u,g}id initialization helpers fs: introduce fsuidgid_has_mapping() helper fs: document and rename fsid helpers fs: document mapping helpers
2021-04-27Merge branch 'miklos.fileattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fileattr conversion updates from Miklos Szeredi via Al Viro: "This splits the handling of FS_IOC_[GS]ETFLAGS from ->ioctl() into a separate method. The interface is reasonably uniform across the filesystems that support it and gives nice boilerplate removal" * 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) ovl: remove unneeded ioctls fuse: convert to fileattr fuse: add internal open/release helpers fuse: unsigned open flags fuse: move ioctl to separate source file vfs: remove unused ioctl helpers ubifs: convert to fileattr reiserfs: convert to fileattr ocfs2: convert to fileattr nilfs2: convert to fileattr jfs: convert to fileattr hfsplus: convert to fileattr efivars: convert to fileattr xfs: convert to fileattr orangefs: convert to fileattr gfs2: convert to fileattr f2fs: convert to fileattr ext4: convert to fileattr ext2: convert to fileattr btrfs: convert to fileattr ...
2021-04-23mm/filemap: Pass the file_ra_state in the ractlMatthew Wilcox (Oracle)
For readahead_expand(), we need to modify the file ra_state, so pass it down by adding it to the ractl. We have to do this because it's not always the same as f_ra in the struct file that is already being passed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> Link: https://lore.kernel.org/r/20210407201857.3582797-2-willy@infradead.org/ Link: https://lore.kernel.org/r/161789067431.6155.8063840447229665720.stgit@warthog.procyon.org.uk/ # v6
2021-04-22ext4: wipe ext4_dir_entry2 upon file deletionLeah Rumancik
Upon file deletion, zero out all fields in ext4_dir_entry2 besides rec_len. In case sensitive data is stored in filenames, this ensures no potentially sensitive data is left in the directory entry upon deletion. Also, wipe these fields upon moving a directory entry during the conversion to an htree and when splitting htree nodes. The data wiped may still exist in the journal, but there are future commits planned to address this. Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Link: https://lore.kernel.org/r/20210422180834.2242353-1-leah.rumancik@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-22ext4: Fix occasional generic/418 failureJan Kara
Eric has noticed that after pagecache read rework, generic/418 is occasionally failing for ext4 when blocksize < pagesize. In fact, the pagecache rework just made hard to hit race in ext4 more likely. The problem is that since ext4 conversion of direct IO writes to iomap framework (commit 378f32bab371), we update inode size after direct IO write only after invalidating page cache. Thus if buffered read sneaks at unfortunate moment like: CPU1 - write at offset 1k CPU2 - read from offset 0 iomap_dio_rw(..., IOMAP_DIO_FORCE_WAIT); ext4_readpage(); ext4_handle_inode_extension() the read will zero out tail of the page as it still sees smaller inode size and thus page cache becomes inconsistent with on-disk contents with all the consequences. Fix the problem by moving inode size update into end_io handler which gets called before the page cache is invalidated. Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com> Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20210415155417.4734-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-19fs: introduce a wrapper uuid_to_fsid()Amir Goldstein
Some filesystem's use a digest of their uuid for f_fsid. Create a simple wrapper for this open coded folding. Filesystems that have a non null uuid but use the block device number for f_fsid may also consider using this helper. [JK: Added missing asm/byteorder.h include] Link: https://lore.kernel.org/r/20210322173944.449469-2-amir73il@gmail.com Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-04-15useful constants: struct qstr for ".."Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-04-12ext4: allow the dax flag to be set and cleared on inline directoriesTheodore Ts'o
This is needed to allow generic/607 to pass for file systems with the inline data_feature enabled, and it allows the use of file systems where the directories use inline_data, while the files are accessed via DAX. Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-12ext4: convert to fileattrMiklos Szeredi
Use the fileattr API to let the VFS handle locking, permission checking and conversion. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: "Theodore Ts'o" <tytso@mit.edu>
2021-04-09ext4: fix debug format string warningArnd Bergmann
Using no_printk() for jbd_debug() revealed two warnings: fs/jbd2/recovery.c: In function 'fc_do_one_pass': fs/jbd2/recovery.c:256:30: error: format '%d' expects a matching 'int' argument [-Werror=format=] 256 | jbd_debug(3, "Processing fast commit blk with seq %d"); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/ext4/fast_commit.c: In function 'ext4_fc_replay_add_range': fs/ext4/fast_commit.c:1732:30: error: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int' [-Werror=format=] 1732 | jbd_debug(1, "Converting from %d to %d %lld", The first one was added incorrectly, and was also missing a few newlines in debug output, and the second one happened when the type of an argument changed. Reported-by: kernel test robot <lkp@intel.com> Fixes: d556435156b7 ("jbd2: avoid -Wempty-body warnings") Fixes: 6db074618969 ("ext4: use BIT() macro for BH_** state bits") Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210409201211.1866633-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix trailing whitespaceJack Qiu
Made suggested modifications from checkpatch in reference to ERROR: trailing whitespace Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20210409042035.15516-1-jack.qiu@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix various seppling typosBhaskar Chowdhury
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Link: https://lore.kernel.org/r/cover.1616840203.git.unixbhaskar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix error return code in ext4_fc_perform_commit()Xu Yihang
In case of if not ext4_fc_add_tlv branch, an error return code is missing. Cc: stable@kernel.org Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Yihang <xuyihang@huawei.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210408070033.123047-1-xuyihang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix ext4_error_err save negative errno into superblockYe Bin
Fix As write_mmp_block() so that it returns -EIO instead of 1, so that the correct error gets saved into the superblock. Cc: stable@kernel.org Fixes: 54d3adbc29f0 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()") Reported-by: Liu Zhi Qiang <liuzhiqiang26@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210406025331.148343-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix error code in ext4_commit_superFengnan Chang
We should set the error code when ext4_commit_super check argument failed. Found in code review. Fixes: c4be0c1dc4cdc ("filesystem freeze: add error handling of write_super_lockfs/unlockfs"). Cc: stable@kernel.org Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: always panic when errors=panic is specifiedYe Bin
Before commit 014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()"), the following series of commands would trigger a panic: 1. mount /dev/sda -o ro,errors=panic test 2. mount /dev/sda -o remount,abort test After commit 014c9caa29d3, remounting a file system using the test mount option "abort" will no longer trigger a panic. This commit will restore the behaviour immediately before commit 014c9caa29d3. (However, note that the Linux kernel's behavior has not been consistent; some previous kernel versions, including 5.4 and 4.19 similarly did not panic after using the mount option "abort".) This also makes a change to long-standing behaviour; namely, the following series commands will now cause a panic, when previously it did not: 1. mount /dev/sda -o ro,errors=panic test 2. echo test > /sys/fs/ext4/sda/trigger_fs_error However, this makes ext4's behaviour much more consistent, so this is a good thing. Cc: stable@kernel.org Fixes: 014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()") Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20210401081903.3421208-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: delete redundant uptodate check for bufferYang Guo
The buffer uptodate state has been checked in function set_buffer_uptodate, there is no need use buffer_uptodate before calling set_buffer_uptodate and delete it. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Yang Guo <guoyang2@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/1617260610-29770-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()Zhang Yi
When CONFIG_QUOTA is enabled, if we failed to mount the filesystem due to some error happens behind ext4_orphan_cleanup(), it will end up triggering a after free issue of super_block. The problem is that ext4_orphan_cleanup() will set SB_ACTIVE flag if CONFIG_QUOTA is enabled, after we cleanup the truncated inodes, the last iput() will put them into the lru list, and these inodes' pages may probably dirty and will be write back by the writeback thread, so it could be raced by freeing super_block in the error path of mount_bdev(). After check the setting of SB_ACTIVE flag in ext4_orphan_cleanup(), it was used to ensure updating the quota file properly, but evict inode and trash data immediately in the last iput does not affect the quotafile, so setting the SB_ACTIVE flag seems not required[1]. Fix this issue by just remove the SB_ACTIVE setting. [1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00 Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Tested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331033138.918975-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: make prefetch_block_bitmaps defaultHarshad Shirwadkar
Block bitmap prefetching is needed for these allocator optimization data structures to get populated and provide better group scanning order. So, turn it on bu default. prefetch_block_bitmaps mount option is now marked as removed and a new option no_prefetch_block_bitmaps is added to disable block bitmap prefetching. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-8-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: add proc files to monitor new structuresHarshad Shirwadkar
This patch adds a new file "mb_structs_summary" which allows us to see the summary of the new allocator structures added in this series. Here's the sample output of file: optimize_scan: 1 max_free_order_lists: list_order_0_groups: 0 list_order_1_groups: 0 list_order_2_groups: 0 list_order_3_groups: 0 list_order_4_groups: 0 list_order_5_groups: 0 list_order_6_groups: 0 list_order_7_groups: 0 list_order_8_groups: 0 list_order_9_groups: 0 list_order_10_groups: 0 list_order_11_groups: 0 list_order_12_groups: 0 list_order_13_groups: 40 fragment_size_tree: tree_min: 16384 tree_max: 32768 tree_nodes: 40 Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-7-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: improve cr 0 / cr 1 group scanningHarshad Shirwadkar
Instead of traversing through groups linearly, scan groups in specific orders at cr 0 and cr 1. At cr 0, we want to find groups that have the largest free order >= the order of the request. So, with this patch, we maintain lists for each possible order and insert each group into a list based on the largest free order in its buddy bitmap. During cr 0 allocation, we traverse these lists in the increasing order of largest free orders. This allows us to find a group with the best available cr 0 match in constant time. If nothing can be found, we fallback to cr 1 immediately. At CR1, the story is slightly different. We want to traverse in the order of increasing average fragment size. For CR1, we maintain a rb tree of groupinfos which is sorted by average fragment size. Instead of traversing linearly, at CR1, we traverse in the order of increasing average fragment size, starting at the most optimal group. This brings down cr 1 search complexity to log(num groups). For cr >= 2, we just perform the linear search as before. Also, in case of lock contention, we intermittently fallback to linear search even in CR 0 and CR 1 cases. This allows us to proceed during the allocation path even in case of high contention. There is an opportunity to do optimization at CR2 too. That's because at CR2 we only consider groups where bb_free counter (number of free blocks) is greater than the request extent size. That's left as future work. All the changes introduced in this patch are protected under a new mount option "mb_optimize_scan". With this patchset, following experiment was performed: Created a highly fragmented disk of size 65TB. The disk had no contiguous 2M regions. Following command was run consecutively for 3 times: time dd if=/dev/urandom of=file bs=2M count=10 Here are the results with and without cr 0/1 optimizations introduced in this patch: |---------+------------------------------+---------------------------| | | Without CR 0/1 Optimizations | With CR 0/1 Optimizations | |---------+------------------------------+---------------------------| | 1st run | 5m1.871s | 2m47.642s | | 2nd run | 2m28.390s | 0m0.611s | | 3rd run | 2m26.530s | 0m1.255s | |---------+------------------------------+---------------------------| Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210401172129.189766-6-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: add MB_NUM_ORDERS macroHarshad Shirwadkar
A few arrays in mballoc.c use the total number of valid orders as their size. Currently, this value is set as "sb->s_blocksize_bits + 2". This makes code harder to read. So, instead add a new macro MB_NUM_ORDERS(sb) to make the code more readable. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-5-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: add mballoc stats proc fileHarshad Shirwadkar
Add new stats for measuring the performance of mballoc. This patch is forked from Artem Blagodarenko's work that can be found here: https://github.com/lustre/lustre-release/blob/master/ldiskfs/kernel_patches/patches/rhel8/ext4-simple-blockalloc.patch This patch reorganizes the stats by cr level. This is how the output looks like: mballoc: reqs: 0 success: 0 groups_scanned: 0 cr0_stats: hits: 0 groups_considered: 0 useless_loops: 0 bad_suggestions: 0 cr1_stats: hits: 0 groups_considered: 0 useless_loops: 0 bad_suggestions: 0 cr2_stats: hits: 0 groups_considered: 0 useless_loops: 0 cr3_stats: hits: 0 groups_considered: 0 useless_loops: 0 extents_scanned: 0 goal_hits: 0 2^n_hits: 0 breaks: 0 lost: 0 buddies_generated: 0/40 buddies_time_used: 0 preallocated: 0 discarded: 0 Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-4-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: add ability to return parsed options from parse_optionsHarshad Shirwadkar
Before this patch, the function parse_options() was returning journal_devnum and journal_ioprio variables to the caller. This patch generalizes that interface to allow parse_options to return any parsed options to return back to the caller. In this patch series, it gets used to capture the value of "mb_optimize_scan=%u" mount option. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-3-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: drop s_mb_bal_lock and convert protected fields to atomicHarshad Shirwadkar
s_mb_buddies_generated gets used later in this patch series to determine if the cr 0 and cr 1 optimziations should be performed or not. Currently, s_mb_buddies_generated is protected under a spin_lock. In the allocation path, it is better if we don't depend on the lock and instead read the value atomically. In order to do that, we drop s_bal_lock altogether and we convert the only two protected fields by it s_mb_buddies_generated and s_mb_generation_time to atomic type. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix check to prevent false positive report of incorrect used inodesZhang Yi
Commit <50122847007> ("ext4: fix check to prevent initializing reserved inodes") check the block group zero and prevent initializing reserved inodes. But in some special cases, the reserved inode may not all belong to the group zero, it may exist into the second group if we format filesystem below. mkfs.ext4 -b 4096 -g 8192 -N 1024 -I 4096 /dev/sda So, it will end up triggering a false positive report of a corrupted file system. This patch fix it by avoid check reserved inodes if no free inode blocks will be zeroed. Cc: stable@kernel.org Fixes: 50122847007 ("ext4: fix check to prevent initializing reserved inodes") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331121516.2243099-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-08treewide: Change list_sort to use const pointersSami Tolvanen
list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-04-05ext4: optimize match for casefolded encrypted dirsDaniel Rosenberg
Matching names with casefolded encrypting directories requires decrypting entries to confirm case since we are case preserving. We can avoid needing to decrypt if our hash values don't match. Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210319073414.1381041-3-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-05ext4: handle casefolding with encryptionDaniel Rosenberg
This adds support for encryption with casefolding. Since the name on disk is case preserving, and also encrypted, we can no longer just recompute the hash on the fly. Additionally, to avoid leaking extra information from the hash of the unencrypted name, we use siphash via an fscrypt v2 policy. The hash is stored at the end of the directory entry for all entries inside of an encrypted and casefolded directory apart from those that deal with '.' and '..'. This way, the change is backwards compatible with existing ext4 filesystems. [ Changed to advertise this feature via the file: /sys/fs/ext4/features/encrypted_casefold -- TYT ] Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210319073414.1381041-2-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-02ext4: remove unnecessary braces in fs/ext4/dir.cMilan Djurovic
Removes braces to follow the coding style. Signed-off-by: Milan Djurovic <mdjurovic@zohomail.com> Link: https://lore.kernel.org/r/20210316052953.67616-1-mdjurovic@zohomail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-25ext4: use memcpy_to_page() in pagecache_write()Chaitanya Kulkarni
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210207190425.38107-7-chaitanya.kulkarni@wdc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-25ext4: use memcpy_from_page() in pagecache_read()Chaitanya Kulkarni
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210207190425.38107-6-chaitanya.kulkarni@wdc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-23fs: introduce two inode i_{u,g}id initialization helpersChristian Brauner
Give filesystem two little helpers that do the right thing when initializing the i_uid and i_gid fields on idmapped and non-idmapped mounts. Filesystems shouldn't have to be concerned with too many details. Link: https://lore.kernel.org/r/20210320122623.599086-5-christian.brauner@ubuntu.com Inspired-by: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-03-23fs: document and rename fsid helpersChristian Brauner
Vivek pointed out that the fs{g,u}id_into_mnt() naming scheme can be misleading as it could be understood as implying they do the exact same thing as i_{g,u}id_into_mnt(). The original motivation for this naming scheme was to signal to callers that the helpers will always take care to map the k{g,u}id such that the ownership is expressed in terms of the mnt_users. Get rid of the confusion by renaming those helpers to something more sensible. Al suggested mapped_fs{g,u}id() which seems a really good fit. Usually filesystems don't need to bother with these helpers directly only in some cases where they allocate objects that carry {g,u}ids which are either filesystem specific (e.g. xfs quota objects) or don't have a clean set of helpers as inodes have. Link: https://lore.kernel.org/r/20210320122623.599086-3-christian.brauner@ubuntu.com Inspired-by: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-03-21Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous ext4 bug fixes for v5.12" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: initialize ret to suppress smatch warning ext4: stop inode update before return ext4: fix rename whiteout with fast commit ext4: fix timer use-after-free on failed mount ext4: fix potential error in ext4_do_update_inode ext4: do not try to set xattr into ea_inode if value is empty ext4: do not iput inode under running transaction in ext4_rename() ext4: find old entry again if failed to rename whiteout ext4: fix error handling in ext4_end_enable_verity() ext4: fix bh ref count on error paths fs/ext4: fix integer overflow in s_log_groups_per_flex ext4: add reclaim checks to xattr code ext4: shrink race window in ext4_should_retry_alloc()
2021-03-21ext4: initialize ret to suppress smatch warningTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-21ext4: stop inode update before returnPan Bian
The inode update should be stopped before returing the error code. Signed-off-by: Pan Bian <bianpan2016@163.com> Link: https://lore.kernel.org/r/20210117085732.93788-1-bianpan2016@163.com Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Cc: stable@kernel.org Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-21ext4: fix rename whiteout with fast commitHarshad Shirwadkar
This patch adds rename whiteout support in fast commits. Note that the whiteout object that gets created is actually char device. Which imples, the function ext4_inode_journal_mode(struct inode *inode) would return "JOURNAL_DATA" for this inode. This has a consequence in fast commit code that it will make creation of the whiteout object a fast-commit ineligible behavior and thus will fall back to full commits. With this patch, this can be observed by running fast commits with rename whiteout and seeing the stats generated by ext4_fc_stats tracepoint as follows: ext4_fc_stats: dev 254:32 fc ineligible reasons: XATTR:0, CROSS_RENAME:0, JOURNAL_FLAG_CHANGE:0, NO_MEM:0, SWAP_BOOT:0, RESIZE:0, RENAME_DIR:0, FALLOC_RANGE:0, INODE_JOURNAL_DATA:16; num_commits:6, ineligible: 6, numblks: 3 So in short, this patch guarantees that in case of rename whiteout, we fall back to full commits. Amir mentioned that instead of creating a new whiteout object for every rename, we can create a static whiteout object with irrelevant nlink. That will make fast commits to not fall back to full commit. But until this happens, this patch will ensure correctness by falling back to full commits. Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Cc: stable@kernel.org Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210316221921.1124955-1-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-21ext4: fix timer use-after-free on failed mountJan Kara
When filesystem mount fails because of corrupted filesystem we first cancel the s_err_report timer reminding fs errors every day and only then we flush s_error_work. However s_error_work may report another fs error and re-arm timer thus resulting in timer use-after-free. Fix the problem by first flushing the work and only after that canceling the s_err_report timer. Reported-by: syzbot+628472a2aac693ab0fcd@syzkaller.appspotmail.com Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210315165906.2175-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-21ext4: fix potential error in ext4_do_update_inodeShijie Luo
If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(), the error code will be overridden, go to out_brelse to avoid this situation. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com Cc: stable@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-21ext4: do not try to set xattr into ea_inode if value is emptyzhangyi (F)
Syzbot report a warning that ext4 may create an empty ea_inode if set an empty extent attribute to a file on the file system which is no free blocks left. WARNING: CPU: 6 PID: 10667 at fs/ext4/xattr.c:1640 ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640 ... Call trace: ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640 ext4_xattr_block_set+0x1d0/0x1b1c fs/ext4/xattr.c:1942 ext4_xattr_set_handle+0x8a0/0xf1c fs/ext4/xattr.c:2390 ext4_xattr_set+0x120/0x1f0 fs/ext4/xattr.c:2491 ext4_xattr_trusted_set+0x48/0x5c fs/ext4/xattr_trusted.c:37 __vfs_setxattr+0x208/0x23c fs/xattr.c:177 ... Now, ext4 try to store extent attribute into an external inode if ext4_xattr_block_set() return -ENOSPC, but for the case of store an empty extent attribute, store the extent entry into the extent attribute block is enough. A simple reproduce below. fallocate test.img -l 1M mkfs.ext4 -F -b 2048 -O ea_inode test.img mount test.img /mnt dd if=/dev/zero of=/mnt/foo bs=2048 count=500 setfattr -n "user.test" /mnt/foo Reported-by: syzbot+98b881fdd8ebf45ab4ae@syzkaller.appspotmail.com Fixes: 9c6e7853c531 ("ext4: reserve space for xattr entries/names") Cc: stable@kernel.org Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210305120508.298465-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-21ext4: do not iput inode under running transaction in ext4_rename()zhangyi (F)
In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into directory, it ends up dropping new created whiteout inode under the running transaction. After commit <9b88f9fb0d2> ("ext4: Do not iput inode under running transaction"), we follow the assumptions that evict() does not get called from a transaction context but in ext4_rename() it breaks this suggestion. Although it's not a real problem, better to obey it, so this patch add inode to orphan list and stop transaction before final iput(). Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-21ext4: find old entry again if failed to rename whiteoutzhangyi (F)
If we failed to add new entry on rename whiteout, we cannot reset the old->de entry directly, because the old->de could have moved from under us during make indexed dir. So find the old entry again before reset is needed, otherwise it may corrupt the filesystem as below. /dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED. /dev/sda: Unattached inode 75 /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. Fixes: 6b4b8e6b4ad ("ext4: fix bug for rename with RENAME_WHITEOUT") Cc: stable@vger.kernel.org Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-11ext4: fix error handling in ext4_end_enable_verity()Eric Biggers
ext4 didn't properly clean up if verity failed to be enabled on a file: - It left verity metadata (pages past EOF) in the page cache, which would be exposed to userspace if the file was later extended. - It didn't truncate the verity metadata at all (either from cache or from disk) if an error occurred while setting the verity bit. Fix these bugs by adding a call to truncate_inode_pages() and ensuring that we truncate the verity metadata (both from cache and from disk) in all error paths. Also rework the code to cleanly separate the success path from the error paths, which makes it much easier to understand. Reported-by: Yunlei He <heyunlei@hihonor.com> Fixes: c93d8f885809 ("ext4: add basic fs-verity support") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20210302200420.137977-2-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-11block: rename BIO_MAX_PAGES to BIO_MAX_VECSChristoph Hellwig
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-06ext4: fix bh ref count on error pathsZhaolong Zhang
__ext4_journalled_writepage should drop bhs' ref count on error paths Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> Link: https://lore.kernel.org/r/1614678151-70481-1-git-send-email-zhangzl2013@126.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>