summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-06ext4: Only advertise encrypted_casefold when encryption and unicode are enabledDaniel Rosenberg
Encrypted casefolding is only supported when both encryption and casefolding are both enabled in the config. Fixes: 471fbbea7ff7 ("ext4: handle casefolding with encryption") Cc: stable@vger.kernel.org # 5.13+ Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06ext4: fix no-key deletion for encrypt+casefoldDaniel Rosenberg
commit 471fbbea7ff7 ("ext4: handle casefolding with encryption") is missing a few checks for the encryption key which are needed to support deleting enrypted casefolded files when the key is not present. This bug made it impossible to delete encrypted+casefolded directories without the encryption key, due to errors like: W : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key Repro steps in kvm-xfstests test appliance: mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc mount /vdc mkdir /vdc/dir chattr +F /vdc/dir keyid=$(head -c 64 /dev/zero | xfs_io -c add_enckey /vdc | awk '{print $NF}') xfs_io -c "set_encpolicy $keyid" /vdc/dir for i in `seq 1 100`; do mkdir /vdc/dir/$i done xfs_io -c "rm_enckey $keyid" /vdc rm -rf /vdc/dir # fails with the bug Fixes: 471fbbea7ff7 ("ext4: handle casefolding with encryption") Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06ext4: fix memory leak in ext4_fill_superAlexey Makhalov
Buffer head references must be released before calling kill_bdev(); otherwise the buffer head (and its page referenced by b_data) will not be freed by kill_bdev, and subsequently that bh will be leaked. If blocksizes differ, sb_set_blocksize() will kill current buffers and page cache by using kill_bdev(). And then super block will be reread again but using correct blocksize this time. sb_set_blocksize() didn't fully free superblock page and buffer head, and being busy, they were not freed and instead leaked. This can easily be reproduced by calling an infinite loop of: systemctl start <ext4_on_lvm>.mount, and systemctl stop <ext4_on_lvm>.mount ... since systemd creates a cgroup for each slice which it mounts, and the bh leak get amplified by a dying memory cgroup that also never gets freed, and memory consumption is much more easily noticed. Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize") Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3") Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2021-06-06ext4: fix fast commit alignment issuesHarshad Shirwadkar
Fast commit recovery data on disk may not be aligned. So, when the recovery code reads it, this patch makes sure that fast commit info found on-disk is first memcpy-ed into an aligned variable before accessing it. As a consequence of it, we also remove some macros that could resulted in unaligned accesses. Cc: stable@kernel.org Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failedYe Bin
We got follow bug_on when run fsstress with injecting IO fault: [130747.323114] kernel BUG at fs/ext4/extents_status.c:762! [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP ...... [130747.334329] Call trace: [130747.334553] ext4_es_cache_extent+0x150/0x168 [ext4] [130747.334975] ext4_cache_extents+0x64/0xe8 [ext4] [130747.335368] ext4_find_extent+0x300/0x330 [ext4] [130747.335759] ext4_ext_map_blocks+0x74/0x1178 [ext4] [130747.336179] ext4_map_blocks+0x2f4/0x5f0 [ext4] [130747.336567] ext4_mpage_readpages+0x4a8/0x7a8 [ext4] [130747.336995] ext4_readpage+0x54/0x100 [ext4] [130747.337359] generic_file_buffered_read+0x410/0xae8 [130747.337767] generic_file_read_iter+0x114/0x190 [130747.338152] ext4_file_read_iter+0x5c/0x140 [ext4] [130747.338556] __vfs_read+0x11c/0x188 [130747.338851] vfs_read+0x94/0x150 [130747.339110] ksys_read+0x74/0xf0 This patch's modification is according to Jan Kara's suggestion in: https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/ "I see. Now I understand your patch. Honestly, seeing how fragile is trying to fix extent tree after split has failed in the middle, I would probably go even further and make sure we fix the tree properly in case of ENOSPC and EDQUOT (those are easily user triggerable). Anything else indicates a HW problem or fs corruption so I'd rather leave the extent tree as is and don't try to fix it (which also means we will not create overlapping extents)." Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-02ext4: fix accessing uninit percpu counter variable with fast_commitRitesh Harjani
When running generic/527 with fast_commit configuration, the following issue is seen on Power. With fast_commit, during ext4_fc_replay() (which can be called from ext4_fill_super()), if inode eviction happens then it can access an uninitialized percpu counter variable. This patch adds the check before accessing the counters in ext4_free_inode() path. [ 321.165371] run fstests generic/527 at 2021-04-29 08:38:43 [ 323.027786] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: block_validity. Quota mode: none. [ 323.618772] BUG: Unable to handle kernel data access on read at 0x1fbd80000 [ 323.619767] Faulting instruction address: 0xc000000000bae78c cpu 0x1: Vector: 300 (Data Access) at [c000000010706ef0] pc: c000000000bae78c: percpu_counter_add_batch+0x3c/0x100 lr: c0000000006d0bb0: ext4_free_inode+0x780/0xb90 pid = 5593, comm = mount ext4_free_inode+0x780/0xb90 ext4_evict_inode+0xa8c/0xc60 evict+0xfc/0x1e0 ext4_fc_replay+0xc50/0x20f0 do_one_pass+0xfe0/0x1350 jbd2_journal_recover+0x184/0x2e0 jbd2_journal_load+0x1c0/0x4a0 ext4_fill_super+0x2458/0x4200 mount_bdev+0x1dc/0x290 ext4_mount+0x28/0x40 legacy_get_tree+0x4c/0xa0 vfs_get_tree+0x4c/0x120 path_mount+0xcf8/0xd70 do_mount+0x80/0xd0 sys_mount+0x3fc/0x490 system_call_exception+0x384/0x3d0 system_call_common+0xec/0x278 Cc: stable@kernel.org Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/6cceb9a75c54bef8fa9696c1b08c8df5ff6169e2.1619692410.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-05-20ext4: fix memory leak in ext4_mb_init_backend on error path.Phillip Potter
Fix a memory leak discovered by syzbot when a file system is corrupted with an illegally large s_log_groups_per_flex. Reported-by: syzbot+aa12d6106ea4ca1b6aae@syzkaller.appspotmail.com Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20210412073837.1686-1-phil@philpotter.co.uk Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-22ext4: wipe ext4_dir_entry2 upon file deletionLeah Rumancik
Upon file deletion, zero out all fields in ext4_dir_entry2 besides rec_len. In case sensitive data is stored in filenames, this ensures no potentially sensitive data is left in the directory entry upon deletion. Also, wipe these fields upon moving a directory entry during the conversion to an htree and when splitting htree nodes. The data wiped may still exist in the journal, but there are future commits planned to address this. Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Link: https://lore.kernel.org/r/20210422180834.2242353-1-leah.rumancik@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-22ext4: Fix occasional generic/418 failureJan Kara
Eric has noticed that after pagecache read rework, generic/418 is occasionally failing for ext4 when blocksize < pagesize. In fact, the pagecache rework just made hard to hit race in ext4 more likely. The problem is that since ext4 conversion of direct IO writes to iomap framework (commit 378f32bab371), we update inode size after direct IO write only after invalidating page cache. Thus if buffered read sneaks at unfortunate moment like: CPU1 - write at offset 1k CPU2 - read from offset 0 iomap_dio_rw(..., IOMAP_DIO_FORCE_WAIT); ext4_readpage(); ext4_handle_inode_extension() the read will zero out tail of the page as it still sees smaller inode size and thus page cache becomes inconsistent with on-disk contents with all the consequences. Fix the problem by moving inode size update into end_io handler which gets called before the page cache is invalidated. Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com> Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20210415155417.4734-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-17fs: fix reporting supported extra file attributes for statx()Theodore Ts'o
statx(2) notes that any attribute that is not indicated as supported by stx_attributes_mask has no usable value. Commits 801e523796004 ("fs: move generic stat response attr handling to vfs_getattr_nosec") and 712b2698e4c02 ("fs/stat: Define DAX statx attribute") sets STATX_ATTR_AUTOMOUNT and STATX_ATTR_DAX, respectively, without setting stx_attributes_mask, which can cause xfstests generic/532 to fail. Fix this in the same way as commit 1b9598c8fb99 ("xfs: fix reporting supported extra file attributes for statx()") Fixes: 801e523796004 ("fs: move generic stat response attr handling to vfs_getattr_nosec") Fixes: 712b2698e4c02 ("fs/stat: Define DAX statx attribute") Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-12ext4: allow the dax flag to be set and cleared on inline directoriesTheodore Ts'o
This is needed to allow generic/607 to pass for file systems with the inline data_feature enabled, and it allows the use of file systems where the directories use inline_data, while the files are accessed via DAX. Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix debug format string warningArnd Bergmann
Using no_printk() for jbd_debug() revealed two warnings: fs/jbd2/recovery.c: In function 'fc_do_one_pass': fs/jbd2/recovery.c:256:30: error: format '%d' expects a matching 'int' argument [-Werror=format=] 256 | jbd_debug(3, "Processing fast commit blk with seq %d"); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/ext4/fast_commit.c: In function 'ext4_fc_replay_add_range': fs/ext4/fast_commit.c:1732:30: error: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int' [-Werror=format=] 1732 | jbd_debug(1, "Converting from %d to %d %lld", The first one was added incorrectly, and was also missing a few newlines in debug output, and the second one happened when the type of an argument changed. Reported-by: kernel test robot <lkp@intel.com> Fixes: d556435156b7 ("jbd2: avoid -Wempty-body warnings") Fixes: 6db074618969 ("ext4: use BIT() macro for BH_** state bits") Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210409201211.1866633-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix trailing whitespaceJack Qiu
Made suggested modifications from checkpatch in reference to ERROR: trailing whitespace Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20210409042035.15516-1-jack.qiu@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix various seppling typosBhaskar Chowdhury
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Link: https://lore.kernel.org/r/cover.1616840203.git.unixbhaskar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix error return code in ext4_fc_perform_commit()Xu Yihang
In case of if not ext4_fc_add_tlv branch, an error return code is missing. Cc: stable@kernel.org Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Yihang <xuyihang@huawei.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210408070033.123047-1-xuyihang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: annotate data race in jbd2_journal_dirty_metadata()Jan Kara
Assertion checks in jbd2_journal_dirty_metadata() are known to be racy but we don't want to be grabbing locks just for them. We thus recheck them under b_state_lock only if it looks like they would fail. Annotate the checks with data_race(). Cc: stable@kernel.org Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: annotate data race in start_this_handle()Jan Kara
Access to journal->j_running_transaction is not protected by appropriate lock and thus is racy. We are well aware of that and the code handles the race properly. Just add a comment and data_race() annotation. Cc: stable@kernel.org Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix ext4_error_err save negative errno into superblockYe Bin
Fix As write_mmp_block() so that it returns -EIO instead of 1, so that the correct error gets saved into the superblock. Cc: stable@kernel.org Fixes: 54d3adbc29f0 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()") Reported-by: Liu Zhi Qiang <liuzhiqiang26@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210406025331.148343-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix error code in ext4_commit_superFengnan Chang
We should set the error code when ext4_commit_super check argument failed. Found in code review. Fixes: c4be0c1dc4cdc ("filesystem freeze: add error handling of write_super_lockfs/unlockfs"). Cc: stable@kernel.org Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: always panic when errors=panic is specifiedYe Bin
Before commit 014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()"), the following series of commands would trigger a panic: 1. mount /dev/sda -o ro,errors=panic test 2. mount /dev/sda -o remount,abort test After commit 014c9caa29d3, remounting a file system using the test mount option "abort" will no longer trigger a panic. This commit will restore the behaviour immediately before commit 014c9caa29d3. (However, note that the Linux kernel's behavior has not been consistent; some previous kernel versions, including 5.4 and 4.19 similarly did not panic after using the mount option "abort".) This also makes a change to long-standing behaviour; namely, the following series commands will now cause a panic, when previously it did not: 1. mount /dev/sda -o ro,errors=panic test 2. echo test > /sys/fs/ext4/sda/trigger_fs_error However, this makes ext4's behaviour much more consistent, so this is a good thing. Cc: stable@kernel.org Fixes: 014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()") Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20210401081903.3421208-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: delete redundant uptodate check for bufferYang Guo
The buffer uptodate state has been checked in function set_buffer_uptodate, there is no need use buffer_uptodate before calling set_buffer_uptodate and delete it. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Yang Guo <guoyang2@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/1617260610-29770-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()Zhang Yi
When CONFIG_QUOTA is enabled, if we failed to mount the filesystem due to some error happens behind ext4_orphan_cleanup(), it will end up triggering a after free issue of super_block. The problem is that ext4_orphan_cleanup() will set SB_ACTIVE flag if CONFIG_QUOTA is enabled, after we cleanup the truncated inodes, the last iput() will put them into the lru list, and these inodes' pages may probably dirty and will be write back by the writeback thread, so it could be raced by freeing super_block in the error path of mount_bdev(). After check the setting of SB_ACTIVE flag in ext4_orphan_cleanup(), it was used to ensure updating the quota file properly, but evict inode and trash data immediately in the last iput does not affect the quotafile, so setting the SB_ACTIVE flag seems not required[1]. Fix this issue by just remove the SB_ACTIVE setting. [1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00 Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Tested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331033138.918975-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: make prefetch_block_bitmaps defaultHarshad Shirwadkar
Block bitmap prefetching is needed for these allocator optimization data structures to get populated and provide better group scanning order. So, turn it on bu default. prefetch_block_bitmaps mount option is now marked as removed and a new option no_prefetch_block_bitmaps is added to disable block bitmap prefetching. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-8-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: add proc files to monitor new structuresHarshad Shirwadkar
This patch adds a new file "mb_structs_summary" which allows us to see the summary of the new allocator structures added in this series. Here's the sample output of file: optimize_scan: 1 max_free_order_lists: list_order_0_groups: 0 list_order_1_groups: 0 list_order_2_groups: 0 list_order_3_groups: 0 list_order_4_groups: 0 list_order_5_groups: 0 list_order_6_groups: 0 list_order_7_groups: 0 list_order_8_groups: 0 list_order_9_groups: 0 list_order_10_groups: 0 list_order_11_groups: 0 list_order_12_groups: 0 list_order_13_groups: 40 fragment_size_tree: tree_min: 16384 tree_max: 32768 tree_nodes: 40 Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-7-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: improve cr 0 / cr 1 group scanningHarshad Shirwadkar
Instead of traversing through groups linearly, scan groups in specific orders at cr 0 and cr 1. At cr 0, we want to find groups that have the largest free order >= the order of the request. So, with this patch, we maintain lists for each possible order and insert each group into a list based on the largest free order in its buddy bitmap. During cr 0 allocation, we traverse these lists in the increasing order of largest free orders. This allows us to find a group with the best available cr 0 match in constant time. If nothing can be found, we fallback to cr 1 immediately. At CR1, the story is slightly different. We want to traverse in the order of increasing average fragment size. For CR1, we maintain a rb tree of groupinfos which is sorted by average fragment size. Instead of traversing linearly, at CR1, we traverse in the order of increasing average fragment size, starting at the most optimal group. This brings down cr 1 search complexity to log(num groups). For cr >= 2, we just perform the linear search as before. Also, in case of lock contention, we intermittently fallback to linear search even in CR 0 and CR 1 cases. This allows us to proceed during the allocation path even in case of high contention. There is an opportunity to do optimization at CR2 too. That's because at CR2 we only consider groups where bb_free counter (number of free blocks) is greater than the request extent size. That's left as future work. All the changes introduced in this patch are protected under a new mount option "mb_optimize_scan". With this patchset, following experiment was performed: Created a highly fragmented disk of size 65TB. The disk had no contiguous 2M regions. Following command was run consecutively for 3 times: time dd if=/dev/urandom of=file bs=2M count=10 Here are the results with and without cr 0/1 optimizations introduced in this patch: |---------+------------------------------+---------------------------| | | Without CR 0/1 Optimizations | With CR 0/1 Optimizations | |---------+------------------------------+---------------------------| | 1st run | 5m1.871s | 2m47.642s | | 2nd run | 2m28.390s | 0m0.611s | | 3rd run | 2m26.530s | 0m1.255s | |---------+------------------------------+---------------------------| Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210401172129.189766-6-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: add MB_NUM_ORDERS macroHarshad Shirwadkar
A few arrays in mballoc.c use the total number of valid orders as their size. Currently, this value is set as "sb->s_blocksize_bits + 2". This makes code harder to read. So, instead add a new macro MB_NUM_ORDERS(sb) to make the code more readable. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-5-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: add mballoc stats proc fileHarshad Shirwadkar
Add new stats for measuring the performance of mballoc. This patch is forked from Artem Blagodarenko's work that can be found here: https://github.com/lustre/lustre-release/blob/master/ldiskfs/kernel_patches/patches/rhel8/ext4-simple-blockalloc.patch This patch reorganizes the stats by cr level. This is how the output looks like: mballoc: reqs: 0 success: 0 groups_scanned: 0 cr0_stats: hits: 0 groups_considered: 0 useless_loops: 0 bad_suggestions: 0 cr1_stats: hits: 0 groups_considered: 0 useless_loops: 0 bad_suggestions: 0 cr2_stats: hits: 0 groups_considered: 0 useless_loops: 0 cr3_stats: hits: 0 groups_considered: 0 useless_loops: 0 extents_scanned: 0 goal_hits: 0 2^n_hits: 0 breaks: 0 lost: 0 buddies_generated: 0/40 buddies_time_used: 0 preallocated: 0 discarded: 0 Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-4-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: add ability to return parsed options from parse_optionsHarshad Shirwadkar
Before this patch, the function parse_options() was returning journal_devnum and journal_ioprio variables to the caller. This patch generalizes that interface to allow parse_options to return any parsed options to return back to the caller. In this patch series, it gets used to capture the value of "mb_optimize_scan=%u" mount option. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-3-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: drop s_mb_bal_lock and convert protected fields to atomicHarshad Shirwadkar
s_mb_buddies_generated gets used later in this patch series to determine if the cr 0 and cr 1 optimziations should be performed or not. Currently, s_mb_buddies_generated is protected under a spin_lock. In the allocation path, it is better if we don't depend on the lock and instead read the value atomically. In order to do that, we drop s_bal_lock altogether and we convert the only two protected fields by it s_mb_buddies_generated and s_mb_generation_time to atomic type. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20210401172129.189766-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix check to prevent false positive report of incorrect used inodesZhang Yi
Commit <50122847007> ("ext4: fix check to prevent initializing reserved inodes") check the block group zero and prevent initializing reserved inodes. But in some special cases, the reserved inode may not all belong to the group zero, it may exist into the second group if we format filesystem below. mkfs.ext4 -b 4096 -g 8192 -N 1024 -I 4096 /dev/sda So, it will end up triggering a false positive report of a corrupted file system. This patch fix it by avoid check reserved inodes if no free inode blocks will be zeroed. Cc: stable@kernel.org Fixes: 50122847007 ("ext4: fix check to prevent initializing reserved inodes") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331121516.2243099-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-05jbd2: avoid -Wempty-body warningsArnd Bergmann
Building with 'make W=1' shows a harmless -Wempty-body warning: fs/jbd2/recovery.c: In function 'fc_do_one_pass': fs/jbd2/recovery.c:267:75: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 267 | jbd_debug(3, "Fast commit replay failed, err = %d\n", err); | ^ Change the empty dprintk() macros to no_printk(), which avoids this warning and adds format string checking. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210322102152.95684-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-05ext4: optimize match for casefolded encrypted dirsDaniel Rosenberg
Matching names with casefolded encrypting directories requires decrypting entries to confirm case since we are case preserving. We can avoid needing to decrypt if our hash values don't match. Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210319073414.1381041-3-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-05ext4: handle casefolding with encryptionDaniel Rosenberg
This adds support for encryption with casefolding. Since the name on disk is case preserving, and also encrypted, we can no longer just recompute the hash on the fly. Additionally, to avoid leaking extra information from the hash of the unencrypted name, we use siphash via an fscrypt v2 policy. The hash is stored at the end of the directory entry for all entries inside of an encrypted and casefolded directory apart from those that deal with '.' and '..'. This way, the change is backwards compatible with existing ext4 filesystems. [ Changed to advertise this feature via the file: /sys/fs/ext4/features/encrypted_casefold -- TYT ] Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210319073414.1381041-2-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-02ext4: remove unnecessary braces in fs/ext4/dir.cMilan Djurovic
Removes braces to follow the coding style. Signed-off-by: Milan Djurovic <mdjurovic@zohomail.com> Link: https://lore.kernel.org/r/20210316052953.67616-1-mdjurovic@zohomail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-02ext4: delete some unused tracepoint definitionsEric Whitney
A number of tracepoint instances have been removed from ext4 by past patches but the definitions of those tracepoints have not. All instances of ext4_ext_in_cache and ext4_ext_put_in_cache were removed by commit 69eb33dc24dc ("ext4: remove single extent cache"). ext4_get_reserved_cluster_alloc was removed by commit b6bf9171ef5c ("ext4: reduce reserved cluster count by number of allocated clusters"). ext4_find_delalloc_range was removed by commit 7d1b1fbc95eb ("ext4: reimplement ext4_find_delay_alloc_range on extent status tree"). All instances of ext4_direct_IO_enter and ext4_direct_IO_exit were removed by commit 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure"). Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20210216191634.20957-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-02Updated locking documentation for transaction_tAlexander Lochmann
Some members of transaction_t are allowed to be read without any lock being held if accessed from the correct context. We used LockDoc's findings to determine those members. Each member of them is marked with a short comment: "no lock needed for jbd2 thread". Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de> Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210211171410.17984-1-alexander.lochmann@tu-dortmund.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-02ext4: updated locking documentation for journal_tAlexander Lochmann
Some members of transaction_t are allowed to be read without any lock being held if consistency doesn't matter. Based on LockDoc's findings, we extended the locking documentation of those members. Each one of them is marked with a short comment: "no lock for quick racy checks". Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de> Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/ad82c7a9-a624-4ed5-5ada-a6410c44c0b3@tu-dortmund.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-25ext4: use memcpy_to_page() in pagecache_write()Chaitanya Kulkarni
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210207190425.38107-7-chaitanya.kulkarni@wdc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-25ext4: use memcpy_from_page() in pagecache_read()Chaitanya Kulkarni
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210207190425.38107-6-chaitanya.kulkarni@wdc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-21Linux 5.12-rc4v5.12-rc4Linus Torvalds
2021-03-21Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous ext4 bug fixes for v5.12" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: initialize ret to suppress smatch warning ext4: stop inode update before return ext4: fix rename whiteout with fast commit ext4: fix timer use-after-free on failed mount ext4: fix potential error in ext4_do_update_inode ext4: do not try to set xattr into ea_inode if value is empty ext4: do not iput inode under running transaction in ext4_rename() ext4: find old entry again if failed to rename whiteout ext4: fix error handling in ext4_end_enable_verity() ext4: fix bh ref count on error paths fs/ext4: fix integer overflow in s_log_groups_per_flex ext4: add reclaim checks to xattr code ext4: shrink race window in ext4_should_retry_alloc()
2021-03-21Merge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring followup fixes from Jens Axboe: - The SIGSTOP change from Eric, so we properly ignore that for PF_IO_WORKER threads. - Disallow sending signals to PF_IO_WORKER threads in general, we're not interested in having them funnel back to the io_uring owning task. - Stable fix from Stefan, ensuring we properly break links for short send/sendmsg recv/recvmsg if MSG_WAITALL is set. - Catch and loop when needing to run task_work before a PF_IO_WORKER threads goes to sleep. * tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block: io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL io-wq: ensure task is running before processing task_work signal: don't allow STOP on PF_IO_WORKER threads signal: don't allow sending any signals to PF_IO_WORKER threads
2021-03-21Merge tag 'staging-5.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging and IIO driver fixes from Greg KH: "Some small staging and IIO driver fixes: - MAINTAINERS changes for the move of the staging mailing list - comedi driver fixes to get request_irq() to work correctly - counter driver fixes for reported issues with iio devices - tiny iio driver fixes for reported issues. All of these have been in linux-next with no reported problems" * tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: vt665x: fix alignment constraints staging: comedi: cb_pcidas64: fix request_irq() warn staging: comedi: cb_pcidas: fix request_irq() warn MAINTAINERS: move the staging subsystem to lists.linux.dev MAINTAINERS: move some real subsystems off of the staging mailing list iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler iio: hid-sensor-temperature: Fix issues of timestamp channel iio: hid-sensor-humidity: Fix alignment issue of timestamp channel counter: stm32-timer-cnt: fix ceiling miss-alignment with reload register counter: stm32-timer-cnt: fix ceiling write max value counter: stm32-timer-cnt: Report count function when SLAVE_MODE_DISABLED iio: adc: ab8500-gpadc: Fix off by 10 to 3 iio:adc:stm32-adc: Add HAS_IOMEM dependency iio: adis16400: Fix an error code in adis16400_initial_setup() iio: adc: adi-axi-adc: add proper Kconfig dependencies iio: adc: ad7949: fix wrong ADC result due to incorrect bit mask iio: hid-sensor-prox: Fix scale not correct issue iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel
2021-03-21Merge tag 'usb-5.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB and Thunderbolt driver fixes from Greg KH: "Here are some small Thunderbolt and USB driver fixes for some reported issues: - thunderbolt fixes for minor problems - typec fixes for power issues - usb-storage quirk addition - usbip bugfix - dwc3 bugfix when stopping transfers - cdnsp bugfix for isoc transfers - gadget use-after-free fix All have been in linux-next this week with no reported issues" * tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tcpm: Skip sink_cap query only when VDM sm is busy usb: dwc3: gadget: Prevent EP queuing while stopping transfers usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy- usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct usb-storage: Add quirk to defeat Kindle's automatic unload usb: gadget: configfs: Fix KASAN use-after-free usbip: Fix incorrect double assignment to udc->ud.tcp_rx usb: cdnsp: Fixes incorrect value in ISOC TRB thunderbolt: Increase runtime PM reference count on DP tunnel discovery thunderbolt: Initialize HopID IDAs in tb_switch_alloc()
2021-03-21Merge tag 'irq-urgent-2021-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "A change to robustify force-threaded IRQ handlers to always disable interrupts, plus a DocBook fix. The force-threaded IRQ handler change has been accelerated from the normal schedule of such a change to keep the bad pattern/workaround of spin_lock_irqsave() in handlers or IRQF_NOTHREAD as a kludge from spreading" * tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Disable interrupts for force threaded handlers genirq/irq_sim: Fix typos in kernel doc (fnode -> fwnode)
2021-03-21Merge tag 'perf-urgent-2021-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Boundary condition fixes for bugs unearthed by the perf fuzzer" * tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT perf/x86/intel: Fix a crash caused by zero PEBS status
2021-03-21Merge tag 'locking-urgent-2021-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Get static calls & modules right. Hopefully. - WW mutex fixes * tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: static_call: Fix static_call_update() sanity check static_call: Align static_call_is_init() patching condition static_call: Fix static_call_set_init() locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini() locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
2021-03-21Merge tag 'efi-urgent-2021-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: - another missing RT_PROP table related fix, to ensure that the efivarfs pseudo filesystem fails gracefully if variable services are unsupported - use the correct alignment for literal EFI GUIDs - fix a use after unmap issue in the memreserve code * tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: use 32-bit alignment for efi_guid_t literals firmware/efi: Fix a use after bug in efi_mem_reserve_persistent efivars: respect EFI_UNSUPPORTED return from firmware
2021-03-21Merge tag 'x86_urgent_for_v5.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "The freshest pile of shiny x86 fixes for 5.12: - Add the arch-specific mapping between physical and logical CPUs to fix devicetree-node lookups - Restore the IRQ2 ignore logic - Fix get_nr_restart_syscall() to return the correct restart syscall number. Split in a 4-patches set to avoid kABI breakage when backporting to dead kernels" * tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/of: Fix CPU devicetree-node lookups x86/ioapic: Ignore IRQ2 again x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall() x86: Move TS_COMPAT back to asm/thread_info.h kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
2021-03-21Merge tag 'powerpc-5.12-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a possible stack corruption and subsequent DLPAR failure in the rpadlpar_io PCI hotplug driver - Two build fixes for uncommon configurations Thanks to Christophe Leroy and Tyrel Datwyler. * tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: PCI: rpadlpar: Fix potential drc_name corruption in store functions powerpc: Force inlining of cpu_has_feature() to avoid build failure powerpc/vdso32: Add missing _restgpr_31_x to fix build failure