Age | Commit message (Collapse) | Author |
|
Update the arch_support_sort_key() function in powerpc to enable
presenting local and global variants of sort key 'p_stage_cyc'.
Update the "se_header" strings for these in arch_perf_header_entry()
along with instruction latency.
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20211203022038.48240-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sort key 'p_stage_cyc' is used to present the latency cycles spent in
pipeline stages.
perf has local 'p_stage_cyc' sort key to display this info. There is no
global variant available for this sort key. The local variant shows
latency in a single sample, whereas the global value will be useful to
present the total latency (sum of latencies) in the hist entry. It
represents the latency number multiplied by the number of samples.
Add global ('p_stage_cyc') and local variant ('local_p_stage_cyc') for
this sort key. Use 'local_p_stage_cyc' as default option for "mem" sort
mode.
Also add this to the list of dynamic sort keys and made the
"dynamic_headers" and "arch_specific_sort_keys" as static.
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20211203022038.48240-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up fixes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We probably want to remove the indirect block to extents migration
feature after a deprecation window, but until then, let's fix a
potential data loss problem caused by the fact that we put the
tmp_inode on the orphan list. In the unlikely case where we crash and
do a journal recovery, the data blocks belonging to the inode being
migrated are also represented in the tmp_inode on the orphan list ---
and so its data blocks will get marked unallocated, and available for
reuse.
Instead, stop putting the tmp_inode on the oprhan list. So in the
case where we crash while migrating the inode, we'll leak an inode,
which is not a disaster. It will be easily fixed the next time we run
fsck, and it's better than potentially having blocks getting claimed
by two different files, and losing data as a result.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@kernel.org
|
|
BUG_ON would be better.
This issue was detected with the help of Coccinelle.
Reported-by: Zeal robot <zealci@zte.com.cn>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Link: https://lore.kernel.org/r/20211228073252.580296-1-xu.xin16@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This was obviously supposed to be an ext4 struct, not xfs. GCC
doesn't care either way so it doesn't affect the build or runtime.
Fixes: cebe85d570cf ("ext4: switch to the new mount api")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20211215114309.GB14552@kili
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When migrating to extents, the temporary inode will have it's own checksum
seed. This means that, when swapping the inodes data, the inode checksums
will be incorrect.
This can be fixed by recalculating the extents checksums again. Or simply
by copying the seed into the temporary inode.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213357
Reported-by: Jeroen van Wolffelaar <jeroen@wolffelaar.nl>
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Link: https://lore.kernel.org/r/20211214175058.19511-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Although it is in the loop, offset is reassigned at the beginning of the
while loop. And after the loop, the value will not be used
The clang_analyzer complains as follows:
fs/ext4/dir.c:306:3 warning:
Value stored to 'offset' is never read
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211208075307.404703-1-luo.penghao@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The if will goto out of the loop, and until the end of the
function execution, o_start will not be used again.
The clang_analyzer complains as follows:
fs/ext4/move_extent.c:635:5 warning:
Value stored to 'o_start' is never read
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211208075157.404535-1-luo.penghao@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
EXT_FIRST_INDEX(ptr) is ptr+12, which can't possibly be null; gcc-12
warns about this.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20211115172020.57853-1-kilobyte@angband.pl
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The eh assignment in these two places is meaningless, because the
function will goto to merge, which will not use eh.
The clang_analyzer complains as follows:
fs/ext4/extents.c:1988:4 warning:
fs/ext4/extents.c:2016:4 warning:
Value stored to 'eh' is never read
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211104064007.2919-1-luo.penghao@zte.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The local variable assignment at the end of the function is meaningless.
The clang_analyzer complains as follows:
fs/ext4/fast_commit.c:779:2 warning:
Value stored to 'dst' is never read
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211104063406.2747-1-luo.penghao@zte.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The command "make clang-analyzer" detects dead stores in
mpage_process_page() function.
Do not reset io_end_size to 0 in the current paths, as the function
exits on those paths without further using io_end_size.
Signed-off-by: Nghia Le <nghialm78@gmail.com>
Link: https://lore.kernel.org/r/20211025221803.3326-1-nghialm78@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Ext4 has an optimization mechanism for batched disacrd (FITRIM) that
should help speed up subsequent calls of FITRIM ioctl by skipping the
groups that were previously trimmed. However because the FITRIM allows
to set the minimum size of an extent to trim, ext4 stores the last
minimum extent size and only avoids trimming the group if it was
previously trimmed with minimum extent size equal to, or smaller than
the current call.
There is currently no way to bypass the optimization without
umount/mount cycle. This becomes a problem when the file system is
live migrated to a different storage, because the optimization will
prevent possibly useful discard calls to the storage.
Fix it by exporting the s_last_trim_minblks via sysfs interface which
will allow us to set the minimum size to the number of blocks larger
than subsequent FITRIM call, effectively bypassing the optimization.
By setting the s_last_trim_minblks to ULONG_MAX the optimization will be
effectively cleared regardless of the previous state, or file system
configuration.
For example:
getconf ULONG_MAX > /sys/fs/ext4/dm-1/last_trim_minblks
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Laurent GUERBY <laurent@guerby.net>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20211103145122.17338-2-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
There is no good reason for the s_last_trim_minblks to be atomic. There is
no data integrity needed and there is no real danger in setting and
reading it in a racy manner. Change it to be unsigned long, the same type
as s_clusters_per_group which is the maximum that's allowed.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20211103145122.17338-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for
online reading and setting of file system label.
ext4_ioctl_getlabel() is simple, just get the label from the primary
superblock. This might not be the first sb on the file system if
'sb=' mount option is used.
In ext4_ioctl_setlabel() we update what ext4 currently views as a
primary superblock and then proceed to update backup superblocks. There
are two caveats:
- the primary superblock might not be the first superblock and so it
might not be the one used by userspace tools if read directly
off the disk.
- because the primary superblock might not be the first superblock we
potentialy have to update it as part of backup superblock update.
However the first sb location is a bit more complicated than the rest
so we have to account for that.
The superblock modification is created generic enough so the
infrastructure can be used for other potential superblock modification
operations, such as chaning UUID.
Tested with generic/492 with various configurations. I also checked the
behavior with 'sb=' mount options, including very large file systems
with and without sparse_super/sparse_super2.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20211213135618.43303-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Only set EXT4_MOUNT_QUOTA when journalled quota file is specified,
otherwise simply disabling specific quota type (usrjquota=) will also
set the EXT4_MOUNT_QUOTA super block option.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Fixes: e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()")
Link: https://lore.kernel.org/r/20220104143518.134465-2-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
During ext4 mount api rework the commit e6e268cb6822 ("ext4: move quota
configuration out of handle_mount_opt()") introduced a bug where we
would kfree(sbi->s_qf_names[i]) before assigning the new quota name in
ext4_apply_quota_options().
This is wrong because we're using kfree() on rcu prointer that could be
simultaneously accessed from ext4_show_quota_options() during remount.
Fix it by using rcu_replace_pointer() to replace the old qname with the
new one and then kfree_rcu() the old quota name.
Also use get_qf_name() instead of sbi->s_qf_names in strcmp() to silence
the sparse warning.
Fixes: e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220104143518.134465-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
A user reported FITRIM ioctl failing for him on ext4 on some devices
without apparent reason. After some debugging we've found out that
these devices (being LVM volumes) report rather large discard
granularity of 42MB and the filesystem had 1k blocksize and thus group
size of 8MB. Because ext4 FITRIM implementation puts discard
granularity into minlen, ext4_trim_fs() declared the trim request as
invalid. However just silently doing nothing seems to be a more
appropriate reaction to such combination of parameters since user did
not specify anything wrong.
CC: Lukas Czerner <lczerner@redhat.com>
Fixes: 5c2ed62fd447 ("ext4: Adjust minlen with discard_granularity in the FITRIM ioctl")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211112152202.26614-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Our syzkaller report an use-after-free issue that accessing the freed
buffer_head on the writeback page in __ext4_journalled_writepage(). The
problem is that if there was a truncate racing with the data=journalled
writeback procedure, the writeback length could become zero and
bget_one() refuse to get buffer_head's refcount, then the truncate
procedure release buffer once we drop page lock, finally, the last
ext4_walk_page_buffers() trigger the use-after-free problem.
sync truncate
ext4_sync_file()
file_write_and_wait_range()
ext4_setattr(0)
inode->i_size = 0
ext4_writepage()
len = 0
__ext4_journalled_writepage()
page_bufs = page_buffers(page)
ext4_walk_page_buffers(bget_one) <- does not get refcount
do_invalidatepage()
free_buffer_head()
ext4_walk_page_buffers(page_bufs) <- trigger use-after-free
After commit bdf96838aea6 ("ext4: fix race between truncate and
__ext4_journalled_writepage()"), we have already handled the racing
case, so the bget_one() and bput_one() are not needed. So this patch
simply remove these hunk, and recheck the i_size to make it safe.
Fixes: bdf96838aea6 ("ext4: fix race between truncate and __ext4_journalled_writepage()")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211225090937.712867-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We got issue as follows when run syzkaller test:
[ 1901.130043] EXT4-fs error (device vda): ext4_remount:5624: comm syz-executor.5: Abort forced by user
[ 1901.130901] Aborting journal on device vda-8.
[ 1901.131437] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.16: Detected aborted journal
[ 1901.131566] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.11: Detected aborted journal
[ 1901.132586] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.18: Detected aborted journal
[ 1901.132751] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.9: Detected aborted journal
[ 1901.136149] EXT4-fs error (device vda) in ext4_reserve_inode_write:6035: Journal has aborted
[ 1901.136837] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-fuzzer: Detected aborted journal
[ 1901.136915] ==================================================================
[ 1901.138175] BUG: KASAN: null-ptr-deref in __ext4_journal_ensure_credits+0x74/0x140 [ext4]
[ 1901.138343] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.13: Detected aborted journal
[ 1901.138398] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.1: Detected aborted journal
[ 1901.138808] Read of size 8 at addr 0000000000000000 by task syz-executor.17/968
[ 1901.138817]
[ 1901.138852] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.30: Detected aborted journal
[ 1901.144779] CPU: 1 PID: 968 Comm: syz-executor.17 Not tainted 4.19.90-vhulk2111.1.0.h893.eulerosv2r10.aarch64+ #1
[ 1901.146479] Hardware name: linux,dummy-virt (DT)
[ 1901.147317] Call trace:
[ 1901.147552] dump_backtrace+0x0/0x2d8
[ 1901.147898] show_stack+0x28/0x38
[ 1901.148215] dump_stack+0xec/0x15c
[ 1901.148746] kasan_report+0x108/0x338
[ 1901.149207] __asan_load8+0x58/0xb0
[ 1901.149753] __ext4_journal_ensure_credits+0x74/0x140 [ext4]
[ 1901.150579] ext4_xattr_delete_inode+0xe4/0x700 [ext4]
[ 1901.151316] ext4_evict_inode+0x524/0xba8 [ext4]
[ 1901.151985] evict+0x1a4/0x378
[ 1901.152353] iput+0x310/0x428
[ 1901.152733] do_unlinkat+0x260/0x428
[ 1901.153056] __arm64_sys_unlinkat+0x6c/0xc0
[ 1901.153455] el0_svc_common+0xc8/0x320
[ 1901.153799] el0_svc_handler+0xf8/0x160
[ 1901.154265] el0_svc+0x10/0x218
[ 1901.154682] ==================================================================
This issue may happens like this:
Process1 Process2
ext4_evict_inode
ext4_journal_start
ext4_truncate
ext4_ind_truncate
ext4_free_branches
ext4_ind_truncate_ensure_credits
ext4_journal_ensure_credits_fn
ext4_journal_restart
handle->h_transaction = NULL;
mount -o remount,abort /mnt
-> trigger JBD abort
start_this_handle -> will return failed
ext4_xattr_delete_inode
ext4_journal_ensure_credits
ext4_journal_ensure_credits_fn
__ext4_journal_ensure_credits
jbd2_handle_buffer_credits
journal = handle->h_transaction->t_journal; ->null-ptr-deref
Now, indirect truncate process didn't handle error. To solve this issue
maybe simply add check handle is abort in '__ext4_journal_ensure_credits'
is enough, and i also think this is necessary.
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211224100341.3299128-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
It is not guaranteed that __ext4_get_inode_loc will definitely set
err_blk pointer when it returns EIO. To avoid using uninitialized
variables, let's first set err_blk to 0.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211201163421.2631661-1-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
We found on older kernel (3.10) that in the scenario of insufficient
disk space, system may trigger an ABBA deadlock problem, it seems that
this problem still exists in latest kernel, try to fix it here. The
main process triggered by this problem is that task A occupies the PA
and waits for the jbd2 transaction finish, the jbd2 transaction waits
for the completion of task B's IO (plug_list), but task B waits for
the release of PA by task A to finish discard, which indirectly forms
an ABBA deadlock. The related calltrace is as follows:
Task A
vfs_write
ext4_mb_new_blocks()
ext4_mb_mark_diskspace_used() JBD2
jbd2_journal_get_write_access() -> jbd2_journal_commit_transaction()
->schedule() filemap_fdatawait()
| |
| Task B |
| do_unlinkat() |
| ext4_evict_inode() |
| jbd2_journal_begin_ordered_truncate() |
| filemap_fdatawrite_range() |
| ext4_mb_new_blocks() |
-ext4_mb_discard_group_preallocations() <-----
Here, try to cancel ext4_mb_discard_group_preallocations() internal
retry due to PA busy, and do a limited number of retries inside
ext4_mb_discard_preallocations(), which can circumvent the above
problems, but also has some advantages:
1. Since the PA is in a busy state, if other groups have free PAs,
keeping the current PA may help to reduce fragmentation.
2. Continue to traverse forward instead of waiting for the current
group PA to be released. In most scenarios, the PA discard time
can be reduced.
However, in the case of smaller free space, if only a few groups have
space, then due to multiple traversals of the group, it may increase
CPU overhead. But in contrast, I feel that the overall benefit is
better than the cost.
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1637630277-23496-1-git-send-email-brookxu.cn@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
coccicheck complains about the use of snprintf() in sysfs show functions.
Fix the coccicheck warning:
WARNING: use scnprintf or sprintf.
Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Signed-off-by: Qing Wang <wangqing@vivo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1634095731-4528-1-git-send-email-wangqing@vivo.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When we succeed in enabling some quota type but fail to enable another
one with quota feature, we correctly disable all enabled quota types.
However we forget to reset i_data_sem lockdep class. When the inode gets
freed and reused, it will inherit this lockdep class (i_data_sem is
initialized only when a slab is created) and thus eventually lockdep
barfs about possible deadlocks.
Reported-and-tested-by: syzbot+3b6f9218b1301ddda3e2@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20211007155336.12493-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When we hit an error when enabling quotas and setting inode flags, we do
not properly shutdown quota subsystem despite returning error from
Q_QUOTAON quotactl. This can lead to some odd situations like kernel
using quota file while it is still writeable for userspace. Make sure we
properly cleanup the quota subsystem in case of error.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20211007155336.12493-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We got issue as follows when run syzkaller:
[ 167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
[ 167.938306] EXT4-fs (loop0): Remounting filesystem read-only
[ 167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
[ 167.983601] ------------[ cut here ]------------
[ 167.984245] kernel BUG at fs/ext4/inode.c:847!
[ 167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G B 5.16.0-rc5-next-20211217+ #123
[ 167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
[ 167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
[ 167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
[ 167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
[ 167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
[ 167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
[ 167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
[ 167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
[ 167.997158] FS: 00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
[ 167.998238] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
[ 167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 168.001899] Call Trace:
[ 168.002235] <TASK>
[ 168.007167] ext4_bread+0xd/0x53
[ 168.007612] ext4_quota_write+0x20c/0x5c0
[ 168.010457] write_blk+0x100/0x220
[ 168.010944] remove_free_dqentry+0x1c6/0x440
[ 168.011525] free_dqentry.isra.0+0x565/0x830
[ 168.012133] remove_tree+0x318/0x6d0
[ 168.014744] remove_tree+0x1eb/0x6d0
[ 168.017346] remove_tree+0x1eb/0x6d0
[ 168.019969] remove_tree+0x1eb/0x6d0
[ 168.022128] qtree_release_dquot+0x291/0x340
[ 168.023297] v2_release_dquot+0xce/0x120
[ 168.023847] dquot_release+0x197/0x3e0
[ 168.024358] ext4_release_dquot+0x22a/0x2d0
[ 168.024932] dqput.part.0+0x1c9/0x900
[ 168.025430] __dquot_drop+0x120/0x190
[ 168.025942] ext4_clear_inode+0x86/0x220
[ 168.026472] ext4_evict_inode+0x9e8/0xa22
[ 168.028200] evict+0x29e/0x4f0
[ 168.028625] dispose_list+0x102/0x1f0
[ 168.029148] evict_inodes+0x2c1/0x3e0
[ 168.030188] generic_shutdown_super+0xa4/0x3b0
[ 168.030817] kill_block_super+0x95/0xd0
[ 168.031360] deactivate_locked_super+0x85/0xd0
[ 168.031977] cleanup_mnt+0x2bc/0x480
[ 168.033062] task_work_run+0xd1/0x170
[ 168.033565] do_exit+0xa4f/0x2b50
[ 168.037155] do_group_exit+0xef/0x2d0
[ 168.037666] __x64_sys_exit_group+0x3a/0x50
[ 168.038237] do_syscall_64+0x3b/0x90
[ 168.038751] entry_SYSCALL_64_after_hwframe+0x44/0xae
In order to reproduce this problem, the following conditions need to be met:
1. Ext4 filesystem with no journal;
2. Filesystem image with incorrect quota data;
3. Abort filesystem forced by user;
4. umount filesystem;
As in ext4_quota_write:
...
if (EXT4_SB(sb)->s_journal && !handle) {
ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
" cancelled because transaction is not started",
(unsigned long long)off, (unsigned long long)len);
return -EIO;
}
...
We only check handle if NULL when filesystem has journal. There is need
check handle if NULL even when filesystem has no journal.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
The kmemcache for ext4_fc_dentry_cachep remains registered after module
removal.
Destroy ext4_fc_dentry_cachep kmemcache on module removal.
Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211110134640.lyku5vklvdndw6uk@linutronix.de
Link: https://lore.kernel.org/r/YbiK3JetFFl08bd7@linutronix.de
Link: https://lore.kernel.org/r/20211223164436.2628390-1-bigeasy@linutronix.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
If use FALLOC_FL_KEEP_SIZE to alloc unwritten range at bottom, the
inode->i_size will not include the unwritten range. When call
ftruncate with fast commit enabled, it will miss to track the
unwritten range.
Change to trace the full range during ftruncate.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211223032337.5198-3-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
For now ,we use ext4_punch_hole() during fast commit replay delete range
procedure. But it will be affected by inode->i_size, which may not
correct during fast commit replay procedure. The following test will
failed.
-create & write foo (len 1000K)
-falloc FALLOC_FL_ZERO_RANGE foo (range 400K - 600K)
-create & fsync bar
-falloc FALLOC_FL_PUNCH_HOLE foo (range 300K-500K)
-fsync foo
-crash before a full commit
After the fast_commit reply procedure, the range 400K-500K will not be
removed. Because in this case, when calling ext4_punch_hole() the
inode->i_size is 0, and it just retruns with doing nothing.
Change to use ext4_ext_remove_space() instead of ext4_punch_hole()
to remove blocks of inode directly.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211223032337.5198-2-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
when call falloc with FALLOC_FL_ZERO_RANGE, to set an range to unwritten,
which has been already initialized. If the range is align to blocksize,
fast commit will not track range for this change.
Also track range for unwritten range in ext4_map_blocks().
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211221022839.374606-1-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
The MSI entries for multi-MSI are populated en bloc for the MSI descriptor,
but the current code invokes the population inside the per interrupt loop
which triggers a warning in the sysfs code and causes the interrupt
allocation to fail.
Move it outside of the loop so it works correctly for single and multi-MSI.
Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/87leznqx2a.ffs@tglx
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso updates from Borislav Petkov:
"Remove -nostdlib compiler flag now that the vDSO uses the linker
instead of the compiler driver to link files"
* tag 'x86_vdso_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Remove -nostdlib compiler flag
x86/vdso: Remove -nostdlib compiler flag
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build fix from Borislav Petkov:
"A fix for cross-compiling the compressed stub on arm64 with clang"
* tag 'x86_build_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed: Move CLANG_FLAGS to beginning of KBUILD_CFLAGS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Enable the short string copies for CPUs which support them, in
copy_user_enhanced_fast_string()
- Avoid writing MSR_CSTAR on Intel due to TDX guests raising a #VE trap
* tag 'x86_cpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Add fast-short-rep-movs check to copy_user_enhanced_fast_string()
x86/cpu: Don't write CSTAR MSR on Intel CPUs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
"The mandatory set of random minor cleanups all over tip"
* tag 'x86_cleanups_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/events/amd/iommu: Remove redundant assignment to variable shift
x86/boot/string: Add missing function prototypes
x86/fpu: Remove duplicate copy_fpstate_to_sigframe() prototype
x86/uaccess: Move variable into switch case statement
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
"The pile which we cannot find the proper topic for so we stick it in
x86/misc:
- Add support for decoding instructions which do MMIO accesses in
order to use it in SEV and TDX guests
- An include fix and reorg to allow for removing set_fs in UML later"
* tag 'x86_misc_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Remove the mtrr_bp_init() stub
x86/sev-es: Use insn_decode_mmio() for MMIO implementation
x86/insn-eval: Introduce insn_decode_mmio()
x86/insn-eval: Introduce insn_get_modrm_reg_ptr()
x86/insn-eval: Handle insn_get_opcode() failure
|
|
for-5.16-fixes contains two subtle race conditions which were introduced by
scheduler side code cleanups. The branch didn't get pushed out, so merge
into for-5.17.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:
- Flush *all* mappings from the TLB after switching to the trampoline
pagetable to prevent any stale entries' presence
- Flush global mappings from the TLB, in addition to the CR3-write,
after switching off of the trampoline_pgd during boot to clear the
identity mappings
- Prevent instrumentation issues resulting from the above changes
* tag 'x86_mm_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Prevent early boot triple-faults with instrumentation
x86/mm: Include spinlock_t definition in pgtable.
x86/mm: Flush global TLB when switching to trampoline page-table
x86/mm/64: Flush global TLB on boot and AP bringup
x86/realmode: Add comment for Global bit usage in trampoline_pgd
x86/mm: Add missing <asm/cpufeatures.h> dependency to <asm/page_64.h>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Borislav Petkov:
- Add support for handling hw errors in SGX pages: poisoning,
recovering from poison memory and error injection into SGX pages
- A bunch of changes to the SGX selftests to simplify and allow of SGX
features testing without the need of a whole SGX software stack
- Add a sysfs attribute which is supposed to show the amount of SGX
memory in a NUMA node, similar to what /proc/meminfo is to normal
memory
- The usual bunch of fixes and cleanups too
* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/sgx: Fix NULL pointer dereference on non-SGX systems
selftests/sgx: Fix corrupted cpuid macro invocation
x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
x86/sgx: Fix minor documentation issues
selftests/sgx: Add test for multiple TCS entry
selftests/sgx: Enable multiple thread support
selftests/sgx: Add page permission and exception test
selftests/sgx: Rename test properties in preparation for more enclave tests
selftests/sgx: Provide per-op parameter structs for the test enclave
selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
selftests/sgx: Move setup_test_encl() to each TEST_F()
selftests/sgx: Encpsulate the test enclave creation
selftests/sgx: Dump segments and /proc/self/maps only on failure
selftests/sgx: Create a heap for the test enclave
selftests/sgx: Make data measurement for an enclave segment optional
selftests/sgx: Assign source for each segment
selftests/sgx: Fix a benign linker warning
x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
x86/sgx: Add hook to error injection address validation
x86/sgx: Hook arch_memory_failure() into mainline code
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control fixlet from Borislav Petkov:
"A minor code cleanup removing a redundant assignment"
* tag 'x86_cache_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Remove redundant assignment to variable chunks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
"The accumulated pile of x86/sev generalizations and cleanups:
- Share the SEV string unrolling logic with TDX as TDX guests need it
too
- Cleanups and generalzation of code shared by SEV and TDX"
* tag 'x86_sev_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Move common memory encryption code to mem_encrypt.c
x86/sev: Rename mem_encrypt.c to mem_encrypt_amd.c
x86/sev: Use CC_ATTR attribute to generalize string I/O unroll
x86/sev: Remove do_early_exception() forward declarations
x86/head64: Carve out the guest encryption postprocessing into a helper
x86/sev: Get rid of excessive use of defines
x86/sev: Shorten GHCB terminate macro names
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform fix from Borislav Petkov:
"A single DT compatibility fix for the Intel media processor CE4100
driver"
* tag 'x86_platform_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ce4100: Replace "ti,pcf8575" by "nxp,pcf8575"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirtualization fix from Borislav Petkov:
"Define the INTERRUPT_RETURN macro only when CONFIG_XEN_PV is enabled
as it is its only user"
* tag 'x86_paravirt_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Fix build PARAVIRT_XXL=y without XEN_PV
|
|
Merge int340x thermal driver update fixing RFIM mailbox write
commands handling for 5.17-rc1.
* thermal-int340x:
thermal/drivers/int340x: Fix RFIM mailbox write commands
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu update from Borislav Petkov:
"A single x86/fpu update for 5.17:
- Exclude AVX opmask registers use from AVX512 state tracking as they
don't contribute to frequency throttling"
* tag 'x86_fpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Correct AVX512 state tracking
|
|
If the pinned file has a hole by partial truncation, application that has
the block map will be broken.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Merge an operating performance points (OPP) update, devfreq updates
and power capping updates for 5.17-rc1:
- Update outdated OPP documentation (Tang Yizhou).
- Reduce log severity for informative message regarding frequency
transition failures in devfreq (Tzung-Bi Shih).
- Add DRAM frequency controller devfreq driver for Allwinner sunXi
SoCs (Samuel Holland).
- Add missing COMMON_CLK dependency to the sun8i devfreq driver (Arnd
Bergmann).
- Add support for new layout of Psys PowerLimit Register on SPR to
the Intel RAPL power capping driver (Zhang Rui).
- Fix typo in a comment in idle_inject.c (Jason Wang).
- Remove unused function definition from the DTPM (Dynamit Thermal
Power Management) power capping framework (Daniel Lezcano).
- Reduce DTPM trace verbosity (Daniel Lezcano).
* pm-opp:
Documentation: power: Update outdated contents in opp.rst
* pm-devfreq:
PM / devfreq: Reduce log severity for informative message
PM / devfreq: sun8i: addd COMMON_CLK dependency
PM / devfreq: Add a driver for the sun8i/sun50i MBUS
* powercap:
powercap/drivers/dtpm: Reduce trace verbosity
powercap/drivers/dtpm: Remove unused function definition
powercap: fix typo in a comment in idle_inject.c
powercap: intel_rapl: support new layout of Psys PowerLimit Register on SPR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- enable memtest functionality
- defconfig updates
* tag 'm68k-for-v5.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: defconfig: Update defconfigs for v5.16-rc1
m68k: Enable memtest functionality
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
"Besides all the small improvements and cleanups the most notable part
is the fast vector/SIMD implementation of the ChaCha20 stream cipher,
which is an adaptation of Andy Polyakov's code for the kernel.
Summary:
- add fast vector/SIMD implementation of the ChaCha20 stream cipher,
which mainly adapts Andy Polyakov's code for the kernel
- add status attribute to AP queue device so users can easily figure
out its status
- fix race in page table release code, and and lots of documentation
- remove uevent suppress from cio device driver, since it turned out
that it generated more problems than it solved problems
- quite a lot of virtual vs physical address confusion fixes
- various other small improvements and cleanups all over the place"
* tag 's390-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
s390/dasd: use default_groups in kobj_type
s390/sclp_sd: use default_groups in kobj_type
s390/pci: simplify __pciwb_mio() inline asm
s390: remove unused TASK_SIZE_OF
s390/crash_dump: fix virtual vs physical address handling
s390/crypto: fix compile error for ChaCha20 module
s390/mm: check 2KB-fragment page on release
s390/mm: better annotate 2KB pagetable fragments handling
s390/mm: fix 2KB pgtable release race
s390/sclp: release SCLP early buffer after kernel initialization
s390/nmi: disable interrupts on extended save area update
s390/zcrypt: CCA control CPRB sending
s390/disassembler: update opcode table
s390/uv: fix memblock virtual vs physical address confusion
s390/smp: fix memblock_phys_free() vs memblock_free() confusion
s390/sclp: fix memblock_phys_free() vs memblock_free() confusion
s390/exit: remove dead reference to do_exit from copy_thread
s390/ap: add missing virt_to_phys address conversion
s390/pgalloc: use pointers instead of unsigned long values
s390/pgalloc: add virt/phys address handling to base asce functions
...
|