Age | Commit message (Collapse) | Author |
|
Use string choice helpers for better readability and to fix cocci warning
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202410062256.BoynX3c2-lkp@intel.com/
Signed-off-by: R Sundar <prosunofficial@gmail.com>
Link: https://patch.msgid.link/20241007172006.83339-1-prosunofficial@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Keep 'success' internally to track if any error happened and then
return it at the end in do_one_pass(). If jbd2_do_replay() return
-ENOMEM then stop replay journal.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240930005942.626942-7-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The judgement 'if (block_error && success == 0)' is never valid. Just
remove useless 'block_error' variable.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240930005942.626942-6-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Factor out jbd2_do_replay() no funtional change.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240930005942.626942-5-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
To make JBD2_COMMIT_BLOCK process more clean, no functional change.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240930005942.626942-4-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Now buffer_head free is very fragmented in do_one_pass(), unified release
of buffer_head in do_one_pass()
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240930005942.626942-3-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
'need_check_commit_time' is only used by v2/v3 checksum, so there isn't
need to add 'need_check_commit_time' judegement for v1 checksum logic.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240930005942.626942-2-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Instead of directly casting and returning an error-valued pointer,
use ERR_CAST to make the error handling more explicit and improve
code clarity.
Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com>
Link: https://patch.msgid.link/20240920021440.1959243-1-yujiaoliang@vivo.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
On some filesystems, it is currently possible to create a transient
data inconsistency between pagecache and on-disk state. For example,
on a 1k block size ext4 filesystem:
$ xfs_io -fc "pwrite 0 2k" -c "mmap 0 4k" -c "mwrite 2k 2k" \
-c "truncate 8k" -c "fiemap -v" -c "pread -v 2k 16" <file>
...
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..3]: 17410..17413 4 0x1
1: [4..15]: hole 12
00000800: 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 XXXXXXXXXXXXXXXX
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 2k 16" <file>
00000800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
This allocates and writes two 1k blocks, map writes to the post-eof
portion of the (4k) eof folio, extends the file, and then shows that
the post-eof data is not cleared before the file size is extended.
The result is pagecache with a clean and uptodate folio over a hole
that returns non-zero data. Once reclaimed, pagecache begins to
return valid data.
Some filesystems avoid this problem by flushing the EOF folio before
inode size extension. This triggers writeback time partial post-eof
zeroing. XFS explicitly zeroes newly exposed file ranges via
iomap_zero_range(), but this includes a hack to flush dirty but
hole-backed folios, which means writeback actually does the zeroing
in this particular case as well. bcachefs explicitly flushes the eof
folio on truncate extension to the same effect, but doesn't handle
the analogous write extension case (i.e., replace "truncate 8k" with
"pwrite 4k 4k" in the above example command to reproduce the same
problem on bcachefs). btrfs doesn't seem to support subpage block
sizes.
The two main options to avoid this behavior are to either flush or
do the appropriate zeroing during size extending operations. Zeroing
is only required when the size change exposes ranges of the file
that haven't been directly written, such as a write or truncate that
starts beyond the current eof. The pagecache_isize_extended() helper
is already used for this particular scenario. It currently cleans
any pte's for the eof folio to ensure preexisting mappings fault and
allow the filesystem to take action based on the updated inode size.
This is required to ensure the folio is fully backed by allocated
blocks, for example, but this also happens to be the same scenario
zeroing is required.
Update pagecache_isize_extended() to zero the post-eof range of the
eof folio if it is dirty at the time of the size change, since
writeback now won't have the chance. If non-dirty, the folio has
either not been written or the post-eof portion was zeroed by
writeback.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://patch.msgid.link/20240919160741.208162-3-bfoster@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Using mapped writes, it's technically possible to expose stale
post-eof data on a truncate up operation. Consider the following
example:
$ xfs_io -fc "pwrite 0 2k" -c "mmap 0 4k" -c "mwrite 2k 2k" \
-c "truncate 8k" -c "pread -v 2k 16" <file>
...
00000800: 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 XXXXXXXXXXXXXXXX
...
This shows that the post-eof data written via mwrite lands within
EOF after a truncate up. While this is deliberate of the test case,
behavior is somewhat unpredictable because writeback does post-eof
zeroing, and writeback can occur at any time in the background. For
example, an fsync inserted between the mwrite and truncate causes
the subsequent read to instead return zeroes. This basically means
that there is a race window in this situation between any subsequent
extending operation and writeback that dictates whether post-eof
data is exposed to the file or zeroed.
To prevent this problem, perform partial block zeroing as part of
the various inode size extending operations that are susceptible to
it. For truncate extension, zero around the original eof similar to
how truncate down does partial zeroing of the new eof. For extension
via writes and fallocate related operations, zero the newly exposed
range of the file to cover any partial zeroing that must occur at
the original and new eof blocks.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://patch.msgid.link/20240919160741.208162-2-bfoster@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The commit 91562895f803 ("ext4: properly sync file size update after O_SYNC
direct IO") causes confusion about the meaning of the return value of
ext4_dio_write_end_io().
Specifically, when the ext4_handle_inode_extension() operation succeeds,
ext4_dio_write_end_io() directly returns count instead of 0.
This does not cause a bug in the current kernel, but the semantics of the
return value of the ext4_dio_write_end_io() function are wrong, which is
likely to introduce bugs in the future code evolution.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240919082539.381626-1-alexjlzheng@tencent.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Commit 449813515d3e ("block, fs: Restore the per-bio/request data
lifetime fields") restored write-hint support in ext4. But that is
applicable only for direct IO. This patch supports passing
write-hint for buffered IO from ext4 file system to block layer
by filling bi_write_hint of struct bio in io_submit_add_bh().
Signed-off-by: j.xia <j.xia@samsung.com>
Link: https://patch.msgid.link/20240919020341.2657646-1-j.xia@samsung.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When I enabled ext4 debug for fault injection testing, I encountered the
following warning:
EXT4-fs error (device sda): ext4_read_inode_bitmap:201: comm fsstress:
Cannot read inode bitmap - block_group = 8, inode_bitmap = 1051
WARNING: CPU: 0 PID: 511 at fs/buffer.c:1181 mark_buffer_dirty+0x1b3/0x1d0
The root cause of the issue lies in the improper implementation of ext4's
buffer_head read fault injection. The actual completion of buffer_head
read and the buffer_head fault injection are not atomic, which can lead
to the uptodate flag being cleared on normally used buffer_heads in race
conditions.
[CPU0] [CPU1] [CPU2]
ext4_read_inode_bitmap
ext4_read_bh()
<bh read complete>
ext4_read_inode_bitmap
if (buffer_uptodate(bh))
return bh
jbd2_journal_commit_transaction
__jbd2_journal_refile_buffer
__jbd2_journal_unfile_buffer
__jbd2_journal_temp_unlink_buffer
ext4_simulate_fail_bh()
clear_buffer_uptodate
mark_buffer_dirty
<report warning>
WARN_ON_ONCE(!buffer_uptodate(bh))
The best approach would be to perform fault injection in the IO completion
callback function, rather than after IO completion. However, the IO
completion callback function cannot get the fault injection code in sb.
Fix it by passing the result of fault injection into the bh read function,
we simulate faults within the bh read function itself. This requires adding
an extra parameter to the bh read functions that need fault injection.
Fixes: 46f870d690fe ("ext4: simulate various I/O and checksum errors when reading metadata")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Link: https://patch.msgid.link/20240906091746.510163-1-leo.lilong@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When converting a delalloc extent in ext4_es_insert_extent(), since we
only want to pass the info of whether the quota has already been claimed
if the allocation is a direct allocation from ext4_map_create_blocks(),
there is no need to pass full mapping flags, so changes to just pass
whether the EXT4_GET_BLOCKS_DELALLOC_RESERVE bit is set.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240906061401.2980330-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When ctx_set_flags() is unused, it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y:
.../ext4/super.c:2120:1: error: unused function 'ctx_set_flags' [-Werror,-Wunused-function]
2120 | EXT4_SET_CTX(flags); /* set only */
| ^~~~~~~~~~~~~~~~~~~
Fix this by marking ctx_*_flags() with __maybe_unused
(mark both for the sake of symmetry).
See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20240905163229.140522-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This is the logic behavior and one that we would like to verify
using a generic fstest similar to xfs/546.
Link: https://lore.kernel.org/fstests/20240830152648.GE6216@frogsfrogsfrogs/
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240904084657.1062243-1-amir73il@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
cma_alloc() keep failed in our system which thanks to a jh->bh->b_page
can not be migrated out of CMA area[1] as the jh has one cp_transaction
pending on it because of j_free > j_max_transaction_buffers[2][3][4][5][6].
We temporarily solve this by launching jbd2_log_do_checkpoint forcefully
somewhere. Since journal is common mechanism to all JFSs and
cp_transaction has a little fewer opportunity to be launched, the
cma_alloc() could be affected under the same scenario. This patch
would like to have buffer_head of ext4 not use CMA pages when doing
sb_getblk.
[1]
crash_arm64_v8.0.4++> kmem -p|grep ffffff808f0aa150(sb->s_bdev->bd_inode->i_mapping)
fffffffe01a51c00 e9470000 ffffff808f0aa150 3 2 8000000008020 lru,private
fffffffe03d189c0 174627000 ffffff808f0aa150 4 2 2004000000008020 lru,private
fffffffe03d88e00 176238000 ffffff808f0aa150 3f9 2 2008000000008020 lru,private
fffffffe03d88e40 176239000 ffffff808f0aa150 6 2 2008000000008020 lru,private
fffffffe03d88e80 17623a000 ffffff808f0aa150 5 2 2008000000008020 lru,private
fffffffe03d88ec0 17623b000 ffffff808f0aa150 1 2 2008000000008020 lru,private
fffffffe03d88f00 17623c000 ffffff808f0aa150 0 2 2008000000008020 lru,private
fffffffe040e6540 183995000 ffffff808f0aa150 3f4 2 2004000000008020 lru,private
[2] page -> buffer_head
crash_arm64_v8.0.4++> struct page.private fffffffe01a51c00 -x
private = 0xffffff802fca0c00
[3] buffer_head -> journal_head
crash_arm64_v8.0.4++> struct buffer_head.b_private 0xffffff802fca0c00
b_private = 0xffffff8041338e10,
[4] journal_head -> b_cp_transaction
crash_arm64_v8.0.4++> struct journal_head.b_cp_transaction 0xffffff8041338e10 -x
b_cp_transaction = 0xffffff80410f1900,
[5] transaction_t -> journal
crash_arm64_v8.0.4++> struct transaction_t.t_journal 0xffffff80410f1900 -x
t_journal = 0xffffff80e70f3000,
[6] j_free & j_max_transaction_buffers
crash_arm64_v8.0.4++> struct journal_t.j_free,j_max_transaction_buffers 0xffffff80e70f3000 -x
j_free = 0x3f1,
j_max_transaction_buffers = 0x100,
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240904075300.1148836-1-zhaoyang.huang@unisoc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The if condition !A || A && B can be simplified to !A || B.
./fs/ext4/fast_commit.c:362:21-23: WARNING !A || A && B is equivalent to !A || B.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9837
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240830071713.40565-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The original implementation ext4's FS_IOC_GETFSMAP handling only
worked when the range of queried blocks included at least one free
(unallocated) block range. This is because how the metadata blocks
were emitted was as a side effect of ext4_mballoc_query_range()
calling ext4_getfsmap_datadev_helper(), and that function was only
called when a free block range was identified. As a result, this
caused generic/365 to fail.
Fix this by creating a new function ext4_getfsmap_meta_helper() which
gets called so that blocks before the first free block range in a
block group can get properly reported.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
The maximum length of a filename is 255 and the minimum block size is 1024,
so it is always guaranteed that the number of entries is greater than or
equal to 2 when do_split() is called. So unless ext4_dx_add_entry() and
make_indexed_dir() or some other functions are buggy, 'split == 0' will
not occur.
Setting 'continued' to 0 in this case masks the problem that the file
system has become corrupted, even though it prevents possible out-of-bounds
access. Hence WARN_ON_ONCE() is used to check if 'split' is 0, and if it is
then warns and returns an error to abort split.
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20240823160518.GA424729@mit.edu
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241008121152.3771906-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
After commit 21175ca434c5 ("ext4: make prefetch_block_bitmaps default"),
we enable 'prefetch_block_bitmaps' by default, but this is not shown in
the '/proc/fs/ext4/sdx/options' procfs interface.
This makes it impossible to distinguish whether the feature is enabled by
default or not, so 'prefetch_block_bitmaps' is shown in the 'options'
procfs interface when prefetch_block_bitmaps is enabled by default.
This makes it easy to notice changes to the default mount options between
versions through the '/proc/fs/ext4/sdx/options' procfs interface.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241008120134.3758097-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
cros_ec hwmon driver probes available thermal sensors when probing the
driver. Register these thermal sensors to the thermal framework as well
via setting HWMON_C_REGISTER_TZ as a chip info, such that thermal
framework can adopt these sensors as well.
To make cros_ec registrable to thermal framework, the cros_ec dts need
the corresponding changes:
&cros_ec {
#thermal-sensor-cells = <1>;
};
Signed-off-by: Sung-Chi, Li <lschyi@chromium.org>
Message-ID: <20241113024000.3327161-1-lschyi@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
CONFIG_DEBUG_VM
On a web server workload, the cpumask_test_cpu() inside the
WARN_ON_ONCE() in the 'prev == next branch' takes about 17% of
all the CPU time of switch_mm_irqs_off().
On a large fleet, this WARN_ON_ONCE() has not fired in at least
a month, possibly never.
Move this test under CONFIG_DEBUG_VM so it does not get compiled
in production kernels.
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241109003727.3958374-4-riel@surriel.com
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
hnodes.
To generate hnode handles (in gen_new_htid()), u32 uses IDR and
encodes the returned small integer into a structured 32-bit
word. Unfortunately, at disposal time, the needed decoding
is not done. As a result, idr_remove() fails, and the IDR
fills up. Since its size is 2048, the following script ends up
with "Filter already exists":
tc filter add dev myve $FILTER1
tc filter add dev myve $FILTER2
for i in {1..2048}
do
echo $i
tc filter del dev myve $FILTER2
tc filter add dev myve $FILTER2
done
This patch adds the missing decoding logic for handles that
deserve it.
Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20241110172836.331319-1-alexandre.ferrieux@orange.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Removing full driver sections also removed mailing list entries, causing
submitters of future patches to forget CCing these mailing lists.
Hence re-add the sections for the Renesas Ethernet AVB, R-Car SATA, and
SuperH Ethernet drivers. Add people who volunteered to maintain these
drivers (thanks a lot!), and mark all of them as supported.
Fixes: 6e90b675cf942e50 ("MAINTAINERS: Remove some entries due to various compliance requirements.")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Niklas Cassel <cassel@kernel.org>
Acked-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Link: https://patch.msgid.link/4b2105332edca277f07ffa195796975e9ddce994.1731319098.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f.
Sebastian noticed the ISR indirectly acquires spin_locks, which are
sleeping locks under PREEMPT_RT, which leads to kernel splats.
Fixes: 338c4d3902feb ("igb: Disable threaded IRQ for igb_msix_other")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241106111427.7272-1-wander@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btintel: Direct exception event to bluetooth stack
- hci_core: Fix calling mgmt_device_connected
* tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
====================
Link: https://patch.msgid.link/20241112175326.930800-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull virtio fix from Michael Tsirkin:
"A last minute mlx5 bugfix"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix PA offset with unaligned starting iotlb map
|
|
This commit fixes the bug in the handling of partial mapping of the
buffer objects to the GPU, which caused kernel warnings.
Panthor didn't correctly handle the case where the partial mapping
spanned multiple scatterlists and the mapping offset didn't point
to the 1st page of starting scatterlist. The offset variable was
not cleared after reaching the starting scatterlist.
Following warning messages were seen.
WARNING: CPU: 1 PID: 650 at drivers/iommu/io-pgtable-arm.c:659 __arm_lpae_unmap+0x254/0x5a0
<snip>
pc : __arm_lpae_unmap+0x254/0x5a0
lr : __arm_lpae_unmap+0x2cc/0x5a0
<snip>
Call trace:
__arm_lpae_unmap+0x254/0x5a0
__arm_lpae_unmap+0x108/0x5a0
__arm_lpae_unmap+0x108/0x5a0
__arm_lpae_unmap+0x108/0x5a0
arm_lpae_unmap_pages+0x80/0xa0
panthor_vm_unmap_pages+0xac/0x1c8 [panthor]
panthor_gpuva_sm_step_unmap+0x4c/0xc8 [panthor]
op_unmap_cb.isra.23.constprop.30+0x54/0x80
__drm_gpuvm_sm_unmap+0x184/0x1c8
drm_gpuvm_sm_unmap+0x40/0x60
panthor_vm_exec_op+0xa8/0x120 [panthor]
panthor_vm_bind_exec_sync_op+0xc4/0xe8 [panthor]
panthor_ioctl_vm_bind+0x10c/0x170 [panthor]
drm_ioctl_kernel+0xbc/0x138
drm_ioctl+0x210/0x4b0
__arm64_sys_ioctl+0xb0/0xf8
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.1+0x98/0xf8
do_el0_svc+0x24/0x38
el0_svc+0x34/0xc8
el0t_64_sync_handler+0xa0/0xc8
el0t_64_sync+0x174/0x178
<snip>
panthor : [drm] drm_WARN_ON(unmapped_sz != pgsize * pgcount)
WARNING: CPU: 1 PID: 650 at drivers/gpu/drm/panthor/panthor_mmu.c:922 panthor_vm_unmap_pages+0x124/0x1c8 [panthor]
<snip>
pc : panthor_vm_unmap_pages+0x124/0x1c8 [panthor]
lr : panthor_vm_unmap_pages+0x124/0x1c8 [panthor]
<snip>
panthor : [drm] *ERROR* failed to unmap range ffffa388f000-ffffa3890000 (requested range ffffa388c000-ffffa3890000)
Fixes: 647810ec2476 ("drm/panthor: Add the MMU/VM logical block")
Signed-off-by: Akash Goel <akash.goel@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241111134720.780403-1-akash.goel@arm.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
|
|
These two fields are populated and stored as a "frequently used value"
in ufs_fill_super, but are not used afterwards in the driver.
Moreover, one of the shifts triggers UBSAN: shift-out-of-bounds when
apbshift is 12 because 12 * 3 = 36 and 1 << 36 does not fit in the 32
bit integer used to store the value.
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2087853
Signed-off-by: Agathe Porte <agathe.porte@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
syzbot and Daan report a NULL pointer crash in the new full swap cluster
reclaim work:
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
> KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> CPU: 1 UID: 0 PID: 51 Comm: kworker/1:1 Not tainted 6.12.0-rc6-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> Workqueue: events swap_reclaim_work
> RIP: 0010:__list_del_entry_valid_or_report+0x20/0x1c0 lib/list_debug.c:49
> Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 83 c7 08 48 83 ec 18 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 19 01 00 00 48 89 f2 48 8b 4e 08 48 b8 00 00 00
> RSP: 0018:ffffc90000bb7c30 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff88807b9ae078
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000001 R11: 000000000000004f R12: dffffc0000000000
> R13: ffffffffffffffb8 R14: ffff88807b9ae000 R15: ffffc90003af1000
> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fffaca68fb8 CR3: 00000000791c8000 CR4: 00000000003526f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> __list_del_entry_valid include/linux/list.h:124 [inline]
> __list_del_entry include/linux/list.h:215 [inline]
> list_move_tail include/linux/list.h:310 [inline]
> swap_reclaim_full_clusters+0x109/0x460 mm/swapfile.c:748
> swap_reclaim_work+0x2e/0x40 mm/swapfile.c:779
The syzbot console output indicates a virtual environment where swapfile
is on a rotational device. In this case, clusters aren't actually used,
and si->full_clusters is not initialized. Daan's report is from qemu, so
likely rotational too.
Make sure to only schedule the cluster reclaim work when clusters are
actually in use.
Link: https://lkml.kernel.org/r/20241107142335.GB1172372@cmpxchg.org
Link: https://lore.kernel.org/lkml/672ac50b.050a0220.2edce.1517.GAE@google.com/
Link: https://github.com/systemd/systemd/issues/35044
Fixes: 5168a68eb78f ("mm, swap: avoid over reclaim of full clusters")
Reported-by: syzbot+078be8bfa863cb9e0c6b@syzkaller.appspotmail.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Cc: Kairui Song <ryncsn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It seems that the cacheflush syscall got broken when PAN for LPAE was
implemented. User access was not enabled around the cache maintenance
instructions, causing them to fault.
Fixes: 7af5b901e847 ("ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement")
Reported-by: Michał Pecio <michal.pecio@gmail.com>
Tested-by: Michał Pecio <michal.pecio@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
There is a spelling mistake of 'absense' in comments which
should be 'absence'.
Link: https://lore.kernel.org/all/fca25741-c89f-43ea-95af-5e3232d513fc@arm.com/
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
When enabling expert mode CONFIG_EXPERT and using that power
user mode to disable the branch prediction hardening
!CONFIG_HARDEN_BRANCH_PREDICTOR, the assembly linker
in CLANG notices that some assembly in proc-v7.S does
not have corresponding C call sites, i.e. the prototypes
in proc-v7-bugs.c are enclosed in ifdef
CONFIG_HARDEN_BRANCH_PREDICTOR so this assembly:
SYM_TYPED_FUNC_START(cpu_v7_smc_switch_mm)
SYM_TYPED_FUNC_START(cpu_v7_hvc_switch_mm)
Results in:
ld.lld: error: undefined symbol: __kcfi_typeid_cpu_v7_smc_switch_mm
>>> referenced by proc-v7.S:94 (.../arch/arm/mm/proc-v7.S:94)
>>> arch/arm/mm/proc-v7.o:(.text+0x108) in archive vmlinux.a
ld.lld: error: undefined symbol: __kcfi_typeid_cpu_v7_hvc_switch_mm
>>> referenced by proc-v7.S:105 (.../arch/arm/mm/proc-v7.S:105)
>>> arch/arm/mm/proc-v7.o:(.text+0x124) in archive vmlinux.a
Fix this by adding an additional requirement that
CONFIG_HARDEN_BRANCH_PREDICTOR has to be enabled to compile
these assembly calls.
Closes: https://lore.kernel.org/oe-kbuild-all/202411041456.ZsoEiD7T-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
For compiling with W=1, the following warning can be seen:
drivers/md/raid5.c: In function ‘setup_conf’:
drivers/md/raid5.c:2423:12: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size between 16 and 26 [-Werror=format-truncation=]
"raid%d-%s", conf->level, mdname(conf->mddev));
^~
drivers/md/raid5.c:2422:3: note: ‘snprintf’ output between 7 and 48 bytes into a destination of size 32
snprintf(conf->cache_name[0], namelen,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"raid%d-%s", conf->level, mdname(conf->mddev));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Increase the array size to avoid this warning.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241112161019.4154616-2-john.g.garry@oracle.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
When calculating the physical address range based on the iotlb and mr
[start,end) ranges, the offset of mr->start relative to map->start
is not taken into account. This leads to some incorrect and duplicate
mappings.
For the case when mr->start < map->start the code is already correct:
the range in [mr->start, map->start) was handled by a different
iteration.
Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Cc: stable@vger.kernel.org
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20241021134040.975221-2-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME
events while in the D0 state.
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 447a54a0f79c9a409ceaa17804bdd2e0206397b9)
Cc: stable@vger.kernel.org
|
|
This reverts commit 694c79769cb384bca8b1ec1d1e84156e726bd106.
This was not the root cause. Revert.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3678
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: aurabindo.pillai@amd.com
Cc: hamishclaxton@gmail.com
(cherry picked from commit 3c2296b1eec55b50c64509ba15406142d4a958dc)
Cc: stable@vger.kernel.org # 6.11.x
|
|
The static declaration causes the check to fail. Remove it.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3678
Fixes: 00c391102abc ("drm/amd/display: Add misc DC changes for DCN401")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Hamish Claxton <hamishclaxton@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: aurabindo.pillai@amd.com
Cc: hamishclaxton@gmail.com
(cherry picked from commit 91314e7dfd83345b8b820b782b2511c9c32866cd)
Cc: stable@vger.kernel.org # 6.11.x
|
|
That is just a waste of time on APUs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704
Fixes: 216c1282dde3 ("drm/amdgpu: use GTT only as fallback for VRAM|GTT")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e8fc090d322346e5ce4c4cfe03a8100e31f61c3c)
Cc: stable@vger.kernel.org
|
|
If srcu_read_lock_lite() is used on a given srcu_struct structure, then
the grace-period processing must do synchronize_rcu() instead of smp_mb()
between the scans of the ->srcu_unlock_count[] and ->srcu_lock_count[]
counters. Currently, it does that by testing the SRCU_READ_FLAVOR_LITE
bit of the ->srcu_reader_flavor mask, which works well. But only if
the CPU running that srcu_struct structure's grace period has previously
executed srcu_read_lock_lite(), which might not be the case, especially
just after that srcu_struct structure has been created and initialized.
This commit therefore updates the srcu_readers_unlock_idx() function
to OR together the ->srcu_reader_flavor masks from all CPUs, and
then make the srcu_readers_active_idx_check() function that test the
SRCU_READ_FLAVOR_LITE bit in the resulting mask.
Note that the srcu_readers_unlock_idx() function is already scanning all
the CPUs to sum up the ->srcu_unlock_count[] fields and that this is on
the grace-period slow path, hence no concerns about the small amount of
extra work.
Reported-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Closes: https://lore.kernel.org/all/d07e8f4a-d5ff-4c8e-8e61-50db285c57e9@amd.com/
Fixes: c0f08d6b5a61 ("srcu: Add srcu_read_lock_lite() and srcu_read_unlock_lite()")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
|
The srcu_read_unlock_lite() function invokes __srcu_read_unlock() instead
of __srcu_read_unlock_lite(), which means that it is doing an unnecessary
smp_mb(). This is harmless other than the performance degradation.
This commit therefore switches to __srcu_read_unlock_lite().
Reported-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Closes: https://lore.kernel.org/all/d07e8f4a-d5ff-4c8e-8e61-50db285c57e9@amd.com/
Fixes: c0f08d6b5a61 ("srcu: Add srcu_read_lock_lite() and srcu_read_unlock_lite()")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
|
If a CPU runs throughout the stalled grace period without passing
through a quiescent state, RCU priority boosting cannot help.
The rcu_torture_boost_failed() function therefore prints a message
flagging the first such CPU. However, if the stall was instead due to
(for example) RCU's grace-period kthread being starved of CPU, there will
be no such CPU, causing rcu_check_boost_fail() to instead pass back -1
through its cpup CPU-pointer parameter.
Therefore, the current message complains about a mythical CPU -1.
This commit therefore checks for this situation, and notes that all CPUs
have passed through a quiescent state.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
|
This commit adds a guest_os_delay module parameter that extends warm-up
and cool-down the specified number of seconds before and after the series
of test runs. This allows the data-collection intervals from any given
rcuscale guest OSes to line up with active periods in the other rcuscale
guest OSes, and also allows the thermal warm-up period required to obtain
consistent results from one test to the next.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
|
The current affinity check works fine until there are more reader
processes than CPUs, at which point the affinity check is looking for
non-existent CPUs. This commit therefore applies the same modulus to
the check as is present in the set_cpus_allowed_ptr() call.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
|
In performance tests, it can be counter-productive to spread torture-test
guest OSes across sockets. Plus the experimenter might have ideas about
what CPUs individual guest OSes are to run on. This commit therefore
adds a --no-affinity parameter to kvm.sh to prevent it from running
taskset on its guest OSes.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
|
Add support for I3C device in the tmp108 driver to handle the P3T1085
sensor. Register the I3C device driver to enable I3C functionality for the
sensor.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Message-ID: <20241112-p3t1085-v4-2-a1334314b1e6@nxp.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add help function tmp108_common_probe() to pave road to support i3c for
P3T1085(NXP) chip.
Use dev_err_probe() to simplify the code.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Message-ID: <20241112-p3t1085-v4-1-a1334314b1e6@nxp.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
method
According to the ACPI specification, the _PMD method is optional. The
acpi_power_meter driver shouldn't fail to load if the platform has no
_PMD method.
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Message-ID: <20241112021228.22914-1-lihuisong@huawei.com>
[groeck: Reworded commit description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|