summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-11btrfs: tests: add selftests for raid-stripe-treeJohannes Thumshirn
Add first stash of very basic self tests for the RAID stripe-tree. More test cases will follow exercising the tree. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: correct typos in multiple comments across various filesShen Lichuan
Fix some confusing spelling errors that were currently identified, the details are as follows: block-group.c: 2800: uncompressible ==> incompressible extent-tree.c: 3131: EXTEMT ==> EXTENT extent_io.c: 3124: utlizing ==> utilizing extent_map.c: 1323: ealier ==> earlier extent_map.c: 1325: possiblity ==> possibility fiemap.c: 189: emmitted ==> emitted fiemap.c: 197: emmitted ==> emitted fiemap.c: 203: emmitted ==> emitted transaction.h: 36: trasaction ==> transaction volumes.c: 5312: filesysmte ==> filesystem zoned.c: 1977: trasnsaction ==> transaction Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove unused page_to_inode and page_to_fs_info macrosYouling Tang
This macro is no longer used after the "btrfs: Cleaned up folio->page conversion" series patch [1] was applied, so remove it. [1]: https://patchwork.kernel.org/project/linux-btrfs/cover/20240828182908.3735344-1-lizetao1@huawei.com/ Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove redundant stop_loop variable in scrub_stripe()Riyan Dhiman
The variable stop_loop was originally introduced in commit 625f1c8dc66d7 ("Btrfs: improve the loop of scrub_stripe"). It was initialized to 0 in commit 3b080b2564287 ("Btrfs: scrub raid56 stripes in the right way"). However, in a later commit 18d30ab961497 ("btrfs: scrub: use scrub_simple_mirror() to handle RAID56 data stripe scrub"), the code that modified stop_loop was removed, making the variable redundant. Currently, stop_loop is only initialized with 0 and is never used or modified within the scrub_stripe() function. As a result, this patch removes the stop_loop variable to clean up the code and eliminate unnecessary redundancy. This change has no impact on functionality, as stop_loop was never utilized in any meaningful way in the final version of the code. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Riyan Dhiman <riyandhiman14@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()Filipe Manana
The qgroup record was allocated with kzalloc(), so it's pointless to set its old_roots member to NULL. Remove the assignment. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()Filipe Manana
Instead of dereferencing the delayed refs from the transaction multiple times, store it early in the local variable and then always use the variable. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()Filipe Manana
There's no need to hold the delayed refs spinlock when calling btrfs_qgroup_trace_extent_nolock() from btrfs_qgroup_trace_extent(), since it doesn't change anything in delayed refs and it only changes the xarray used to track qgroup extent records, which is protected by the xarray's lock. Holding the lock is only adding unnecessary lock contention with other tasks that actually need to take the lock to add/remove/change delayed references. So remove the locking. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()Filipe Manana
Instead of extracting fs_info from the transaction multiples times, store it in a local variable and use it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_recordFilipe Manana
Now that we track qgroup extent records in a xarray we don't need to have a "bytenr" field in struct btrfs_qgroup_extent_record, since we can get it from the index of the record in the xarray. So remove the field and grab the bytenr from either the index key or any other place where it's available (delayed refs). This reduces the size of struct btrfs_qgroup_extent_record from 40 bytes down to 32 bytes, meaning that we now can store 128 instances of this structure instead of 102 per 4K page. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: remove code duplication in ordered extent finishingJohannes Thumshirn
Remove the duplicated transaction joining, block reserve setting and raid extent inserting in btrfs_finish_ordered_extent(). While at it, also abort the transaction in case inserting a RAID stripe-tree entry fails. Suggested-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: canonicalize the device path before adding itQu Wenruo
[PROBLEM] Currently btrfs accepts any file path for its device, resulting some weird situation: # ./mount_by_fd /dev/test/scratch1 /mnt/btrfs/ The program has the following source code: #include <fcntl.h> #include <stdio.h> #include <sys/mount.h> int main(int argc, char *argv[]) { int fd = open(argv[1], O_RDWR); char path[256]; snprintf(path, sizeof(path), "/proc/self/fd/%d", fd); return mount(path, argv[2], "btrfs", 0, NULL); } Then we can have the following weird device path: BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (18440) Normally it's not a big deal, and later udev can trigger a device path rename. But if udev didn't trigger, the device path "/proc/self/fd/3" will show up in mtab. [CAUSE] For filename "/proc/self/fd/3", it means the opened file descriptor 3. In above case, it's exactly the device we want to open, aka points to "/dev/test/scratch1" which is another symlink pointing to "/dev/dm-2". Inside kernel we solve the mount source using LOOKUP_FOLLOW, which follows the symbolic link and grab the proper block device. But inside btrfs we also save the filename into btrfs_device::name, and utilize that member to report our mount source, which leads to the above situation. [FIX] Instead of unconditionally trust the path, check if the original file (not following the symbolic link) is inside "/dev/", if not, then manually lookup the path to its final destination, and use that as our device path. This allows us to still use symbolic links, like "/dev/mapper/test-scratch" from LVM2, which is required for fstests runs with LVM2 setup. And for really weird names, like the above case, we solve it to "/dev/dm-2" instead. Reviewed-by: Filipe Manana <fdmanana@suse.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641 Reported-by: Fabian Vogt <fvogt@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: avoid unnecessary device path update for the same deviceQu Wenruo
[PROBLEM] It is very common for udev to trigger device scan, and every time a mounted btrfs device got re-scan from different soft links, we will get some of unnecessary device path updates, this is especially common for LVM based storage: # lvs scratch1 test -wi-ao---- 10.00g scratch2 test -wi-a----- 10.00g scratch3 test -wi-a----- 10.00g scratch4 test -wi-a----- 10.00g scratch5 test -wi-a----- 10.00g test test -wi-a----- 10.00g # mkfs.btrfs -f /dev/test/scratch1 # mount /dev/test/scratch1 /mnt/btrfs # dmesg -c [ 205.705234] BTRFS: device fsid 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 devid 1 transid 6 /dev/mapper/test-scratch1 (253:4) scanned by mount (1154) [ 205.710864] BTRFS info (device dm-4): first mount of filesystem 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 [ 205.711923] BTRFS info (device dm-4): using crc32c (crc32c-intel) checksum algorithm [ 205.713856] BTRFS info (device dm-4): using free-space-tree [ 205.722324] BTRFS info (device dm-4): checking UUID tree So far so good, but even if we just touched any soft link of "dm-4", we will get quite some unnecessary device path updates. # touch /dev/mapper/test-scratch1 # dmesg -c [ 469.295796] BTRFS info: devid 1 device path /dev/mapper/test-scratch1 changed to /dev/dm-4 scanned by (udev-worker) (1221) [ 469.300494] BTRFS info: devid 1 device path /dev/dm-4 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (1221) Such device path rename is unnecessary and can lead to random path change due to the udev race. [CAUSE] Inside device_list_add(), we are using a very primitive way checking if the device has changed, strcmp(). Which can never handle links well, no matter if it's hard or soft links. So every different link of the same device will be treated as a different device, causing the unnecessary device path update. [FIX] Introduce a helper, is_same_device(), and use path_equal() to properly detect the same block device. So that the different soft links won't trigger the rename race. Reviewed-by: Filipe Manana <fdmanana@suse.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641 Reported-by: Fabian Vogt <fvogt@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: allow compression even if the range is not page alignedQu Wenruo
Previously for btrfs with sector size smaller than page size (subpage), we only allow compression if the range is fully page aligned. This is to work around the asynchronous submission of compressed range, which delayed the page unlock and writeback into a workqueue, furthermore asynchronous submission can lock multiple sector range across page boundary. Such asynchronous submission makes it very hard to co-operate with other regular writes. With the recent changes to the subpage folio unlock path, now asynchronous submission of compressed pages can co-operate with regular submission, so enable sector perfect compression if it's an experimental build. The ETA for moving this feature out of experimental is 6.15, and I hope all remaining corner cases can be exposed before that. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: mark all dirty sectors as locked inside writepage_delalloc()Qu Wenruo
Currently we only mark sectors as locked if there is a *NEW* delalloc range for it. But NEW delalloc range is not the same as dirty sectors we want to submit, e.g: 0 32K 64K 96K 128K | |////////||///////| |////| 120K For above 64K page size case, writepage_delalloc() for page 0 will find and lock the delalloc range [32K, 96K), which is beyond the page boundary. Then when writepage_delalloc() is called for the page 64K, since [64K, 96K) is already locked, only [120K, 128K) will be locked. This means, although range [64K, 96K) is dirty and will be submitted later by extent_writepage_io(), it will not be marked as locked. This is fine for now, as we call btrfs_folio_end_writer_lock_bitmap() to free every non-compressed sector, and compression is only allowed for full page range. But this is not safe for future sector perfect compression support, as this can lead to double folio unlock: Thread A | Thread B ---------------------------------------+-------------------------------- | submit_one_async_extent() | |- extent_clear_unlock_delalloc() extent_writepage() | |- btrfs_folio_end_writer_lock() |- btrfs_folio_end_writer_lock_bitmap()| |- btrfs_subpage_end_and_test_writer() | | | |- atomic_sub_and_test() | | | /* Now the atomic value is 0 */ |- if (atomic_read() == 0) | | |- folio_unlock() | |- folio_unlock() The root cause is the above range [64K, 96K) is dirtied and should also be locked but it isn't. So to make everything more consistent and prepare for the incoming sector perfect compression, mark all dirty sectors as locked. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: move the delalloc range bitmap search into extent_io.cQu Wenruo
Currently for subpage (sector size < page size) cases, we reuse subpage locked bitmap to find out all delalloc ranges we have locked, and run all those found ranges. However such reuse is not perfect, e.g.: 0 32K 64K 96K 128K | |////////||///////| |////| 120K For above range, writepage_delalloc() for page 0 will handle the range [32K, 96k), note delalloc range can be beyond the page boundary. But writepage_delalloc() for page 64K will only handle range [120K, 128K), as the previous run on page 0 has already handled range [64K, 96K). Meanwhile for the writeback we should expect range [64K, 96K) to also be locked, this leads to the mismatch from locked bitmap and delalloc range. This is not causing problems yet, but it's still an inconsistent behavior. So instead of relying on the subpage locked bitmap, move the delalloc range search using local @delalloc_bitmap, so that we can remove the existing btrfs_folio_find_writer_locked(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: do not assume the full page range is not dirty in extent_writepage_io()Qu Wenruo
The function extent_writepage_io() will submit the dirty sectors inside the page for the write. But recently to co-operate with the incoming subpage compression enhancement, a new bitmap is introduced to btrfs_bio_ctrl::submit_bitmap, to only avoid a subset of the dirty range. This is because we can have the following cases with 64K page size: 0 16K 32K 48K 64K | |/////////| |/| 52K For range [16K, 32K), we queue the dirty range for compression, which is ran in a delayed workqueue. Then for range [48K, 52K), we go through the regular submission path. In that case, our btrfs_bio_ctrl::submit_bitmap will exclude the range [16K, 32K). The dirty flags for the range [16K, 32K) is only cleared when the compression is done, by the extent_clear_unlock_delalloc() call inside submit_one_async_extent(). This patch fix the false alert by removing the btrfs_folio_assert_not_dirty() check, since it's no longer correct for subpage compression cases. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: make extent_range_clear_dirty_for_io() to handle sector size < page ↵Qu Wenruo
size cases For btrfs with sector size < page size (e.g. 4K sector size, 64K page size), and enable the sector perfect compression support, then the following dirty range can lead to problems: 0 32K 64K 96K 128K | |///////||//////| |/| 124K In above case, if we start writeback for that inode, the last dirty range [124K, 128K) will not be submitted and cause reserved space leakage: - Start writeback for page 0 We find the range [32K, 96K) is suitable for compression, and queue it into a workqueue to do the delayed compression and submission. - Compression happens for range [32K, 96K) Function extent_range_clear_dirty_for_io() is called, however it is only doing full page handling, not considering any the extra bitmaps for subpage cases. That function will clear page dirty for both page 0 and page 64K. - Writeback for the inode is done Because page 64K has its dirty flag cleared, it will not be considered as a writeback target. This means the range [124K, 128K) will not be submitted, and reserved space for it will be leaked. Fix this problem by using the subpage helper to clear the dirty flag. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: wait for writeback if sector size is smaller than page sizeQu Wenruo
[PROBLEM] If sector perfect compression is enabled for sector size < page size case, the following case can lead dirty ranges not being written back: 0 32K 64K 96K 128K | |///////||//////| |/| 124K In above example, the page size is 64K, and we need to write back above two pages. - Submit for page 0 (main thread) We found delalloc range [32K, 96K), which can be compressed. So we queue an async range for [32K, 96K). This means, the page unlock/clearing dirty/setting writeback will all happen in a workqueue context. - The compression is done, and compressed range is submitted (workqueue) Since the compression is done in asynchronously, the compression can be done before the main thread to submit for page 64K. Now the whole range [32K, 96K), involving two pages, will be marked writeback. - Submit for page 64K (main thread) extent_write_cache_pages() got its wbc->sync_mode is WB_SYNC_NONE, so it skips the writeback wait. And unlock the page and exit. This means the dirty range [124K, 128K) will never be submitted, until next writeback happens for page 64K. This will never happen for previous kernels because: - For sector size == page size case Since one page is one sector, if a page is marked writeback it will not have dirty flags. So this corner case will never hit. - For sector size < page size case We never do subpage compression, a range can only be submitted for compression if the range is fully page aligned. This change makes the subpage behavior mostly the same as non-subpage cases. [ENHANCEMENT] Instead of relying WB_SYNC_NONE check only, if it's a subpage case, then always wait for writeback flags. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: compression: add an ASSERT() to ensure the read-in length is saneQu Wenruo
There are already two bugs (one in zlib, one in zstd) that involved compression path is not handling sector size < page size cases well. So it makes more sense to make sure that btrfs_compress_folios() returns Since we already have two bugs (one in zlib, one in zstd) in the compression path resulting the @total_in be to larger than the to-be-compressed range length, there is enough reason to add an ASSERT() to make sure the total read-in length doesn't exceed the input length. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: zstd: make the compression path to handle sector size < page sizeQu Wenruo
Inside zstd_compress_folios(), after exhausted one input page, we need to switch to the next page as input. However when counting the total input bytes (@tot_in), we always increase it by PAGE_SIZE. For the following case, it can cause incorrect value: 0 32K 64K 96K | |///////////||///////////| After compressing range [32K, 64K), we switch to the next page, and increasing @tot_in by 64K, while we only read 32K. This will cause the @total_in to return a value larger than the input length. Fix it by only increase @tot_in by the input size. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: zlib: make the compression path to handle sector size < page sizeQu Wenruo
Inside zlib_compress_folios(), each time we switch the input page cache, the @start is increased by PAGE_SIZE. But for the incoming compression support for sector size < page size (previously we support compression only when the range is fully page aligned), this is not going to handle the following case: 0 32K 64K 96K | |///////////||///////////| @start has the initial value 32K, indicating the start filepos of the to-be-compressed range. And when grabbing the first page as input, we always call "start += PAGE_SIZE;". But since @start is starting at 32K, it will be increased by 64K, resulting it to be 96K for the next range, causing incorrect input range and corruption for the future subpage compression. Fix it by only increase @start by the input size. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: split out CONFIG_BTRFS_EXPERIMENTAL from CONFIG_BTRFS_DEBUGQu Wenruo
Currently CONFIG_BTRFS_EXPERIMENTAL is not only for the extra debugging output, but also for experimental features. This is not ideal to distinguish planned but not yet stable features from those purely designed for debugging. This patch splits the following features into CONFIG_BTRFS_EXPERIMENTAL: - Extent map shrinker This seems to be the first one to exit experimental. - Extent tree v2 This seems to be the last one to graduate from experimental. - Raid stripe tree - Csum offload mode - Send protocol v3 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: make assert_rbio() to only check CONFIG_BTRFS_ASSERTQu Wenruo
According to the description, CONFIG_BTRFS_DEBUG is only for extra debug info, meanwhile sanity checks should be managed by CONFIG_BTRFS_ASSERT. There is no need to check both to enable assert_rbio(). Just remove the check for CONFIG_BTRFS_DEBUG. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11btrfs: don't take dev_replace rwsem on task already holding itJohannes Thumshirn
Running fstests btrfs/011 with MKFS_OPTIONS="-O rst" to force the usage of the RAID stripe-tree, we get the following splat from lockdep: BTRFS info (device sdd): dev_replace from /dev/sdd (devid 1) to /dev/sdb started ============================================ WARNING: possible recursive locking detected 6.11.0-rc3-btrfs-for-next #599 Not tainted -------------------------------------------- btrfs/2326 is trying to acquire lock: ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250 but task is already holding lock: ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&fs_info->dev_replace.rwsem); lock(&fs_info->dev_replace.rwsem); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by btrfs/2326: #0: ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250 stack backtrace: CPU: 1 UID: 0 PID: 2326 Comm: btrfs Not tainted 6.11.0-rc3-btrfs-for-next #599 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <TASK> dump_stack_lvl+0x5b/0x80 __lock_acquire+0x2798/0x69d0 ? __pfx___lock_acquire+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 lock_acquire+0x19d/0x4a0 ? btrfs_map_block+0x39f/0x2250 ? __pfx_lock_acquire+0x10/0x10 ? find_held_lock+0x2d/0x110 ? lock_is_held_type+0x8f/0x100 down_read+0x8e/0x440 ? btrfs_map_block+0x39f/0x2250 ? __pfx_down_read+0x10/0x10 ? do_raw_read_unlock+0x44/0x70 ? _raw_read_unlock+0x23/0x40 btrfs_map_block+0x39f/0x2250 ? btrfs_dev_replace_by_ioctl+0xd69/0x1d00 ? btrfs_bio_counter_inc_blocked+0xd9/0x2e0 ? __kasan_slab_alloc+0x6e/0x70 ? __pfx_btrfs_map_block+0x10/0x10 ? __pfx_btrfs_bio_counter_inc_blocked+0x10/0x10 ? kmem_cache_alloc_noprof+0x1f2/0x300 ? mempool_alloc_noprof+0xed/0x2b0 btrfs_submit_chunk+0x28d/0x17e0 ? __pfx_btrfs_submit_chunk+0x10/0x10 ? bvec_alloc+0xd7/0x1b0 ? bio_add_folio+0x171/0x270 ? __pfx_bio_add_folio+0x10/0x10 ? __kasan_check_read+0x20/0x20 btrfs_submit_bio+0x37/0x80 read_extent_buffer_pages+0x3df/0x6c0 btrfs_read_extent_buffer+0x13e/0x5f0 read_tree_block+0x81/0xe0 read_block_for_search+0x4bd/0x7a0 ? __pfx_read_block_for_search+0x10/0x10 btrfs_search_slot+0x78d/0x2720 ? __pfx_btrfs_search_slot+0x10/0x10 ? lock_is_held_type+0x8f/0x100 ? kasan_save_track+0x14/0x30 ? __kasan_slab_alloc+0x6e/0x70 ? kmem_cache_alloc_noprof+0x1f2/0x300 btrfs_get_raid_extent_offset+0x181/0x820 ? __pfx_lock_acquire+0x10/0x10 ? __pfx_btrfs_get_raid_extent_offset+0x10/0x10 ? down_read+0x194/0x440 ? __pfx_down_read+0x10/0x10 ? do_raw_read_unlock+0x44/0x70 ? _raw_read_unlock+0x23/0x40 btrfs_map_block+0x5b5/0x2250 ? __pfx_btrfs_map_block+0x10/0x10 scrub_submit_initial_read+0x8fe/0x11b0 ? __pfx_scrub_submit_initial_read+0x10/0x10 submit_initial_group_read+0x161/0x3a0 ? lock_release+0x20e/0x710 ? __pfx_submit_initial_group_read+0x10/0x10 ? __pfx_lock_release+0x10/0x10 scrub_simple_mirror.isra.0+0x3eb/0x580 scrub_stripe+0xe4d/0x1440 ? lock_release+0x20e/0x710 ? __pfx_scrub_stripe+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? do_raw_read_unlock+0x44/0x70 ? _raw_read_unlock+0x23/0x40 scrub_chunk+0x257/0x4a0 scrub_enumerate_chunks+0x64c/0xf70 ? __mutex_unlock_slowpath+0x147/0x5f0 ? __pfx_scrub_enumerate_chunks+0x10/0x10 ? bit_wait_timeout+0xb0/0x170 ? __up_read+0x189/0x700 ? scrub_workers_get+0x231/0x300 ? up_write+0x490/0x4f0 btrfs_scrub_dev+0x52e/0xcd0 ? create_pending_snapshots+0x230/0x250 ? __pfx_btrfs_scrub_dev+0x10/0x10 btrfs_dev_replace_by_ioctl+0xd69/0x1d00 ? lock_acquire+0x19d/0x4a0 ? __pfx_btrfs_dev_replace_by_ioctl+0x10/0x10 ? lock_release+0x20e/0x710 ? btrfs_ioctl+0xa09/0x74f0 ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_lock+0x11e/0x240 ? __pfx_do_raw_spin_lock+0x10/0x10 btrfs_ioctl+0xa14/0x74f0 ? lock_acquire+0x19d/0x4a0 ? find_held_lock+0x2d/0x110 ? __pfx_btrfs_ioctl+0x10/0x10 ? lock_release+0x20e/0x710 ? do_sigaction+0x3f0/0x860 ? __pfx_do_vfs_ioctl+0x10/0x10 ? do_raw_spin_lock+0x11e/0x240 ? lockdep_hardirqs_on_prepare+0x270/0x3e0 ? _raw_spin_unlock_irq+0x28/0x50 ? do_sigaction+0x3f0/0x860 ? __pfx_do_sigaction+0x10/0x10 ? __x64_sys_rt_sigaction+0x18e/0x1e0 ? __pfx___x64_sys_rt_sigaction+0x10/0x10 ? __x64_sys_close+0x7c/0xd0 __x64_sys_ioctl+0x137/0x190 do_syscall_64+0x71/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f0bd1114f9b Code: Unable to access opcode bytes at 0x7f0bd1114f71. RSP: 002b:00007ffc8a8c3130 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f0bd1114f9b RDX: 00007ffc8a8c35e0 RSI: 00000000ca289435 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000007 R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffc8a8c6c85 R13: 00000000398e72a0 R14: 0000000000004361 R15: 0000000000000004 </TASK> This happens because on RAID stripe-tree filesystems we recurse back into btrfs_map_block() on scrub to perform the logical to device physical mapping. But as the device replace task is already holding the dev_replace::rwsem we deadlock. So don't take the dev_replace::rwsem in case our task is the task performing the device replace. Suggested-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11m68k: defconfig: Update defconfigs for v6.12-rc1Geert Uytterhoeven
- Enable modular build of the new mul_u64_u64_div_u64() test. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/4092672cb64b86ec3f300b4cf0ea0c2db2b52e2e.1727699197.git.geert@linux-m68k.org
2024-11-11ASoC: intel: sof_sdw: add quirk for Dell SKUDeep Harsora
This patch adds a quirk to include the codec amplifier function for this Dell SKU. Note: In this SKU '0CF1', the RT722 codec amplifier is excluded, and an external amplifier is used instead. Signed-off-by: Deep Harsora <deep_harsora@dell.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20241111070618.5414-1-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-11ASoC: audio-graph-card2: Purge absent supplies for device tree nodesJohn Watts
The audio graph card doesn't mark its subnodes such as multi {}, dpcm {} and c2c {} as not requiring any suppliers. This causes a hang as Linux waits for these phantom suppliers to show up on boot. Make it clear these nodes have no suppliers. Example error message: [ 15.208558] platform 2034000.i2s: deferred probe pending: platform: wait for supplier /sound/multi [ 15.208584] platform sound: deferred probe pending: asoc-audio-graph-card2: parse error Signed-off-by: John Watts <contact@jookia.org> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/20241108-graph_dt_fix-v1-1-173e2f9603d6@jookia.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-11Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann
Backmerging to get fixes from v6.12-rc7. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-11-11mm: count zeromap read and set for swapout and swapinBarry Song
When the proportion of folios from the zeromap is small, missing their accounting may not significantly impact profiling. However, it's easy to construct a scenario where this becomes an issue—for example, allocating 1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out and swap-in counts seem to vanish into a black hole, potentially causing semantic ambiguity. On the other hand, Usama reported that zero-filled pages can exceed 10% in workloads utilizing zswap, while Hailong noted that some app in Android have more than 6% zero-filled pages. Before commit 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap"), both zswap and zRAM implemented similar optimizations, leading to these optimized-out pages being counted in either zswap or zRAM counters (with pswpin/pswpout also increasing for zRAM). With zeromap functioning prior to both zswap and zRAM, userspace will no longer detect these swap-out and swap-in actions. We have three ways to address this: 1. Introduce a dedicated counter specifically for the zeromap. 2. Use pswpin/pswpout accounting, treating the zero map as a standard backend. This approach aligns with zRAM's current handling of same-page fills at the device level. However, it would mean losing the optimized-out page counters previously available in zRAM and would not align with systems using zswap. Additionally, as noted by Nhat Pham, pswpin/pswpout counters apply only to I/O done directly to the backend device. 3. Count zeromap pages under zswap, aligning with system behavior when zswap is enabled. However, this would not be consistent with zRAM, nor would it align with systems lacking both zswap and zRAM. Given the complications with options 2 and 3, this patch selects option 1. We can find these counters from /proc/vmstat (counters for the whole system) and memcg's memory.stat (counters for the interested memcg). For example: $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat swpin_zero 1648 swpout_zero 33536 $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat swpin_zero 3905 swpout_zero 3985 This patch does not address any specific zeromap bug, but the missing swpout and swpin counts for zero-filled pages can be highly confusing and may mislead user-space agents that rely on changes in these counters as indicators. Therefore, we add a Fixes tag to encourage the inclusion of this counter in any kernel versions with zeromap. Many thanks to Kanchana for the contribution of changing count_objcg_event() to count_objcg_events() to support large folios[1], which has now been incorporated into this patch. [1] https://lkml.kernel.org/r/20241001053222.6944-5-kanchana.p.sridhar@intel.com Link: https://lkml.kernel.org/r/20241107011246.59137-1-21cnbao@gmail.com Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap") Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Hailong Liu <hailong.liu@oppo.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Andi Kleen <ak@linux.intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kairui Song <kasong@tencent.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11bcachefs: Allow for unknown key types in backpointers fsckKent Overstreet
We can't assume that btrees only contain keys of a given type - even if they only have a single key type listed in the allowed key types for that btree; this is a forwards compatibility issue. Reported-by: syzbot+a27c3aaa3640dd3e1dfb@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-11bcachefs: Fix assertion pop in topology repairKent Overstreet
Fixes: baefd3f849ed ("bcachefs: btree_cache.freeable list fixes") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-10Linux 6.12-rc7v6.12-rc7Linus Torvalds
2024-11-10Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A handful of Qualcomm clk driver fixes: - Correct flags for X Elite USB MP GDSC and pcie pipediv2 clocks - Fix alpha PLL post_div mask for the cases where width is not specified - Avoid hangs in the SM8350 video driver (venus) by setting HW_CTRL trigger feature on the video clocks" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: qcom: gcc-x1e80100: Fix USB MP SS1 PHY GDSC pwrsts flags clk: qcom: gcc-x1e80100: Fix halt_check for pipediv2 clocks clk: qcom: clk-alpha-pll: Fix pll post div mask when width is not set clk: qcom: videocc-sm8350: use HW_CTRL_TRIGGER for vcodec GDSCs
2024-11-10Merge tag 'i2c-for-6.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "i2c-host fixes for v6.12-rc7 (from Andi): - Fix designware incorrect behavior when concluding a transmission - Fix Mule multiplexer error value evaluation" * tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set i2c: muxes: Fix return value check in mule_i2c_mux_probe()
2024-11-10filemap: Fix bounds checking in filemap_read()Trond Myklebust
If the caller supplies an iocb->ki_pos value that is close to the filesystem upper limit, and an iterator with a count that causes us to overflow that limit, then filemap_read() enters an infinite loop. This behaviour was discovered when testing xfstests generic/525 with the "localio" optimisation for loopback NFS mounts. Reported-by: Mike Snitzer <snitzer@kernel.org> Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") Tested-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-10Merge tag 'irq_urgent_for_v6.12_rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Borislav Petkov: - Make sure GICv3 controller interrupt activation doesn't race with a concurrent deactivation due to propagation delays of the register write * tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3: Force propagation of the active state with a read-back
2024-11-10Merge tag 'mm-hotfixes-stable-2024-11-09-22-40' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes, 14 of which are cc:stable. Three affect DAMON. Lorenzo's five-patch series to address the mmap_region error handling is here also. Apart from that, various singletons" * tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: add entry for Thorsten Blum ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove() signal: restore the override_rlimit logic fs/proc: fix compile warning about variable 'vmcore_mmap_ops' ucounts: fix counter leak in inc_rlimit_get_ucounts() selftests: hugetlb_dio: check for initial conditions to skip in the start mm: fix docs for the kernel parameter ``thp_anon=`` mm/damon/core: avoid overflow in damon_feed_loop_next_input() mm/damon/core: handle zero schemes apply interval mm/damon/core: handle zero {aggregation,ops_update} intervals mm/mlock: set the correct prev on failure objpool: fix to make percpu slot allocation more robust mm/page_alloc: keep track of free highatomic mm: resolve faulty mmap_region() error path behaviour mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling mm: refactor map_deny_write_exec() mm: unconditionally close VMAs on error mm: avoid unsafe VMA hook invocation when error arises on mmap hook mm/thp: fix deferred split unqueue naming and locking mm/thp: fix deferred split queue not partially_mapped
2024-11-10Merge tag 'usb-6.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt fixes from Greg KH: "Here are some small remaining USB and Thunderbolt fixes and device ids for 6.12-rc7. Included in here are: - new USB serial driver device ids - thunderbolt driver fixes for reported problems - typec bugfixes - dwc3 driver fix - musb driver fix All of these have been in linux-next this past week with no reported issues" * tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: qcserial: add support for Sierra Wireless EM86xx thunderbolt: Fix connection issue with Pluggable UD-4VPD dock usb: typec: fix potential out of bounds in ucsi_ccg_update_set_new_cam_cmd() usb: dwc3: fix fault at system suspend if device was already runtime suspended usb: typec: qcom-pmic: init value of hdr_len/txbuf_len earlier usb: musb: sunxi: Fix accessing an released usb phy USB: serial: io_edgeport: fix use after free in debug printk USB: serial: option: add Quectel RG650V USB: serial: option: add Fibocom FG132 0x0112 composition thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING
2024-11-10Merge tag 'staging-6.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are two small memory leak fixes for the vchiq_arm staging driver that have been sitting in my tree for weeks and should get merged for 6.12-rc7 so that people don't keep tripping over them. They both have been in linux-next for a while with no reported problems" * tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: vchiq_arm: Use devm_kzalloc() for drv_mgmt allocation staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation
2024-11-10hwrng: bcm74110 - Add Broadcom BCM74110 RNG driverMarkus Mayer
Add a driver for the random number generator present on the Broadcom BCM74110 SoC. Signed-off-by: Markus Mayer <mmayer@broadcom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-10dt-bindings: rng: add binding for BCM74110 RNGMarkus Mayer
Add a binding for the random number generator used on the BCM74110. Signed-off-by: Markus Mayer <mmayer@broadcom.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-10padata: Clean up in padata_do_multithreaded()Zicheng Qu
In commit 24cc57d8faaa ("padata: Honor the caller's alignment in case of chunk_size 0"), the line 'ps.chunk_size = max(ps.chunk_size, 1ul)' was added, making 'ps.chunk_size = 1U' redundant and never executed. Signed-off-by: Zicheng Qu <quzicheng@huawei.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-10crypto: inside-secure - Fix the return value of safexcel_xcbcmac_cra_init()Li Huafei
The commit 320406cb60b6 ("crypto: inside-secure - Replace generic aes with libaes") replaced crypto_alloc_cipher() with kmalloc(), but did not modify the handling of the return value. When kmalloc() returns NULL, PTR_ERR_OR_ZERO(NULL) returns 0, but in fact, the memory allocation has failed, and -ENOMEM should be returned. Fixes: 320406cb60b6 ("crypto: inside-secure - Replace generic aes with libaes") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Acked-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-10crypto: qat - Fix missing destroy_workqueue in adf_init_aer()Wang Hai
The adf_init_aer() won't destroy device_reset_wq when alloc_workqueue() for device_sriov_wq failed. Add destroy_workqueue for device_reset_wq to fix this issue. Fixes: 4469f9b23468 ("crypto: qat - re-enable sriov after pf reset") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-10crypto: rsassa-pkcs1 - Reinstate support for legacy protocolsLukas Wunner
Commit 1e562deacecc ("crypto: rsassa-pkcs1 - Migrate to sig_alg backend") enforced that rsassa-pkcs1 sign/verify operations specify a hash algorithm. That is necessary because per RFC 8017 sec 8.2, a hash algorithm identifier must be prepended to the hash before generating or verifying the signature ("Full Hash Prefix"). However the commit went too far in that it changed user space behavior: KEYCTL_PKEY_QUERY system calls now return -EINVAL unless they specify a hash algorithm. Intel Wireless Daemon (iwd) is one application issuing such system calls (for EAP-TLS). Closer analysis of the Embedded Linux Library (ell) used by iwd reveals that the problem runs even deeper: When iwd uses TLS 1.1 or earlier, it not only queries for keys, but performs sign/verify operations without specifying a hash algorithm. These legacy TLS versions concatenate an MD5 to a SHA-1 hash and omit the Full Hash Prefix: https://git.kernel.org/pub/scm/libs/ell/ell.git/tree/ell/tls-suites.c#n97 TLS 1.1 was deprecated in 2021 by RFC 8996, but removal of support was inadvertent in this case. It probably should be coordinated with iwd maintainers first. So reinstate support for such legacy protocols by defaulting to hash algorithm "none" which uses an empty Full Hash Prefix. If it is later on decided to remove TLS 1.1 support but still allow KEYCTL_PKEY_QUERY without a hash algorithm, that can be achieved by reverting the present commit and replacing it with the following patch: https://lore.kernel.org/r/ZxalYZwH5UiGX5uj@wunner.de/ It's worth noting that Python's cryptography library gained support for such legacy use cases very recently, so they do seem to still be a thing. The Python developers identified IKE version 1 as another protocol omitting the Full Hash Prefix: https://github.com/pyca/cryptography/issues/10226 https://github.com/pyca/cryptography/issues/5495 The author of those issues, Zoltan Kelemen, spent considerable effort searching for test vectors but only found one in a 2019 blog post by Kevin Jones. Add it to testmgr.h to verify correctness of this feature. Examination of wpa_supplicant as well as various IKE daemons (libreswan, strongswan, isakmpd, raccoon) has determined that none of them seems to use the kernel's Key Retention Service, so iwd is the only affected user space application known so far. Fixes: 1e562deacecc ("crypto: rsassa-pkcs1 - Migrate to sig_alg backend") Reported-by: Klara Modin <klarasmodin@gmail.com> Tested-by: Klara Modin <klarasmodin@gmail.com> Closes: https://lore.kernel.org/r/2ed09a22-86c0-4cf0-8bda-ef804ccb3413@gmail.com/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-09s390/dasd: Fix typo in commentYu Jiaoliang
Fix typo in comment: requeust->request, Removve->Remove, notthing->nothing. Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20241108133913.3068782-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-09s390/dasd: fix redundant /proc/dasd* entries removalMiroslav Franc
In case of an early failure in dasd_init, dasd_proc_init is never called and /proc/dasd* files are never created. That can happen, for example, if an incompatible or incorrect argument is provided to the dasd_mod.dasd= kernel parameter. However, the attempted removal of /proc/dasd* files causes 8 warnings and backtraces in this case. 4 on the error path within dasd_init and 4 when the dasd module is unloaded. Notice the "removing permanent /proc entry 'devices'" message that is caused by the dasd_proc_exit function trying to remove /proc/devices instead of /proc/dasd/devices since dasd_proc_root_entry is NULL and /proc/devices is indeed permanent. Example: ------------[ cut here ]------------ removing permanent /proc entry 'devices' WARNING: CPU: 6 PID: 557 at fs/proc/generic.c:701 remove_proc_entry+0x22e/0x240 CPU: 6 PID: 557 Comm: modprobe Not tainted 6.10.5-1-default #1 openSUSE Tumbleweed f6917bfd6e5a5c7a7e900e0e3b517786fb5c6301 Hardware name: QEMU 8561 QEMU (KVM/Linux) Krnl PSW : 0704c00180000000 000003fffed0e9f2 (remove_proc_entry+0x232/0x240) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 000003ff00000027 000003ff00000023 0000000000000028 000002f200000000 000002f3f05bec20 0000037ffecfb7d0 000003ffffdabab0 000003ff7ee4ec72 000003ff7ee4ec72 0000000000000007 000002f280e22600 000002f280e22688 000003ffa252cfa0 0000000000010000 000003fffed0e9ee 0000037ffecfba38 Krnl Code: 000003fffed0e9e2: c020004e7017 larl %r2,000003ffff6dca10 000003fffed0e9e8: c0e5ffdfad24 brasl %r14,000003fffe904430 #000003fffed0e9ee: af000000 mc 0,0 >000003fffed0e9f2: a7f4ff4c brc 15,000003fffed0e88a 000003fffed0e9f6: 0707 bcr 0,%r7 000003fffed0e9f8: 0707 bcr 0,%r7 000003fffed0e9fa: 0707 bcr 0,%r7 000003fffed0e9fc: 0707 bcr 0,%r7 Call Trace: [<000003fffed0e9f2>] remove_proc_entry+0x232/0x240 ([<000003fffed0e9ee>] remove_proc_entry+0x22e/0x240) [<000003ff7ef5a084>] dasd_proc_exit+0x34/0x60 [dasd_mod] [<000003ff7ef560c2>] dasd_exit+0x22/0xc0 [dasd_mod] [<000003ff7ee5a26e>] dasd_init+0x26e/0x280 [dasd_mod] [<000003fffe8ac9d0>] do_one_initcall+0x40/0x220 [<000003fffe9bc758>] do_init_module+0x78/0x260 [<000003fffe9bf3a6>] __do_sys_init_module+0x216/0x250 [<000003ffff37ac9e>] __do_syscall+0x24e/0x2d0 [<000003ffff38cca8>] system_call+0x70/0x98 Last Breaking-Event-Address: [<000003fffef7ea20>] __s390_indirect_jump_r14+0x0/0x10 ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ While the cause is a user failure, the dasd module should handle the situation more gracefully. One of the simplest solutions is to make removal of the /proc/dasd* entries idempotent. Signed-off-by: Miroslav Franc <mfranc@suse.cz> [ sth: shortened if clause ] Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20241108133913.3068782-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-09Merge tag 'md-6.13-20241107' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.13/block Pull raid5 fix from Song. * tag 'md-6.13-20241107' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: MAINTAINERS: Make Yu Kuai co-maintainer of md/raid subsystem md/raid5: Wait sync io to finish before changing group cnt
2024-11-09loop: fix type of block sizeLi Wang
PAGE_SIZE may be 64K, and the max block size can be PAGE_SIZE, so any variable for holding block size can't be defined as 'unsigned short'. Unfortunately commit 473516b36193 ("loop: use the atomic queue limits update API") passes 'bsize' with type of 'unsigned short' to loop_reconfigure_limits(), and causes LTP/ioctl_loop06 test failure: 12 ioctl_loop06.c:76: TINFO: Using LOOP_SET_BLOCK_SIZE with arg > PAGE_SIZE 13 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly ... 18 ioctl_loop06.c:76: TINFO: Using LOOP_CONFIGURE with block_size > PAGE_SIZE 19 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly Fixes the issue by defining 'block size' variable with 'unsigned int', which is aligned with block layer's definition. (improve commit log & add fixes tag) Fixes: 473516b36193 ("loop: use the atomic queue limits update API") Cc: John Garry <john.g.garry@oracle.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Li Wang <liwang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241109022744.1126003-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-09selftests: net: add netlink-dumps to .gitignoreJakub Kicinski
Commit 55d42a0c3f9c ("selftests: net: add a test for closing a netlink socket ith dump in progress") added a new test but did not add it to gitignore. Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241108004731.2979878-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>