summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-18btrfs: pass struct btrfs_inode to btrfs_inode_type()David Sterba
Pass a struct btrfs_inode to btrfs_inode() as it's an internal interface, allowing to remove some use of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: pass struct btrfs_inode to new_simple_dir()David Sterba
Pass a struct btrfs_inode to new_simple_dir() as it's an internal interface, allowing to remove some use of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: pass struct btrfs_inode to btrfs_iget_locked()David Sterba
Pass a struct btrfs_inode to btrfs_inode() as it's an internal interface, allowing to remove some use of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: pass struct btrfs_inode to btrfs_read_locked_inode()David Sterba
Pass a struct btrfs_inode to btrfs_read_locked_inode() as it's an internal interface, allowing to remove some use of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: pass struct btrfs_inode to extent_range_clear_dirty_for_io()David Sterba
Pass a struct btrfs_inode to extent_range_clear_dirty_for_io() as it's an internal interface, allowing to remove some use of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: pass struct btrfs_inode to can_nocow_extent()David Sterba
Pass a struct btrfs_inode to can_nocow_extent() as it's an internal interface, allowing to remove some use of BTRFS_I. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: update include and forward declarations in headersDavid Sterba
Pass over all header files and add missing forward declarations, includes or fix include types. Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: simplify returns and labels in btrfs_init_fs_root()David Sterba
There's a label that does nothing else than return, so remove it and also change other gotos to immediate returns as the function is short enough for this pattern. Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: unify ordering of btrfs_key initializationsDavid Sterba
The btrfs_key is defined as objectid/type/offset and the keys are also printed like that. For better readability, update all key initializations to match this order. Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: zstd: remove local variable for storing page offsetsDavid Sterba
When using offset_in_page() it's clear what it means, we don't need to store it in the local variable just to use it right away. There's no change in the generated code, but keeps the declarations smaller. Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: zstd: move zstd_parameters to the workspaceDavid Sterba
Reduce stack consumption of zstd_compress_folios() by 40 bytes (10*sizeof(int)) as we can store struct zstd_parameters in the workspace that is reused for each call. typedef struct { ZSTD_compressionParameters cParams; ZSTD_frameParameters fParams; } ZSTD_parameters; typedef struct { unsigned windowLog; unsigned chainLog; unsigned hashLog; unsigned searchLog; unsigned minMatch; unsigned targetLength; ZSTD_strategy strategy; } ZSTD_compressionParameters; typedef struct { int contentSizeFlag; int checksumFlag; int noDictIDFlag; } ZSTD_frameParameters; Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: async-thread: switch local variables need_order boolDavid Sterba
Use bool for 0/1 indicators in thresh_exec_hook() and btrfs_work_helper(). Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: add __cold attribute to extent_io_tree_panic()David Sterba
This is a wrapper that leads to a panic, so add the annotation like the other similar functions have. Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: zoned: exit btrfs_can_activate_zone if BTRFS_FS_NEED_ZONE_FINISH is setJohannes Thumshirn
If BTRFS_FS_NEED_ZONE_FINISH is already set for the whole filesystem, exit early in btrfs_can_activate_zone(). There's no need to check if BTRFS_FS_NEED_ZONE_FINISH needs to be set if it is already set. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: require strict data/metadata split for subpage checksQu Wenruo
Since we have btrfs_meta_is_subpage(), we should make btrfs_is_subpage() to be data inode specific. This change involves: - Simplify btrfs_is_subpage() Now we only need to do a very simple sectorsize check against PAGE_SIZE. And since the function is pretty simple now, just make it an inline function. - Add an extra ASSERT() to make sure btrfs_is_subpage() is only called on data inode mapping - Migrate btree_csum_one_bio() to use btrfs_meta_folio_*() helpers - Migrate alloc_extent_buffer() to use btrfs_meta_folio_*() helpers - Migrate end_bbio_meta_write() to use btrfs_meta_folio_*() helpers Or we will trigger the ASSERT() due to calling btrfs_folio_*() on metadata folios. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: simplify subpage handling of read_extent_buffer_pages_nowait()Qu Wenruo
By using a shared bio_add_folio_nofail() with calculated range_start/range_len, so no more explicit subpage routine needed. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: simplify subpage handling of write_one_eb()Qu Wenruo
Currently inside write_one_eb() we have two different ways of handling subpage and regular metadata. The differences are: - Extra offset/length calculation when adding the folio range to bio for subpage cases - Only decrease wbc->nr_to_write if the whole page is no longer dirty for subpage cases - Use subpage helper for subpage cases Merge the tow ways into a shared one: - Always calculate the to-be-queued range So that bio_add_folio() can use the same calculated resulted length and offset for both cases. - Use btrfs_meta_folio_clear_dirty() and btrfs_meta_folio_set_writeback() helpers This will cover both cases. - Only decrease wbc->nr_to_write if the folio is no longer dirty Since we have the folio locked, no one else can modify the folio dirty flags (set_extent_buffer_dirty() will also lock the folio for subpage cases). Thus after our btrfs_meta_folio_clear_dirty() call, if the whole folio is no longer dirty, we're submitting the last dirty eb of the folio, and can decrease wbc->nr_to_write properly. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: simplify subpage handling of btrfs_clear_buffer_dirty()Qu Wenruo
The function btrfs_clear_buffer_dirty() is called on dirty extent buffer that will not be written back. The function will call btree_clear_folio_dirty() to clear the folio dirty flag and also clear PAGECACHE_TAG_DIRTY flag. And we split the subpage and regular handling, as for subpage cases we should only clear PAGECACHE_TAG_DIRTY if the last dirty extent buffer in the page is cleared. So here we can simplify the function by: - Use the newly introduced btrfs_meta_folio_clear_and_test_dirty() helper The helper will return true if we cleared the folio dirty flag. With that we can use the same helper for both subpage and regular cases. - Rename btree_clear_folio_dirty() to btree_clear_folio_dirty_tag() As we move the folio dirty clearing in the btrfs_clear_buffer_dirty(). - Call btrfs_meta_folio_clear_and_test_dirty() to clear the dirty flags for both regular and subpage metadata cases - Only call btree_clear_folio_dirty_tag() when the folio is no longer dirty - Update the comment inside set_extent_buffer_dirty() As there is no separate clear_subpage_extent_buffer_dirty() anymore. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: use metadata specific helpers to simplify extent buffer helpersQu Wenruo
The following functions are doing metadata specific checks: - set_extent_buffer_uptodate() - clear_extent_buffer_uptodate() The reason why we do not use btrfs_folio_*() helpers for those helpers is, btrfs_is_subpage() cannot handle dummy extent buffer if nodesize >= PAGE_SIZE but block size < PAGE_SIZE. In that case, we do not need to attach extra bitmaps to the extent buffer folio. But since dummy extent buffer folios are not attached to btree inode, btrfs_is_subpage() will return true, causing problems. And the following are using btrfs_folio_*() helpers for metadata, but in theory we should use metadata specific checks: - set_extent_buffer_dirty() This is not causing problems because a dummy extent buffer should never be marked dirty. To make code simpler, introduce btrfs_meta_folio_*() helpers, to do the metadata specific handling, so that we do not to open-code such checks in above involved functions. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: make subpage attach and detach handle metadata properlyQu Wenruo
Currently subpage attach/detach is not doing proper dummy extent buffer subpage check, as btrfs_is_subpage() is not reliable for dummy extent buffer folios. Since we have a metadata specific check now, use that for btrfs_attach_subpage() first. Then enhance btrfs_detach_subpage() to accept a type parameter, so that we can do extra checks for dummy extent buffers properly. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: factor out metadata subpage detection into a dedicated helperQu Wenruo
Currently we have only one btrfs_is_subpage() to cover both data and metadata. But there is a special case for metadata: - dummy extent buffer, sector size < PAGE_SIZE and node size >= PAGE_SIZE In such case, btrfs_is_subpage() will return true for extent buffer folio. But that is not correct, and that's exactly why we have some open-coded checks for functions like set_extent_buffer_uptodate() and clear_extent_buffer_uptodate(). Just extract the metadata specific checks into a helper, and replace those call sites. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: remove btrfs_fs_info::sectors_per_pageQu Wenruo
For the future large folio support, our filemap can have folios with different sizes, thus we can no longer rely on a fixed blocks_per_page value. To prepare for that future, here we do: - Remove btrfs_fs_info::sectors_per_page - Introduce a helper, btrfs_blocks_per_folio() Which uses the folio size to calculate the number of blocks for each folio. - Migrate the existing btrfs_fs_info::sectors_per_page to use that helper There are some exceptions: * Metadata nodesize < page size support In the future, even if we support large folios, we will only allocate a folio that matches our nodesize. Thus we won't have a folio covering multiple metadata unless nodesize < page size. * Existing subpage bitmap dump We use a single unsigned long to store the bitmap. That means until we change the bitmap dumping code, our upper limit for folio size will only be 256K (4K block size, 64 bit unsigned long). * btrfs_is_subpage() check This will be migrated into a future patch. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: zstd: enable negative compression levels mount optionDaniel Vacek
Allow using the fast modes (negative compression levels) of zstd as a mount option. As per the results, the compression ratio is (expectedly) lower: for level in {-15..-1} 1 2 3; \ do printf "level %3d\n" $level; \ mount -o compress=zstd:$level /dev/sdb /mnt/test/; \ grep sdb /proc/mounts; \ cp -r /usr/bin /mnt/test/; sync; compsize /mnt/test/bin; \ cp -r /usr/share/doc /mnt/test/; sync; compsize /mnt/test/doc; \ cp enwik9 /mnt/test/; sync; compsize /mnt/test/enwik9; \ cp linux-6.13.tar /mnt/test/; sync; compsize /mnt/test/linux-6.13.tar; \ rm -r /mnt/test/{bin,doc,enwik9,linux-6.13.tar}; \ umount /mnt/test/; \ done |& tee results | \ awk '/^level/{print}/^TOTAL/{print$3"\t"$2" |"}' | paste - - - - - 266M bin | 45M doc | 953M wiki | 1.4G source =============================+===============+===============+===============+ level -15 180M 67% | 30M 68% | 694M 72% | 598M 40% | level -14 180M 67% | 30M 67% | 683M 71% | 581M 39% | level -13 177M 66% | 29M 66% | 671M 70% | 566M 38% | level -12 174M 65% | 29M 65% | 658M 69% | 548M 37% | level -11 174M 65% | 28M 64% | 645M 67% | 530M 35% | level -10 171M 64% | 28M 62% | 631M 66% | 512M 34% | level -9 165M 62% | 27M 61% | 615M 64% | 493M 33% | level -8 161M 60% | 27M 59% | 598M 62% | 475M 32% | level -7 155M 58% | 26M 58% | 582M 61% | 457M 30% | level -6 151M 56% | 25M 56% | 565M 59% | 437M 29% | level -5 145M 54% | 24M 55% | 545M 57% | 417M 28% | level -4 139M 52% | 23M 52% | 520M 54% | 391M 26% | level -3 135M 50% | 22M 50% | 495M 51% | 369M 24% | level -2 127M 47% | 22M 48% | 470M 49% | 349M 23% | level -1 120M 45% | 21M 47% | 452M 47% | 332M 22% | level 1 110M 41% | 17M 39% | 362M 38% | 290M 19% | level 2 106M 40% | 17M 38% | 349M 36% | 288M 19% | level 3 104M 39% | 16M 37% | 340M 35% | 276M 18% | The samples represent some data sets that can be commonly found and show approximate compressibility. The fast levels trade off speed for ratio and are best suitable for highly compressible data. As can be seen above, comparing the results to the current default zstd level 3, the negative levels are roughly 2x worse at -15 and the ratio increases almost linearly with each level. Signed-off-by: Daniel Vacek <neelx@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: move ordered extent cleanup to where they are allocatedQu Wenruo
The ordered extent cleanup is hard to grasp because it doesn't follow the common cleanup-asap pattern. E.g. run_delalloc_nocow() and cow_file_range() allocate one or more ordered extent, but if any error is hit, the cleanup is done later inside btrfs_run_delalloc_range(). To change the existing delayed cleanup: - Update the comment on error handling of run_delalloc_nocow() There are in fact 3 different cases other than 2 if we are doing ordered extents cleanup inside run_delalloc_nocow(): 1) @cow_start and @cow_end not set No fallback to COW at all. Before @cur_offset we need to cleanup the OE and page dirty. After @cur_offset just clear all involved page and extent flags. 2) @cow_start set but @cow_end not set. This means we failed before even calling fallback_to_cow(). It's just a variant of case 1), where it's @cow_start splitting the two parts (and we should just ignore @cur_offset since it's advanced without any new ordered extent). 3) @cow_start and @cow_end both set This means fallback_to_cow() failed, meaning [start, cow_start) needs the regular OE and dirty folio cleanup, and skip range [cow_start, cow_end) as cow_file_range() has done the cleanup, and eventually cleanup [cow_end, end) range. - Only reset @cow_start after fallback_to_cow() succeeded As above case 2) and 3) are both relying on @cow_start to determine the cleanup range. - Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow(), cow_file_range() and nocow_one_range() For cow_file_range() it's pretty straightforward and easy. For run_delalloc_nocow() refer to the above 3 different error cases. For nocow_one_range() if we hit an error, we need to cleanup the ordered extents by ourselves. And then it will fallback to case 1), since @cur_offset is not yet advanced, the existing cleanup will co-operate with nocow_one_range() well. - Remove the btrfs_cleanup_ordered_extents() inside submit_uncompressed_range() As failed cow_file_range() will do all the proper cleanup now. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: factor out nocow ordered extent and extent map generation into a helperQu Wenruo
Currently we're doing all the ordered extent and extent map generation inside a while() loop of run_delalloc_nocow(). This makes it pretty hard to read, nor doing proper error handling. So move that part of code into a helper, nocow_one_range(). This should not change anything, but there is a tiny timing change where btrfs_dec_nocow_writers() is only called after nocow_one_range() helper exits. This timing change is small, and makes error handling easier, thus should be fine. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: expose per-inode stable writes flagQu Wenruo
The address space flag AS_STABLE_WRITES determine if FGP_STABLE for will wait for the folio to finish its writeback. For btrfs, due to the default data checksum behavior, if we modify the folio while it's still under writeback, it will cause data checksum mismatch. Thus for quite some call sites we manually call folio_wait_writeback() to prevent such problem from happening. Currently there is only one call site inside btrfs really utilizing FGP_STABLE, and in that case we also manually call folio_wait_writeback() to do the waiting. But it's better to properly expose the stable writes flag to a per-inode basis, to allow call sites to fully benefit from FGP_STABLE flag. E.g. for inodes with NODATASUM allowing beginning dirtying the page without waiting for writeback. This involves: - Update the mapping's stable write flag when setting/clearing NODATASUM inode flag using ioctl This only works for empty files, so it should be fine. - Update the mapping's stable write flag when reading an inode from disk - Remove the explicit folio_wait_writeback() for FGP_BEGINWRITE call site Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: zlib: refactor S390x HW acceleration buffer preparationQu Wenruo
Currently for s390x HW zlib compression, to get the best performance we need a buffer size which is larger than a page. This means we need to copy multiple pages into workspace->buf, then use that buffer as zlib compression input. Currently it's hardcoded using page sized folio, and all the handling are deep inside a loop. Refactor the code by: - Introduce a dedicated helper to do the buffer copy The new helper will be called copy_data_into_buffer(). - Add extra ASSERT()s * Make sure we only go into the function for hardware acceleration * Make sure we still get page sized folio - Prepare for future large folios This means we will rely on the folio size, other than PAGE_SIZE to do the copy. - Handle the folio mapping and unmapping inside the helper function For S390x hardware acceleration case, it never utilize the @data_in pointer, thus we can do folio mapping/unmapping all inside the function. Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: avoid assigning twice to block_start at btrfs_do_readpage()Filipe Manana
At btrfs_do_readpage() if we get an extent map for a prealloc extent we end up assigning twice to the 'block_start' variable, first the value returned by extent_map_block_start() and then EXTENT_MAP_HOLE. This is pointless so make it more clear by using an if-else statement and doing only one assignment. Also, while at it, move the declaration of 'block_start' into the while loop's scope, since it's not used outside of it and the related 'disk_bytenr' is also declared in this scope. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18btrfs: always fallback to buffered write if the inode requires checksumQu Wenruo
[BUG] It is a long known bug that VM image on btrfs can lead to data csum mismatch, if the qemu is using direct-io for the image (this is commonly known as cache mode 'none'). [CAUSE] Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the fs is allowed to dirty/modify the folio even if the folio is under writeback (as long as the address space doesn't have AS_STABLE_WRITES flag inherited from the block device). This is a valid optimization to improve the concurrency, and since these filesystems have no extra checksum on data, the content change is not a problem at all. But the final write into the image file is handled by btrfs, which needs the content not to be modified during writeback, or the checksum will not match the data (checksum is calculated before submitting the bio). So EXT4/XFS/NTRFS assume they can modify the folio under writeback, but btrfs requires no modification, this leads to the false csum mismatch. This is only a controlled example, there are even cases where multi-thread programs can submit a direct IO write, then another thread modifies the direct IO buffer for whatever reason. For such cases, btrfs has no sane way to detect such cases and leads to false data csum mismatch. [FIX] I have considered the following ideas to solve the problem: - Make direct IO to always skip data checksum This not only requires a new incompatible flag, as it breaks the current per-inode NODATASUM flag. But also requires extra handling for no csum found cases. And this also reduces our checksum protection. - Let hardware handle all the checksum AKA, just nodatasum mount option. That requires trust for hardware (which is not that trustful in a lot of cases), and it's not generic at all. - Always fallback to buffered write if the inode requires checksum This was suggested by Christoph, and is the solution utilized by this patch. The cost is obvious, the extra buffer copying into page cache, thus it reduces the performance. But at least it's still user configurable, if the end user still wants the zero-copy performance, just set NODATASUM flag for the inode (which is a common practice for VM images on btrfs). Since we cannot trust user space programs to keep the buffer consistent during direct IO, we have no choice but always falling back to buffered IO. At least by this, we avoid the more deadly false data checksum mismatch error. Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18blk-cgroup: improve policy registration error handlingChen Linxuan
This patch improve the returned error code of blkcg_policy_register(). 1. Move the validation check for cpd/pd_alloc_fn and cpd/pd_free_fn function pairs to the start of blkcg_policy_register(). This ensures we immediately return -EINVAL if the function pairs are not correctly provided, rather than returning -ENOSPC after locking and unlocking mutexes unnecessarily. Those locks should not contention any problems, as error of policy registration is a super cold path. 2. Return -ENOMEM when cpd_alloc_fn() failed. Co-authored-by: Wen Tao <wentao@uniontech.com> Signed-off-by: Wen Tao <wentao@uniontech.com> Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/3E333A73B6B6DFC0+20250317022924.150907-1-chenlinxuan@uniontech.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-18Merge tag 'pmdomain-v6.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain fix from Ulf Hansson: - Fix amlogic T7 ISP secpower * tag 'pmdomain-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: amlogic: fix T7 ISP secpower
2025-03-18idpf: check error for register_netdev() on initEmil Tantilov
Current init logic ignores the error code from register_netdev(), which will cause WARN_ON() on attempt to unregister it, if there was one, and there is no info for the user that the creation of the netdev failed. WARNING: CPU: 89 PID: 6902 at net/core/dev.c:11512 unregister_netdevice_many_notify+0x211/0x1a10 ... [ 3707.563641] unregister_netdev+0x1c/0x30 [ 3707.563656] idpf_vport_dealloc+0x5cf/0xce0 [idpf] [ 3707.563684] idpf_deinit_task+0xef/0x160 [idpf] [ 3707.563712] idpf_vc_core_deinit+0x84/0x320 [idpf] [ 3707.563739] idpf_remove+0xbf/0x780 [idpf] [ 3707.563769] pci_device_remove+0xab/0x1e0 [ 3707.563786] device_release_driver_internal+0x371/0x530 [ 3707.563803] driver_detach+0xbf/0x180 [ 3707.563816] bus_remove_driver+0x11b/0x2a0 [ 3707.563829] pci_unregister_driver+0x2a/0x250 Introduce an error check and log the vport number and error code. On removal make sure to check VPORT_REG_NETDEV flag prior to calling unregister and free on the netdev. Add local variables for idx, vport_config and netdev for readability. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18ice: fix using untrusted value of pkt_len in ice_vc_fdir_parse_raw()Mateusz Polchlopek
Fix using the untrusted value of proto->raw.pkt_len in function ice_vc_fdir_parse_raw() by verifying if it does not exceed the VIRTCHNL_MAX_SIZE_RAW_PACKET value. Fixes: 99f419df8a5c ("ice: enable FDIR filters from raw binary patterns for VFs") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18ice: fix input validation for virtchnl BWLukasz Czapnik
Add missing validation of tc and queue id values sent by a VF in ice_vc_cfg_q_bw(). Additionally fixed logged value in the warning message, where max_tx_rate was incorrectly referenced instead of min_tx_rate. Also correct error handling in this function by properly exiting when invalid configuration is detected. Fixes: 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration") Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Co-developed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18ice: validate queue quanta parameters to prevent OOB accessJan Glaza
Add queue wraparound prevention in quanta configuration. Ensure end_qid does not overflow by validating start_qid and num_queues. Fixes: 015307754a19 ("ice: Support VF queue rate limit and quanta size configuration") Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jan Glaza <jan.glaza@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18ice: stop truncating queue ids when checkingJan Glaza
Queue IDs can be up to 4096, fix invalid check to stop truncating IDs to 8 bits. Fixes: bf93bf791cec8 ("ice: introduce ice_virtchnl.c and ice_virtchnl.h") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jan Glaza <jan.glaza@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18virtchnl: make proto and filter action count unsignedJan Glaza
The count field in virtchnl_proto_hdrs and virtchnl_filter_action_set should never be negative while still being valid. Changing it from int to u32 ensures proper handling of values in virtchnl messages in driverrs and prevents unintended behavior. In its current signed form, a negative count does not trigger an error in ice driver but instead results in it being treated as 0. This can lead to unexpected outcomes when processing messages. By using u32, any invalid values will correctly trigger -EINVAL, making error detection more robust. Fixes: 1f7ea1cd6a374 ("ice: Enable FDIR Configure for AVF") Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jan Glaza <jan.glaza@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18ice: fix reservation of resources for RDMA when disabledJesse Brandeburg
If the CONFIG_INFINIBAND_IRDMA symbol is not enabled as a module or a built-in, then don't let the driver reserve resources for RDMA. The result of this change is a large savings in resources for older kernels, and a cleaner driver configuration for the IRDMA=n case for old and new kernels. Implement this by avoiding enabling the RDMA capability when scanning hardware capabilities. Note: Loading the out-of-tree irdma driver in connection to the in-kernel ice driver, is not supported, and should not be attempted, especially when disabling IRDMA in the kernel config. Fixes: d25a0fc41c1f ("ice: Initialize RDMA support") Signed-off-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Acked-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18ice: ensure periodic output start time is in the futureKarol Kolacinski
On E800 series hardware, if the start time for a periodic output signal is programmed into GLTSYN_TGT_H and GLTSYN_TGT_L registers, the hardware logic locks up and the periodic output signal never starts. Any future attempt to reprogram the clock function is futile as the hardware will not reset until a power on. The ice_ptp_cfg_perout function has logic to prevent this, as it checks if the requested start time is in the past. If so, a new start time is calculated by rounding up. Since commit d755a7e129a5 ("ice: Cache perout/extts requests and check flags"), the rounding is done to the nearest multiple of the clock period, rather than to a full second. This is more accurate, since it ensures the signal matches the user request precisely. Unfortunately, there is a race condition with this rounding logic. If the current time is close to the multiple of the period, we could calculate a target time that is extremely soon. It takes time for the software to program the registers, during which time this requested start time could become a start time in the past. If that happens, the periodic output signal will lock up. For large enough periods, or for the logic prior to the mentioned commit, this is unlikely. However, with the new logic rounding to the period and with a small enough period, this becomes inevitable. For example, attempting to enable a 10MHz signal requires a period of 100 nanoseconds. This means in the *best* case, we have 99 nanoseconds to program the clock output. This is essentially impossible, and thus such a small period practically guarantees that the clock output function will lock up. To fix this, add some slop to the clock time used to check if the start time is in the past. Because it is not critical that output signals start immediately, but it *is* critical that we do not brick the function, 0.5 seconds is selected. This does mean that any requested output will be delayed by at least 0.5 seconds. This slop is applied before rounding, so that we always round up to the nearest multiple of the period that is at least 0.5 seconds in the future, ensuring a minimum of 0.5 seconds to program the clock output registers. Finally, to ensure that the hardware registers programming the clock output complete in a timely manner, add a write flush to the end of ice_ptp_write_perout. This ensures we don't risk any issue with PCIe transaction batching. Strictly speaking, this fixes a race condition all the way back at the initial implementation of periodic output programming, as it is theoretically possible to trigger this bug even on the old logic when always rounding to a full second. However, the window is narrow, and the code has been refactored heavily since then, making a direct backport not apply cleanly. Fixes: d755a7e129a5 ("ice: Cache perout/extts requests and check flags") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18ice: health.c: fix compilation on gcc 7.5Przemek Kitszel
GCC 7 is not as good as GCC 8+ in telling what is a compile-time const, and thus could be used for static storage. Fortunately keeping strings as const arrays is enough to make old gcc happy. Excerpt from the report: My GCC is: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0. CC [M] drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.o drivers/net/ethernet/intel/ice/devlink/health.c:35:3: error: initializer element is not constant ice_common_port_solutions, {ice_port_number_label}}, ^~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/intel/ice/devlink/health.c:35:3: note: (near initialization for 'ice_health_status_lookup[0].solution') drivers/net/ethernet/intel/ice/devlink/health.c:35:31: error: initializer element is not constant ice_common_port_solutions, {ice_port_number_label}}, ^~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/intel/ice/devlink/health.c:35:31: note: (near initialization for 'ice_health_status_lookup[0].data_label[0]') drivers/net/ethernet/intel/ice/devlink/health.c:37:46: error: initializer element is not constant "Change or replace the module or cable.", {ice_port_number_label}}, ^~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/intel/ice/devlink/health.c:37:46: note: (near initialization for 'ice_health_status_lookup[1].data_label[0]') drivers/net/ethernet/intel/ice/devlink/health.c:39:3: error: initializer element is not constant ice_common_port_solutions, {ice_port_number_label}}, ^~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 85d6164ec56d ("ice: add fw and port health reporters") Reported-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Closes: https://lore.kernel.org/netdev/CY8PR11MB7134BF7A46D71E50D25FA7A989F72@CY8PR11MB7134.namprd11.prod.outlook.com Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18dt-bindings: mtd: atmel,dataflash: convert txt to yamlNayab Sayed
Convert atmel-dataflash.txt into atmel,dataflash.yaml Signed-off-by: Nayab Sayed <nayabbasha.sayed@microchip.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-03-18mtd: mchp48l640: Use str_enable_disable() in mchp48l640_write_prepare()Zhang Heng
Remove hard-coded strings by using the str_enable_disable() helper function. Signed-off-by: Zhang Heng <zhangheng@kylinos.cn> Reviewed-by: Heiko Schocher <hs@denx.de> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-03-18mtd: rawnand: gpmi: Use str_enabled_disabled() in gpmi_nand_attach_chip()Zhang Heng
Remove hard-coded strings by using the str_enabled_disabled() helper function. Signed-off-by: Zhang Heng <zhangheng@kylinos.cn> Reviewed-by: Han Xu <han.xu@nxp.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-03-18mtd: mtdpart: Do not supply NULL to printf()Andy Shevchenko
GCC compiler is not happy about NULL being supplied as printf() parameter: drivers/mtd/mtdpart.c:693:34: error: ‘%s’ directive argument is null [-Werror=format-overflow=] Move the code after the parser test for NULL, and drop the ternary completely. The user can deduct this since when it's not NULL two messages will be printed. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-03-18dt-bindings: mtd: gpmi-nand: Add compatible string for i.MX8 chipsFrank Li
Add compatible string "fsl,imx8mp-gpmi-nand" and "fsl,imx8mq-gpmi-nand", which back compatible with i.MX7D. So set these fall back to "fsl,imx7d-gpmi-nand". Add compatible string "fsl,imx8qm-gpmi-nand" and "fsl,imx8dxl-gpmi-nand", which back compatible with i.MX8QXP. So set these fall back to "fsl,imx8qxp-gpmi-nand". Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Han Xu <han.xu@nxp.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-03-18mtd: nand: Fix a kdoc commentMiquel Raynal
The max_bad_eraseblocks_per_lun member of nand_device obviously describes a number of *maximum* number of bad eraseblocks per LUN. Fix this obvious typo. Fixes: 377e517b5fa5 ("mtd: nand: Add max_bad_eraseblocks_per_lun info to memorg") Cc: <stable+noautosel@kernel.org> # fix kdoc comment Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-03-18mtd: spinand: Improve spinand_info macros styleMiquel Raynal
Let's assume all these macros should not have a trailing comma, this way the caller can use a more formal and usual C writing style, as reflected in the Macronix driver. Acked-by: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-03-18dlm: make tcp still work in multi-link envHeming Zhao
This patch bypasses multi-link errors in TCP mode, allowing dlm to operate on the first tcp link. Signed-off-by: Heming Zhao <heming.zhao@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-03-18ASoC: codecs: Add aw88166 amplifier driverMark Brown
Merge series from wangweidong.a@awinic.com: Add the awinic,aw88166 property to support the aw88166 chip. The driver is for amplifiers aw88166 of Awinic Technology Corporation. The AW88166 is a high efficiency digital Smart K audio amplifier
2025-03-18add sof support on imx95Mark Brown
Merge series from Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>: Add sof support on imx95. This series also includes some changes to the audio-graph-card2 binding required for the support.