summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-09-02ext4: drop all delonly descriptionsZhang Yi
When counting reserved clusters, delayed type is always equal to delonly type now, hence drop all delonly descriptions in parameters and comments. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240813123452.2824659-13-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: drop ext4_es_is_delonly()Zhang Yi
Since we don't add delayed flag in unwritten extents, so there is no difference between ext4_es_is_delayed() and ext4_es_is_delonly(), just drop ext4_es_is_delonly(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240813123452.2824659-12-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: make extent status types exclusiveZhang Yi
Since we don't add delayed flag in unwritten extents, all of the four extent status types EXTENT_STATUS_WRITTEN, EXTENT_STATUS_UNWRITTEN, EXTENT_STATUS_DELAYED and EXTENT_STATUS_HOLE are exclusive now, add assertion when storing pblock before inserting extent into status tree and add comment to the status definition. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240813123452.2824659-11-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: drop unused ext4_es_store_status()Zhang Yi
The helper ext4_es_store_status() is unused now, just drop it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240813123452.2824659-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: use ext4_map_query_blocks() in ext4_map_blocks()Zhang Yi
The blocks map querying logic in ext4_map_blocks() are the same as ext4_map_query_blocks(), so switch to directly use it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240813123452.2824659-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: drop ext4_es_delayed_clu()Zhang Yi
Since we move ext4_da_update_reserve_space() to ext4_es_insert_extent(), no one uses ext4_es_delayed_clu() and __es_delayed_clu(), just drop them. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240813123452.2824659-8-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: update delalloc data reserve spcae in ext4_es_insert_extent()Zhang Yi
Now that we update data reserved space for delalloc after allocating new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is enabled, we also need to query the extents_status tree to calculate the exact reserved clusters. This is complicated now and it appears that it's better to do this job in ext4_es_insert_extent(), because __es_remove_extent() have already count delalloc blocks when removing delalloc extents and __revise_pending() return new adding pending count, we could update the reserved blocks easily in ext4_es_insert_extent(). We direct reduce the reserved cluster count when replacing a delalloc extent. However, thers are two special cases need to concern about the quota claiming when doing direct block allocation (e.g. from fallocate). A), fallocate a range that covers a delalloc extent but start with non-delayed allocated blocks, e.g. a hole. hhhhhhh+ddddddd+ddddddd ^^^^^^^^^^^^^^^^^^^^^^^ fallocate this range Current ext4_map_blocks() can't always trim the extent since it may release i_data_sem before calling ext4_map_create_blocks() and raced by another delayed allocation. Hence the EXT4_GET_BLOCKS_DELALLOC_RESERVE may not set even when we are replacing a delalloc extent, without this flag set, the quota has already been claimed by ext4_mb_new_blocks(), so we should release the quota reservations instead of claim them again. B), bigalloc feature is enabled, fallocate a range that contains non-delayed allocated blocks. |< one cluster >| hhhhhhh+hhhhhhh+hhhhhhh+ddddddd ^^^^^^^ fallocate this range This case is similar to above case, the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag is also not set. Hence we should release the quota reservations if we replace a delalloc extent but without EXT4_GET_BLOCKS_DELALLOC_RESERVE set. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240813123452.2824659-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: passing block allocation information to ext4_es_insert_extent()Zhang Yi
Just pass the block allocation flag to ext4_es_insert_extent() when we replacing a current extent after an actually block allocation or extent status conversion, this flag will be used by later changes. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240813123452.2824659-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: let __revise_pending() return newly inserted pendingsZhang Yi
Let __insert_pending() return 1 after successfully inserting a new pending cluster, and also let __revise_pending() to return the number of of newly inserted pendings. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240813123452.2824659-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: don't set EXTENT_STATUS_DELAYED on allocated blocksZhang Yi
Currently, we release delayed allocation reservation when removing delayed extent from extent status tree (which also happens when overwriting one extent with another one). When we allocated unwritten extent under some delayed allocated extent, we don't need the reservation anymore and hence we don't need to preserve the EXT4_MAP_DELAYED status bit. Allocating the new extent blocks will properly release the reservation. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240813123452.2824659-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag setZhang Yi
When doing block allocation, magic EXT4_GET_BLOCKS_DELALLOC_RESERVE means the allocating range covers a range of delayed allocated clusters, the blocks and quotas have already been reserved in ext4_da_map_blocks(), we should update the reserved space and don't need to claim them again. At the moment, we only set this magic in mpage_map_one_extent() when allocating a range of delayed allocated clusters in the write back path, it makes things complicated since we have to notice and deal with the case of allocating non-delayed allocated clusters separately in ext4_ext_map_blocks(). For example, it we fallocate some blocks that have been delayed allocated, free space would be claimed again in ext4_mb_new_blocks() (this is wrong exactily), and we can't claim quota space again, we have to release the quota reservations made for that previously delayed allocated clusters. Move the position thats set the EXT4_GET_BLOCKS_DELALLOC_RESERVE to where we actually do block allocation, it could simplify above handling a lot, it means that we always set this magic once the allocation range covers delalloc blocks, no need to take care of the allocation path. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240813123452.2824659-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: factor out ext4_map_create_blocks() to allocate new blocksZhang Yi
Factor out a common helper ext4_map_create_blocks() from ext4_map_blocks() to do a real blocks allocation, no logic changes. [ Note: this first patch of a ten patch series named "v3: simplify the counting and management of delalloc reserved blocks". The link to the v1 and v2 patch series are below. -- TYT ] Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240802115120.362902-1-yi.zhang@huaweicloud.com # v2 of patch series Link: https://patch.msgid.link/20240601034149.2169771-1-yi.zhang@huaweicloud.com # v1 of the patch series Link: https://patch.msgid.link/20240813123452.2824659-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02btrfs: qgroup: don't use extent changeset when not neededFedor Pchelkin
The local extent changeset is passed to clear_record_extent_bits() where it may have some additional memory dynamically allocated for ulist. When qgroup is disabled, the memory is leaked because in this case the changeset is not released upon __btrfs_qgroup_release_data() return. Since the recorded contents of the changeset are not used thereafter, just don't pass it. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Reported-by: syzbot+81670362c283f3dd889c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000aa8c0c060ade165e@google.com Fixes: af0e2aab3b70 ("btrfs: qgroup: flush reservations during quota disable") CC: stable@vger.kernel.org # 6.10+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-02xfs: make the calculation generic in xfs_sb_validate_fsb_count()Pankaj Raghav
Instead of assuming that PAGE_SHIFT is always higher than the blocklog, make the calculation generic so that page cache count can be calculated correctly for LBS. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-10-kernel@pankajraghav.com Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02xfs: expose block size in statPankaj Raghav
For block size larger than page size, the unit of efficient IO is the block size, not the page size. Leaving stat() to report PAGE_SIZE as the block size causes test programs like fsx to issue illegal ranges for operations that require block size alignment (e.g. fallocate() insert range). Hence update the preferred IO size to reflect the block size in this case. This change is based on a patch originally from Dave Chinner.[1] [1] https://lwn.net/ml/linux-fsdevel/20181107063127.3902-16-david@fromorbit.com/ Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240822135018.1931258-9-kernel@pankajraghav.com Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02xfs: use kvmalloc for xattr buffersDave Chinner
Pankaj Raghav reported that when filesystem block size is larger than page size, the xattr code can use kmalloc() for high order allocations. This triggers a useless warning in the allocator as it is a __GFP_NOFAIL allocation here: static inline struct page *rmqueue(struct zone *preferred_zone, struct zone *zone, unsigned int order, gfp_t gfp_flags, unsigned int alloc_flags, int migratetype) { struct page *page; /* * We most definitely don't want callers attempting to * allocate greater than order-1 page units with __GFP_NOFAIL. */ >>>> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); ... Fix this by changing all these call sites to use kvmalloc(), which will strip the NOFAIL from the kmalloc attempt and if that fails will do a __GFP_NOFAIL vmalloc(). This is not an issue that productions systems will see as filesystems with block size > page size cannot be mounted by the kernel; Pankaj is developing this functionality right now. Reported-by: Pankaj Raghav <kernel@pankajraghav.com> Fixes: f078d4ea8276 ("xfs: convert kmem_alloc() to kmalloc()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20240822135018.1931258-8-kernel@pankajraghav.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02iomap: fix iomap_dio_zero() for fs bs > system page sizePankaj Raghav
iomap_dio_zero() will pad a fs block with zeroes if the direct IO size < fs block size. iomap_dio_zero() has an implicit assumption that fs block size < page_size. This is true for most filesystems at the moment. If the block size > page size, this will send the contents of the page next to zero page(as len > PAGE_SIZE) to the underlying block device, causing FS corruption. iomap is a generic infrastructure and it should not make any assumptions about the fs block size and the page size of the system. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-7-kernel@pankajraghav.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02isofs: Annotate struct SL_component with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member text to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240830164902.112682-2-thorsten.blum@toblux.com
2024-09-02gfs2: Remove gfs2_aspace_writepage()Matthew Wilcox (Oracle)
There are no remaining callers of gfs2_aspace_writepage() other than vmscan, which is known to do more harm than good. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-02gfs2: Remove gfs2_jdata_writepage()Matthew Wilcox (Oracle)
There are no remaining callers of gfs2_jdata_writepage() other than vmscan, which is known to do more harm than good. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-02gfs2: Remove __gfs2_writepage()Matthew Wilcox (Oracle)
Call aops->writepages() instead of using write_cache_pages() to call aops->writepage. Change the handling of -ENODATA to not set the persistent error on the block device. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-02gfs2: Add gfs2_aspace_writepages()Matthew Wilcox (Oracle)
This saves one indirect function call per folio and gets us closer to removing aops->writepage. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-02sysctl: avoid spurious permanent empty tablesThomas Weißschuh
The test if a table is a permanently empty one, inspects the address of the registered ctl_table argument. However as sysctl_mount_point is an empty array and does not occupy and space it can end up sharing an address with another object in memory. If that other object itself is a "struct ctl_table" then registering that table will fail as it's incorrectly recognized as permanently empty. Avoid this issue by adding a dummy element to the array so that is not empty anymore. Explicitly register the table with zero elements as otherwise the dummy element would be recognized as a sentinel element which would lead to a runtime warning from the sysctl core. While the issue seems not being encountered at this time, this seems mostly to be due to luck. Also a future change, constifying sysctl_mount_point and root_table, can reliably trigger this issue on clang 18. Given that empty arrays are non-standard in the first place it seems prudent to avoid them if possible. Fixes: 4a7b29f65094 ("sysctl: move sysctl type to ctl_table_header") Fixes: a35dd3a786f5 ("sysctl: drop now unnecessary out-of-bounds check") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Closes: https://lore.kernel.org/oe-lkp/202408051453.f638857e-lkp@intel.com Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-09-01nilfs2: refactor nilfs_segctor_thread()Ryusuke Konishi
Simplify nilfs_segctor_thread(), the main loop function of the log writer thread, to make the basic structure easier to understand. In particular, the acquisition and release of the sc_state_lock spinlock was scattered throughout the function, so extract the determination of whether log writing is required into a helper function and make the spinlock lock sections clearer. Link: https://lkml.kernel.org/r/20240826174116.5008-9-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: use kthread_create and kthread_stop for the log writer threadRyusuke Konishi
By using kthread_create() and kthread_stop() to start and stop the log writer thread, eliminate custom thread start and stop helpers, as well as the wait queue "sc_wait_task" on the "nilfs_sc_info" struct and NILFS_SEGCTOR_QUIT flag that exist only to implement them. Also, update the kernel doc comments of the changed functions as appropriate. Link: https://lkml.kernel.org/r/20240826174116.5008-8-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: remove sc_timer_taskRyusuke Konishi
After commit f5d4e04634c9 ("nilfs2: fix use-after-free of timer for log writer thread") is applied, nilfs_construct_timeout(), which is called by a timer and wakes up the log writer thread, is never called after the log writer thread has terminated. As a result, the member variable "sc_timer_task" of the "nilfs_sc_info" structure, which was added when timer_setup() was adopted to retain a reference to the log writer thread's task even after it had terminated, is no longer needed, as it should be; we can simply use "sc_task" instead, which holds a reference to the log writer thread's task for its lifetime. So, eliminate "sc_timer_task" by this means. Link: https://lkml.kernel.org/r/20240826174116.5008-7-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: do not repair reserved inode bitmap in nilfs_new_inode()Ryusuke Konishi
After commit 93aef9eda1ce ("nilfs2: fix incorrect inode allocation from reserved inodes") is applied, the inode number returned by nilfs_ifile_create_inode() is guaranteed to always be greater than or equal to NILFS_USER_INO, so if the inode number is a reserved inode number (less than NILFS_USER_INO), the code to repair the bitmap immediately following it is no longer executed. So, delete it. Link: https://lkml.kernel.org/r/20240826174116.5008-6-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: eliminate the shared counter and spinlock for i_generationRyusuke Konishi
Use get_random_u32() as the source for inode->i_generation for new inodes, and eliminate the original source, the shared counter ns_next_generation along with its exclusive access spinlock ns_next_gen_lock. Link: https://lkml.kernel.org/r/20240826174116.5008-5-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: separate inode type information from i_state fieldRyusuke Konishi
In nilfs_iget_locked() and nilfs_ilookup(), which are used to find or obtain nilfs2 inodes, the nilfs_iget_args structure used to identify inodes has type information divided into multiple booleans, making type determination complicated. Simplify inode type determination by consolidating inode type information into an unsigned integer represented by a comibination of flags and by separating the type identification information for on-memory inodes from the i_state member in the nilfs_inode_info structure. Link: https://lkml.kernel.org/r/20240826174116.5008-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: use the BITS_PER_LONG macroRyusuke Konishi
The macros NILFS_BMAP_KEY_BIT and NILFS_BMAP_NEW_PTR_INIT calculate, within their definitions, the number of bits in an unsigned long variable. Use the BITS_PER_LONG macro to make them simpler. Link: https://lkml.kernel.org/r/20240826174116.5008-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: use common implementation of file typeHuang Xiaojia
Patch series "nilfs2: assorted cleanups". This is a collection of cleanup patches, with only the last three focused on the log writer thread, the rest are miscellaneous. Patches 1/8, 4/8, and 7/8 adopt common implementations, 2/8 uses a generic macro, 5/8 removes dead code, 6/8 removes an unnecessary reference, and 3/8 and 8/8 each simplify a paticular messy implementation. This patch (of 8): Deduplicate the nilfs2 file type conversion implementation. Link: https://lkml.kernel.org/r/20240826174116.5008-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240815013442.1220909-1-huangxiaojia2@huawei.com Link: https://lkml.kernel.org/r/20240826174116.5008-2-konishi.ryusuke@gmail.com Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nfs make use of str_false_true helperHongbo Li
The helper str_false_true() was introduced to return "false/true" string literal. We can simplify this format by str_false_true. Link: https://lkml.kernel.org/r/20240827024517.914100-4-lihongbo22@huawei.com Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Cc: Andy Shevchenko <andy@kernel.org> Cc: Anna Schumaker <anna@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kees Cook <kees@kernel.org> Cc: Trond Myklebust <trondmy@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: do not propagate ENOENT error from nilfs_sufile_mark_dirty()Ryusuke Konishi
nilfs_sufile_mark_dirty(), which marks a block in the sufile metadata file as dirty in preparation for log writing, returns -ENOENT to the caller if the block containing the segment usage of the specified segment is missing. This internal code can propagate through the log writer to system calls such as fsync. To prevent this, treat this case as a filesystem error and return -EIO instead. Link: https://lkml.kernel.org/r/20240821154627.11848-6-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: do not propagate ENOENT error from sufile during GCRyusuke Konishi
nilfs_sufile_freev(), which is used to free segments in GC, aborts with -ENOENT if the target segment usage is on a hole block. This error only occurs if one of the segment numbers to be freed passed by the GC ioctl is invalid, so return -EINVAL instead. To avoid impairing readability, introduce a wrapper function that encapsulates error handling including the error code conversion (and error message output). Link: https://lkml.kernel.org/r/20240821154627.11848-5-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: do not propagate ENOENT error from sufile during recoveryRyusuke Konishi
nilfs_sufile_free() returns the error code -ENOENT when the block where the segment usage should be placed does not exist (hole block case), but this error should not be propagated upwards to the mount system call. In nilfs_prepare_segment_for_recovery(), one of the recovery steps during mount, nilfs_sufile_free() is used and may return -ENOENT as is, so in that case return -EINVAL instead. Link: https://lkml.kernel.org/r/20240821154627.11848-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: treat missing cpfile header block as metadata corruptionRyusuke Konishi
The cpfile, a metadata file that holds metadata for checkpoint management, also has statistical information in its first block, and if reading this block fails, it receives the internal code -ENOENT and returns that code to the callers. As with sufile, to prevent this -ENOENT from being propagated to system calls, return -EIO instead when reading the header block fails. Link: https://lkml.kernel.org/r/20240821154627.11848-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: treat missing sufile header block as metadata corruptionRyusuke Konishi
Patch series "nilfs2: prevent unexpected ENOENT propagation". This series fixes potential issues where the result code -ENOENT, which is returned internally when a metadata file operation encouters a hole block, is exposed to user space without being properly handled. Several issues with the same cause leading to hangs or WARN_ON check failures have been reported by syzbot and fixed each time in the past. This collectively fixes the missing -ENOENT conversions that do not cause stability issues and are not covered by syzbot. This patch (of 5): The sufile, a metadata file that holds metadata for segment management, has statistical information in its first block, but if reading this block fails, it receives the internal code -ENOENT and returns it unchanged to the callers. To prevent this -ENOENT from being propagated to system calls, if reading the header block fails, return -EIO (or -EINVAL depending on the context) instead. Link: https://lkml.kernel.org/r/20240821154627.11848-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240821154627.11848-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01ocfs2: use max() to improve ocfs2_dlm_seq_show()Thorsten Blum
Use the max() macro to simplify the ocfs2_dlm_seq_show() function and improve its readability. Link: https://lkml.kernel.org/r/20240820021605.97887-3-thorsten.blum@toblux.com Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01ocfs2: fix shift-out-of-bounds UBSAN bug in ocfs2_verify_volume()qasdev
This patch addresses a shift-out-of-bounds error in the ocfs2_verify_volume() function, identified by UBSAN. The bug was triggered by an invalid s_clustersize_bits value (e.g., 1548), which caused the expression "1 << le32_to_cpu(di->id2.i_super.s_clustersize_bits)" to exceed the limits of a 32-bit integer, leading to an out-of-bounds shift. Link: https://lkml.kernel.org/r/ZsPvwQAXd5R/jNY+@hostname Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Reported-by: syzbot <syzbot+f3fff775402751ebb471@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=f3fff775402751ebb471 Tested-by: syzbot <syzbot+f3fff775402751ebb471@syzkaller.appspotmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01ocfs2: fix unexpected zeroing of virtual diskChi Zhiling
In a guest virtual machine, we found that there is unexpected data zeroing problem detected occassionly: XFS (vdb): Mounting V5 Filesystem XFS (vdb): Ending clean mount XFS (vdb): Metadata CRC error detected at xfs_refcountbt_read_verify+0x2c/0xf0, xfs_refcountbt block 0x200028 XFS (vdb): Unmount and run xfs_repair XFS (vdb): First 128 bytes of corrupted metadata buffer: 00000000e0cd2f5e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000cafd57f5: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000d0298d7d: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000f0698484: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000adb789a7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 000000005292b878: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000885b4700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000fd4b4df7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (vdb): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x200028 len 8 error 74 XFS (vdb): Error -117 recovering leftover CoW allocations. XFS (vdb): xfs_do_force_shutdown(0x8) called from line 994 of file fs/xfs/xfs_mount.c. Return address = 000000003a53523a XFS (vdb): Corruption of in-memory data detected. Shutting down filesystem XFS (vdb): Please umount the filesystem and rectify the problem(s) It turns out that the root cause is from the physical host machine. More specifically, it is caused by the ocfs2. when the page_size is 64k, the block should advance by 16 each time instead of 1. This will lead to a wrong mapping from the page to the disk, which will zero some adjacent part of the disk. Link: https://lkml.kernel.org/r/20240815092141.1223238-1-chizhiling@163.com Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Suggested-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01ocfs2: remove custom swap functions in favor of built-in sort swapKuan-Wei Chiu
The custom swap functions used in ocfs2 do not perform any special operations and can be replaced with the built-in swap function of sort. This change not only reduces code size but also improves efficiency, especially in scenarios where CONFIG_RETPOLINE is enabled, as it makes indirect function calls more expensive. By using the built-in swap, we avoid these costly indirect function calls, leading to better performance. Link: https://lkml.kernel.org/r/20240810195316.186504-1-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: fix missing initial short descriptions of kernel-doc commentsRyusuke Konishi
Update some kernel-doc comments that are missing the initial short description and fix the following warnings output by the kernel-doc script: fs/nilfs2/bmap.c:353: warning: missing initial short description on line: * nilfs_bmap_lookup_dirty_buffers - fs/nilfs2/cpfile.c:708: warning: missing initial short description on line: * nilfs_cpfile_delete_checkpoint - fs/nilfs2/cpfile.c:972: warning: missing initial short description on line: * nilfs_cpfile_is_snapshot - fs/nilfs2/dat.c:275: warning: missing initial short description on line: * nilfs_dat_mark_dirty - fs/nilfs2/sufile.c:844: warning: missing initial short description on line: * nilfs_sufile_get_suinfo - Link: https://lkml.kernel.org/r/20240816074319.3253-9-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: fix inconsistencies in kernel-doc comments in segment.hRyusuke Konishi
Fix incorrect or missing variable names in the member variable descriptions in the nilfs_recovery_info and nilfs_sc_info structures, thereby eliminating the following warnings output by the kernel-doc script: fs/nilfs2/segment.h:49: warning: Function parameter or struct member 'ri_cno' not described in 'nilfs_recovery_info' fs/nilfs2/segment.h:49: warning: Function parameter or struct member 'ri_lsegs_start_seq' not described in 'nilfs_recovery_info' fs/nilfs2/segment.h:49: warning: Excess struct member 'ri_ri_cno' description in 'nilfs_recovery_info' fs/nilfs2/segment.h:49: warning: Excess struct member 'ri_lseg_start_seq' description in 'nilfs_recovery_info' fs/nilfs2/segment.h:177: warning: Function parameter or struct member 'sc_seq_accepted' not described in 'nilfs_sc_info' fs/nilfs2/segment.h:177: warning: Function parameter or struct member 'sc_timer_task' not described in 'nilfs_sc_info' fs/nilfs2/segment.h:177: warning: Excess struct member 'sc_seq_accept' description in 'nilfs_sc_info' Link: https://lkml.kernel.org/r/20240816074319.3253-8-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: describe the members of nilfs_bmap_operations structureRyusuke Konishi
Add missing member variable descriptions in the kernel-doc comments for the nilfs_bmap_operations structure, hiding the internal operations with the "private:" tag. This eliminates the following warnings output by the kernel-doc script: fs/nilfs2/bmap.h:74: warning: Function parameter or struct member 'bop_lookup' not described in 'nilfs_bmap_operations' fs/nilfs2/bmap.h:74: warning: Function parameter or struct member 'bop_lookup_contig' not described in 'nilfs_bmap_operations' ... fs/nilfs2/bmap.h:74: warning: Function parameter or struct member 'bop_gather_data' not described in 'nilfs_bmap_operations' Link: https://lkml.kernel.org/r/20240816074319.3253-7-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: add missing description of nilfs_btree_path structureRyusuke Konishi
Add missing kernel-doc comment for the 'bp_ctxt' member variable of the nilfs_btree_path structure, and eliminate the following warning output by the kenrel-doc script: fs/nilfs2/btree.h:39: warning: Function parameter or struct member 'bp_ctxt' not described in 'nilfs_btree_path' Link: https://lkml.kernel.org/r/20240816074319.3253-6-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: fix incorrect kernel-doc declaration of nilfs_palloc_req structureRyusuke Konishi
The "struct" keyword is missing from the kernel-doc comment of the nilfs_palloc_req structure, so add it to eliminate the following warning output by the kernel-doc script: fs/nilfs2/alloc.h:46: warning: cannot understand function prototype: 'struct nilfs_palloc_req ' Link: https://lkml.kernel.org/r/20240816074319.3253-5-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: improve kernel-doc comments for b-tree node helpersRyusuke Konishi
Revise kernel-doc comments for helper functions related to changing the search key for b-tree node blocks, and eliminate the following warnings output by the kernel-doc script: fs/nilfs2/btnode.c:175: warning: Function parameter or struct member 'btnc' not described in 'nilfs_btnode_prepare_change_key' fs/nilfs2/btnode.c:175: warning: Function parameter or struct member 'ctxt' not described in 'nilfs_btnode_prepare_change_key' fs/nilfs2/btnode.c:238: warning: Function parameter or struct member 'btnc' not described in 'nilfs_btnode_commit_change_key' fs/nilfs2/btnode.c:238: warning: Function parameter or struct member 'ctxt' not described in 'nilfs_btnode_commit_change_key' fs/nilfs2/btnode.c:278: warning: Function parameter or struct member 'btnc' not described in 'nilfs_btnode_abort_change_key' fs/nilfs2/btnode.c:278: warning: Function parameter or struct member 'ctxt' not described in 'nilfs_btnode_abort_change_key' Link: https://lkml.kernel.org/r/20240816074319.3253-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: add missing argument descriptions for ioctl-related helpersRyusuke Konishi
Add missing argument descriptions and return value information to the kernel-doc comments for ioctl helper functions, and eliminate the following warnings output by the kernel-doc script: fs/nilfs2/ioctl.c:120: warning: Function parameter or struct member 'dentry' not described in 'nilfs_fileattr_get' fs/nilfs2/ioctl.c:120: warning: Function parameter or struct member 'fa' not described in 'nilfs_fileattr_get' fs/nilfs2/ioctl.c:133: warning: Function parameter or struct member 'idmap' not described in 'nilfs_fileattr_set' fs/nilfs2/ioctl.c:133: warning: Function parameter or struct member 'dentry' not described in 'nilfs_fileattr_set' fs/nilfs2/ioctl.c:133: warning: Function parameter or struct member 'fa' not described in 'nilfs_fileattr_set' fs/nilfs2/ioctl.c:164: warning: Function parameter or struct member 'inode' not described in 'nilfs_ioctl_getversion' fs/nilfs2/ioctl.c:164: warning: Function parameter or struct member 'argp' not described in 'nilfs_ioctl_getversion' Link: https://lkml.kernel.org/r/20240816074319.3253-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: add missing argument description for __nilfs_error()Ryusuke Konishi
Patch series "This series fixes a number of formatting issues in kernel doc comments" This series fixes a number of formatting issues in kernel doc comments that were detected as warnings by the kernel-doc script, making violations more noticeable when adding or modifying kernel doc. There are still warnings output by "kernel-doc -Wall", but they are widespread, so I plan to fix them at another time while considering priorities. This patch (of 8): Add missing argument description to __nilfs_error function and remove the following warnings from kernel-doc script output: fs/nilfs2/super.c:121: warning: Function parameter or struct member 'sb' not described in '__nilfs_error' fs/nilfs2/super.c:121: warning: Function parameter or struct member 'function' not described in '__nilfs_error' fs/nilfs2/super.c:121: warning: Function parameter or struct member 'fmt' not described in '__nilfs_error' Link: https://lkml.kernel.org/r/20240816074319.3253-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240816074319.3253-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: do not output warnings when clearing dirty buffersRyusuke Konishi
After detecting file system corruption and degrading to a read-only mount, dirty folios and buffers in the page cache are cleared, and a large number of warnings are output at that time, often filling up the kernel log. In this case, since the degrading to a read-only mount is output to the kernel log, these warnings are not very meaningful, and are rather a nuisance in system management and debugging. The related nilfs2-specific page/folio routines have a silent argument that suppresses the warning output, but since it is not currently used meaningfully, remove both the silent argument and the warning output. Link: https://lkml.kernel.org/r/20240816090128.4561-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>