summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
12 daysMerge branch 'no-rebase-mnt_ns_tree_remove'Christian Brauner
Bring in the fix for removing a mount namespace from the mount namespace rbtree and list. Signed-off-by: Christian Brauner <brauner@kernel.org>
12 daysmnt: use ns_common_init()Christian Brauner
Don't cargo-cult the same thing over and over. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
12 daysiomap: error out on file IO when there is no inline_data bufferDarrick J. Wong
Return IO errors if an ->iomap_begin implementation returns an IOMAP_INLINE buffer but forgets to set the inline_data pointer. Filesystems should never do this, but we could help fs developers (me) fix their bugs by handling this more gracefully than crashing the kernel. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/175803480324.966383.7414345025943296442.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
12 daysiomap: trace iomap_zero_iter zeroing activitiesDarrick J. Wong
Trace which bytes actually get zeroed. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/175803480303.966383.2380024013746734540.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
12 daysfs: add might_sleep() annotation to iput() and moreMax Kellermann
When iput() drops the reference counter to zero, it may sleep via inode_wait_for_writeback(). This happens rarely because it's usually the dcache which evicts inodes, but really iput() should only ever be called in contexts where sleeping is allowed. This annotation allows finding buggy callers. Additionally, this patch annotates a few low-level functions that can call iput() conditionally. Cc: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Link: https://lore.kernel.org/20250917153632.2228828-1-max.kellermann@ionos.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
12 dayswriteback: Add tracepoint to track pending inode switchesJan Kara
Add trace_inode_switch_wbs_queue tracepoint to allow insight into how many inodes are queued to switch their bdi_writeback structure. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
12 dayswriteback: Avoid excessively long inode switching timesJan Kara
With lazytime mount option enabled we can be switching many dirty inodes on cgroup exit to the parent cgroup. The numbers observed in practice when systemd slice of a large cron job exits can easily reach hundreds of thousands or millions. The logic in inode_do_switch_wbs() which sorts the inode into appropriate place in b_dirty list of the target wb however has linear complexity in the number of dirty inodes thus overall time complexity of switching all the inodes is quadratic leading to workers being pegged for hours consuming 100% of the CPU and switching inodes to the parent wb. Simple reproducer of the issue: FILES=10000 # Filesystem mounted with lazytime mount option MNT=/mnt/ echo "Creating files and switching timestamps" for (( j = 0; j < 50; j ++ )); do mkdir $MNT/dir$j for (( i = 0; i < $FILES; i++ )); do echo "foo" >$MNT/dir$j/file$i done touch -a -t 202501010000 $MNT/dir$j/file* done wait echo "Syncing and flushing" sync echo 3 >/proc/sys/vm/drop_caches echo "Reading all files from a cgroup" mkdir /sys/fs/cgroup/unified/mycg1 || exit echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit for (( j = 0; j < 50; j ++ )); do cat /mnt/dir$j/file* >/dev/null & done wait echo "Switching wbs" # Now rmdir the cgroup after the script exits We need to maintain b_dirty list ordering to keep writeback happy so instead of sorting inode into appropriate place just append it at the end of the list and clobber dirtied_time_when. This may result in inode writeback starting later after cgroup switch however cgroup switches are rare so it shouldn't matter much. Since the cgroup had write access to the inode, there are no practical concerns of the possible DoS issues. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
12 dayswriteback: Avoid softlockup when switching many inodesJan Kara
process_inode_switch_wbs_work() can be switching over 100 inodes to a different cgroup. Since switching an inode requires counting all dirty & under-writeback pages in the address space of each inode, this can take a significant amount of time. Add a possibility to reschedule after processing each inode to avoid softlockups. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
12 dayswriteback: Avoid contention on wb->list_lock when switching inodesJan Kara
There can be multiple inode switch works that are trying to switch inodes to / from the same wb. This can happen in particular if some cgroup exits which owns many (thousands) inodes and we need to switch them all. In this case several inode_switch_wbs_work_fn() instances will be just spinning on the same wb->list_lock while only one of them makes forward progress. This wastes CPU cycles and quickly leads to softlockup reports and unusable system. Instead of running several inode_switch_wbs_work_fn() instances in parallel switching to the same wb and contending on wb->list_lock, run just one work item per wb and manage a queue of isw items switching to this wb. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>
13 dayssmb: client: fix smbdirect_recv_io leak in smbd_negotiate() error pathStefan Metzmacher
During tests of another unrelated patch I was able to trigger this error: Objects remaining on __kmem_cache_shutdown() Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
13 dayssmb: client: fix file open check in __cifs_unlink()Paulo Alcantara
Fix the file open check to decide whether or not silly-rename the file in SMB2+. Fixes: c5ea3065586d ("smb: client: fix data loss due to broken rename(2)") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: Frank Sorenson <sorenson@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
13 daysxfs: improve default maximum number of open zonesDamien Le Moal
For regular block devices using the zoned allocator, the default maximum number of open zones is set to 1/4 of the number of realtime groups. For a large capacity device, this leads to a very large limit. E.g. with a 26 TB HDD: mount /dev/sdb /mnt ... XFS (sdb): 95836 zones of 65536 blocks size (23959 max open) In turn such large limit on the number of open zones can lead, depending on the workload, on a very large number of concurrent write streams which devices generally do not handle well, leading to poor performance. Introduce the default limit XFS_DEFAULT_MAX_OPEN_ZONES, defined as 128 to match the hardware limit of most SMR HDDs available today, and use this limit to set mp->m_max_open_zones in xfs_calc_open_zones() instead of calling xfs_max_open_zones(), when the user did not specify a limit with the max_open_zones mount option. For the 26 TB HDD example, we now get: mount /dev/sdb /mnt ... XFS (sdb): 95836 zones of 65536 blocks (128 max open zones) This change does not prevent the user from specifying a lareger number for the open zones limit. E.g. mount -o max_open_zones=4096 /dev/sdb /mnt ... XFS (sdb): 95836 zones of 65536 blocks (4096 max open zones) Finally, since xfs_calc_open_zones() checks and caps the mp->m_max_open_zones limit against the value calculated by xfs_max_open_zones() for any type of device, this new default limit does not increase m_max_open_zones for small capacity devices. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
13 daysxfs: improve zone statistics messageDamien Le Moal
Reword the information message displayed in xfs_mount_zones() indicating the total zone count and maximum number of open zones. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
13 daysxfs: centralize error tag definitionsChristoph Hellwig
Right now 5 places in the kernel and one in xfsprogs need to be updated for each new error tag. Add a bit of macro magic so that only the error tag definition and a single table, which reside next to each other, need to be updated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
13 daysxfs: remove pointless externs in xfs_error.hChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
13 daysxfs: remove the expr argument to XFS_TEST_ERRORChristoph Hellwig
Don't pass expr to XFS_TEST_ERROR. Most calls pass a constant false, and the places that do pass an expression become cleaner by moving it out. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
13 daysxfs: remove xfs_errortag_setChristoph Hellwig
xfs_errortag_set is only called by xfs_errortag_attr_store, , which does not need to validate the error tag, because it can only be called on valid error tags that had a sysfs attribute registered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
13 daysxfs: remove xfs_errortag_getChristoph Hellwig
xfs_errortag_get is only called by xfs_errortag_attr_show, which does not need to validate the error tag, because it can only be called on valid error tags that had a sysfs attribute registered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
13 daysbtrfs: reject invalid compression levelQu Wenruo
Inspired by recent changes to compression level parsing in 6db1df415d73fc ("btrfs: accept and ignore compression level for lzo") it turns out that we do not do any extra validation for compression level input string, thus allowing things like "compress=lzo:invalid" to be accepted without warnings. Although we accept levels that are beyond the supported algorithm ranges, accepting completely invalid level specification is not correct. Fix the too loose checks for compression level, by doing proper error handling of kstrtoint(), so that we will reject not only too large values (beyond int range) but also completely wrong levels like "lzo:invalid". Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysMerge tag 'mm-hotfixes-stable-2025-09-17-21-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "15 hotfixes. 11 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 13 of these fixes are for MM. The usual shower of singletons, plus - fixes from Hugh to address various misbehaviors in get_user_pages() - patches from SeongJae to address a quite severe issue in DAMON - another series also from SeongJae which completes some fixes for a DAMON startup issue" * tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: zram: fix slot write race condition nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/* samples/damon/mtier: avoid starting DAMON before initialization samples/damon/prcl: avoid starting DAMON before initialization samples/damon/wsse: avoid starting DAMON before initialization MAINTAINERS: add Lance Yang as a THP reviewer MAINTAINERS: add Jann Horn as rmap reviewer mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control mm/damon/core: introduce damon_call_control->dealloc_on_cancel mm: folio_may_be_lru_cached() unless folio_test_large() mm: revert "mm: vmscan.c: fix OOM on swap stress test" mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch" mm/gup: local lru_add_drain() to avoid lru_add_drain_all() mm/gup: check ref_count instead of lru before migration
13 daysbtrfs: ref-verify: handle damaged extent root treeDavid Sterba
Syzbot hits a problem with enabled ref-verify, ignorebadroots and a fuzzed/damaged extent tree. There's no fallback option like in other places that can deal with it so disable the whole ref-verify as it is just a debugging feature. Reported-by: syzbot+9c3e0cdfbfe351b0bc0e@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/0000000000001b6052062139be1c@google.com/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: tree-checker: fix the incorrect inode ref size checkQu Wenruo
[BUG] Inside check_inode_ref(), we need to make sure every structure, including the btrfs_inode_extref header, is covered by the item. But our code is incorrectly using "sizeof(iref)", where @iref is just a pointer. This means "sizeof(iref)" will always be "sizeof(void *)", which is much smaller than "sizeof(struct btrfs_inode_extref)". This will allow some bad inode extrefs to sneak in, defeating tree-checker. [FIX] Fix the typo by calling "sizeof(*iref)", which is the same as "sizeof(struct btrfs_inode_extref)", and will be the correct behavior we want. Fixes: 71bf92a9b877 ("btrfs: tree-checker: Add check for INODE_REF") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 dayssmb: client: let smbd_destroy() call ↵Stefan Metzmacher
disable_work_sync(&info->post_send_credits_work) In smbd_destroy() we may destroy the memory so we better wait until post_send_credits_work is no longer pending and will never be started again. I actually just hit the case using rxe: WARNING: CPU: 0 PID: 138 at drivers/infiniband/sw/rxe/rxe_verbs.c:1032 rxe_post_recv+0x1ee/0x480 [rdma_rxe] ... [ 5305.686979] [ T138] smbd_post_recv+0x445/0xc10 [cifs] [ 5305.687135] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687149] [ T138] ? __kasan_check_write+0x14/0x30 [ 5305.687185] [ T138] ? __pfx_smbd_post_recv+0x10/0x10 [cifs] [ 5305.687329] [ T138] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 5305.687356] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687368] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687378] [ T138] ? _raw_spin_unlock_irqrestore+0x11/0x60 [ 5305.687389] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687399] [ T138] ? get_receive_buffer+0x168/0x210 [cifs] [ 5305.687555] [ T138] smbd_post_send_credits+0x382/0x4b0 [cifs] [ 5305.687701] [ T138] ? __pfx_smbd_post_send_credits+0x10/0x10 [cifs] [ 5305.687855] [ T138] ? __pfx___schedule+0x10/0x10 [ 5305.687865] [ T138] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 5305.687875] [ T138] ? queue_delayed_work_on+0x8e/0xa0 [ 5305.687889] [ T138] process_one_work+0x629/0xf80 [ 5305.687908] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687917] [ T138] ? __kasan_check_write+0x14/0x30 [ 5305.687933] [ T138] worker_thread+0x87f/0x1570 ... It means rxe_post_recv was called after rdma_destroy_qp(). This happened because put_receive_buffer() was triggered by ib_drain_qp() and called: queue_work(info->workqueue, &info->post_send_credits_work); Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
13 dayssmb: client: use disable[_delayed]_work_sync in smbdirect.cStefan Metzmacher
This makes it safer during the disconnect and avoids requeueing. It's ok to call disable[delayed_]work[_sync]() more than once. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 050b8c374019 ("smbd: Make upper layer decide when to destroy the transport") Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
13 dayssmb: client: fix filename matching of deferred filesPaulo Alcantara
Fix the following case where the client would end up closing both deferred files (foo.tmp & foo) after unlink(foo) due to strstr() call in cifs_close_deferred_file_under_dentry(): fd1 = openat(AT_FDCWD, "foo", O_WRONLY|O_CREAT|O_TRUNC, 0666); fd2 = openat(AT_FDCWD, "foo.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666); close(fd1); close(fd2); unlink("foo"); Fixes: e3fc065682eb ("cifs: Deferred close performance improvements") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Cc: Frank Sorenson <sorenson@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
13 dayssmb: client: let recv_done verify data_offset, data_length and ↵Stefan Metzmacher
remaining_data_length This is inspired by the related server fixes. Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
13 daysMerge tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Two fixes for remaining_data_length and offset checks in receive path - Don't go over max SGEs which caused smbdirect send to fail (and trigger disconnect) * tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES
14 daysfsverity: Use 2-way interleaved SHA-256 hashing when supportedEric Biggers
When the crypto library provides an optimized implementation of sha256_finup_2x(), use it to interleave the hashing of pairs of data blocks. On some CPUs this nearly doubles hashing performance. The increase in overall throughput of cold-cache fsverity reads that I'm seeing on arm64 and x86_64 is roughly 35% (though this metric is hard to measure as it jumps around a lot). For now this is only done on the verification path, and only for data blocks, not Merkle tree blocks. We could use sha256_finup_2x() on Merkle tree blocks too, but that is less important as there aren't as many Merkle tree blocks as data blocks, and that would require some additional code restructuring. We could also use sha256_finup_2x() to accelerate building the Merkle tree, but verification performance is more important. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250915160819.140019-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
14 daysfsverity: Remove inode parameter from fsverity_hash_block()Eric Biggers
Due to the conversion from crypto_shash to the library API, fsverity_hash_block() can no longer fail. Therefore, the inode parameter, which was used only to print an error message in the case of a failure, is no longer necessary. Remove it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250915160819.140019-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
14 daysMerge tag 'for-6.17-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - in zoned mode, turn assertion to proper code when reserving space in relocation block group - fix search key of extended ref (hardlink) when replaying log - fix initialization of file extent tree on filesystems without no-holes feature - add harmless data race annotation to block group comparator * tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: annotate block group access with data_race() when sorting for reclaim btrfs: initialize inode::file_extent_tree after i_mode has been set btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg() btrfs: fix invalid extref key setup when replaying dentry
2025-09-17fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabledBabu Moger
rdt_resource::resctrl_mon::mbm_assign_on_mkdir determines if a counter will automatically be assigned to an RMID, MBM event pair when its associated monitor group is created via mkdir. Testing shows that counters are always automatically assigned to new monitor groups, whether mbm_assign_on_mkdir is set or not. To support automatic counter assignment the check for mbm_assign_on_mkdir should be in rdtgroup_assign_cntrs() that assigns counters during monitor group creation. Instead, the check for mbm_assign_on_mkdir is in rdtgroup_unassign_cntrs() that is called on monitor group deletion from where counters should always be unassigned, whether mbm_assign_on_mkdir is set or not. Fix automatic counter assignment by moving the mbm_assign_on_mkdir check from rdtgroup_unassign_cntrs() to rdtgroup_assign_cntrs(). [ bp: Replace commit message with Reinette's version. ] Fixes: ef712fe97ec57 ("fs/resctrl: Auto assign counters on mkdir and clean up on group removal") Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com>
2025-09-16xfs: move the XLOG_REG_ constants out of xfs_log_format.hChristoph Hellwig
These are purely in-memory values and not used at all in xfsprogs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: adjust the hint based zone allocation policyHans Holmberg
As we really can't make any general assumptions about files that don't have any life time hint set or are set to "NONE", adjust the allocation policy to avoid co-locating data from those files with files with a set life time. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: refactor hint based zone allocationHans Holmberg
Replace the co-location code with a matrix that makes it more clear on how the decisions are made. The matrix contains scores for zone/file hint combinations. A "GOOD" score for an open zone will result in immediate co-location while "OK" combinations will only be picked if we cannot open a new zone. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: fix log CRC mismatches between i386 and other architecturesChristoph Hellwig
When mounting file systems with a log that was dirtied on i386 on other architectures or vice versa, log recovery is unhappy: [ 11.068052] XFS (vdb): Torn write (CRC failure) detected at log block 0x2. Truncating head block from 0xc. This is because the CRCs generated by i386 and other architectures always diff. The reason for that is that sizeof(struct xlog_rec_header) returns different values for i386 vs the rest (324 vs 328), because the struct is not sizeof(uint64_t) aligned, and i386 has odd struct size alignment rules. This issue goes back to commit 13cdc853c519 ("Add log versioning, and new super block field for the log stripe") in the xfs-import tree, which adds log v2 support and the h_size field that causes the unaligned size. At that time it only mattered for the crude debug only log header checksum, but with commit 0e446be44806 ("xfs: add CRC checks to the log") it became a real issue for v5 file system, because now there is a proper CRC, and regular builds actually expect it match. Fix this by allowing checksums with and without the padding. Fixes: 0e446be44806 ("xfs: add CRC checks to the log") Cc: <stable@vger.kernel.org> # v3.8 Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: rename the old_crc variable in xlog_recover_processChristoph Hellwig
old_crc is a very misleading name. Rename it to expected_crc as that described the usage much better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the unused xfs_log_iovec_t typedefChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the unused xfs_qoff_logformat_t typedefChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the unused xfs_dq_logformat_t typedefChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the unused xfs_buf_log_format_t typedefChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the unused xfs_efd_log_format_64_t typedefChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the unused xfs_efd_log_format_32_t typedefChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the xfs_efd_log_format_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the xfs_efi_log_format_64_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the xfs_efi_log_format_32_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the xfs_efi_log_format_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the xfs_extent64_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the xfs_extent32_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the xfs_extent_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Also fix up the comment about the struct xfs_extent definition to be correct and read more easily. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-09-16xfs: remove the xfs_trans_header_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>