path: root/fs
Age  Commit message  (Author)
2023-12-06  bcachefs: don't attempt rw on unfreeze when shutdown  (Brian Foster)
The internal freeze mechanism in bcachefs mostly reuses the generic rw<->ro transition code. If the fs happens to shut down during or after freeze, a transition back to rw can fail. This is expected, but returning an error from the unfreeze callout prevents the filesystem from being unfrozen. Skip the read-write transition if the fs is shut down. This allows the fs to unfreeze at the vfs level so writes will no longer block, but will still fail due to the emergency read-only state of the fs. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
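The behaviour described above can be modelled in a few lines of userspace C. This is only a sketch: `struct toy_fs`, `toy_go_rw()` and `toy_unfreeze()` are hypothetical names invented for illustration, not the bcachefs functions, and `-EROFS` is an assumed error code for the failed transition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the filesystem state; not the real struct bch_fs. */
struct toy_fs {
	bool shutdown;   /* emergency read-only has been triggered */
	bool read_write; /* fs is currently rw internally */
};

/* Hypothetical rw transition: it fails once the fs has been shut down. */
int toy_go_rw(struct toy_fs *fs)
{
	assert(fs != NULL);
	if (fs->shutdown)
		return -EROFS; /* assumed error code, for illustration only */
	fs->read_write = true;
	return 0;
}

/*
 * Unfreeze callout: skip the rw transition when the fs is shut down,
 * so the unfreeze itself still succeeds and blocked writers can fail
 * instead of hanging.
 */
int toy_unfreeze(struct toy_fs *fs)
{
	assert(fs != NULL);
	if (fs->shutdown)
		return 0;
	return toy_go_rw(fs);
}
```

With this shape, unfreezing a shut-down filesystem returns 0 and leaves it read-only, while a healthy filesystem goes back to rw as before.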
2023-12-06  bcachefs: Fix creating snapshot with implicit source  (Kent Overstreet)
When creating a snapshot without specifying the source subvolume, we use the subvolume containing the new snapshot. Previously, this worked if the directory containing the new snapshot was the subvolume root - but we were using the incorrect helper, and got a subvolume ID of 0 when the parent directory wasn't the root of the subvolume, causing an emergency read-only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-05  cifs: Fix non-availability of dedup breaking generic/304  (David Howells)
Deduplication isn't supported on cifs, but cifs doesn't reject it, instead treating it as extent duplication/cloning. This can cause generic/304 to go silly and run for hours on end. Fix cifs to indicate EOPNOTSUPP if REMAP_FILE_DEDUP is set in ->remap_file_range(). Note that it's unclear whether or not commit b073a08016a1 is meant to cause cifs to return an error if REMAP_FILE_DEDUP is set. Fixes: b073a08016a1 ("cifs: fix that return -EINVAL when do dedupe operation") Cc: stable@vger.kernel.org Suggested-by: Dave Chinner <david@fromorbit.com> cc: Xiaoli Feng <fengxiaoli0714@gmail.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: Darrick Wong <darrick.wong@oracle.com> cc: fstests@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/3876191.1701555260@warthog.procyon.org.uk/ Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
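A minimal sketch of the check this commit describes, written as standalone C. The `TOY_` flag values and `toy_remap_file_range()` are stand-ins made up for this illustration; the kernel's own flag definitions and the cifs handler are not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in flag values for illustration; the kernel defines its own. */
#define TOY_REMAP_FILE_DEDUP       (1u << 0)
#define TOY_REMAP_FILE_CAN_SHORTEN (1u << 1)

/*
 * Sketch of the entry check in a ->remap_file_range() handler:
 * a dedup request is refused outright instead of being silently
 * treated as a clone.
 */
int toy_remap_file_range(unsigned int remap_flags)
{
	if (remap_flags & TOY_REMAP_FILE_DEDUP)
		return -EOPNOTSUPP; /* dedup is not supported, so say so */

	return 0; /* a real handler would go on to clone the extent */
}
```

The point of returning an explicit error is that a caller such as a test harness can skip the dedup case immediately rather than looping on an operation that never behaves as requested.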
2023-12-05  smb: client: fix potential NULL deref in parse_dfs_referrals()  (Paulo Alcantara)
If the server returned no data for FSCTL_DFS_GET_REFERRALS, @dfs_rsp will remain NULL and then parse_dfs_referrals() will dereference it. Fix this by returning -EIO when no output data is returned. Note that we can't fix it in SMB2_ioctl(), as some FSCTLs are allowed to return no data as per MS-SMB2 2.2.32. Fixes: 9d49640a21bf ("CIFS: implement get_dfs_refer for SMB2+") Cc: stable@vger.kernel.org Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
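The guard described here can be sketched as follows. `toy_parse_referrals()` and its one-byte "response format" are hypothetical and exist only to show the shape of the check: the caller that needs output data rejects an empty response, rather than the generic ioctl path doing so.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical parser: refuses a missing response instead of
 * dereferencing it. Toy format: the first byte is the referral count.
 */
int toy_parse_referrals(const unsigned char *rsp, size_t rsp_len,
			unsigned int *count)
{
	assert(count != NULL);

	if (rsp == NULL || rsp_len == 0)
		return -EIO; /* server returned no output data */

	*count = rsp[0];
	return 0;
}
```

On the error path the output parameter is left untouched, so a caller can rely on it only after a zero return.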
2023-12-05  ksmbd: downgrade RWH lease caching state to RH for directory  (Namjae Jeon)
RWH (Read + Write + Handle) caching state is not supported for directories. ksmbd downgrades it to RH for a directory if the client sends an RWH caching lease state. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05  ksmbd: set v2 lease capability  (Namjae Jeon)
Set SMB2_GLOBAL_CAP_DIRECTORY_LEASING in ->capabilities to inform the client that the server supports directory leases. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05  ksmbd: set epoch in create context v2 lease  (Namjae Jeon)
To support v2 leases (directory leases), ksmbd sets the epoch in the create context v2 lease response. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05  ksmbd: fix memory leak in smb2_lock()  (Zizhi Wo)
In smb2_lock(), if setup_async_work() executes successfully, work->cancel_argv is bound to the argv allocated by kmalloc(), and release_async_work() is later called in ksmbd_conn_try_dequeue_request() or smb2_lock() to release argv. However, when setup_async_work() fails, work->cancel_argv has not been bound to the argv, so the previously allocated argv is never released. Call kfree() to fix it. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
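The ownership rule behind this fix can be shown with a small userspace model. All `toy_` names are hypothetical, `malloc()`/`free()` stand in for `kmalloc()`/`kfree()`, and the `fail` parameter exists only to force the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_work {
	void **cancel_argv; /* owned by the work item once setup succeeds */
};

/* Hypothetical setup: on success the work item takes ownership of argv. */
int toy_setup_async_work(struct toy_work *work, void **argv, int fail)
{
	assert(work != NULL);
	if (fail)
		return -ENOMEM; /* ownership was NOT transferred */
	work->cancel_argv = argv;
	return 0;
}

/* Releases whatever the work item owns; safe to call when nothing is bound. */
void toy_release_async_work(struct toy_work *work)
{
	free(work->cancel_argv);
	work->cancel_argv = NULL;
}

int toy_lock(struct toy_work *work, int fail)
{
	void **argv = malloc(sizeof(*argv));
	int rc;

	if (!argv)
		return -ENOMEM;

	rc = toy_setup_async_work(work, argv, fail);
	if (rc) {
		free(argv); /* the fix: nobody else will free argv on this path */
		return rc;
	}
	return 0;
}
```

The release helper only sees allocations that were successfully bound, which is why the failure path has to free the buffer itself.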
2023-12-04  bcachefs: Don't run indirect extent trigger unless inserting/deleting  (Kent Overstreet)
This fixes a transaction path overflow reported in the snapshot deletion path, when moving extents to the correct snapshot. The root of the issue is that creating/deleting a reflink pointer can generate an unbounded number of updates, if it is allowed to reference an unbounded number of indirect extents; to prevent this, merging of reflink pointers has been disabled. But there's a hole, which is that copygc/rebalance may fragment existing extents in the course of moving them around, and if an indirect extent becomes too fragmented we'll then become unable to delete the reflink pointer. The eventual solution is going to be to tweak trigger handling so that we can process large reflink pointers incrementally when necessary, and notice that trigger updates don't need to be run for the part of the reflink pointer not changing. That is going to be a bigger project though, for another patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04  bcachefs: Convert compression_stats to for_each_btree_key2  (Kent Overstreet)
for_each_btree_key2() runs each loop iteration in a btree transaction, and thus does not cause SRCU lock hold time problems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04  bcachefs: Fix bch2_extent_drop_ptrs() call  (Kent Overstreet)
Also, make bch2_extent_drop_ptrs() safer, so it works with extents and non-extents iterators. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04  bcachefs: Fix a journal deadlock in replay  (Kent Overstreet)
Recently, journal pre-reservations were removed. They were for reserving space ahead of time in the journal for operations that are required for journal reclaim, e.g. btree key cache flushing and interior node btree updates. Instead we have watermarks - only operations for journal reclaim are allowed when the journal is low on space, and in general we're quite good about doing operations in the order that will free up space in the journal quickest when we're low on space. If we're doing a journal reclaim operation out of order, we usually do it in nonblocking mode if it's not freeing up space at the end of the journal. There's an exception though - interior btree node update operations have to be BCH_WATERMARK_reclaim - once they've been started, and they can't be nonblocking. Generally this is fine because they'll only be a very small fraction of transaction commits - but there's an exception, which is during journal replay. Journal replay does many btree operations, but doesn't need to commit them to the journal since they're already in the journal. So killing off pre-reservations, plus another change to make journal replay more efficient by initially doing the replay in sorted btree order, made it possible for the interior update operations replay generates to fill and deadlock the journal. Fix this by introducing a new check on journal space at the _start_ of an interior update operation. This causes us to block if necessary in exactly the same way as we used to when interior updates took a journal pre-reservation, but without all the expensive accounting pre-reservations required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04  bcachefs: Don't use btree write buffer until journal replay is finished  (Kent Overstreet)
The keys being replayed by journal replay have to be synchronized with updates by other threads that overwrite them. We rely on btree node locks for synchronizing - but since btree write buffer updates take no btree locks, that won't work. Instead, simply disable using the btree write buffer until journal replay is finished. This fixes a rare backpointers error in the merge_torture_flakey test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04  cifs: Fix flushing, invalidation and file size with FICLONE  (David Howells)
Fix a number of issues in the cifs filesystem implementation of the FICLONE ioctl in cifs_remap_file_range(). This is analogous to the previously fixed bug in cifs_file_copychunk_range() and can share the helper functions. Firstly, the invalidation of the destination range is handled incorrectly: We shouldn't just invalidate the whole file as dirty data in the file may get lost and we can't just call truncate_inode_pages_range() to invalidate the destination range as that will erase parts of a partial folio at each end whilst invalidating and discarding all the folios in the middle. We need to force all the folios covering the range to be reloaded, but we mustn't lose dirty data in them that's not in the destination range. Further, we shouldn't simply round out the range to PAGE_SIZE at each end as cifs should move to support multipage folios. Secondly, there's an issue whereby a write may have extended the file locally, but not have been written back yet. This can leave the local idea of the EOF at a later point than the server's EOF. If a clone request is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE (which gets translated to -EIO locally) if the clone source extends past the server's EOF. Fix this by:
(0) Flush the source region (already done). The flush does nothing and the EOF isn't moved if the source region has no dirty data.
(1) Move the EOF to the end of the source region if it isn't already at least at this point. If we can't do this, for instance if the server doesn't support it, just flush the entire source file.
(2) Find the folio (if present) at each end of the range, flushing it and increasing the region-to-be-invalidated to cover those in their entirety.
(3) Fully discard all the folios covering the range as we want them to be reloaded.
(4) Then perform the extent duplication.
Thirdly, set i_size after doing the duplicate_extents operation as this value may be used by various things internally. stat() hides the issue because setting ->time to 0 causes cifs_getattr() to revalidate the attributes. These were causing the cifs/001 xfstest to fail. Fixes: 04b38d601239 ("vfs: pull btrfs clone API to vfs layer") Signed-off-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org cc: Christoph Hellwig <hch@lst.de> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-04  cifs: Fix flushing, invalidation and file size with copy_file_range()  (David Howells)
Fix a number of issues in the cifs filesystem implementation of the copy_file_range() syscall in cifs_file_copychunk_range(). Firstly, the invalidation of the destination range is handled incorrectly: We shouldn't just invalidate the whole file as dirty data in the file may get lost and we can't just call truncate_inode_pages_range() to invalidate the destination range as that will erase parts of a partial folio at each end whilst invalidating and discarding all the folios in the middle. We need to force all the folios covering the range to be reloaded, but we mustn't lose dirty data in them that's not in the destination range. Further, we shouldn't simply round out the range to PAGE_SIZE at each end as cifs should move to support multipage folios. Secondly, there's an issue whereby a write may have extended the file locally, but not have been written back yet. This can leave the local idea of the EOF at a later point than the server's EOF. If a copy request is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE (which gets translated to -EIO locally) if the copy source extends past the server's EOF. Fix this by:
(0) Flush the source region (already done). The flush does nothing and the EOF isn't moved if the source region has no dirty data.
(1) Move the EOF to the end of the source region if it isn't already at least at this point. If we can't do this, for instance if the server doesn't support it, just flush the entire source file.
(2) Find the folio (if present) at each end of the range, flushing it and increasing the region-to-be-invalidated to cover those in their entirety.
(3) Fully discard all the folios covering the range as we want them to be reloaded.
(4) Then perform the copy.
Thirdly, set i_size after doing the copychunk_range operation as this value may be used by various things internally. stat() hides the issue because setting ->time to 0 causes cifs_getattr() to revalidate the attributes. These were causing the generic/075 xfstest to fail. Fixes: 620d8745b35d ("Introduce cifs_copy_file_range()") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
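Step (2) of this commit, widening the region to be invalidated so that it covers whole folios at both ends instead of rounding to PAGE_SIZE, can be illustrated with a toy page cache. This is a sketch under stated assumptions: `struct toy_folio`, `toy_find_folio()` and `toy_expand_range()` are hypothetical, the cache is a plain array, and the flushing part of the step is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy folio: a cached chunk starting at byte 'pos', 'size' bytes long. */
struct toy_folio {
	unsigned long long pos;
	unsigned long long size;
};

/* Return the cached folio containing byte offset 'off', or NULL if none. */
const struct toy_folio *toy_find_folio(const struct toy_folio *cache, size_t n,
				       unsigned long long off)
{
	for (size_t i = 0; i < n; i++)
		if (off >= cache[i].pos && off < cache[i].pos + cache[i].size)
			return &cache[i];
	return NULL;
}

/*
 * Widen the inclusive byte range [*start, *end] so that any folio found
 * at either edge is covered in its entirety. Folio sizes are taken from
 * the cache, not assumed to be one page.
 */
void toy_expand_range(const struct toy_folio *cache, size_t n,
		      unsigned long long *start, unsigned long long *end)
{
	const struct toy_folio *folio;

	assert(*start <= *end);

	folio = toy_find_folio(cache, n, *start);
	if (folio)
		*start = folio->pos;

	folio = toy_find_folio(cache, n, *end);
	if (folio)
		*end = folio->pos + folio->size - 1;
}
```

For example, with a 16384-byte folio at offset 0 and a 4096-byte folio at offset 16384, the range 5000..17000 widens to 0..20479, while a range with no cached folio at either end is left unchanged.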
2023-12-04  fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP  (Amir Goldstein)
The new fuse init flag FUSE_DIRECT_IO_ALLOW_MMAP breaks assumptions made by FOPEN_PARALLEL_DIRECT_WRITES and causes test generic/095 to hit BUG_ON(fi->writectr < 0) assertions in fuse_set_nowrite(): generic/095 5s ... kernel BUG at fs/fuse/dir.c:1756! ... ? fuse_set_nowrite+0x3d/0xdd ? do_raw_spin_unlock+0x88/0x8f ? _raw_spin_unlock+0x2d/0x43 ? fuse_range_is_writeback+0x71/0x84 fuse_sync_writes+0xf/0x19 fuse_direct_io+0x167/0x5bd fuse_direct_write_iter+0xf0/0x146 Auto disable FOPEN_PARALLEL_DIRECT_WRITES when server negotiated FUSE_DIRECT_IO_ALLOW_MMAP. Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode") Cc: <stable@vger.kernel.org> # v6.6 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04  fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()  (Hangyu Hua)
fuse_dax_conn_free() will be called when fuse_fill_super_common() fails after fuse_dax_conn_alloc(). Then deactivate_locked_super() in virtio_fs_get_tree() will call virtio_kill_sb() to release the discarded superblock. This will call fuse_dax_conn_free() again in fuse_conn_put(), resulting in a possible double free. Fixes: 1dd539577c42 ("virtiofs: add a mount option to enable dax") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v5.10 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
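The double-free pattern and its fix can be shown in isolation. The sketch below is a userspace model, not the fuse code: `struct toy_conn` and `toy_dax_conn_free()` are hypothetical, and `free()` stands in for `kfree()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy connection holding one optional heap allocation. */
struct toy_conn {
	int *dax;
};

/*
 * Free the dax state and clear the pointer. Clearing it is the fix:
 * a second call from a later teardown path then has nothing to free.
 */
void toy_dax_conn_free(struct toy_conn *fc)
{
	assert(fc != NULL);
	free(fc->dax);  /* free(NULL) is a no-op */
	fc->dax = NULL;
}
```

Calling `toy_dax_conn_free()` twice on the same connection is therefore harmless, which mirrors the error path followed by the superblock teardown described in the commit.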
2023-12-04  fuse: share lookup state between submount and its parent  (Krister Johansen)
Fuse submounts do not perform a lookup for the nodeid that they inherit from their parent. Instead, the code decrements the nlookup on the submount's fuse_inode when it is instantiated, and no forget is performed when a submount root is evicted. Trouble arises when the submount's parent is evicted despite the submount itself being in use. In this author's case, the submount was in a container and detached from the initial mount namespace via a MNT_DETACH operation. When memory pressure triggered the shrinker, the inode from the parent was evicted, which triggered enough forgets to render the submount's nodeid invalid. Since submounts should still function, even if their parent goes away, solve this problem by sharing refcounted state between the parent and its submount. When all of the references on this shared state reach zero, it's safe to forget the final lookup of the fuse nodeid. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Cc: stable@vger.kernel.org Fixes: 1866d779d5d2 ("fuse: Allow fuse_fill_super_common() for submounts") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
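The idea of forgetting the nodeid only when the last holder goes away is plain reference counting. A minimal single-threaded sketch follows; `struct toy_lookup_state` and its helpers are hypothetical, and the kernel would use its own refcount primitives rather than a bare `int`.

```c
#include <assert.h>

/* Toy shared state: one instance per nodeid, shared by parent and submount. */
struct toy_lookup_state {
	int refs;                  /* number of holders */
	unsigned long long nodeid; /* the nodeid both sides refer to */
	int forgotten;             /* set once the final forget was issued */
};

/* Take a reference, e.g. when a submount inherits the nodeid. */
struct toy_lookup_state *toy_state_get(struct toy_lookup_state *s)
{
	s->refs++;
	return s;
}

/*
 * Drop one reference. Only the final put forgets the nodeid.
 * Returns 1 if the forget happened, 0 otherwise.
 */
int toy_state_put(struct toy_lookup_state *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0) {
		s->forgotten = 1;
		return 1;
	}
	return 0;
}
```

With the parent and the submount each holding one reference, evicting the parent drops the count to one and the nodeid stays valid until the submount also lets go.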
2023-12-04  fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP  (Tyler Fanelli)
Although DIRECT_IO_RELAX's initial usage is to allow shared mmap, its description indicates a purpose of reducing memory footprint. This may imply that it could be further used to relax other DIRECT_IO operations in the future. Replace it with a flag DIRECT_IO_ALLOW_MMAP which does only one thing, allow shared mmap of DIRECT_IO files while still bypassing the cache on regular reads and writes. [Miklos] Also Keep DIRECT_IO_RELAX definition for backward compatibility. Signed-off-by: Tyler Fanelli <tfanelli@redhat.com> Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode") Cc: <stable@vger.kernel.org> # v6.6 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04  Revert "debugfs: annotate debugfs handlers vs. removal with lockdep"  (Johannes Berg)
This reverts commit f4acfcd4deb1 ("debugfs: annotate debugfs handlers vs. removal with lockdep"), it appears to have false positives and really shouldn't have been in the -rc series with the fixes anyway. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20231202114936.fd55431ab160.I911aa53abeeca138126f690d383a89b13eb05667@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-03  bcachefs: Don't drop journal pins in exit path  (Kent Overstreet)
There's no need to drop journal pins in our exit paths - the code was trying to have everything cleaned up on any shutdown, but better to just tweak the assertions a bit. This fixes a bug where calling into journal reclaim in the exit path would cause a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-03  Merge tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6  (Linus Torvalds)
Pull smb client fixes from Steve French: - Two fallocate fixes - Fix warnings from new gcc - Two symlink fixes * tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client, common: fix fortify warnings cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved smb: client: report correct st_size for SMB and NFS symlinks smb: client: fix missing mode bits for SMB symlinks
2023-12-02  Merge tag 'fs_for_v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs  (Linus Torvalds)
Pull ext2 fix from Jan Kara: "Fix an ext2 bug introduced by changes in ext2 & iomap stepping on each other's toes (apparently ext2 driver does not get much testing in linux-next)" * tag 'fs_for_v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix ki_pos update for DIO buffered-io fallback case
2023-12-02  Merge tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs  (Linus Torvalds)
Pull more bcachefs bugfixes from Kent Overstreet: - bcache & bcachefs were broken with CFI enabled; patch for closures to fix type punning - mark erasure coding as extra-experimental; there are incompatible disk space accounting changes coming for erasure coding, and I'm still seeing checksum errors in some tests - several fixes for durability-related issues (durability is a device specific setting where we can tell bcachefs that data on a given device should be counted as replicated x times) - a fix for a rare livelock when a btree node merge then updates a parent node that is almost full - fix a race in the device removal path, where dropping a pointer in a btree node to a device would be clobbered by an in flight btree write updating the btree node key on completion - fix one SRCU lock hold time warning in the btree gc code - there's still a bunch more of these to fix - fix a rare race where we'd start copygc before initializing the "are we rw" percpu refcount; copygc would think we were already ro and die immediately * tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits) bcachefs: Extra kthread_should_stop() calls for copygc bcachefs: Convert gc_alloc_start() to for_each_btree_key2() bcachefs: Fix race between btree writes and metadata drop bcachefs: move journal seq assertion bcachefs: -EROFS doesn't count as move_extent_start_fail bcachefs: trace_move_extent_start_fail() now includes errcode bcachefs: Fix split_race livelock bcachefs: Fix bucket data type for stripe buckets bcachefs: Add missing validation for jset_entry_data_usage bcachefs: Fix zstd compress workspace size bcachefs: bpos is misaligned on big endian bcachefs: Fix ec + durability calculation bcachefs: Data update path won't accidentally grow replicas bcachefs: deallocate_extra_replicas() bcachefs: Proper refcounting for journal_keys bcachefs: preserve device path as device name bcachefs: Fix an endianness conversion bcachefs: Start gc, copygc, rebalance threads after initing writes ref bcachefs: Don't stop copygc thread on device resize bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops ...
2023-11-30  ext4: fix warning in ext4_dio_write_end_io()  (Jan Kara)
The syzbot has reported that it can hit the warning in ext4_dio_write_end_io() because i_size < i_disksize. Indeed the reproducer creates a race between DIO IO completion and truncate expanding the file and thus ext4_dio_write_end_io() sees an inconsistent inode state where i_disksize is already updated but i_size is not updated yet. Since we are careful when setting up DIO write and consider it extending (and thus performing the IO synchronously with i_rwsem held exclusively) whenever it goes past either of i_size or i_disksize, we can use the same test during IO completion without risking entering ext4_handle_inode_extension() without i_rwsem held. This way we make it obvious both i_size and i_disksize are large enough when we report DIO completion without relying on unreliable WARN_ON. Reported-by: <syzbot+47479b71cdfc78f56d30@syzkaller.appspotmail.com> Fixes: 91562895f803 ("ext4: properly sync file size update after O_SYNC direct IO") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20231130095653.22679-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30  jbd2: increase the journal IO's priority  (Zhang Yi)
Currently, jbd2 only adds REQ_SYNC for the descriptor block, metadata log buffer, commit buffer and superblock buffer, so the submitted IO can be throttled by the writeback throttle (WBT) in the block layer, which can lead to priority inversion in some cases. Journal IO is a kind of high-priority metadata IO, so it should not be throttled by WBT-like QoS policies in the block layer. Add REQ_SYNC | REQ_IDLE to exempt it from writeback throttling, and also add REQ_META to indicate that it is metadata IO. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231129114740.2686201-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
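A toy model of the flag combination this commit relies on. The bit values, `TOY_JOURNAL_REQ_FLAGS` and `toy_should_throttle()` are assumptions made up for this sketch; it only encodes the rule stated in the commit message, namely that IO carrying both the sync and idle flags is exempt from writeback throttling.

```c
#include <assert.h>

/* Stand-in bit values for illustration; not the kernel's definitions. */
#define TOY_REQ_SYNC (1u << 0)
#define TOY_REQ_META (1u << 1)
#define TOY_REQ_IDLE (1u << 2)

/* Flags a journal write would carry after the change (hypothetical name). */
#define TOY_JOURNAL_REQ_FLAGS (TOY_REQ_META | TOY_REQ_SYNC | TOY_REQ_IDLE)

/*
 * Toy throttling decision: a write is throttled unless it carries
 * BOTH the sync and the idle flag. Returns 1 if throttled, else 0.
 */
int toy_should_throttle(unsigned int flags)
{
	const unsigned int exempt = TOY_REQ_SYNC | TOY_REQ_IDLE;

	return (flags & exempt) != exempt;
}
```

Under this model a write flagged with the sync bit alone is still throttled, which is the situation the commit describes for journal IO before the change.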
2023-11-30  jbd2: correct the printing of write_flags in jbd2_write_superblock()  (Zhang Yi)
The write_flags value printed in the trace of jbd2_write_superblock() does not reflect the flags that are actually used, so move the modification before the trace. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231129114740.2686201-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30  ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS  (Baokun Li)
For files with logical blocks close to EXT_MAX_BLOCKS, the file size predicted in ext4_mb_normalize_request() may exceed EXT_MAX_BLOCKS. This can cause some blocks to be preallocated that will not be used. And after [Fixes], the following issue may be triggered: ========================================================= kernel BUG at fs/ext4/mballoc.c:4653! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 1 PID: 2357 Comm: xfs_io 6.7.0-rc2-00195-g0f5cc96c367f Hardware name: linux,dummy-virt (DT) pc : ext4_mb_use_inode_pa+0x148/0x208 lr : ext4_mb_use_inode_pa+0x98/0x208 Call trace: ext4_mb_use_inode_pa+0x148/0x208 ext4_mb_new_inode_pa+0x240/0x4a8 ext4_mb_use_best_found+0x1d4/0x208 ext4_mb_try_best_found+0xc8/0x110 ext4_mb_regular_allocator+0x11c/0xf48 ext4_mb_new_blocks+0x790/0xaa8 ext4_ext_map_blocks+0x7cc/0xd20 ext4_map_blocks+0x170/0x600 ext4_iomap_begin+0x1c0/0x348 ========================================================= Here is a calculation when adjusting ac_b_ex in ext4_mb_new_inode_pa(): ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len); if (ac->ac_o_ex.fe_logical >= ex.fe_logical) goto adjust_bex; The problem is that when ac_b_ex.fe_len is subtracted from orig_goal_end the result is still greater than EXT_MAX_BLOCKS, which causes ex.fe_logical to overflow to a very small value, which ultimately triggers a BUG_ON in ext4_mb_new_inode_pa() because pa->pa_free < len. The last logical block of an actual write request does not exceed EXT_MAX_BLOCKS, so ext4_mb_normalize_request() now also avoids normalizing the last logical block past EXT_MAX_BLOCKS, which prevents the above issue. The test case in [Link] can reproduce the above issue with 64k block size. 
Link: https://patchwork.kernel.org/project/fstests/list/?series=804003 Cc: <stable@kernel.org> # 6.4 Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231127063313.3734294-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
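The overflow described in this commit is an ordinary 64-bit to 32-bit truncation, and the fix is a clamp. The sketch below works the arithmetic in userspace C; `TOY_EXT_MAX_BLOCKS`, `toy_fe_logical()` and `toy_clamp_size()` are illustrative names, and the 32-bit width of the logical block number is an assumption of this model.

```c
#include <assert.h>
#include <stdint.h>

/* Largest logical block number in this toy model (32-bit block numbers). */
#define TOY_EXT_MAX_BLOCKS 0xffffffffu

/*
 * Storing a 64-bit block number into a 32-bit logical block truncates it.
 * If goal_end - len is still above the 32-bit range, the result wraps
 * around to a very small value.
 */
uint32_t toy_fe_logical(uint64_t goal_end, uint32_t len)
{
	assert(goal_end >= len);
	return (uint32_t)(goal_end - len);
}

/*
 * Clamp a normalized request so that start + size never goes past
 * TOY_EXT_MAX_BLOCKS, which keeps later end-of-goal arithmetic in range.
 */
uint64_t toy_clamp_size(uint32_t start, uint64_t size)
{
	uint64_t room = (uint64_t)TOY_EXT_MAX_BLOCKS - start;

	return size > room ? room : size;
}
```

For instance, a goal end of 0x100000010 with a length of 8 gives 0x100000008, which truncates to 8. Clamping a 0x20-block request that starts at 0xfffffff0 leaves 0xf blocks, so the end of the goal can no longer exceed the maximum.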
2023-12-01  Merge tag 'net-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds)
Pull networking fixes from Paolo Abeni: "Including fixes from bpf and wifi. Current release - regressions: - neighbour: fix __randomize_layout crash in struct neighbour - r8169: fix deadlock on RTL8125 in jumbo mtu mode Previous releases - regressions: - wifi: - mac80211: fix warning at station removal time - cfg80211: fix CQM for non-range use - tools: ynl-gen: fix unexpected response handling - octeontx2-af: fix possible buffer overflow - dpaa2: recycle the RX buffer only after all processing done - rswitch: fix missing dev_kfree_skb_any() in error path Previous releases - always broken: - ipv4: fix uaf issue when receiving igmp query packet - wifi: mac80211: fix debugfs deadlock at device removal time - bpf: - sockmap: af_unix stream sockets need to hold ref for pair sock - netdevsim: don't accept device bound programs - selftests: fix a char signedness issue - dsa: mv88e6xxx: fix marvell 6350 probe crash - octeontx2-pf: restore TC ingress police rules when interface is up - wangxun: fix memory leak on msix entry - ravb: keep reverse order of operations in ravb_remove()" * tag 'net-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits) net: ravb: Keep reverse order of operations in ravb_remove() net: ravb: Stop DMA in case of failures on ravb_open() net: ravb: Start TX queues after HW initialization succeeded net: ravb: Make write access to CXR35 first before accessing other EMAC registers net: ravb: Use pm_runtime_resume_and_get() net: ravb: Check return value of reset_control_deassert() net: libwx: fix memory leak on msix entry ice: Fix VF Reset paths when interface in a failed over aggregate bpf, sockmap: Add af_unix test with both sockets in map bpf, sockmap: af_unix stream sockets need to hold ref for pair sock tools: ynl-gen: always construct struct ynl_req_state ethtool: don't propagate EOPNOTSUPP from dumps ravb: Fix races between ravb_tx_timeout_work() and net related ops r8169: prevent potential deadlock in rtl8169_close r8169: fix deadlock on RTL8125 in jumbo mtu mode neighbour: Fix __randomize_layout crash in struct neighbour octeontx2-pf: Restore TC ingress police rules when interface is up octeontx2-pf: Fix adding mbox work queue entry when num_vfs > 64 net: stmmac: xgmac: Disable FPE MMC interrupts octeontx2-af: Fix possible buffer overflow ...
2023-11-30  smb: client, common: fix fortify warnings  (Dmitry Antipov)
When compiling with gcc version 14.0.0 20231126 (experimental) and CONFIG_FORTIFY_SOURCE=y, I've noticed the following: In file included from ./include/linux/string.h:295, from ./include/linux/bitmap.h:12, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:60, from ./arch/x86/include/asm/preempt.h:9, from ./include/linux/preempt.h:79, from ./include/linux/spinlock.h:56, from ./include/linux/wait.h:9, from ./include/linux/wait_bit.h:8, from ./include/linux/fs.h:6, from fs/smb/client/smb2pdu.c:18: In function 'fortify_memcpy_chk', inlined from '__SMB2_close' at fs/smb/client/smb2pdu.c:3480:4: ./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? 
[-Wattribute-warning] 588 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and: In file included from ./include/linux/string.h:295, from ./include/linux/bitmap.h:12, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:60, from ./arch/x86/include/asm/preempt.h:9, from ./include/linux/preempt.h:79, from ./include/linux/spinlock.h:56, from ./include/linux/wait.h:9, from ./include/linux/wait_bit.h:8, from ./include/linux/fs.h:6, from fs/smb/client/cifssmb.c:17: In function 'fortify_memcpy_chk', inlined from 'CIFS_open' at fs/smb/client/cifssmb.c:1248:3: ./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 588 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In both cases, the fortification logic interprets calls to 'memcpy()' as attempts to copy an amount of data which exceeds the size of the specified field (i.e. more than 8 bytes from a __le64 value) and thus issues an overread warning. Both of these warnings may be silenced by using the convenient 'struct_group()' quirk. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-29  Merge tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless  (Jakub Kicinski)
Johannes Berg says: ==================== wireless fixes: - debugfs had a deadlock (removal vs. use of files), fixes going through wireless ACKed by Greg - support for HT STAs on 320 MHz channels, even if it's not clear that should ever happen (that's 6 GHz), best not to WARN() - fix for the previous CQM fix that broke most cases - various wiphy locking fixes - various small driver fixes * tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: use wiphy locked debugfs for sdata/link wifi: mac80211: use wiphy locked debugfs helpers for agg_status wifi: cfg80211: add locked debugfs wrappers debugfs: add API to allow debugfs operations cancellation debugfs: annotate debugfs handlers vs. removal with lockdep debugfs: fix automount d_fsdata usage wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap wifi: avoid offset calculation on NULL pointer wifi: cfg80211: hold wiphy mutex for send_interface wifi: cfg80211: lock wiphy mutex for rfkill poll wifi: cfg80211: fix CQM for non-range use wifi: mac80211: do not pass AP_VLAN vif pointer to drivers during flush wifi: iwlwifi: mvm: fix an error code in iwl_mvm_mld_add_sta() wifi: mt76: mt7925: fix typo in mt7925_init_he_caps wifi: mt76: mt7921: fix 6GHz disabled by the missing default CLC config ==================== Link: https://lore.kernel.org/r/20231129150809.31083-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF movedDavid Howells
Fix the cifs filesystem implementations of FALLOC_FL_INSERT_RANGE, in smb3_insert_range(), to set i_size after extending the file on the server and before we do the copy to open the gap (as we don't clean up the EOF marker if the copy fails). Fixes: 7fe6fe95b936 ("cifs: add FALLOC_FL_INSERT_RANGE support") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-29cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF movedDavid Howells
Fix the cifs filesystem implementations of FALLOC_FL_ZERO_RANGE, in smb3_zero_range(), to set i_size after extending the file on the server. Fixes: 72c419d9b073 ("cifs: fix smb3_zero_range so it can expand the file-size when required") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-28bcachefs: Extra kthread_should_stop() calls for copygcKent Overstreet
This fixes a bug where going read-only was taking longer than it should have due to copygc forgetting to check kthread_should_stop(). Additionally, fix a missing is_kthread check in bch2_move_ratelimit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Convert gc_alloc_start() to for_each_btree_key2()Kent Overstreet
This eliminates some SRCU warnings: for_each_btree_key2() runs every loop iteration in a distinct transaction context. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix race between btree writes and metadata dropKent Overstreet
btree writes update the btree node key after every write, in order to update sectors_written, and they also might need to drop pointers if one of the writes failed in a replicated btree node. But the btree node might also have had a pointer dropped while the write was in flight, by bch2_dev_metadata_drop(), and thus there was a bug where the btree node write would overwrite the btree node's key with what it had at the start of the write. Fix this by dropping pointers not currently in the btree node key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
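The fix described above amounts to filtering the write's copy of the key against the key as it exists now. The following is a minimal sketch of that filtering step only, not the bcachefs implementation: struct demo_key, key_has_dev(), and drop_stale_ptrs() are hypothetical, and a pointer is reduced to a bare device index for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_PTRS 4

/* Hypothetical, heavily simplified stand-in for a btree node key. */
struct demo_key {
	unsigned nr;                   /* number of valid entries in dev[] */
	unsigned dev[DEMO_MAX_PTRS];   /* one device index per replica pointer */
	unsigned sectors_written;
};

/* Return true if the key still has a pointer to the given device. */
static bool key_has_dev(const struct demo_key *k, unsigned dev)
{
	for (unsigned i = 0; i < k->nr; i++)
		if (k->dev[i] == dev)
			return true;
	return false;
}

/*
 * 'updated' is the key the write has been carrying since it started, with
 * sectors_written already bumped. 'current' is the key in the btree node
 * now. Any pointer that was dropped from 'current' while the write was in
 * flight is also dropped from 'updated', so storing 'updated' cannot bring
 * the dropped pointer back.
 */
static void drop_stale_ptrs(struct demo_key *updated,
			    const struct demo_key *current)
{
	unsigned out = 0;

	assert(updated->nr <= DEMO_MAX_PTRS);
	for (unsigned i = 0; i < updated->nr; i++)
		if (key_has_dev(current, updated->dev[i]))
			updated->dev[out++] = updated->dev[i];
	updated->nr = out;
}
```

In this sketch the other fields of the in-flight key, such as sectors_written, are left untouched; only the pointer list is intersected with the current key.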
2023-11-28bcachefs: move journal seq assertionKent Overstreet
journal_cur_seq() can legitimately be used outside of the journal lock, where this assert can race. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: -EROFS doesn't count as move_extent_start_failKent Overstreet
The automated tests check if we've hit too many slowpath/error path events and fail the test - if we're just shutting down, that naturally shouldn't count. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28smb: client: report correct st_size for SMB and NFS symlinksPaulo Alcantara
We can't rely on FILE_STANDARD_INFORMATION::EndOfFile for reparse points, as it will always be zero. Set the size to the symlink target's length, as specified by POSIX. This will make the stat() family of syscalls return the correct st_size for such files. Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
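The POSIX expectation referred to above can be observed from user space with lstat(). This is a small sketch, assuming a writable /tmp on a local filesystem that follows the POSIX rule; symlink_st_size() is a hypothetical helper written for this illustration and is unrelated to the cifs code.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a symlink pointing at 'target' inside a fresh temporary
 * directory, read back its st_size with lstat(), then clean up.
 * Returns the reported size, or -1 if any step failed.
 */
static long symlink_st_size(const char *target)
{
	char dir[] = "/tmp/symlink-demo-XXXXXX";
	char path[64];
	struct stat st;
	long size = -1;

	assert(target != NULL);

	if (!mkdtemp(dir))
		return -1;
	snprintf(path, sizeof(path), "%s/link", dir);

	if (symlink(target, path) == 0) {
		/* lstat() describes the link itself, not what it points to. */
		if (lstat(path, &st) == 0)
			size = (long)st.st_size;
		unlink(path);
	}
	rmdir(dir);
	return size;
}
```

For a link whose target is the 11-byte string "some/target", a conforming filesystem is expected to report an st_size of 11; the target does not need to exist.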
2023-11-28smb: client: fix missing mode bits for SMB symlinksPaulo Alcantara
When instantiating inodes for SMB symlinks, add the mode bits from @cifs_sb->ctx->file_mode as we already do for the other special files. Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-28bcachefs: trace_move_extent_start_fail() now includes errcodeKent Overstreet
Renamed from trace_move_extent_alloc_mem_fail, because there are other reasons we could fail (disk space allocation failure). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix split_race livelockKent Overstreet
bch2_btree_update_start() calculates which nodes are going to have to be split/rewritten, so that we know how many nodes to reserve and how deep in the tree we have to take locks. But btree node merges require inserting two keys into the parent node, not just splits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix bucket data type for stripe bucketsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Add missing validation for jset_entry_data_usageKent Overstreet
Validation was completely missing for replicas entries in the journal (not the superblock replicas section) - we can't have replicas entries pointing to invalid devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix zstd compress workspace sizeKent Overstreet
zstd apparently lies about the size of the compression workspace it requires; if we double it compression succeeds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28Merge tag 'for-6.7-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few fixes and message updates: - for simple quotas, handle the case when a snapshot is created and the target qgroup already exists - fix a warning when file descriptor given to send ioctl is not writable - fix off-by-one condition when checking chunk maps - free pages when page array allocation fails during compression read, other cases were handled - fix memory leak on error handling path in ref-verify debugging feature - copy missing struct member 'version' in 64/32bit compat send ioctl - tree-checker verifies inline backref ordering - print messages to syslog on first mount and last unmount - update error messages when reading chunk maps" * tag 'for-6.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: send: ensure send_fd is writable btrfs: free the allocated memory if btrfs_alloc_page_array() fails btrfs: fix 64bit compat send ioctl arguments not initializing version member btrfs: make error messages more clear when getting a chunk map btrfs: fix off-by-one when checking chunk map includes logical address btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod() btrfs: add dmesg output for first mount and last unmount of a filesystem btrfs: do not abort transaction if there is already an existing qgroup btrfs: tree-checker: add type and sequence check for inline backrefs
2023-11-27Merge tag '6.7-rc3-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Memory leak fix - Fix possible deadlock in open - Multiple SMB3 leasing (caching) fixes including: - incorrect open count (found via xfstest generic/002 with leases) - lease breaking incorrect serialization - lease break error handling fix - fix sending async response when lease pending - Async command fix * tag '6.7-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on error ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncId ksmbd: release interim response after sending status pending response ksmbd: move oplock handling after unlock parent dir ksmbd: separately allocate ci per dentry ksmbd: fix possible deadlock in smb2_open ksmbd: prevent memory leak on error return
2023-11-27debugfs: add API to allow debugfs operations cancellationJohannes Berg
In some cases there might be longer-running hardware accesses in debugfs files, or attempts to acquire locks, and we want to still be able to quickly remove the files. Introduce a cancellations API to use inside the debugfs handler functions to be able to cancel such operations on a per-file basis. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-27debugfs: annotate debugfs handlers vs. removal with lockdepJohannes Berg
When you take a lock in a debugfs handler but also try to remove the debugfs file under that lock, things can deadlock since the removal has to wait for all users to finish. Add lockdep annotations in debugfs_file_get()/_put() to catch such issues. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-27debugfs: fix automount d_fsdata usageJohannes Berg
debugfs_create_automount() stores a function pointer in d_fsdata, but since commit 7c8d469877b1 ("debugfs: add support for more elaborate ->d_fsdata") debugfs_release_dentry() will free it, now conditionally on DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, but that's not set for the function pointer in automount. As a result, removing an automount dentry would attempt to free the function pointer. Luckily, the only user of this (tracing) never removes it. Nevertheless, it's safer if we just handle the fsdata in one way, namely either DEBUGFS_FSDATA_IS_REAL_FOPS_BIT or allocated. Thus, change the automount to allocate it, and use the real_fops in the data to indicate whether or not automount is filled, rather than adding a type tag. At least for now this isn't actually needed, but the next changes will require it. Also check in debugfs_file_get() that it gets only called on regular files, just to make things clearer. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
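The hazard described above comes from using one pointer-sized slot for two kinds of values, distinguished by a tag in the low bit. The sketch below illustrates that scheme in user space under stated assumptions: DEMO_IS_REAL_FOPS_BIT, struct demo_fops, struct demo_fsdata, and the helper functions are hypothetical stand-ins, and free() takes the place of the kernel's release path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tag bit, stored in the otherwise unused low pointer bit. */
#define DEMO_IS_REAL_FOPS_BIT ((uintptr_t)1)

struct demo_fops { int dummy; };

/* Hypothetical allocated per-file data that replaces the tagged pointer. */
struct demo_fsdata { const struct demo_fops *real_fops; };

/* Store a borrowed fops pointer in the slot, marked with the tag bit. */
static void *tag_real_fops(const struct demo_fops *fops)
{
	/* The scheme relies on the pointer being at least 2-byte aligned. */
	assert(((uintptr_t)fops & DEMO_IS_REAL_FOPS_BIT) == 0);
	return (void *)((uintptr_t)fops | DEMO_IS_REAL_FOPS_BIT);
}

static bool is_real_fops(const void *slot)
{
	return ((uintptr_t)slot & DEMO_IS_REAL_FOPS_BIT) != 0;
}

/*
 * Release the slot: tagged values are borrowed and must not be freed,
 * untagged values are assumed to be heap allocations owned by the slot.
 * Returns true if memory was freed.
 */
static bool release_fsdata(void *slot)
{
	if (is_real_fops(slot))
		return false;
	free(slot);
	return true;
}
```

A third kind of value stored untagged, such as a raw function pointer, would take the free() branch here, which is the class of bug the commit avoids by making the automount case use an allocated structure as well.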