summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-02-02Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"Andreas Gruenbacher
Commit "gfs2: Use GL_NOBLOCK flag for non-blocking lookups" has several issues, some of which are non-trivial to fix, so revert it for now: https://lore.kernel.org/gfs2/20240202050230.GA875515@ZenIV/T/ This reverts commit dd00aaeb343255a8a30de671bd27bde79a47c8e5. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-02-01ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map typeZhang Yi
Since ext4_map_blocks() can recognize a delayed allocated only extent, make ext4_set_iomap() can also recognize it, and remove the useless separate check in ext4_iomap_begin_report(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01ext4: make ext4_map_blocks() distinguish delalloc only extentZhang Yi
Add a new map flag EXT4_MAP_DELAYED to indicate the mapping range is a delayed allocated only (not unwritten) one, and making ext4_map_blocks() can distinguish it, no longer mixing it with holes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01ext4: add a hole extent entry in cache after punchZhang Yi
In order to cache hole extents in the extent status tree and keep the hole length as long as possible, re-add a hole entry to the cache just after punching a hole. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01ext4: correct the hole length returned by ext4_map_blocks()Zhang Yi
In ext4_map_blocks(), if we can't find a range of mapping in the extents cache, we are calling ext4_ext_map_blocks() to search the real path and ext4_ext_determine_hole() to determine the hole range. But if the querying range was partially or completely overlaped by a delalloc extent, we can't find it in the real extent path, so the returned hole length could be incorrect. Fortunately, ext4_ext_put_gap_in_cache() have already handle delalloc extent, but it searches start from the expanded hole_start, doesn't start from the querying range, so the delalloc extent found could not be the one that overlaped the querying range, plus, it also didn't adjust the hole length. Let's just remove ext4_ext_put_gap_in_cache(), handle delalloc and insert adjusted hole extent in ext4_ext_determine_hole(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01ext4: convert to exclusive lock while inserting delalloc extentsZhang Yi
ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem when inserting delalloc extents, it could be raced by another querying path of ext4_map_blocks() without i_rwsem, .e.g buffered read path. Suppose we buffered read a file containing just a hole, and without any cached extents tree, then it is raced by another delayed buffered write to the same area or the near area belongs to the same hole, and the new delalloc extent could be overwritten to a hole extent. pread() pwrite() filemap_read_folio() ext4_mpage_readpages() ext4_map_blocks() down_read(i_data_sem) ext4_ext_determine_hole() //find hole ext4_ext_put_gap_in_cache() ext4_es_find_extent_range() //no delalloc extent ext4_da_map_blocks() down_read(i_data_sem) ext4_insert_delayed_block() //insert delalloc extent ext4_es_insert_extent() //overwrite delalloc extent to hole This race could lead to inconsistent delalloc extents tree and incorrect reserved space counter. Fix this by converting to hold i_data_sem in exclusive mode when adding a new delalloc extent in ext4_da_map_blocks(). Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01ext4: refactor ext4_da_map_blocks()Zhang Yi
Refactor and cleanup ext4_da_map_blocks(), reduce some unnecessary parameters and branches, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01Merge tag 'exfat-for-6.8-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fix from Namjae Jeon: - Fix BUG in iov_iter_revert reported from syzbot * tag 'exfat-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix zero the unwritten part for dio read
2024-02-01smb: client: increase number of PDUs allowed in a compound requestPaulo Alcantara
With the introduction of SMB2_OP_QUERY_WSL_EA, the client may now send 5 commands in a single compound request in order to query xattrs from potential WSL reparse points, which should be fine as we currently allow up to 5 PDUs in a single compound request. However, if encryption is enabled (e.g. 'seal' mount option) or enforced by the server, current MAX_COMPOUND(5) won't be enough as we require an extra PDU for the transform header. Fix this by increasing MAX_COMPOUND to 7 and, while we're at it, add an WARN_ON_ONCE() and return -EIO instead of -ENOMEM in case we attempt to send a compound request that couldn't include the extra transform header. Signed-off-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-01cifs: failure to add channel on iface should bump up weightShyam Prasad N
After the interface selection policy change to do a weighted round robin, each iface maintains a weight_fulfilled. When the weight_fulfilled reaches the total weight for the iface, we know that the weights can be reset and ifaces can be allocated from scratch again. During channel allocation failures on a particular channel, weight_fulfilled is not incremented. If a few interfaces are inactive, we could end up in a situation where the active interfaces are all allocated for the total_weight, and inactive ones are all that remain. This can cause a situation where no more channels can be allocated further. This change fixes it by increasing weight_fulfilled, even when channel allocation failure happens. This could mean that if there are temporary failures in channel allocation, the iface weights may not strictly be adhered to. But that's still okay. Fixes: a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-01cifs: do not search for channel if server is terminatingShyam Prasad N
In order to scale down the channels, the following sequence of operations happen: 1. server struct is marked for terminate 2. the channel is deallocated in the ses->chans array 3. at a later point the cifsd thread actually terminates the server Between 2 and 3, there can be calls to find the channel for a server struct. When that happens, there can be an ugly warning that's logged. But this is expected. So this change does two things: 1. in cifs_ses_get_chan_index, if server->terminate is set, return 2. always make sure server->terminate is set with chan_lock held Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-01cifs: avoid redundant calls to disable multichannelShyam Prasad N
When the server reports query network interface info call as unsupported following a tree connect, it means that multichannel is unsupported, even if the server capabilities report otherwise. When this happens, cifs_chan_skip_or_disable is called to disable multichannel on the client. However, we only need to call this when multichannel is currently setup. Fixes: f591062bdbf4 ("cifs: handle servers that still advertise multichannel after disabling") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-01eventfs: Keep all directory links at 1Steven Rostedt (Google)
The directory link count in eventfs was somewhat bogus. It was only being updated when a directory child was being looked up and not on creation. One solution would be to update in get_attr() the link count by iterating the ei->children list and then adding 2. But that could slow down simple stat() calls, especially if it's done on all directories in eventfs. Another solution would be to add a parent pointer to the eventfs_inode and keep track of the number of sub directories it has on creation. But this adds overhead for something not really worthwhile. The solution decided upon is to keep all directory links in eventfs as 1. This tells user space not to rely on the hard links of directories. Which in this case it shouldn't. Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/ Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.339968298@goodmis.org Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01eventfs: Remove fsnotify*() functions from lookup()Steven Rostedt (Google)
The dentries and inodes are created when referenced in the lookup code. There's no reason to call fsnotify_*() functions when they are created by a reference. It doesn't make any sense. Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/ Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.166973329@goodmis.org Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Fixes: a376007917776 ("eventfs: Implement functions to create files and dirs when accessed"); Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01eventfs: Restructure eventfs_inode structure to be more condensedSteven Rostedt (Google)
Some of the eventfs_inode structure has holes in it. Rework the structure to be a bit more condensed, and also remove the no longer used llist field. Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.002321438@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01eventfs: Warn if an eventfs_inode is freed without is_freed being setSteven Rostedt (Google)
There should never be a case where an evenfs_inode is being freed without is_freed being set. Add a WARN_ON_ONCE() if it ever happens. That would mean there was one too many put_ei()s. Link: https://lore.kernel.org/linux-trace-kernel/20240201161616.843551963@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01eventfs: Get rid of dentry pointers without refcountsLinus Torvalds
The eventfs inode had pointers to dentries (and child dentries) without actually holding a refcount on said pointer. That is fundamentally broken, and while eventfs tried to then maintain coherence with dentries going away by hooking into the '.d_iput' callback, that doesn't actually work since it's not ordered wrt lookups. There were two reasonms why eventfs tried to keep a pointer to a dentry: - the creation of a 'events' directory would actually have a stable dentry pointer that it created with tracefs_start_creating(). And it needed that dentry when tearing it all down again in eventfs_remove_events_dir(). This use is actually ok, because the special top-level events directory dentries are actually stable, not just a temporary cache of the eventfs data structures. - the 'eventfs_inode' (aka ei) needs to stay around as long as there are dentries that refer to it. It then used these dentry pointers as a replacement for doing reference counting: it would try to make sure that there was only ever one dentry associated with an event_inode, and keep a child dentry array around to see which dentries might still refer to the parent ei. This gets rid of the invalid dentry pointer use, and renames the one valid case to a different name to make it clear that it's not just any random dentry. The magic child dentry array that is kind of a "reverse reference list" is simply replaced by having child dentries take a ref to the ei. As does the directory dentries. That makes the broken use case go away. Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01eventfs: Clean up dentry ops and add revalidate functionLinus Torvalds
In order for the dentries to stay up-to-date with the eventfs changes, just add a 'd_revalidate' function that checks the 'is_freed' bit. Also, clean up the dentry release to actually use d_release() rather than the slightly odd d_iput() function. We don't care about the inode, all we want to do is to get rid of the refcount to the eventfs data added by dentry->d_fsdata. It would probably be cleaner to make eventfs its own filesystem, or at least set its own dentry ops when looking up eventfs files. But as it is, only eventfs dentries use d_fsdata, so we don't really need to split these things up by use. Another thing that might be worth doing is to make all eventfs lookups mark their dentries as not worth caching. We could do that with d_delete(), but the DCACHE_DONTCACHE flag would likely be even better. As it is, the dentries are all freeable, but they only tend to get freed at memory pressure rather than more proactively. But that's a separate issue. Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.124644253@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01eventfs: Remove unused d_parent pointer fieldLinus Torvalds
It's never used Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.961772428@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01tracefs: dentry lookup crapectomyLinus Torvalds
The dentry lookup for eventfs files was very broken, and had lots of signs of the old situation where the filesystem names were all created statically in the dentry tree, rather than being looked up dynamically based on the eventfs data structures. You could see it in the naming - how it claimed to "create" dentries rather than just look up the dentries that were given it. You could see it in various nonsensical and very incorrect operations, like using "simple_lookup()" on the dentries that were passed in, which only results in those dentries becoming negative dentries. Which meant that any other lookup would possibly return ENOENT if it saw that negative dentry before the data was then later filled in. You could see it in the immense amount of nonsensical code that didn't actually just do lookups. Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131233227.73db55e1@gandalf.local.home Cc: stable@vger.kernel.org Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-31cifs: make sure that channel scaling is done only onceShyam Prasad N
Following a successful cifs_tree_connect, we have the code to scale up/down the number of channels in the session. However, it is not protected by a lock today. As a result, this code can be executed by several processes that select the same channel. The core functions handle this well, as they pick chan_lock. However, we've seen cases where smb2_reconnect throws some warnings. To fix that, this change introduces a flags bitmap inside the cifs_ses structure. A new flag type is used to ensure that only one process enters this section at any time. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-31tracefs: Avoid using the ei->dentry pointer unnecessarilyLinus Torvalds
The eventfs_find_events() code tries to walk up the tree to find the event directory that a dentry belongs to, in order to then find the eventfs inode that is associated with that event directory. However, it uses an odd combination of walking the dentry parent, looking up the eventfs inode associated with that, and then looking up the dentry from there. Repeat. But the code shouldn't have back-pointers to dentries in the first place, and it should just walk the dentry parenthood chain directly. Similarly, 'set_top_events_ownership()' looks up the dentry from the eventfs inode, but the only reason it wants a dentry is to look up the superblock in order to look up the root dentry. But it already has the real filesystem inode, which has that same superblock pointer. So just pass in the superblock pointer using the information that's already there, instead of looking up extraneous data that is irrelevant. Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.638645365@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-31eventfs: Initialize the tracefs inode properlyLinus Torvalds
The tracefs-specific fields in the inode were not initialized before the inode was exposed to others through the dentry with 'd_instantiate()'. Move the field initializations up to before the d_instantiate. Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.478449628@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-31tracefs: Zero out the tracefs_inode when allocating itSteven Rostedt (Google)
eventfs uses the tracefs_inode and assumes that it's already initialized to zero. That is, it doesn't set fields to zero (like ti->private) after getting its tracefs_inode. This causes bugs due to stale values. Just initialize the entire structure to zero on allocation so there isn't any more surprises. This is a partial fix to access to ti->private. The assignment still needs to be made before the dentry is instantiated. Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.315825944@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-31btrfs: do not ASSERT() if the newly created subvolume already got readQu Wenruo
[BUG] There is a syzbot crash, triggered by the ASSERT() during subvolume creation: assertion failed: !anon_dev, in fs/btrfs/disk-io.c:1319 ------------[ cut here ]------------ kernel BUG at fs/btrfs/disk-io.c:1319! invalid opcode: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:btrfs_get_root_ref.part.0+0x9aa/0xa60 <TASK> btrfs_get_new_fs_root+0xd3/0xf0 create_subvol+0xd02/0x1650 btrfs_mksubvol+0xe95/0x12b0 __btrfs_ioctl_snap_create+0x2f9/0x4f0 btrfs_ioctl_snap_create+0x16b/0x200 btrfs_ioctl+0x35f0/0x5cf0 __x64_sys_ioctl+0x19d/0x210 do_syscall_64+0x3f/0xe0 entry_SYSCALL_64_after_hwframe+0x63/0x6b ---[ end trace 0000000000000000 ]--- [CAUSE] During create_subvol(), after inserting root item for the newly created subvolume, we would trigger btrfs_get_new_fs_root() to get the btrfs_root of that subvolume. The idea here is, we have preallocated an anonymous device number for the subvolume, thus we can assign it to the new subvolume. But there is really nothing preventing things like backref walk to read the new subvolume. If that happens before we call btrfs_get_new_fs_root(), the subvolume would be read out, with a new anonymous device number assigned already. In that case, we would trigger ASSERT(), as we really expect no one to read out that subvolume (which is not yet accessible from the fs). But things like backref walk is still possible to trigger the read on the subvolume. Thus our assumption on the ASSERT() is not correct in the first place. [FIX] Fix it by removing the ASSERT(), and just free the @anon_dev, reset it to 0, and continue. If the subvolume tree is read out by something else, it should have already get a new anon_dev assigned thus we only need to free the preallocated one. Reported-by: Chenyuan Yang <chenyuan0y@gmail.com> Fixes: 2dfb1e43f57d ("btrfs: preallocate anon block device at first phase of snapshot creation") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-31btrfs: forbid deleting live subvol qgroupBoris Burkov
If a subvolume still exists, forbid deleting its qgroup 0/subvolid. This behavior generally leads to incorrect behavior in squotas and doesn't have a legitimate purpose. Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-31btrfs: forbid creating subvol qgroupsBoris Burkov
Creating a qgroup 0/subvolid leads to various races and it isn't helpful, because you can't specify a subvol id when creating a subvol, so you can't be sure it will be the right one. Any requirements on the automatic subvol can be gratified by using a higher level qgroup and the inheritance parameters of subvol creation. Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-31btrfs: send: return EOPNOTSUPP on unknown flagsDavid Sterba
When some ioctl flags are checked we return EOPNOTSUPP, like for BTRFS_SCRUB_SUPPORTED_FLAGS, BTRFS_SUBVOL_CREATE_ARGS_MASK or fallocate modes. The EINVAL is supposed to be for a supported but invalid values or combination of options. Fix that when checking send flags so it's consistent with the rest. CC: stable@vger.kernel.org # 4.14+ Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5rryOLzp3EKq8RTbjMHMHeaJubfpsVLF6H4qJnKCUR1w@mail.gmail.com/ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-30Merge tag 'erofs-for-6.8-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - fix an infinite loop issue of sub-page compressed data support found with lengthy stress tests on a 64k-page arm64 VM - optimize the temporary buffer allocation for low-memory scenarios, which can reduce 20.21% on average under a heavy multi-app launch benchmark workload - get rid of unnecessary GFP_NOFS * tag 'erofs-for-6.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: relaxed temporary buffers allocation on readahead erofs: fix infinite loop due to a race of filling compressed_bvecs erofs: get rid of unneeded GFP_NOFS
2024-01-30xfs: remove conditional building of rt geometry validator functionsDarrick J. Wong
I mistakenly turned off CONFIG_XFS_RT in the Kconfig file for arm64 variant of the djwong-wtf git branch. Unfortunately, it took me a good hour to figure out that RT wasn't built because this is what got printed to dmesg: XFS (sda2): realtime geometry sanity check failed XFS (sda2): Metadata corruption detected at xfs_sb_read_verify+0x170/0x190 [xfs], xfs_sb block 0x0 Whereas I would have expected: XFS (sda2): Not built with CONFIG_XFS_RT XFS (sda2): RT mount failed The root cause of these problems is the conditional compilation of the new functions xfs_validate_rtextents and xfs_compute_rextslog that I introduced in the two commits listed below. The !RT versions of these functions return false and 0, respectively, which causes primary superblock validation to fail, which explains the first message. Move the two functions to other parts of libxfs that are not conditionally defined by CONFIG_XFS_RT and remove the broken stubs so that validation works again. Fixes: e14293803f4e ("xfs: don't allow overly small or large realtime volumes") Fixes: a6a38f309afc ("xfs: make rextslog computation consistent with mkfs") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-01-29Merge tag 'jfs-6.8-rc3' of github.com:kleikamp/linux-shaggyLinus Torvalds
Pull jfs fix from David Kleikamp: "Revert a bad sanity check" * tag 'jfs-6.8-rc3' of github.com:kleikamp/linux-shaggy: Revert "jfs: fix shift-out-of-bounds in dbJoin"
2024-01-29Merge tag 'trace-v6.8-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: "Two small fixes for tracefs and eventfs: - Fix register_snapshot_trigger() on allocation error If the snapshot fails to allocate, the register_snapshot_trigger() can still return success. If the call to tracing_alloc_snapshot_instance() returned anything but 0, it returned 0, but it should have been returning the error code from that allocation function. - Remove leftover code from tracefs doing a dentry walk on remount. The update_gid() function was called by the tracefs code on remount to update the gid of eventfs, but that is no longer the case, but that code wasn't deleted. Nothing calls it. Remove it" * tag 'trace-v6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracefs: remove stale 'update_gid' code tracing/trigger: Fix to return error if failed to alloc snapshot
2024-01-29Merge tag 'mm-hotfixes-stable-2024-01-28-23-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "22 hotfixes. 11 are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-01-28-23-21' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mm: thp_get_unmapped_area must honour topdown preference mm: huge_memory: don't force huge page alignment on 32 bit userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory scs: add CONFIG_MMU dependency for vfree_atomic() mm/memory: fix folio_set_dirty() vs. folio_mark_dirty() in zap_pte_range() mm/huge_memory: fix folio_set_dirty() vs. folio_mark_dirty() selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag selftests: mm: fix map_hugetlb failure on 64K page size systems MAINTAINERS: supplement of zswap maintainers update stackdepot: make fast paths lock-less again stackdepot: add stats counters exported via debugfs mm, kmsan: fix infinite recursion due to RCU critical section mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again selftests/mm: switch to bash from sh MAINTAINERS: add man-pages git trees mm: memcontrol: don't throttle dying tasks on memory.high mm: mmap: map MAP_STACK to VM_NOHUGEPAGE uprobes: use pagesize-aligned virtual address when replacing pages selftests/mm: mremap_test: fix build warning ...
2024-01-29Revert "jfs: fix shift-out-of-bounds in dbJoin"Dave Kleikamp
This reverts commit cca974daeb6c43ea971f8ceff5a7080d7d49ee30. The added sanity check is incorrect. BUDMIN is not the wrong value and is too small. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-01-29fs/ntfs3: Slightly simplify ntfs_inode_printk()Christophe JAILLET
The size passed to snprintf() includes the space for the trailing space. So there is no reason here not to use all the available space. So remove the -1 when computing 'name_len'. While at it, use the size of the array directly instead of the intermediate 'name_len' variable. snprintf() also guaranties that the buffer if NULL terminated, so there is no need to write an additional trailing NULL "To be sure". Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Add ioctl operation for directories (FITRIM)Nekun
While ntfs3 supports discards, FITRIM ioctl() command has defined only for regular files. This may confuse users trying to invoke `fstrim` utility with the directory argument (for example, call `fstrim <mountpoint>` which is the common practice). In this case, ioctl() returns -ENOTTY without any error messages in kernel ring buffer, this may be easily interpreted as no support for discards in ntfs3 driver. Currently only FITRIM command implemented in ntfs_ioctl() and passed inode used only for dereferencing NTFS superblock, so no need for separate ioctl() handler for directories, just add existing ntfs_ioctl() handler to ntfs_dir_operations. Signed-off-by: Nekun <nekokun@firemail.cc> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Fix oob in ntfs_listxattrEdward Adam Davis
The length of name cannot exceed the space occupied by ea. Reported-and-tested-by: syzbot+65e940cfb8f99a97aca7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Fix an NULL dereference bugDan Carpenter
The issue here is when this is called from ntfs_load_attr_list(). The "size" comes from le32_to_cpu(attr->res.data_size) so it can't overflow on a 64bit systems but on 32bit systems the "+ 1023" can overflow and the result is zero. This means that the kmalloc will succeed by returning the ZERO_SIZE_PTR and then the memcpy() will crash with an Oops on the next line. Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29xfs: reset XFS_ATTR_INCOMPLETE filter on node removalAndrey Albershteyn
In XFS_DAS_NODE_REMOVE_ATTR case, xfs_attr_mode_remove_attr() sets filter to XFS_ATTR_INCOMPLETE. The filter is then reset in xfs_attr_complete_op() if XFS_DA_OP_REPLACE operation is performed. The filter is not reset though if XFS just removes the attribute (args->value == NULL) with xfs_attr_defer_remove(). attr code goes to XFS_DAS_DONE state. Fix this by always resetting XFS_ATTR_INCOMPLETE filter. The replace operation already resets this filter in anyway and others are completed at this step hence don't need it. Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework") Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-01-29fs/ntfs3: Update inode->i_size after success write into compressed fileKonstantin Komarov
Reported-by: Giovanni Santini <giovannisantini93@yahoo.it> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Fixed overflow check in mi_enum_attr()Konstantin Komarov
Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Correct function is_rst_area_validKonstantin Komarov
Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Use i_size_read and i_size_writeKonstantin Komarov
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Prevent generic message "attempt to access beyond end of device"Konstantin Komarov
It used in test environment. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cacheIsm Hong
Since the buffer cache for ntfs3 metadata is not released until the file system is unmounted, allocating from the movable zone may result in cma allocation failures. This is due to the page still being used by ntfs3, leading to migration failures. To address this, this commit use sb_bread_umovable() instead of sb_bread(). This change prevents allocation from the movable zone, ensuring compatibility with scenarios where the buffer head is not released until unmount. This patch is inspired by commit a8ac900b8163("ext4: use non-movable memory for the ext4 superblock"). The issue is found when playing video files stored in NTFS on the Android TV platform. During this process, the media parser reads the video file, causing ntfs3 to allocate buffer cache from the CMA area. Subsequently, the hardware decoder attempts to allocate memory from the same CMA area. However, the page is still in use by ntfs3, resulting in a migrate failure in alloc_contig_range(). The pinned page and allocating stacktrace reported by page owner shows below: page:ffffffff00b68880 refcount:3 mapcount:0 mapping:ffffff80046aa828 index:0xc0040 pfn:0x20fa4 aops:def_blk_aops ino:0 flags: 0x2020(active|private) page dumped because: migration failure page last allocated via order 0, migratetype Movable, gfp_mask 0x108c48 (GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE), page_owner tracks the page as allocated prep_new_page get_page_from_freelist __alloc_pages_nodemask pagecache_get_page __getblk_gfp __bread_gfp ntfs_read_run_nb ntfs_read_bh mi_read ntfs_iget5 dir_search_u ntfs_lookup __lookup_slow lookup_slow walk_component path_lookupat Signed-off-by: Ism Hong <ism.hong@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-28bcachefs: unlock parent dir if entry is not found in subvolume deletionGuoyu Ou
Parent dir is locked by user_path_locked_at() before validating the required dentry. It should be unlocked if we can not perform the deletion. This fixes the problem: $ bcachefs subvolume delete not-exist-entry BCH_IOCTL_SUBVOLUME_DESTROY ioctl error: No such file or directory $ bcachefs subvolume delete not-exist-entry the second will stuck because the parent dir is locked in the previous deletion. Signed-off-by: Guoyu Ou <benogy@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-28bcachefs: Fix build on parisc by avoiding __multi3()Helge Deller
The gcc compiler on paric does support the __int128 type, although the architecture does not have native 128-bit support. The effect is, that the bcachefs u128_square() function will pull in the libgcc __multi3() helper, which breaks the kernel build when bcachefs is built as module since this function isn't currently exported in arch/parisc/kernel/parisc_ksyms.c. The build failure can be seen in the latest debian kernel build at: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=hppa&ver=6.7.1-1%7Eexp1&stamp=1706132569&raw=0 We prefer to not export that symbol, so fall back to the optional 64-bit implementation provided by bcachefs and thus avoid usage of __multi3(). Signed-off-by: Helge Deller <deller@gmx.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-28tracefs: remove stale 'update_gid' codeLinus Torvalds
The 'eventfs_update_gid()' function is no longer called, so remove it (and the helper function it uses). Link: https://lore.kernel.org/all/CAHk-=wj+DsZZ=2iTUkJ-Nojs9fjYMvPs1NuoM3yK7aTDtJfPYQ@mail.gmail.com/ Fixes: 8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-27Merge tag 'xfs-6.8-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fix from Chandan Babu: - Fix read only mounts when using fsopen mount API * tag 'xfs-6.8-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: read only mounts with fsopen mount API are busted
2024-01-27Merge tag 'bcachefs-2024-01-26' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: - fix for REQ_OP_FLUSH usage; this fixes filesystems going read only with -EOPNOTSUPP from the block layer. (this really should have gone in with the block layer patch causing the -EOPNOTSUPP, or should have gone in before). - fix an allocation in non-sleepable context - fix one source of srcu lock latency, on devices with terrible discard latency - fix a reattach_inode() issue in fsck * tag 'bcachefs-2024-01-26' of https://evilpiepirate.org/git/bcachefs: bcachefs: __lookup_dirent() works in snapshot, not subvol bcachefs: discard path uses unlock_long() bcachefs: fix incorrect usage of REQ_OP_FLUSH bcachefs: Add gfp flags param to bch2_prt_task_backtrace()