summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-11-24btrfs: move file_start_write() to after permission hookAmir Goldstein
In vfs code, file_start_write() is usually called after the permission hook in rw_verify_area(). btrfs_ioctl_encoded_write() in an exception to this rule. Move file_start_write() to after the rw_verify_area() check in encoded write to make the permission hook "start-write-safe". This is needed for fanotify "pre content" events. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-9-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-24remap_range: move file_start_write() to after permission hookAmir Goldstein
In vfs code, file_start_write() is usually called after the permission hook in rw_verify_area(). vfs_dedupe_file_range_one() is an exception to this rule. In vfs_dedupe_file_range_one(), move file_start_write() to after the the rw_verify_area() checks to make them "start-write-safe". This is needed for fanotify "pre content" events. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-8-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-24remap_range: move permission hooks out of do_clone_file_range()Amir Goldstein
In many of the vfs helpers, file permission hook is called before taking sb_start_write(), making them "start-write-safe". do_clone_file_range() is an exception to this rule. do_clone_file_range() has two callers - vfs_clone_file_range() and overlayfs. Move remap_verify_area() checks from do_clone_file_range() out to vfs_clone_file_range() to make them "start-write-safe". Overlayfs already has calls to rw_verify_area() with the same security permission hooks as remap_verify_area() has. The rest of the checks in remap_verify_area() are irrelevant for overlayfs that calls do_clone_file_range() offset 0 and positive length. This is needed for fanotify "pre content" events. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-7-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-24splice: remove permission hook from iter_file_splice_write()Amir Goldstein
All the callers of ->splice_write(), (e.g. do_splice_direct() and do_splice()) already check rw_verify_area() for the entire range and perform all the other checks that are in vfs_write_iter(). Instead of creating another tiny helper for special caller, just open-code it. This is needed for fanotify "pre content" events. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-6-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-24splice: move permission hook out of splice_file_to_pipe()Amir Goldstein
vfs_splice_read() has a permission hook inside rw_verify_area() and it is called from splice_file_to_pipe(), which is called from do_splice() and do_sendfile(). do_sendfile() already has a rw_verify_area() check for the entire range. do_splice() has a rw_verify_check() for the splice to file case, not for the splice from file case. Add the rw_verify_area() check for splice from file case in do_splice() and use a variant of vfs_splice_read() without rw_verify_area() check in splice_file_to_pipe() to avoid the redundant rw_verify_area() checks. This is needed for fanotify "pre content" events. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-5-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-24splice: move permission hook out of splice_direct_to_actor()Amir Goldstein
vfs_splice_read() has a permission hook inside rw_verify_area() and it is called from do_splice_direct() -> splice_direct_to_actor(). The callers of do_splice_direct() (e.g. vfs_copy_file_range()) already call rw_verify_area() for the entire range, but the other caller of splice_direct_to_actor() (nfsd) does not. Add the rw_verify_area() checks in nfsd_splice_read() and use a variant of vfs_splice_read() without rw_verify_area() check in splice_direct_to_actor() to avoid the redundant rw_verify_area() checks. This is needed for fanotify "pre content" events. Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-4-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-24splice: remove permission hook from do_splice_direct()Amir Goldstein
All callers of do_splice_direct() have a call to rw_verify_area() for the entire range that is being copied, e.g. by vfs_copy_file_range() or do_sendfile() before calling do_splice_direct(). The rw_verify_area() check inside do_splice_direct() is redundant and is called after sb_start_write(), so it is not "start-write-safe". Remove this redundant check. This is needed for fanotify "pre content" events. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-3-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-24ovl: add permission hooks outside of do_splice_direct()Amir Goldstein
The main callers of do_splice_direct() also call rw_verify_area() for the entire range that is being copied, e.g. by vfs_copy_file_range() or do_sendfile() before calling do_splice_direct(). The only caller that does not have those checks for entire range is ovl_copy_up_file(). In preparation for removing the checks inside do_splice_direct(), add rw_verify_area() call in ovl_copy_up_file(). For extra safety, perform minimal sanity checks from rw_verify_area() for non negative offsets also in the copy up do_splice_direct() loop without calling the file permission hooks. This is needed for fanotify "pre content" events. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-2-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-24bcachefs: deallocate_extra_replicas()Kent Overstreet
When allocating from devices with different durability, we might end up with more replicas than required; this changes bch2_alloc_sectors_start() to check for this, and drop replicas that aren't needed to hit the number of replicas requested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Proper refcounting for journal_keysKent Overstreet
The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: preserve device path as device nameBrian Foster
Various userspace scripts/tools may expect mount entries in /proc/mounts to reflect the device path names used to mount the associated filesystem. bcachefs seems to normalize the device path to the underlying device name based on the block device. This confuses tools like fstests when the test devices might be lvm or device-mapper based. The default behavior for show_vfsmnt() appers to be to use the string passed to alloc_vfsmnt(), so tweak bcachefs to copy the path at device superblock read time and to display it via ->show_devname(). Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Fix an endianness conversionKent Overstreet
cpu_to_le32(), not le32_to_cpu() - fixes a sparse complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Start gc, copygc, rebalance threads after initing writes refKent Overstreet
This fixes a bug where copygc would occasionally race with going read-write and die, thinking we were read only, because it couldn't take a ref on c->writes. It's not necessary for copygc (or rebalance, or copygc) to take write refs; they could run with BCH_TRANS_COMMIT_nocheck_rw, but this is an easier fix that making sure that flag is passed correctly everywhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Don't stop copygc thread on device resizeKent Overstreet
copygc no longer has to scan the buckets, so it's no longer a problem if the number of buckets is changing while it's running. This also fixes a bug where we forgot to restart copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Make sure bch2_move_ratelimit() also waits for move_opsKent Overstreet
This adds move_ctxt_wait_event_timeout(), which can sleep for a timeout while also issueing pending moves as reads complete. Co-developed-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: bch2_moving_ctxt_flush_all()Kent Overstreet
Introduce a new helper to flush all move IOs, and use it in a few places where we should have been. The new helper also drops btree locks before waiting on outstanding move writes, avoiding potential deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Put erasure coding behind an EXPERIMENTAL kconfig optionKent Overstreet
We still have disk space accounting changes coming for erasure coding, and the changes won't be as strictly backwards compatible as they'd ought to be - specifically, we need to start accounting striped data under a separate counter in bch_alloc (which describes buckets). A fsck will suffice for upgrading/downgrading, but since erasure coding is the most incomplete major feature of bcachefs it still makes sense to put behind a separate kconfig option, so that users are fully aware. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24closures: CLOSURE_CALLBACK() to fix type punningKent Overstreet
Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-23ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on errorNamjae Jeon
ksmbd set ->op_state as OPLOCK_STATE_NONE on lease break ack error. op_state of lease should not be updated because client can send lease break ack again. This patch fix smb2.lease.breaking2 test failure. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncIdNamjae Jeon
Directly set SMB2_FLAGS_ASYNC_COMMAND flags and AsyncId in smb2 header of interim response instead of current response header. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23ksmbd: release interim response after sending status pending responseNamjae Jeon
Add missing release async id and delete interim response entry after sending status pending response. This only cause when smb2 lease is enable. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23ksmbd: move oplock handling after unlock parent dirNamjae Jeon
ksmbd should process secound parallel smb2 create request during waiting oplock break ack. parent lock range that is too large in smb2_open() causes smb2_open() to be serialized. Move the oplock handling to the bottom of smb2_open() and make it called after parent unlock. This fixes the failure of smb2.lease.breaking1 testcase. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23ksmbd: separately allocate ci per dentryNamjae Jeon
xfstests generic/002 test fail when enabling smb2 leases feature. This test create hard link file, but removeal failed. ci has a file open count to count file open through the smb client, but in the case of hard link files, The allocation of ci per inode cause incorrectly open count for file deletion. This patch allocate ci per dentry to counts open counts for hard link. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23ksmbd: fix possible deadlock in smb2_openNamjae Jeon
[ 8743.393379] ====================================================== [ 8743.393385] WARNING: possible circular locking dependency detected [ 8743.393391] 6.4.0-rc1+ #11 Tainted: G OE [ 8743.393397] ------------------------------------------------------ [ 8743.393402] kworker/0:2/12921 is trying to acquire lock: [ 8743.393408] ffff888127a14460 (sb_writers#8){.+.+}-{0:0}, at: ksmbd_vfs_setxattr+0x3d/0xd0 [ksmbd] [ 8743.393510] but task is already holding lock: [ 8743.393515] ffff8880360d97f0 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: ksmbd_vfs_kern_path_locked+0x181/0x670 [ksmbd] [ 8743.393618] which lock already depends on the new lock. [ 8743.393623] the existing dependency chain (in reverse order) is: [ 8743.393628] -> #1 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}: [ 8743.393648] down_write_nested+0x9a/0x1b0 [ 8743.393660] filename_create+0x128/0x270 [ 8743.393670] do_mkdirat+0xab/0x1f0 [ 8743.393680] __x64_sys_mkdir+0x47/0x60 [ 8743.393690] do_syscall_64+0x5d/0x90 [ 8743.393701] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 8743.393711] -> #0 (sb_writers#8){.+.+}-{0:0}: [ 8743.393728] __lock_acquire+0x2201/0x3b80 [ 8743.393737] lock_acquire+0x18f/0x440 [ 8743.393746] mnt_want_write+0x5f/0x240 [ 8743.393755] ksmbd_vfs_setxattr+0x3d/0xd0 [ksmbd] [ 8743.393839] ksmbd_vfs_set_dos_attrib_xattr+0xcc/0x110 [ksmbd] [ 8743.393924] compat_ksmbd_vfs_set_dos_attrib_xattr+0x39/0x50 [ksmbd] [ 8743.394010] smb2_open+0x3432/0x3cc0 [ksmbd] [ 8743.394099] handle_ksmbd_work+0x2c9/0x7b0 [ksmbd] [ 8743.394187] process_one_work+0x65a/0xb30 [ 8743.394198] worker_thread+0x2cf/0x700 [ 8743.394209] kthread+0x1ad/0x1f0 [ 8743.394218] ret_from_fork+0x29/0x50 This patch add mnt_want_write() above parent inode lock and remove nested mnt_want_write calls in smb2_open(). Fixes: 40b268d384a2 ("ksmbd: add mnt_want_write to ksmbd vfs functions") Cc: stable@vger.kernel.org Reported-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23ksmbd: prevent memory leak on error returnZongmin Zhou
When allocated memory for 'new' failed,just return will cause memory leak of 'ar'. Fixes: 1819a9042999 ("ksmbd: reorganize ksmbd_iov_pin_rsp()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202311031837.H3yo7JVl-lkp@intel.com/ Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23btrfs: make error messages more clear when getting a chunk mapFilipe Manana
When getting a chunk map, at btrfs_get_chunk_map(), we do some sanity checks to verify we found a chunk map and that map found covers the logical address the caller passed in. However the messages aren't very clear in the sense that don't mention the issue is with a chunk map and one of them prints the 'length' argument as if it were the end offset of the requested range (while the in the string format we use %llu-%llu which suggests a range, and the second %llu-%llu is actually a range for the chunk map). So improve these two details in the error messages. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-23btrfs: fix off-by-one when checking chunk map includes logical addressFilipe Manana
At btrfs_get_chunk_map() we get the extent map for the chunk that contains the given logical address stored in the 'logical' argument. Then we do sanity checks to verify the extent map contains the logical address. One of these checks verifies if the extent map covers a range with an end offset behind the target logical address - however this check has an off-by-one error since it will consider an extent map whose start offset plus its length matches the target logical address as inclusive, while the fact is that the last byte it covers is behind the target logical address (by 1). So fix this condition by using '<=' rather than '<' when comparing the extent map's "start + length" against the target logical address. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-23btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod()Bragatheswaran Manickavel
In btrfs_ref_tree_mod(), when !parent 're' was allocated through kmalloc(). In the following code, if an error occurs, the execution will be redirected to 'out' or 'out_unlock' and the function will be exited. However, on some of the paths, 're' are not deallocated and may lead to memory leaks. For example: lookup_block_entry() for 'be' returns NULL, the out label will be invoked. During that flow ref and 'ra' are freed but not 're', which can potentially lead to a memory leak. CC: stable@vger.kernel.org # 5.10+ Reported-and-tested-by: syzbot+d66de4cbf532749df35f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d66de4cbf532749df35f Signed-off-by: Bragatheswaran Manickavel <bragathemanick0908@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-23btrfs: add dmesg output for first mount and last unmount of a filesystemQu Wenruo
There is a feature request to add dmesg output when unmounting a btrfs. There are several alternative methods to do the same thing, but with their own problems: - Use eBPF to watch btrfs_put_super()/open_ctree() Not end user friendly, they have to dip their head into the source code. - Watch for directory /sys/fs/<uuid>/ This is way more simple, but still requires some simple device -> uuid lookups. And a script needs to use inotify to watch /sys/fs/. Compared to all these, directly outputting the information into dmesg would be the most simple one, with both device and UUID included. And since we're here, also add the output when mounting a filesystem for the first time for parity. A more fine grained monitoring of subvolume mounts should be done by another layer, like audit. Now mounting a btrfs with all default mkfs options would look like this: [81.906566] BTRFS info (device dm-8): first mount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2 [81.907494] BTRFS info (device dm-8): using crc32c (crc32c-intel) checksum algorithm [81.908258] BTRFS info (device dm-8): using free space tree [81.912644] BTRFS info (device dm-8): auto enabling async discard [81.913277] BTRFS info (device dm-8): checking UUID tree [91.668256] BTRFS info (device dm-8): last unmount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2 CC: stable@vger.kernel.org # 5.4+ Link: https://github.com/kdave/btrfs-progs/issues/689 Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-23smb: client: introduce cifs_sfu_make_node()Paulo Alcantara
Remove duplicate code and add new helper for creating special files in SFU (Services for UNIX) format that can be shared by SMB1+ code. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23smb: client: set correct file type from NFS reparse pointsPaulo Alcantara
Handle all file types in NFS reparse points as specified in MS-FSCC 2.1.2.6 Network File System (NFS) Reparse Data Buffer. The client is now able to set all file types based on the parsed NFS reparse point, which used to support only symlinks. This works for SMB1+. Before patch: $ mount.cifs //srv/share /mnt -o ... $ ls -l /mnt ls: cannot access 'block': Operation not supported ls: cannot access 'char': Operation not supported ls: cannot access 'fifo': Operation not supported ls: cannot access 'sock': Operation not supported total 1 l????????? ? ? ? ? ? block l????????? ? ? ? ? ? char -rwxr-xr-x 1 root root 5 Nov 18 23:22 f0 l????????? ? ? ? ? ? fifo l--------- 1 root root 0 Nov 18 23:23 link -> f0 l????????? ? ? ? ? ? sock After patch: $ mount.cifs //srv/share /mnt -o ... $ ls -l /mnt total 1 brwxr-xr-x 1 root root 123, 123 Nov 18 00:34 block crwxr-xr-x 1 root root 1234, 1234 Nov 18 00:33 char -rwxr-xr-x 1 root root 5 Nov 18 23:22 f0 prwxr-xr-x 1 root root 0 Nov 18 23:23 fifo lrwxr-xr-x 1 root root 0 Nov 18 23:23 link -> f0 srwxr-xr-x 1 root root 0 Nov 19 2023 sock Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23smb: client: introduce ->parse_reparse_point()Paulo Alcantara
Parse reparse point into cifs_open_info_data structure and feed it through cifs_open_info_to_fattr(). Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23smb: client: implement ->query_reparse_point() for SMB1Paulo Alcantara
Reparse points are not limited to symlinks, so implement ->query_reparse_point() in order to handle different file types. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23cifs: fix use after free for iface while disabling secondary channelsRitvik Budhiraja
We were deferencing iface after it has been released. Fix is to release after all dereference instances have been encountered. Signed-off-by: Ritvik Budhiraja <rbudhiraja@microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202311110815.UJaeU3Tt-lkp@intel.com/ Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-22eventfs: Make sure that parent->d_inode is locked in creating files/dirsSteven Rostedt (Google)
Since the locking of the parent->d_inode has been moved outside the creation of the files and directories (as it use to be locked via a conditional), add a WARN_ON_ONCE() to the case that it's not locked. Link: https://lkml.kernel.org/r/20231121231112.853962542@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-22eventfs: Do not allow NULL parent to eventfs_start_creating()Steven Rostedt (Google)
The eventfs directory is dynamically created via the meta data supplied by the existing trace events. All files and directories in eventfs has a parent. Do not allow NULL to be passed into eventfs_start_creating() as the parent because that should never happen. Warn if it does. Link: https://lkml.kernel.org/r/20231121231112.693841807@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-22eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()Steven Rostedt (Google)
The both create_file_dentry() and create_dir_dentry() takes a boolean parameter "lookup", as on lookup the inode_lock should already be taken, but for dcache_dir_open_wrapper() it is not taken. There's no reason that the dcache_dir_open_wrapper() can't take the inode_lock before calling these functions. In fact, it's better if it does, as the lock can be held throughout both directory and file creations. This also simplifies the code, and possibly prevents unexpected race conditions when the lock is released. Link: https://lkml.kernel.org/r/20231121231112.528544825@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-22eventfs: Use GFP_NOFS for allocation when eventfs_mutex is heldSteven Rostedt (Google)
If memory reclaim happens, it can reclaim file system pages. The file system pages from eventfs may take the eventfs_mutex on reclaim. This means that allocation while holding the eventfs_mutex must not call into filesystem reclaim. A lockdep splat uncovered this. Link: https://lkml.kernel.org/r/20231121231112.373501894@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode") Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-22xfs: dquot recovery does not validate the recovered dquotDarrick J. Wong
When we're recovering ondisk quota records from the log, we need to validate the recovered buffer contents before writing them to disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-11-22xfs: clean up dqblk extractionDarrick J. Wong
Since the introduction of xfs_dqblk in V5, xfs really ought to find the dqblk pointer from the dquot buffer, then compute the xfs_disk_dquot pointer from the dqblk pointer. Fix the open-coded xfs_buf_offset calls and do the type checking in the correct order. Note that this has made no practical difference since the start of the xfs_disk_dquot is coincident with the start of the xfs_dqblk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-11-22ext2: Fix ki_pos update for DIO buffered-io fallback caseRitesh Harjani (IBM)
Commit "filemap: update ki_pos in generic_perform_write", made updating of ki_pos into common code in generic_perform_write() function. This also causes generic/091 to fail. This happened due to an in-flight collision with: fb5de4358e1a ("ext2: Move direct-io to use iomap"). I have chosen fixes tag based on which commit got landed later to upstream kernel. Fixes: 182c25e9c157 ("filemap: update ki_pos in generic_perform_write") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <d595bee9f2475ed0e8a2e7fb94f7afc2c6ffc36a.1700643443.git.ritesh.list@gmail.com>
2023-11-21jfs: fix shift-out-of-bounds in dbJoinManas Ghandat
Currently while joining the leaf in a buddy system there is shift out of bound error in calculation of BUDSIZE. Added the required check to the BUDSIZE and fixed the documentation as well. Reported-by: syzbot+411debe54d318eaed386@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=411debe54d318eaed386 Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21jfs: fix uaf in jfs_evict_inodeEdward Adam Davis
When the execution of diMount(ipimap) fails, the object ipimap that has been released may be accessed in diFreeSpecial(). Asynchronous ipimap release occurs when rcu_core() calls jfs_free_node(). Therefore, when diMount(ipimap) fails, sbi->ipimap should not be initialized as ipimap. Reported-and-tested-by: syzbot+01cf2dbcbe2022454388@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21jfs: fix array-index-out-of-bounds in dbAdjTreeManas Ghandat
Currently there is a bound check missing in the dbAdjTree while accessing the dmt_stree. To add the required check added the bool is_ctl which is required to determine the size as suggest in the following commit. https://lore.kernel.org/linux-kernel-mentees/f9475918-2186-49b8-b801-6f0f9e75f4fa@oracle.com/ Reported-by: syzbot+39ba34a099ac2e9bd3cb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21jfs: fix slab-out-of-bounds Read in dtSearchManas Ghandat
Currently while searching for current page in the sorted entry table of the page there is a out of bound access. Added a bound check to fix the error. Dave: Set return code to -EIO Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202310241724.Ed02yUz9-lkp@intel.com/ Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21UBSAN: array-index-out-of-bounds in dtSplitRootOsama Muhammad
Syzkaller reported the following issue: oop0: detected capacity change from 0 to 32768 UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:1971:9 index -2 is out of range for type 'struct dtslot [128]' CPU: 0 PID: 3613 Comm: syz-executor270 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_out_of_bounds+0xdb/0x130 lib/ubsan.c:283 dtSplitRoot+0x8d8/0x1900 fs/jfs/jfs_dtree.c:1971 dtSplitUp fs/jfs/jfs_dtree.c:985 [inline] dtInsert+0x1189/0x6b80 fs/jfs/jfs_dtree.c:863 jfs_mkdir+0x757/0xb00 fs/jfs/namei.c:270 vfs_mkdir+0x3b3/0x590 fs/namei.c:4013 do_mkdirat+0x279/0x550 fs/namei.c:4038 __do_sys_mkdirat fs/namei.c:4053 [inline] __se_sys_mkdirat fs/namei.c:4051 [inline] __x64_sys_mkdirat+0x85/0x90 fs/namei.c:4051 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fcdc0113fd9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffeb8bc67d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000102 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcdc0113fd9 RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000003 RBP: 00007fcdc00d37a0 R08: 0000000000000000 R09: 00007fcdc00d37a0 R10: 00005555559a72c0 R11: 0000000000000246 R12: 00000000f8008000 R13: 0000000000000000 R14: 00083878000000f8 R15: 0000000000000000 </TASK> The issue is caused when the value of fsi becomes less than -1. The check to break the loop when fsi value becomes -1 is present but syzbot was able to produce value less than -1 which cause the error. This patch simply add the change for the values less than 0. The patch is tested via syzbot. Reported-and-tested-by: syzbot+d4b1df2e9d4ded6488ec@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=d4b1df2e9d4ded6488ec Signed-off-by: Osama Muhammad <osmtendev@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTreeOsama Muhammad
Syzkaller reported the following issue: UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6 index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]') CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867 dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834 dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331 dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline] dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402 txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534 txUpdateMap+0x342/0x9e0 txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline] jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732 kthread+0x2d3/0x370 kernel/kthread.c:388 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> ================================================================================ Kernel panic - not syncing: UBSAN: panic_on_warn set ... CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 panic+0x30f/0x770 kernel/panic.c:340 check_panic_on_warn+0x82/0xa0 kernel/panic.c:236 ubsan_epilogue lib/ubsan.c:223 [inline] __ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348 dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867 dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834 dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331 dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline] dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402 txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534 txUpdateMap+0x342/0x9e0 txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline] jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732 kthread+0x2d3/0x370 kernel/kthread.c:388 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> Kernel Offset: disabled Rebooting in 86400 seconds.. The issue is caused when the value of lp becomes greater than CTLTREESIZE which is the max size of stree. Adding a simple check solves this issue. Dave: As the function returns a void, good error handling would require a more intrusive code reorganization, so I modified Osama's patch at use WARN_ON_ONCE for lack of a cleaner option. The patch is tested via syzbot. Reported-by: syzbot+39ba34a099ac2e9bd3cb@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb Signed-off-by: Osama Muhammad <osmtendev@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21fs: debugfs: Add write functionality to debugfs blobsAvadhut Naik
Currently, debugfs_create_blob() creates read-only debugfs binary blob files. In some cases, however, userspace tools need to write variable length data structures into predetermined memory addresses. An example is when injecting Vendor-defined error types through the einj module. In such cases, the functionality to write to these blob files in debugfs would be desired since the mapping aspect can be handled within the modules with userspace tools only needing to write into the blob files. Implement a write callback to enable writing to these blob files, created in debugfs, by owners only. Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-21Merge tag 'erofs-for-6.7-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Tidy up erofs_read_inode() for simplicity - Fix broken fscache mode due to NULL dereference of dif->bdev_handle - Add the EROFS webpage to MAINTAINERS, documentation, and Kconfig * tag 'erofs-for-6.7-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: MAINTAINERS: erofs: add EROFS webpage erofs: fix NULL dereference of dif->bdev_handle in fscache mode erofs: simplify erofs_read_inode()
2023-11-21fs: Rename mapping private membersMatthew Wilcox (Oracle)
It is hard to find where mapping->private_lock, mapping->private_list and mapping->private_data are used, due to private_XXX being a relatively common name for variables and structure members in the kernel. To fit with other members of struct address_space, rename them all to have an i_ prefix. Tested with an allmodconfig build. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20231117215823.2821906-1-willy@infradead.org Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>