summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-06-02Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fix from Ted Ts'o: "Fix an ext4 regression which landed during the 6.4 merge window" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"
2023-06-02Merge tag 'for-6.4-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One regression fix. The rewrite of scrub code in 6.4 broke device replace in zoned mode, some of the writes could happen out of order so this had to be adjusted for all cases" * tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix dev-replace after the scrub rework
2023-06-02Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ↵Ojaswin Mujoo
ext4_mb_check_limits" This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1. The reverted commit was intended to remove a dead check however it was observed that this check was actually being used to exit early instead of looping sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than the goal extent. Due to this, a my performance tests (fsmark, parallel file writes in a highly fragmented FS) were seeing a 2x-3x regression. Example, the default value of the following variables is: sbi->s_mb_max_to_scan = 200 sbi->s_mb_min_to_scan = 10 In ext4_mb_check_limits() if we find an extent smaller than goal, then we return early and try again. This loop will go on until we have processed sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and just use whatever we have even if it is smaller than goal extent. Now, the regression comes when we find an extent bigger than goal. Earlier, in this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use the bigger extent. However with commit 32c08693 that check was removed and hence we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough free extent to satisfy the request. The only time we would exit early would be when the free extent is *exactly* the size of our goal, which is pretty uncommon occurrence and so we would almost always end up looping 200 times. Hence, revert the commit by adding the check back to fix the regression. Also add a comment to outline this policy. Fixes: 32c086937019 ("ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits") Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/ddcae9658e46880dfec2fb0aa61d01fb3353d202.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-02Merge tag 'nfsd-6.4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Two minor bug fixes * tag 'nfsd-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix double fget() bug in __write_ports_addfd() nfsd: make a copy of struct iattr before calling notify_change
2023-06-02ksmbd: validate smb request protocol idNamjae Jeon
This patch add the validation for smb request protocol id. If it is not one of the four ids(SMB1_PROTO_NUMBER, SMB2_PROTO_NUMBER, SMB2_TRANSFORM_PROTO_NUM, SMB2_COMPRESSION_TRANSFORM_ID), don't allow processing the request. And this will fix the following KASAN warning also. [ 13.905265] BUG: KASAN: slab-out-of-bounds in init_smb2_rsp_hdr+0x1b9/0x1f0 [ 13.905900] Read of size 16 at addr ffff888005fd2f34 by task kworker/0:2/44 ... [ 13.908553] Call Trace: [ 13.908793] <TASK> [ 13.908995] dump_stack_lvl+0x33/0x50 [ 13.909369] print_report+0xcc/0x620 [ 13.910870] kasan_report+0xae/0xe0 [ 13.911519] kasan_check_range+0x35/0x1b0 [ 13.911796] init_smb2_rsp_hdr+0x1b9/0x1f0 [ 13.912492] handle_ksmbd_work+0xe5/0x820 Cc: stable@vger.kernel.org Reported-by: Chih-Yen Chang <cc85nod@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-02ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loopNamjae Jeon
The length field of netbios header must be greater than the SMB header sizes(smb1 or smb2 header), otherwise the packet is an invalid SMB packet. If `pdu_size` is 0, ksmbd allocates a 4 bytes chunk to `conn->request_buf`. In the function `get_smb2_cmd_val` ksmbd will read cmd from `rcv_hdr->Command`, which is `conn->request_buf + 12`, causing the KASAN detector to print the following error message: [ 7.205018] BUG: KASAN: slab-out-of-bounds in get_smb2_cmd_val+0x45/0x60 [ 7.205423] Read of size 2 at addr ffff8880062d8b50 by task ksmbd:42632/248 ... [ 7.207125] <TASK> [ 7.209191] get_smb2_cmd_val+0x45/0x60 [ 7.209426] ksmbd_conn_enqueue_request+0x3a/0x100 [ 7.209712] ksmbd_server_process_request+0x72/0x160 [ 7.210295] ksmbd_conn_handler_loop+0x30c/0x550 [ 7.212280] kthread+0x160/0x190 [ 7.212762] ret_from_fork+0x1f/0x30 [ 7.212981] </TASK> Cc: stable@vger.kernel.org Reported-by: Chih-Yen Chang <cc85nod@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-02ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR()Namjae Jeon
Dan reported the following error message: fs/smb/server/smbacl.c:1296 smb_check_perm_dacl() error: 'posix_acls' dereferencing possible ERR_PTR() fs/smb/server/vfs.c:1323 ksmbd_vfs_make_xattr_posix_acl() error: 'posix_acls' dereferencing possible ERR_PTR() fs/smb/server/vfs.c:1830 ksmbd_vfs_inherit_posix_acl() error: 'acls' dereferencing possible ERR_PTR() __get_acl() returns a mix of error pointers and NULL. This change it with IS_ERR_OR_NULL(). Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-02ksmbd: fix out-of-bound read in parse_lease_state()Namjae Jeon
This bug is in parse_lease_state, and it is caused by the missing check of `struct create_context`. When the ksmbd traverses the create_contexts, it doesn't check if the field of `NameOffset` and `Next` is valid, The KASAN message is following: [ 6.664323] BUG: KASAN: slab-out-of-bounds in parse_lease_state+0x7d/0x280 [ 6.664738] Read of size 2 at addr ffff888005c08988 by task kworker/0:3/103 ... [ 6.666644] Call Trace: [ 6.666796] <TASK> [ 6.666933] dump_stack_lvl+0x33/0x50 [ 6.667167] print_report+0xcc/0x620 [ 6.667903] kasan_report+0xae/0xe0 [ 6.668374] kasan_check_range+0x35/0x1b0 [ 6.668621] parse_lease_state+0x7d/0x280 [ 6.668868] smb2_open+0xbe8/0x4420 [ 6.675137] handle_ksmbd_work+0x282/0x820 Use smb2_find_context_vals() to find smb2 create request lease context. smb2_find_context_vals validate create context fields. Cc: stable@vger.kernel.org Reported-by: Chih-Yen Chang <cc85nod@gmail.com> Tested-by: Chih-Yen Chang <cc85nod@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-02ksmbd: fix out-of-bound read in deassemble_neg_contexts()Namjae Jeon
The check in the beginning is `clen + sizeof(struct smb2_neg_context) <= len_of_ctxts`, but in the end of loop, `len_of_ctxts` will subtract `((clen + 7) & ~0x7) + sizeof(struct smb2_neg_context)`, which causes integer underflow when clen does the 8 alignment. We should use `(clen + 7) & ~0x7` in the check to avoid underflow from happening. Then there are some variables that need to be declared unsigned instead of signed. [ 11.671070] BUG: KASAN: slab-out-of-bounds in smb2_handle_negotiate+0x799/0x1610 [ 11.671533] Read of size 2 at addr ffff888005e86cf2 by task kworker/0:0/7 ... [ 11.673383] Call Trace: [ 11.673541] <TASK> [ 11.673679] dump_stack_lvl+0x33/0x50 [ 11.673913] print_report+0xcc/0x620 [ 11.674671] kasan_report+0xae/0xe0 [ 11.675171] kasan_check_range+0x35/0x1b0 [ 11.675412] smb2_handle_negotiate+0x799/0x1610 [ 11.676217] ksmbd_smb_negotiate_common+0x526/0x770 [ 11.676795] handle_ksmbd_work+0x274/0x810 ... Cc: stable@vger.kernel.org Signed-off-by: Chih-Yen Chang <cc85nod@gmail.com> Tested-by: Chih-Yen Chang <cc85nod@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-02fs: Lock moved directoriesJan Kara
When a directory is moved to a different directory, some filesystems (udf, ext4, ocfs2, f2fs, and likely gfs2, reiserfs, and others) need to update their pointer to the parent and this must not race with other operations on the directory. Lock the directories when they are moved. Although not all filesystems need this locking, we perform it in vfs_rename() because getting the lock ordering right is really difficult and we don't want to expose these locking details to filesystems. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-5-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-02fs: Establish locking order for unrelated directoriesJan Kara
Currently the locking order of inode locks for directories that are not in ancestor relationship is not defined because all operations that needed to lock two directories like this were serialized by sb->s_vfs_rename_mutex. However some filesystems need to lock two subdirectories for RENAME_EXCHANGE operations and for this we need the locking order established even for two tree-unrelated directories. Provide a helper function lock_two_inodes() that establishes lock ordering for any two inodes and use it in lock_two_directories(). CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-4-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-02Revert "f2fs: fix potential corruption when moving a directory"Jan Kara
This reverts commit d94772154e524b329a168678836745d2773a6e02. The locking is going to be provided by VFS. CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-3-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-02Revert "udf: Protect rename against modification of moved directory"Jan Kara
This reverts commit f950fd0529130a617b3da526da9fb6a896ce87c2. The locking is going to be provided by vfs_rename() in the following patches. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-2-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-02ext4: Remove ext4 locking of moved directoryJan Kara
Remove locking of moved directory in ext4_rename2(). We will take care of it in VFS instead. This effectively reverts commit 0813299c586b ("ext4: Fix possible corruption when moving a directory") and followup fixes. CC: Ted Tso <tytso@mit.edu> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-1-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-02jfs: Use unsigned variable for length calculationsKees Cook
To avoid confusing the compiler about possible negative sizes, switch "ssize" which can never be negative from int to u32. Seen with GCC 13: ../fs/jfs/namei.c: In function 'jfs_symlink': ../include/linux/fortify-string.h:57:33: warning: '__builtin_memcpy' pointer overflow between offset 0 and size [-2147483648, -1] [-Warray-bounds=] 57 | #define __underlying_memcpy __builtin_memcpy | ^ ... ../fs/jfs/namei.c:950:17: note: in expansion of macro 'memcpy' 950 | memcpy(ip->i_link, name, ssize); | ^~~~~~ Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: jfs-discussion@lists.sourceforge.net Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Jeff Xu <jeffxu@chromium.org> Message-Id: <20230204183355.never.877-kees@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-01fork, vhost: Use CLONE_THREAD to fix freezer/ps regressionMike Christie
When switching from kthreads to vhost_tasks two bugs were added: 1. The vhost worker tasks's now show up as processes so scripts doing ps or ps a would not incorrectly detect the vhost task as another process. 2. kthreads disabled freeze by setting PF_NOFREEZE, but vhost tasks's didn't disable or add support for them. To fix both bugs, this switches the vhost task to be thread in the process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that SIGKILL/STOP support is required because CLONE_THREAD requires CLONE_SIGHAND which requires those 2 signals to be supported. This is a modified version of the patch written by Mike Christie <michael.christie@oracle.com> which was a modified version of patch originally written by Linus. Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER. Including ignoring signals, setting up the register state, and having get_signal return instead of calling do_group_exit. Tidied up the vhost_task abstraction so that the definition of vhost_task only needs to be visible inside of vhost_task.c. Making it easier to review the code and tell what needs to be done where. As part of this the main loop has been moved from vhost_worker into vhost_task_fn. vhost_worker now returns true if work was done. The main loop has been updated to call get_signal which handles SIGSTOP, freezing, and collects the message that tells the thread to exit as part of process exit. This collection clears __fatal_signal_pending. This collection is not guaranteed to clear signal_pending() so clear that explicitly so the schedule() sleeps. For now the vhost thread continues to exist and run work until the last file descriptor is closed and the release function is called as part of freeing struct file. To avoid hangs in the coredump rendezvous and when killing threads in a multi-threaded exec. The coredump code and de_thread have been modified to ignore vhost threads. Remvoing the special case for exec appears to require teaching vhost_dev_flush how to directly complete transactions in case the vhost thread is no longer running. Removing the special case for coredump rendezvous requires either the above fix needed for exec or moving the coredump rendezvous into get_signal. Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Co-developed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-01fs: iomap: use bio_add_folio_nofail where possibleJohannes Thumshirn
When the iomap buffered-io code can't add a folio to a bio, it allocates a new bio and adds the folio to that one. This is done using bio_add_folio(), but doesn't check for errors. As adding a folio to a newly created bio can't fail, use the newly introduced bio_add_folio_nofail() function. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/58fa893c24c67340a63323f09a179fefdca07f2a.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-01btrfs: zoned: fix dev-replace after the scrub reworkQu Wenruo
[BUG] After commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"), scrub no longer works for zoned device at all. Even an empty zoned btrfs cannot be replaced: # mkfs.btrfs -f /dev/nvme0n1 # mount /dev/nvme0n1 /mnt/btrfs # btrfs replace start -Bf 1 /dev/nvme0n2 /mnt/btrfs Resetting device zones /dev/nvme1n1 (160 zones) ... ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/btrfs/": Input/output error And we can hit kernel crash related to that: BTRFS info (device nvme1n1): host-managed zoned block device /dev/nvme3n1, 160 zones of 134217728 bytes BTRFS info (device nvme1n1): dev_replace from /dev/nvme2n1 (devid 2) to /dev/nvme3n1 started nvme3n1: Zone Management Append(0x7d) @ LBA 65536, 4 blocks, Zone Is Full (sct 0x1 / sc 0xb9) DNR I/O error, dev nvme3n1, sector 786432 op 0xd:(ZONE_APPEND) flags 0x4000 phys_seg 3 prio class 2 BTRFS error (device nvme1n1): bdev /dev/nvme3n1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0 BUG: kernel NULL pointer dereference, address: 00000000000000a8 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:_raw_spin_lock_irqsave+0x1e/0x40 Call Trace: <IRQ> btrfs_lookup_ordered_extent+0x31/0x190 btrfs_record_physical_zoned+0x18/0x40 btrfs_simple_end_io+0xaf/0xc0 blk_update_request+0x153/0x4c0 blk_mq_end_request+0x15/0xd0 nvme_poll_cq+0x1d3/0x360 nvme_irq+0x39/0x80 __handle_irq_event_percpu+0x3b/0x190 handle_irq_event+0x2f/0x70 handle_edge_irq+0x7c/0x210 __common_interrupt+0x34/0xa0 common_interrupt+0x7d/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 [CAUSE] Dev-replace reuses scrub code to iterate all extents and write the existing content back to the new device. And for zoned devices, we call fill_writer_pointer_gap() to make sure all the writes into the zoned device is sequential, even if there may be some gaps between the writes. However we have several different bugs all related to zoned dev-replace: - We are using ZONE_APPEND operation for metadata style write back For zoned devices, btrfs has two ways to write data: * ZONE_APPEND for data This allows higher queue depth, but will not be able to know where the write would land. Thus needs to grab the real on-disk physical location in it's endio. * WRITE for metadata This requires single queue depth (new writes can only be submitted after previous one finished), and all writes must be sequential. For scrub, we go single queue depth, but still goes with ZONE_APPEND, which requires btrfs_bio::inode being populated. This is the cause of that crash. - No correct tracing of write_pointer After a write finished, we should forward sctx->write_pointer, or fill_writer_pointer_gap() would not work properly and cause more than necessary zero out, and fill the whole zone prematurely. - Incorrect physical bytenr passed to fill_writer_pointer_gap() In scrub_write_sectors(), one call site passes logical address, which is completely wrong. The other call site passes physical address of current sector, but we should pass the physical address of the btrfs_bio we're submitting. This is the cause of the -EIO errors. [FIX] - Do not use ZONE_APPEND for btrfs_submit_repair_write(). - Manually forward sctx->write_pointer after successful writeback - Use the physical address of the to-be-submitted btrfs_bio for fill_writer_pointer_gap() Now zoned device replace would work as expected. Reported-by: Christoph Hellwig <hch@lst.de> Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-01gfs2: Don't get stuck writing page onto itself under direct I/OAndreas Gruenbacher
When a direct I/O write is performed, iomap_dio_rw() invalidates the part of the page cache which the write is going to before carrying out the write. In the odd case, the direct I/O write will be reading from the same page it is writing to. gfs2 carries out writes with page faults disabled, so it should have been obvious that this page invalidation can cause iomap_dio_rw() to never make any progress. Currently, gfs2 will end up in an endless retry loop in gfs2_file_direct_write() instead, though. Break this endless loop by limiting the number of retries and falling back to buffered I/O after that. Also simplify should_fault_in_pages() sightly and add a comment to make the above case easier to understand. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-06-01Merge tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Eight server fixes (most also for stable): - Two fixes for uninitialized pointer reads (rename and link) - Fix potential UAF in oplock break - Two fixes for potential out of bound reads in negotiate - Fix crediting bug - Two fixes for xfstests (allocation size fix for test 694 and lookup issue shown by test 464)" * tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: call putname after using the last component ksmbd: fix incorrect AllocationSize set in smb2_get_info ksmbd: fix UAF issue from opinfo->conn ksmbd: fix multiple out-of-bounds read during context decoding ksmbd: fix slab-out-of-bounds read in smb2_handle_negotiate ksmbd: fix credit count leakage ksmbd: fix uninitialized pointer read in smb2_create_link() ksmbd: fix uninitialized pointer read in ksmbd_vfs_rename()
2023-06-01fs/sysv: Null check to prevent null-ptr-deref bugPrince Kumar Maurya
sb_getblk(inode->i_sb, parent) return a null ptr and taking lock on that leads to the null-ptr-deref bug. Reported-by: syzbot+aad58150cbc64ba41bdc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=aad58150cbc64ba41bdc Signed-off-by: Prince Kumar Maurya <princekumarmaurya06@gmail.com> Message-Id: <20230531013141.19487-1-princekumarmaurya06@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-31Merge tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Four small smb3 client fixes: - two small fixes suggested by kernel test robot - small cleanup fix - update Paulo's email address in the maintainer file" * tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: address unused variable warning smb: delete an unnecessary statement smb3: missing null check in SMB2_change_notify smb3: update a reviewer email in MAINTAINERS file
2023-05-31Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix two regressions in ext4 and a number of issues reported by syzbot" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: enable the lazy init thread when remounting read/write ext4: fix fsync for non-directories ext4: add lockdep annotations for i_data_sem for ea_inode's ext4: disallow ea_inodes with extended attributes ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find() ext4: add EA_INODE checking to ext4_iget()
2023-05-31kernfs: fix missing kernfs_idr_lock to remove an ID from the IDRMuchun Song
The root->ino_idr is supposed to be protected by kernfs_idr_lock, fix it. Fixes: 488dee96bb62 ("kernfs: allow creating kernfs objects with arbitrary uid/gid") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230523024017.24851-1-songmuchun@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-31debugfs: Correct the 'debugfs_create_str' docsIvan Orlov
The documentation of the 'debugfs_create_str' says that the function returns a pointer to a dentry created, or an ERR_PTR in case of error. Actually, this is not true: this function doesn't return anything at all. Correct the documentation correspondingly. Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20230514172353.52878-1-ivan.orlov0322@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-31zonefs: use __bio_add_page for adding single page to bioJohannes Thumshirn
The zonefs superblock reading code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/04c9978ccaa0fc9871cd4248356638d98daccf0c.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31gfs2: use __bio_add_page for adding single page to bioJohannes Thumshirn
The GFS2 superblock reading code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/087c67d4e4973f949d3519c1e4822784ce583c5a.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31jfs: logmgr: use __bio_add_page to add single page to bioJohannes Thumshirn
The JFS IO code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/9fb5ed86d19f6e0b6f64dfc4109a48ff8ff24497.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31fs: buffer: use __bio_add_page to add single page to bioJohannes Thumshirn
The buffer_head submission code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Gou Hao <gouhao@uniontech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/84ff2dcbe81b258a73ad900adb5266e208b61a4d.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31block: Use iov_iter_extract_pages() and page pinning in direct-io.cDavid Howells
Change the old block-based direct-I/O code to use iov_iter_extract_pages() to pin user pages or leave kernel pages unpinned rather than taking refs when submitting bios. This makes use of the preceding patches to not take pins on the zero page (thereby allowing insertion of zero pages in with pinned pages) and to get additional pins on pages, allowing an extracted page to be used in multiple bios without having to re-extract it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@infradead.org> cc: David Hildenbrand <david@redhat.com> cc: Lorenzo Stoakes <lstoakes@gmail.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Jason Gunthorpe <jgg@nvidia.com> cc: Logan Gunthorpe <logang@deltatee.com> cc: Hillf Danton <hdanton@sina.com> cc: Christian Brauner <brauner@kernel.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-kernel@vger.kernel.org cc: linux-mm@kvack.org Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230526214142.958751-4-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31Merge tag 'virt-to-pfn-for-arch-v6.5-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-integrator into asm-generic This is an attempt to harden the typing on virt_to_pfn() and pfn_to_virt(). Making virt_to_pfn() a static inline taking a strongly typed (const void *) makes the contract of a passing a pointer of that type to the function explicit and exposes any misuse of the macro virt_to_pfn() acting polymorphic and accepting many types such as (void *), (unitptr_t) or (unsigned long) as arguments without warnings. For symmetry, we do the same with pfn_to_virt(). The problem with this inconsistent typing was pointed out by Russell King: https://lore.kernel.org/linux-arm-kernel/YoJDKJXc0MJ2QZTb@shell.armlinux.org.uk/ And confirmed by Andrew Morton: https://lore.kernel.org/linux-mm/20220701160004.2ffff4e5ab59a55499f4c736@linux-foundation.org/ So the recognition of the problem is widespread. These platforms have been chosen as initial conversion targets: - ARM - ARM64/Aarch64 - asm-generic (including for example x86) - m68k The idea is that if this goes in, it will block further misuse of the function signatures due to the large compile coverage, and then I can go in and fix the remaining architectures on a one-by-one basis. Some of the patches have been circulated before but were not picked up by subsystem maintainers, so now the arch tree is target for this series. It has passed zeroday builds after a lot of iterations in my personal tree, but there could be some randconfig outliers. New added or deeply hidden problems appear all the time so some minor fallout can be expected. * tag 'virt-to-pfn-for-arch-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-integrator: m68k/mm: Make pfn accessors static inlines arm64: memory: Make virt_to_pfn() a static inline ARM: mm: Make virt_to_pfn() a static inline asm-generic/page.h: Make pfn accessors static inlines xen/netback: Pass (void *) to virt_to_page() netfs: Pass a pointer to virt_to_page() cifs: Pass a pointer to virt_to_page() in cifsglob cifs: Pass a pointer to virt_to_page() riscv: mm: init: Pass a pointer to virt_to_page() ARC: init: Pass a pointer to virt_to_pfn() in init m68k: Pass a pointer to virt_to_pfn() virt_to_page() fs/proc/kcore.c: Pass a pointer to virt_addr_valid()
2023-05-31nfsd: fix double fget() bug in __write_ports_addfd()Dan Carpenter
The bug here is that you cannot rely on getting the same socket from multiple calls to fget() because userspace can influence that. This is a kind of double fetch bug. The fix is to delete the svc_alien_sock() function and instead do the checking inside the svc_addsock() function. Fixes: 3064639423c4 ("nfsd: check passed socket's net matches NFSd superblock's one") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-30befs: Replace all non-returning strlcpy with strscpyAzeem Shaikh
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated. In an effort to remove strlcpy() completely, replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230509014136.2095900-1-azeemshaikh38@gmail.com
2023-05-30binfmt: Slightly simplify elf_fdpic_map_file()Christophe JAILLET
There is no point in initializing 'load_addr' and 'seg' here, they are both re-written just before being used below. Doing so, 'load_addr' can be moved in the #ifdef CONFIG_MMU section. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/4f5e4096ad7f17716e924b5bd080e5709fc0b84b.1685290790.git.christophe.jaillet@wanadoo.fr
2023-05-30binfmt: Use struct_size()Christophe JAILLET
Use struct_size() instead of hand-writing it. It is less verbose, more robust and more informative. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/53150beae5dc04dac513dba391a2e4ae8696a7f3.1685290790.git.christophe.jaillet@wanadoo.fr
2023-05-30Merge tag 'for-6.4-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "One bug fix and two build warning fixes: - call proper end bio callback for metadata RAID0 in a rare case of an unaligned block - fix uninitialized variable (reported by gcc 10.2) - fix warning about potential access beyond array bounds on mips64 with 64k pages (runtime check would not allow that)" * tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix csum_tree_block page iteration to avoid tripping on -Werror=array-bounds btrfs: fix an uninitialized variable warning in btrfs_log_inode btrfs: call btrfs_orig_bbio_end_io in btrfs_end_bio_work
2023-05-30ext4: enable the lazy init thread when remounting read/writeTheodore Ts'o
In commit a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled") we defer clearing tyhe SB_RDONLY flag in struct super. However, we didn't defer when we checked sb_rdonly() to determine the lazy itable init thread should be enabled, with the next result that the lazy inode table initialization would not be properly started. This can cause generic/231 to fail in ext4's nojournal mode. Fix this by moving when we decide to start or stop the lazy itable init thread to after we clear the SB_RDONLY flag when we are remounting the file system read/write. Fixes a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until...") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230527035729.1001605-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30ext4: fix fsync for non-directoriesJan Kara
Commit e360c6ed7274 ("ext4: Drop special handling of journalled data from ext4_sync_file()") simplified ext4_sync_file() by dropping special handling of journalled data mode as it was not needed anymore. However that branch was also used for directories and symlinks and since the fastcommit code does not track metadata changes to non-regular files, the change has caused e.g. fsync(2) on directories to not commit transaction as it should. Fix the problem by adding handling for non-regular files. Fixes: e360c6ed7274 ("ext4: Drop special handling of journalled data from ext4_sync_file()") Reported-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/all/ZFqO3xVnmhL7zv1x@debian-BULLSEYE-live-builder-AMD64 Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20230524104453.8734-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30ext4: add lockdep annotations for i_data_sem for ea_inode'sTheodore Ts'o
Treat i_data_sem for ea_inodes as being in their own lockdep class to avoid lockdep complaints about ext4_setattr's use of inode_lock() on normal inodes potentially causing lock ordering with i_data_sem on ea_inodes in ext4_xattr_inode_write(). However, ea_inodes will be operated on by ext4_setattr(), so this isn't a problem. Cc: stable@kernel.org Link: https://syzkaller.appspot.com/bug?extid=298c5d8fb4a128bc27b0 Reported-by: syzbot+298c5d8fb4a128bc27b0@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-5-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30ext4: disallow ea_inodes with extended attributesTheodore Ts'o
An ea_inode stores the value of an extended attribute; it can not have extended attributes itself, or this will cause recursive nightmares. Add a check in ext4_iget() to make sure this is the case. Cc: stable@kernel.org Reported-by: syzbot+e44749b6ba4d0434cd47@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-4-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find()Theodore Ts'o
If the ea_inode has been pushed out of the inode cache while there is still a reference in the mb_cache, the lockdep subclass will not be set on the inode, which can lead to some lockdep false positives. Fixes: 33d201e0277b ("ext4: fix lockdep warning about recursive inode locking") Cc: stable@kernel.org Reported-by: syzbot+d4b971e744b1f5439336@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-3-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30fs: udf: udftime: Replace LGPL boilerplate with SPDX identifierBagas Sanjaya
Replace license boilerplate in udftime.c with SPDX identifier for LGPL-2.0. Cc: Paul Eggert <eggert@twinsun.com> Cc: Richard Fontana <rfontana@redhat.com> Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230522005434.22133-3-bagasdotme@gmail.com>
2023-05-30fs: udf: Replace GPL 2.0 boilerplate license notice with SPDX identifierBagas Sanjaya
The notice refers to full GPL 2.0 text on now defunct MIT FTP site [1]. Replace it with appropriate SPDX license identifier. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pali Rohár <pali@kernel.org> Link: https://web.archive.org/web/20020809115410/ftp://prep.ai.mit.edu/pub/gnu/GPL [1] Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230522005434.22133-2-bagasdotme@gmail.com>
2023-05-30fs: Drop wait_unfrozen wait queueJan Kara
wait_unfrozen waitqueue is used only in quota code to wait for filesystem to become unfrozen. In that place we can just use sb_start_write() - sb_end_write() pair to achieve the same. So just remove the waitqueue. Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230525141710.7595-1-jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
2023-05-29erofs: adapt managed inode operations into foliosGao Xiang
This patch gets rid of erofs_try_to_free_cached_page() and fold it into .release_folio(). It also moves managed inode operations into zdata.c, which simplifies the code a bit. No logic changes. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-5-hsiangkao@linux.alibaba.com
2023-05-29erofs: kill hooked chains to avoid loops on deduplicated compressed imagesGao Xiang
After heavily stressing EROFS with several images which include a hand-crafted image of repeated patterns for more than 46 days, I found two chains could be linked with each other almost simultaneously and form a loop so that the entire loop won't be submitted. As a consequence, the corresponding file pages will remain locked forever. It can be _only_ observed on data-deduplicated compressed images. For example, consider two chains with five pclusters in total: Chain 1: 2->3->4->5 -- The tail pcluster is 5; Chain 2: 5->1->2 -- The tail pcluster is 2. Chain 2 could link to Chain 1 with pcluster 5; and Chain 1 could link to Chain 2 at the same time with pcluster 2. Since hooked chains are all linked locklessly now, I have no idea how to simply avoid the race. Instead, let's avoid hooked chains completely until I could work out a proper way to fix this and end users finally tell us that it's needed to add it back. Actually, this optimization can be found with multi-threaded workloads (especially even more often on deduplicated compressed images), yet I'm not sure about the overall system impacts of not having this compared with implementation complexity. Fixes: 267f2492c8f7 ("erofs: introduce multi-reference pclusters (fully-referenced)") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-4-hsiangkao@linux.alibaba.com
2023-05-29erofs: avoid on-stack pagepool directly passed by argumentsGao Xiang
On-stack pagepool is used so that short-lived temporary pages could be shared within a single I/O request (e.g. among multiple pclusters). Moving the remaining frontend-related uses into z_erofs_decompress_frontend to avoid too many arguments. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-3-hsiangkao@linux.alibaba.com
2023-05-29erofs: allocate extra bvec pages directly instead of retryingGao Xiang
If non-bootstrap bvecs cannot be kept in place (very rarely), an extra short-lived page is allocated. Let's just allocate it immediately rather than do unnecessary -EAGAIN return first and retry as a cleanup. Also it's unnecessary to use __GFP_NOFAIL here since we could gracefully fail out this case instead. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-2-hsiangkao@linux.alibaba.com
2023-05-29netfs: Pass a pointer to virt_to_page()Linus Walleij
Like the other calls in this function virt_to_page() expects a pointer, not an integer. However since many architectures implement virt_to_pfn() as a macro, this function becomes polymorphic and accepts both a (unsigned long) and a (void *). Fix this up with an explicit cast. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-05-29cifs: Pass a pointer to virt_to_page() in cifsglobLinus Walleij
Like the other calls in this function virt_to_page() expects a pointer, not an integer. However since many architectures implement virt_to_pfn() as a macro, this function becomes polymorphic and accepts both a (unsigned long) and a (void *). Fix this up with an explicit cast. Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>