summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-03-29reiserfs: remove unused sched_count variableTom Rix
clang with W=1 reports fs/reiserfs/journal.c:3034:6: error: variable 'sched_count' set but not used [-Werror,-Wunused-but-set-variable] int sched_count = 0; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230326204459.1358553-1-trix@redhat.com>
2023-03-29Merge branch 'slab/for-6.4/slob-removal' into slab/for-nextVlastimil Babka
A series by myself to remove CONFIG_SLOB: The SLOB allocator was deprecated in 6.2 and there have been no complaints so far so let's proceed with the removal. Besides the code cleanup, the main immediate benefit will be allowing kfree() family of function to work on kmem_cache_alloc() objects, which was incompatible with SLOB. This includes kfree_rcu() which had no kmem_cache_free_rcu() counterpart yet and now it shouldn't be necessary anymore. Otherwise it's all straightforward removal. After this series, 'git grep slob' or 'git grep SLOB' will have 3 remaining relevant hits in non-mm code: - tomoyo - patch submitted and carried there, doesn't need to wait for this series - skbuff - patch to cleanup now-unnecessary #ifdefs will be posted to netdev after this is merged, as requested to avoid conflicts - ftrace ring_buffer - patch to remove obsolete comment is carried there The rest of 'git grep SLOB' hits are false positives, or intentional (CREDITS, and mm/Kconfig SLUB_TINY description to help those that will happen to migrate later).
2023-03-29mm, pagemap: remove SLOB and SLQB from comments and documentationVlastimil Babka
SLOB has been removed and SLQB never merged, so remove their mentions from comments and documentation of pagemap. In stable_page_flags() also correct an outdated comment mentioning that PageBuddy() means a page->_refcount of -1, and remove compound_head() from the PageSlab() call, as that's already implicitly there thanks to PF_NO_TAIL. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
2023-03-29driver core: class: mark the struct class for sysfs callbacks as constantGreg Kroah-Hartman
struct class should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct class to be moved to read-only memory. While we are touching all class sysfs callbacks also mark the attribute as constant as it can not be modified. The bonding code still uses this structure so it can not be removed from the function callbacks. Cc: "David S. Miller" <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Bartosz Golaszewski <brgl@bgdev.pl> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Russ Weight <russell.h.weight@intel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steve French <sfrench@samba.org> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: linux-cifs@vger.kernel.org Cc: linux-gpio@vger.kernel.org Cc: linux-mtd@lists.infradead.org Cc: linux-rdma@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: netdev@vger.kernel.org Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230325084537.3622280-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-28mm: shrinkers: convert shrinker_rwsem to mutexQi Zheng
Now there are no readers of shrinker_rwsem, so we can simply replace it with mutex lock. Link: https://lkml.kernel.org/r/20230313112819.38938-9-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill Tkhai <tkhai@ya.ru> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christian König <christian.koenig@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28mm,jfs: move write_one_page/folio_write_one to jfsChristoph Hellwig
The last remaining user of folio_write_one through the write_one_page wrapper is jfs, so move the functionality there and hard code the call to metapage_writepage. Note that the use of the pagecache by the JFS 'metapage' buffer cache is a bit odd, and we could probably do without VM-level dirty tracking at all, but that's a change for another time. Link: https://lkml.kernel.org/r/20230307143125.27778-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Gang He <ghe@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Jan Kara via Ocfs2-devel <ocfs2-devel@oss.oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28ocfs2: don't use write_one_page in ocfs2_duplicate_clusters_by_pageChristoph Hellwig
Use filemap_write_and_wait_range to write back the range of the dirty page instead of write_one_page in preparation of removing write_one_page and eventually ->writepage. Link: https://lkml.kernel.org/r/20230307143125.27778-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Gang He <ghe@suse.com> Cc: Jan Kara via Ocfs2-devel <ocfs2-devel@oss.oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28ufs: don't flush page immediately for DIRSYNC directoriesChristoph Hellwig
Patch series "remove most callers of write_one_page", v4. This series removes most users of the write_one_page API. These helpers internally call ->writepage which we are gradually removing from the kernel. This patch (of 3): We do not need to writeout modified directory blocks immediately when modifying them while the page is locked. It is enough to do the flush somewhat later which has the added benefit that inode times can be flushed as well. It also allows us to stop depending on write_one_page() function. Ported from an ext2 patch by Jan Kara. Link: https://lkml.kernel.org/r/20230307143125.27778-1-hch@lst.de Link: https://lkml.kernel.org/r/20230307143125.27778-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara via Ocfs2-devel <ocfs2-devel@oss.oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Jan Kara <jack@suse.cz> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28lazy tlb: introduce lazy tlb mm refcount helper functionsNicholas Piggin
Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting. This makes the lazy tlb mm references more obvious, and allows the refcounting scheme to be modified in later changes. There is no functional change with this patch. Link: https://lkml.kernel.org/r/20230203071837.1136453-3-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-29btrfs: ignore fiemap path cache when there are multiple paths for a nodeFilipe Manana
During fiemap, when walking backreferences to determine if a b+tree node/leaf is shared, we may find a tree block (leaf or node) for which two parents were added to the references ulist. This happens if we get for example one direct ref (shared tree block ref) and one indirect ref (non-shared tree block ref) for the tree block at the current level, which can happen during relocation. In that case the fiemap path cache can not be used since it's meant for a single path, with one tree block at each possible level, so having multiple references for a tree block at any level may result in getting the level counter exceed BTRFS_MAX_LEVEL and eventually trigger the warning: WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL) at lookup_backref_shared_cache() and at store_backref_shared_cache(). This is harmless since the code ignores any level >= BTRFS_MAX_LEVEL, the warning is there just to catch any unexpected case like the one described above. However if a user finds this it may be scary and get reported. So just ignore the path cache once we find a tree block for which there are more than one reference, which is the less common case, and update the cache with the sharedness check result for all levels below the level for which we found multiple references. Reported-by: Jarno Pelkonen <jarno.pelkonen@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAKv8qLmDNAGJGCtsevxx_VZ_YOvvs1L83iEJkTgyA4joJertng@mail.gmail.com/ Fixes: 12a824dc67a6 ("btrfs: speedup checking for extent sharedness during fiemap") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-28fsdax: dedupe should compare the min of two iters' lengthShiyang Ruan
In an dedupe comparison iter loop, the length of iomap_iter decreases because it implies the remaining length after each iteration. The dedupe command will fail with -EIO if the range is larger than one page size and not aligned to the page size. Also report warning in dmesg: [ 4338.498374] ------------[ cut here ]------------ [ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16 ... The compare function should use the min length of the current iters, not the total length. Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com Fixes: 0e79e3736d54 ("fsdax: dedupe: iter two files at the same time") Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTENShiyang Ruan
unshare copies data from source to destination. But if the source is HOLE or UNWRITTEN extents, we should zero the destination, otherwise the HOLE or UNWRITTEN part will be user-visible old data of the new allocated extent. Found by running generic/649 while mounting with -o dax=always on pmem. Link: https://lkml.kernel.org/r/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax") Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-27fs-verity: simplify sysctls with register_sysctl()Luis Chamberlain
register_sysctl_paths() is only needed if your child (directories) have entries but this does not so just use register_sysctl() so to do away with the path specification. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230302202826.776286-10-mcgrof@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-27fs/buffer.c: use b_folio for fsverity workEric Biggers
Use b_folio now that it exists. This removes an unnecessary call to compound_head(). No actual change in behavior. Link: https://lore.kernel.org/r/20230224232530.98440-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-27fscrypt: use WARN_ON_ONCE instead of WARN_ONEric Biggers
As per Linus's suggestion (https://lore.kernel.org/r/CAHk-=whefxRGyNGzCzG6BVeM=5vnvgb-XhSeFJVxJyAxAF8XRA@mail.gmail.com), use WARN_ON_ONCE instead of WARN_ON. This barely adds any extra overhead, and it makes it so that if any of these ever becomes reachable (they shouldn't, but that's the point), the logs can't be flooded. Link: https://lore.kernel.org/r/20230320233943.73600-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-27fscrypt: new helper function - fscrypt_prepare_lookup_partial()Luís Henriques
This patch introduces a new helper function which can be used both in lookups and in atomic_open operations by filesystems that want to handle filename encryption and no-key dentries themselves. The reason for this function to be used in atomic open is that this operation can act as a lookup if handed a dentry that is negative. And in this case we may need to set DCACHE_NOKEY_NAME. Signed-off-by: Luís Henriques <lhenriques@suse.de> Tested-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> [ebiggers: improved the function comment, and moved the function to just below __fscrypt_prepare_lookup()] Link: https://lore.kernel.org/r/20230320220149.21863-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-27fs/buffer.c: use b_folio for fscrypt workEric Biggers
Use b_folio now that it exists. This removes an unnecessary call to compound_head(). No actual change in behavior. Link: https://lore.kernel.org/r/20230224232503.98372-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-28gfs2: Fix inode height consistency checkAndreas Gruenbacher
The maximum allowed height of an inode's metadata tree depends on the filesystem block size; it is lower for bigger-block filesystems. When reading in an inode, make sure that the height doesn't exceed the maximum allowed height. Arrays like sd_heightsize are sized to be big enough for any filesystem block size; they will often be slightly bigger than what's needed for a specific filesystem. Reported-by: syzbot+45d4691b1ed3c48eba05@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-03-28btrfs: fix deadlock when aborting transaction during relocation with scrubFilipe Manana
Before relocating a block group we pause scrub, then do the relocation and then unpause scrub. The relocation process requires starting and committing a transaction, and if we have a failure in the critical section of the transaction commit path (transaction state >= TRANS_STATE_COMMIT_START), we will deadlock if there is a paused scrub. That results in stack traces like the following: [42.479] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6 [42.936] BTRFS warning (device sdc): Skipping commit of aborted transaction. [42.936] ------------[ cut here ]------------ [42.936] BTRFS: Transaction aborted (error -28) [42.936] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Modules linked in: dm_flakey dm_mod loop btrfs (...) [42.936] CPU: 11 PID: 346822 Comm: btrfs Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [42.936] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [42.936] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Code: ff ff 45 8b (...) [42.936] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282 [42.936] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000 [42.936] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff [42.936] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8 [42.936] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00 [42.936] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0 [42.936] FS: 00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000 [42.936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [42.936] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0 [42.936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [42.936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [42.936] Call Trace: [42.936] <TASK> [42.936] ? start_transaction+0xcb/0x610 [btrfs] [42.936] prepare_to_relocate+0x111/0x1a0 [btrfs] [42.936] relocate_block_group+0x57/0x5d0 [btrfs] [42.936] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [42.936] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [42.936] ? __pfx_autoremove_wake_function+0x10/0x10 [42.936] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [42.936] btrfs_balance+0x8ff/0x11d0 [btrfs] [42.936] ? __kmem_cache_alloc_node+0x14a/0x410 [42.936] btrfs_ioctl+0x2334/0x32c0 [btrfs] [42.937] ? mod_objcg_state+0xd2/0x360 [42.937] ? refill_obj_stock+0xb0/0x160 [42.937] ? seq_release+0x25/0x30 [42.937] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [42.937] ? percpu_counter_add_batch+0x2e/0xa0 [42.937] ? __x64_sys_ioctl+0x88/0xc0 [42.937] __x64_sys_ioctl+0x88/0xc0 [42.937] do_syscall_64+0x38/0x90 [42.937] entry_SYSCALL_64_after_hwframe+0x72/0xdc [42.937] RIP: 0033:0x7f381a6ffe9b [42.937] Code: 00 48 89 44 24 (...) [42.937] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [42.937] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [42.937] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [42.937] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [42.937] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [42.937] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [42.937] </TASK> [42.937] ---[ end trace 0000000000000000 ]--- [42.937] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left [59.196] INFO: task btrfs:346772 blocked for more than 120 seconds. [59.196] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.196] task:btrfs state:D stack:0 pid:346772 ppid:1 flags:0x00004002 [59.196] Call Trace: [59.196] <TASK> [59.196] __schedule+0x392/0xa70 [59.196] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.196] schedule+0x5d/0xd0 [59.196] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.197] ? __pfx_autoremove_wake_function+0x10/0x10 [59.197] scrub_pause_off+0x21/0x50 [btrfs] [59.197] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.197] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.198] ? __pfx_autoremove_wake_function+0x10/0x10 [59.198] scrub_stripe+0x20d/0x740 [btrfs] [59.198] scrub_chunk+0xc4/0x130 [btrfs] [59.198] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.198] ? __pfx_autoremove_wake_function+0x10/0x10 [59.198] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.199] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.199] ? _copy_from_user+0x7b/0x80 [59.199] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.199] ? refill_stock+0x33/0x50 [59.199] ? should_failslab+0xa/0x20 [59.199] ? kmem_cache_alloc_node+0x151/0x460 [59.199] ? alloc_io_context+0x1b/0x80 [59.199] ? preempt_count_add+0x70/0xa0 [59.199] ? __x64_sys_ioctl+0x88/0xc0 [59.199] __x64_sys_ioctl+0x88/0xc0 [59.199] do_syscall_64+0x38/0x90 [59.199] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.199] RIP: 0033:0x7f82ffaffe9b [59.199] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.199] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b [59.199] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003 [59.199] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.199] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640 [59.199] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.199] </TASK> [59.199] INFO: task btrfs:346773 blocked for more than 120 seconds. [59.200] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.201] task:btrfs state:D stack:0 pid:346773 ppid:1 flags:0x00004002 [59.201] Call Trace: [59.201] <TASK> [59.201] __schedule+0x392/0xa70 [59.201] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.201] schedule+0x5d/0xd0 [59.201] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.201] ? __pfx_autoremove_wake_function+0x10/0x10 [59.201] scrub_pause_off+0x21/0x50 [btrfs] [59.202] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.202] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.202] ? __pfx_autoremove_wake_function+0x10/0x10 [59.202] scrub_stripe+0x20d/0x740 [btrfs] [59.202] scrub_chunk+0xc4/0x130 [btrfs] [59.203] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.203] ? __pfx_autoremove_wake_function+0x10/0x10 [59.203] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.203] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.203] ? _copy_from_user+0x7b/0x80 [59.203] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.204] ? should_failslab+0xa/0x20 [59.204] ? kmem_cache_alloc_node+0x151/0x460 [59.204] ? alloc_io_context+0x1b/0x80 [59.204] ? preempt_count_add+0x70/0xa0 [59.204] ? __x64_sys_ioctl+0x88/0xc0 [59.204] __x64_sys_ioctl+0x88/0xc0 [59.204] do_syscall_64+0x38/0x90 [59.204] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.204] RIP: 0033:0x7f82ffaffe9b [59.204] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.204] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b [59.204] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003 [59.204] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.204] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640 [59.204] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.204] </TASK> [59.204] INFO: task btrfs:346774 blocked for more than 120 seconds. [59.205] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.206] task:btrfs state:D stack:0 pid:346774 ppid:1 flags:0x00004002 [59.206] Call Trace: [59.206] <TASK> [59.206] __schedule+0x392/0xa70 [59.206] schedule+0x5d/0xd0 [59.206] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.206] ? __pfx_autoremove_wake_function+0x10/0x10 [59.206] scrub_pause_off+0x21/0x50 [btrfs] [59.207] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.207] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.207] ? __pfx_autoremove_wake_function+0x10/0x10 [59.207] scrub_stripe+0x20d/0x740 [btrfs] [59.208] scrub_chunk+0xc4/0x130 [btrfs] [59.208] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.208] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120 [59.208] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.208] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.209] ? _copy_from_user+0x7b/0x80 [59.209] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.209] ? should_failslab+0xa/0x20 [59.209] ? kmem_cache_alloc_node+0x151/0x460 [59.209] ? alloc_io_context+0x1b/0x80 [59.209] ? preempt_count_add+0x70/0xa0 [59.209] ? __x64_sys_ioctl+0x88/0xc0 [59.209] __x64_sys_ioctl+0x88/0xc0 [59.209] do_syscall_64+0x38/0x90 [59.209] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.209] RIP: 0033:0x7f82ffaffe9b [59.209] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.209] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b [59.209] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003 [59.209] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.209] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640 [59.209] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.209] </TASK> [59.209] INFO: task btrfs:346775 blocked for more than 120 seconds. [59.210] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.211] task:btrfs state:D stack:0 pid:346775 ppid:1 flags:0x00004002 [59.211] Call Trace: [59.211] <TASK> [59.211] __schedule+0x392/0xa70 [59.211] schedule+0x5d/0xd0 [59.211] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.211] ? __pfx_autoremove_wake_function+0x10/0x10 [59.211] scrub_pause_off+0x21/0x50 [btrfs] [59.212] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.212] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.212] ? __pfx_autoremove_wake_function+0x10/0x10 [59.212] scrub_stripe+0x20d/0x740 [btrfs] [59.213] scrub_chunk+0xc4/0x130 [btrfs] [59.213] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.213] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120 [59.213] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.213] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.214] ? _copy_from_user+0x7b/0x80 [59.214] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.214] ? should_failslab+0xa/0x20 [59.214] ? kmem_cache_alloc_node+0x151/0x460 [59.214] ? alloc_io_context+0x1b/0x80 [59.214] ? preempt_count_add+0x70/0xa0 [59.214] ? __x64_sys_ioctl+0x88/0xc0 [59.214] __x64_sys_ioctl+0x88/0xc0 [59.214] do_syscall_64+0x38/0x90 [59.214] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.214] RIP: 0033:0x7f82ffaffe9b [59.214] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.214] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b [59.214] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003 [59.214] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.214] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640 [59.214] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.214] </TASK> [59.214] INFO: task btrfs:346776 blocked for more than 120 seconds. [59.215] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.217] task:btrfs state:D stack:0 pid:346776 ppid:1 flags:0x00004002 [59.217] Call Trace: [59.217] <TASK> [59.217] __schedule+0x392/0xa70 [59.217] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.217] schedule+0x5d/0xd0 [59.217] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.217] ? __pfx_autoremove_wake_function+0x10/0x10 [59.217] scrub_pause_off+0x21/0x50 [btrfs] [59.217] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.217] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.218] ? __pfx_autoremove_wake_function+0x10/0x10 [59.218] scrub_stripe+0x20d/0x740 [btrfs] [59.218] scrub_chunk+0xc4/0x130 [btrfs] [59.218] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.219] ? __pfx_autoremove_wake_function+0x10/0x10 [59.219] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.219] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.219] ? _copy_from_user+0x7b/0x80 [59.219] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.219] ? should_failslab+0xa/0x20 [59.219] ? kmem_cache_alloc_node+0x151/0x460 [59.219] ? alloc_io_context+0x1b/0x80 [59.219] ? preempt_count_add+0x70/0xa0 [59.219] ? __x64_sys_ioctl+0x88/0xc0 [59.219] __x64_sys_ioctl+0x88/0xc0 [59.219] do_syscall_64+0x38/0x90 [59.219] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.219] RIP: 0033:0x7f82ffaffe9b [59.219] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.219] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b [59.219] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003 [59.219] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.219] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640 [59.219] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.219] </TASK> [59.219] INFO: task btrfs:346822 blocked for more than 120 seconds. [59.220] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.221] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.222] task:btrfs state:D stack:0 pid:346822 ppid:1 flags:0x00004002 [59.222] Call Trace: [59.222] <TASK> [59.222] __schedule+0x392/0xa70 [59.222] schedule+0x5d/0xd0 [59.222] btrfs_scrub_cancel+0x91/0x100 [btrfs] [59.222] ? __pfx_autoremove_wake_function+0x10/0x10 [59.222] btrfs_commit_transaction+0x572/0xeb0 [btrfs] [59.223] ? start_transaction+0xcb/0x610 [btrfs] [59.223] prepare_to_relocate+0x111/0x1a0 [btrfs] [59.223] relocate_block_group+0x57/0x5d0 [btrfs] [59.223] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [59.223] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [59.224] ? __pfx_autoremove_wake_function+0x10/0x10 [59.224] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [59.224] btrfs_balance+0x8ff/0x11d0 [btrfs] [59.224] ? __kmem_cache_alloc_node+0x14a/0x410 [59.224] btrfs_ioctl+0x2334/0x32c0 [btrfs] [59.225] ? mod_objcg_state+0xd2/0x360 [59.225] ? refill_obj_stock+0xb0/0x160 [59.225] ? seq_release+0x25/0x30 [59.225] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [59.225] ? percpu_counter_add_batch+0x2e/0xa0 [59.225] ? __x64_sys_ioctl+0x88/0xc0 [59.225] __x64_sys_ioctl+0x88/0xc0 [59.225] do_syscall_64+0x38/0x90 [59.225] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.225] RIP: 0033:0x7f381a6ffe9b [59.225] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.225] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [59.225] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [59.225] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [59.225] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [59.225] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [59.225] </TASK> What happens is the following: 1) A scrub is running, so fs_info->scrubs_running is 1; 2) Task A starts block group relocation, and at btrfs_relocate_chunk() it pauses scrub by calling btrfs_scrub_pause(). That increments fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to pause (for fs_info->scrubs_paused to be == to fs_info->scrubs_running); 3) The scrub task pauses at scrub_pause_off(), waiting for fs_info->scrub_pause_req to decrease to 0; 4) Task A then enters btrfs_relocate_block_group(), and down that call chain we start a transaction and then attempt to commit it; 5) When task A calls btrfs_commit_transaction(), it either will do the commit itself or wait for some other task that already started the commit of the transaction - it doesn't matter which case; 6) The transaction commit enters state TRANS_STATE_COMMIT_START; 7) An error happens during the transaction commit, like -ENOSPC when running delayed refs or delayed items for example; 8) This results in calling transaction.c:cleanup_transaction(), where we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req from 0 to 1, and blocking this task waiting for fs_info->scrubs_running to decrease to 0; 9) From this point on, both the transaction commit and the scrub task hang forever: 1) The transaction commit is waiting for fs_info->scrubs_running to be decreased to 0; 2) The scrub task is at scrub_pause_off() waiting for fs_info->scrub_pause_req to decrease to 0 - so it can not proceed to stop the scrub and decrement fs_info->scrubs_running from 0 to 1. Therefore resulting in a deadlock. Fix this by having cleanup_transaction(), called if a transaction commit fails, not call btrfs_scrub_cancel() if relocation is in progress, and having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if the relocation failed and a transaction abort happened. This was triggered with btrfs/061 from fstests. Fixes: 55e3a601c81c ("btrfs: Fix data checksum error cause by replace with io-load.") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-28btrfs: scan device in non-exclusive modeAnand Jain
This fixes mkfs/mount/check failures due to race with systemd-udevd scan. During the device scan initiated by systemd-udevd, other user space EXCL operations such as mkfs, mount, or check may get blocked and result in a "Device or resource busy" error. This is because the device scan process opens the device with the EXCL flag in the kernel. Two reports were received: - btrfs/179 test case, where the fsck command failed with the -EBUSY error - LTP pwritev03 test case, where mkfs.vfs failed with the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem on the device. In both cases, fsck and mkfs (respectively) were racing with a systemd-udevd device scan, and systemd-udevd won, resulting in the -EBUSY error for fsck and mkfs. Reproducing the problem has been difficult because there is a very small window during which these userspace threads can race to acquire the exclusive device open. Even on the system where the problem was observed, the problem occurrences were anywhere between 10 to 400 iterations and chances of reproducing decreases with debug printk()s. However, an exclusive device open is unnecessary for the scan process, as there are no write operations on the device during scan. Furthermore, during the mount process, the superblock is re-read in the below function call chain: btrfs_mount_root btrfs_open_devices open_fs_devices btrfs_open_one_device btrfs_get_bdev_and_sb So, to fix this issue, removes the FMODE_EXCL flag from the scan operation, and add a comment. The case where mkfs may still write to the device and a scan is running, the btrfs signature is not written at that time so scan will not recognize such device. Reported-by: Sherry Yang <sherry.yang@oracle.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-28btrfs: fix race between quota disable and quota assign ioctlsFilipe Manana
The quota assign ioctl can currently run in parallel with a quota disable ioctl call. The assign ioctl uses the quota root, while the disable ioctl frees that root, and therefore we can have a use-after-free triggered in the assign ioctl, leading to a trace like the following when KASAN is enabled: [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0 [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736 [672.724][T736] [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37 [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [672.727][T736] Call Trace: [672.728][T736] <TASK> [672.728][T736] dump_stack_lvl+0xd9/0x150 [672.725][T736] print_report+0xc1/0x5e0 [672.720][T736] ? __virt_addr_valid+0x61/0x2e0 [672.727][T736] ? __phys_addr+0xc9/0x150 [672.725][T736] ? btrfs_search_slot+0x2962/0x2db0 [672.722][T736] kasan_report+0xc0/0xf0 [672.729][T736] ? btrfs_search_slot+0x2962/0x2db0 [672.724][T736] btrfs_search_slot+0x2962/0x2db0 [672.723][T736] ? fs_reclaim_acquire+0xba/0x160 [672.722][T736] ? split_leaf+0x13d0/0x13d0 [672.726][T736] ? rcu_is_watching+0x12/0xb0 [672.723][T736] ? kmem_cache_alloc+0x338/0x3c0 [672.722][T736] update_qgroup_status_item+0xf7/0x320 [672.724][T736] ? add_qgroup_rb+0x3d0/0x3d0 [672.739][T736] ? do_raw_spin_lock+0x12d/0x2b0 [672.730][T736] ? spin_bug+0x1d0/0x1d0 [672.737][T736] btrfs_run_qgroups+0x5de/0x840 [672.730][T736] ? btrfs_qgroup_rescan_worker+0xa70/0xa70 [672.738][T736] ? __del_qgroup_relation+0x4ba/0xe00 [672.738][T736] btrfs_ioctl+0x3d58/0x5d80 [672.735][T736] ? tomoyo_path_number_perm+0x16a/0x550 [672.737][T736] ? tomoyo_execute_permission+0x4a0/0x4a0 [672.731][T736] ? btrfs_ioctl_get_supported_features+0x50/0x50 [672.737][T736] ? __sanitizer_cov_trace_switch+0x54/0x90 [672.734][T736] ? do_vfs_ioctl+0x132/0x1660 [672.730][T736] ? vfs_fileattr_set+0xc40/0xc40 [672.730][T736] ? _raw_spin_unlock_irq+0x2e/0x50 [672.732][T736] ? sigprocmask+0xf2/0x340 [672.737][T736] ? __fget_files+0x26a/0x480 [672.732][T736] ? bpf_lsm_file_ioctl+0x9/0x10 [672.738][T736] ? btrfs_ioctl_get_supported_features+0x50/0x50 [672.736][T736] __x64_sys_ioctl+0x198/0x210 [672.736][T736] do_syscall_64+0x39/0xb0 [672.731][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.739][T736] RIP: 0033:0x4556ad [672.742][T736] </TASK> [672.743][T736] [672.748][T736] Allocated by task 27677: [672.743][T736] kasan_save_stack+0x22/0x40 [672.741][T736] kasan_set_track+0x25/0x30 [672.741][T736] __kasan_kmalloc+0xa4/0xb0 [672.749][T736] btrfs_alloc_root+0x48/0x90 [672.746][T736] btrfs_create_tree+0x146/0xa20 [672.744][T736] btrfs_quota_enable+0x461/0x1d20 [672.743][T736] btrfs_ioctl+0x4a1c/0x5d80 [672.747][T736] __x64_sys_ioctl+0x198/0x210 [672.749][T736] do_syscall_64+0x39/0xb0 [672.744][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.756][T736] [672.757][T736] Freed by task 27677: [672.759][T736] kasan_save_stack+0x22/0x40 [672.759][T736] kasan_set_track+0x25/0x30 [672.756][T736] kasan_save_free_info+0x2e/0x50 [672.751][T736] ____kasan_slab_free+0x162/0x1c0 [672.758][T736] slab_free_freelist_hook+0x89/0x1c0 [672.752][T736] __kmem_cache_free+0xaf/0x2e0 [672.752][T736] btrfs_put_root+0x1ff/0x2b0 [672.759][T736] btrfs_quota_disable+0x80a/0xbc0 [672.752][T736] btrfs_ioctl+0x3e5f/0x5d80 [672.756][T736] __x64_sys_ioctl+0x198/0x210 [672.753][T736] do_syscall_64+0x39/0xb0 [672.765][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.769][T736] [672.768][T736] The buggy address belongs to the object at ffff888022ec0000 [672.768][T736] which belongs to the cache kmalloc-4k of size 4096 [672.769][T736] The buggy address is located 520 bytes inside of [672.769][T736] freed 4096-byte region [ffff888022ec0000, ffff888022ec1000) [672.760][T736] [672.764][T736] The buggy address belongs to the physical page: [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0 [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002 [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000 [672.771][T736] page dumped because: kasan: bad access detected [672.778][T736] page_owner tracks the page as allocated [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88 [672.779][T736] get_page_from_freelist+0x119c/0x2d50 [672.779][T736] __alloc_pages+0x1cb/0x4a0 [672.776][T736] alloc_pages+0x1aa/0x270 [672.773][T736] allocate_slab+0x260/0x390 [672.771][T736] ___slab_alloc+0xa9a/0x13e0 [672.778][T736] __slab_alloc.constprop.0+0x56/0xb0 [672.771][T736] __kmem_cache_alloc_node+0x136/0x320 [672.789][T736] __kmalloc+0x4e/0x1a0 [672.783][T736] tomoyo_realpath_from_path+0xc3/0x600 [672.781][T736] tomoyo_path_perm+0x22f/0x420 [672.782][T736] tomoyo_path_unlink+0x92/0xd0 [672.780][T736] security_path_unlink+0xdb/0x150 [672.788][T736] do_unlinkat+0x377/0x680 [672.788][T736] __x64_sys_unlink+0xca/0x110 [672.789][T736] do_syscall_64+0x39/0xb0 [672.783][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.784][T736] page last free stack trace: [672.787][T736] free_pcp_prepare+0x4e5/0x920 [672.787][T736] free_unref_page+0x1d/0x4e0 [672.784][T736] __unfreeze_partials+0x17c/0x1a0 [672.797][T736] qlist_free_all+0x6a/0x180 [672.796][T736] kasan_quarantine_reduce+0x189/0x1d0 [672.797][T736] __kasan_slab_alloc+0x64/0x90 [672.793][T736] kmem_cache_alloc+0x17c/0x3c0 [672.799][T736] getname_flags.part.0+0x50/0x4e0 [672.799][T736] getname_flags+0x9e/0xe0 [672.792][T736] vfs_fstatat+0x77/0xb0 [672.791][T736] __do_sys_newlstat+0x84/0x100 [672.798][T736] do_syscall_64+0x39/0xb0 [672.796][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.790][T736] [672.791][T736] Memory state around the buggy address: [672.799][T736] ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.805][T736] ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.809][T736] ^ [672.809][T736] ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.809][T736] ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex before calling btrfs_run_qgroups(), which is what all qgroup ioctls should call. Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/ CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-27fs/ntfs3: Fix root inode checkingKonstantin Komarov
Separate checking inode->i_op and inode itself. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202302162319.bDJOuyfy-lkp@intel.com/ Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Print details about mount failsKonstantin Komarov
Added error mesages with error codes. Minor refactoring and code formatting. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Add missed "nocase" in ntfs_show_optionsKonstantin Komarov
Sort processing ntfs3's mount options in same order they declared. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Code formatting and refactoringKonstantin Komarov
Added minor refactoring. Added and fixed some comments. In some places, the code has been reformatted to fit into 80 columns. clang-format-12 was used to format code according kernel's .clang-format. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Changed ntfs_get_acl() to use dentryKonstantin Komarov
ntfs_get_acl changed to match new interface in struct inode_operations. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Remove field sbi->used.bitmap.set_tailKonstantin Komarov
This field is not used in driver. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Undo critial modificatins to keep directory consistencyKonstantin Komarov
Affect xfstest 320. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Undo endian changesKonstantin Komarov
sbi->mft.reserved_bitmap is in-memory (not on-disk!) bitmap. Assumed cpu endian is faster than fixed endian. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Optimization in ntfs_set_state()Konstantin Komarov
The current volume flags are updated only if VOLUME_FLAG_DIRTY has been changed. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Fix ntfs_create_inode()Konstantin Komarov
Previous variant creates an inode that requires update the parent directory (ea_packed_size). Operations in ntfs_create_inode have been rearranged so we insert new directory entry with correct ea_packed_size and new created inode does not require update it's parent directory. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Remove noacsrulesKonstantin Komarov
Currently, this option does not work properly. Its use leads to unstable results. If we figure out how to implement it without errors, we will add it later. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Use bh_read to simplify codeKonstantin Komarov
The duplicating code is replaced by a generic function bh_read() Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Fix a possible null-pointer dereference in ni_clear()Jia-Ju Bai
In a previous commit c1006bd13146, ni->mi.mrec in ni_write_inode() could be NULL, and thus a NULL check is added for this variable. However, in the same call stack, ni->mi.mrec can be also dereferenced in ni_clear(): ntfs_evict_inode(inode) ni_write_inode(inode, ...) ni = ntfs_i(inode); is_rec_inuse(ni->mi.mrec) -> Add a NULL check by previous commit ni_clear(ntfs_i(inode)) is_rec_inuse(ni->mi.mrec) -> No check Thus, a possible null-pointer dereference may exist in ni_clear(). To fix it, a NULL check is added in this function. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Refactoring of various minor issuesKonstantin Komarov
Removed unused macro. Changed null pointer checking. Fixed inconsistent indenting. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Restore overflow checking for attr size in mi_enum_attrKonstantin Komarov
Fixed comment. Removed explicit initialization for INDEX_ROOT. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Check for extremely large size of $AttrDefKonstantin Komarov
Added additional checking for size of $AttrDef. Added comment. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Improved checking of attribute's name lengthKonstantin Komarov
Added comment, added null pointer checking. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Add null pointer checksKonstantin Komarov
Added null pointer checks in function ntfs_security_init. Also added le32_to_cpu in functions ntfs_security_init and indx_read. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: fix spelling mistake "attibute" -> "attribute"Yu Zhe
There is a spelling mistake in comment. Fix it. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Add length check in indx_get_rootEdward Lo
This adds a length check to guarantee the retrieved index root is legit. [ 162.459513] BUG: KASAN: use-after-free in hdr_find_e.isra.0+0x10c/0x320 [ 162.460176] Read of size 2 at addr ffff8880037bca99 by task mount/243 [ 162.460851] [ 162.461252] CPU: 0 PID: 243 Comm: mount Not tainted 6.0.0-rc7 #42 [ 162.461744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 162.462609] Call Trace: [ 162.462954] <TASK> [ 162.463276] dump_stack_lvl+0x49/0x63 [ 162.463822] print_report.cold+0xf5/0x689 [ 162.464608] ? unwind_get_return_address+0x3a/0x60 [ 162.465766] ? hdr_find_e.isra.0+0x10c/0x320 [ 162.466975] kasan_report+0xa7/0x130 [ 162.467506] ? _raw_spin_lock_irq+0xc0/0xf0 [ 162.467998] ? hdr_find_e.isra.0+0x10c/0x320 [ 162.468536] __asan_load2+0x68/0x90 [ 162.468923] hdr_find_e.isra.0+0x10c/0x320 [ 162.469282] ? cmp_uints+0xe0/0xe0 [ 162.469557] ? cmp_sdh+0x90/0x90 [ 162.469864] ? ni_find_attr+0x214/0x300 [ 162.470217] ? ni_load_mi+0x80/0x80 [ 162.470479] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 162.470931] ? ntfs_bread_run+0x190/0x190 [ 162.471307] ? indx_get_root+0xe4/0x190 [ 162.471556] ? indx_get_root+0x140/0x190 [ 162.471833] ? indx_init+0x1e0/0x1e0 [ 162.472069] ? fnd_clear+0x115/0x140 [ 162.472363] ? _raw_spin_lock_irqsave+0x100/0x100 [ 162.472731] indx_find+0x184/0x470 [ 162.473461] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ 162.474429] ? indx_find_buffer+0x2d0/0x2d0 [ 162.474704] ? do_syscall_64+0x3b/0x90 [ 162.474962] dir_search_u+0x196/0x2f0 [ 162.475381] ? ntfs_nls_to_utf16+0x450/0x450 [ 162.475661] ? ntfs_security_init+0x3d6/0x440 [ 162.475906] ? is_sd_valid+0x180/0x180 [ 162.476191] ntfs_extend_init+0x13f/0x2c0 [ 162.476496] ? ntfs_fix_post_read+0x130/0x130 [ 162.476861] ? iput.part.0+0x286/0x320 [ 162.477325] ntfs_fill_super+0x11e0/0x1b50 [ 162.477709] ? put_ntfs+0x1d0/0x1d0 [ 162.477970] ? vsprintf+0x20/0x20 [ 162.478258] ? set_blocksize+0x95/0x150 [ 162.478538] get_tree_bdev+0x232/0x370 [ 162.478789] ? put_ntfs+0x1d0/0x1d0 [ 162.479038] ntfs_fs_get_tree+0x15/0x20 [ 162.479374] vfs_get_tree+0x4c/0x130 [ 162.479729] path_mount+0x654/0xfe0 [ 162.480124] ? putname+0x80/0xa0 [ 162.480484] ? finish_automount+0x2e0/0x2e0 [ 162.480894] ? putname+0x80/0xa0 [ 162.481467] ? kmem_cache_free+0x1c4/0x440 [ 162.482280] ? putname+0x80/0xa0 [ 162.482714] do_mount+0xd6/0xf0 [ 162.483264] ? path_mount+0xfe0/0xfe0 [ 162.484782] ? __kasan_check_write+0x14/0x20 [ 162.485593] __x64_sys_mount+0xca/0x110 [ 162.486024] do_syscall_64+0x3b/0x90 [ 162.486543] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 162.487141] RIP: 0033:0x7f9d374e948a [ 162.488324] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 162.489728] RSP: 002b:00007ffe30e73d18 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 162.490971] RAX: ffffffffffffffda RBX: 0000561cdb43a060 RCX: 00007f9d374e948a [ 162.491669] RDX: 0000561cdb43a260 RSI: 0000561cdb43a2e0 RDI: 0000561cdb442af0 [ 162.492050] RBP: 0000000000000000 R08: 0000561cdb43a280 R09: 0000000000000020 [ 162.492459] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000561cdb442af0 [ 162.493183] R13: 0000561cdb43a260 R14: 0000000000000000 R15: 00000000ffffffff [ 162.493644] </TASK> [ 162.493908] [ 162.494214] The buggy address belongs to the physical page: [ 162.494761] page:000000003e38a3d5 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x37bc [ 162.496064] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 162.497278] raw: 000fffffc0000000 ffffea00000df1c8 ffffea00000df008 0000000000000000 [ 162.498928] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000 [ 162.500542] page dumped because: kasan: bad access detected [ 162.501057] [ 162.501242] Memory state around the buggy address: [ 162.502230] ffff8880037bc980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.502977] ffff8880037bca00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.503522] >ffff8880037bca80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.503963] ^ [ 162.504370] ffff8880037bcb00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.504766] ffff8880037bcb80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Signed-off-by: Edward Lo <edward.lo@ambergroup.io> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Fix slab-out-of-bounds read in hdr_delete_de()Zeng Heng
Here is a BUG report from syzbot: BUG: KASAN: slab-out-of-bounds in hdr_delete_de+0xe0/0x150 fs/ntfs3/index.c:806 Read of size 16842960 at addr ffff888079cc0600 by task syz-executor934/3631 Call Trace: memmove+0x25/0x60 mm/kasan/shadow.c:54 hdr_delete_de+0xe0/0x150 fs/ntfs3/index.c:806 indx_delete_entry+0x74f/0x3670 fs/ntfs3/index.c:2193 ni_remove_name+0x27a/0x980 fs/ntfs3/frecord.c:2910 ntfs_unlink_inode+0x3d4/0x720 fs/ntfs3/inode.c:1712 ntfs_rename+0x41a/0xcb0 fs/ntfs3/namei.c:276 Before using the meta-data in struct INDEX_HDR, we need to check index header valid or not. Otherwise, the corruptedi (or malicious) fs image can cause out-of-bounds access which could make kernel panic. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Reported-by: syzbot+9c2811fd56591639ff5f@syzkaller.appspotmail.com Signed-off-by: Zeng Heng <zengheng4@huawei.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Validate MFT flags before replaying logsEdward Lo
Log load and replay is part of the metadata handle flow during mount operation. The $MFT record will be loaded and used while replaying logs. However, a malformed $MFT record, say, has RECORD_FLAG_DIR flag set and contains an ATTR_ROOT attribute will misguide kernel to treat it as a directory, and try to free the allocated resources when the corresponding inode is freed, which will cause an invalid kfree because the memory hasn't actually been allocated. [ 101.368647] BUG: KASAN: invalid-free in kvfree+0x2c/0x40 [ 101.369457] [ 101.369986] CPU: 0 PID: 198 Comm: mount Not tainted 6.0.0-rc7+ #5 [ 101.370529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 101.371362] Call Trace: [ 101.371795] <TASK> [ 101.372157] dump_stack_lvl+0x49/0x63 [ 101.372658] print_report.cold+0xf5/0x689 [ 101.373022] ? ni_write_inode+0x754/0xd90 [ 101.373378] ? kvfree+0x2c/0x40 [ 101.373698] kasan_report_invalid_free+0x77/0xf0 [ 101.374058] ? kvfree+0x2c/0x40 [ 101.374352] ? kvfree+0x2c/0x40 [ 101.374668] __kasan_slab_free+0x189/0x1b0 [ 101.374992] ? kvfree+0x2c/0x40 [ 101.375271] kfree+0x168/0x3b0 [ 101.375717] kvfree+0x2c/0x40 [ 101.376002] indx_clear+0x26/0x60 [ 101.376316] ni_clear+0xc5/0x290 [ 101.376661] ntfs_evict_inode+0x45/0x70 [ 101.377001] evict+0x199/0x280 [ 101.377432] iput.part.0+0x286/0x320 [ 101.377819] iput+0x32/0x50 [ 101.378166] ntfs_loadlog_and_replay+0x143/0x320 [ 101.378656] ? ntfs_bio_fill_1+0x510/0x510 [ 101.378968] ? iput.part.0+0x286/0x320 [ 101.379367] ntfs_fill_super+0xecb/0x1ba0 [ 101.379729] ? put_ntfs+0x1d0/0x1d0 [ 101.380046] ? vsprintf+0x20/0x20 [ 101.380542] ? mutex_unlock+0x81/0xd0 [ 101.380914] ? set_blocksize+0x95/0x150 [ 101.381597] get_tree_bdev+0x232/0x370 [ 101.382254] ? put_ntfs+0x1d0/0x1d0 [ 101.382699] ntfs_fs_get_tree+0x15/0x20 [ 101.383094] vfs_get_tree+0x4c/0x130 [ 101.383675] path_mount+0x654/0xfe0 [ 101.384203] ? putname+0x80/0xa0 [ 101.384540] ? finish_automount+0x2e0/0x2e0 [ 101.384943] ? putname+0x80/0xa0 [ 101.385362] ? kmem_cache_free+0x1c4/0x440 [ 101.385968] ? putname+0x80/0xa0 [ 101.386666] do_mount+0xd6/0xf0 [ 101.387228] ? path_mount+0xfe0/0xfe0 [ 101.387585] ? __kasan_check_write+0x14/0x20 [ 101.387979] __x64_sys_mount+0xca/0x110 [ 101.388436] do_syscall_64+0x3b/0x90 [ 101.388757] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 101.389289] RIP: 0033:0x7fa0f70e948a [ 101.390048] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 101.391297] RSP: 002b:00007ffc24fdecc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 101.391988] RAX: ffffffffffffffda RBX: 000055932c183060 RCX: 00007fa0f70e948a [ 101.392494] RDX: 000055932c183260 RSI: 000055932c1832e0 RDI: 000055932c18bce0 [ 101.393053] RBP: 0000000000000000 R08: 000055932c183280 R09: 0000000000000020 [ 101.393577] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055932c18bce0 [ 101.394044] R13: 000055932c183260 R14: 0000000000000000 R15: 00000000ffffffff [ 101.394747] </TASK> [ 101.395402] [ 101.396047] Allocated by task 198: [ 101.396724] kasan_save_stack+0x26/0x50 [ 101.397400] __kasan_slab_alloc+0x6d/0x90 [ 101.397974] kmem_cache_alloc_lru+0x192/0x5a0 [ 101.398524] ntfs_alloc_inode+0x23/0x70 [ 101.399137] alloc_inode+0x3b/0xf0 [ 101.399534] iget5_locked+0x54/0xa0 [ 101.400026] ntfs_iget5+0xaf/0x1780 [ 101.400414] ntfs_loadlog_and_replay+0xe5/0x320 [ 101.400883] ntfs_fill_super+0xecb/0x1ba0 [ 101.401313] get_tree_bdev+0x232/0x370 [ 101.401774] ntfs_fs_get_tree+0x15/0x20 [ 101.402224] vfs_get_tree+0x4c/0x130 [ 101.402673] path_mount+0x654/0xfe0 [ 101.403160] do_mount+0xd6/0xf0 [ 101.403537] __x64_sys_mount+0xca/0x110 [ 101.404058] do_syscall_64+0x3b/0x90 [ 101.404333] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 101.404816] [ 101.405067] The buggy address belongs to the object at ffff888008cc9ea0 [ 101.405067] which belongs to the cache ntfs_inode_cache of size 992 [ 101.406171] The buggy address is located 232 bytes inside of [ 101.406171] 992-byte region [ffff888008cc9ea0, ffff888008cca280) [ 101.406995] [ 101.408559] The buggy address belongs to the physical page: [ 101.409320] page:00000000dccf19dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cc8 [ 101.410654] head:00000000dccf19dd order:2 compound_mapcount:0 compound_pincount:0 [ 101.411533] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 101.412665] raw: 000fffffc0010200 0000000000000000 dead000000000122 ffff888003695140 [ 101.413209] raw: 0000000000000000 00000000800e000e 00000001ffffffff 0000000000000000 [ 101.413799] page dumped because: kasan: bad access detected [ 101.414213] [ 101.414427] Memory state around the buggy address: [ 101.414991] ffff888008cc9e80: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00 [ 101.415785] ffff888008cc9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 101.416933] >ffff888008cc9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 101.417857] ^ [ 101.418566] ffff888008cca000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 101.419704] ffff888008cca080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Signed-off-by: Edward Lo <edward.lo@ambergroup.io> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Fix NULL dereference in ni_write_inodeAbdun Nihaal
Syzbot reports a NULL dereference in ni_write_inode. When creating a new inode, if allocation fails in mi_init function (called in mi_format_new function), mi->mrec is set to NULL. In the error path of this inode creation, mi->mrec is later dereferenced in ni_write_inode. Add a NULL check to prevent NULL dereference. Link: https://syzkaller.appspot.com/bug?extid=f45957555ed4a808cc7a Reported-and-tested-by: syzbot+f45957555ed4a808cc7a@syzkaller.appspotmail.com Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Enhance the attribute size checkEdward Lo
This combines the overflow and boundary check so that all attribute size will be properly examined while enumerating them. [ 169.181521] BUG: KASAN: slab-out-of-bounds in run_unpack+0x2e3/0x570 [ 169.183161] Read of size 1 at addr ffff8880094b6240 by task mount/247 [ 169.184046] [ 169.184925] CPU: 0 PID: 247 Comm: mount Not tainted 6.0.0-rc7+ #3 [ 169.185908] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 169.187066] Call Trace: [ 169.187492] <TASK> [ 169.188049] dump_stack_lvl+0x49/0x63 [ 169.188495] print_report.cold+0xf5/0x689 [ 169.188964] ? run_unpack+0x2e3/0x570 [ 169.189331] kasan_report+0xa7/0x130 [ 169.189714] ? run_unpack+0x2e3/0x570 [ 169.190079] __asan_load1+0x51/0x60 [ 169.190634] run_unpack+0x2e3/0x570 [ 169.191290] ? run_pack+0x840/0x840 [ 169.191569] ? run_lookup_entry+0xb3/0x1f0 [ 169.192443] ? mi_enum_attr+0x20a/0x230 [ 169.192886] run_unpack_ex+0xad/0x3e0 [ 169.193276] ? run_unpack+0x570/0x570 [ 169.193557] ? ni_load_mi+0x80/0x80 [ 169.193889] ? debug_smp_processor_id+0x17/0x20 [ 169.194236] ? mi_init+0x4a/0x70 [ 169.194496] attr_load_runs_vcn+0x166/0x1c0 [ 169.194851] ? attr_data_write_resident+0x250/0x250 [ 169.195188] mi_read+0x133/0x2c0 [ 169.195481] ntfs_iget5+0x277/0x1780 [ 169.196017] ? call_rcu+0x1c7/0x330 [ 169.196392] ? ntfs_get_block_bmap+0x70/0x70 [ 169.196708] ? evict+0x223/0x280 [ 169.197014] ? __kmalloc+0x33/0x540 [ 169.197305] ? wnd_init+0x15b/0x1b0 [ 169.197599] ntfs_fill_super+0x1026/0x1ba0 [ 169.197994] ? put_ntfs+0x1d0/0x1d0 [ 169.198299] ? vsprintf+0x20/0x20 [ 169.198583] ? mutex_unlock+0x81/0xd0 [ 169.198930] ? set_blocksize+0x95/0x150 [ 169.199269] get_tree_bdev+0x232/0x370 [ 169.199750] ? put_ntfs+0x1d0/0x1d0 [ 169.200094] ntfs_fs_get_tree+0x15/0x20 [ 169.200431] vfs_get_tree+0x4c/0x130 [ 169.200714] path_mount+0x654/0xfe0 [ 169.201067] ? putname+0x80/0xa0 [ 169.201358] ? finish_automount+0x2e0/0x2e0 [ 169.201965] ? putname+0x80/0xa0 [ 169.202445] ? kmem_cache_free+0x1c4/0x440 [ 169.203075] ? putname+0x80/0xa0 [ 169.203414] do_mount+0xd6/0xf0 [ 169.203719] ? path_mount+0xfe0/0xfe0 [ 169.203977] ? __kasan_check_write+0x14/0x20 [ 169.204382] __x64_sys_mount+0xca/0x110 [ 169.204711] do_syscall_64+0x3b/0x90 [ 169.205059] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 169.205571] RIP: 0033:0x7f67a80e948a [ 169.206327] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 169.208296] RSP: 002b:00007ffddf020f58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 169.209253] RAX: ffffffffffffffda RBX: 000055e2547a6060 RCX: 00007f67a80e948a [ 169.209777] RDX: 000055e2547a6260 RSI: 000055e2547a62e0 RDI: 000055e2547aeaf0 [ 169.210342] RBP: 0000000000000000 R08: 000055e2547a6280 R09: 0000000000000020 [ 169.210843] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055e2547aeaf0 [ 169.211307] R13: 000055e2547a6260 R14: 0000000000000000 R15: 00000000ffffffff [ 169.211913] </TASK> [ 169.212304] [ 169.212680] Allocated by task 0: [ 169.212963] (stack is not available) [ 169.213200] [ 169.213472] The buggy address belongs to the object at ffff8880094b5e00 [ 169.213472] which belongs to the cache UDP of size 1152 [ 169.214095] The buggy address is located 1088 bytes inside of [ 169.214095] 1152-byte region [ffff8880094b5e00, ffff8880094b6280) [ 169.214639] [ 169.215004] The buggy address belongs to the physical page: [ 169.215766] page:000000002e324c8c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x94b4 [ 169.218412] head:000000002e324c8c order:2 compound_mapcount:0 compound_pincount:0 [ 169.219078] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 169.220272] raw: 000fffffc0010200 0000000000000000 dead000000000122 ffff888002409b40 [ 169.221006] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 [ 169.222320] page dumped because: kasan: bad access detected [ 169.222922] [ 169.223119] Memory state around the buggy address: [ 169.224056] ffff8880094b6100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 169.224908] ffff8880094b6180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 169.225677] >ffff8880094b6200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 169.226445] ^ [ 169.227055] ffff8880094b6280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 169.227638] ffff8880094b6300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Signed-off-by: Edward Lo <edward.lo@ambergroup.io> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Fix OOB read in indx_insert_into_bufferZhangPeng
Syzbot reported a OOB read bug: BUG: KASAN: slab-out-of-bounds in indx_insert_into_buffer+0xaa3/0x13b0 fs/ntfs3/index.c:1755 Read of size 17168 at addr ffff8880255e06c0 by task syz-executor308/3630 Call Trace: <TASK> memmove+0x25/0x60 mm/kasan/shadow.c:54 indx_insert_into_buffer+0xaa3/0x13b0 fs/ntfs3/index.c:1755 indx_insert_entry+0x446/0x6b0 fs/ntfs3/index.c:1863 ntfs_create_inode+0x1d3f/0x35c0 fs/ntfs3/inode.c:1548 ntfs_create+0x3e/0x60 fs/ntfs3/namei.c:100 lookup_open fs/namei.c:3413 [inline] If the member struct INDEX_BUFFER *index of struct indx_node is incorrect, that is, the value of __le32 used is greater than the value of __le32 total in struct INDEX_HDR. Therefore, OOB read occurs when memmove is called in indx_insert_into_buffer(). Fix this by adding a check in hdr_find_e(). Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Reported-by: syzbot+d882d57193079e379309@syzkaller.appspotmail.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Fix NULL pointer dereference in 'ni_write_inode'Ye Bin
Syzbot found the following issue: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000016 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af56000 [0000000000000016] pgd=08000001090da003, p4d=08000001090da003, pud=08000001090ce003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 3036 Comm: syz-executor206 Not tainted 6.0.0-rc6-syzkaller-17739-g16c9f284e746 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : is_rec_inuse fs/ntfs3/ntfs.h:313 [inline] pc : ni_write_inode+0xac/0x798 fs/ntfs3/frecord.c:3232 lr : ni_write_inode+0xa0/0x798 fs/ntfs3/frecord.c:3226 sp : ffff8000126c3800 x29: ffff8000126c3860 x28: 0000000000000000 x27: ffff0000c8b02000 x26: ffff0000c7502320 x25: ffff0000c7502288 x24: 0000000000000000 x23: ffff80000cbec91c x22: ffff0000c8b03000 x21: ffff0000c8b02000 x20: 0000000000000001 x19: ffff0000c75024d8 x18: 00000000000000c0 x17: ffff80000dd1b198 x16: ffff80000db59158 x15: ffff0000c4b6b500 x14: 00000000000000b8 x13: 0000000000000000 x12: ffff0000c4b6b500 x11: ff80800008be1b60 x10: 0000000000000000 x9 : ffff0000c4b6b500 x8 : 0000000000000000 x7 : ffff800008be1b50 x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000000008 x1 : 0000000000000001 x0 : 0000000000000000 Call trace: is_rec_inuse fs/ntfs3/ntfs.h:313 [inline] ni_write_inode+0xac/0x798 fs/ntfs3/frecord.c:3232 ntfs_evict_inode+0x54/0x84 fs/ntfs3/inode.c:1744 evict+0xec/0x334 fs/inode.c:665 iput_final fs/inode.c:1748 [inline] iput+0x2c4/0x324 fs/inode.c:1774 ntfs_new_inode+0x7c/0xe0 fs/ntfs3/fsntfs.c:1660 ntfs_create_inode+0x20c/0xe78 fs/ntfs3/inode.c:1278 ntfs_create+0x54/0x74 fs/ntfs3/namei.c:100 lookup_open fs/namei.c:3413 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x804/0x11c4 fs/namei.c:3688 do_filp_open+0xdc/0x1b8 fs/namei.c:3718 do_sys_openat2+0xb8/0x22c fs/open.c:1311 do_sys_open fs/open.c:1327 [inline] __do_sys_openat fs/open.c:1343 [inline] __se_sys_openat fs/open.c:1338 [inline] __arm64_sys_openat+0xb0/0xe0 fs/open.c:1338 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall arch/arm64/kernel/syscall.c:52 [inline] el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206 el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654 el0t_64_sync+0x18c/0x190 Code: 97dafee4 340001b4 f9401328 2a1f03e0 (79402d14) ---[ end trace 0000000000000000 ]--- Above issue may happens as follows: ntfs_new_inode mi_init mi->mrec = kmalloc(sbi->record_size, GFP_NOFS); -->failed to allocate memory if (!mi->mrec) return -ENOMEM; iput iput_final evict ntfs_evict_inode ni_write_inode is_rec_inuse(ni->mi.mrec)-> As 'ni->mi.mrec' is NULL trigger NULL-ptr-deref To solve above issue if new inode failed make inode bad before call 'iput()' in 'ntfs_new_inode()'. Reported-by: syzbot+f45957555ed4a808cc7a@syzkaller.appspotmail.com Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Fix null-ptr-deref on inode->i_op in ntfs_lookup()ZhangPeng
Syzbot reported a null-ptr-deref bug: ntfs3: loop0: Different NTFS' sector size (1024) and media sector size (512) ntfs3: loop0: Mark volume as dirty due to NTFS errors general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] RIP: 0010:d_flags_for_inode fs/dcache.c:1980 [inline] RIP: 0010:__d_add+0x5ce/0x800 fs/dcache.c:2796 Call Trace: <TASK> d_splice_alias+0x122/0x3b0 fs/dcache.c:3191 lookup_open fs/namei.c:3391 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x10e6/0x2df0 fs/namei.c:3688 do_filp_open+0x264/0x4f0 fs/namei.c:3718 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_open fs/open.c:1334 [inline] __se_sys_open fs/open.c:1330 [inline] __x64_sys_open+0x221/0x270 fs/open.c:1330 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd If the MFT record of ntfs inode is not a base record, inode->i_op can be NULL. And a null-ptr-deref may happen: ntfs_lookup() dir_search_u() # inode->i_op is set to NULL d_splice_alias() __d_add() d_flags_for_inode() # inode->i_op->get_link null-ptr-deref Fix this by adding a Check on inode->i_op before calling the d_splice_alias() function. Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") Reported-by: syzbot+a8f26a403c169b7593fe@syzkaller.appspotmail.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Add check for kmemdupJiasheng Jiang
Since the kmemdup may return NULL pointer, it should be better to add check for the return value in order to avoid NULL pointer dereference. Fixes: b46acd6a6a62 ("fs/ntfs3: Add NTFS journal") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2023-03-27fs/ntfs3: Fix memory leak if ntfs_read_mft failedChen Zhongjin
Label ATTR_ROOT in ntfs_read_mft() sets is_root = true and ni->ni_flags |= NI_FLAG_DIR, then next attr will goto label ATTR_ALLOC and alloc ni->dir.alloc_run. However two states are not always consistent and can make memory leak. 1) attr_name in ATTR_ROOT does not fit the condition it will set is_root = true but NI_FLAG_DIR is not set. 2) next attr_name in ATTR_ALLOC fits the condition and alloc ni->dir.alloc_run 3) in cleanup function ni_clear(), when NI_FLAG_DIR is set, it frees ni->dir.alloc_run, otherwise it frees ni->file.run 4) because NI_FLAG_DIR is not set in this case, ni->dir.alloc_run is leaked as kmemleak reported: unreferenced object 0xffff888003bc5480 (size 64): backtrace: [<000000003d42e6b0>] __kmalloc_node+0x4e/0x1c0 [<00000000d8e19b8a>] kvmalloc_node+0x39/0x1f0 [<00000000fc3eb5b8>] run_add_entry+0x18a/0xa40 [ntfs3] [<0000000011c9f978>] run_unpack+0x75d/0x8e0 [ntfs3] [<00000000e7cf1819>] run_unpack_ex+0xbc/0x500 [ntfs3] [<00000000bbf0a43d>] ntfs_iget5+0xb25/0x2dd0 [ntfs3] [<00000000a6e50693>] ntfs_fill_super+0x218d/0x3580 [ntfs3] [<00000000b9170608>] get_tree_bdev+0x3fb/0x710 [<000000004833798a>] vfs_get_tree+0x8e/0x280 [<000000006e20b8e6>] path_mount+0xf3c/0x1930 [<000000007bf15a5f>] do_mount+0xf3/0x110 ... Fix this by always setting is_root and NI_FLAG_DIR together. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>