summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-02-07fanotify: define struct members to hold response decision contextRichard Guy Briggs
This patch adds a flag, FAN_INFO and an extensible buffer to provide additional information about response decisions. The buffer contains one or more headers defining the information type and the length of the following information. The patch defines one additional information type, FAN_RESPONSE_INFO_AUDIT_RULE, to audit a rule number. This will allow for the creation of other information types in the future if other users of the API identify different needs. The kernel can be tested if it supports a given info type by supplying the complete info extension but setting fd to FAN_NOFD. It will return the expected size but not issue an audit record. Suggested-by: Steve Grubb <sgrubb@redhat.com> Link: https://lore.kernel.org/r/2745105.e9J7NaK4W3@x2 Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201001101219.GE17860@quack2.suse.cz Tested-by: Steve Grubb <sgrubb@redhat.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <10177cfcae5480926b7176321a28d9da6835b667.1675373475.git.rgb@redhat.com>
2023-02-07fanotify: Ensure consistent variable type for responseRichard Guy Briggs
The user space API for the response variable is __u32. This patch makes sure that the whole path through the kernel uses u32 so that there is no sign extension or truncation of the user space response. Suggested-by: Steve Grubb <sgrubb@redhat.com> Link: https://lore.kernel.org/r/12617626.uLZWGnKmhe@x2 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Tested-by: Steve Grubb <sgrubb@redhat.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <3778cb0b3501bc4e686ba7770b20eb9ab0506cf4.1675373475.git.rgb@redhat.com>
2023-02-07udf: remove reporting loc in debug outputTom Rix
clang build fails with fs/udf/partition.c:86:28: error: variable 'loc' is uninitialized when used here [-Werror,-Wuninitialized] sb, block, partition, loc, index); ^~~ loc is now only known when bh is valid. So remove reporting loc in debug output. Fixes: 4215db46d538 ("udf: Use udf_bread() in udf_get_pblock_virt15()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Jan Kara <jack@suse.cz>
2023-02-07udf: Check consistency of Space Bitmap DescriptorVladislav Efanov
Bits, which are related to Bitmap Descriptor logical blocks, are not reset when buffer headers are allocated for them. As the result, these logical blocks can be treated as free and be used for other blocks.This can cause usage of one buffer header for several types of data. UDF issues WARNING in this situation: WARNING: CPU: 0 PID: 2703 at fs/udf/inode.c:2014 __udf_add_aext+0x685/0x7d0 fs/udf/inode.c:2014 RIP: 0010:__udf_add_aext+0x685/0x7d0 fs/udf/inode.c:2014 Call Trace: udf_setup_indirect_aext+0x573/0x880 fs/udf/inode.c:1980 udf_add_aext+0x208/0x2e0 fs/udf/inode.c:2067 udf_insert_aext fs/udf/inode.c:2233 [inline] udf_update_extents fs/udf/inode.c:1181 [inline] inode_getblk+0x1981/0x3b70 fs/udf/inode.c:885 Found by Linux Verification Center (linuxtesting.org) with syzkaller. [JK: Somewhat cleaned up the boundary checks] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Vladislav Efanov <VEfanov@ispras.ru> Signed-off-by: Jan Kara <jack@suse.cz>
2023-02-06cifs: Fix use-after-free in rdata->read_into_pages()ZhaoLong Wang
When the network status is unstable, use-after-free may occur when read data from the server. BUG: KASAN: use-after-free in readpages_fill_pages+0x14c/0x7e0 Call Trace: <TASK> dump_stack_lvl+0x38/0x4c print_report+0x16f/0x4a6 kasan_report+0xb7/0x130 readpages_fill_pages+0x14c/0x7e0 cifs_readv_receive+0x46d/0xa40 cifs_demultiplex_thread+0x121c/0x1490 kthread+0x16b/0x1a0 ret_from_fork+0x2c/0x50 </TASK> Allocated by task 2535: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 __kasan_kmalloc+0x82/0x90 cifs_readdata_direct_alloc+0x2c/0x110 cifs_readdata_alloc+0x2d/0x60 cifs_readahead+0x393/0xfe0 read_pages+0x12f/0x470 page_cache_ra_unbounded+0x1b1/0x240 filemap_get_pages+0x1c8/0x9a0 filemap_read+0x1c0/0x540 cifs_strict_readv+0x21b/0x240 vfs_read+0x395/0x4b0 ksys_read+0xb8/0x150 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 79: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2e/0x50 __kasan_slab_free+0x10e/0x1a0 __kmem_cache_free+0x7a/0x1a0 cifs_readdata_release+0x49/0x60 process_one_work+0x46c/0x760 worker_thread+0x2a4/0x6f0 kthread+0x16b/0x1a0 ret_from_fork+0x2c/0x50 Last potentially related work creation: kasan_save_stack+0x22/0x50 __kasan_record_aux_stack+0x95/0xb0 insert_work+0x2b/0x130 __queue_work+0x1fe/0x660 queue_work_on+0x4b/0x60 smb2_readv_callback+0x396/0x800 cifs_abort_connection+0x474/0x6a0 cifs_reconnect+0x5cb/0xa50 cifs_readv_from_socket.cold+0x22/0x6c cifs_read_page_from_socket+0xc1/0x100 readpages_fill_pages.cold+0x2f/0x46 cifs_readv_receive+0x46d/0xa40 cifs_demultiplex_thread+0x121c/0x1490 kthread+0x16b/0x1a0 ret_from_fork+0x2c/0x50 The following function calls will cause UAF of the rdata pointer. readpages_fill_pages cifs_read_page_from_socket cifs_readv_from_socket cifs_reconnect __cifs_reconnect cifs_abort_connection mid->callback() --> smb2_readv_callback queue_work(&rdata->work) # if the worker completes first, # the rdata is freed cifs_readv_complete kref_put cifs_readdata_release kfree(rdata) return rdata->... # UAF in readpages_fill_pages() Similarly, this problem also occurs in the uncache_fill_pages(). Fix this by adjusts the order of condition judgment in the return statement. Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com> Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-06btrfs: simplify update of last_dir_index_offset when logging a directoryFilipe Manana
When logging a directory, we always set the inode's last_dir_index_offset to the offset of the last dir index item we found. This is using an extra field in the log context structure, and it makes more sense to update it only after we insert dir index items, and we could directly update the inode's last_dir_index_offset field instead. So make this simpler by updating the inode's last_dir_index_offset only when we actually insert dir index keys in the log tree, and getting rid of the last_dir_item_offset field in the log context structure. Reported-by: David Arendt <admin@prnet.org> Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/ Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com> Link: https://lore.kernel.org/linux-btrfs/Y8voyTXdnPDz8xwY@mail.gmail.com/ Reported-by: Hunter Wardlaw <wardlawhunter@gmail.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1207231 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216851 CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-06Merge tag 'for-6.2-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - explicitly initialize zlib work memory to fix a KCSAN warning - limit number of send clones by maximum memory allocated - limit device size extent in case it device shrink races with chunk allocation - raid56 fixes: - fix copy&paste error in RAID6 stripe recovery - make error bitmap update atomic * tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: raid56: make error_bitmap update atomic btrfs: send: limit number of clones and allocated memory size btrfs: zlib: zero-initialize zlib workspace btrfs: limit device extents to the device size btrfs: raid56: fix stripes if vertical errors are found
2023-02-05f2fs: fix f2fs_show_options to show nogc_merge mount optionYangtao Li
Commit 5911d2d1d1a3 ("f2fs: introduce gc_merge mount option") forgot to show nogc_merge option, let's fix it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-05f2fs: fix cgroup writeback accounting with fs-layer encryptionEric Biggers
When writing a page from an encrypted file that is using filesystem-layer encryption (not inline encryption), f2fs encrypts the pagecache page into a bounce page, then writes the bounce page. It also passes the bounce page to wbc_account_cgroup_owner(). That's incorrect, because the bounce page is a newly allocated temporary page that doesn't have the memory cgroup of the original pagecache page. This makes wbc_account_cgroup_owner() not account the I/O to the owner of the pagecache page as it should. Fix this by always passing the pagecache page to wbc_account_cgroup_owner(). Fixes: 578c647879f7 ("f2fs: implement cgroup writeback support") Cc: stable@vger.kernel.org Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-05f2fs: fix wrong calculation of block ageqixiaoyu1
Currently we wrongly calculate the new block age to old * LAST_AGE_WEIGHT / 100. Fix it to new * (100 - LAST_AGE_WEIGHT) / 100 + old * LAST_AGE_WEIGHT / 100. Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull ELF fix from Al Viro: "One of the many equivalent build warning fixes for !CONFIG_ELF_CORE configs. Geert's is the earliest one I've been able to find" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coredump: Move dump_emit_page() to kill unused warning
2023-02-05xfs: don't use BMBT btree split workers for IO completionDave Chinner
When we split a BMBT due to record insertion, we offload it to a worker thread because we can be deep in the stack when we try to allocate a new block for the BMBT. Allocation can use several kilobytes of stack (full memory reclaim, swap and/or IO path can end up on the stack during allocation) and we can already be several kilobytes deep in the stack when we need to split the BMBT. A recent workload demonstrated a deadlock in this BMBT split offload. It requires several things to happen at once: 1. two inodes need a BMBT split at the same time, one must be unwritten extent conversion from IO completion, the other must be from extent allocation. 2. there must be a no available xfs_alloc_wq worker threads available in the worker pool. 3. There must be sustained severe memory shortages such that new kworker threads cannot be allocated to the xfs_alloc_wq pool for both threads that need split work to be run 4. The split work from the unwritten extent conversion must run first. 5. when the BMBT block allocation runs from the split work, it must loop over all AGs and not be able to either trylock an AGF successfully, or each AGF is is able to lock has no space available for a single block allocation. 6. The BMBT allocation must then attempt to lock the AGF that the second task queued to the rescuer thread already has locked before it finds an AGF it can allocate from. At this point, we have an ABBA deadlock between tasks queued on the xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task holding the AGF lock can't be run by the rescuer thread until the task the rescuer thread is runing gets the AGF lock.... This is a highly improbably series of events, but there it is. There's a couple of ways to fix this, but the easiest way to ensure that we only punt tasks with a locked AGF that holds enough space for the BMBT block allocations to the worker thread. This works for unwritten extent conversion in IO completion (which doesn't have a locked AGF and space reservations) because we have tight control over the IO completion stack. It is typically only 6 functions deep when xfs_btree_split() is called because we've already offloaded the IO completion work to a worker thread and hence we don't need to worry about stack overruns here. The other place we can be called for a BMBT split without a preceeding allocation is __xfs_bunmapi() when punching out the center of an existing extent. We don't remove extents in the IO path, so these operations don't tend to be called with a lot of stack consumed. Hence we don't really need to ship the split off to a worker thread in these cases, either. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: fix confusing variable names in xfs_refcount_item.cDarrick J. Wong
Variable names in this code module are inconsistent and confusing. xfs_phys_extent describe physical mappings, so rename them "pmap". xfs_refcount_intents describe refcount intents, so rename them "ri". Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: pass refcount intent directly through the log intent codeDarrick J. Wong
Pass the incore refcount intent through the CUI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: fix confusing variable names in xfs_rmap_item.cDarrick J. Wong
Variable names in this code module are inconsistent and confusing. xfs_map_extent describe file mappings, so rename them "map". xfs_rmap_intents describe block mapping intents, so rename them "ri". Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: pass rmap space mapping directly through the log intent codeDarrick J. Wong
Pass the incore rmap space mapping through the RUI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: fix confusing xfs_extent_item variable namesDarrick J. Wong
Change the name of all pointers to xfs_extent_item structures to "xefi" to make the name consistent and because the current selections ("new" and "free") mean other things in C. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: pass xfs_extent_free_item directly through the log intent codeDarrick J. Wong
Pass the incore xfs_extent_free_item through the EFI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: fix confusing variable names in xfs_bmap_item.cDarrick J. Wong
Variable names in this code module are inconsistent and confusing. xfs_map_extent describe file mappings, so rename them "map". xfs_bmap_intents describe block mapping intents, so rename them "bi". Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: pass the xfs_bmbt_irec directly through the log intent codeDarrick J. Wong
Instead of repeatedly boxing and unboxing the incore extent mapping structure as it passes through the BUI code, pass the pointer directly through. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: use strscpy() to instead of strncpy()Xu Panda
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL-terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05kbuild: remove --include-dir MAKEFLAG from top MakefileMasahiro Yamada
I added $(srctree)/ to some included Makefiles in the following commits: - 3204a7fb98a3 ("kbuild: prefix $(srctree)/ to some included Makefiles") - d82856395505 ("kbuild: do not require sub-make for separate output tree builds") They were a preparation for removing --include-dir flag. I have never thought --include-dir useful. Rather, it _is_ harmful. For example, run the following commands: $ make -s ARCH=x86 mrproper defconfig $ make ARCH=arm O=foo dtbs make[1]: Entering directory '/tmp/linux/foo' HOSTCC scripts/basic/fixdep Error: kernelrelease not valid - run 'make prepare' to update it UPD include/config/kernel.release make[1]: Leaving directory '/tmp/linux/foo' The first command configures the source tree for x86. The next command tries to build ARM device trees in the separate foo/ directory - this must stop because the directory foo/ has not been configured yet. However, due to --include-dir=$(abs_srctree), the top Makefile includes the wrong include/config/auto.conf from the source tree and continues building. Kbuild traverses the directory tree, but of course it does not work correctly. The Error message is also pointless - 'make prepare' does not help at all for fixing the issue. This commit fixes more arch Makefile, and finally removes --include-dir from the top Makefile. There are more breakages under drivers/, but I do not volunteer to fix them all. I just moved --include-dir to drivers/Makefile. With this commit, the second command will stop with a sensible message. $ make -s ARCH=x86 mrproper defconfig $ make ARCH=arm O=foo dtbs make[1]: Entering directory '/tmp/linux/foo' SYNC include/config/auto.conf.cmd *** *** The source tree is not clean, please run 'make ARCH=arm mrproper' *** in /tmp/linux *** make[2]: *** [../Makefile:646: outputmakefile] Error 1 /tmp/linux/Makefile:770: include/config/auto.conf.cmd: No such file or directory make[1]: *** [/tmp/linux/Makefile:793: include/config/auto.conf.cmd] Error 2 make[1]: Leaving directory '/tmp/linux/foo' make: *** [Makefile:226: __sub-make] Error 2 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-02-03revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"Andrew Morton
This fix was nacked by Philip, for reasons identified in the email linked below. Link: https://lkml.kernel.org/r/68f15d67-8945-2728-1f17-5b53a80ec52d@squashfs.org.uk Fixes: 72e544b1b28325 ("squashfs: harden sanity check in squashfs_read_xattr_id_table") Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Fedor Pchelkin <pchelkin@ispras.ru> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03fsdax: dax_unshare_iter() should return a valid lengthShiyang Ruan
The copy_mc_to_kernel() will return 0 if it executed successfully. Then the return value should be set to the length it copied. [akpm@linux-foundation.org: don't mess up `ret', per Matthew] Link: https://lkml.kernel.org/r/1675341227-14-1-git-send-email-ruansy.fnst@fujitsu.com Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax") Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03aio: fix mremap after fork null-derefSeth Jenkins
Commit e4a0d3e720e7 ("aio: Make it possible to remap aio ring") introduced a null-deref if mremap is called on an old aio mapping after fork as mm->ioctx_table will be set to NULL. [jmoyer@redhat.com: fix 80 column issue] Link: https://lkml.kernel.org/r/x49sffq4nvg.fsf@segfault.boston.devel.redhat.com Fixes: e4a0d3e720e7 ("aio: Make it possible to remap aio ring") Signed-off-by: Seth Jenkins <sethjenkins@google.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Jann Horn <jannh@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03Merge tag 'ceph-for-6.2-rc7' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A safeguard to prevent the kernel client from further damaging the filesystem after running into a case of an invalid snap trace. The root cause of this metadata corruption is still being investigated but it appears to be stemming from the MDS. As such, this is the best we can do for now" * tag 'ceph-for-6.2-rc7' of https://github.com/ceph/ceph-client: ceph: blocklist the kclient when receiving corrupted snap trace ceph: move mount state enum to super.h
2023-02-03Merge tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "25 hotfixes, mainly for MM. 13 are cc:stable" * tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits) mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() Kconfig.debug: fix the help description in SCHED_DEBUG mm/swapfile: add cond_resched() in get_swap_pages() mm: use stack_depot_early_init for kmemleak Squashfs: fix handling and sanity checking of xattr_ids count sh: define RUNTIME_DISCARD_EXIT highmem: round down the address passed to kunmap_flush_on_unmap() migrate: hugetlb: check for hugetlb shared PMD in node migration mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Revert "mm: kmemleak: alloc gray object for reserved region with direct map" freevxfs: Kconfig: fix spelling maple_tree: should get pivots boundary by type .mailmap: update e-mail address for Eugen Hristev mm, mremap: fix mremap() expanding for vma's with vm_ops->close() squashfs: harden sanity check in squashfs_read_xattr_id_table ia64: fix build error due to switch case label appearing next to declaration mm: multi-gen LRU: fix crash during cgroup migration Revert "mm: add nodes= arg to memory.reclaim" zsmalloc: fix a race with deferred_handles storing ...
2023-02-03splice: use bvec_set_page to initialize a bvecChristoph Hellwig
Use the bvec_set_page helper to initialize a bvec. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230203150634.3199647-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03orangefs: use bvec_set_{page,folio} to initialize bvecsChristoph Hellwig
Use the bvec_set_page and bvec_set_folio helpers to initialize bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230203150634.3199647-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03nfs: use bvec_set_page to initialize bvecsChristoph Hellwig
Use the bvec_set_page helper to initialize bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com> Link: https://lore.kernel.org/r/20230203150634.3199647-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03coredump: use bvec_set_page to initialize a bvecChristoph Hellwig
Use the bvec_set_page helper to initialize a bvec. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230203150634.3199647-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03cifs: use bvec_set_page to initialize bvecsChristoph Hellwig
Use the bvec_set_page helper to initialize bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Link: https://lore.kernel.org/r/20230203150634.3199647-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03ceph: use bvec_set_page to initialize a bvecChristoph Hellwig
Use the bvec_set_page helper to initialize a bvec. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230203150634.3199647-13-hch@lst.de Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03afs: use bvec_set_folio to initialize a bvecChristoph Hellwig
Use the bvec_set_folio helper to initialize a bvec. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230203150634.3199647-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-02nilfs2: prevent WARNING in nilfs_dat_commit_end()Ryusuke Konishi
If nilfs2 reads a corrupted disk image and its DAT metadata file contains invalid lifetime data for a virtual block number, a kernel warning can be generated by the WARN_ON check in nilfs_dat_commit_end() and can panic if the kernel is booted with panic_on_warn. This patch avoids the issue with a sanity check that treats it as an error. Since error return is not allowed in the execution phase of nilfs_dat_commit_end(), this inserts that sanity check in nilfs_dat_prepare_end(), which prepares for nilfs_dat_commit_end(). As the error code, -EINVAL is returned to notify bmap layer of the metadata corruption. When the bmap layer sees this code, it handles the abnormal situation and replaces the return code with -EIO as it should. Link: https://lkml.kernel.org/r/000000000000154d2c05e9ec7df6@google.com Link: https://lkml.kernel.org/r/20230127132202.6083-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: <syzbot+cbff7a52b6f99059e67f@syzkaller.appspotmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02nilfs2: replace WARN_ONs for invalid DAT metadata block requestsRyusuke Konishi
If DAT metadata file block access fails due to corruption of the DAT file or abnormal virtual block numbers held by b-trees or inodes, a kernel warning is generated. This replaces the WARN_ONs by error output, so that a kernel, booted with panic_on_warn, does not panic. This patch also replaces the detected return code -ENOENT with another internal code -EINVAL to notify the bmap layer of metadata corruption. When the bmap layer sees -EINVAL, it handles the abnormal situation with nilfs_bmap_convert_error() and finally returns code -EIO as it should. Link: https://lkml.kernel.org/r/0000000000005cc3d205ea23ddcf@google.com Link: https://lkml.kernel.org/r/20230126164114.6911-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: <syzbot+5d5d25f90f195a3cfcb4@syzkaller.appspotmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02fs: gracefully handle ->get_block not mapping bh in __mpage_writepageJan Kara
When filesystem's ->get_block function does not map the buffer head when called from __mpage_writepage(), __mpage_writepage() will happily go and pass bogus bdev and block number to bio allocation routines which leads to crashes sooner or later. E.g. UDF can do this because it doesn't want to allocate blocks from ->writepages callbacks. It allocates blocks on write or page fault but writeback can still spot dirty buffers without underlying blocks allocated e.g. if blocksize < pagesize, the tail page is dirtied (which means all its buffers are dirtied), and truncate extends the file so that some buffer starts to be within i_size. Link: https://lkml.kernel.org/r/20230126085155.26395-1-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02cramfs: Kconfig: fix spelling & punctuationRandy Dunlap
Fix spelling and hyphenation in cramfs Kconfig. Link: https://lkml.kernel.org/r/20230124181631.15204-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02fs/ext4: use try_cmpxchg in ext4_update_bh_stateUros Bizjak
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in ext4_update_bh_state. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Link: https://lkml.kernel.org/r/20221102071147.6642-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02fs: hfsplus: initialize fsdata in hfsplus_file_truncate()Alexander Potapenko
When aops->write_begin() does not initialize fsdata, KMSAN may report an error passing the latter to aops->write_end(). Fix this by unconditionally initializing fsdata. Link: https://lkml.kernel.org/r/20221121112134.407362-5-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Eric Biggers <ebiggers@kernel.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02fs: hfs: initialize fsdata in hfs_file_truncate()Alexander Potapenko
When aops->write_begin() does not initialize fsdata, KMSAN may report an error passing the latter to aops->write_end(). Fix this by unconditionally initializing fsdata. Link: https://lkml.kernel.org/r/20221121112134.407362-4-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Eric Biggers <ebiggers@kernel.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02fat: fix return value of vfat_bad_char() and vfat_replace_char() functionsPali Rohár
These functions returns boolean value not wide character. Link: https://lkml.kernel.org/r/20221226142512.13848-1-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02ntfs: fix multiple kernel-doc warningsRandy Dunlap
Fix many W=1 kernel-doc warnings in fs/ntfs/: fs/ntfs/aops.c:30: warning: Incorrect use of kernel-doc format: * ntfs_end_buffer_async_read - async io completion for reading attributes fs/ntfs/aops.c:46: warning: expecting prototype for aops.c(). Prototype was for ntfs_end_buffer_async_read() instead fs/ntfs/aops.c:1655: warning: cannot understand function prototype: 'const struct address_space_operations ntfs_normal_aops = ' fs/ntfs/aops.c:1670: warning: cannot understand function prototype: 'const struct address_space_operations ntfs_compressed_aops = ' fs/ntfs/aops.c:1685: warning: cannot understand function prototype: 'const struct address_space_operations ntfs_mst_aops = ' fs/ntfs/compress.c:22: warning: Incorrect use of kernel-doc format: * ntfs_compression_constants - enum of constants used in the compression code fs/ntfs/compress.c:24: warning: cannot understand function prototype: 'typedef enum ' fs/ntfs/compress.c:47: warning: cannot understand function prototype: 'u8 *ntfs_compression_buffer; ' fs/ntfs/compress.c:52: warning: expecting prototype for ntfs_cb_lock(). Prototype was for DEFINE_SPINLOCK() instead fs/ntfs/dir.c:21: warning: Incorrect use of kernel-doc format: * The little endian Unicode string $I30 as a global constant. fs/ntfs/dir.c:23: warning: cannot understand function prototype: 'ntfschar I30[5] = ' fs/ntfs/inode.c:31: warning: Incorrect use of kernel-doc format: * ntfs_test_inode - compare two (possibly fake) inodes for equality fs/ntfs/inode.c:47: warning: expecting prototype for inode.c(). Prototype was for ntfs_test_inode() instead fs/ntfs/inode.c:2956: warning: expecting prototype for ntfs_write_inode(). Prototype was for __ntfs_write_inode() instead fs/ntfs/mft.c:24: warning: expecting prototype for mft.c - NTFS kernel mft record operations. Part of the Linux(). Prototype was for MAX_BHS() instead fs/ntfs/namei.c:263: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Inode operations for directories. fs/ntfs/namei.c:368: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Export operations allowing NFS exporting of mounted NTFS partitions. fs/ntfs/runlist.c:16: warning: Incorrect use of kernel-doc format: * ntfs_rl_mm - runlist memmove fs/ntfs/runlist.c:22: warning: expecting prototype for runlist.c - NTFS runlist handling code. Part of the Linux(). Prototype was for ntfs_rl_mm() instead fs/ntfs/super.c:61: warning: missing initial short description on line: * simple_getbool - fs/ntfs/super.c:2661: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * The complete super operations. Link: https://lkml.kernel.org/r/20230109010041.21442-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02freevxfs: fix kernel-doc warningsRandy Dunlap
Fix multiple kernel-doc warnings in freevxfs: fs/freevxfs/vxfs_subr.c:45: warning: Function parameter or member 'mapping' not described in 'vxfs_get_page' fs/freevxfs/vxfs_subr.c:45: warning: Excess function parameter 'ip' description in 'vxfs_get_page' 2 warnings fs/freevxfs/vxfs_subr.c:101: warning: expecting prototype for vxfs_get_block(). Prototype was for vxfs_getblk() instead fs/freevxfs/vxfs_super.c:184: warning: expecting prototype for vxfs_read_super(). Prototype was for vxfs_fill_super() instead Link: https://lkml.kernel.org/r/20230109022915.17504-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02proc: mark /proc/cmdline as permanentAlexey Dobriyan
/proc/cmdline is never removed, mark is as permanent for slightly faster open and close. Link: https://lkml.kernel.org/r/Y66xAveh2yUsP7m9@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02hfsplus: remove unnecessary variable initializationXU pengfei
Variables are assigned first and then used. Initialization is not required. [akpm@linux-foundation.org: give hfsplus_listxattr:key_len narrower scope] Link: https://lkml.kernel.org/r/20221221032119.10037-1-xupengfei@nfschina.com Signed-off-by: XU pengfei <xupengfei@nfschina.com> Reviewed-by: Andrew Morton <akpm@linux-foudation.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02hfs: fix missing hfs_bnode_get() in __hfs_bnode_createLiu Shixin
Syzbot found a kernel BUG in hfs_bnode_put(): kernel BUG at fs/hfs/bnode.c:466! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 3634 Comm: kworker/u4:5 Not tainted 6.1.0-rc7-syzkaller-00190-g97ee9d1c1696 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:hfs_bnode_put+0x46f/0x480 fs/hfs/bnode.c:466 Code: 8a 80 ff e9 73 fe ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a0 fe ff ff 48 89 df e8 db 8a 80 ff e9 93 fe ff ff e8 a1 68 2c ff <0f> 0b e8 9a 68 2c ff 0f 0b 0f 1f 84 00 00 00 00 00 55 41 57 41 56 RSP: 0018:ffffc90003b4f258 EFLAGS: 00010293 RAX: ffffffff825e318f RBX: 0000000000000000 RCX: ffff8880739dd7c0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90003b4f430 R08: ffffffff825e2d9b R09: ffffed10045157d1 R10: ffffed10045157d1 R11: 1ffff110045157d0 R12: ffff8880228abe80 R13: ffff88807016c000 R14: dffffc0000000000 R15: ffff8880228abe00 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa6ebe88718 CR3: 000000001e93d000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> hfs_write_inode+0x1bc/0xb40 write_inode fs/fs-writeback.c:1440 [inline] __writeback_single_inode+0x4d6/0x670 fs/fs-writeback.c:1652 writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1878 __writeback_inodes_wb+0x125/0x420 fs/fs-writeback.c:1949 wb_writeback+0x440/0x7b0 fs/fs-writeback.c:2054 wb_check_start_all fs/fs-writeback.c:2176 [inline] wb_do_writeback fs/fs-writeback.c:2202 [inline] wb_workfn+0x827/0xef0 fs/fs-writeback.c:2235 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> The BUG_ON() is triggered at here: /* Dispose of resources used by a node */ void hfs_bnode_put(struct hfs_bnode *node) { if (node) { <skipped> BUG_ON(!atomic_read(&node->refcnt)); <- we have issue here!!!! <skipped> } } By tracing the refcnt, I found the node is created by hfs_bmap_alloc() with refcnt 1. Then the node is used by hfs_btree_write(). There is a missing of hfs_bnode_get() after find the node. The issue happened in following path: <alloc> hfs_bmap_alloc hfs_bnode_find __hfs_bnode_create <- allocate a new node with refcnt 1. hfs_bnode_put <- decrease the refcnt <write> hfs_btree_write hfs_bnode_find __hfs_bnode_create hfs_bnode_findhash <- find the node without refcnt increased. hfs_bnode_put <- trigger the BUG_ON() since refcnt is 0. Link: https://lkml.kernel.org/r/20221212021627.3766829-1-liushixin2@huawei.com Reported-by: syzbot+5b04b49a7ec7226c7426@syzkaller.appspotmail.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mpage: convert __mpage_writepage() to use a folio more fullyMatthew Wilcox (Oracle)
This is just a conversion to the folio API. While there are some nods towards supporting multi-page folios in here, the blocks array is still sized for one page's worth of blocks, and there are other assumptions such as the blocks_per_page variable. [willy@infradead.org: fix accidentally-triggering WARN_ON_ONCE] Link: https://lkml.kernel.org/r/Y9kuaBgXf9lKJ8b0@casper.infradead.org Link: https://lkml.kernel.org/r/20230126201255.1681189-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02fs: convert writepage_t callback to pass a folioMatthew Wilcox (Oracle)
Patch series "Convert writepage_t to use a folio". More folioisation. I split out the mpage work from everything else because it completely dominated the patch, but some implementations I just converted outright. This patch (of 2): We always write back an entire folio, but that's currently passed as the head page. Convert all filesystems that use write_cache_pages() to expect a folio instead of a page. Link: https://lkml.kernel.org/r/20230126201255.1681189-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230126201255.1681189-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mpage: stop using bdev_{read,write}_pageChristoph Hellwig
Patch series "remove ->rw_page". This series removes the ->rw_page block_device_operation, which is an old and clumsy attempt at a simple read/write fast path for the block layer. It isn't actually used by the fastest block layer operations that we support (polled I/O through io_uring), but only used by the mpage buffered I/O helpers which are some of the slowest I/O we have and do not make any difference there at all, and zram which is a block device abused to duplicate the zram functionality. Given that zram is heavily used we need to make sure there is a good replacement for synchronous I/O, so this series adds a new flag for drivers that complete I/O synchronously and uses that flag to use on-stack bios and synchronous submission for them in the swap code. This patch (of 7): These are micro-optimizations for synchronous I/O, which do not matter compared to all the other inefficiencies in the legacy buffer_head based mpage code. Link: https://lkml.kernel.org/r/20230125133436.447864-1-hch@lst.de Link: https://lkml.kernel.org/r/20230125133436.447864-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>