summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-09-25io_uring: ensure async buffered read-retry is setup properlyJens Axboe
A previous commit for fixing up short reads botched the async retry path, so we ended up going to worker threads more often than we should. Fix this up, so retries work the way they originally were intended to. Fixes: 227c0c9673d8 ("io_uring: internally retry short reads") Reported-by: Hao_Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25efivarfs: Replace invalid slashes with exclamation marks in dentries.Michael Schaller
Without this patch efivarfs_alloc_dentry creates dentries with slashes in their name if the respective EFI variable has slashes in its name. This in turn causes EIO on getdents64, which prevents a complete directory listing of /sys/firmware/efi/efivars/. This patch replaces the invalid shlashes with exclamation marks like kobject_set_name_vargs does for /sys/firmware/efi/vars/ to have consistently named dentries under /sys/firmware/efi/vars/ and /sys/firmware/efi/efivars/. Signed-off-by: Michael Schaller <misch@google.com> Link: https://lore.kernel.org/r/20200925074502.150448-1-misch@google.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-09-25xfs: remove deprecated sysctl optionsPavel Reichl
These optionr were for Irix compatibility, probably for clustered XFS clients in a heterogenous cluster which contained both Irix & Linux machines, so that behavior would be consistent. That doesn't exist anymore and it's no longer needed. Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: actually state when the sysctls go away] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-25xfs: remove deprecated mount optionsPavel Reichl
ikeep/noikeep was a workaround for old DMAPI code which is no longer relevant. attr2/noattr2 - is for controlling upgrade behaviour from fixed attribute fork sizes in the inode (attr1) and dynamic attribute fork sizes (attr2). mkfs has defaulted to setting attr2 since 2007, hence just about every XFS filesystem out there in production right now uses attr2. Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix minor typos] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-25xfs: directly call xfs_generic_create() for ->create() and ->mkdir()Kaixu Xia
The current create and mkdir handlers both call the xfs_vn_mknod() which is a wrapper routine around xfs_generic_create() function. Actually the create and mkdir handlers can directly call xfs_generic_create() function and reduce the call chain. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-25xfs: avoid shared rmap operations for attr fork extentsDarrick J. Wong
During code review, I noticed that the rmap code uses the (slower) shared mappings rmap functions for any extent of a reflinked file, even if those extents are for the attr fork, which doesn't support sharing. We can speed up rmap a tiny bit by optimizing out this case. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25xfs: drop the obsolete comment on filestream lockingGao Xiang
Since commit 1c1c6ebcf52 ("xfs: Replace per-ag array with a radix tree"), there is no m_peraglock anymore, so it's hard to understand the described situation since per-ag is no longer an array and no need to reallocate, call xfs_filestream_flush() in growfs. In addition, the race condition for shrink feature is quite confusing to me currently as well. Get rid of it instead. Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25xfs: code cleanup in xfs_attr_leaf_entsize_{remote,local}Kaixu Xia
Cleanup the typedef usage, the unnecessary parentheses, the unnecessary backslash and use the open-coded round_up call in xfs_attr_leaf_entsize_{remote,local}. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25xfs: do the assert for all the log done items in xfs_trans_cancelKaixu Xia
We should do the assert for all the log intent-done items if they appear here. This patch detect intent-done items by the fact that their item ops don't have iop_unpin and iop_push methods and also move the helper xlog_item_is_intent to xfs_trans.h. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-25xfs: remove the unused parameter id from xfs_qm_dqattach_oneKaixu Xia
Since we never use the second parameter id, so remove it from xfs_qm_dqattach_one() function. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25xfs: remove the redundant crc feature check in xfs_attr3_rmt_verifyKaixu Xia
We already check whether the crc feature is enabled before calling xfs_attr3_rmt_verify(), so remove the redundant feature check in that function. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25xfs: fix some commentsKaixu Xia
Fix the comments to help people understand the code. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> [darrick: fix the indenting problems too] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25xfs: remove the unnecessary xfs_dqid_t type castKaixu Xia
Since the type prid_t and xfs_dqid_t both are uint32_t, seems the type cast is unnecessary, so remove it. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25xfs: use the existing type definition for di_projidKaixu Xia
We have already defined the project ID type prid_t, so maybe should use it here. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25xfs: remove the unused SYNCHRONIZE macroKaixu Xia
There are no callers of the SYNCHRONIZE() macro, so remove it. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-25iov_iter: move rw_copy_check_uvector() into lib/iov_iter.cDavid Laight
This lets the compiler inline it into import_iovec() generating much better code. Signed-off-by: David Laight <david.laight@aculab.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-25io_uring: don't unconditionally set plug->nowait = trueJens Axboe
This causes all the bios to be submitted with REQ_NOWAIT, which can be problematic on either btrfs or on file systems that otherwise use a mix of block devices where only some of them support it. For now, just remove the setting of plug->nowait = true. Reported-by: Dan Melnic <dmm@fb.com> Reported-by: Brian Foster <bfoster@redhat.com> Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishingJosef Bacik
We need to move the closing of the src_device out of all the device replace locking, but we definitely want to zero out the superblock before we commit the last time to make sure the device is properly removed. Handle this by pushing btrfs_scratch_superblocks into btrfs_dev_replace_finishing, and then later on we'll move the src_device closing and freeing stuff where we need it to be. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-25block: add a bdev_is_partition helperChristoph Hellwig
Add a littler helper to make the somewhat arcane bd_contains checks a little more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25io_uring: ensure open/openat2 name is cleaned on cancelationJens Axboe
If we cancel these requests, we'll leak the memory associated with the filename. Add them to the table of ops that need cleaning, if REQ_F_NEED_CLEANUP is set. Cc: stable@vger.kernel.org Fixes: e62753e4e292 ("io_uring: call statx directly") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25quota: clear padding in v2r1_mem2diskdqb()Eric Dumazet
Freshly allocated memory contains garbage, better make sure to init all struct v2r1_disk_dqblk fields to avoid KMSAN report: BUG: KMSAN: uninit-value in qtree_entry_unused+0x137/0x1b0 fs/quota/quota_tree.c:218 CPU: 0 PID: 23373 Comm: syz-executor.1 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219 qtree_entry_unused+0x137/0x1b0 fs/quota/quota_tree.c:218 v2r1_mem2diskdqb+0x43d/0x710 fs/quota/quota_v2.c:285 qtree_write_dquot+0x226/0x870 fs/quota/quota_tree.c:394 v2_write_dquot+0x1ad/0x280 fs/quota/quota_v2.c:333 dquot_commit+0x4af/0x600 fs/quota/dquot.c:482 ext4_write_dquot fs/ext4/super.c:5934 [inline] ext4_mark_dquot_dirty+0x4d8/0x6a0 fs/ext4/super.c:5985 mark_dquot_dirty fs/quota/dquot.c:347 [inline] mark_all_dquot_dirty fs/quota/dquot.c:385 [inline] dquot_alloc_inode+0xc05/0x12b0 fs/quota/dquot.c:1755 __ext4_new_inode+0x8204/0x9d70 fs/ext4/ialloc.c:1155 ext4_tmpfile+0x41a/0x850 fs/ext4/namei.c:2686 vfs_tmpfile+0x2a2/0x570 fs/namei.c:3283 do_tmpfile fs/namei.c:3316 [inline] path_openat+0x4035/0x6a90 fs/namei.c:3359 do_filp_open+0x2b8/0x710 fs/namei.c:3395 do_sys_openat2+0xa88/0x1140 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_compat_sys_openat fs/open.c:1242 [inline] __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240 __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240 do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline] __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c RIP: 0023:0xf7ff4549 Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000f55cd0cc EFLAGS: 00000296 ORIG_RAX: 0000000000000127 RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: 0000000020000000 RDX: 0000000000410481 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:126 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:80 slab_alloc_node mm/slub.c:2907 [inline] slab_alloc mm/slub.c:2916 [inline] __kmalloc+0x2bb/0x4b0 mm/slub.c:3982 kmalloc include/linux/slab.h:559 [inline] getdqbuf+0x56/0x150 fs/quota/quota_tree.c:52 qtree_write_dquot+0xf2/0x870 fs/quota/quota_tree.c:378 v2_write_dquot+0x1ad/0x280 fs/quota/quota_v2.c:333 dquot_commit+0x4af/0x600 fs/quota/dquot.c:482 ext4_write_dquot fs/ext4/super.c:5934 [inline] ext4_mark_dquot_dirty+0x4d8/0x6a0 fs/ext4/super.c:5985 mark_dquot_dirty fs/quota/dquot.c:347 [inline] mark_all_dquot_dirty fs/quota/dquot.c:385 [inline] dquot_alloc_inode+0xc05/0x12b0 fs/quota/dquot.c:1755 __ext4_new_inode+0x8204/0x9d70 fs/ext4/ialloc.c:1155 ext4_tmpfile+0x41a/0x850 fs/ext4/namei.c:2686 vfs_tmpfile+0x2a2/0x570 fs/namei.c:3283 do_tmpfile fs/namei.c:3316 [inline] path_openat+0x4035/0x6a90 fs/namei.c:3359 do_filp_open+0x2b8/0x710 fs/namei.c:3395 do_sys_openat2+0xa88/0x1140 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_compat_sys_openat fs/open.c:1242 [inline] __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240 __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240 do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline] __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Fixes: 498c60153ebb ("quota: Implement quota format with 64-bit space and inode limits") Link: https://lore.kernel.org/r/20200924183619.4176790-1-edumazet@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jan Kara <jack@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-09-24ep_create_wakeup_source(): dentry name can change under you...Al Viro
or get freed, for that matter, if it's a long (separately stored) name. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-24bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flagChristoph Hellwig
Replace the two negative flags that are always used together with a single positive flag that indicates the writeback capability instead of two related non-capabilities. Also remove the pointless wrappers to just check the flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24bdi: invert BDI_CAP_NO_ACCT_WBChristoph Hellwig
Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to make the checks more obvious. Also remove the pointless bdi_cap_account_writeback wrapper that just obsfucates the check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flagChristoph Hellwig
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the backing_dev_info shared between the block drivers and the writeback code. To help untangling the dependency replace it with a queue flag and a superblock flag derived from it. This also helps with the case of e.g. a file system requiring stable writes due to its own checksumming, but not forcing it on other users of the block device like the swap code. One downside is that we an't support the stable_pages_required bdi attribute in sysfs anymore. It is replaced with a queue attribute which also is writable for easier testing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24bdi: remove BDI_CAP_CGROUP_WRITEBACKChristoph Hellwig
Just checking SB_I_CGROUPWB for cgroup writeback support is enough. Either the file system allocates its own bdi (e.g. btrfs), in which case it is known to support cgroup writeback, or the bdi comes from the block layer, which always supports cgroup writeback. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24bdi: initialize ->ra_pages and ->io_pages in bdi_initChristoph Hellwig
Set up a readahead size by default, as very few users have a good reason to change it. This means code, ecryptfs, and orangefs now set up the values while they were previously missing it, while ubifs, mtd and vboxsf manually set it to 0 to avoid readahead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Richard Weinberger <richard@nod.at> [ubifs, mtd] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24fs: remove the unused SB_I_MULTIROOT flagChristoph Hellwig
The last user of SB_I_MULTIROOT is disappeared with commit f2aedb713c28 ("NFS: Add fs_context support.") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24NFSv4: make cache consistency bitmask dynamicOlga Kornievskaia
Client uses static bitmask for GETATTR on CLOSE/WRITE/DELEGRETURN and ignores the fact that it might have some attributes marked invalid in its cache. Compared to v3 where all attributes are retrieved in postop attributes, v4's cache is frequently out of sync and leads to standalone GETATTRs being sent to the server. Instead, in addition to the minimum cache consistency attributes also check cache_validity and adjust the GETATTR request accordingly. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-24nfs: fix spellint typo in pnfs.cWang Qing
Change the comment typo: "manger" -> "manager". Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-23fscrypt: rename DCACHE_ENCRYPTED_NAME to DCACHE_NOKEY_NAMEEric Biggers
Originally we used the term "encrypted name" or "ciphertext name" to mean the encoded filename that is shown when an encrypted directory is listed without its key. But these terms are ambiguous since they also mean the filename stored on-disk. "Encrypted name" is especially ambiguous since it could also be understood to mean "this filename is encrypted on-disk", similar to "encrypted file". So we've started calling these encoded names "no-key names" instead. Therefore, rename DCACHE_ENCRYPTED_NAME to DCACHE_NOKEY_NAME to avoid confusion about what this flag means. Link: https://lore.kernel.org/r/20200924042624.98439-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-23fscrypt: don't call no-key names "ciphertext names"Eric Biggers
Currently we're using the term "ciphertext name" ambiguously because it can mean either the actual ciphertext filename, or the encoded filename that is shown when an encrypted directory is listed without its key. The latter we're now usually calling the "no-key name"; and while it's derived from the ciphertext name, it's not the same thing. To avoid this ambiguity, rename fscrypt_name::is_ciphertext_name to fscrypt_name::is_nokey_name, and update comments that say "ciphertext name" (or "encrypted name") to say "no-key name" instead when warranted. Link: https://lore.kernel.org/r/20200924042624.98439-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-23Merge tag 'for-5.9-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "syzkaller started to hit us with reports, here's a fix for one type (stack overflow when printing checksums on read error). The other patch is a fix for sysfs object, we have a test for that and it leads to a crash." * tag 'for-5.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix put of uninitialized kobject after seed device delete btrfs: fix overflow when copying corrupt csums for a message
2020-09-23block: mark blkdev_get staticChristoph Hellwig
There are no users outside the core block code left now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23PM: rewrite is_hibernate_resume_dev to not require an inodeChristoph Hellwig
Just check the dev_t to help simplifying the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23ocfs2: cleanup o2hb_region_dev_storeChristoph Hellwig
Use blkdev_get_by_dev instead of igrab (aka open coded bdgrab) + blkdev_get. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23block: move the NEED_PART_SCAN flag to struct gendiskChristoph Hellwig
We can only scan for partitions on the whole disk, so move the flag from struct block_device to struct gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23xfs: clean up calculation of LR header blocksGao Xiang
Let's use DIV_ROUND_UP() to calculate log record header blocks as what did in xlog_get_iclog_buffer_size() and wrap up a common helper for log recovery. Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-23xfs: avoid LR buffer overrun due to crafted h_lenGao Xiang
Currently, crafted h_len has been blocked for the log header of the tail block in commit a70f9fe52daa ("xfs: detect and handle invalid iclog size set by mkfs"). However, each log record could still have crafted h_len and cause log record buffer overrun. So let's check h_len vs buffer size for each log record as well. Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-09-23xfs: don't release log intent items when recovery failsDarrick J. Wong
Nowadays, log recovery will call ->release on the recovered intent items if recovery fails. Therefore, it's redundant to release them from inside the ->recover functions when they're about to return an error. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-23xfs: attach inode to dquot in xfs_bui_item_recoverDarrick J. Wong
In the bmap intent item recovery code, we must be careful to attach the inode to its dquots (if quotas are enabled) so that a change in the shape of the bmap btree doesn't cause the quota counters to be incorrect. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-23xfs: log new intent items created as part of finishing recovered intent itemsDarrick J. Wong
During a code inspection, I found a serious bug in the log intent item recovery code when an intent item cannot complete all the work and decides to requeue itself to get that done. When this happens, the item recovery creates a new incore deferred op representing the remaining work and attaches it to the transaction that it allocated. At the end of _item_recover, it moves the entire chain of deferred ops to the dummy parent_tp that xlog_recover_process_intents passed to it, but fail to log a new intent item for the remaining work before committing the transaction for the single unit of work. xlog_finish_defer_ops logs those new intent items once recovery has finished dealing with the intent items that it recovered, but this isn't sufficient. If the log is forced to disk after a recovered log item decides to requeue itself and the system goes down before we call xlog_finish_defer_ops, the second log recovery will never see the new intent item and therefore has no idea that there was more work to do. It will finish recovery leaving the filesystem in a corrupted state. The same logic applies to /any/ deferred ops added during intent item recovery, not just the one handling the remaining work. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-23xfs: check dabtree node hash values when loading child blocksDarrick J. Wong
When xchk_da_btree_block is loading a non-root dabtree block, we know that the parent block had to have a (hashval, address) pointer to the block that we just loaded. Check that the hashval in the parent matches the block we just loaded. This was found by fuzzing nbtree[3].hashval = ones in xfs/394. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-23xfs: don't free rt blocks when we're doing a REMAP bunmapi callDarrick J. Wong
When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent to be unmapped from the given file fork without the extent being freed. We do this for non-rt files, but we forgot to do this for realtime files. So far this isn't a big deal since nobody makes a bunmapi call to a rt file with the REMAP flag set, but don't leave a logic bomb. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-23xfs: Set xfs_buf's b_ops member when zeroing bitmap/summary filesChandan Babu R
In xfs_growfs_rt(), we enlarge bitmap and summary files by allocating new blocks for both files. For each of the new blocks allocated, we allocate an xfs_buf, zero the payload, log the contents and commit the transaction. Hence these buffers will eventually find themselves appended to list at xfs_ail->ail_buf_list. Later, xfs_growfs_rt() loops across all of the new blocks belonging to the bitmap inode to set the bitmap values to 1. In doing so, it allocates a new transaction and invokes the following sequence of functions, - xfs_rtfree_range() - xfs_rtmodify_range() - xfs_rtbuf_get() We pass '&xfs_rtbuf_ops' as the ops pointer to xfs_trans_read_buf(). - xfs_trans_read_buf() We find the xfs_buf of interest in per-ag hash table, invoke xfs_buf_reverify() which ends up assigning '&xfs_rtbuf_ops' to xfs_buf->b_ops. On the other hand, if xfs_growfs_rt_alloc() had allocated a few blocks for the bitmap inode and returned with an error, all the xfs_bufs corresponding to the new bitmap blocks that have been allocated would continue to be on xfs_ail->ail_buf_list list without ever having a non-NULL value assigned to their b_ops members. An AIL flush operation would then trigger the following warning message to be printed on the console, XFS (loop0): _xfs_buf_ioapply: no buf ops on daddr 0x58 len 8 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ CPU: 3 PID: 449 Comm: xfsaild/loop0 Not tainted 5.8.0-rc4-chandan-00038-g4d8c2b9de9ab-dirty #37 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0x57/0x70 _xfs_buf_ioapply+0x37c/0x3b0 ? xfs_rw_bdev+0x1e0/0x1e0 ? xfs_buf_delwri_submit_buffers+0xd4/0x210 __xfs_buf_submit+0x6d/0x1f0 xfs_buf_delwri_submit_buffers+0xd4/0x210 xfsaild+0x2c8/0x9e0 ? __switch_to_asm+0x42/0x70 ? xfs_trans_ail_cursor_first+0x80/0x80 kthread+0xfe/0x140 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 This message indicates that the xfs_buf had its b_ops member set to NULL. This commit fixes the issue by assigning "&xfs_rtbuf_ops" to b_ops member of each of the xfs_bufs logged by xfs_growfs_rt_alloc(). Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-22fs: remove compat_sys_mountChristoph Hellwig
compat_sys_mount is identical to the regular sys_mount now, so remove it and use the native version everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-22fs,nfs: lift compat nfs4 mount data handling into the nfs codeChristoph Hellwig
There is no reason the generic fs code should bother with NFS specific binary mount data - lift the conversion into nfs4_parse_monolithic instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-22nfs: simplify nfs4_parse_monolithicChristoph Hellwig
Remove a level of indentation for the version 1 mount data parsing, and simplify the NULL data case a little bit as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-22fs: omfs: use kmemdup() rather than kmalloc+memcpyAlex Dewar
Issue identified with Coccinelle. Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>