summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-05-21Merge tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Seven smb3 fixes: one for stable, three others fix problems found in testing handle leases, and a compounded request fix" * tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6: Fix KASAN identified use-after-free issue. Defer close only when lease is enabled. Fix kernel oops when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. cifs: Fix inconsistent indenting cifs: fix memory leak in smb2_copychunk_range SMB3: incorrect file id in requests compounded with open cifs: remove deadstore in cifs_close_all_deferred_files()
2021-05-21Merge branch 'for-v5.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo fix from Eric Biederman: "During the merge window an issue with si_perf and the siginfo ABI came up. The alpha and sparc siginfo structure layout had changed with the addition of SIGTRAP TRAP_PERF and the new field si_perf. The reason only alpha and sparc were affected is that they are the only architectures that use si_trapno. Looking deeper it was discovered that si_trapno is used for only a few select signals on alpha and sparc, and that none of the other _sigfault fields past si_addr are used at all. Which means technically no regression on alpha and sparc. While the alignment concerns might be dismissed the abuse of si_errno by SIGTRAP TRAP_PERF does have the potential to cause regressions in existing userspace. While we still have time before userspace starts using and depending on the new definition siginfo for SIGTRAP TRAP_PERF this set of changes cleans up siginfo_t. - The si_trapno field is demoted from magic alpha and sparc status and made an ordinary union member of the _sigfault member of siginfo_t. Without moving it of course. - si_perf is replaced with si_perf_data and si_perf_type ending the abuse of si_errno. - Unnecessary additions to signalfd_siginfo are removed" * 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo signal: Deliver all of the siginfo perf data in _perf signal: Factor force_sig_perf out of perf_sigtrap signal: Implement SIL_FAULT_TRAPNO siginfo: Move si_trapno inside the union inside _si_fault
2021-05-20Fix KASAN identified use-after-free issue.Rohith Surabattula
[ 612.157429] ================================================================== [ 612.158275] BUG: KASAN: use-after-free in process_one_work+0x90/0x9b0 [ 612.158801] Read of size 8 at addr ffff88810a31ca60 by task kworker/2:9/2382 [ 612.159611] CPU: 2 PID: 2382 Comm: kworker/2:9 Tainted: G OE 5.13.0-rc2+ #98 [ 612.159623] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 [ 612.159640] Workqueue: 0x0 (deferredclose) [ 612.159669] Call Trace: [ 612.159685] dump_stack+0xbb/0x107 [ 612.159711] print_address_description.constprop.0+0x18/0x140 [ 612.159733] ? process_one_work+0x90/0x9b0 [ 612.159743] ? process_one_work+0x90/0x9b0 [ 612.159754] kasan_report.cold+0x7c/0xd8 [ 612.159778] ? lock_is_held_type+0x80/0x130 [ 612.159789] ? process_one_work+0x90/0x9b0 [ 612.159812] kasan_check_range+0x145/0x1a0 [ 612.159834] process_one_work+0x90/0x9b0 [ 612.159877] ? pwq_dec_nr_in_flight+0x110/0x110 [ 612.159914] ? spin_bug+0x90/0x90 [ 612.159967] worker_thread+0x3b6/0x6c0 [ 612.160023] ? process_one_work+0x9b0/0x9b0 [ 612.160038] kthread+0x1dc/0x200 [ 612.160051] ? kthread_create_worker_on_cpu+0xd0/0xd0 [ 612.160092] ret_from_fork+0x1f/0x30 [ 612.160399] Allocated by task 2358: [ 612.160757] kasan_save_stack+0x1b/0x40 [ 612.160768] __kasan_kmalloc+0x9b/0xd0 [ 612.160778] cifs_new_fileinfo+0xb0/0x960 [cifs] [ 612.161170] cifs_open+0xadf/0xf20 [cifs] [ 612.161421] do_dentry_open+0x2aa/0x6b0 [ 612.161432] path_openat+0xbd9/0xfa0 [ 612.161441] do_filp_open+0x11d/0x230 [ 612.161450] do_sys_openat2+0x115/0x240 [ 612.161460] __x64_sys_openat+0xce/0x140 When mod_delayed_work is called to modify the delay of pending work, it might return false and queue a new work when pending work is already scheduled or when try to grab pending work failed. So, Increase the reference count when new work is scheduled to avoid use-after-free. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-20Merge tag 'char-misc-5.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here is a big set of char/misc/other driver fixes for 5.13-rc3. The majority here is the fallout of the umn.edu re-review of all prior submissions. That resulted in a bunch of reverts along with the "correct" changes made, such that there is no regression of any of the potential fixes that were made by those individuals. I would like to thank the over 80 different developers who helped with the review and fixes for this mess. Other than that, there's a few habanna driver fixes for reported issues, and some dyndbg fixes for reported problems. All of these have been in linux-next for a while with no reported problems" * tag 'char-misc-5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (82 commits) misc: eeprom: at24: check suspend status before disable regulator uio_hv_generic: Fix another memory leak in error handling paths uio_hv_generic: Fix a memory leak in error handling paths uio/uio_pci_generic: fix return value changed in refactoring Revert "Revert "ALSA: usx2y: Fix potential NULL pointer dereference"" dyndbg: drop uninformative vpr_info dyndbg: avoid calling dyndbg_emit_prefix when it has no work binder: Return EFAULT if we fail BINDER_ENABLE_ONEWAY_SPAM_DETECTION cdrom: gdrom: initialize global variable at init time brcmfmac: properly check for bus register errors Revert "brcmfmac: add a check for the status of usb_register" video: imsttfb: check for ioremap() failures Revert "video: imsttfb: fix potential NULL pointer dereferences" net: liquidio: Add missing null pointer checks Revert "net: liquidio: fix a NULL pointer dereference" media: gspca: properly check for errors in po1030_probe() Revert "media: gspca: Check the return value of write_bridge for timeout" media: gspca: mt9m111: Check write_bridge for timeout Revert "media: gspca: mt9m111: Check write_bridge for timeout" media: dvb: Add check on sp8870_readreg return ...
2021-05-20Merge tag 'quota_for_v5.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fixes from Jan Kara: "The most important part in the pull is disablement of the new syscall quotactl_path() which was added in rc1. The reason is some people at LWN discussion pointed out dirfd would be useful for this path based syscall and Christian Brauner agreed. Without dirfd it may be indeed problematic for containers. So let's just disable the syscall for now when it doesn't have users yet so that we have more time to mull over how to best specify the filesystem we want to work on" * tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Disable quotactl_path syscall quota: Use 'hlist_for_each_entry' to simplify code
2021-05-19Defer close only when lease is enabled.Rohith Surabattula
When smb2 lease parameter is disabled on server. Server grants batch oplock instead of RHW lease by default on open, inode page cache needs to be zapped immediatley upon close as cache is not valid. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-19Fix kernel oops when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.Rohith Surabattula
Removed oplock_break_received flag which was added to achieve synchronization between oplock handler and open handler by earlier commit. It is not needed because there is an existing lock open_file_lock to achieve the same. find_readable_file takes open_file_lock and then traverses the openFileList. Similarly, cifs_oplock_break while closing the deferred handle (i.e cifsFileInfo_put) takes open_file_lock and then sends close to the server. Added comments for better readability. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-19cifs: Fix inconsistent indentingJiapeng Chong
Eliminate the follow smatch warning: fs/cifs/fs_context.c:1148 smb3_fs_context_parse_param() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-19cifs: fix memory leak in smb2_copychunk_rangeRonnie Sahlberg
When using smb2_copychunk_range() for large ranges we will run through several iterations of a loop calling SMB2_ioctl() but never actually free the returned buffer except for the final iteration. This leads to memory leaks everytime a large copychunk is requested. Fixes: 9bf0c9cd4314 ("CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files") Cc: <stable@vger.kernel.org> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-19Merge tag 'fs.idmapped.mount_setattr.v5.13-rc3' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux Pull mount_setattr fix from Christian Brauner: "This makes an underlying idmapping assumption more explicit. We currently don't have any filesystems that support idmapped mounts which are mountable inside a user namespace, i.e. where s_user_ns != init_user_ns. That was a deliberate decision for now as userns root can just mount the filesystem themselves. Express this restriction explicitly and enforce it until there's a real use-case for this. This way we can notice it and will have a chance to adapt and audit our translation helpers and fstests appropriately if we need to support such filesystems" * tag 'fs.idmapped.mount_setattr.v5.13-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: fs/mount_setattr: tighten permission checks
2021-05-19SMB3: incorrect file id in requests compounded with openSteve French
See MS-SMB2 3.2.4.1.4, file ids in compounded requests should be set to 0xFFFFFFFFFFFFFFFF (we were treating it as u32 not u64 and setting it incorrectly). Signed-off-by: Steve French <stfrench@microsoft.com> Reported-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
2021-05-18signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfoEric W. Biederman
With the addition of ssi_perf_data and ssi_perf_type struct signalfd_siginfo is dangerously close to running out of space. All that remains is just enough space for two additional 64bit fields. A practice of adding all possible siginfo_t fields into struct singalfd_siginfo can not be supported as adding the missing fields ssi_lower, ssi_upper, and ssi_pkey would require two 64bit fields and one 32bit fields. In practice the fields ssi_perf_data and ssi_perf_type can never be used by signalfd as the signal that generates them always delivers them synchronously to the thread that triggers them. Therefore until someone actually needs the fields ssi_perf_data and ssi_perf_type in signalfd_siginfo remove them. This leaves a bit more room for future expansion. v1: https://lkml.kernel.org/r/20210503203814.25487-12-ebiederm@xmission.com v2: https://lkml.kernel.org/r/20210505141101.11519-12-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-5-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18signal: Deliver all of the siginfo perf data in _perfEric W. Biederman
Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t. Note: The data field in the perf data structures in a u64 to allow a pointer to be encoded without needed to implement a 32bit and 64bit version of the same structure. There already exists a 32bit and 64bit versions siginfo_t, and the 32bit version can not include a 64bit member as it only has 32bit alignment. So unsigned long is used in siginfo_t instead of a u64 as unsigned long can encode a pointer on all architectures linux supports. v1: https://lkml.kernel.org/r/m11rarqqx2.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210503203814.25487-10-ebiederm@xmission.com v3: https://lkml.kernel.org/r/20210505141101.11519-11-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-4-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18signal: Implement SIL_FAULT_TRAPNOEric W. Biederman
Now that si_trapno is part of the union in _si_fault and available on all architectures, add SIL_FAULT_TRAPNO and update siginfo_layout to return SIL_FAULT_TRAPNO when the code assumes si_trapno is valid. There is room for future changes to reduce when si_trapno is valid but this is all that is needed to make si_trapno and the other members of the the union in _sigfault mutually exclusive. Update the code that uses siginfo_layout to deal with SIL_FAULT_TRAPNO and have the same code ignore si_trapno in in all other cases. v1: https://lkml.kernel.org/r/m1o8dvs7s7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-6-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-2-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-17Merge tag 'for-5.13-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes: - fix fiemap to print extents that could get misreported due to internal extent splitting and logical merging for fiemap output - fix RCU stalls during delayed iputs - fix removed dentries still existing after log is synced" * tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix removed dentries still existing after log is synced btrfs: return whole extents in fiemap btrfs: avoid RCU stalls while running delayed iputs btrfs: return 0 for dev_extent_hole_check_zoned hole_start in case of error
2021-05-16cifs: remove deadstore in cifs_close_all_deferred_files()wenhuizhang
Deadstore detected by Lukas Bulwahn's CodeChecker Tool (ELISA group). line 741 struct cifsInodeInfo *cinode; line 747 cinode = CIFS_I(d_inode(cfile->dentry)); could be deleted. cinode on filesystem should not be deleted when files are closed, they are representations of some data fields on a physical disk, thus no further action is required. The virtual inode on vfs will be handled by vfs automatically, and the denotation is inode, which is different from the cinode. Signed-off-by: wenhuizhang <wenhui@gwmail.gwu.edu> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "13 patches. Subsystems affected by this patch series: resource, squashfs, hfsplus, modprobe, and mm (hugetlb, slub, userfaultfd, ksm, pagealloc, kasan, pagemap, and ioremap)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/ioremap: fix iomap_max_page_shift docs: admin-guide: update description for kernel.modprobe sysctl hfsplus: prevent corruption in shrinking truncate mm/filemap: fix readahead return types kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled mm: fix struct page layout on 32-bit systems ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()" userfaultfd: release page in error path to avoid BUG_ON squashfs: fix divide error in calculate_skip() kernel/resource: fix return code check in __request_free_mem_region mm, slub: move slub_debug static key enabling outside slab_mutex mm/hugetlb: fix cow where page writtable in child mm/hugetlb: fix F_SEAL_FUTURE_WRITE
2021-05-15Merge tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Just a few minor fixes/changes: - Fix issue with double free race for linked timeout completions - Fix reference issue with timeouts - Remove last few places that make SQPOLL special, since it's just an io thread now. - Bump maximum allowed registered buffers, as we don't allocate as much anymore" * tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block: io_uring: increase max number of reg buffers io_uring: further remove sqpoll limits on opcodes io_uring: fix ltout double free on completion race io_uring: fix link timeout refs
2021-05-15Merge tag 'erofs-for-5.13-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "This mainly fixes 1 lcluster-sized pclusters for the big pcluster feature, which can be forcely generated by mkfs as a specific on-disk case for per-(sub)file compression strategies but missed to handle in runtime properly. Also, documentation updates are included to fix the broken illustration due to the ReST conversion by accident and complete the big pcluster introduction. Summary: - update documentation to fix the broken illustration due to ReST conversion by accident at that time and complete the big pcluster introduction - fix 1 lcluster-sized pclusters for the big pcluster feature" * tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix 1 lcluster-sized pcluster for big pcluster erofs: update documentation about data compression erofs: fix broken illustration in documentation
2021-05-15Merge tag 'dax-fixes-5.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fixes from Dan Williams: "A fix for a hang condition due to missed wakeups in the filesystem-dax core when exercised by virtiofs. This bug has been there from the beginning, but the condition has not triggered on other filesystems since they hold a lock over invalidation events" * tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Wake up all waiters after invalidating dax entry dax: Add a wakeup mode parameter to put_unlocked_entry() dax: Add an enum for specifying dax wakup mode
2021-05-14hfsplus: prevent corruption in shrinking truncateJouni Roivas
I believe there are some issues introduced by commit 31651c607151 ("hfsplus: avoid deadlock on file truncation") HFS+ has extent records which always contains 8 extents. In case the first extent record in catalog file gets full, new ones are allocated from extents overflow file. In case shrinking truncate happens to middle of an extent record which locates in extents overflow file, the logic in hfsplus_file_truncate() was changed so that call to hfs_brec_remove() is not guarded any more. Right action would be just freeing the extents that exceed the new size inside extent record by calling hfsplus_free_extents(), and then check if the whole extent record should be removed. However since the guard (blk_cnt > start) is now after the call to hfs_brec_remove(), this has unfortunate effect that the last matching extent record is removed unconditionally. To reproduce this issue, create a file which has at least 10 extents, and then perform shrinking truncate into middle of the last extent record, so that the number of remaining extents is not under or divisible by 8. This causes the last extent record (8 extents) to be removed totally instead of truncating into middle of it. Thus this causes corruption, and lost data. Fix for this is simply checking if the new truncated end is below the start of this extent record, making it safe to remove the full extent record. However call to hfs_brec_remove() can't be moved to it's previous place since we're dropping ->tree_lock and it can cause a race condition and the cached info being invalidated possibly corrupting the node data. Another issue is related to this one. When entering into the block (blk_cnt > start) we are not holding the ->tree_lock. We break out from the loop not holding the lock, but hfs_find_exit() does unlock it. Not sure if it's possible for someone else to take the lock under our feet, but it can cause hard to debug errors and premature unlocking. Even if there's no real risk of it, the locking should still always be kept in balance. Thus taking the lock now just before the check. Link: https://lkml.kernel.org/r/20210429165139.3082828-1-jouni.roivas@tuxera.com Fixes: 31651c607151f ("hfsplus: avoid deadlock on file truncation") Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14mm/filemap: fix readahead return typesMatthew Wilcox (Oracle)
A readahead request will not allocate more memory than can be represented by a size_t, even on systems that have HIGHMEM available. Change the length functions from returning an loff_t to a size_t. Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org Fixes: 32c0a6bcaa1f57 ("btrfs: add and use readahead_batch_length") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14squashfs: fix divide error in calculate_skip()Phillip Lougher
Sysbot has reported a "divide error" which has been identified as being caused by a corrupted file_size value within the file inode. This value has been corrupted to a much larger value than expected. Calculate_skip() is passed i_size_read(inode) >> msblk->block_log. Due to the file_size value corruption this overflows the int argument/variable in that function, leading to the divide error. This patch changes the function to use u64. This will accommodate any unexpectedly large values due to corruption. The value returned from calculate_skip() is clamped to be never more than SQUASHFS_CACHED_BLKS - 1, or 7. So file_size corruption does not lead to an unexpectedly large return result here. Link: https://lkml.kernel.org/r/20210507152618.9447-1-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: <syzbot+e8f781243ce16ac2f962@syzkaller.appspotmail.com> Reported-by: <syzbot+7b98870d4fec9447b951@syzkaller.appspotmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14mm/hugetlb: fix F_SEAL_FUTURE_WRITEPeter Xu
Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2. Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to hugetlbfs, which I can easily verify using the memfd_test program, which seems that the program is hardly run with hugetlbfs pages (as by default shmem). Meanwhile I found another probably even more severe issue on that hugetlb fork won't wr-protect child cow pages, so child can potentially write to parent private pages. Patch 2 addresses that. After this series applied, "memfd_test hugetlbfs" should start to pass. This patch (of 2): F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day. There is a test program for that and it fails constantly. $ ./memfd_test hugetlbfs memfd-hugetlb: CREATE memfd-hugetlb: BASIC memfd-hugetlb: SEAL-WRITE memfd-hugetlb: SEAL-FUTURE-WRITE mmap() didn't fail as expected Aborted (core dumped) I think it's probably because no one is really running the hugetlbfs test. Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in shmem_mmap(). Generalize a helper for that. Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Hugh Dickins <hughd@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14Merge tag 'f2fs-5.13-rc1-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "This fixes some critical bugs such as memory leak in compression flows, kernel panic when handling errors, and swapon failure due to newly added condition check" * tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: return EINVAL for hole cases in swap file f2fs: avoid swapon failure by giving a warning first f2fs: compress: fix to assign cc.cluster_idx correctly f2fs: compress: fix race condition of overwrite vs truncate f2fs: compress: fix to free compress page correctly f2fs: support iflag change given the mask f2fs: avoid null pointer access when handling IPU error
2021-05-14io_uring: increase max number of reg buffersPavel Begunkov
Since recent changes instead of storing a large array of struct io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and we should not to so worry about restricting max number of registererd buffer slots, increase the limit 4 times. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14io_uring: further remove sqpoll limits on opcodesPavel Begunkov
There are three types of requests that left disabled for sqpoll, namely epoll ctx, statx, and resources update. Since SQPOLL task is now closely mimics a userspace thread, remove the restrictions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/909b52d70c45636d8d7897582474ea5aab5eed34.1620990306.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14io_uring: fix ltout double free on completion racePavel Begunkov
Always remove linked timeout on io_link_timeout_fn() from the master request link list, otherwise we may get use-after-free when first io_link_timeout_fn() puts linked timeout in the fail path, and then will be found and put on master's free. Cc: stable@vger.kernel.org # 5.10+ Fixes: 90cd7e424969d ("io_uring: track link timeout's master explicitly") Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14btrfs: fix removed dentries still existing after log is syncedFilipe Manana
When we move one inode from one directory to another and both the inode and its previous parent directory were logged before, we are not supposed to have the dentry for the old parent if we have a power failure after the log is synced. Only the new dentry is supposed to exist. Generally this works correctly, however there is a scenario where this is not currently working, because the old parent of the file/directory that was moved is not authoritative for a range that includes the dir index and dir item keys of the old dentry. This case is better explained with the following example and reproducer: # The test requires a very specific layout of keys and items in the # fs/subvolume btree to trigger the bug. So we want to make sure that # on whatever platform we are, we have the same leaf/node size. # # Currently in btrfs the node/leaf size can not be smaller than the page # size (but it can be greater than the page size). So use the largest # supported node/leaf size (64K). $ mkfs.btrfs -f -n 65536 /dev/sdc $ mount /dev/sdc /mnt # "testdir" is inode 257. $ mkdir /mnt/testdir $ chmod 755 /mnt/testdir # Create several empty files to have the directory "testdir" with its # items spread over several leaves (7 in this case). $ for ((i = 1; i <= 1200; i++)); do echo -n > /mnt/testdir/file$i done # Create our test directory "dira", inode number 1458, which gets all # its items in leaf 7. # # The BTRFS_DIR_ITEM_KEY item for inode 257 ("testdir") that points to # the entry named "dira" is in leaf 2, while the BTRFS_DIR_INDEX_KEY # item that points to that entry is in leaf 3. # # For this particular filesystem node size (64K), file count and file # names, we endup with the directory entry items from inode 257 in # leaves 2 and 3, as previously mentioned - what matters for triggering # the bug exercised by this test case is that those items are not placed # in leaf 1, they must be placed in a leaf different from the one # containing the inode item for inode 257. # # The corresponding BTRFS_DIR_ITEM_KEY and BTRFS_DIR_INDEX_KEY items for # the parent inode (257) are the following: # # item 460 key (257 DIR_ITEM 3724298081) itemoff 48344 itemsize 34 # location key (1458 INODE_ITEM 0) type DIR # transid 6 data_len 0 name_len 4 # name: dira # # and: # # item 771 key (257 DIR_INDEX 1202) itemoff 36673 itemsize 34 # location key (1458 INODE_ITEM 0) type DIR # transid 6 data_len 0 name_len 4 # name: dira $ mkdir /mnt/testdir/dira # Make sure everything done so far is durably persisted. $ sync # Now do a change to inode 257 ("testdir") that does not result in # COWing leaves 2 and 3 - the leaves that contain the directory items # pointing to inode 1458 (directory "dira"). # # Changing permissions, the owner/group, updating or adding a xattr, # etc, will not change (COW) leaves 2 and 3. So for the sake of # simplicity change the permissions of inode 257, which results in # updating its inode item and therefore change (COW) only leaf 1. $ chmod 700 /mnt/testdir # Now fsync directory inode 257. # # Since only the first leaf was changed/COWed, we log the inode item of # inode 257 and only the dentries found in the first leaf, all have a # key type of BTRFS_DIR_ITEM_KEY, and no keys of type # BTRFS_DIR_INDEX_KEY, because they sort after the former type and none # exist in the first leaf. # # We also log 3 items that represent ranges for dir items and dir # indexes for which the log is authoritative: # # 1) a key of type BTRFS_DIR_LOG_ITEM_KEY, which indicates the log is # authoritative for all BTRFS_DIR_ITEM_KEY keys that have an offset # in the range [0, 2285968570] (the offset here is the crc32c of the # dentry's name). The value 2285968570 corresponds to the offset of # the first key of leaf 2 (which is of type BTRFS_DIR_ITEM_KEY); # # 2) a key of type BTRFS_DIR_LOG_ITEM_KEY, which indicates the log is # authoritative for all BTRFS_DIR_ITEM_KEY keys that have an offset # in the range [4293818216, (u64)-1] (the offset here is the crc32c # of the dentry's name). The value 4293818216 corresponds to the # offset of the highest key of type BTRFS_DIR_ITEM_KEY plus 1 # (4293818215 + 1), which is located in leaf 2; # # 3) a key of type BTRFS_DIR_LOG_INDEX_KEY, with an offset of 1203, # which indicates the log is authoritative for all keys of type # BTRFS_DIR_INDEX_KEY that have an offset in the range # [1203, (u64)-1]. The value 1203 corresponds to the offset of the # last key of type BTRFS_DIR_INDEX_KEY plus 1 (1202 + 1), which is # located in leaf 3; # # Also, because "testdir" is a directory and inode 1458 ("dira") is a # child directory, we log inode 1458 too. $ xfs_io -c "fsync" /mnt/testdir # Now move "dira", inode 1458, to be a child of the root directory # (inode 256). # # Because this inode was previously logged, when "testdir" was fsynced, # the log is updated so that the old inode reference, referring to inode # 257 as the parent, is deleted and the new inode reference, referring # to inode 256 as the parent, is added to the log. $ mv /mnt/testdir/dira /mnt # Now change some file and fsync it. This guarantees the log changes # made by the previous move/rename operation are persisted. We do not # need to do any special modification to the file, just any change to # any file and sync the log. $ xfs_io -c "pwrite -S 0xab 0 64K" -c "fsync" /mnt/testdir/file1 # Simulate a power failure and then mount again the filesystem to # replay the log tree. We want to verify that we are able to mount the # filesystem, meaning log replay was successful, and that directory # inode 1458 ("dira") only has inode 256 (the filesystem's root) as # its parent (and no longer a child of inode 257). # # It used to happen that during log replay we would end up having # inode 1458 (directory "dira") with 2 hard links, being a child of # inode 257 ("testdir") and inode 256 (the filesystem's root). This # resulted in the tree checker detecting the issue and causing the # mount operation to fail (with -EIO). # # This happened because in the log we have the new name/parent for # inode 1458, which results in adding the new dentry with inode 256 # as the parent, but the previous dentry, under inode 257 was never # removed - this is because the ranges for dir items and dir indexes # of inode 257 for which the log is authoritative do not include the # old dir item and dir index for the dentry of inode 257 referring to # inode 1458: # # - for dir items, the log is authoritative for the ranges # [0, 2285968570] and [4293818216, (u64)-1]. The dir item at inode 257 # pointing to inode 1458 has a key of (257 DIR_ITEM 3724298081), as # previously mentioned, so the dir item is not deleted when the log # replay procedure processes the authoritative ranges, as 3724298081 # is outside both ranges; # # - for dir indexes, the log is authoritative for the range # [1203, (u64)-1], and the dir index item of inode 257 pointing to # inode 1458 has a key of (257 DIR_INDEX 1202), as previously # mentioned, so the dir index item is not deleted when the log # replay procedure processes the authoritative range. <power failure> $ mount /dev/sdc /mnt mount: /mnt: can't read superblock on /dev/sdc. $ dmesg (...) [87849.840509] BTRFS info (device sdc): start tree-log replay [87849.875719] BTRFS critical (device sdc): corrupt leaf: root=5 block=30539776 slot=554 ino=1458, invalid nlink: has 2 expect no more than 1 for dir [87849.878084] BTRFS info (device sdc): leaf 30539776 gen 7 total ptrs 557 free space 2092 owner 5 [87849.879516] BTRFS info (device sdc): refs 1 lock_owner 0 current 2099108 [87849.880613] item 0 key (1181 1 0) itemoff 65275 itemsize 160 [87849.881544] inode generation 6 size 0 mode 100644 [87849.882692] item 1 key (1181 12 257) itemoff 65258 itemsize 17 (...) [87850.562549] item 556 key (1458 12 257) itemoff 16017 itemsize 14 [87850.563349] BTRFS error (device dm-0): block=30539776 write time tree block corruption detected [87850.564386] ------------[ cut here ]------------ [87850.564920] WARNING: CPU: 3 PID: 2099108 at fs/btrfs/disk-io.c:465 csum_one_extent_buffer+0xed/0x100 [btrfs] [87850.566129] Modules linked in: btrfs dm_zero dm_snapshot (...) [87850.573789] CPU: 3 PID: 2099108 Comm: mount Not tainted 5.12.0-rc8-btrfs-next-86 #1 (...) [87850.587481] Call Trace: [87850.587768] btree_csum_one_bio+0x244/0x2b0 [btrfs] [87850.588354] ? btrfs_bio_fits_in_stripe+0xd8/0x110 [btrfs] [87850.589003] btrfs_submit_metadata_bio+0xb7/0x100 [btrfs] [87850.589654] submit_one_bio+0x61/0x70 [btrfs] [87850.590248] submit_extent_page+0x91/0x2f0 [btrfs] [87850.590842] write_one_eb+0x175/0x440 [btrfs] [87850.591370] ? find_extent_buffer_nolock+0x1c0/0x1c0 [btrfs] [87850.592036] btree_write_cache_pages+0x1e6/0x610 [btrfs] [87850.592665] ? free_debug_processing+0x1d5/0x240 [87850.593209] do_writepages+0x43/0xf0 [87850.593798] ? __filemap_fdatawrite_range+0xa4/0x100 [87850.594391] __filemap_fdatawrite_range+0xc5/0x100 [87850.595196] btrfs_write_marked_extents+0x68/0x160 [btrfs] [87850.596202] btrfs_write_and_wait_transaction.isra.0+0x4d/0xd0 [btrfs] [87850.597377] btrfs_commit_transaction+0x794/0xca0 [btrfs] [87850.598455] ? _raw_spin_unlock_irqrestore+0x32/0x60 [87850.599305] ? kmem_cache_free+0x15a/0x3d0 [87850.600029] btrfs_recover_log_trees+0x346/0x380 [btrfs] [87850.601021] ? replay_one_extent+0x7d0/0x7d0 [btrfs] [87850.601988] open_ctree+0x13c9/0x1698 [btrfs] [87850.602846] btrfs_mount_root.cold+0x13/0xed [btrfs] [87850.603771] ? kmem_cache_alloc_trace+0x7c9/0x930 [87850.604576] ? vfs_parse_fs_string+0x5d/0xb0 [87850.605293] ? kfree+0x276/0x3f0 [87850.605857] legacy_get_tree+0x30/0x50 [87850.606540] vfs_get_tree+0x28/0xc0 [87850.607163] fc_mount+0xe/0x40 [87850.607695] vfs_kern_mount.part.0+0x71/0x90 [87850.608440] btrfs_mount+0x13b/0x3e0 [btrfs] (...) [87850.629477] ---[ end trace 68802022b99a1ea0 ]--- [87850.630849] BTRFS: error (device sdc) in btrfs_commit_transaction:2381: errno=-5 IO failure (Error while writing out transaction) [87850.632422] BTRFS warning (device sdc): Skipping commit of aborted transaction. [87850.633416] BTRFS: error (device sdc) in cleanup_transaction:1978: errno=-5 IO failure [87850.634553] BTRFS: error (device sdc) in btrfs_replay_log:2431: errno=-5 IO failure (Failed to recover log tree) [87850.637529] BTRFS error (device sdc): open_ctree failed In this example the inode we moved was a directory, so it was easy to detect the problem because directories can only have one hard link and the tree checker immediately detects that. If the moved inode was a file, then the log replay would succeed and we would end up having both the new hard link (/mnt/foo) and the old hard link (/mnt/testdir/foo) present, but only the new one should be present. Fix this by forcing re-logging of the old parent directory when logging the new name during a rename operation. This ensures we end up with a log that is authoritative for a range covering the keys for the old dentry, therefore causing the old dentry do be deleted when replaying the log. A test case for fstests will follow up soon. Fixes: 64d6b281ba4db0 ("btrfs: remove unnecessary check_parent_dirs_for_sync()") CC: stable@vger.kernel.org # 5.12+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-14btrfs: return whole extents in fiemapBoris Burkov
`xfs_io -c 'fiemap <off> <len>' <file>` can give surprising results on btrfs that differ from xfs. btrfs prints out extents trimmed to fit the user input. If the user's fiemap request has an offset, then rather than returning each whole extent which intersects that range, we also trim the start extent to not have start < off. Documentation in filesystems/fiemap.txt and the xfs_io man page suggests that returning the whole extent is expected. Some cases which all yield the same fiemap in xfs, but not btrfs: dd if=/dev/zero of=$f bs=4k count=1 sudo xfs_io -c 'fiemap 0 1024' $f 0: [0..7]: 26624..26631 sudo xfs_io -c 'fiemap 2048 1024' $f 0: [4..7]: 26628..26631 sudo xfs_io -c 'fiemap 2048 4096' $f 0: [4..7]: 26628..26631 sudo xfs_io -c 'fiemap 3584 512' $f 0: [7..7]: 26631..26631 sudo xfs_io -c 'fiemap 4091 5' $f 0: [7..6]: 26631..26630 I believe this is a consequence of the logic for merging contiguous extents represented by separate extent items. That logic needs to track the last offset as it loops through the extent items, which happens to pick up the start offset on the first iteration, and trim off the beginning of the full extent. To fix it, start `off` at 0 rather than `start` so that we keep the iteration/merging intact without cutting off the start of the extent. after the fix, all the above commands give: 0: [0..7]: 26624..26631 The merging logic is exercised by fstest generic/483, and I have written a new fstest for checking we don't have backwards or zero-length fiemaps for cases like those above. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-14btrfs: avoid RCU stalls while running delayed iputsJosef Bacik
Generally a delayed iput is added when we might do the final iput, so usually we'll end up sleeping while processing the delayed iputs naturally. However there's no guarantee of this, especially for small files. In production we noticed 5 instances of RCU stalls while testing a kernel release overnight across 1000 machines, so this is relatively common: host count: 5 rcu: INFO: rcu_sched self-detected stall on CPU rcu: ....: (20998 ticks this GP) idle=59e/1/0x4000000000000002 softirq=12333372/12333372 fqs=3208 (t=21031 jiffies g=27810193 q=41075) NMI backtrace for cpu 1 CPU: 1 PID: 1713 Comm: btrfs-cleaner Kdump: loaded Not tainted 5.6.13-0_fbk12_rc1_5520_gec92bffc1ec9 #1 Call Trace: <IRQ> dump_stack+0x50/0x70 nmi_cpu_backtrace.cold.6+0x30/0x65 ? lapic_can_unplug_cpu.cold.30+0x40/0x40 nmi_trigger_cpumask_backtrace+0xba/0xca rcu_dump_cpu_stacks+0x99/0xc7 rcu_sched_clock_irq.cold.90+0x1b2/0x3a3 ? trigger_load_balance+0x5c/0x200 ? tick_sched_do_timer+0x60/0x60 ? tick_sched_do_timer+0x60/0x60 update_process_times+0x24/0x50 tick_sched_timer+0x37/0x70 __hrtimer_run_queues+0xfe/0x270 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x5e/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:queued_spin_lock_slowpath+0x17d/0x1b0 RSP: 0018:ffffc9000da5fe48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: 0000000000000000 RBX: ffff889fa81d0cd8 RCX: 0000000000000029 RDX: ffff889fff86c0c0 RSI: 0000000000080000 RDI: ffff88bfc2da7200 RBP: ffff888f2dcdd768 R08: 0000000001040000 R09: 0000000000000000 R10: 0000000000000001 R11: ffffffff82a55560 R12: ffff88bfc2da7200 R13: 0000000000000000 R14: ffff88bff6c2a360 R15: ffffffff814bd870 ? kzalloc.constprop.57+0x30/0x30 list_lru_add+0x5a/0x100 inode_lru_list_add+0x20/0x40 iput+0x1c1/0x1f0 run_delayed_iput_locked+0x46/0x90 btrfs_run_delayed_iputs+0x3f/0x60 cleaner_kthread+0xf2/0x120 kthread+0x10b/0x130 Fix this by adding a cond_resched_lock() to the loop processing delayed iputs so we can avoid these sort of stalls. CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Rik van Riel <riel@surriel.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-14btrfs: return 0 for dev_extent_hole_check_zoned hole_start in case of errorJohannes Thumshirn
Commit 7000babddac6 ("btrfs: assign proper values to a bool variable in dev_extent_hole_check_zoned") assigned false to the hole_start parameter of dev_extent_hole_check_zoned(). The hole_start parameter is not boolean and returns the start location of the found hole. Fixes: 7000babddac6 ("btrfs: assign proper values to a bool variable in dev_extent_hole_check_zoned") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-13fs: ecryptfs: remove BUG_ON from crypt_scatterlistPhillip Potter
crypt_stat memory itself is allocated when inode is created, in ecryptfs_alloc_inode, which returns NULL on failure and is handled by callers, which would prevent us getting to this point. It then calls ecryptfs_init_crypt_stat which allocates crypt_stat->tfm checking for and likewise handling allocation failure. Finally, crypt_stat->flags has ECRYPTFS_STRUCT_INITIALIZED merged into it in ecryptfs_init_crypt_stat as well. Simply put, the conditions that the BUG_ON checks for will never be triggered, as to even get to this function, the relevant conditions will have already been fulfilled (or the inode allocation would fail in the first place and thus no call to this function or those above it). Cc: Tyler Hicks <code@tyhicks.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20210503115736.2104747-50-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-13Revert "ecryptfs: replace BUG_ON with error handling code"Greg Kroah-Hartman
This reverts commit 2c2a7552dd6465e8fde6bc9cccf8d66ed1c1eb72. Because of recent interactions with developers from @umn.edu, all commits from them have been recently re-reviewed to ensure if they were correct or not. Upon review, this commit was found to be incorrect for the reasons below, so it must be reverted. It will be fixed up "correctly" in a later kernel change. The original commit log for this change was incorrect, no "error handling code" was added, things will blow up just as badly as before if any of these cases ever were true. As this BUG_ON() never fired, and most of these checks are "obviously" never going to be true, let's just revert to the original code for now until this gets unwound to be done correctly in the future. Cc: Aditya Pakki <pakki001@umn.edu> Fixes: 2c2a7552dd64 ("ecryptfs: replace BUG_ON with error handling code") Cc: stable <stable@vger.kernel.org> Acked-by: Tyler Hicks <code@tyhicks.com> Link: https://lore.kernel.org/r/20210503115736.2104747-49-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-13erofs: fix 1 lcluster-sized pcluster for big pclusterGao Xiang
If the 1st NONHEAD lcluster of a pcluster isn't CBLKCNT lcluster type rather than a HEAD or PLAIN type instead, which means its pclustersize _must_ be 1 lcluster (since its uncompressed size < 2 lclusters), as illustrated below: HEAD HEAD / PLAIN lcluster type ____________ ____________ |_:__________|_________:__| file data (uncompressed) . . .____________. |____________| pcluster data (compressed) Such on-disk case was explained before [1] but missed to be handled properly in the runtime implementation. It can be observed if manually generating 1 lcluster-sized pcluster with 2 lclusters (thus CBLKCNT doesn't exist.) Let's fix it now. [1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <xiang@kernel.org>
2021-05-12f2fs: return EINVAL for hole cases in swap fileJaegeuk Kim
This tries to fix xfstests/generic/495. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-12fs/mount_setattr: tighten permission checksChristian Brauner
We currently don't have any filesystems that support idmapped mounts which are mountable inside a user namespace. That was a deliberate decision for now as a userns root can just mount the filesystem themselves. So enforce this restriction explicitly until there's a real use-case for this. This way we can notice it and will have a chance to adapt and audit our translation helpers and fstests appropriately if we need to support such filesystems. Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org CC: linux-fsdevel@vger.kernel.org Suggested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-05-11f2fs: avoid swapon failure by giving a warning firstJaegeuk Kim
The final solution can be migrating blocks to form a section-aligned file internally. Meanwhile, let's ask users to do that when preparing the swap file initially like: 1) create() 2) ioctl(F2FS_IOC_SET_PIN_FILE) 3) fallocate() Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 36e4d95891ed ("f2fs: check if swapfile is section-alligned") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11f2fs: compress: fix to assign cc.cluster_idx correctlyChao Yu
In f2fs_destroy_compress_ctx(), after f2fs_destroy_compress_ctx(), cc.cluster_idx will be cleared w/ NULL_CLUSTER, f2fs_cluster_blocks() may check wrong cluster metadata, fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11f2fs: compress: fix race condition of overwrite vs truncateChao Yu
pos_fsstress testcase complains a panic as belew: ------------[ cut here ]------------ kernel BUG at fs/f2fs/compress.c:1082! invalid opcode: 0000 [#1] SMP PTI CPU: 4 PID: 2753477 Comm: kworker/u16:2 Tainted: G OE 5.12.0-rc1-custom #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Workqueue: writeback wb_workfn (flush-252:16) RIP: 0010:prepare_compress_overwrite+0x4c0/0x760 [f2fs] Call Trace: f2fs_prepare_compress_overwrite+0x5f/0x80 [f2fs] f2fs_write_cache_pages+0x468/0x8a0 [f2fs] f2fs_write_data_pages+0x2a4/0x2f0 [f2fs] do_writepages+0x38/0xc0 __writeback_single_inode+0x44/0x2a0 writeback_sb_inodes+0x223/0x4d0 __writeback_inodes_wb+0x56/0xf0 wb_writeback+0x1dd/0x290 wb_workfn+0x309/0x500 process_one_work+0x220/0x3c0 worker_thread+0x53/0x420 kthread+0x12f/0x150 ret_from_fork+0x22/0x30 The root cause is truncate() may race with overwrite as below, so that one reference count left in page can not guarantee the page attaching in mapping tree all the time, after truncation, later find_lock_page() may return NULL pointer. - prepare_compress_overwrite - f2fs_pagecache_get_page - unlock_page - f2fs_setattr - truncate_setsize - truncate_inode_page - delete_from_page_cache - find_lock_page Fix this by avoiding referencing updated page. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11f2fs: compress: fix to free compress page correctlyChao Yu
In error path of f2fs_write_compressed_pages(), it needs to call f2fs_compress_free_page() to release temporary page. Fixes: 5e6bbde95982 ("f2fs: introduce mempool for {,de}compress intermediate page allocation") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11f2fs: support iflag change given the maskJaegeuk Kim
In f2fs_fileattr_set(), if (!fa->flags_valid) mask &= FS_COMMON_FL; In this case, we can set supported flags by mask only instead of BUG_ON. /* Flags shared betwen flags/xflags */ (FS_SYNC_FL | FS_IMMUTABLE_FL | FS_APPEND_FL | \ FS_NODUMP_FL | FS_NOATIME_FL | FS_DAX_FL | \ FS_PROJINHERIT_FL) Fixes: 9b1bb01c8ae7 ("f2fs: convert to fileattr") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11f2fs: avoid null pointer access when handling IPU errorJaegeuk Kim
Unable to handle kernel NULL pointer dereference at virtual address 000000000000001a pc : f2fs_inplace_write_data+0x144/0x208 lr : f2fs_inplace_write_data+0x134/0x208 Call trace: f2fs_inplace_write_data+0x144/0x208 f2fs_do_write_data_page+0x270/0x770 f2fs_write_single_data_page+0x47c/0x830 __f2fs_write_data_pages+0x444/0x98c f2fs_write_data_pages.llvm.16514453770497736882+0x2c/0x38 do_writepages+0x58/0x118 __writeback_single_inode+0x44/0x300 writeback_sb_inodes+0x4b8/0x9c8 wb_writeback+0x148/0x42c wb_do_writeback+0xc8/0x390 wb_workfn+0xb0/0x2f4 process_one_work+0x1fc/0x444 worker_thread+0x268/0x4b4 kthread+0x13c/0x158 ret_from_fork+0x10/0x18 Fixes: 955772787667 ("f2fs: drop inplace IO if fs status is abnormal") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11Merge tag 'for-5.13-rc1-part2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "Handle transaction start error in btrfs_fileattr_set() This is fix for code introduced by the new fileattr merge" * tag 'for-5.13-rc1-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: handle transaction start error in btrfs_fileattr_set
2021-05-11btrfs: handle transaction start error in btrfs_fileattr_setRitesh Harjani
Add error handling in btrfs_fileattr_set in case of an error while starting a transaction. This fixes btrfs/232 which otherwise used to fail with below signature on Power. btrfs/232 [ 1119.474650] run fstests btrfs/232 at 2021-04-21 02:21:22 <...> [ 1366.638585] BUG: Unable to handle kernel data access on read at 0xffffffffffffff86 [ 1366.638768] Faulting instruction address: 0xc0000000009a5c88 cpu 0x0: Vector: 380 (Data SLB Access) at [c000000014f177b0] pc: c0000000009a5c88: btrfs_update_root_times+0x58/0xc0 lr: c0000000009a5c84: btrfs_update_root_times+0x54/0xc0 <...> pid = 24881, comm = fsstress btrfs_update_inode+0xa0/0x140 btrfs_fileattr_set+0x5d0/0x6f0 vfs_fileattr_set+0x2a8/0x390 do_vfs_ioctl+0x1290/0x1ac0 sys_ioctl+0x6c/0x120 system_call_exception+0x3d4/0x410 system_call_common+0xec/0x278 Fixes: 97fc29775487 ("btrfs: convert to fileattr") Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-10Merge tag 'for-5.13-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "First batch of various fixes, here's a list of notable ones: - fix unmountable seed device after fstrim - fix silent data loss in zoned mode due to ordered extent splitting - fix race leading to unpersisted data and metadata on fsync - fix deadlock when cloning inline extents and using qgroups" * tag 'for-5.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: initialize return variable in cleanup_free_space_cache_v1 btrfs: zoned: sanity check zone type btrfs: fix unmountable seed device after fstrim btrfs: fix deadlock when cloning inline extents and using qgroups btrfs: fix race leading to unpersisted data and metadata on fsync btrfs: do not consider send context as valid when trying to flush qgroups btrfs: zoned: fix silent data loss after failure splitting ordered extent
2021-05-10quota: Use 'hlist_for_each_entry' to simplify codeChristophe JAILLET
Use 'hlist_for_each_entry' instead of hand writing it. This saves a few lines of code. Link: https://lore.kernel.org/r/f82d3e33964dcbd2aac19866735e0a8381c8a735.1619599407.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jan Kara <jack@suse.cz>
2021-05-09Merge tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small SMB3 chmultichannel related changesets (also for stable) from the SMB3 test event this week. The other fixes are still in review/testing" * tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6: smb3: if max_channels set to more than one channel request multichannel smb3: do not attempt multichannel to server which does not support it smb3: when mounting with multichannel include it in requested capabilities
2021-05-08io_uring: fix link timeout refsPavel Begunkov
WARNING: CPU: 0 PID: 10242 at lib/refcount.c:28 refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 RIP: 0010:refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 Call Trace: __refcount_sub_and_test include/linux/refcount.h:283 [inline] __refcount_dec_and_test include/linux/refcount.h:315 [inline] refcount_dec_and_test include/linux/refcount.h:333 [inline] io_put_req fs/io_uring.c:2140 [inline] io_queue_linked_timeout fs/io_uring.c:6300 [inline] __io_queue_sqe+0xbef/0xec0 fs/io_uring.c:6354 io_submit_sqe fs/io_uring.c:6534 [inline] io_submit_sqes+0x2bbd/0x7c50 fs/io_uring.c:6660 __do_sys_io_uring_enter fs/io_uring.c:9240 [inline] __se_sys_io_uring_enter+0x256/0x1d60 fs/io_uring.c:9182 io_link_timeout_fn() should put only one reference of the linked timeout request, however in case of racing with the master request's completion first io_req_complete() puts one and then io_put_req_deferred() is called. Cc: stable@vger.kernel.org # 5.12+ Fixes: 9ae1f8dd372e0 ("io_uring: fix inconsistent lock state") Reported-by: syzbot+a2910119328ce8e7996f@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff51018ff29de5ffa76f09273ef48cb24c720368.1620417627.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-08Merge tag 'kbuild-v5.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - Convert sh and sparc to use generic shell scripts to generate the syscall headers - refactor .gitignore files - Update kernel/config_data.gz only when the content of the .config is really changed, which avoids the unneeded re-link of vmlinux - move "remove stale files" workarounds to scripts/remove-stale-files - suppress unused-but-set-variable warnings by default for Clang as well - fix locale setting LANG=C to LC_ALL=C - improve 'make distclean' - always keep intermediate objects from scripts/link-vmlinux.sh - move IF_ENABLED out of <linux/kconfig.h> to make it self-contained - misc cleanups * tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits) linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h> kbuild: Don't remove link-vmlinux temporary files on exit/signal kbuild: remove the unneeded comments for external module builds kbuild: make distclean remove tag files in sub-directories kbuild: make distclean work against $(objtree) instead of $(srctree) kbuild: refactor modname-multi by using suffix-search kbuild: refactor fdtoverlay rule kbuild: parameterize the .o part of suffix-search arch: use cross_compiling to check whether it is a cross build or not kbuild: remove ARCH=sh64 support from top Makefile .gitignore: prefix local generated files with a slash kbuild: replace LANG=C with LC_ALL=C Makefile: Move -Wno-unused-but-set-variable out of GCC only block kbuild: add a script to remove stale generated files kbuild: update config_data.gz only when the content of .config is changed .gitignore: ignore only top-level modules.builtin .gitignore: move tags and TAGS close to other tag files kernel/.gitgnore: remove stale timeconst.h and hz.bc usr/include: refactor .gitignore genksyms: fix stale comment ...