summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-05-31pstore/ram: Run without kernel crash dump regionKees Cook
The ram pstore backend has always had the crash dumper frontend enabled unconditionally. However, it was possible to effectively disable it by setting a record_size=0. All the machinery would run (storing dumps to the temporary crash buffer), but 0 bytes would ultimately get stored due to there being no przs allocated for dumps. Commit 89d328f637b9 ("pstore/ram: Correctly calculate usable PRZ bytes"), however, assumed that there would always be at least one allocated dprz for calculating the size of the temporary crash buffer. This was, of course, not the case when record_size=0, and would lead to a NULL deref trying to find the dprz buffer size: BUG: unable to handle kernel NULL pointer dereference at (null) ... IP: ramoops_probe+0x285/0x37e (fs/pstore/ram.c:808) cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size; Instead, we need to only enable the frontends based on the success of the prz initialization and only take the needed actions when those zones are available. (This also fixes a possible error in detecting if the ftrace frontend should be enabled.) Reported-and-tested-by: Yaro Slav <yaro330@gmail.com> Fixes: 89d328f637b9 ("pstore/ram: Correctly calculate usable PRZ bytes") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2019-05-31pstore: Set tfm to NULL on free_buf_for_compressionPi-Hsun Shih
Set tfm to NULL on free_buf_for_compression() after crypto_free_comp(). This avoid a use-after-free when allocate_buf_for_compression() and free_buf_for_compression() are called twice. Although free_buf_for_compression() freed the tfm, allocate_buf_for_compression() won't reinitialize the tfm since the tfm pointer is not NULL. Fixes: 95047b0519c1 ("pstore: Refactor compression initialization") Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2019-05-30Merge tag 'for-5.2-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes for bugs reported by users, fuzzing tools and regressions: - fix crashes in relocation: + resuming interrupted balance operation does not properly clean up orphan trees + with enabled qgroups, resuming needs to be more careful about block groups due to limited context when updating qgroups - fsync and logging fixes found by fuzzing - incremental send fixes for no-holes and clone - fix spin lock type used in timer function for zstd" * tag 'for-5.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix race updating log root item during fsync Btrfs: fix wrong ctime and mtime of a directory after log replay Btrfs: fix fsync not persisting changed attributes of a directory btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON() Btrfs: incremental send, fix emission of invalid clone operations Btrfs: incremental send, fix file corruption when no-holes feature is enabled btrfs: correct zstd workspace manager lock to use spin_lock_bh() btrfs: Ensure replaced device doesn't have pending chunk allocation
2019-05-30NFSv4.1: Fix bug only first CB_NOTIFY_LOCK is handledYihao Wu
When a waiter is waked by CB_NOTIFY_LOCK, it will retry nfs4_proc_setlk(). The waiter may fail to nfs4_proc_setlk() and sleep again. However, the waiter is already removed from clp->cl_lock_waitq when handling CB_NOTIFY_LOCK in nfs4_wake_lock_waiter(). So any subsequent CB_NOTIFY_LOCK won't wake this waiter anymore. We should put the waiter back to clp->cl_lock_waitq before retrying. Cc: stable@vger.kernel.org #4.9+ Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-05-30NFSv4.1: Again fix a race where CB_NOTIFY_LOCK fails to wake a waiterYihao Wu
Commit b7dbcc0e433f "NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter" found this bug. However it didn't fix it. This commit replaces schedule_timeout() with wait_woken() and default_wake_function() with woken_wake_function() in function nfs4_retry_setlk() and nfs4_wake_lock_waiter(). wait_woken() uses memory barriers in its implementation to avoid potential race condition when putting a process into sleeping state and then waking it up. Fixes: a1d617d8f134 ("nfs: allow blocking locks to be awoken by lock callbacks") Cc: stable@vger.kernel.org #4.9+ Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-05-30jbd2: fix typo in comment of journal_submit_inode_data_buffersLiu Song
delayed/dealyed Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2019-05-30jbd2: fix some print format mistakesGaowei Pu
There are some print format mistakes in debug messages. Fix them. Signed-off-by: Gaowei Pu <pugaowei@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209Thomas Gleixner
Based on 1 normalized pattern(s): released under gpl v2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 15 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528171438.895196075@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193Thomas Gleixner
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license v 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 45 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 191Thomas Gleixner
Based on 1 normalized pattern(s): licensed under gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 99 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Steve Winslow <swinslow@gmail.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528170027.163048684@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 188Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to free software foundation 51 franklin street fifth floor boston ma 02111 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 27 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528170026.981318839@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 655 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070034.575739538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 173Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license v2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 021110 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070034.485313544@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157Thomas Gleixner
Based on 3 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [graeme] [gregory] [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema] [hk] [hemahk]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1105 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1334 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 021110 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 84 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 142Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation inc 675 mass ave cambridge ma 02139 usa either version 2 of the license or at your option any later version incorporated herein by reference extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 4 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190524100844.465381181@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30f2fs: fix sparse warningChao Yu
make C=2 CHECKFLAGS="-D__CHECK_ENDIAN__" CHECK dir.c dir.c:842:50: warning: cast from restricted __le32 CHECK node.c node.c:2759:40: warning: restricted __le32 degrades to integer Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-30f2fs: fix f2fs_show_options to show nodiscard mount optionSahitya Tummala
Fix f2fs_show_options to show nodiscard mount option. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-30f2fs: add error prints for debugging mount failureSahitya Tummala
Add error prints to get more details on the mount failure. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-30f2fs: fix to do sanity check on segment bitmap of LFS cursegChao Yu
As Jungyeon Reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203233 - Reproduces gcc poc_13.c ./run.sh f2fs - Kernel messages F2FS-fs (sdb): Bitmap was wrongly set, blk:4608 kernel BUG at fs/f2fs/segment.c:2133! RIP: 0010:update_sit_entry+0x35d/0x3e0 Call Trace: f2fs_allocate_data_block+0x16c/0x5a0 do_write_page+0x57/0x100 f2fs_do_write_node_page+0x33/0xa0 __write_node_page+0x270/0x4e0 f2fs_sync_node_pages+0x5df/0x670 f2fs_write_checkpoint+0x364/0x13a0 f2fs_sync_fs+0xa3/0x130 f2fs_do_sync_file+0x1a6/0x810 do_fsync+0x33/0x60 __x64_sys_fsync+0xb/0x10 do_syscall_64+0x43/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The testcase fails because that, in fuzzed image, current segment was allocated with LFS type, its .next_blkoff should point to an unused block address, but actually, its bitmap shows it's not. So during allocation, f2fs crash when setting bitmap. Introducing sanity_check_curseg() to check such inconsistence of current in-used segment. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-30ext4: gracefully handle ext4_break_layouts() failure during truncateJan Kara
ext4_break_layouts() may fail e.g. due to a signal being delivered. Thus we need to handle its failure gracefully and not by taking the filesystem down. Currently ext4_break_layouts() failure is rare but it may become more common once RDMA uses layout leases for handling long-term page pins for DAX mappings. To handle the failure we need to move ext4_break_layouts() earlier during setattr handling before we do hard to undo changes such as modifying inode size. To be able to do that we also have to move some other checks which are better done without holding i_mmap_sem earlier. Reported-and-tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-05-30ext2: add missing brelse() in ext2_new_inode()Chengguang Xu
There is a missing brelse of bitmap_bh in an error path of ext2_new_inode(). Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn> Signed-off-by: Jan Kara <jack@suse.cz>
2019-05-29CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEMRoberto Bergantinos Corpas
In cifs_read_allocate_pages, in case of ENOMEM, we go through whole rdata->pages array but we have failed the allocation before nr_pages, therefore we may end up calling put_page with NULL pointer, causing oops Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2019-05-29ovl: detect overlapping layersAmir Goldstein
Overlapping overlay layers are not supported and can cause unexpected behavior, but overlayfs does not currently check or warn about these configurations. User is not supposed to specify the same directory for upper and lower dirs or for different lower layers and user is not supposed to specify directories that are descendants of each other for overlay layers, but that is exactly what this zysbot repro did: https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000 Moving layer root directories into other layers while overlayfs is mounted could also result in unexpected behavior. This commit places "traps" in the overlay inode hash table. Those traps are dummy overlay inodes that are hashed by the layers root inodes. On mount, the hash table trap entries are used to verify that overlay layers are not overlapping. While at it, we also verify that overlay layers are not overlapping with directories "in-use" by other overlay instances as upperdir/workdir. On lookup, the trap entries are used to verify that overlay layers root inodes have not been moved into other layers after mount. Some examples: $ ./run --ov --samefs -s ... ( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt mount -o bind base/lower lower mount -o bind base/upper upper mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w) $ umount mnt $ mount -t overlay none mnt ... -o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w [ 94.434900] overlayfs: overlapping upperdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w [ 151.350132] overlayfs: conflicting lowerdir path mount: none is already mounted or mnt busy $ mount -t overlay none mnt ... -o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w [ 201.205045] overlayfs: overlapping lowerdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w $ mv base/upper/0/ base/lower/ $ find mnt/0 mnt/0 mnt/0/w find: 'mnt/0/w/work': Too many levels of symbolic links find: 'mnt/0/u': Too many levels of symbolic links Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-05-28dfs_cache: fix a wrong use of kfree in flush_cache_ent()Gen Zhang
In flush_cache_ent(), 'ce->ce_path' is allocated by kstrdup_const(). It should be freed by kfree_const(), rather than kfree(). Signed-off-by: Gen Zhang <blackgod016574@gmail.com> Reviewed-by: Paulo Alcantara <palcantara@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-28fs/cifs/smb2pdu.c: fix buffer free in SMB2_ioctl_freeMurphy Zhou
The 2nd buffer could be NULL even if iov_len is not zero. This can trigger a panic when handling symlinks. It's easy to reproduce with LTP fs_racer scripts[1] which are randomly craete/delete/link files and dirs. Fix this panic by checking if the 2nd buffer is padding before kfree, like what we do in SMB2_open_free. [1] https://github.com/linux-test-project/ltp/tree/master/testcases/kernel/fs/racer Fixes: 2c87d6a94d16 ("cifs: Allocate memory for all iovs in smb2_ioctl") Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
2019-05-28cifs: fix memory leak of pneg_inbuf on -EOPNOTSUPP ioctl caseColin Ian King
Currently in the case where SMB2_ioctl returns the -EOPNOTSUPP error there is a memory leak of pneg_inbuf. Fix this by returning via the out_free_inbuf exit path that will perform the relevant kfree. Addresses-Coverity: ("Resource leak") Fixes: 969ae8e8d4ee ("cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED") CC: Stable <stable@vger.kernel.org> # v5.1+ Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-05-28fscrypt: don't set policy for a dead directoryHongjie Fang
The directory may have been removed when entering fscrypt_ioctl_set_policy(). If so, the empty_dir() check will return error for ext4 file system. ext4_rmdir() sets i_size = 0, then ext4_empty_dir() reports an error because 'inode->i_size < EXT4_DIR_REC_LEN(1) + EXT4_DIR_REC_LEN(2)'. If the fs is mounted with errors=panic, it will trigger a panic issue. Add the check IS_DEADDIR() to fix this problem. Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support") Cc: <stable@vger.kernel.org> # v4.1+ Signed-off-by: Hongjie Fang <hongjiefang@asrmicro.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28ext4: encrypt only up to last block in ext4_bio_write_page()Eric Biggers
As an optimization, don't encrypt blocks fully beyond i_size, since those definitely won't need to be written out. Also add a comment. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. This is based on work by Chandan Rajendra. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28ext4: decrypt only the needed block in __ext4_block_zero_page_range()Chandan Rajendra
In __ext4_block_zero_page_range(), only decrypt the block that actually needs to be decrypted, rather than assuming blocksize == PAGE_SIZE and decrypting the whole page. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com> (EB: rebase onto previous changes, improve the commit message, and use bh_offset()) Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28ext4: decrypt only the needed blocks in ext4_block_write_begin()Chandan Rajendra
In ext4_block_write_begin(), only decrypt the blocks that actually need to be decrypted (up to two blocks which intersect the boundaries of the region that will be written to), rather than assuming blocksize == PAGE_SIZE and decrypting the whole page. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com> (EB: rebase onto previous changes, improve the commit message, and move the check for encrypted inode) Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28ext4: clear BH_Uptodate flag on decryption errorChandan Rajendra
If decryption fails, ext4_block_write_begin() can return with the page's buffer_head marked with the BH_Uptodate flag. This commit clears the BH_Uptodate flag in such cases. Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: decrypt only the needed blocks in __fscrypt_decrypt_bio()Eric Biggers
In __fscrypt_decrypt_bio(), only decrypt the blocks that actually comprise the bio, rather than assuming blocksize == PAGE_SIZE and decrypting the entirety of every page used in the bio. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. This is based on work by Chandan Rajendra. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: support decrypting multiple filesystem blocks per pageEric Biggers
Rename fscrypt_decrypt_page() to fscrypt_decrypt_pagecache_blocks() and redefine its behavior to decrypt all filesystem blocks in the given region of the given page, rather than assuming that the region consists of just one filesystem block. Also remove the 'inode' and 'lblk_num' parameters, since they can be retrieved from the page as it's already assumed to be a pagecache page. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. This is based on work by Chandan Rajendra. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: introduce fscrypt_decrypt_block_inplace()Eric Biggers
Currently fscrypt_decrypt_page() does one of two logically distinct things depending on whether FS_CFLG_OWN_PAGES is set in the filesystem's fscrypt_operations: decrypt a pagecache page in-place, or decrypt a filesystem block in-place in any page. Currently these happen to share the same implementation, but this conflates the notion of blocks and pages. It also makes it so that all callers have to provide inode and lblk_num, when fscrypt could determine these itself for pagecache pages. Therefore, move the FS_CFLG_OWN_PAGES behavior into a new function fscrypt_decrypt_block_inplace(). This mirrors fscrypt_encrypt_block_inplace(). This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: handle blocksize < PAGE_SIZE in fscrypt_zeroout_range()Eric Biggers
Adjust fscrypt_zeroout_range() to encrypt a block at a time rather than a page at a time, so that it works when blocksize < PAGE_SIZE. This isn't optimized for performance, but then again this function already wasn't optimized for performance. As a future optimization, we could submit much larger bios here. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. This is based on work by Chandan Rajendra. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: support encrypting multiple filesystem blocks per pageEric Biggers
Rename fscrypt_encrypt_page() to fscrypt_encrypt_pagecache_blocks() and redefine its behavior to encrypt all filesystem blocks from the given region of the given page, rather than assuming that the region consists of just one filesystem block. Also remove the 'inode' and 'lblk_num' parameters, since they can be retrieved from the page as it's already assumed to be a pagecache page. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. This is based on work by Chandan Rajendra. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: introduce fscrypt_encrypt_block_inplace()Eric Biggers
fscrypt_encrypt_page() behaves very differently depending on whether the filesystem set FS_CFLG_OWN_PAGES in its fscrypt_operations. This makes the function difficult to understand and document. It also makes it so that all callers have to provide inode and lblk_num, when fscrypt could determine these itself for pagecache pages. Therefore, move the FS_CFLG_OWN_PAGES behavior into a new function fscrypt_encrypt_block_inplace(). This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: clean up some BUG_ON()s in block encryption/decryptionEric Biggers
Replace some BUG_ON()s with WARN_ON_ONCE() and returning an error code, and move the check for len divisible by FS_CRYPTO_BLOCK_SIZE into fscrypt_crypt_block() so that it's done for both encryption and decryption, not just encryption. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: rename fscrypt_do_page_crypto() to fscrypt_crypt_block()Eric Biggers
fscrypt_do_page_crypto() only does a single encryption or decryption operation, with a single logical block number (single IV). So it actually operates on a filesystem block, not a "page" per se. To reflect this, rename it to fscrypt_crypt_block(). Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: remove the "write" part of struct fscrypt_ctxEric Biggers
Now that fscrypt_ctx is not used for writes, remove the 'w' fields. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28fscrypt: simplify bounce page handlingEric Biggers
Currently, bounce page handling for writes to encrypted files is unnecessarily complicated. A fscrypt_ctx is allocated along with each bounce page, page_private(bounce_page) points to this fscrypt_ctx, and fscrypt_ctx::w::control_page points to the original pagecache page. However, because writes don't use the fscrypt_ctx for anything else, there's no reason why page_private(bounce_page) can't just point to the original pagecache page directly. Therefore, this patch makes this change. In the process, it also cleans up the API exposed to filesystems that allows testing whether a page is a bounce page, getting the pagecache page from a bounce page, and freeing a bounce page. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-05-28Btrfs: fix race updating log root item during fsyncFilipe Manana
When syncing the log, the final phase of a fsync operation, we need to either create a log root's item or update the existing item in the log tree of log roots, and that depends on the current value of the log root's log_transid - if it's 1 we need to create the log root item, otherwise it must exist already and we update it. Since there is no synchronization between updating the log_transid and checking it for deciding whether the log root's item needs to be created or updated, we end up with a tiny race window that results in attempts to update the item to fail because the item was not yet created: CPU 1 CPU 2 btrfs_sync_log() lock root->log_mutex set log root's log_transid to 1 unlock root->log_mutex btrfs_sync_log() lock root->log_mutex sets log root's log_transid to 2 unlock root->log_mutex update_log_root() sees log root's log_transid with a value of 2 calls btrfs_update_root(), which fails with -EUCLEAN and causes transaction abort Until recently the race lead to a BUG_ON at btrfs_update_root(), but after the recent commit 7ac1e464c4d47 ("btrfs: Don't panic when we can't find a root key") we just abort the current transaction. A sample trace of the BUG_ON() on a SLE12 kernel: ------------[ cut here ]------------ kernel BUG at ../fs/btrfs/root-tree.c:157! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=2048 NUMA pSeries (...) Supported: Yes, External CPU: 78 PID: 76303 Comm: rtas_errd Tainted: G X 4.4.156-94.57-default #1 task: c00000ffa906d010 ti: c00000ff42b08000 task.ti: c00000ff42b08000 NIP: d000000036ae5cdc LR: d000000036ae5cd8 CTR: 0000000000000000 REGS: c00000ff42b0b860 TRAP: 0700 Tainted: G X (4.4.156-94.57-default) MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 22444484 XER: 20000000 CFAR: d000000036aba66c SOFTE: 1 GPR00: d000000036ae5cd8 c00000ff42b0bae0 d000000036bda220 0000000000000054 GPR04: 0000000000000001 0000000000000000 c00007ffff8d37c8 0000000000000000 GPR08: c000000000e19c00 0000000000000000 0000000000000000 3736343438312079 GPR12: 3930373337303434 c000000007a3a800 00000000007fffff 0000000000000023 GPR16: c00000ffa9d26028 c00000ffa9d261f8 0000000000000010 c00000ffa9d2ab28 GPR20: c00000ff42b0bc48 0000000000000001 c00000ff9f0d9888 0000000000000001 GPR24: c00000ffa9d26000 c00000ffa9d261e8 c00000ffa9d2a800 c00000ff9f0d9888 GPR28: c00000ffa9d26028 c00000ffa9d2aa98 0000000000000001 c00000ffa98f5b20 NIP [d000000036ae5cdc] btrfs_update_root+0x25c/0x4e0 [btrfs] LR [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs] Call Trace: [c00000ff42b0bae0] [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs] (unreliable) [c00000ff42b0bba0] [d000000036b53610] btrfs_sync_log+0x2d0/0xc60 [btrfs] [c00000ff42b0bce0] [d000000036b1785c] btrfs_sync_file+0x44c/0x4e0 [btrfs] [c00000ff42b0bd80] [c00000000032e300] vfs_fsync_range+0x70/0x120 [c00000ff42b0bdd0] [c00000000032e44c] do_fsync+0x5c/0xb0 [c00000ff42b0be10] [c00000000032e8dc] SyS_fdatasync+0x2c/0x40 [c00000ff42b0be30] [c000000000009488] system_call+0x3c/0x100 Instruction dump: 7f43d378 4bffebb9 60000000 88d90008 3d220000 e8b90000 3b390009 e87a01f0 e8898e08 e8f90000 4bfd48e5 60000000 <0fe00000> e95b0060 39200004 394a0ea0 ---[ end trace 8f2dc8f919cabab8 ]--- So fix this by doing the check of log_transid and updating or creating the log root's item while holding the root's log_mutex. Fixes: 7237f1833601d ("Btrfs: fix tree logs parallel sync") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-28Btrfs: fix wrong ctime and mtime of a directory after log replayFilipe Manana
When replaying a log that contains a new file or directory name that needs to be added to its parent directory, we end up updating the mtime and the ctime of the parent directory to the current time after we have set their values to the correct ones (set at fsync time), efectivelly losing them. Sample reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ touch /mnt/dir/file # fsync of the directory is optional, not needed $ xfs_io -c fsync /mnt/dir $ xfs_io -c fsync /mnt/dir/file $ stat -c %Y /mnt/dir 1557856079 <power failure> $ sleep 3 $ mount /dev/sdb /mnt $ stat -c %Y /mnt/dir 1557856082 --> should have been 1557856079, the mtime is updated to the current time when replaying the log Fix this by not updating the mtime and ctime to the current time at btrfs_add_link() when we are replaying a log tree. This could be triggered by my recent fsync fuzz tester for fstests, for which an fstests patch exists titled "fstests: generic, fsync fuzz tester with fsstress". Fixes: e02119d5a7b43 ("Btrfs: Add a write ahead tree log to optimize synchronous operations") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-28Btrfs: fix fsync not persisting changed attributes of a directoryFilipe Manana
While logging an inode we follow its ancestors and for each one we mark it as logged in the current transaction, even if we have not logged it. As a consequence if we change an attribute of an ancestor, such as the UID or GID for example, and then explicitly fsync it, we end up not logging the inode at all despite returning success to user space, which results in the attribute being lost if a power failure happens after the fsync. Sample reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ chown 6007:6007 /mnt/dir $ sync $ chown 9003:9003 /mnt/dir $ touch /mnt/dir/file $ xfs_io -c fsync /mnt/dir/file # fsync our directory after fsync'ing the new file, should persist the # new values for the uid and gid. $ xfs_io -c fsync /mnt/dir <power failure> $ mount /dev/sdb /mnt $ stat -c %u:%g /mnt/dir 6007:6007 --> should be 9003:9003, the uid and gid were not persisted, despite the explicit fsync on the directory prior to the power failure Fix this by not updating the logged_trans field of ancestor inodes when logging an inode, since we have not logged them. Let only future calls to btrfs_log_inode() to mark inodes as logged. This could be triggered by my recent fsync fuzz tester for fstests, for which an fstests patch exists titled "fstests: generic, fsync fuzz tester with fsstress". Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-28btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer ↵Qu Wenruo
dereference [BUG] When mounting a fs with reloc tree and has qgroup enabled, it can cause NULL pointer dereference at mount time: BUG: kernel NULL pointer dereference, address: 00000000000000a8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:btrfs_qgroup_add_swapped_blocks+0x186/0x300 [btrfs] Call Trace: replace_path.isra.23+0x685/0x900 [btrfs] merge_reloc_root+0x26e/0x5f0 [btrfs] merge_reloc_roots+0x10a/0x1a0 [btrfs] btrfs_recover_relocation+0x3cd/0x420 [btrfs] open_ctree+0x1bc8/0x1ed0 [btrfs] btrfs_mount_root+0x544/0x680 [btrfs] legacy_get_tree+0x34/0x60 vfs_get_tree+0x2d/0xf0 fc_mount+0x12/0x40 vfs_kern_mount.part.12+0x61/0xa0 vfs_kern_mount+0x13/0x20 btrfs_mount+0x16f/0x860 [btrfs] legacy_get_tree+0x34/0x60 vfs_get_tree+0x2d/0xf0 do_mount+0x81f/0xac0 ksys_mount+0xbf/0xe0 __x64_sys_mount+0x25/0x30 do_syscall_64+0x65/0x240 entry_SYSCALL_64_after_hwframe+0x49/0xbe [CAUSE] In btrfs_recover_relocation(), we don't have enough info to determine which block group we're relocating, but only to merge existing reloc trees. Thus in btrfs_recover_relocation(), rc->block_group is NULL. btrfs_qgroup_add_swapped_blocks() hasn't taken this into consideration, and causes a NULL pointer dereference. The bug is introduced by commit 3d0174f78e72 ("btrfs: qgroup: Only trace data extents in leaves if we're relocating data block group"), and later qgroup refactoring still keeps this optimization. [FIX] Thankfully in the context of btrfs_recover_relocation(), there is no other progress can modify tree blocks, thus those swapped tree blocks pair will never affect qgroup numbers, no matter whatever we set for block->trace_leaf. So we only need to check if @bg is NULL before accessing @bg->flags. Reported-by: Juan Erbes <jerbes@gmail.com> Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1134806 Fixes: 3d0174f78e72 ("btrfs: qgroup: Only trace data extents in leaves if we're relocating data block group") CC: stable@vger.kernel.org # 4.20+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-28btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON()Qu Wenruo
[BUG] When a fs has orphan reloc tree along with unfinished balance: ... item 16 key (TREE_RELOC ROOT_ITEM FS_TREE) itemoff 12090 itemsize 439 generation 12 root_dirid 256 bytenr 300400640 level 1 refs 0 <<< lastsnap 8 byte_limit 0 bytes_used 1359872 flags 0x0(none) uuid 7c48d938-33a3-4aae-ab19-6e5c9d406e46 item 17 key (BALANCE TEMPORARY_ITEM 0) itemoff 11642 itemsize 448 temporary item objectid BALANCE offset 0 balance status flags 14 Then at mount time, we can hit the following kernel BUG_ON(): BTRFS info (device dm-3): relocating block group 298844160 flags metadata|dup ------------[ cut here ]------------ kernel BUG at fs/btrfs/relocation.c:1413! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 897 Comm: btrfs-balance Tainted: G O 5.2.0-rc1-custom #15 RIP: 0010:create_reloc_root+0x1eb/0x200 [btrfs] Call Trace: btrfs_init_reloc_root+0x96/0xb0 [btrfs] record_root_in_trans+0xb2/0xe0 [btrfs] btrfs_record_root_in_trans+0x55/0x70 [btrfs] select_reloc_root+0x7e/0x230 [btrfs] do_relocation+0xc4/0x620 [btrfs] relocate_tree_blocks+0x592/0x6a0 [btrfs] relocate_block_group+0x47b/0x5d0 [btrfs] btrfs_relocate_block_group+0x183/0x2f0 [btrfs] btrfs_relocate_chunk+0x4e/0xe0 [btrfs] btrfs_balance+0x864/0xfa0 [btrfs] balance_kthread+0x3b/0x50 [btrfs] kthread+0x123/0x140 ret_from_fork+0x27/0x50 [CAUSE] In btrfs, reloc trees are used to record swapped tree blocks during balance. Reloc tree either get merged (replace old tree blocks of its parent subvolume) in next transaction if its ref is 1 (fresh). Or is already merged and will be cleaned up if its ref is 0 (orphan). After commit d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots"), reloc tree cleanup is delayed until one block group is balanced. Since fresh reloc roots are recorded during merge, as long as there is no power loss, those orphan reloc roots converted from fresh ones are handled without problem. However when power loss happens, orphan reloc roots can be recorded on-disk, thus at next mount time, we will have orphan reloc roots from on-disk data directly, and ignored by clean_dirty_subvols() routine. Then when background balance starts to balance another block group, and needs to create new reloc root for the same root, btrfs_insert_item() returns -EEXIST, and trigger that BUG_ON(). [FIX] For orphan reloc roots, also queue them to rc->dirty_subvol_roots, so all reloc roots no matter orphan or not, can be cleaned up properly and avoid above BUG_ON(). And to cooperate with above change, clean_dirty_subvols() will check if the queued root is a reloc root or a subvol root. For a subvol root, do the old work, and for a orphan reloc root, clean it up. Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") CC: stable@vger.kernel.org # 5.1 Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-28Btrfs: incremental send, fix emission of invalid clone operationsFilipe Manana
When doing an incremental send we can now issue clone operations with a source range that ends at the source's file eof and with a destination range that ends at an offset smaller then the destination's file eof. If the eof of the source file is not aligned to the sector size of the filesystem, the receiver will get a -EINVAL error when trying to do the operation or, on older kernels, silently corrupt the destination file. The corruption happens on kernels without commit ac765f83f1397646 ("Btrfs: fix data corruption due to cloning of eof block"), while the failure to clone happens on kernels with that commit. Example reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt/sdb $ xfs_io -f -c "pwrite -S 0xb1 0 2M" /mnt/sdb/foo $ xfs_io -f -c "pwrite -S 0xc7 0 2M" /mnt/sdb/bar $ xfs_io -f -c "pwrite -S 0x4d 0 2M" /mnt/sdb/baz $ xfs_io -f -c "pwrite -S 0xe2 0 2M" /mnt/sdb/zoo $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base $ btrfs send -f /tmp/base.send /mnt/sdb/base $ xfs_io -c "reflink /mnt/sdb/bar 1560K 500K 100K" /mnt/sdb/bar $ xfs_io -c "reflink /mnt/sdb/bar 1560K 0 100K" /mnt/sdb/zoo $ xfs_io -c "truncate 550K" /mnt/sdb/bar $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr $ btrfs send -f /tmp/incr.send -p /mnt/sdb/base /mnt/sdb/incr $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ btrfs receive -f /tmp/base.send /mnt/sdc $ btrfs receive -vv -f /tmp/incr.send /mnt/sdc (...) truncate bar size=563200 utimes bar clone zoo - source=bar source offset=512000 offset=0 length=51200 ERROR: failed to clone extents to zoo Invalid argument The failure happens because the clone source range ends at the eof of file bar, 563200, which is not aligned to the filesystems sector size (4Kb in this case), and the destination range ends at offset 0 + 51200, which is less then the size of the file zoo (2Mb). So fix this by detecting such case and instead of issuing a clone operation for the whole range, do a clone operation for smaller range that is sector size aligned followed by a write operation for the block containing the eof. Here we will always be pessimistic and assume the destination filesystem of the send stream has the largest possible sector size (64Kb), since we have no way of determining it. This fixes a recent regression introduced in kernel 5.2-rc1. Fixes: 040ee6120cb6706 ("Btrfs: send, improve clone range") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>