Age | Commit message (Collapse) | Author |
|
Pull vfs fixes from Al Viro:
"No common topic, just assorted fixes"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fuse: fix the ->direct_IO() treatment of iov_iter
fs: fix cast in fsparam_u32hex() macro
vboxsf: Fix the check for the old binary mount-arguments struct
|
|
Pull io_uring fixes from Jens Axboe:
"A few fixes - most of them regression fixes from this cycle, but also
a few stable heading fixes, and a build fix for the included demo tool
since some systems now actually have gettid() available"
* tag 'io_uring-5.9-2020-09-22' of git://git.kernel.dk/linux-block:
io_uring: fix openat/openat2 unified prep handling
io_uring: mark statx/files_update/epoll_ctl as non-SQPOLL
tools/io_uring: fix compile breakage
io_uring: don't use retry based buffered reads for non-async bdev
io_uring: don't re-setup vecs/iter in io_resumit_prep() is already there
io_uring: don't run task work on an exiting task
io_uring: drop 'ctx' ref on task work cancelation
io_uring: grab any needed state during defer prep
|
|
The following test case leads to NULL kobject free error:
mount seed /mnt
add sprout to /mnt
umount /mnt
mount sprout to /mnt
delete seed
kobject: '(null)' (00000000dd2b87e4): is not initialized, yet kobject_put() is being called.
WARNING: CPU: 1 PID: 15784 at lib/kobject.c:736 kobject_put+0x80/0x350
RIP: 0010:kobject_put+0x80/0x350
::
Call Trace:
btrfs_sysfs_remove_devices_dir+0x6e/0x160 [btrfs]
btrfs_rm_device.cold+0xa8/0x298 [btrfs]
btrfs_ioctl+0x206c/0x22a0 [btrfs]
ksys_ioctl+0xe2/0x140
__x64_sys_ioctl+0x1e/0x29
do_syscall_64+0x96/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4047c6288b
::
This is because, at the end of the seed device-delete, we try to remove
the seed's devid sysfs entry. But for the seed devices under the sprout
fs, we don't initialize the devid kobject yet. So add a kobject state
check, which takes care of the bug.
Fixes: 668e48af7a94 ("btrfs: sysfs, add devid/dev_state kobject and device attributes")
CC: stable@vger.kernel.org # 5.6+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Now that there's a library function that calculates the SHA-256 digest
of a buffer in one step, use it instead of sha256_init() +
sha256_update() + sha256_final().
Link: https://lore.kernel.org/r/20200917045341.324996-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
fscrypt_set_test_dummy_encryption() requires that the optional argument
to the test_dummy_encryption mount option be specified as a substring_t.
That doesn't work well with filesystems that use the new mount API,
since the new way of parsing mount options doesn't use substring_t.
Make it take the argument as a 'const char *' instead.
Instead of moving the match_strdup() into the callers in ext4 and f2fs,
make them just use arg->from directly. Since the pattern is
"test_dummy_encryption=%s", the argument will be null-terminated.
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.
Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing. When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory. This isn't actually used to do any
encryption, however, since the directory is still unencrypted! Instead,
->i_crypt_info is only used for inheriting the encryption policy.
One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy". In
commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required. However, actually the nonce only
ends up being used to derive a key that is never used.
Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about. For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption. That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.
Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories. This involves:
- Adding a function fscrypt_policy_to_inherit() which returns the
encryption policy to inherit from a directory. This can be a real
policy, a dummy policy, or no policy.
- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
of an inode.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
In preparation for moving the logic for "get the encryption policy
inherited by new files in this directory" to a single place, make
fscrypt_prepare_symlink() a regular function rather than an inline
function that wraps __fscrypt_prepare_symlink().
This way, the new function fscrypt_policy_to_inherit() won't need to be
exported to filesystems.
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
The fscrypt UAPI header defines fscrypt_policy to fscrypt_policy_v1,
for source compatibility with old userspace programs.
Internally, the kernel doesn't want that compatibility definition.
Instead, fscrypt_private.h #undefs it and re-defines it to a union.
That works for now. However, in order to add
fscrypt_operations::get_dummy_policy(), we'll need to forward declare
'union fscrypt_policy' in include/linux/fscrypt.h. That would cause
build errors because "fscrypt_policy" is used in ioctl numbers.
To avoid this, modify the UAPI header to make the fscrypt_policy
compatibility definition conditional on !__KERNEL__, and make the ioctls
use fscrypt_policy_v1 instead of fscrypt_policy.
Note that this doesn't change the actual ioctl numbers.
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
fscrypt_get_encryption_info() has never actually been safe to call in a
context that needs GFP_NOFS, since it calls crypto_alloc_skcipher().
crypto_alloc_skcipher() isn't GFP_NOFS-safe, even if called under
memalloc_nofs_save(). This is because it may load kernel modules, and
also because it internally takes crypto_alg_sem. Other tasks can do
GFP_KERNEL allocations while holding crypto_alg_sem for write.
The use of fscrypt_init_mutex isn't GFP_NOFS-safe either.
So, stop pretending that fscrypt_get_encryption_info() is nofs-safe.
I.e., when it allocates memory, just use GFP_KERNEL instead of GFP_NOFS.
Note, another reason to do this is that GFP_NOFS is deprecated in favor
of using memalloc_nofs_save() in the proper places.
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Now that all filesystems have been converted to use
fscrypt_prepare_new_inode(), the encryption key for new symlink inodes
is now already set up whenever we try to encrypt the symlink target.
Enforce this rather than try to set up the key again when it may be too
late to do so safely.
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Now that all filesystems have been converted to use
fscrypt_prepare_new_inode() and fscrypt_set_context(),
fscrypt_inherit_context() is no longer used. Remove it.
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Now that a fscrypt_info may be set up for inodes that are currently
being created and haven't yet had an inode number assigned, avoid
logging confusing messages about "inode 0".
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Convert ubifs to use the new functions fscrypt_prepare_new_inode() and
fscrypt_set_context().
Unlike ext4 and f2fs, this doesn't appear to fix any deadlock bug. But
it does shorten the code slightly and get all filesystems using the same
helper functions, so that fscrypt_inherit_context() can be removed.
It also fixes an incorrect error code where ubifs returned EPERM instead
of the expected ENOKEY.
Link: https://lore.kernel.org/r/20200917041136.178600-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Convert f2fs to use the new functions fscrypt_prepare_new_inode() and
fscrypt_set_context(). This avoids calling
fscrypt_get_encryption_info() from under f2fs_lock_op(), which can
deadlock because fscrypt_get_encryption_info() isn't GFP_NOFS-safe.
For more details about this problem, see the earlier patch
"fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()".
This also fixes a f2fs-specific deadlock when the filesystem is mounted
with '-o test_dummy_encryption' and a file is created in an unencrypted
directory other than the root directory:
INFO: task touch:207 blocked for more than 30 seconds.
Not tainted 5.9.0-rc4-00099-g729e3d0919844 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:touch state:D stack: 0 pid: 207 ppid: 167 flags:0x00000000
Call Trace:
[...]
lock_page include/linux/pagemap.h:548 [inline]
pagecache_get_page+0x25e/0x310 mm/filemap.c:1682
find_or_create_page include/linux/pagemap.h:348 [inline]
grab_cache_page include/linux/pagemap.h:424 [inline]
f2fs_grab_cache_page fs/f2fs/f2fs.h:2395 [inline]
f2fs_grab_cache_page fs/f2fs/f2fs.h:2373 [inline]
__get_node_page.part.0+0x39/0x2d0 fs/f2fs/node.c:1350
__get_node_page fs/f2fs/node.c:35 [inline]
f2fs_get_node_page+0x2e/0x60 fs/f2fs/node.c:1399
read_inline_xattr+0x88/0x140 fs/f2fs/xattr.c:288
lookup_all_xattrs+0x1f9/0x2c0 fs/f2fs/xattr.c:344
f2fs_getxattr+0x9b/0x160 fs/f2fs/xattr.c:532
f2fs_get_context+0x1e/0x20 fs/f2fs/super.c:2460
fscrypt_get_encryption_info+0x9b/0x450 fs/crypto/keysetup.c:472
fscrypt_inherit_context+0x2f/0xb0 fs/crypto/policy.c:640
f2fs_init_inode_metadata+0xab/0x340 fs/f2fs/dir.c:540
f2fs_add_inline_entry+0x145/0x390 fs/f2fs/inline.c:621
f2fs_add_dentry+0x31/0x80 fs/f2fs/dir.c:757
f2fs_do_add_link+0xcd/0x130 fs/f2fs/dir.c:798
f2fs_add_link fs/f2fs/f2fs.h:3234 [inline]
f2fs_create+0x104/0x290 fs/f2fs/namei.c:344
lookup_open.isra.0+0x2de/0x500 fs/namei.c:3103
open_last_lookups+0xa9/0x340 fs/namei.c:3177
path_openat+0x8f/0x1b0 fs/namei.c:3365
do_filp_open+0x87/0x130 fs/namei.c:3395
do_sys_openat2+0x96/0x150 fs/open.c:1168
[...]
That happened because f2fs_add_inline_entry() locks the directory
inode's page in order to add the dentry, then f2fs_get_context() tries
to lock it recursively in order to read the encryption xattr. This
problem is specific to "test_dummy_encryption" because normally the
directory's fscrypt_info would be set up prior to
f2fs_add_inline_entry() in order to encrypt the new filename.
Regardless, the new design fixes this test_dummy_encryption deadlock as
well as potential deadlocks with fs reclaim, by setting up any needed
fscrypt_info structs prior to taking so many locks.
The test_dummy_encryption deadlock was reported by Daniel Rosenberg.
Reported-by: Daniel Rosenberg <drosen@google.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Convert ext4 to use the new functions fscrypt_prepare_new_inode() and
fscrypt_set_context(). This avoids calling
fscrypt_get_encryption_info() from within a transaction, which can
deadlock because fscrypt_get_encryption_info() isn't GFP_NOFS-safe.
For more details about this problem, see the earlier patch
"fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()".
Link: https://lore.kernel.org/r/20200917041136.178600-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
To compute a new inode's xattr credits, we need to know whether the
inode will be encrypted or not. When we switch to use the new helper
function fscrypt_prepare_new_inode(), we won't find out whether the
inode will be encrypted until slightly later than is currently the case.
That will require moving the code block that computes the xattr credits.
To make this easier and reduce the length of __ext4_new_inode(), move
this code block into a new function ext4_xattr_credits_for_new_inode().
Link: https://lore.kernel.org/r/20200917041136.178600-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
fscrypt_get_encryption_info() is intended to be GFP_NOFS-safe. But
actually it isn't, since it uses functions like crypto_alloc_skcipher()
which aren't GFP_NOFS-safe, even when called under memalloc_nofs_save().
Therefore it can deadlock when called from a context that needs
GFP_NOFS, e.g. during an ext4 transaction or between f2fs_lock_op() and
f2fs_unlock_op(). This happens when creating a new encrypted file.
We can't fix this by just not setting up the key for new inodes right
away, since new symlinks need their key to encrypt the symlink target.
So we need to set up the new inode's key before starting the
transaction. But just calling fscrypt_get_encryption_info() earlier
doesn't work, since it assumes the encryption context is already set,
and the encryption context can't be set until the transaction.
The recently proposed fscrypt support for the ceph filesystem
(https://lkml.kernel.org/linux-fscrypt/20200821182813.52570-1-jlayton@kernel.org/T/#u)
will have this same ordering problem too, since ceph will need to
encrypt new symlinks before setting their encryption context.
Finally, f2fs can deadlock when the filesystem is mounted with
'-o test_dummy_encryption' and a new file is created in an existing
unencrypted directory. Similarly, this is caused by holding too many
locks when calling fscrypt_get_encryption_info().
To solve all these problems, add new helper functions:
- fscrypt_prepare_new_inode() sets up a new inode's encryption key
(fscrypt_info), using the parent directory's encryption policy and a
new random nonce. It neither reads nor writes the encryption context.
- fscrypt_set_context() persists the encryption context of a new inode,
using the information from the fscrypt_info already in memory. This
replaces fscrypt_inherit_context().
Temporarily keep fscrypt_inherit_context() around until all filesystems
have been converted to use fscrypt_set_context().
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
reiserfs_read_locked_inode() didn't initialize key length properly. Use
_make_cpu_key() macro for key initialization so that all key member are
properly initialized.
CC: stable@vger.kernel.org
Reported-by: syzbot+d94d02749498bb7bab4b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
udf_process_sequence() allocates temporary array for processing
partition descriptors on volume which it fails to free. Free the array
when it is not needed anymore.
Fixes: 7b78fd02fb19 ("udf: Fix handling of Partition Descriptors")
CC: stable@vger.kernel.org
Reported-by: syzbot+128f4dd6e796c98b3760@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
After commit 9293fcfbc181 ("udf: Remove struct ustr as non-needed
intermediate storage"), the variable ret is being initialized with
'-ENOMEM' that is meaningless. So remove it.
Link: https://lore.kernel.org/r/20200922081322.70535-1-jingxiangfeng@huawei.com
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The following sequence of commands,
mkfs.xfs -f -m reflink=0 -r rtdev=/dev/loop1,size=10M /dev/loop0
mount -o rtdev=/dev/loop1 /dev/loop0 /mnt
xfs_growfs /mnt
... causes the following call trace to be printed on the console,
XFS: Assertion failed: (bip->bli_flags & XFS_BLI_STALE) || (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF), file: fs/xfs/xfs_buf_item.c, line: 331
Call Trace:
xfs_buf_item_format+0x632/0x680
? kmem_alloc_large+0x29/0x90
? kmem_alloc+0x70/0x120
? xfs_log_commit_cil+0x132/0x940
xfs_log_commit_cil+0x26f/0x940
? xfs_buf_item_init+0x1ad/0x240
? xfs_growfs_rt_alloc+0x1fc/0x280
__xfs_trans_commit+0xac/0x370
xfs_growfs_rt_alloc+0x1fc/0x280
xfs_growfs_rt+0x1a0/0x5e0
xfs_file_ioctl+0x3fd/0xc70
? selinux_file_ioctl+0x174/0x220
ksys_ioctl+0x87/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x3e/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This occurs because the buffer being formatted has the value of
XFS_BLFT_UNKNOWN_BUF assigned to the 'type' subfield of
bip->bli_formats->blf_flags.
This commit fixes the issue by assigning one of XFS_BLFT_RTSUMMARY_BUF
and XFS_BLFT_RTBITMAP_BUF to the 'type' subfield of
bip->bli_formats->blf_flags before committing the corresponding
transaction.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
The inode extent truncate path unmaps extents from the inode block
mapping, finishes deferred ops to free the associated extents and
then explicitly rolls the transaction before processing the next
extent. The latter extent roll is spurious as xfs_defer_finish()
always returns a clean transaction and automatically relogs inodes
attached to the transaction (with lock_flags == 0). This can
unnecessarily increase the number of log ticket regrants that occur
during a long running truncate operation. Remove the explicit
transaction roll.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
A mirror index is always of type u32.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pass the full length to iomap_zero() and dax_iomap_zero(), and have
them return how many bytes they actually handled. This is preparatory
work for handling THP, although it looks like DAX could actually take
advantage of it if there's a larger contiguous area.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
iomap_write_end cannot return an error, so switch it to return
size_t instead of int and remove the error checking from the callers.
Also convert the arguments to size_t from unsigned int, in case anyone
ever wants to support a page size larger than 2GB.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Size the uptodate array dynamically to support larger pages in the
page cache. With a 64kB page, we're only saving 8 bytes per page today,
but with a 2MB maximum page size, we'd have to allocate more than 4kB
per page. Add a few debugging assertions.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Now that the bitmap is protected by a spinlock, we can use the
more efficient bitmap ops instead of individual test/set bit ops.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
We can skip most of the initialisation, although spinlocks still
need explicit initialisation as architectures may use a non-zero
value to indicate unlocked. The comment is no longer useful as
attach_page_private() handles the refcount now.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
This helper is useful for both THPs and for supporting block size larger
than page size. Convert all users that I could find (we have a few
different ways of writing this idiom, and I may have missed some).
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
If iomap_unshare_actor() unshares to an inline iomap, the page was
not being flushed. block_write_end() and __iomap_write_end() already
contain flushes, so adding it to iomap_write_end_inline() seems like
the best place. That means we can remove it from iomap_write_actor().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
While it is true that reading from an unmirrored source always uses
index 0, that is no longer true for mirrored sources when we fail over.
Fixes: 563c53e73b8b ("NFS: Fix flexfiles read failover")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The hash_lock field of the cache structure was a leftover
of a previous iteration of the code. It is now unused,
so remove it.
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Convert the uses of fallthrough comments to fallthrough macro. Please see
commit 294f69e662d1 ("compiler_attributes.h: Add 'fallthrough' pseudo
keyword for switch/case use") for detail.
Signed-off-by: Hongxiang Lou <louhongxiang@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The following oops is seen during xfstest/565 when the 'test'
(source of the copy) is NFS4.0 and 'scratch' (destination) is NFS4.2
[ 59.692458] run fstests generic/565 at 2020-08-01 05:50:35
[ 60.613588] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 60.624970] #PF: supervisor read access in kernel mode
[ 60.627671] #PF: error_code(0x0000) - not-present page
[ 60.630347] PGD 0 P4D 0
[ 60.631853] Oops: 0000 [#1] SMP PTI
[ 60.634086] CPU: 6 PID: 2828 Comm: xfs_io Kdump: loaded Not tainted 5.8.0-rc3 #1
[ 60.637676] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 60.639901] RIP: 0010:nfs4_check_serverowner_major_id+0x5/0x30 [nfsv4]
[ 60.642719] Code: 89 ff e8 3e b3 b8 e1 e9 71 fe ff ff 41 bc da d8 ff ff e9 c3 fe ff ff e8 e9 9d 08 e2 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <8b> 57 08 31 c0 3b 56 08 75 12 48 83 c6 0c 48 83 c7 0c e8 c4 97 bb
[ 60.652629] RSP: 0018:ffffc265417f7e10 EFLAGS: 00010287
[ 60.655379] RAX: ffffa0664b066400 RBX: 0000000000000000 RCX: 0000000000000001
[ 60.658754] RDX: ffffa066725fb000 RSI: ffffa066725fd000 RDI: 0000000000000000
[ 60.662292] RBP: 0000000000020000 R08: 0000000000020000 R09: 0000000000000000
[ 60.666189] R10: 0000000000000003 R11: 0000000000000000 R12: ffffa06648258d00
[ 60.669914] R13: 0000000000000000 R14: 0000000000000000 R15: ffffa06648258100
[ 60.673645] FS: 00007faa9fb35800(0000) GS:ffffa06677d80000(0000) knlGS:0000000000000000
[ 60.677698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 60.680773] CR2: 0000000000000008 CR3: 0000000203f14000 CR4: 00000000000406e0
[ 60.684476] Call Trace:
[ 60.685809] nfs4_copy_file_range+0xfc/0x230 [nfsv4]
[ 60.688704] vfs_copy_file_range+0x2ee/0x310
[ 60.691104] __x64_sys_copy_file_range+0xd6/0x210
[ 60.693527] do_syscall_64+0x4d/0x90
[ 60.695512] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 60.698006] RIP: 0033:0x7faa9febc1bd
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The variable error is ssize_t, which is signed and will
cast to unsigned when comapre with variable size, so add
a check to avoid unexpected result in case of negative
value of error.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The pointer clnt is being initialized with a value that is never
read and so this is assignment redundant and can be removed. The
pointer can removed because it is being used as a temporary
variable and it is clearer to make the direct assignment and remove
it completely.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
A previous commit unified how we handle prep for these two functions,
but this means that we check the allowed context (SQPOLL, specifically)
later than we should. Move the ring type checking into the two parent
functions, instead of doing it after we've done some setup work.
Fixes: ec65fea5a8d7 ("io_uring: deduplicate io_openat{,2}_prep()")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
These will naturally fail when attempted through SQPOLL, but either
with -EFAULT or -EBADF. Make it explicit that these are not workable
through SQPOLL and return -EINVAL, just like other ops that need to
use ->files.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some block devices, like dm, bubble back -EAGAIN through the completion
handler. We check for this in io_read(), but don't honor it for when
we have copied the iov. Return -EAGAIN for this case before retrying,
to force punt to io-wq.
Fixes: bcf5a06304d6 ("io_uring: support true async buffered reads, if file provides it")
Reported-by: Zorro Lang <zlang@redhat.com>
Tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we already have mapped the necessary data for retry, then don't set
it up again. It's a pointless operation, and we leak the iovec if it's
a large (non-stack) vec.
Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Syzkaller reported a buffer overflow in btree_readpage_end_io_hook()
when loop mounting a crafted image:
detected buffer overflow in memcpy
------------[ cut here ]------------
kernel BUG at lib/string.c:1129!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 26 Comm: kworker/u4:2 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: btrfs-endio-meta btrfs_work_helper
RIP: 0010:fortify_panic+0xf/0x20 lib/string.c:1129
RSP: 0018:ffffc90000e27980 EFLAGS: 00010286
RAX: 0000000000000022 RBX: ffff8880a80dca64 RCX: 0000000000000000
RDX: ffff8880a90860c0 RSI: ffffffff815dba07 RDI: fffff520001c4f22
RBP: ffff8880a80dca00 R08: 0000000000000022 R09: ffff8880ae7318e7
R10: 0000000000000000 R11: 0000000000077578 R12: 00000000ffffff6e
R13: 0000000000000008 R14: ffffc90000e27a40 R15: 1ffff920001c4f3c
FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000557335f440d0 CR3: 000000009647d000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
memcpy include/linux/string.h:405 [inline]
btree_readpage_end_io_hook.cold+0x206/0x221 fs/btrfs/disk-io.c:642
end_bio_extent_readpage+0x4de/0x10c0 fs/btrfs/extent_io.c:2854
bio_endio+0x3cf/0x7f0 block/bio.c:1449
end_workqueue_fn+0x114/0x170 fs/btrfs/disk-io.c:1695
btrfs_work_helper+0x221/0xe20 fs/btrfs/async-thread.c:318
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Modules linked in:
---[ end trace b68924293169feef ]---
RIP: 0010:fortify_panic+0xf/0x20 lib/string.c:1129
RSP: 0018:ffffc90000e27980 EFLAGS: 00010286
RAX: 0000000000000022 RBX: ffff8880a80dca64 RCX: 0000000000000000
RDX: ffff8880a90860c0 RSI: ffffffff815dba07 RDI: fffff520001c4f22
RBP: ffff8880a80dca00 R08: 0000000000000022 R09: ffff8880ae7318e7
R10: 0000000000000000 R11: 0000000000077578 R12: 00000000ffffff6e
R13: 0000000000000008 R14: ffffc90000e27a40 R15: 1ffff920001c4f3c
FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f95b7c4d008 CR3: 000000009647d000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
The overflow happens, because in btree_readpage_end_io_hook() we assume
that we have found a 4 byte checksum instead of the real possible 32
bytes we have for the checksums.
With the fix applied:
[ 35.726623] BTRFS: device fsid 815caf9a-dc43-4d2a-ac54-764b8333d765 devid 1 transid 5 /dev/loop0 scanned by syz-repro (215)
[ 35.738994] BTRFS info (device loop0): disk space caching is enabled
[ 35.738998] BTRFS info (device loop0): has skinny extents
[ 35.743337] BTRFS warning (device loop0): loop0 checksum verify failed on 1052672 wanted 0xf9c035fc8d239a54 found 0x67a25c14b7eabcf9 level 0
[ 35.743420] BTRFS error (device loop0): failed to read chunk root
[ 35.745899] BTRFS error (device loop0): open_ctree failed
Reported-by: syzbot+e864a35d361e1d4e29a5@syzkaller.appspotmail.com
Fixes: d5178578bcd4 ("btrfs: directly call into crypto framework for checksumming")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
prototype
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of dirtytime_interval_handler to match its prototype in
linux/writeback.h which fixes the following sparse error/warning:
fs/fs-writeback.c:2189:50: warning: incorrect type in argument 3 (different address spaces)
fs/fs-writeback.c:2189:50: expected void *
fs/fs-writeback.c:2189:50: got void [noderef] __user *buffer
fs/fs-writeback.c:2184:5: error: symbol 'dirtytime_interval_handler' redeclared with different type (incompatible argument 3 (different address spaces)):
fs/fs-writeback.c:2184:5: int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )
fs/fs-writeback.c: note: in included file:
./include/linux/writeback.h:374:5: note: previously declared as:
./include/linux/writeback.h:374:5: int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093140.13434-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Let's add REQ_RAHEAD flag so it'd be easier to identify
readahead I/O requests in blktrace.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200919072730.24989-3-hsiangkao@redhat.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
|
|
should_decompress_synchronously() has one single condition
for now, so fold it instead.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200919072730.24989-2-hsiangkao@redhat.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
|
|
variable `err' in z_erofs_submit_queue() isn't useful
here, remove it instead.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200919072730.24989-1-hsiangkao@redhat.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
|
|
Get rid of boilerplate in most of ->statfs()
instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|