Age | Commit message (Collapse) | Author |
|
If the filename is change by underlying rename the server, fp->filename
and real filename can be different. This patch remove the uses of
fp->filename in ksmbd and replace it with d_path().
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The kernel calculation was underestimating the overhead by not taking
into account the reserved gdt blocks. With this change, the overhead
calculated by the kernel matches the overhead calculation in mke2fs.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Pull cifs fixes from Steve French:
- two fixes related to unmount
- symlink overflow fix
- minor netfs fix
- improved tracing for crediting (flow control)
* tag '5.18-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: verify that tcon is valid before dereference in cifs_kill_sb
cifs: potential buffer overflow in handling symlinks
cifs: Split the smb3_add_credits tracepoint
cifs: release cached dentries only if mount is complete
cifs: Check the IOCB_DIRECT flag, not O_DIRECT
|
|
When asked to create a path ending '/', but which is not to be a
directory (LOOKUP_DIRECTORY not set), filename_create() will never try
to create the file. If it doesn't exist, -ENOENT is reported.
However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems
->lookup() function, even though there is no intent to create. This is
misleading and can cause incorrect behaviour.
If you try
ln -s foo /path/dir/
where 'dir' is a directory on an NFS filesystem which is not currently
known in the dcache, this will fail with ENOENT.
But as the name is not in the dcache, nfs_lookup gets called with
LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any
lookup, with the expectation that a subsequent call to create the target
will be made, and the lookup can be combined with the creation. In the
case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never
made. Instead filename_create() sees that the dentry is not (yet)
positive and returns -ENOENT - even though the directory actually
exists.
So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to
create, and use the absence of these flags to decide if -ENOENT should
be returned.
Note that filename_parentat() is only interested in LOOKUP_REVAL, so we
split that out and store it in 'reval_flag'. __lookup_hash() then gets
reval_flag combined with whatever create flags were determined to be
needed.
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more code and warning fixes.
There's one feature ioctl removal patch slated for 5.18 that did not
make it to the main pull request. It's just a one-liner and the ioctl
has a v2 that's in use for a long time, no point to postpone it to
5.19.
Late update:
- remove balance v1 ioctl, superseded by v2 in 2012
Fixes:
- add back cgroup attribution for compressed writes
- add super block write start/end annotations to asynchronous balance
- fix root reference count on an error handling path
- in zoned mode, activate zone at the chunk allocation time to avoid
ENOSPC due to timing issues
- fix delayed allocation accounting for direct IO
Warning fixes:
- simplify assertion condition in zoned check
- remove an unused variable"
* tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix btrfs_submit_compressed_write cgroup attribution
btrfs: fix root ref counts in error handling in btrfs_get_root_ref
btrfs: zoned: activate block group only for extent allocation
btrfs: return allocated block group from do_chunk_alloc()
btrfs: mark resumed async balance as writing
btrfs: remove support of balance v1 ioctl
btrfs: release correct delalloc amount in direct IO write path
btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache fixes from David Howells:
"Here's a collection of fscache and cachefiles fixes and misc small
cleanups. The two main fixes are:
- Add a missing unmark of the inode in-use mark in an error path.
- Fix a KASAN slab-out-of-bounds error when setting the xattr on a
cachefiles volume due to the wrong length being given to memcpy().
In addition, there's the removal of an unused parameter, removal of an
unused Kconfig option, conditionalising a bit of procfs-related stuff
and some doc fixes"
* tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache: remove FSCACHE_OLD_API Kconfig option
fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
fscache: Remove the cookie parameter from fscache_clear_page_bits()
docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
cachefiles: unmark inode in use in error path
|
|
When inline encryption is used, the usual message "fscrypt: AES-256-XTS
using implementation <impl>" doesn't appear in the kernel log. Add a
similar message for the blk-crypto case that indicates that inline
encryption was used, and whether blk-crypto-fallback was used or not.
This can be useful for debugging performance problems.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220414053415.158986-1-ebiggers@kernel.org
|
|
On umount, cifs_sb->tlink_tree might contain entries that do not represent
a valid tcon.
Check the tcon for error before we dereference it.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
FS_CRYPTO_BLOCK_SIZE is neither the filesystem block size nor the
granularity of encryption. Rather, it defines two logically separate
constraints that both arise from the block size of the AES cipher:
- The alignment required for the lengths of file contents blocks
- The minimum input/output length for the filenames encryption modes
Since there are way too many things called the "block size", and the
connection with the AES block size is not easily understood, split
FS_CRYPTO_BLOCK_SIZE into two constants FSCRYPT_CONTENTS_ALIGNMENT and
FSCRYPT_FNAME_MIN_MSG_LEN that more clearly describe what they are.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220405010914.18519-1-ebiggers@kernel.org
|
|
Smatch printed a warning:
arch/x86/crypto/poly1305_glue.c:198 poly1305_update_arch() error:
__memcpy() 'dctx->buf' too small (16 vs u32max)
It's caused because Smatch marks 'link_len' as untrusted since it comes
from sscanf(). Add a check to ensure that 'link_len' is not larger than
the size of the 'link_str' buffer.
Fixes: c69c1b6eaea1 ("cifs: implement CIFSParseMFSymlink()")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We should not return an error code in req->result in
io_poll_check_events(), because it may get mangled and returned as
success. Just return the error code directly, the callers will fail the
request or proceed accordingly.
Fixes: 6bf9c47a3989 ("io_uring: defer file assignment")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5f03514ee33324dc811fb93df84aee0f695fb044.1649862516.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We pass "unlocked" into io_assign_file() in io_poll_check_events(),
which can lead to double locking.
Fixes: 6bf9c47a3989 ("io_uring: defer file assignment")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2476d4ae46554324b599ee4055447b105f20a75a.1649862516.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pass right issue_flags into into io_file_get_fixed() instead of
IO_URING_F_UNLOCKED. It's probably not a problem at the moment but let's
do it safer.
Fixes: 6bf9c47a3989 ("io_uring: defer file assignment")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7d242daa9df5d776907686977cd29fbceb4a2d8d.1649862516.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This commit enables XFS module to work with fs instances having 64-bit
per-inode extent counters by adding XFS_SB_FEAT_INCOMPAT_NREXT64 flag to the
list of supported incompat feature flags.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
The following changes are made to enable userspace to obtain 64-bit extent
counters,
1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
2. Define the new flag XFS_BULK_IREQ_BULKSTAT for userspace to indicate that
it is capable of receiving 64-bit extent counters.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
A future commit will add a new XFS_IBULK flag which will not have a
corresponding XFS_IWALK flag. In preparation for the change, this commit
separates XFS_IBULK_* flags from XFS_IWALK_* flags.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
This commit enables upgrading existing inodes to use large extent counters
provided that underlying filesystem's superblock has large extent counter
feature enabled.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
The maximum file size that can be represented by the data fork extent counter
in the worst case occurs when all extents are 1 block in length and each block
is 1KB in size.
With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
1KB sized blocks, a file can reach upto,
(2^31) * 1KB = 2TB
This is much larger than the theoretical maximum size of a directory
i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
Since a directory's inode can never overflow its data fork extent counter,
this commit removes all the overflow checks associated with
it. xfs_dinode_verify() now performs a rough check to verify if a diretory's
data fork is larger than 96GB.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
Syzbot found an issue [1] in ext4_fallocate().
The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul,
and offset 0x1000000ul, which, when added together exceed the
bitmap_maxbytes for the inode. This triggers a BUG in
ext4_ind_remove_space(). According to the comments in this function
the 'end' parameter needs to be one block after the last block to be
removed. In the case when the BUG is triggered it points to the last
block. Modify the ext4_punch_hole() function and add constraint that
caps the length to satisfy the one before laster block requirement.
LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331
LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000
Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality")
Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
We got issue as follows:
EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue
==================================================================
BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline]
BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline]
BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331
CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:83 [inline]
dump_stack+0x144/0x187 lib/dump_stack.c:124
print_address_description+0x7d/0x630 mm/kasan/report.c:387
__kasan_report+0x132/0x190 mm/kasan/report.c:547
kasan_report+0x47/0x60 mm/kasan/report.c:564
ext4_search_dir fs/ext4/namei.c:1394 [inline]
search_dirblock fs/ext4/namei.c:1199 [inline]
__ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
ext4_lookup_entry fs/ext4/namei.c:1622 [inline]
ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690
__lookup_hash+0xc5/0x190 fs/namei.c:1451
do_rmdir+0x19e/0x310 fs/namei.c:3760
do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x445e59
Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000
R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
The buggy address belongs to the page:
page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3
flags: 0x200000000000000()
raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
ext4_search_dir:
...
de = (struct ext4_dir_entry_2 *)search_buf;
dlimit = search_buf + buf_size;
while ((char *) de < dlimit) {
...
if ((char *) de + de->name_len <= dlimit &&
ext4_match(dir, fname, de)) {
...
}
...
de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize);
if (de_len <= 0)
return -1;
offset += de_len;
de = (struct ext4_dir_entry_2 *) ((char *) de + de_len);
}
Assume:
de=0xffff8881317c2fff
dlimit=0x0xffff8881317c3000
If read 'de->name_len' which address is 0xffff8881317c3005, obviously is
out of range, then will trigger use-after-free.
To solve this issue, 'dlimit' must reserve 8 bytes, as we will read
'de->name_len' to judge if '(char *) de + de->name_len' out of range.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
We got issue as follows:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:389!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 9 PID: 131 Comm: kworker/9:1 Not tainted 5.17.0-862.14.0.6.x86_64-00001-g23f87daf7d74-dirty #197
Workqueue: events flush_stashed_error_work
RIP: 0010:start_this_handle+0x41c/0x1160
RSP: 0018:ffff888106b47c20 EFLAGS: 00010202
RAX: ffffed10251b8400 RBX: ffff888128dc204c RCX: ffffffffb52972ac
RDX: 0000000000000200 RSI: 0000000000000004 RDI: ffff888128dc2050
RBP: 0000000000000039 R08: 0000000000000001 R09: ffffed10251b840a
R10: ffff888128dc204f R11: ffffed10251b8409 R12: ffff888116d78000
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888128dc2000
FS: 0000000000000000(0000) GS:ffff88839d680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001620068 CR3: 0000000376c0e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
jbd2__journal_start+0x38a/0x790
jbd2_journal_start+0x19/0x20
flush_stashed_error_work+0x110/0x2b3
process_one_work+0x688/0x1080
worker_thread+0x8b/0xc50
kthread+0x26f/0x310
ret_from_fork+0x22/0x30
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
Above issue may happen as follows:
umount read procfs error_work
ext4_put_super
flush_work(&sbi->s_error_work);
ext4_mb_seq_groups_show
ext4_mb_load_buddy_gfp
ext4_mb_init_group
ext4_mb_init_cache
ext4_read_block_bitmap_nowait
ext4_validate_block_bitmap
ext4_error
ext4_handle_error
schedule_work(&EXT4_SB(sb)->s_error_work);
ext4_unregister_sysfs(sb);
jbd2_journal_destroy(sbi->s_journal);
journal_kill_thread
journal->j_flags |= JBD2_UNMOUNT;
flush_stashed_error_work
jbd2_journal_start
start_this_handle
BUG_ON(journal->j_flags & JBD2_UNMOUNT);
To solve this issue, we call 'ext4_unregister_sysfs() before flushing
s_error_work in ext4_put_super().
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220322012419.725457-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We got issue as follows:
[home]# fsck.ext4 -fn ram0yb
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
Clear? no
Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
Fix? no
As the symlink file size does not match the file content. If the writeback
of the symlink data block failed, ext4_finish_bio() handles the end of IO.
However this function fails to mark the buffer with BH_write_io_error and
so when unmount does journal checkpoint it cannot detect the writeback
error and will cleanup the journal. Thus we've lost the correct data in the
journal area. To solve this issue, mark the buffer as BH_write_io_error in
ext4_finish_bio().
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since the initial introduction of (posix) fallocate back at the turn of
the century, it has been possible to use this syscall to change the
user-visible contents of files. This can happen by extending the file
size during a preallocation, or through any of the newer modes (punch,
zero, collapse, insert range). Because the call can be used to change
file contents, we should treat it like we do any other modification to a
file -- update the mtime, and drop set[ug]id privileges/capabilities.
The VFS function file_modified() does all this for us if pass it a
locked inode, so let's make fallocate drop permissions correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a write performance regression
- Fix crashes during request deferral on RDMA transports
* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix the svc_deferred_event trace class
SUNRPC: Fix NFSD's request deferral on RDMA transports
nfsd: Clean up nfsd_file_put()
nfsd: Fix a write performance regression
SUNRPC: Return true/false (not 1/0) from bool functions
|
|
struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
st_dev and st_rdev; struct compat_stat (defined in
arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
a 16-bit padding.
This patch fixes struct compat_stat to match struct stat.
[ Historical note: the old x86 'struct stat' did have that 16-bit field
that the compat layer had kept around, but it was changes back in 2003
by "struct stat - support larger dev_t":
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d
and back in those days, the x86_64 port was still new, and separate
from the i386 code, and had already picked up the old version with a
16-bit st_dev field ]
Note that we can't change compat_dev_t because it is used by
compat_loop_info.
Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
old_valid_dev to test if the value fits into them. This fixes
-EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
number 259.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Ensure that only 0 is passed for pad here.
Fixes: c73ebb685fb6 ("io_uring: add timeout support for io_uring_enter()")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220412163042.2788062-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Only allow resv field to be 0 in struct io_uring_rsrc_update user
arguments.
Fixes: e7a6c00dc77a ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220412163042.2788062-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Verify that the user does not pass in anything but 0 for this field.
Fixes: 992da01aa932 ("io_uring: change registration/upd/rsrc tagging ABI")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220412163042.2788062-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move validation to be more consistently straight after
copy_from_user. This is already done in io_register_rsrc_update and so
this removes that redundant check.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220412163042.2788062-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io-wq work cancellation path can't take uring_lock as how it's done on
file assignment, we have to handle IO_WQ_WORK_CANCEL first, this fixes
encountered hangs.
Fixes: 6bf9c47a3989 ("io_uring: defer file assignment")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0d9b9f37841645518503f6a207e509d14a286aba.1649773463.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are two reasons why this isn't the best idea:
- It's an odd area to grab a bit of storage space, hence it's an odd area
to grab storage from.
- It puts the 3rd io_kiocb cacheline into the hot path, where normal hot
path just needs the first two.
Use 'cflags' for joint fd/cflags storage. We only need fd until we
successfully issue, and we only need cflags once a request is done and is
completed.
Fixes: 6bf9c47a3989 ("io_uring: defer file assignment")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation for fixing a regression with pulling in an extra cacheline
for IO that doesn't usually touch the last cacheline of the io_kiocb,
move the cached location of apoll->events to space shared with some other
completion data. Like cflags, this isn't used until after the request
has been completed, so we can piggy back on top of comp_list.
Fixes: 81459350d581 ("io_uring: cache req->apoll->events in req->cflags")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
-1 tells use to use the current position, but we check if the file is
a stream regardless of that. Fix up io_kiocb_update_pos() to only
dip into file if we need to. This is both more efficient and also drops
12 bytes of text on aarch64 and 64 bytes on x86-64.
Fixes: b4aec4001595 ("io_uring: do not recalculate ppos unnecessarily")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As mentioned in the previous commit, the kernel misuses sb_frextents in
the incore mount to reflect both incore reservations made by running
transactions as well as the actual count of free rt extents on disk.
This results in the superblock being written to the log with an
underestimate of the number of rt extents that are marked free in the
rtbitmap.
Teaching XFS to recompute frextents after log recovery avoids
operational problems in the current mount, but it doesn't solve the
problem of us writing undercounted frextents which are then recovered by
an older kernel that doesn't have that fix.
Create an incore percpu counter to mirror the ondisk frextents. This
new counter will track transaction reservations and the only time we
will touch the incore super counter (i.e the one that gets logged) is
when those transactions commit updates to the rt bitmap. This is in
contrast to the lazysbcount counters (e.g. fdblocks), where we know that
log recovery will always fix any incorrect counter that we log.
As a bonus, we only take m_sb_lock at transaction commit time.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
I've been observing periodic corruption reports from xfs_scrub involving
the free rt extent counter (frextents) while running xfs/141. That test
uses an error injection knob to induce a torn write to the log, and an
arbitrary number of recovery mounts, frextents will count fewer free rt
extents than can be found the rtbitmap.
The root cause of the problem is a combination of the misuse of
sb_frextents in the incore mount to reflect both incore reservations
made by running transactions as well as the actual count of free rt
extents on disk. The following sequence can reproduce the undercount:
Thread 1 Thread 2
xfs_trans_alloc(rtextents=3)
xfs_mod_frextents(-3)
<blocks>
xfs_attr_set()
xfs_bmap_attr_addfork()
xfs_add_attr2()
xfs_log_sb()
xfs_sb_to_disk()
xfs_trans_commit()
<log flushed to disk>
<log goes down>
Note that thread 1 subtracts 3 from sb_frextents even though it never
commits to using that space. Thread 2 writes the undercounted value to
the ondisk superblock and logs it to the xattr transaction, which is
then flushed to disk. At next mount, log recovery will find the logged
superblock and write that back into the filesystem. At the end of log
recovery, we reread the superblock and install the recovered
undercounted frextents value into the incore superblock. From that
point on, we've effectively leaked thread 1's transaction reservation.
The correct fix for this is to separate the incore reservation from the
ondisk usage, but that's a matter for the next patch. Because the
kernel has been logging superblocks with undercounted frextents for a
very long time and we don't demand that sysadmins run xfs_repair after a
crash, fix the undercount by recomputing frextents after log recovery.
Gating this on log recovery is a reasonable balance (I think) between
correcting the problem and slowing down every mount attempt. Note that
xfs_repair will fix undercounted frextents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Pass an explicit xfs_mount pointer to the rtalloc query functions so
that they can support transactionless queries.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Remove the open-coded check of O_LARGEFILE. This changes the errno
to be the same as other filesystems; it was changed generically in
2.6.24 but that fix skipped XFS.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
This commit introduces new fields in the on-disk inode format to support
64-bit data fork extent counters and 32-bit attribute fork extent
counters. The new fields will be used only when an inode has
XFS_DIFLAG2_NREXT64 flag set. Otherwise we continue to use the regular 32-bit
data fork extent counters and 16-bit attribute fork extent counters.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Suggested-by: Dave Chinner <dchinner@redhat.com>
|
|
This commit also prints inode fields with invalid values instead of printing
addresses of inode and buffer instances.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Suggested-by: Dave Chinner <dchinner@redhat.com>
|
|
This commit defines new macros to represent maximum extent counts allowed by
filesystems which have support for large per-inode extent counters.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
This commit adds the new per-inode flag XFS_DIFLAG2_NREXT64 to indicate that
an inode supports 64-bit extent counters. This flag is also enabled by default
on newly created inodes when the corresponding filesystem has large extent
counter feature bit (i.e. XFS_FEAT_NREXT64) set.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
XFS_FSOP_GEOM_FLAGS_NREXT64 indicates that the current filesystem instance
supports 64-bit per-inode extent counters.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on filesystems
which support large per-inode extent counters. This commit defines the new
incompat feature bit and the corresponding per-fs feature bit (along with
inline functions to work on it).
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
A future commit will introduce a 64-bit on-disk data extent counter and a
32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions
of these quantities.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
A future commit will increase the width of xfs_extnum_t in order to facilitate
larger per-inode extent counters. Hence this patch now uses basic types to
define xfs_log_dinode->[di_nextents|dianextents].
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
This commit replaces the macro XFS_DFORK_NEXTENTS() with the helper function
xfs_dfork_nextents(). As of this commit, xfs_dfork_nextents() returns the same
value as XFS_DFORK_NEXTENTS(). A future commit which extends inode's extent
counter fields will add more logic to this helper.
This commit also replaces direct accesses to xfs_dinode->di_[a]nextents
with calls to xfs_dfork_nextents().
No functional changes have been made.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
xfs_extnum_t is the type to use to declare variables which have values
obtained from xfs_dinode->di_[a]nextents. This commit replaces basic
types (e.g. uint32_t) with xfs_extnum_t for such variables.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
xfs_iext_max_nextents() returns the maximum number of extents possible for one
of data, cow or attribute fork. This helper will be extended further in a
future commit when maximum extent counts associated with data/attribute forks
are increased.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
|
The maximum extent length depends on maximum block count that can be stored in
a BMBT record. Hence this commit defines MAXEXTLEN based on
BMBT_BLOCKCOUNT_BITLEN.
While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|