Age | Commit message | Author |
|
Now that we support non-blocking path resolution internally, expose it
via openat2() in the struct open_how ->resolve flags. This allows
applications using openat2() to limit path resolution to the extent that
it is already cached.
If the lookup cannot be satisfied in a non-blocking manner, openat2(2)
will return -1/-EAGAIN.
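As a rough userspace sketch of how an application might use this, the snippet below attempts a cached-only open and falls back to a normal open() on EAGAIN. It is an illustration under stated assumptions, not code from the patch: the flag name RESOLVE_CACHED and its value 0x20, the syscall number 437, and the local mirror of struct open_how are assumptions spelled out here so the sketch builds against older headers, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed values: openat2() as syscall 437 (x86-64 and the generic
 * table) and RESOLVE_CACHED as 0x20. Defined locally for the sketch. */
#ifndef SYS_openat2
#define SYS_openat2 437
#endif
#define SKETCH_RESOLVE_CACHED 0x20

/* Local mirror of struct open_how: three 64-bit fields. */
struct sketch_open_how {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/* Try a cached-only open; returns an fd, or -1 with errno set. */
static int open_cached_only(const char *path)
{
	struct sketch_open_how how = {
		.flags = O_RDONLY | O_CLOEXEC,
		.resolve = SKETCH_RESOLVE_CACHED,
	};

	return (int)syscall(SYS_openat2, AT_FDCWD, path, &how, sizeof(how));
}

/* Fall back to a normal, possibly blocking open() when the cached
 * attempt reports EAGAIN. Other errors are passed through. */
static int open_with_fallback(const char *path)
{
	int fd = open_cached_only(path);

	if (fd < 0 && errno == EAGAIN)
		fd = open(path, O_RDONLY | O_CLOEXEC);
	return fd;
}
```

On kernels without openat2() or without this resolve flag, the cached attempt is expected to fail with an error other than EAGAIN, which the sketch simply returns to the caller.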
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
io_uring always punts opens to async context, since there's no control
over whether the lookup blocks or not. Add LOOKUP_CACHED to support
just doing the fast RCU based lookups, which we know will not block. If
we can do a cached path resolution of the filename, then we don't have
to always punt lookups to a worker.
During path resolution, we always do LOOKUP_RCU first. If that fails and
we terminate LOOKUP_RCU, then fail a LOOKUP_CACHED attempt as well.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
same as for the previous commit - instead of 0/-ECHILD make
it return true/false, rename to try_to_unlazy_child().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
For a few years now, kernel addresses have no longer been included in
oops dumps, at least on x86. All we get is a symbol name with offset
and size.
This is a problem for ceph_connection_operations handlers, especially
con->ops->dispatch(). All three handlers have the same name and there
is little context to disambiguate between e.g. monitor and OSD clients
because almost everything is inlined. gdb sneakily stops at the first
matching symbol, so one has to resort to nm and addr2line.
Some of these are already prefixed with mon_, osd_ or mds_. Let's do
the same for all others.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
|
|
Most callers check for non-zero return, and assume it's -ECHILD (which
it always will be). One caller uses the actual error return. Clean this
up and make it fully consistent, by having unlazy_walk() return a bool
instead. Rename it to try_to_unlazy() and return true on success, and
false on failure. That's easier to read.
No functional changes in this patch.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The number of dirent records used by an AFS directory entry should be
calculated using the assumption that there is a 16-byte name field in the
first block, rather than a 20-byte name field (which is actually the case).
This miscalculation is historic and effectively standard, so we have to use
it.
The calculation we need to use is:
1 + (((strlen(name) + 1) + 15) >> 5)
where we are adding one to the strlen() result to account for the NUL
termination.
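The calculation above can be sketched in C as follows. This is a minimal illustration, not the code from the patch: the helper name dirent_slots_for_name() is hypothetical, and the 32-byte slot size is inferred from the ">> 5" in the formula.

```c
#include <stddef.h>

/* Hypothetical helper; name_len is strlen(name), without the NUL. */
static inline size_t dirent_slots_for_name(size_t name_len)
{
	/*
	 * One slot for the entry itself, plus a rounded-up count of
	 * 32-byte extension slots for whatever does not fit in the
	 * assumed 16-byte name field of the first slot:
	 *   (name_len + 1 - 16 + 31) >> 5 == (name_len + 1 + 15) >> 5
	 */
	return 1 + (((name_len + 1) + 15) >> 5);
}
```

With this formula, a 15-character name needs one slot and a 16-character name needs two.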
Fix this by the following means:
(1) Create an inline function to do the calculation for a given name
length.
(2) Use the function to calculate the number of records used for a dirent
in afs_dir_iterate_block().
Use this to move the over-end check out of the loop since it only
needs to be done once.
Further use this to only go through the loop for the 2nd+ records
composing an entry. The only test there now is for if the record is
allocated - and we already checked the first block at the top of the
outer loop.
(3) Add a max name length check in afs_dir_iterate_block().
(4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function
from (1) to calculate the number of blocks rather than doing it
incorrectly themselves.
Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...")
Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
|
|
AFS has a structured layout in its directory contents (AFS dirs are
downloaded as files and parsed locally by the client for lookup/readdir).
The slots in the directory are defined by union afs_xdr_dirent. This,
however, only directly allows a name of a length that will fit into that
union. To support a longer name, the next 1-8 contiguous entries are
annexed to the first one and the name flows across these.
afs_dir_iterate_block() uses strnlen(), limited to the space to the end of
the page, to find out how long the name is. This worked fine until
6a39e62abbaf. With that commit, the compiler determines the size of the
array and asserts that the string fits inside that array. This is a
problem for AFS because we *expect* it to overflow one or more arrays.
A similar problem also occurs in afs_dir_scan_block() when a directory file
is being locally edited to avoid the need to redownload it. There strlen()
was being used safely because each page has the last byte set to 0 when the
file is downloaded and validated (in afs_dir_check_page()).
Fix this by changing the afs_xdr_dirent union name field to an
indeterminate-length array and dropping the overflow field.
(Note that whilst looking at this, I realised that the calculation of the
number of slots a dirent used is non-standard and not quite right, but I'll
address that in a separate patch.)
The issue can be triggered by something like:
touch /afs/example.com/thisisaveryveryverylongname
and it generates a report that looks like:
detected buffer overflow in strnlen
------------[ cut here ]------------
kernel BUG at lib/string.c:1149!
...
RIP: 0010:fortify_panic+0xf/0x11
...
Call Trace:
afs_dir_iterate_block+0x12b/0x35b
afs_dir_iterate+0x14e/0x1ce
afs_do_lookup+0x131/0x417
afs_lookup+0x24f/0x344
lookup_open.isra.0+0x1bb/0x27d
open_last_lookups+0x166/0x237
path_openat+0xe0/0x159
do_filp_open+0x48/0xa4
? kmem_cache_alloc+0xf5/0x16e
? __clear_close_on_exec+0x13/0x22
? _raw_spin_unlock+0xa/0xb
do_sys_openat2+0x72/0xde
do_sys_open+0x3b/0x58
do_syscall_64+0x2d/0x3a
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 6a39e62abbaf ("lib: string.h: detect intra-object overflow in fortified string functions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: Daniel Axtens <dja@axtens.net>
|
|
Running my yearly branch profiling code, it detected a 100% wrong branch
condition in namei.c for lookup_fast(). The code in question has:
    status = d_revalidate(dentry, nd->flags);
    if (likely(status > 0))
        return dentry;
    if (unlazy_child(nd, dentry, seq))
        return ERR_PTR(-ECHILD);
    if (unlikely(status == -ECHILD))
        /* we'd been told to redo it in non-rcu mode */
        status = d_revalidate(dentry, nd->flags);
If the status of the d_revalidate() is greater than zero, then the function
finishes. Otherwise, if it is an "unlazy_child" it returns with -ECHILD.
After the above two checks, the status is compared to -ECHILD, as that is
what is returned if the original d_revalidate() needed to be done in a
non-rcu mode.
Note that this path is called under the condition:
    if (nd->flags & LOOKUP_RCU) {
And most of the d_revalidate() functions have:
    if (flags & LOOKUP_RCU)
        return -ECHILD;
It appears that this is the only case in which this if statement is
triggered on two of my machines running in production.
As it is dependent on what filesystem mix is configured in the running
kernel, simply remove the unlikely() from the if statement.
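For readers unfamiliar with the hints, here is a small userspace model of the pattern. It is a toy sketch, not the namei.c code: toy_revalidate(), toy_lookup() and TOY_LOOKUP_RCU are hypothetical stand-ins, and the macros are written out on top of the GCC/Clang builtin __builtin_expect.

```c
#include <errno.h>

/* Branch hints built on the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define TOY_LOOKUP_RCU 0x1 /* hypothetical stand-in for LOOKUP_RCU */

/* Toy revalidate: refuses to work in RCU mode, like the pattern above. */
static int toy_revalidate(unsigned int flags)
{
	if (flags & TOY_LOOKUP_RCU)
		return -ECHILD;
	return 1;
}

/* Returns how many times revalidation had to run. */
static int toy_lookup(unsigned int flags)
{
	int calls = 1;
	int status = toy_revalidate(flags);

	if (likely(status > 0))
		return calls;
	/* No unlikely() here: with such a revalidate function, -ECHILD
	 * is what every RCU-mode call gets back. */
	if (status == -ECHILD) {
		status = toy_revalidate(flags & ~TOY_LOOKUP_RCU);
		calls++;
	}
	return calls;
}
```

In this model every RCU-mode lookup takes the -ECHILD branch, which is why marking that branch unlikely() would mislead the compiler.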
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
use vfs_open() instead
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When CRC32 is disabled, zonefs cannot be linked:
ld: fs/zonefs/super.o: in function `zonefs_fill_super':
Add a Kconfig 'select' statement for it.
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
|
|
Pull block fixes from Jens Axboe:
"Two minor block fixes from this last week that should go into 5.11:
- Add missing NOWAIT debugfs definition (Andres)
- Fix kerneldoc warning introduced this merge window (Randy)"
* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
block: add debugfs stanza for QUEUE_FLAG_NOWAIT
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes
|
|
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into 5.11, all marked for stable as well:
- Fix issue around identity COW'ing and users that share a ring
across processes
- Fix a hang associated with unregistering fixed files (Pavel)
- Move the 'process is exiting' cancelation a bit earlier, so
task_works aren't affected by it (Pavel)"
* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
kernel/io_uring: cancel io_uring before task works
io_uring: fix io_sqe_files_unregister() hangs
io_uring: add a helper for setting a ref node
io_uring: don't assume mm is constant across submits
|
|
Function gfs2_log_write_page is only used in lops.c, so make it static.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
To cancel io_uring requests, we need either to be able to run the
currently enqueued task_works or to have task_work shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.
Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
however requests keeping them may never complete, e.g. because of some
userspace dependency. Make sure it's interruptible, otherwise it would
hang forever.
Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Setting a new reference node for file data is not trivial. Don't repeat
it; add and use a helper.
Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix new kernel-doc warnings in fs/block_dev.c:
../fs/block_dev.c:1066: warning: Excess function parameter 'whole' description in 'bd_abort_claiming'
../fs/block_dev.c:1837: warning: Function parameter or member 'dev' not described in 'lookup_bdev'
Fixes: 4e7b5671c6a8 ("block: remove i_bdev")
Fixes: 37c3fc9abb25 ("block: simplify the block device claiming interface")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we COW the identity, we assume that ->mm never changes. But this
isn't true if multiple processes end up sharing the ring. Hence treat
id->mm like any other process component when it comes to the
identity mapping. This is pretty trivial, just moving the existing grab
into io_grab_identity(), and including a check for the match.
Cc: stable@vger.kernel.org # 5.10
Fixes: 1e6fa5216a0e ("io_uring: COW io_identity on mismatch")
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
On reconnect, cap and dentry releases are dropped and the fields
that follow must be reencoded into the freed space. Currently these
are timestamp and gid_list, but gid_list isn't reencoded. This
results in
failed to decode message of type 24 v4: End of buffer
errors on the MDS.
While at it, make a change to encode gid_list unconditionally,
without regard to what head/which version was used as a result
of checking whether CEPH_FEATURE_FS_BTIME is supported or not.
URL: https://tracker.ceph.com/issues/48618
Fixes: 4f1ddb1ea874 ("ceph: implement updated ceph_mds_request_head structure")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
Commit
121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
converted native x86-32 syscalls which take 64-bit arguments to use the
compat handlers to allow conversion to passing args via pt_regs.
sys_fanotify_mark() was however missed, as it has a general compat
handler. Add a config option that will use the syscall wrapper that
takes the split args for native 32-bit.
[ bp: Fix typo in Kconfig help text. ]
Fixes: 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
Reported-by: Paweł Jasiak <pawel@jasiak.xyz>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
|
|
Since commit 36e2c7421f02 ("fs: don't allow splice read/write without
explicit ops") we've required that file operation structures explicitly
enable splice support, rather than falling back to the default handlers.
Most /proc files use the indirect 'struct proc_ops' to describe their
file operations, and were fixed up to support splice earlier in commits
40be821d627c..b24c30c67863, but the mountinfo files interact with the
VFS directly using their own 'struct file_operations' and got missed as
a result.
This adds the necessary support for splice to work for /proc/*/mountinfo
and friends.
Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Reported-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209971
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted patches from previous cycle(s)..."
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix hostfs_open() use of ->f_path.dentry
Make sure that make_create_in_sticky() never sees uninitialized value of dir_mode
fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set
fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()
fs/namespace.c: WARN if mnt_count has become negative
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Various bug fixes and cleanups for ext4; no new features this cycle"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (29 commits)
ext4: remove unnecessary wbc parameter from ext4_bio_write_page
ext4: avoid s_mb_prefetch to be zero in individual scenarios
ext4: defer saving error info from atomic context
ext4: simplify ext4 error translation
ext4: move functions in super.c
ext4: make ext4_abort() use __ext4_error()
ext4: standardize error message in ext4_protect_reserved_inode()
ext4: remove redundant sb checksum recomputation
ext4: don't remount read-only with errors=continue on reboot
ext4: fix deadlock with fs freezing and EA inodes
jbd2: add a helper to find out number of fast commit blocks
ext4: make fast_commit.h byte identical with e2fsprogs/fast_commit.h
ext4: fix fall-through warnings for Clang
ext4: add docs about fast commit idempotence
ext4: remove the unused EXT4_CURRENT_REV macro
ext4: fix an IS_ERR() vs NULL check
ext4: check for invalid block size early when mounting a file system
ext4: fix a memory leak of ext4_free_data
ext4: delete nonsensical (commented-out) code inside ext4_xattr_block_set()
ext4: update ext4_data_block_valid related comments
...
|
|
Pull io_uring fixes from Jens Axboe:
"All straight fixes, or a prep patch for a fix, either bound for stable
or fixing issues from this merge window. In particular:
- Fix new shutdown op not breaking links on failure
- Hold mm->mmap_sem for mm->locked_vm manipulation
- Various cancelation fixes (me, Pavel)
- Fix error path potential double ctx free (Pavel)
- IOPOLL fixes (Xiaoguang)"
* tag 'io_uring-5.11-2020-12-23' of git://git.kernel.dk/linux-block:
io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()
io_uring: fix double io_uring free
io_uring: fix ignoring xa_store errors
io_uring: end waiting before task cancel attempts
io_uring: always progress task_work on task cancel
io-wq: kill now unused io_wq_cancel_all()
io_uring: make ctx cancel on exit targeted to actual ctx
io_uring: fix 0-iov read buffer select
io_uring: close a small race gap for files cancel
io_uring: fix io_wqe->work_list corruption
io_uring: limit {io|sq}poll submit locking scope
io_uring: inline io_cqring_mark_overflow()
io_uring: consolidate CQ nr events calculation
io_uring: remove racy overflow list fast checks
io_uring: cancel reqs shouldn't kill overflow list
io_uring: hold mmap_sem for mm->locked_vm manipulation
io_uring: break links on shutdown failure
|
|
Pull block fixes from Jens Axboe:
"A few stragglers in here, but mostly just straight fixes. In
particular:
- Set of rnbd fixes for issues around changes for the merge window
(Gioh, Jack, Md Haris Iqbal)
- iocost tracepoint addition (Baolin)
- Copyright/maintainers update (Christoph)
- Remove old blk-mq fast path CPU warning (Daniel)
- loop max_part fix (Josh)
- Remote IPI threaded IRQ fix (Sebastian)
- dasd stable fixes (Stefan)
- bcache merge window fixup and style fixup (Yi, Zheng)"
* tag 'block-5.11-2020-12-23' of git://git.kernel.dk/linux-block:
md/bcache: convert comma to semicolon
bcache:remove a superfluous check in register_bcache
block: update some copyrights
block: remove a pointless self-reference in block_dev.c
MAINTAINERS: add fs/block_dev.c to the block section
blk-mq: Don't complete on a remote CPU in force threaded mode
s390/dasd: fix list corruption of lcu list
s390/dasd: fix list corruption of pavgroup group list
s390/dasd: prevent inconsistent LCU device data
s390/dasd: fix hanging device offline processing
blk-iocost: Add iocg idle state tracepoint
nbd: Respect max_part for all partition scans
block/rnbd-clt: Does not request pdu to rtrs-clt
block/rnbd-clt: Dynamically allocate sglist for rnbd_iu
block/rnbd: Set write-back cache and fua same to the target device
block/rnbd: Fix typos
block/rnbd-srv: Protect dev session sysfs removal
block/rnbd-clt: Fix possible memleak
block/rnbd-clt: Get rid of warning regarding size argument in strlcpy
blk-mq: Remove 'running from the wrong CPU' warning
|
|
io_wq_submit_work()
io_iopoll_complete() does not hold completion_lock to complete polled io,
so in io_wq_submit_work() we cannot call io_req_complete() directly to
complete polled io; otherwise there may be concurrent access to cqring,
defer_list, etc., which is not safe. Commit dad1b1242fd5 ("io_uring: always
let io_iopoll_complete() complete polled io") fixed this issue, but
Pavel reported that IOPOLL, apart from rw, can also do buf reg/unreg requests
(IORING_OP_PROVIDE_BUFFERS or IORING_OP_REMOVE_BUFFERS), so that fix is
incomplete.
Given that io_iopoll_complete() is always called under uring_lock, we can
also take uring_lock here for polled io to fix this issue.
Fixes: dad1b1242fd5 ("io_uring: always let io_iopoll_complete() complete polled io")
Cc: <stable@vger.kernel.org> # 5.5+
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: don't deref 'req' after completing it]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Once we created a file for current context during setup, we should not
call io_ring_ctx_wait_and_kill() directly as it'll be done by fput(file).
Cc: stable@vger.kernel.org # 5.10
Reported-by: syzbot+c9937dfb2303a5f18640@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: fix unused 'ret' for !CONFIG_UNIX]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Before this patch, sister functions gfs2_make_fs_rw and gfs2_make_fs_ro locked
(held) the freeze glock by calling gfs2_freeze_lock and gfs2_freeze_unlock.
The problem is, not all the callers of gfs2_make_fs_ro should be doing this.
The three callers of gfs2_make_fs_ro are: remount (gfs2_reconfigure),
signal_our_withdraw, and unmount (gfs2_put_super). But when unmounting the
file system we can get into the following circular lock dependency:
deactivate_super
down_write(&s->s_umount); <-------------------------------------- s_umount
deactivate_locked_super
gfs2_kill_sb
kill_block_super
generic_shutdown_super
gfs2_put_super
gfs2_make_fs_ro
gfs2_glock_nq_init sd_freeze_gl
freeze_go_sync
if (freeze glock in SH)
freeze_super (vfs)
down_write(&sb->s_umount); <------- s_umount
This patch moves the hold of the freeze glock outside the two sister rw/ro
functions to their callers, but it doesn't request the glock from
gfs2_put_super, thus eliminating the circular dependency.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Many places in the gfs2 code queued and dequeued the freeze glock.
Almost all of them acquire it in SHARED mode, and need to specify the
same LM_FLAG_NOEXP and GL_EXACT flags.
This patch adds common helper functions gfs2_freeze_lock and gfs2_freeze_unlock
to make the code more readable, and to prepare for the next patch.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Pull configfs update from Christoph Hellwig:
"Fix a kerneldoc comment (Alex Shi)"
* tag 'configfs-5.11' of git://git.infradead.org/users/hch/configfs:
configfs: fix kernel-doc markup issue
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat update from Namjae Jeon:
"Avoid page allocation failure from upcase table allocation"
* tag 'exfat-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: Avoid allocating upcase table using kcalloc()
|
|
When the first file is opened, ext4 samples the mountpoint of the
filesystem in 64 bytes of the super block. It does so using
strlcpy(), which means that the remaining bytes in the super block
string buffer are untouched. If the mount point before had a longer
path than the current one, it can be reconstructed.
Consider the case where the fs was mounted to "/media/johnjdeveloper"
and later to "/". The super block buffer then contains
"/\x00edia/johnjdeveloper".
This case was seen in the wild and caused confusion about how the name
of a developer ends up on the super block of a filesystem used
in production...
Fix this by using strncpy() instead of strlcpy(). The superblock
field is defined to be a fixed-size char array, and it is already
marked using __nonstring in fs/ext4/ext4.h. The consumer of the field
in e2fsprogs already assumes that, in the case of a 64+ byte mount
path, s_last_mounted will not be NUL terminated.
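The difference between the two copies can be demonstrated in userspace with the example paths above. This is a sketch only: copy_like_strlcpy() is a hypothetical stand-in that mimics strlcpy() semantics (copy the string, write one NUL, leave the rest alone), since strlcpy() is not available in every C library, and the other helper names are invented for the illustration.

```c
#include <string.h>

#define LAST_MOUNTED_LEN 64 /* field size stated in the commit message */

/* Hypothetical stand-in for strlcpy(): copies the string plus one NUL
 * and leaves the remainder of dst untouched. */
static void copy_like_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size == 0)
		return;
	if (len >= size)
		len = size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* Returns 1 if any byte after the first NUL is non-zero (stale data). */
static int has_stale_tail(const char *buf, size_t size)
{
	const char *nul = memchr(buf, '\0', size);
	size_t i = nul ? (size_t)(nul - buf) : size;

	for (; i < size; i++)
		if (buf[i] != '\0')
			return 1;
	return 0;
}

/* Mount at a long path, then at "/", and report whether the old path
 * is still partly visible in the field. */
static int leak_demo(int use_strncpy)
{
	char field[LAST_MOUNTED_LEN] = {0};

	copy_like_strlcpy(field, "/media/johnjdeveloper", sizeof(field));
	if (use_strncpy)
		strncpy(field, "/", sizeof(field)); /* zero-pads the rest */
	else
		copy_like_strlcpy(field, "/", sizeof(field));
	return has_stale_tail(field, sizeof(field));
}
```

Here leak_demo(0) reports stale bytes after the first NUL, while leak_demo(1) does not, because strncpy() pads the remainder of the buffer with zeros.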
Link: https://lore.kernel.org/r/X9ujIOJG/HqMr88R@mit.edu
Reported-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
The wrapper is now useless since it does what
ext4_handle_dirty_metadata() does. Just remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-9-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When setting password salt in the superblock, we forget to recompute the
superblock checksum so it will not match until the next superblock
modification which recomputes the checksum. Fix it.
CC: Michael Halcrow <mhalcrow@google.com>
Reported-by: Andreas Dilger <adilger@dilger.ca>
Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-8-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
No behavioral change.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-6-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If journalling is still working at the moment we get to writing error
information to the superblock we cannot write directly to the superblock
as such write could race with journalled update of the superblock and
cause journal checksum failures, writing inconsistent information to the
journal or other problems. We cannot journal the superblock directly
from the error handling functions as we are running in uncertain context
and could deadlock so just punt journalled superblock update to a
workqueue.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Protect all superblock modifications (including checksum computation)
with a superblock buffer lock. That way we are sure computed checksum
matches current superblock contents (a mismatch could cause checksum
failures in nojournal mode or if an unjournalled superblock update races
with a journalled one). Also we avoid modifying superblock contents
while it is being written out (which can cause DIF/DIX failures if we
are running in nojournal mode).
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Everybody passes 1 as sync argument of ext4_commit_super(). Just drop
it.
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
ext4_bio_write_page does not need wbc parameter, since its parameter
io contains the io_wbc field. The io::io_wbc is initialized by
ext4_io_submit_init which is called in ext4_writepages and
ext4_writepage functions prior to ext4_bio_write_page.
Therefore, when ext4_bio_write_page is called, wbc info
has already been included in io parameter.
Signed-off-by: Lei Chen <lennychen@tencent.com>
Link: https://lore.kernel.org/r/1607669664-25656-1-git-send-email-lennychen@tencent.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
save_error_info() is always called together with ext4_handle_error().
Combine them into a single call and move unconditional bits out of
save_error_info() into ext4_handle_error().
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Commit cfd732377221 ("ext4: add prefetching for block allocation
bitmaps") introduced block bitmap prefetch, and expects to read block
bitmaps of flex_bg through an IO. However, it seems to ignore the
value range of s_log_groups_per_flex. In the scenario where the value
of s_log_groups_per_flex is greater than 27, s_mb_prefetch or
s_mb_prefetch_limit will overflow, causing a divide-by-zero exception.
In addition, the logic of calculating nr is also flawed, because the
size of flexbg is fixed during a single mount, but s_mb_prefetch can
be modified, which causes nr to fail to meet the value condition of
[1, flexbg_size].
To solve this problem, we need to set the upper limit of
s_mb_prefetch. Since we expect to load block bitmaps of a flex_bg
through an IO, we can consider determining a reasonable upper limit
among the IO limit parameters. After consideration, we chose
BLK_MAX_SEGMENT_SIZE. This is a good choice that solves the divide-by-zero
problem while avoiding performance degradation.
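The overflow and the effect of an upper limit can be illustrated with plain 32-bit arithmetic. This is a sketch under stated assumptions rather than the ext4 code: the scaling factor, the cap value, and the function names are all made up for the illustration. A result of zero is the kind of value that later turns into a division by zero.

```c
#include <stdint.h>

/* Illustrative scaling factor and cap; both are assumptions for this
 * sketch, not values taken from the patch. */
#define SKETCH_SCALE 4u
#define SKETCH_CAP   8192u

/* Unclamped: 1 << log_groups, then scaled, in 32-bit arithmetic.
 * log_groups must be below 32 for the shift to be defined. */
static uint32_t prefetch_limit_unclamped(unsigned int log_groups)
{
	uint32_t prefetch = UINT32_C(1) << log_groups;

	return prefetch * SKETCH_SCALE; /* wraps modulo 2^32 */
}

/* Clamped: same computation, but the prefetch count is capped first,
 * so the scaled value can no longer wrap to zero. */
static uint32_t prefetch_limit_clamped(unsigned int log_groups)
{
	uint32_t prefetch = UINT32_C(1) << log_groups;

	if (prefetch > SKETCH_CAP)
		prefetch = SKETCH_CAP;
	return prefetch * SKETCH_SCALE;
}
```

With log_groups = 30, the unclamped version computes 2^30 * 4 = 2^32, which wraps to 0, while the clamped version stays at the cap times the scale.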
[ Some minor code simplifications to make the changes easy to follow -- TYT ]
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reviewed-by: Samuel Liao <samuelliao@tencent.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/1607051143-24508-1-git-send-email-brookxu@tencent.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When filesystem inconsistency is detected with group locked, we
currently try to modify superblock to store error there without
blocking. However this can cause superblock checksum failures (or
DIF/DIX failure) when the superblock is just being written out.
Make error handling code just store error information in ext4_sb_info
structure and copy it to on-disk superblock only in ext4_commit_super().
In case of error happening with group locked, we just postpone the
superblock flushing to a workqueue.
[ Added fixup so that s_first_error_* does not get updated after
the file system is remounted.
Also added fix for syzbot failure. - Ted ]
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201127113405.26867-8-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+9043030c040ce1849a60@syzkaller.appspotmail.com
|
|
Update copyrights for files that have gotten some major rewrites lately.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is no point in duplicating the file name in the top of the file
comment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The table for Unicode upcase conversion requires an order-5 allocation,
which may fail on a highly-fragmented system:
pool-udisksd: page allocation failure: order:5,
mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),
cpuset=/,mems_allowed=0
CPU: 4 PID: 3756880 Comm: pool-udisksd Tainted: G U
5.8.10-200.fc32.x86_64 #1
Hardware name: Dell Inc. XPS 13 9360/0PVG6D, BIOS 2.13.0 11/14/2019
Call Trace:
dump_stack+0x6b/0x88
warn_alloc.cold+0x75/0xd9
? _cond_resched+0x16/0x40
? __alloc_pages_direct_compact+0x144/0x150
__alloc_pages_slowpath.constprop.0+0xcfa/0xd30
? __schedule+0x28a/0x840
? __wait_on_bit_lock+0x92/0xa0
__alloc_pages_nodemask+0x2df/0x320
kmalloc_order+0x1b/0x80
kmalloc_order_trace+0x1d/0xa0
exfat_create_upcase_table+0x115/0x390 [exfat]
exfat_fill_super+0x3ef/0x7f0 [exfat]
? sget_fc+0x1d0/0x240
? exfat_init_fs_context+0x120/0x120 [exfat]
get_tree_bdev+0x15c/0x250
vfs_get_tree+0x25/0xb0
do_mount+0x7c3/0xaf0
? copy_mount_options+0xab/0x180
__x64_sys_mount+0x8e/0xd0
do_syscall_64+0x4d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Make the driver use kvcalloc() to eliminate the issue.
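As a back-of-the-envelope check of the "order-5" figure, the sketch below computes the allocation order for a given size. It assumes a table of 65536 16-bit entries and 4 KiB pages; both numbers are assumptions for the illustration, and alloc_order() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= bytes. */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < bytes)
		order++;
	return order;
}
```

Under those assumptions the table is 65536 * 2 = 131072 bytes, i.e. 32 contiguous pages, which corresponds to order 5 and matches the failure report above.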
Fixes: 370e812b3ec1 ("exfat: add nls operations")
Cc: stable@vger.kernel.org #v5.7+
Signed-off-by: Artem Labazov <123321artyom@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
this is one of the cases where we need to use d_real() - we are
using more than the name of dentry here. ->d_sb is used as well,
so in case of hostfs being used as a layer we get the wrong
superblock.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
xa_store() may fail, check the result.
Cc: stable@vger.kernel.org # 5.10
Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull 9p update from Dominique Martinet:
- fix long-standing limitation on open-unlink-fop pattern
- add refcount to p9_fid (fixes the above and will allow for more
cleanups and simplifications in the future)
* tag '9p-for-5.11-rc1' of git://github.com/martinetd/linux:
9p: Remove unnecessary IS_ERR() check
9p: Uninitialized variable in v9fs_writeback_fid()
9p: Fix writeback fid incorrectly being attached to dentry
9p: apply review requests for fid refcounting
9p: add refcount to p9_fid struct
fs/9p: search open fids first
fs/9p: track open fids
fs/9p: fix create-unlink-getattr idiom
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs update from Mike Marshall:
"Add splice file operations"
* tag 'for-linus-5.11-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: add splice file operations
|
|
Pull cifs fixes from Steve French:
"Four small CIFS/SMB3 fixes (witness protocol and reconnect related),
and two that add ability to get and set auditing information in the
security descriptor (SACL), which can be helpful not just for backup
scenarios ("smbinfo secdesc" etc.) but also for improving security"
* tag '5.11-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6:
Add SMB 2 support for getting and setting SACLs
SMB3: Add support for getting and setting SACLs
cifs: Avoid error pointer dereference
cifs: Re-indent cifs_swn_reconnect()
cifs: Unlock on errors in cifs_swn_reconnect()
cifs: Delete a stray unlock in cifs_swn_reconnect()
|