Age | Commit message | Author |
|
New method: ->iterate_shared(). Same arguments as in ->iterate(),
called with the directory locked only shared. Once all filesystems
switch, the old one will be gone.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
same as read() on regular files has, and for the same reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
ta-da!
The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.
lockdep side also might need more work
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If we *do* run into an in-lookup match, we need to wait for it to
cease being in-lookup. Fortunately, we do have unused space in
in-lookup dentries - d_lru is never looked at until it stops being
in-lookup.
So we can stash a pointer to wait_queue_head from stack frame of
the caller of ->lookup(). Some precautions are needed while
waiting, but it's not that hard - we do hold a reference to dentry
we are waiting for, so it can't go away. If it's found to be
in-lookup the wait_queue_head is still alive and will remain so
at least while ->d_lock is held. Moreover, the condition we
are waiting for becomes true at the same point where everything
on that wq gets woken up, so we can just add ourselves to the
queue once.
d_alloc_parallel() gets a pointer to wait_queue_head_t from its
caller; lookup_slow() adjusted, d_add_ci() taught to use
d_alloc_parallel() if the dentry passed to it happens to be
in-lookup one (i.e. if it's been called from the parallel lookup).
That's pretty much it - all that remains is to switch ->i_mutex
to rwsem and have lookup_slow() take it shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We will need to be able to check if there is an in-lookup
dentry with matching parent/name. Right now it's impossible,
but as soon as we start locking directories shared such beasts
will appear.
Add a secondary hash for locating those. Hash chains go through
the same space where d_alias will be once it's not in-lookup anymore.
Search is done under the same bitlock we use for modifications -
with the primary hash we can rely on d_rehash() into the wrong
chain being the worst that could happen, but here the pointers are
buggered once it's removed from the chain. On the other hand,
the chains are not going to be long and normally we'll end up
adding to the chain anyway. That allows us to avoid bothering with
->d_lock when doing the comparisons - everything is stable until
removed from chain.
New helper: d_alloc_parallel(). Right now it allocates, verifies
that no hashed and in-lookup matches exist and adds to in-lookup
hash.
Returns ERR_PTR() for error, hashed match (in the unlikely case it's
been found) or new dentry. In-lookup matches trigger BUG() for
now; that will change in the next commit when we introduce waiting
for ongoing lookup to finish. Note that in-lookup matches won't be
possible until we actually go for shared locking.
lookup_slow() switched to use of d_alloc_parallel().
Again, these commits are separated only for making it easier to
review. All this machinery will start doing something useful only
when we go for shared locking; it's just that the combination is
too large for my taste.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We'll need to verify that there's neither a hashed nor in-lookup
dentry with desired parent/name before adding to in-lookup set.
One possible solution would be to hold the parent's ->d_lock through
both checks, but while the in-lookup set is relatively small at any
time, dcache is not. And holding the parent's ->d_lock through
something like __d_lookup_rcu() would suck too badly.
So we leave the parent's ->d_lock alone, which means that we watch
out for the following scenario:
* we verify that there's no hashed match
* existing in-lookup match gets hashed by another process
* we verify that there's no in-lookup matches and decide
that everything's fine.
Solution: per-directory kinda-sorta seqlock, bumped around the times
we hash something that used to be in-lookup or move (and hash)
something in place of in-lookup. Then the above would turn into
* read the counter
* do dcache lookup
* if no matches found, check for in-lookup matches
* if there had been none of those either, check if the
counter has changed; repeat if it has.
The "kinda-sorta" part is due to the fact that we don't have much spare
space in inode. There is a spare word (shared with i_bdev/i_cdev/i_pipe),
so the counter part is not a problem, but spinlock is a different story.
We could use the parent's ->d_lock, and it would be less painful in
terms of contention, but for __d_add() it would be rather inconvenient
to grab; we could do that (using lock_parent()), but...
Fortunately, we can get serialization on the counter itself, and it
might be a good idea in general; we can use cmpxchg() in a loop to
get from even to odd and smp_store_release() from odd to even.
This commit adds the counter and the update logic; the readers will be
added in the next commit.
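A rough userspace sketch of the counter handling described above, using
C11 atomics in place of the kernel's cmpxchg() and smp_store_release().
struct dir_seq and all the dir_seq_*() names are hypothetical, made up
for illustration; the two reader helpers are only a guess at the shape
of what the next commit adds.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the spare word in the inode. */
struct dir_seq {
	atomic_uint seq;	/* even: idle, odd: update in progress */
};

/* Writer entry: loop on compare-and-swap until we go even -> odd. */
static unsigned dir_seq_begin(struct dir_seq *d)
{
	for (;;) {
		unsigned n = atomic_load_explicit(&d->seq,
						  memory_order_relaxed);
		if (n & 1)
			continue;	/* another updater is active */
		if (atomic_compare_exchange_weak_explicit(&d->seq, &n, n + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return n;	/* the even value we started from */
	}
}

/* Writer exit: odd -> even, published with release semantics. */
static void dir_seq_end(struct dir_seq *d, unsigned n)
{
	assert(atomic_load_explicit(&d->seq, memory_order_relaxed) == n + 1);
	atomic_store_explicit(&d->seq, n + 2, memory_order_release);
}

/* Reader: sample the counter before doing the two lookups. */
static unsigned dir_seq_read(struct dir_seq *d)
{
	return atomic_load_explicit(&d->seq, memory_order_acquire);
}

/* Reader: nonzero if the lookups have to be repeated. */
static int dir_seq_retry(struct dir_seq *d, unsigned seen)
{
	atomic_thread_fence(memory_order_acquire);
	return (seen & 1) ||
	       atomic_load_explicit(&d->seq, memory_order_relaxed) != seen;
}
```

A reader would call dir_seq_read(), do the dcache and in-lookup checks,
and start over whenever dir_seq_retry() returns nonzero. An odd sample
is treated as "always retry", since it was taken in the middle of an
update.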
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
marked as such when (would be) parallel lookup is about to pass them
to actual ->lookup(); unmarked when
* __d_add() is about to make it hashed, positive or not.
* __d_move() (from d_splice_alias(), directly or via
__d_unalias()) puts a preexisting dentry in its place
* in caller of ->lookup() if it has escaped all of the
above. Bug (WARN_ON, actually) if it reaches the final dput()
or d_instantiate() while still marked such.
As the result, we are guaranteed that for as long as the flag is
set, dentry will
* remain negative unhashed with positive refcount
* never have its ->d_alias looked at
* never have its ->d_lru looked at
* never have its ->d_parent and ->d_name changed
Right now we have at most one such for any given parent directory.
With parallel lookups that restriction will weaken to
* only exist when parent is locked shared
* at most one with given (parent,name) pair (comparison of
names is according to ->d_compare())
* only exist when there's no hashed dentry with the same
(parent,name)
Transition will take the next several commits; unfortunately, we'll
only be able to switch to rwsem at the end of this series. The
reason for not making it a single patch is to simplify review.
New primitives: d_in_lookup() (a predicate checking if dentry is in
the in-lookup state) and d_lookup_done() (tells the system that
we are done with lookup and if it's still marked as in-lookup, it
should cease to be such).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
will be needed as soon as lookups are not serialized by ->i_mutex
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that. Switch to a slightly different scheme,
_not_ setting Checked in case of failure. That way the logic becomes
if Checked => OK
else if Error => fail
else if !validate => fail
else => OK
with validation setting Checked on success or Error on failure and
returning which one had happened. Equivalent to the current logic, but unlike
the current logic not sensitive to set_bit and test_bit getting
reordered by the CPU, etc.
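The decision table above can be sketched in plain C as follows. This is
a single-threaded model, not the ext2 code: struct dir_page, its
contents_valid field, validate() and dir_page_ok() are hypothetical
names, and plain bools stand in for the page flag bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state; in the kernel these are page flag bits. */
struct dir_page {
	bool checked;
	bool error;
	bool contents_valid;	/* stand-in for what validation finds */
};

/* Validation sets exactly one of the two bits and reports which. */
static bool validate(struct dir_page *p)
{
	if (p->contents_valid) {
		p->checked = true;
		return true;
	}
	p->error = true;	/* Checked is _not_ set on failure */
	return false;
}

/* if Checked => OK; else if Error => fail; else validate. */
static bool dir_page_ok(struct dir_page *p)
{
	if (p->checked)
		return true;
	if (p->error)
		return false;
	if (!validate(p))
		return false;
	assert(p->checked && !p->error);
	return true;
}
```

Since a page never ends up with both bits set, a caller that sees
neither bit simply validates again, which is harmless.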
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and explain the non-obvious logics in case when lookup yields
a different dentry.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and have it use inode_lock()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
grab a reference to dentry we'd got the sucker from, and return
that dentry via *wait, rather than just returning the address of
->i_mutex.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The rest of work.xattr stuff isn't needed for this branch
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fix from Jan Kara:
"A fix of a regression in UDF that got introduced in 4.6-rc1 by one of
the charset encoding fixes"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix conversion of 'dstring' fields to UTF8
|
|
We've calculated @len to be the bytes we need for '/..' entries from
@kn_from to the common ancestor, and calculated @nlen to be the extra
bytes we need to get from the common ancestor to @kn_to. We use them
as such at the end. But in the loop copying the actual entries, we
overwrite @nlen. Use a temporary variable for that instead.
Without this, the return length, when the buffer is large enough, is
wrong. (When the buffer is NULL or too small, the returned value is
correct. The buffer contents are also correct.)
Interestingly, no callers of this function are affected by this as of
yet. However the upcoming cgroup_show_path() will be.
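A small sketch of the pattern, assuming a much simplified model rather
than the kernfs function itself: build_path() and its parameters are
hypothetical. The point is only that the copy loop uses its own
temporary, so the precomputed lengths are still valid for the return
value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write "/.." up_levels times, then each entry of
 * names[] prefixed with '/', into buf. Returns the length the full
 * path needs, whether or not it fits.
 */
static size_t build_path(char *buf, size_t buflen, int up_levels,
			 const char *const *names, int count)
{
	size_t len = 3 * (size_t)up_levels;	/* bytes for "/.." entries */
	size_t nlen = 0;			/* bytes for the way down */
	size_t off = 0;
	int i;

	for (i = 0; i < count; i++)
		nlen += 1 + strlen(names[i]);

	if (!buf || len + nlen >= buflen)
		return len + nlen;		/* report the size only */

	for (i = 0; i < up_levels; i++, off += 3)
		memcpy(buf + off, "/..", 3);

	for (i = 0; i < count; i++) {
		size_t n = strlen(names[i]);	/* temporary, not nlen */

		buf[off++] = '/';
		memcpy(buf + off, names[i], n);
		off += n;
	}
	buf[off] = '\0';

	assert(off == len + nlen);
	return len + nlen;
}
```

Had the loop stored strlen() into nlen instead of n, the buffer would
still be filled correctly but the returned length would be wrong, which
matches the symptom described above.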
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
|
|
Also drop the newline from the message.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Struct gfs2_alloc_parms ap is never referenced in function
gfs2_rbm_find, so this patch removes it.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
i_mutex has been replaced by i_rwsem and directly accessing the
non-existent i_mutex breaks the kernel build.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This is the per-I/O equivalent of O_DSYNC and O_SYNC, and very useful for
all kinds of file servers and storage targets.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The kiocb already has the new position, so use that. The only interesting
case is AIO, where we currently don't bother updating ki_pos. We're about
to free the kiocb after we're done, so we might as well update it to make
everyone's life simpler.
While we're at it also return the bytes written argument passed in if
we were successful so that the boilerplate error switch code in the
callers can go away.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This will allow us to do per-I/O sync file writes, as required by a lot
of fileservers or storage targets.
XXX: Will need a few additional audits for O_DSYNC
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
It has to be identical to ki_pos of the iocb, so use that instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually
work, so eliminate the superfluous argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit 3034a14 "ima: pass 'opened' flag to identify newly created files"
stopped identifying empty files as new files. However new empty files
can be created using the mknodat syscall. On systems with IMA-appraisal
enabled, these empty files are not labeled with security.ima extended
attributes properly, preventing them from subsequently being opened in
order to write the file data contents. This patch defines a new hook
named ima_post_path_mknod() to mark these empty files, created using
mknodat, as new in order to allow the file data contents to be written.
In addition, files with security.ima xattrs containing a file signature
are considered "immutable" and cannot be modified. The file contents
need to be written before signing the file. This patch relaxes this
requirement for new files, allowing the file signature to be written
before the file contents.
Changelog:
- defer identifying files with signatures stored as security.ima
(based on Dmitry Rozhkov's comments)
- removing tests (eg. dentry, dentry->d_inode, inode->i_size == 0)
(based on Al's review)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
|
|
This patch is based on top of the "vfs: support for a common kernel file
loader" patch set. In general when the kernel is reading a file into
memory it does not want anything else writing to it.
The kernel currently only forbids write access to a file being executed.
This patch extends this locking to files being read by the kernel.
Changelog:
- moved function to kernel_read_file() - Mimi
- updated patch description - Mimi
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@huawei.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
|
|
* if we have a hashed negative dentry and either CREAT|EXCL on
r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
with failing may_o_create(), we should fail with EROFS or the
error may_o_create() has returned, but not ENOENT. Which is what
the current code ends up returning.
* if we have CREAT|TRUNC hitting a regular file on a read-only
filesystem, we can't fail with EROFS here. At the very least,
not until we'd done follow_managed() - we might have a writable
file (or a device, for that matter) bound on top of that one.
Moreover, the code downstream will see that O_TRUNC and attempt
to grab the write access (*after* following possible mount), so
if we really should fail with EROFS, it will happen. No need
to do that inside atomic_open().
The real logic is much simpler than what the current code is
trying to do - if we decided to go for simple lookup, ended
up with a negative dentry *and* had create_error set, fail with
create_error. No matter whether we'd got that negative dentry
from lookup_real() or had found it in dcache.
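As a minimal model of that rule (not the namei.c code;
plain_lookup_error() and its parameters are hypothetical), note that
where the dentry came from is deliberately not an input:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the decision after a plain lookup.
 * create_error is 0 or a negative errno recorded earlier (for example
 * -EROFS). Returns the error to fail with, or 0 to let the caller
 * carry on with whatever it does next.
 */
static int plain_lookup_error(bool dentry_is_negative, int create_error)
{
	assert(create_error <= 0);
	if (dentry_is_negative && create_error)
		return create_error;
	return 0;
}
```

A positive dentry with create_error set falls through with 0 here,
which corresponds to the second case above: the later attempt to get
write access is what would produce EROFS, if it is really warranted.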
Cc: stable@vger.kernel.org # v3.6+
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
A fault in a user provided buffer may lead anywhere, and lockdep warns
that we have a potential deadlock between the mm->mmap_sem and the
kernfs file mutex:
[ 82.811702] ======================================================
[ 82.811705] [ INFO: possible circular locking dependency detected ]
[ 82.811709] 4.5.0-rc4-gfxbench+ #1 Not tainted
[ 82.811711] -------------------------------------------------------
[ 82.811714] kms_setmode/5859 is trying to acquire lock:
[ 82.811717] (&dev->struct_mutex){+.+.+.}, at: [<ffffffff8150d9c1>] drm_gem_mmap+0x1a1/0x270
[ 82.811731]
but task is already holding lock:
[ 82.811734] (&mm->mmap_sem){++++++}, at: [<ffffffff8117b364>] vm_mmap_pgoff+0x44/0xa0
[ 82.811745]
which lock already depends on the new lock.
[ 82.811749]
the existing dependency chain (in reverse order) is:
[ 82.811752]
-> #3 (&mm->mmap_sem){++++++}:
[ 82.811761] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.811766] [<ffffffff8118bc65>] __might_fault+0x75/0xa0
[ 82.811771] [<ffffffff8124da4a>] kernfs_fop_write+0x8a/0x180
[ 82.811787] [<ffffffff811d1023>] __vfs_write+0x23/0xe0
[ 82.811792] [<ffffffff811d1d74>] vfs_write+0xa4/0x190
[ 82.811797] [<ffffffff811d2c14>] SyS_write+0x44/0xb0
[ 82.811801] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.811807]
-> #2 (s_active#6){++++.+}:
[ 82.811814] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.811819] [<ffffffff8124c070>] __kernfs_remove+0x210/0x2f0
[ 82.811823] [<ffffffff8124d040>] kernfs_remove_by_name_ns+0x40/0xa0
[ 82.811828] [<ffffffff8124e9e0>] sysfs_remove_file_ns+0x10/0x20
[ 82.811832] [<ffffffff815318d4>] device_del+0x124/0x250
[ 82.811837] [<ffffffff81531a19>] device_unregister+0x19/0x60
[ 82.811841] [<ffffffff8153c051>] cpu_cache_sysfs_exit+0x51/0xb0
[ 82.811846] [<ffffffff8153c628>] cacheinfo_cpu_callback+0x38/0x70
[ 82.811851] [<ffffffff8109ae89>] notifier_call_chain+0x39/0xa0
[ 82.811856] [<ffffffff8109aef9>] __raw_notifier_call_chain+0x9/0x10
[ 82.811860] [<ffffffff810786de>] cpu_notify+0x1e/0x40
[ 82.811865] [<ffffffff81078779>] cpu_notify_nofail+0x9/0x20
[ 82.811869] [<ffffffff81078ac3>] _cpu_down+0x233/0x340
[ 82.811874] [<ffffffff81079019>] disable_nonboot_cpus+0xc9/0x350
[ 82.811878] [<ffffffff810d2e11>] suspend_devices_and_enter+0x5a1/0xb50
[ 82.811883] [<ffffffff810d3903>] pm_suspend+0x543/0x8d0
[ 82.811888] [<ffffffff810d1b77>] state_store+0x77/0xe0
[ 82.811892] [<ffffffff813fa68f>] kobj_attr_store+0xf/0x20
[ 82.811897] [<ffffffff8124e740>] sysfs_kf_write+0x40/0x50
[ 82.811902] [<ffffffff8124dafc>] kernfs_fop_write+0x13c/0x180
[ 82.811906] [<ffffffff811d1023>] __vfs_write+0x23/0xe0
[ 82.811910] [<ffffffff811d1d74>] vfs_write+0xa4/0x190
[ 82.811914] [<ffffffff811d2c14>] SyS_write+0x44/0xb0
[ 82.811918] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.811923]
-> #1 (cpu_hotplug.lock){+.+.+.}:
[ 82.811929] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.811933] [<ffffffff817b6f72>] mutex_lock_nested+0x62/0x3b0
[ 82.811940] [<ffffffff810784c1>] get_online_cpus+0x61/0x80
[ 82.811944] [<ffffffff811170eb>] stop_machine+0x1b/0xe0
[ 82.811949] [<ffffffffa0178edd>] gen8_ggtt_insert_entries__BKL+0x2d/0x30 [i915]
[ 82.812009] [<ffffffffa017d3a6>] ggtt_bind_vma+0x46/0x70 [i915]
[ 82.812045] [<ffffffffa017eb70>] i915_vma_bind+0x140/0x290 [i915]
[ 82.812081] [<ffffffffa01862b9>] i915_gem_object_do_pin+0x899/0xb00 [i915]
[ 82.812117] [<ffffffffa0186555>] i915_gem_object_pin+0x35/0x40 [i915]
[ 82.812154] [<ffffffffa019a23e>] intel_init_pipe_control+0xbe/0x210 [i915]
[ 82.812192] [<ffffffffa0197312>] intel_logical_rings_init+0xe2/0xde0 [i915]
[ 82.812232] [<ffffffffa0186fe3>] i915_gem_init+0xf3/0x130 [i915]
[ 82.812278] [<ffffffffa02097ed>] i915_driver_load+0xf2d/0x1770 [i915]
[ 82.812318] [<ffffffff81512474>] drm_dev_register+0xa4/0xb0
[ 82.812323] [<ffffffff8151467e>] drm_get_pci_dev+0xce/0x1e0
[ 82.812328] [<ffffffffa01472cf>] i915_pci_probe+0x2f/0x50 [i915]
[ 82.812360] [<ffffffff8143f907>] pci_device_probe+0x87/0xf0
[ 82.812366] [<ffffffff81535f89>] driver_probe_device+0x229/0x450
[ 82.812371] [<ffffffff81536233>] __driver_attach+0x83/0x90
[ 82.812375] [<ffffffff81533c61>] bus_for_each_dev+0x61/0xa0
[ 82.812380] [<ffffffff81535879>] driver_attach+0x19/0x20
[ 82.812384] [<ffffffff8153535f>] bus_add_driver+0x1ef/0x290
[ 82.812388] [<ffffffff81536e9b>] driver_register+0x5b/0xe0
[ 82.812393] [<ffffffff8143e83b>] __pci_register_driver+0x5b/0x60
[ 82.812398] [<ffffffff81514866>] drm_pci_init+0xd6/0x100
[ 82.812402] [<ffffffffa027c094>] 0xffffffffa027c094
[ 82.812406] [<ffffffff810003de>] do_one_initcall+0xae/0x1d0
[ 82.812412] [<ffffffff811595a0>] do_init_module+0x5b/0x1cb
[ 82.812417] [<ffffffff81106160>] load_module+0x1c20/0x2480
[ 82.812422] [<ffffffff81106bae>] SyS_finit_module+0x7e/0xa0
[ 82.812428] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.812433]
-> #0 (&dev->struct_mutex){+.+.+.}:
[ 82.812439] [<ffffffff810cbe59>] __lock_acquire+0x1fc9/0x20f0
[ 82.812443] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.812456] [<ffffffff8150d9e7>] drm_gem_mmap+0x1c7/0x270
[ 82.812460] [<ffffffff81196a14>] mmap_region+0x334/0x580
[ 82.812466] [<ffffffff81196fc4>] do_mmap+0x364/0x410
[ 82.812470] [<ffffffff8117b38d>] vm_mmap_pgoff+0x6d/0xa0
[ 82.812474] [<ffffffff811950f4>] SyS_mmap_pgoff+0x184/0x220
[ 82.812479] [<ffffffff8100a0fd>] SyS_mmap+0x1d/0x20
[ 82.812484] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.812489]
other info that might help us debug this:
[ 82.812493] Chain exists of:
&dev->struct_mutex --> s_active#6 --> &mm->mmap_sem
[ 82.812502] Possible unsafe locking scenario:
[ 82.812506] CPU0 CPU1
[ 82.812508] ---- ----
[ 82.812510] lock(&mm->mmap_sem);
[ 82.812514] lock(s_active#6);
[ 82.812519] lock(&mm->mmap_sem);
[ 82.812522] lock(&dev->struct_mutex);
[ 82.812526]
*** DEADLOCK ***
[ 82.812531] 1 lock held by kms_setmode/5859:
[ 82.812533] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8117b364>] vm_mmap_pgoff+0x44/0xa0
[ 82.812541]
stack backtrace:
[ 82.812547] CPU: 0 PID: 5859 Comm: kms_setmode Not tainted 4.5.0-rc4-gfxbench+ #1
[ 82.812550] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0040.2015.0814.1353 08/14/2015
[ 82.812553] 0000000000000000 ffff880079407bf0 ffffffff813f8505 ffffffff825fb270
[ 82.812560] ffffffff825c4190 ffff880079407c30 ffffffff810c84ac ffff880079407c90
[ 82.812566] ffff8800797ed328 ffff8800797ecb00 0000000000000001 ffff8800797ed350
[ 82.812573] Call Trace:
[ 82.812578] [<ffffffff813f8505>] dump_stack+0x67/0x92
[ 82.812582] [<ffffffff810c84ac>] print_circular_bug+0x1fc/0x310
[ 82.812586] [<ffffffff810cbe59>] __lock_acquire+0x1fc9/0x20f0
[ 82.812590] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.812594] [<ffffffff8150d9c1>] ? drm_gem_mmap+0x1a1/0x270
[ 82.812599] [<ffffffff8150d9e7>] drm_gem_mmap+0x1c7/0x270
[ 82.812603] [<ffffffff8150d9c1>] ? drm_gem_mmap+0x1a1/0x270
[ 82.812608] [<ffffffff81196a14>] mmap_region+0x334/0x580
[ 82.812612] [<ffffffff81196fc4>] do_mmap+0x364/0x410
[ 82.812616] [<ffffffff8117b38d>] vm_mmap_pgoff+0x6d/0xa0
[ 82.812629] [<ffffffff811950f4>] SyS_mmap_pgoff+0x184/0x220
[ 82.812633] [<ffffffff8100a0fd>] SyS_mmap+0x1d/0x20
[ 82.812637] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
Highly unlikely though this scenario is, we can avoid the issue entirely
by moving the copy operation out from under the kernfs_get_active()
tracking and assigning the preallocated buffer its own mutex. The
temporary buffer allocation doesn't require mutex locking as it is
entirely local.
The locked section was extended by the addition of the preallocated buf
to speed up md user operations in
commit 2b75869bba676c248d8d25ae6d2bd9221dfffdb6
Author: NeilBrown <neilb@suse.de>
Date: Mon Oct 13 16:41:28 2014 +1100
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: NeilBrown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of just printing warning messages, if the orphan list is
corrupted, declare the file system corrupted. If there are any
reserved inodes in the orphaned inode list, declare the file system
corrupted and stop right away to avoid doing more potential damage to
the file system.
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly). Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.
This can be reproduced via:
mke2fs -t ext4 /tmp/foo.img 100
debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
mount -o loop /tmp/foo.img /mnt
(But don't do this on an unpatched kernel if you care about the
system staying functional. :-)
This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1]. (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)
[1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf
Cc: stable@vger.kernel.org
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Merge fixes from Andrew Morton:
"20 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
Documentation/sysctl/vm.txt: update numa_zonelist_order description
lib/stackdepot.c: allow the stack trace hash to be zero
rapidio: fix potential NULL pointer dereference
mm/memory-failure: fix race with compound page split/merge
ocfs2/dlm: return zero if deref_done message is successfully handled
Ananth has moved
kcov: don't profile branches in kcov
kcov: don't trace the code coverage code
mm: wake kcompactd before kswapd's short sleep
.mailmap: add Frank Rowand
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: call swap_slot_free_notify() with page lock held
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
mailmap: fix Krzysztof Kozlowski's misspelled name
thp: keep huge zero page pinned until tlb flush
mm: exclude HugeTLB pages from THP page_mapped() logic
kexec: export OFFSET(page.compound_head) to find out compound tail page
kexec: update VMCOREINFO for compound_order/dtor
|
|
Single caller passes GFP_NOFS. We can get rid of the
gfpflags_allow_blocking checks as NOFS can block but does not recurse to
filesystem through reclaim.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Similar to __clear_extent_bit, do not fail if the state preallocation
fails as we might not need it. One less BUG_ON.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Single caller passes GFP_NOFS.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Single caller passes GFP_NOFS.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Single caller passes GFP_NOFS.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Callers pass GFP_NOFS and tests pass GFP_KERNEL, but using NOFS there
does not hurt. No need to pass the flags around.
Signed-off-by: David Sterba <dsterba@suse.com>
|