Age | Commit message (Collapse) | Author |
|
This series of patches is to fix the dis-order issue of setting/clearing
refmap bit described below.
Node 1 Node 2(master)
dlmlock
dlm_do_master_request
dlm_master_request_handler
-> dlm_lockres_set_refmap_bit
dlmlock succeed
dlmunlock succeed
dlm_purge_lockres
dlm_deref_handler
-> find lock resource is in
DLM_LOCK_RES_SETREF_INPROG state,
so dispatch a deref work
dlm_purge_lockres succeed.
call dlmlock again
dlm_do_master_request
dlm_master_request_handler
-> dlm_lockres_set_refmap_bit
deref work trigger, call
dlm_lockres_clear_refmap_bit
to clear Node 1 from refmap
dlm_purge_lockres succeed
dlm_send_remote_lock_request
return DLM_IVLOCKID because
the lockres is not exist
BUG if the lockres is $RECOVERY
This series of patches add a new message to keep the order of set and
clear. Other nodes can purge the lock resource only after the refmap bit
on master is cleared.
This patch is to add DEREF_DONE message and corresponding handler. Node
can purge the lock resource after receiving this message. As a new
message is added, so increase the minor number of dlm protocol version.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Refer to cluster/tcp.h, NET_MAX_PAYLOAD_BYTES is a typo for
O2NET_MAX_PAYLOAD_BYTES.
Since currently DLM_MIG_LOCKRES_RESERVED is not actually used, it won't
cause any problem. But we'd better correct it for further use.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit a75e9ccabd92 ("ocfs2: use spinlock irqsave for downconvert lock")
missed an unmodified place in ocfs2_osb_dump(), so it still exists a
deadlock scenario.
ocfs2_wake_downconvert_thread
ocfs2_rw_unlock
ocfs2_dio_end_io
dio_complete
.....
bio_endio
req_bio_endio
....
scsi_io_completion
blk_done_softirq
__do_softirq
do_softirq
irq_exit
do_IRQ
ocfs2_osb_dump
cat /sys/kernel/debug/ocfs2/${uuid}/fs_state
This patch still uses spin_lock_irqsave() - replace spin_lock() to solve
this situation.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There actually no hardware or software interrupts in the context which
using o2hb_live_lock, so we don't need to worry about race conditions
caused by irq/softirq with spinlock held. Turning off irq is not good
for system performance after all. Just replace them with a non
interrupt safe function.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"This is another big update. Main changes are:
- lots of x86 system call (and other traps/exceptions) entry code
enhancements. In particular the complex parts of the 64-bit entry
code have been migrated to C code as well, and a number of dusty
corners have been refreshed. (Andy Lutomirski)
- vDSO special mapping robustification and general cleanups (Andy
Lutomirski)
- cpufeature refactoring, cleanups and speedups (Borislav Petkov)
- lots of other changes ..."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
x86/cpufeature: Enable new AVX-512 features
x86/entry/traps: Show unhandled signal for i386 in do_trap()
x86/entry: Call enter_from_user_mode() with IRQs off
x86/entry/32: Change INT80 to be an interrupt gate
x86/entry: Improve system call entry comments
x86/entry: Remove TIF_SINGLESTEP entry work
x86/entry/32: Add and check a stack canary for the SYSENTER stack
x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
x86/entry: Vastly simplify SYSENTER TF (single-step) handling
x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
x86/entry/32: Restore FLAGS on SYSEXIT
x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
selftests/x86: In syscall_nt, test NT|TF as well
x86/asm-offsets: Remove PARAVIRT_enabled
x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
uprobes: __create_xol_area() must nullify xol_mapping.fault
x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
...
|
|
Now that we're not filtering out I_FREEING inodes from our lookups
anymore, we can eliminate the non_block parameter from the lookup
function.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch basically reverts a very old patch from 2008,
7a9f53b3c1875bef22ad4588e818bc046ef183da, with the title
"Alternate gfs2_iget to avoid looking up inodes being freed".
The original patch was designed to avoid a deadlock caused by lock
ordering with try_rgrp_unlink. The patch forced the function to not
find inodes that were being removed by VFS. The problem is, that
made it impossible for nodes to delete their own unlinked dinodes
after a certain point in time, because the inode needed was not found
by this filtering process. There is no longer a need for the patch,
since function try_rgrp_unlink no longer locks the inode: All it does
is queue the glock onto the delete work_queue, so there should be no
more deadlock.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch tries to prevent delete work (queued via iopen callback)
from executing if the glock is currently being used to create
a new inode.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
The fsx test in xfstests was failing because it was using direct IO
writes which were using a bad calculation. It was using
loff_t lstart = offset & (PAGE_CACHE_SIZE - 1); when it should be
loff_t lstart = offset & ~(PAGE_CACHE_SIZE - 1);
Thus, the write at offset 0x67e00 was calculating lstart to be
0xe00, the address of our corruption. Instead, it should have been
0x67000. This patch fixes the calculation.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
We get a bogus warning about a potential uninitialized variable
use in gfs2, because the compiler does not figure out that we
never use the leaf number if get_leaf_nr() returns an error:
fs/gfs2/dir.c: In function 'get_first_leaf':
fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/gfs2/dir.c: In function 'dir_split_leaf':
fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is
sufficient to let gcc understand that this is exactly the same
condition as in IS_ERR() so it can optimize the code path enough
to understand it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
|
|
xfs_dir2_node_trim_free can return with setting the rvalp argument
pointer. Initialize it to 0 at the beginning of the function and
only update it to 1 if we succeeded trimming a freespace block.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
__xfs_trans_roll() can return without setting the
*committed argument; this was a problem for xfs_bmap_finish():
int committed;/* xact committed or not */
...
error = __xfs_trans_roll(tp, ip, &committed);
if (error) {
...
if (committed) {
and we tested an uninitialized "committed" variable on the
error path. No caller is preserving "committed" state across
calls to __xfs_trans_roll(), so just initialize committed inside
the function to avoid future errors like this.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_bmap_del_extent() handles extent removal from the in-core and
on-disk extent lists. When removing a delalloc range, it updates the
indirect block reservation appropriately based on the removal. It
currently enforces that the new indirect block reservation is less than
or equal to the original. This is normally the case in all situations
except for in certain cases when the removed range creates a hole in a
single delalloc extent, thus splitting a single delalloc extent in two.
It is possible with small enough extents to split an indlen==1 extent
into two such slightly smaller extents. This leaves one extent with 0
indirect blocks and leads to assert failures in other areas (e.g.,
xfs_bunmapi() if the extent happens to be removed).
Update the indlen distribution code to steal blocks from the deleted
extent, if necessary, to satisfy the worst case total indirect
reservation for the new extents. This is safe as the caller does not
update the fdblocks counters until the extent is removed. Blocks stolen
in this manner simply remain accounted as allocated, having ownership
transferred from the data extent to an indirect reservation.
As a precaution, fall back to the original reservation algorithm if the
new indlen requirement is not met and warn if we end up with extents
without any reservation at all to detect this more easily in the future.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The delayed allocation indirect reservation splitting code is not
sufficient in some cases where a delalloc extent is split in two. In
preparation for enhancements to this code, refactor the current indlen
distribution algorithm into a new helper function.
[dchinner: rename temp, temp2 variables]
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_bunmapi() currently updates the fdblocks counter, unreserves quota,
etc. before the extent is deleted by xfs_bmap_del_extent(). The function
has problems dividing up the indirect reserved blocks for scenarios
where a single delalloc extent is split in two. Particularly, there
aren't always enough blocks reserved for multiple extents in a single
extent reservation.
The solution to this problem is to allow the extent removal code to
steal from the deleted extent to meet indirect reservation requirements.
Move the block of code in xfs_bmapi() that updates the fdblocks counter
to after the call to xfs_bmap_del_extent() to allow the codepath to
update the extent record before the free blocks are accounted. Also,
reshuffle the code slightly so the delalloc accounting occurs near the
xfs_bmap_del_extent() call to provide context for the comments.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Add a DEBUG mode-only sysfs knob to enable forced buffered write
failure. An additional side effect of this mode is brute force killing
of delayed allocation blocks in the range of the write. The latter is
the prime motiviation behind this patch, as userspace test
infrastructure requires a reliable mechanism to create and split
delalloc extents without causing extent conversion.
Certain fallocate operations (i.e., zero range) were used for this in
the past, but the implementations have changed such that delalloc
extents are flushed and converted to real blocks, rendering the test
useless.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
Merge tag 'v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into current
Linux 4.5
|
|
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The 'reqs' member of fuse_io_priv serves two purposes. First is to track
the number of oustanding async requests to the server and to signal that
the io request is completed. The second is to be a reference count on the
structure to know when it can be freed.
For sync io requests these purposes can be at odds. fuse_direct_IO() wants
to block until the request is done, and since the signal is sent when
'reqs' reaches 0 it cannot keep a reference to the object. Yet it needs to
use the object after the userspace server has completed processing
requests. This leads to some handshaking and special casing that it
needlessly complicated and responsible for at least one race condition.
It's much cleaner and safer to maintain a separate reference count for the
object lifecycle and to let 'reqs' just be a count of outstanding requests
to the userspace server. Then we can know for sure when it is safe to free
the object without any handshaking or special cases.
The catch here is that most of the time these objects are stack allocated
and should not be freed. Initializing these objects with a single reference
that is never released prevents accidental attempts to free the objects.
Fixes: 9d5722b7777e ("fuse: handle synchronous iocbs internally")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
There's a race in fuse_direct_IO(), whereby is_sync_kiocb() is called on an
iocb that could have been freed if async io has already completed. The fix
in this case is simple and obvious: cache the result before starting io.
It was discovered by KASan:
kernel: ==================================================================
kernel: BUG: KASan: use after free in fuse_direct_IO+0xb1a/0xcc0 at addr ffff88036c414390
Signed-off-by: Robert Doebbelin <robert@quobyte.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: bcba24ccdc82 ("fuse: enable asynchronous processing direct IO")
Cc: <stable@vger.kernel.org> # 3.10+
|
|
Dont print warning for ENOSPC error unless ENOSPC_DEBUG is enabled. Use
btrfs_debug if it is enabled.
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
[ preserve the WARN_ON ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
d_add() with inode->i_lock already held; common to d_add() and
d_splice_alias(). All ->lookup() instances that end up hashing
the dentry they are given will hash it here.
This almost completes the preparations to parallel lookups
proper - the only remaining bit is taking security_d_instantiate()
past d_rehash() and doing rehashing without dropping ->d_lock.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
it's a no-op - bumping ->d_seq is pointless there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
First of all, don't bother calling it if inode is NULL -
that makes inode argument unused. Moreover, do it *before*
dropping ->d_lock, not right after that (and don't bother
grabbing ->d_lock in it, of course).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
new primitive: d_exact_alias(dentry, inode). If there is an unhashed
dentry with the same name/parent and given inode, rehash, grab and
return it. Otherwise, return NULL. The only caller of d_add_unique()
switched to d_exact_alias() + d_splice_alias().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
the last user is gone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and use d_add(dn, NULL) in case we need to hash a negative
unhashed rather than using d_rehash() directly.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and turn it into d_add in there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
d_splice_alias() guarantees that it'll be always hashed
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and make mountpoint_last() use it. That makes all
candidates for lookup with parent locked shared go
through lookup_slow().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Return dentry and don't pass nameidata or path; lift crossing mountpoints
into the caller.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
No need to lock parent just because of ->d_revalidate() on child;
contrary to the stale comment, lookup_dcache() *can* be used without
locking the parent. Result can be moved as soon as we return, of
course, but the same is true for lookup_one_len_unlocked() itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Have lookup_fast() return 1 on success and 0 on "need to fall back";
lookup_slow() and follow_managed() return positive (1) on success.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
There is memory leak as both caller function kmmpd() and callee
read_mmp_block() not releasing bh_check (i.e buffer_head).
Given patch fixes this problem.
[ Additional changes suggested by Andreas Dilger -- TYT ]
Signed-off-by: Jadhav Vikram <vikramjadhavpucsd2007@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Journal transaction might fail prematurely because the frozen_buffer
is allocated by GFP_NOFS request:
[ 72.440013] do_get_write_access: OOM for frozen_buffer
[ 72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[ 72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
(...snipped....)
[ 72.495559] do_get_write_access: OOM for frozen_buffer
[ 72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[ 72.496839] do_get_write_access: OOM for frozen_buffer
[ 72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[ 72.505766] Aborting journal on device sda1-8.
[ 72.505851] EXT4-fs (sda1): Remounting filesystem read-only
This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
allocations upon OOM" because small GPF_NOFS allocations never failed.
This allocation seems essential for the journal and GFP_NOFS is too
restrictive to the memory allocator so let's use __GFP_NOFAIL here to
emulate the previous behavior.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This might be unexpected but pages allocated for sbi->s_buddy_cache are
charged to current memory cgroup. So, GFP_NOFS allocation could fail if
current task has been killed by OOM or if current memory cgroup has no
free memory left. Block allocator cannot handle such failures here yet.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
the error is:
fs/ext4/mballoc.c:475:43: error: 'struct ext4_group_info' has
no member named 'bb_bitmap'.
so, the definition of macro DOUBLE_CHECK should before
'struct ext4_group_info', I fixed it, and I moved the macro
AGGRESSIVE_CHECK together, because I think they shoule be together.
Signed-off-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If data_err=abort option is specified for an ext3/ext4 mount,
/proc/mounts does show it as "(null)". This is caused by token2str()
returning NULL for Opt_data_err_abort (due to its pattern containing
'=').
Signed-off-by: Ales Novak <alnovak@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on
error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is
ignored in the following "if" condition and ext4_expand_extra_isize()
might be called with NULL iloc.bh set, which triggers NULL pointer
dereference.
This is uncovered by commit 8b4953e13f4c ("ext4: reserve code points for
the project quota feature"), which enlarges the ext4_inode size, and
run the following script on new kernel but with old mke2fs:
#/bin/bash
mnt=/mnt/ext4
devname=ext4-error
dev=/dev/mapper/$devname
fsimg=/home/fs.img
trap cleanup 0 1 2 3 9 15
cleanup()
{
umount $mnt >/dev/null 2>&1
dmsetup remove $devname
losetup -d $backend_dev
rm -f $fsimg
exit 0
}
rm -f $fsimg
fallocate -l 1g $fsimg
backend_dev=`losetup -f --show $fsimg`
devsize=`blockdev --getsz $backend_dev`
good_tab="0 $devsize linear $backend_dev 0"
error_tab="0 $devsize error $backend_dev 0"
dmsetup create $devname --table "$good_tab"
mkfs -t ext4 $dev
mount -t ext4 -o errors=continue,strictatime $dev $mnt
dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
echo 3 > /proc/sys/vm/drop_caches
ls -l $mnt
exit 0
[ Patch changed to simplify the function a tiny bit. -- Ted ]
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|