|
Having the 'name' arg optional and defaulting to the current
personality name is not necessary and leads to errors: when
changing the level of an array we can end up using the
name of the old level instead of the new one.
So make it non-optional and always explicitly pass the name
of the level that the array will be.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
commit 58c54fcca3bac5bf9290cfed31c76e4c4bfbabaf
md/raid10: handle further errors during fix_read_error better.
in 3.1 added "r10_sync_page_io" which takes an IO size in sectors.
But we were passing the IO size in bytes!!!
This resulted in bio_add_page failing, an empty request being sent
down, and a consequent BUG_ON in scsi_lib.
[fix missing space in error message at same time]
This fix is suitable for 3.1.y and later.
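A minimal sketch of the mismatch (call-site locals are assumed;
r10_sync_page_io() is the function named above and takes sectors):

    /* bug: 's' was shifted into bytes before the call */
    r10_sync_page_io(rdev, addr, s << 9, conf->tmppage, WRITE);
    /* fix: pass the sector count directly */
    r10_sync_page_io(rdev, addr, s, conf->tmppage, WRITE);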
Cc: stable@vger.kernel.org
Reported-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull a couple more powerpc fixes from Benjamin Herrenschmidt:
"Here are two more fixes that I "missed" when scrubbing patchwork last
week which are worth still having in 3.5."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/kvm: sldi should be sld
powerpc/xmon: Use cpumask iterator to avoid warning
|
|
Document no_new_privs.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
commit 43220aa0f22cd3ce5b30246d50ccd696d119edea
md/raid5: fix a hang on device failure.
fixed a hang, but introduced a refcounting imbalance: if the
presence of bad blocks ever caused an rdev to be 'blocked' we
would increment the refcount on the rdev and never decrement it.
So add the needed rdev_dec_pending when md_wait_for_blocked_rdev
is not called.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Adding a blk_plug in sync_thread will increase the performance of
sync. Because sync_thread did not use blk_plug, bios did not merge
well during a RAID sync.
Testing environment:
SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI
Controller.
OS: Linux xxx 3.5.0-rc2+ #340 SMP Tue Jun 12 09:00:25 CST 2012
x86_64 x86_64 x86_64 GNU/Linux.
RAID5: four ST31000524NS disk.
Without blk_plug: recovery speed about 63M/sec;
with blk_plug: recovery speed about 120M/sec.
Using blktrace:
blktrace -d /dev/sdb -w 60 -o -|blkparse -i -
without blk_plug:
Total (8,16):
Reads Queued: 309811, 1239MiB Writes Queued: 0, 0KiB
Read Dispatches: 283583, 1189MiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 273351, 1149MiB Writes Completed: 0, 0KiB
Read Merges: 23533, 94132KiB Write Merges: 0, 0KiB
IO unplugs: 0 Timer unplugs: 0
add blk_plug:
Total (8,16):
Reads Queued: 428697, 1714MiB Writes Queued: 0, 0KiB
Read Dispatches: 3954, 1714MiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 3956, 1715MiB Writes Completed: 0, 0KiB
Read Merges: 424743, 1698MiB Write Merges: 0, 0KiB
IO unplugs: 0 Timer unplugs: 3384
The merge ratio is markedly increased.
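The pattern is the stock block-layer plugging API; a hedged sketch of
what sync_thread gains (the surrounding md code is elided):

    struct blk_plug plug;

    blk_start_plug(&plug);
    /* ... submit the resync/recovery bios here; they sit on the
     * plug list so the block layer can merge adjacent ones ... */
    blk_finish_plug(&plug);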
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
In ops_run_io(), the call to md_wait_for_blocked_rdev will decrement
nr_pending so we lose the reference we hold on the rdev.
So atomic_inc it first to maintain the reference.
This bug was introduced by commit 73e92e51b7969ef5477d
md/raid5: Don't write to known bad block on doubtful devices.
which appeared in 3.0, so the patch is suitable for stable kernels
since then.
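A sketch of the fix as described (raid5 field names assumed):

    /* md_wait_for_blocked_rdev() drops nr_pending, so take an
     * extra reference first to keep ours alive */
    atomic_inc(&rdev->nr_pending);
    md_wait_for_blocked_rdev(rdev, conf->mddev);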
Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
In chunk_aligned_read() we are adding data_offset before calling
is_badblock. But is_badblock also adds data_offset, so that is bad.
So move the addition of data_offset to after the call to
is_badblock.
This bug was introduced by commit 31c176ecdf3563140e639
md/raid5: avoid reading from known bad blocks.
which first appeared in 3.0. So this patch is suitable for any
-stable kernel from 3.0.y onwards. However it will need minor
revision for most of those (as the comment didn't appear until
recently).
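A sketch of the corrected ordering (locals assumed; is_badblock()
applies rdev->data_offset internally):

    sector_t first_bad;
    int bad_sectors;

    if (!is_badblock(rdev, sector, bio_sectors(align_bi),
                     &first_bad, &bad_sectors)) {
            /* translate to the on-device sector only afterwards */
            align_bi->bi_sector = sector + rdev->data_offset;
            /* ... submit align_bi ... */
    }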
Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
If a RAID5 has both a failed device and a device marked as
'WantReplacement', then we should preferentially replace the failed
device.
However the current code replaces whichever is found first.
So split into 2 loops: check for failed/missing first, and only
check for WantReplacement if nothing is failed or missing.
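A sketch of the two-pass scan (raid5 structure names assumed):

    /* first pass: a failed/missing slot always wins */
    for (d = 0; d < conf->raid_disks; d++)
            if (conf->disks[d].rdev == NULL)
                    goto found;
    /* second pass: only now consider WantReplacement devices */
    for (d = 0; d < conf->raid_disks; d++)
            if (test_bit(WantReplacement, &conf->disks[d].rdev->flags))
                    goto found;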
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
If a RAID10 has an odd number of chunks - as might happen when there
are an odd number of devices - the last chunk has no pair and so is
not mirrored. We don't store data there, but when recovering the last
device in an array we try to recover that last chunk from a
non-existent location. This results in an error, and the recovery
aborts.
When we get to that last chunk we should just stop - there is nothing
more to do anyway.
This bug has been present since the introduction of RAID10, so the
patch is appropriate for any -stable kernel.
Cc: stable@vger.kernel.org
Reported-by: Christian Balzer <chibi@gol.com>
Tested-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
We only know of one so far.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Prune this down to just the struct kvm_irqfd so we can avoid
changing the function definition for every flag or field we use.
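A sketch of the resulting signature change (the 'before' parameter
list is illustrative):

    /* before: one parameter per field, growing with every new flag */
    static int kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi);
    /* after: the whole userspace struct travels together */
    static int kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args);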
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
* 'fixes' of git://github.com/hzhuang1/linux:
ARM: pxa: hx4700: Fix basic suspend/resume
|
|
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.
This commit forces the delayed updates to run so the
replay code can find any modifications done. It stops
us from crashing because the directory deletion replay
expects items to be removed immediately from the tree.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
|
|
We can race with unlink and not actually be able to do our igrab in
btrfs_add_ordered_extent. This will result in all sorts of problems.
Instead of doing the complicated work to try and handle returning an error
properly from btrfs_add_ordered_extent, just hold a ref to the inode during
writepages. If we cannot grab a ref we know we're freeing this inode anyway
and can just drop the dirty pages on the floor because, screw them,
we're going to invalidate them anyway. Thanks,
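A sketch of the idea (where btrfs takes the reference, and how it
drops it, is elided here):

    if (!igrab(inode))
            return 0;  /* racing with unlink: let the dirty pages go */
    /* ... run the writepages work with the reference held ... */
    iput(inode);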
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
The tree log stuff can have allocated space that we end up having split
across a bitmap and a real extent. The free space code does not deal with
this, it assumes that if it finds an extent or bitmap entry that the entire
range must fall within the entry it finds. This isn't necessarily the case,
so rework the remove function so it can handle this case properly. This
fixed two panics the user hit, first in the case where the space was
initially in a bitmap and then in an extent entry, and then the reverse
case. Thanks,
Reported-and-tested-by: Shaun Reich <sreich@kde.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
When we're evicting an inode during log recovery, we need to ensure that the inode
is not in orphan state any more, which means the inode's runtime
flags do _not_ include BTRFS_INODE_HAS_ORPHAN_ITEM. Thus, the BUG_ON
was triggered because of a wrong check for the flags.
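A sketch of an assertion matching the stated invariant (not
necessarily the patch's exact form):

    BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
                    &BTRFS_I(inode)->runtime_flags));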
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
We used the wrong ioctl macro for the getflags ioctl before.
As we don't have the set/getflags ioctls in the user space ioctl.h
at the moment, it's safe to fix it now.
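A sketch of the corrected definition (number and type taken from the
btrfs ioctl header; getflags copies data out, so the direction is _IOR):

    #define BTRFS_IOC_SUBVOL_GETFLAGS _IOR(BTRFS_IOCTL_MAGIC, 25, __u64)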
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in the btrfs-balance kthread.
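A hedged skeleton of such a helper (locking and state checks elided;
balance_kthread is assumed to be the existing worker):

    int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info)
    {
            struct task_struct *tsk;

            /* hand the recovered restriper state to a dedicated kthread */
            tsk = kthread_run(balance_kthread, fs_info, "btrfs-balance");
            if (IS_ERR(tsk))
                    return PTR_ERR(tsk);
            return 0;
    }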
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts. This factors out resuming code from btrfs_recover_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Miao pointed out there's a problem with mixing dio writes and buffered
reads. If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache. Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data. So we need to lock the extent and
check for uptodate bits in the range. If there are uptodate bits we need to
unlock and invalidate again. This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
the read side always waits for ordered extents. There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size. So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers. Thanks,
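A sketch of the lock/check/invalidate loop described above (locals
assumed):

    while (1) {
            lock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend);
            if (!test_range_bit(&BTRFS_I(inode)->io_tree, lockstart,
                                lockend, EXTENT_UPTODATE, 0, NULL))
                    break;  /* no pages raced into the cache */
            unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend);
            invalidate_inode_pages2_range(inode->i_mapping,
                            lockstart >> PAGE_CACHE_SHIFT,
                            lockend >> PAGE_CACHE_SHIFT);
    }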
Signed-off-by: Josef Bacik <josef@redhat.com>
|
|
It is normal behaviour of the low level btrfs function btrfs_map_bio()
to complete a bio with -EIO if the device is missing, instead of just
preventing the bio creation in an earlier step.
This used to cause I/O statistic read error increments and annoying
printk_ratelimited messages. This commit fixes the issue.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Carey Underwood <cwillu@cwillu.com>
|
|
If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
also disable itself. This commit therefore checks for tick_nohz_enabled
being zero, disabling rcu_prepare_for_idle() if so. This commit assumes
that tick_nohz_enabled can change at runtime: If this is not the case,
then a simpler approach suffices.
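A sketch of the check (tick_nohz_enabled is assumed to be made
visible to RCU by this change; surrounding logic elided):

    static void rcu_prepare_for_idle(int cpu)
    {
            /* nohz=off at boot: RCU_FAST_NO_HZ steps aside */
            if (!ACCESS_ONCE(tick_nohz_enabled))
                    return;
            /* ... normal dyntick-idle preparation ... */
    }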
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Commit d8169d4c (Make __kfree_rcu() less dependent on compiler choices)
created a macro out of an inline function in order to avoid build
breakage for certain combinations of gcc flags. Unfortunately, it also
converted a kfree_call_rcu() to a call_rcu(), which made the rcu_data
structure's ->qlen_lazy field lose counts. This commit therefore changes
the call_rcu() back to kfree_call_rcu().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Currently, if several CPUs in the same package have all-lazy RCU
callbacks, their wakeups will be uncorrelated. If all the CPUs are in the
same power domain (as is often the case), this will result in unnecessary
power-ups of the package. This commit therefore uses round_jiffies()
to round the timeouts to a second boundary, increasing the odds that
they can be coalesced with each other or with other timeouts.
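A sketch of the rounding (timer field names and the six-second lazy
delay are assumed):

    /* all callbacks lazy: round the wakeup to a second boundary so
     * CPUs in one package can wake together */
    rdtp->idle_gp_timer_expires = round_jiffies(jiffies + 6 * HZ);
    mod_timer(&rdtp->idle_gp_timer, rdtp->idle_gp_timer_expires);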
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The TINY_PREEMPT_RCU version of rcu_preempt_needs_cpu(), which is called
from rcu_needs_cpu(), assumes that it is in a quiescent state with respect
to the CPU. This is no longer the case. This commit therefore updates
rcu_preempt_needs_cpu() to make it aware that it is not running in a
quiescent state.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
|
|
Problems in RCU idle entry and exit are almost always confined to the
offending CPU. This commit therefore switches ftrace_dump() from
DUMP_ALL to DUMP_ORIG.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
|
|
If a CPU goes offline with callbacks queued, those callbacks might be
indefinitely postponed, which can result in a system hang. This commit
therefore inserts warnings for this condition.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
It is time to optimize CONFIG_TREE_PREEMPT_RCU's synchronize_rcu()
for uniprocessor operation, which means that rcu_blocking_is_gp()
can no longer rely on RCU read-side critical sections having disabled
preemption. This commit therefore disables preemption across
rcu_blocking_is_gp()'s scan of the cpu_online_mask.
(Updated from previous version to fix embarrassing bug spotted by
Wu Fengguang.)
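A sketch close to the resulting code:

    static inline int rcu_blocking_is_gp(void)
    {
            int ret;

            might_sleep();  /* Check for RCU read-side critical section. */
            preempt_disable();  /* stabilize the cpu_online_mask scan */
            ret = num_online_cpus() <= 1;
            preempt_enable();
            return ret;
    }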
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
An uninitialized string may be displayed at the end of the rcu_preempt
detected stall info such as
0: (1 GPs behind) idle=075/140000000000000/0 =8?^D=8?^D
^^^^^^^^^^
if CONFIG_RCU_FAST_NO_HZ is not defined.
This trivial patch clears the string in this case.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The rcu_is_cpu_idle() function is used if CONFIG_DEBUG_LOCK_ALLOC,
but TINY_RCU defines it only when CONFIG_PROVE_RCU. This causes
build failures when CONFIG_DEBUG_LOCK_ALLOC=y but CONFIG_PROVE_RCU=n.
This commit therefore adjusts the #ifdefs for rcu_is_cpu_idle() so
that it is defined when CONFIG_DEBUG_LOCK_ALLOC=y.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The __call_rcu() function is a bit overweight, so this commit splits
it into actual enqueuing of and accounting for the callback (__call_rcu())
and associated RCU-core processing (__call_rcu_core()).
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The __call_rcu() function will invoke the RCU core, for example, if
it detects that the current CPU has too many callbacks. However, this
can happen on an offline CPU that is on its way to the idle loop, in
which case it is an error to invoke the RCU core, and the excess callbacks
will be adopted in any case. This commit therefore adds checks to
__call_rcu() for running on an offline CPU, refraining from invoking
the RCU core in this case.
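A sketch of the guard (its exact placement inside __call_rcu() is
assumed):

    /* callbacks on an offline CPU will be adopted; don't poke the core */
    if (cpu_is_offline(smp_processor_id()))
            return;
    invoke_rcu_core();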
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Although __call_rcu() is handled correctly when called from a momentary
non-idle period, if it is called on a CPU that RCU believes to be idle
on RCU_FAST_NO_HZ kernels, the callback might be indefinitely postponed.
This commit therefore ensures that RCU is aware of the new callback and
has a chance to force the CPU out of dyntick-idle mode when a new callback
is posted.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Commit d8169d4c (Make __kfree_rcu() less dependent on compiler choices)
added cpp macro versions of __kfree_rcu() and __is_kfree_rcu_offset(),
but failed to remove the old inline-function versions. This commit does
this cleanup.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU versions of
__rcu_read_lock() and __rcu_read_unlock() are identical, so this commit
consolidates them into kernel/rcupdate.h.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The return value from rcu_assign_pointer() is not used, and using it
would be quite ugly, for example:
q = rcu_assign_pointer(global_p, p);
To prevent this sort of ugliness from spreading, this commit wraps
rcu_assign_pointer() in a do-while loop.
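A sketch of the do-while form, close to the internal helper; the
macro becomes a statement, so "q = rcu_assign_pointer(p, v);" no
longer compiles:

    #define __rcu_assign_pointer(p, v, space) \
            do { \
                    smp_wmb(); /* order initialization before publication */ \
                    (p) = (typeof(*v) __force space *)(v); \
            } while (0)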
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit removes the extraneous parentheses from rcu_assign_keypointer()
so that rcu_assign_pointer() can be wrapped in do-while. It also wraps
rcu_assign_keypointer() in a do-while and parenthesizes its final argument,
as suggested by David Howells.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The return value from RCU_INIT_POINTER() is not used, and using it
would be quite ugly, for example:
q = RCU_INIT_POINTER(global_p, p);
To prevent this sort of ugliness from appearing, this commit wraps
RCU_INIT_POINTER() in a do-while loop.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David Howells <dhowells@redhat.com>
|
|
This commit applies the RCU_POINTER_INITIALIZER() macro to all uses of
RCU_INIT_POINTER() that were all too cleverly creating gcc-style
initializations.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
|
|
RCU_INIT_POINTER() returns a value that is never used, and which should
be abolished due to terminal ugliness:
q = RCU_INIT_POINTER(global_p, p);
However, there are two uses that cannot be handled by a do-while
formulation because they do gcc-style initialization:
RCU_INIT_POINTER(.real_cred, &init_cred),
RCU_INIT_POINTER(.cred, &init_cred),
This usage is clever, but not necessarily the nicest approach.
This commit therefore creates an RCU_POINTER_INITIALIZER() macro that
is specifically designed for gcc-style initialization.
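A sketch of such a macro, expanding to a designated initializer:

    #define RCU_POINTER_INITIALIZER(p, v) \
            .p = (typeof(*v) __force __rcu *)(v)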
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
|
|
The _rcu_barrier() function accesses other CPUs' rcu_data structures'
->qlen fields without benefit of locking. This commit therefore adds
the required ACCESS_ONCE() wrappers around accesses and updates that
need it.
ACCESS_ONCE() is not needed when a CPU accesses its own ->qlen, or
in code that cannot run while _rcu_barrier() is sampling ->qlen fields.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
There are a couple of open-coded initializations of the rcu_data
structure's RCU callback list. This commit therefore consolidates
them into a new init_callback_list() function.
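A sketch of the consolidated helper (matching the rcu_data layout):

    static void init_callback_list(struct rcu_data *rdp)
    {
            int i;

            rdp->nxtlist = NULL;
            for (i = 0; i < RCU_NEXT_SIZE; i++)
                    rdp->nxttail[i] = &rdp->nxtlist;
    }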
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The code that attempts to identify stalls that end just as we detect
them is broken by both flavors of initialization failure. This commit
therefore properly initializes and computes the count of the number
of reasons why the RCU grace period is stalled.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The current rcutorture rcu_barrier() testing never intentionally runs
more than one instance of rcu_barrier() at a given time. This fails
to test the shiny new concurrency features of rcu_barrier(). This
commit therefore modifies the rcutorture fakewriter kthread to randomly
invoke rcu_barrier() rather than the usual synchronize_rcu().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
The rcu_torture_barrier() function has a copy-and-paste typo in the
string passed to rcutorture_shutdown_absorb(), which this commit fixes.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The child threads in rcu_torture_barrier_cbs() are improperly
synchronized, which can cause the rcu_barrier() tests to hang. The
failure mode is as follows:
1. CPU 0 running in rcu_torture_barrier() sets barrier_cbs_count
to n_barrier_cbs.
2. CPU 1 running in rcu_torture_barrier_cbs() wakes up, posts
its RCU callback, and atomically decrements barrier_cbs_count.
Because barrier_cbs_count is not zero, it does not do the wake_up().
3. CPU 2 running in rcu_torture_barrier_cbs() wakes up, but
finds that barrier_cbs_count is not equal to n_barrier_cbs,
and so returns to sleep.
4. The value of barrier_cbs_count therefore never reaches zero,
which causes the test to hang.
This commit therefore uses a phase variable to coordinate the test,
preventing this scenario from occurring.
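A sketch of the child side of the phase handshake (rcutorture names
assumed):

    /* wait for the parent to flip the phase, not for a counter value */
    wait_event(barrier_cbs_wq[myid],
               barrier_phase != lastphase || kthread_should_stop());
    lastphase = barrier_phase;
    cur_ops->call(&rcu, rcu_torture_barrier_cbf);  /* post the callback */
    if (atomic_dec_and_test(&barrier_cbs_count))
            wake_up(&barrier_wq);  /* last child reports in */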
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
SRCU now has a call_srcu() and an srcu_barrier(), but rcutorture does not
test them. This commit adds the machinery to allow rcutorture's existing
tests for call_rcu() and rcu_barrier() to apply to the SRCU equivalents.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Move the raw SRCU interfaces out of the middle of the normal SRCU
interfaces.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|