Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Fix spurious wakeup for requeue_pi really
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Remove -Wcast-align
perf tools: Fix compatibility with libelf 0.8 and autodetect
perf events: Don't generate events for the idle task when exclude_idle is set
perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Remove cpu arg from the rb_time_stamp() function
tracing: Fix comment typo and documentation example
tracing: Fix trace_seq_printf() return value
tracing: Update *ppos instead of filp->f_pos
|
|
For as long as kretprobes have existed, we've allocated NR_CPUS
instances of kretprobe_instance structures. With the default
value of CONFIG_NR_CPUS increasing on certain architectures, we
are potentially wasting kernel memory.
See http://sourceware.org/bugzilla/show_bug.cgi?id=10839#c3 for
more details.
Use a saner num_possible_cpus() instead of NR_CPUS for
allocation.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: fweisbec@gmail.com
LKML-Reference: <20091030135310.GA22230@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Currently, rcu_irq_exit() is invoked only for CONFIG_NO_HZ,
while rcu_irq_enter() is invoked unconditionally. This patch
moves rcu_irq_exit() out from under CONFIG_NO_HZ so that the
calls are balanced.
This patch has no effect on the behavior of the kernel because
both rcu_irq_enter() and rcu_irq_exit() are empty for
!CONFIG_NO_HZ, but the code is easier to understand if the calls
are obviously balanced in all cases.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12567428891605-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Very long RCU read-side critical sections (50 milliseconds or
so) can cause a race between force_quiescent_state() and
rcu_start_gp() as follows on kernel builds with multi-level
rcu_node hierarchies:
1. CPU 0 calls force_quiescent_state(), sees that there is a
grace period in progress, and acquires ->fsqlock.
2. CPU 1 detects the end of the grace period, and so
cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
This operation is carried out under the root rnp->lock,
but CPU 0 has not yet acquired that lock. Note that
rsp->signaled is still RCU_SAVE_DYNTICK from the last
grace period.
3. CPU 1 calls rcu_start_gp(), but no one wants a new grace
period, so it drops the root rnp->lock and returns.
4. CPU 0 acquires the root rnp->lock and picks up rsp->completed
and rsp->signaled, then drops rnp->lock. It then enters the
RCU_SAVE_DYNTICK leg of the switch statement.
5. CPU 2 invokes call_rcu(), and now needs a new grace period.
It calls rcu_start_gp(), which acquires the root rnp->lock, sets
rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
the RCU_SAVE_DYNTICK leg of the switch statement!) and starts
initializing the rcu_node hierarchy. If there are multiple
levels to the hierarchy, it will drop the root rnp->lock and
initialize the lower levels of the hierarchy.
6. CPU 0 notes that rsp->completed has not changed, which permits
both CPU 2 and CPU 0 to try updating it concurrently. If CPU 0's
update prevails, later calls to force_quiescent_state() can
count old quiescent states against the new grace period, which
can in turn result in premature ending of grace periods.
Not good.
This patch adds an RCU_GP_IDLE state for rsp->signaled that is
set initially at boot time and any time a grace period ends.
This prevents CPU 0 from getting into the workings of
force_quiescent_state() in step 4. Additional locking and
checks prevent the concurrent update of rsp->signaled in step 6.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1256742889199-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Ingo triggered the following warning:
WARNING: at lib/debugobjects.c:255 debug_print_object+0x42/0x50()
Hardware name: System Product Name
ODEBUG: init active object type: timer_list
Modules linked in:
Pid: 2619, comm: dmesg Tainted: G W 2.6.32-rc5-tip+ #5298
Call Trace:
[<81035443>] warn_slowpath_common+0x6a/0x81
[<8120e483>] ? debug_print_object+0x42/0x50
[<81035498>] warn_slowpath_fmt+0x29/0x2c
[<8120e483>] debug_print_object+0x42/0x50
[<8120ec2a>] __debug_object_init+0x279/0x2d7
[<8120ecb3>] debug_object_init+0x13/0x18
[<810409d2>] init_timer_key+0x17/0x6f
[<81041526>] free_uid+0x50/0x6c
[<8104ed2d>] put_cred_rcu+0x61/0x72
[<81067fac>] rcu_do_batch+0x70/0x121
debugobjects warns about an enqueued timer being initialized. If
CONFIG_USER_SCHED=y the user management code uses delayed work to
remove the user from the hash table and tear down the sysfs objects.
free_uid is called from RCU and initializes/schedules delayed work if
the usage count of the user_struct is 0. The init/schedule happens
outside of the uidhash_lock protected region which allows a concurrent
caller of find_user() to reference the about to be destroyed
user_struct w/o preventing the work from being scheduled. If the next
free_uid call happens before the work timer expired then the active
timer is initialized and the work scheduled again.
The race was introduced in commit 5cb350ba (sched: group scheduling,
sysfs tunables) and made more prominent by commit 3959214f (sched:
delayed cleanup of user_struct)
Move the init/schedule_delayed_work inside of the uidhash_lock
protected region to prevent the race.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable@kernel.org
|
|
I got a boot crash when forcing cpumasks offstack on 32 bit,
because find_new_ilb() returned 3 on my UP system (nohz.cpu_mask
wasn't zeroed).
AFAICT the others need to be zeroed too: only
nohz.ilb_grp_nohz_mask is initialized before use.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <200911022037.21282.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
____ftrace_check_##name() is used for compile-time check on
F_printk() only, so it should be marked as __unused instead
of __used.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AEE2D01.4010305@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Today's linux-next build (x86_64 allmodconfig) failed like this:
kernel/user-return-notifier.c: In function
'fire_user_return_notifiers': kernel/user-return-notifier.c:45:
error: expected expression before ')' token
Introduced by commit 7c68af6e32c73992bad24107311f3433c89016e2
("core, x86: Add user return notifiers") from the tip and kvm trees
but revealed by commit e0fdb0e050eae331046385643618f12452aa7e73
("percpu: add __percpu for sparse") from the percpu tree.
Before that percpu tree commit, "put_cpu_var()" would compile
without error (even though it really needs a parameter).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
LKML-Reference: <20091102161722.eea4358d.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
sched: move rq_weight data array out of .percpu
percpu: allow pcpu_alloc() to be called with IRQs off
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-param-fixes:
param: fix setting arrays of bool
param: fix NULL comparison on oom
param: fix lots of bugs with writing charp params from sysfs, by leaking mem.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
HWPOISON: fix invalid page count in printk output
HWPOISON: Allow schedule_on_each_cpu() from keventd
HWPOISON: fix/proc/meminfo alignment
HWPOISON: fix oops on ksm pages
HWPOISON: Fix page count leak in hwpoison late kill in do_swap_page
HWPOISON: return early on non-LRU pages
HWPOISON: Add brief hwpoison description to Documentation
HWPOISON: Clean up PR_MCE_KILL interface
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Move drop_futex_key_refs out of spinlock'ed region
rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang
rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU
rcu: Prevent RCU IPI storms in presence of high call_rcu() load
futex: Check for NULL keys in match_futex
futex: Handle spurious wake up
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf timechart: Improve the visual appearance of scheduler delays
perf timechart: Fix the wakeup-arrows that point to non-visible processes
perf top: Fix --delay_secs 0 division by zero
perf tools: Bump version to 0.0.2
perf_event: Adjust frequency and unthrottle for non-group-leader events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Do less agressive buddy clearing
sched: Disable SD_PREFER_LOCAL for MC/CPU domains
|
|
Having ->procname but not ->proc_handler is valid when PROC_SYSCTL=n,
people use such combination to reduce ifdefs with non-standard handlers.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14408
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Peter Teoh <htmldeveloper@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
cgroup_write_X64() and cgroup_write_string() ignore the return value of
strstrip(). it makes small inconsistent behavior.
example:
=========================
# cd /mnt/cgroup/hoge
# cat memory.swappiness
60
# echo "59 " > memory.swappiness
# cat memory.swappiness
59
# echo " 58" > memory.swappiness
bash: echo: write error: Invalid argument
This patch fixes it.
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add
event for process becoming session leader) we have the following warning:
Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([<000000013fe04100>] 0x13fe04100)
[<000000000048a946>] sk_filter+0x9a/0xd0
[<000000000049d938>] netlink_broadcast+0x2c0/0x53c
[<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
[<00000000003baef0>] proc_sid_connector+0xc4/0xd4
[<0000000000142604>] __set_special_pids+0x58/0x90
[<0000000000159938>] sys_setsid+0xb4/0xd8
[<00000000001187fe>] sysc_noemu+0x10/0x16
[<00000041616cb266>] 0x41616cb266
The warning is
---> WARN_ON_ONCE(in_irq() || irqs_disabled());
The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.
After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.
We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err > 0, since if we don't change the
session, this means we were already the leader and return -EPERM.
One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Scott James Remnant <scott@ubuntu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables. To make this sane, we remove the per_cpu__ prefix we used
created to stop people accidentally using these vars directly.
Now we have sparse, we can use that (next patch).
tj: * Updated to convert stuff which were missed by or added after the
original patch.
* Kill per_cpu_var() macro.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
|
|
This patch updates percpu related symbols in kernel tracer such that
percpu symbols are unique and don't clash with local symbols. This
serves two purposes of decreasing the possibility of global percpu
symbol collision and allowing dropping per_cpu__ prefix from percpu
symbols.
* kernel/trace/trace.c: s/max_data/max_tr_data/
* kernel/trace/trace_hw_branches: s/tracer/hwb_tracer/, s/buffer/hwb_buffer/
Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
|
|
This patch updates percpu related symbols under kernel/ and mm/ such
that percpu symbols are unique and don't clash with local symbols.
This serves two purposes of decreasing the possibility of global
percpu symbol collision and allowing dropping per_cpu__ prefix from
percpu symbols.
* kernel/lockdep.c: s/lock_stats/cpu_lock_stats/
* kernel/sched.c: s/init_rq_rt/init_rt_rq_var/ (any better idea?)
s/sched_group_cpus/sched_groups/
* kernel/softirq.c: s/ksoftirqd/run_ksoftirqd/a
* kernel/softlockup.c: s/(*)_timestamp/softlockup_\1_ts/
s/watchdog_task/softlockup_watchdog/
s/timestamp/ts/ for local variables
* kernel/time/timer_stats: s/lookup_lock/tstats_lookup_lock/
* mm/slab.c: s/reap_work/slab_reap_work/
s/reap_node/slab_reap_node/
* mm/vmstat.c: local variable changed to avoid collision with vmstat_work
Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: (slab/vmstat) Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
|
|
Merge reason: Pick up fixes and move base from -rc1 to -rc5.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
commit fce29d15b59245597f7f320db4a9f2be0f5fb512
("tracing/filters: Refactor subsystem filter code")
broke system filter accidentally.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <4AE810BD.3070009@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Fix find_probe_event() to compare both of event-name and
event-group. Without this fix, kprobe-tracer overwrites existing
same event-name probe even if its group-name is different.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
LKML-Reference: <20091027204244.30545.27516.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
We create a dummy struct kernel_param on the stack for parsing each
array element, but we didn't initialize the flags word. This matters
for arrays of type "bool", where the flag indicates if it really is
an array of bools or unsigned int (old-style).
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
|
|
kp->arg is always true: it's the contents of that pointer we care about.
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
|
|
e180a6b7759a "param: fix charp parameters set via sysfs" fixed the case
where charp parameters written via sysfs were freed, leaving drivers
accessing random memory.
Unfortunately, storing a flag in the kparam struct was a bad idea: it's
rodata so setting it causes an oops on some archs. But that's not all:
1) module_param_array() on charp doesn't work reliably, since we use an
uninitialized temporary struct kernel_param.
2) there's a fundamental race if a module uses this parameter and then
it's changed: they will still access the old, freed, memory.
The simplest fix (ie. for 2.6.32) is to never free the memory. This
prevents all these problems, at cost of a memory leak. In practice, there
are only 18 places where a charp is writable via sysfs, and all are
root-only writable.
Reported-by: Takashi Iwai <tiwai@suse.de>
Cc: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
|
|
Freezing isn't exactly the most latency sensitive operation and
there's no reason to burn cpu cycles and power waiting for it to
complete. msleep(10) instead of yield(). This should improve
reliability of emergency hibernation.
[rjw: Modified the comment next to the msleep(10).]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
NULL test) nor does it use the wake_list of futex_wake() which where
the reason for commit 41890f2 (futex: Handle spurious wake up)
See debugging discussing on LKML Message-ID: <4AD4080C.20703@us.ibm.com>
The changes in this fix to the wait_requeue_pi path were considered to
be a likely unecessary, but harmless safety net. But it turns out that
due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
as EAGAIN we built an endless loop in the code path which returns
correctly EWOULDBLOCK.
Spurious wakeups in wait_requeue_pi code path are unlikely so we do
the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let
it deal with the spurious wakeup.
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
LKML-Reference: <4AE23C74.1090502@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This section contains pointers to allocated objects and not scanning it
leads to false positives.
Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Commit 34d76c41 introduced percpu array update_shares_data, size of which
being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
Fix this by allocating this array dynamically and keep only pointer to it
percpu.
The per-cpu handling doesn't impose significant performance penalty on
potentially contented path in tg_shares_up().
...
ffffffff8104337c: 65 48 8b 14 25 20 cd mov %gs:0xcd20,%rdx
ffffffff81043383: 00 00
ffffffff81043385: 48 c7 c0 00 e1 00 00 mov $0xe100,%rax
ffffffff8104338c: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp)
ffffffff81043393: 00
ffffffff81043394: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp)
ffffffff8104339b: 00
ffffffff8104339c: 48 01 d0 add %rdx,%rax
ffffffff8104339f: 49 8d 94 24 08 01 00 lea 0x108(%r12),%rdx
ffffffff810433a6: 00
ffffffff810433a7: b9 ff ff ff ff mov $0xffffffff,%ecx
ffffffff810433ac: 48 89 45 b0 mov %rax,-0x50(%rbp)
ffffffff810433b0: bb 00 04 00 00 mov $0x400,%ebx
ffffffff810433b5: 48 89 55 c0 mov %rdx,-0x40(%rbp)
...
After:
...
ffffffff8104337c: 65 8b 04 25 28 cd 00 mov %gs:0xcd28,%eax
ffffffff81043383: 00
ffffffff81043384: 48 98 cltq
ffffffff81043386: 49 8d bc 24 08 01 00 lea 0x108(%r12),%rdi
ffffffff8104338d: 00
ffffffff8104338e: 48 8b 15 d3 7f 76 00 mov 0x767fd3(%rip),%rdx # ffffffff817ab368 <update_shares_data>
ffffffff81043395: 48 8b 34 c5 00 ee 6d mov -0x7e921200(,%rax,8),%rsi
ffffffff8104339c: 81
ffffffff8104339d: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp)
ffffffff810433a4: 00
ffffffff810433a5: b9 ff ff ff ff mov $0xffffffff,%ecx
ffffffff810433aa: 48 89 7d c0 mov %rdi,-0x40(%rbp)
ffffffff810433ae: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp)
ffffffff810433b5: 00
ffffffff810433b6: bb 00 04 00 00 mov $0x400,%ebx
ffffffff810433bb: 48 01 f2 add %rsi,%rdx
ffffffff810433be: 48 89 55 b0 mov %rdx,-0x50(%rbp)
...
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This function was taking non-necessary arguments which can be determined
by kmemleak. The patch also modifies the calling sites.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
|
|
Add two more software events that are common to many cpus.
Alignment faults: When a load or store is not aligned properly.
Emulation faults: When an instruction is emulated in software.
Both cause a very significant slowdown (100x or worse), so identifying and
fixing them is very important.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
Use lockdep_set_class() to simplify the code and to avoid any
additional overhead in the !LOCKDEP case. Also move the
definition of rcu_root_class into kernel/rcutree.c, as suggested
by Lai Jiangshan.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1256577871443-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
No change in functionality - just straighten out a few small
stylistic details.
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351355-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Make rcutorture list the available torture_type values when it
doesn't like the one specified.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351868-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Adds the "srcu_expedited" torture type, and also renames
sched_ops_sync to sched_sync_ops for consistency while we are in
this file.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226353636-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This patch creates a synchronize_srcu_expedited() that uses
synchronize_sched_expedited() where synchronize_srcu()
uses synchronize_sched(). The synchronize_srcu() and
synchronize_srcu_expedited() functions become one-liners that
pass synchronize_sched() or synchronize_sched_expedited(),
repectively, to a new __synchronize_srcu() function.
While in the file, move the EXPORT_SYMBOL_GPL()s to immediately
follow the corresponding functions.
Requested-by: Avi Kivity <avi@redhat.com>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: avi@redhat.com
LKML-Reference: <12565226354038-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This patch is a version of RCU designed for !SMP provided for a
small-footprint RCU implementation. In particular, the
implementation of synchronize_rcu() is extremely lightweight and
high performance. It passes rcutorture testing in each of the
four relevant configurations (combinations of NO_HZ and PREEMPT)
on x86. This saves about 1K bytes compared to old Classic RCU
(which is no longer in mainline), and more than three kilobytes
compared to Hierarchical RCU (updated to 2.6.30):
CONFIG_TREE_RCU:
text data bss dec filename
183 4 0 187 kernel/rcupdate.o
2783 520 36 3339 kernel/rcutree.o
3526 Total (vs 4565 for v7)
CONFIG_TREE_PREEMPT_RCU:
text data bss dec filename
263 4 0 267 kernel/rcupdate.o
4594 776 52 5422 kernel/rcutree.o
5689 Total (6155 for v7)
CONFIG_TINY_RCU:
text data bss dec filename
96 4 0 100 kernel/rcupdate.o
734 24 0 758 kernel/rcutiny.o
858 Total (vs 848 for v7)
The above is for x86. Your mileage may vary on other platforms.
Further compression is possible, but is being procrastinated.
Changes from v7 (http://lkml.org/lkml/2009/10/9/388)
o Apply Lai Jiangshan's review comments (aside from
might_sleep() in synchronize_sched(), which is covered by SMP builds).
o Fix up expedited primitives.
Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
o Forward ported to put it into the 2.6.33 stream.
o Added lockdep support.
o Make lightweight rcu_barrier.
Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
o Ported to latest pre-2.6.32 merge window kernel.
- Renamed rcu_qsctr_inc() to rcu_sched_qs().
- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
- Provided trivial rcu_cpu_notify().
- Provided trivial exit_rcu().
- Provided trivial rcu_needs_cpu().
- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
o Removed the dependence on EMBEDDED, with a view to making
TINY_RCU default for !SMP at some time in the future.
o Added (trivial) support for expedited grace periods.
Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
o Squeeze the size down a bit further by removing the
->completed field from struct rcu_ctrlblk.
o This permits synchronize_rcu() to become the empty function.
Previous concerns about rcutorture were unfounded, as
rcutorture correctly handles a constant value from
rcu_batches_completed() and rcu_batches_completed_bh().
Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
o Changed rcu_batches_completed(), rcu_batches_completed_bh()
rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
rcu_nmi_exit(), to be static inlines, as suggested by David
Howells. Doing this saves about 100 bytes from rcutiny.o.
(The numbers between v3 and this v4 of the patch are not directly
comparable, since they are against different versions of Linux.)
Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
o Fix whitespace issues.
o Change short-circuit "||" operator to instead be "+" in order
to fix performance bug noted by "kraai" on LWN.
(http://lwn.net/Articles/324348/)
Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
o This version depends on EMBEDDED as well as !SMP, as suggested
by Ingo.
o Updated rcu_needs_cpu() to unconditionally return zero,
permitting the CPU to enter dynticks-idle mode at any time.
This works because callbacks can be invoked upon entry to
dynticks-idle mode.
o Paul is now OK with this being included, based on a poll at
the Kernel Miniconf at linux.conf.au, where about ten people said
that they cared about saving 900 bytes on single-CPU systems.
o Applies to both mainline and tip/core/rcu.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351355-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
When CONFIG_USER_RETURN_NOTIFIER is set, we need to link
kernel/user-return-notifier.o.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1256473485-23109-1-git-send-email-avi@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
CPU time of a guest is always accounted in 'user' time
without concern for the nice value of its counterpart
process although the guest is scheduled under the nice
value.
This patch fixes the defect and accounts cpu time of
a niced guest in 'nice' time as same as a niced process.
And also the patch adds 'guest_nice' to cpuacct. The
value provides niced guest cpu time which is like 'nice'
to 'user'.
The original discussions can be found here:
http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html
http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html
Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1256314810-7897-1-git-send-email-ozaki.ryota@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Conflicts:
fs/proc/array.c
Merge reason: resolve conflict and queue up dependent patch.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The cpu argument is not used inside the rb_time_stamp() function.
Plus fix a typo.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233647.118547500@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Trivial patch to fix a documentation example and to fix a
comment.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.871719877@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
trace_seq_printf() return value is a little ambiguous. It
currently returns the length of the space available in the
buffer. printf usually returns the amount written. This is not
adequate here, because:
trace_seq_printf(s, "");
is perfectly legal, and returning 0 would indicate that it
failed.
We can always see the amount written by looking at the before
and after values of s->len. This is not quite the same use as
printf. We only care if the string was successfully written to
the buffer or not.
Make trace_seq_printf() return 0 if the trace oversizes the
buffer's free space, 1 otherwise.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.631787612@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Instead of directly updating filp->f_pos we should update the *ppos
argument. The filp->f_pos gets updated within the file_pos_write()
function called from sys_write().
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.399670810@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This patch restores the effectiveness of LAST_BUDDY in preventing
pgsql+oltp from collapsing due to wakeup preemption. It also
switches LAST_BUDDY to exclusively do what it does best, namely
mitigate the effects of aggressive wakeup preemption, which
improves vmark throughput markedly, and restores mysql+oltp
scalability.
Since buddies are about scalability, enable them beginning at the
point where we begin expanding sched_latency, namely
sched_nr_latency. Previously, buddies were cleared aggressively,
which seriously reduced their effectiveness. Not clearing
aggressively however, produces a small drop in mysql+oltp
throughput immediately after peak, indicating that LAST_BUDDY is
actually doing some harm. This is right at the point where X on the
desktop in competition with another load wants low latency service.
Ergo, do not enable until we need to scale.
To mitigate latency induced by buddies, or by a task just missing
wakeup preemption, check latency at tick time.
Last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
CACHE_HOT_BUDDY.
Supporting performance tests:
tip = v2.6.32-rc5-1497-ga525b32
tipx = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs
(Three run averages except where noted.)
vmark:
------
tip 108466 messages per second
tip+ 125307 messages per second
tip+x 125335 messages per second
tipx 117781 messages per second
2.6.31.3 122729 messages per second
mysql+oltp:
-----------
clients 1 2 4 8 16 32 64 128 256
..........................................................................................
tip 9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
tip+ 10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
tipx 9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
2.6.31.3 8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86
pgsql+oltp:
-----------
clients 1 2 4 8 16 32 64 128 256
..........................................................................................
tip 13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
tip+x 13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
tipx 13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
2.6.31.3 13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Today I got:
[39648.224782] Registered led device: iwl-phy0::TX
[40676.545099] __ratelimit: 246 callbacks suppressed
[40676.545103] abcdef[23675]: segfault at 0 ...
as you can see the ratelimit message contains a function prefix.
Since this is always __ratelimit, this wont help much.
This patch changes __ratelimit and printk_ratelimit to print the
function name that calls ratelimit.
This will pinpoint the responsible function, as long as not several
different places call ratelimit with the same ratelimit state at
the same time. In that case we catch only one random function that
calls ratelimit after the wait period.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200910231458.11832.borntraeger@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
After commit:8969a5ede0f9e17da4b943712429aef2c9bcd82b
"generic-ipi: remove kmalloc()", wait = 0 can be guaranteed.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <1256210374-25354-1-git-send-email-sheng@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|