Age | Commit message (Collapse) | Author |
|
The device needs MSI disablement. Added to the quirk list.
Reported-by: Harald Dunkel <harri@afaics.de>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
These just represent the secondary and further heads attached to the
card, and they have different sets of PCI bar registers to map.
So don't try to drive them in the main driver.
Reported-by: Frans van Berckel <fberckel@xs4all.nl>
Tested-by: Frans van Berckel <fberckel@xs4all.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We provide regs->tstate, regs->tpc, regs->tnpc and
regs->u_regs[UREG_FP].
regs->tstate is necessary for:
user_mode() (via perf_exclude_event())
perf_misc_flags() (via perf_prepare_sample())
regs->tpc is necessary for:
perf_instruction_pointer() (via perf_prepare_sample())
and regs->u_regs[UREG_FP] is necessary for:
perf_callchain() (via perf_prepare_sample())
The regs->tnpc value is provided just to be tidy.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ACPI spec (sec 6.4.3.5 in v4.0) requires that for Address Space Resource
Descriptors, _LEN <= _MAX - _MIN + 1 in all cases, but there are BIOSes that
violate this. We experimentally determined that Windows truncates the
resource so it doesn't extend past _MAX, so let's do the same thing in
Linux.
http://bugzilla.kernel.org/show_bug.cgi?id=15480
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
I have a machine here that's sending 0xD1 notifications on the video
device once every second or so. I have no idea why (it's a prototype,
it may be broken), but sending KEY_UNKNOWN is unhelpful and results in
the console becoming unusable. Let's not report keys unless we have
something useful to say about them.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
pxms are mapped to low node ids to maintain generic kernel use of
functions such as pxm_to_node() that are used to determine device
affinity. Otherwise, there is no pxm-to-node and node-to-pxm matching
rule for x86_64 users of NUMA emulation where a single pxm may be bound
to multiple NUMA nodes.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Previously, we assumed the only Device object immediately below the root
was the \_SB Scope (which the ACPI CA treats as a Device), so we forced
the HID of all such objects to ACPI_BUS_HID ("LNXSYBUS").
However, there are DSDTs that supply root-level Device objects with _HIDs.
This patch makes us pay attention to those _HIDs and only add the synthetic
ACPI_BUS_HID for root-level objects that do not supply their own _HID.
For example, this DSDT: https://bugzilla.kernel.org/show_bug.cgi?id=15605
contains:
Scope (_SB) {
...
}
Device (AMW0) {
Name (_HID, EisaId ("PNP0C14"))
...
}
and we should use "PNP0C14" for the AMW0 device, not "LNXSYBUS".
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Yong Wang <yong.y.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Don't terminate the watchdog daemon when fsync() fails because no
watchdog driver actually implements the fsync() syscall.
Signed-off-by: Giel van Schijndel <me@mortis.eu>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
|
|
|
|
Use of ioremap() causes build failure on S390.
Restrict the driver to ARM until another architecture comes along
and enables the driver for its own use.
Signed-off-by: Marc Zyngier <maz@misterjones.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The watchdog_info struct cannot be a const since we dynamically fill
in the firmware version.
Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
|
|
smc91c92_cs:
*cvt_ascii_address returns 0, if success.
*call free_netdev, if we can't find hardware address.
Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
vmemmap_populate() attempts to report the used index and total size of
vmemmap_table, but it wrongly shifts the total size so that it is
always shown as 0.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While chasing a bug report involving a OS/2 server, I noticed the server sets
pSMBr->CountHigh to a incorrect value even in case of normal writes. This
results in 'nbytes' being computed wrongly and triggers a kernel BUG at
mm/filemap.c.
void iov_iter_advance(struct iov_iter *i, size_t bytes)
{
BUG_ON(i->count < bytes); <--- BUG here
Why the server is setting 'CountHigh' is not clear but only does so after
writing 64k bytes. Though this looks like the server bug, the client side
crash may not be acceptable.
The workaround is to mask off high 16 bits if the number of bytes written as
returned by the server is greater than the bytes requested by the client as
suggested by Jeff Layton.
CC: Stable <stable@kernel.org>
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
By doing this we always overwrite nbytes value that is being passed on to
CIFSSMBWrite() and need not rely on the callers to initialize. CIFSSMBWrite2 is
doing this already.
CC: Stable <stable@kernel.org>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
Now that software events use perf_arch_fetch_caller_regs() too, we
need the powerpc version to be always built.
Fixes the following build error:
(.text+0x3210): undefined reference to `perf_arch_fetch_caller_regs'
(.text+0x3324): undefined reference to `perf_arch_fetch_caller_regs'
(.text+0x33bc): undefined reference to `perf_arch_fetch_caller_regs'
(.text+0x33ec): undefined reference to `perf_arch_fetch_caller_regs'
(.text+0xd4a0): undefined reference to `perf_arch_fetch_caller_regs'
arch/powerpc/kernel/built-in.o:(.text+0xd528): more undefined references to `perf_arch_fetch_caller_regs' follow
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [sub-make] Error 2
Reported-by: Michael Ellerman <michael@ellerman.id.au>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
|
|
Now that software events use perf_arch_fetch_caller_regs() too, we
need the stub version to be always built in for archs that don't
implement it.
Fixes the following build error in PARISC:
kernel/built-in.o: In function `perf_event_task_sched_out':
(.text.perf_event_task_sched_out+0x54): undefined reference to `perf_arch_fetch_caller_regs'
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
|
|
* master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 5965/1: Fix soft lockup in at91 udc driver
ARM: 6006/1: ARM: Use the correct NOP size in memmove for Thumb-2 kernel builds
ARM: 6005/1: arm: kprobes: fix register corruption with jprobes
ARM: 6003/1: removing compilation warning from pl061.h
ARM: 6001/1: removing compilation warning comming from clkdev.h
ARM: 6000/1: removing compilation warning comming from <asm/irq.h>
ARM: 5999/1: Including device.h and resource.h header files in linux/amba/bus.h
ARM: 5997/1: ARM: Correct the VFPv3 detection
ARM: 5996/1: ARM: Change the mandatory barriers implementation (4/4)
ARM: 5995/1: ARM: Add L2x0 outer_sync() support (3/4)
ARM: 5994/1: ARM: Add outer_cache_fns.sync function pointer (2/4)
ARM: 5993/1: ARM: Move the outer_cache definitions into a separate file (1/4)
|
|
* 'merge' of git://git.secretlab.ca/git/linux-2.6:
powerpc/5200: in lpbfifo, flag DMA irqs as enabled after requesting them
powerpc/fsl: add device tree binding for QE firmware
of/flattree: Fix unhandled OF_DT_NOP tag when unflattening the device tree
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
reiserfs: Fix locking BUG during mount failure
|
|
The missing initialization of the nb_cntl.strap_msi_enable does not
seem to be the only problem that prevents MSI, so that quirk is not
sufficient to enable MSI on all machines. To be safe, disable MSI
unconditionally for the internal graphics and HDMI audio on these
chipsets.
[rjw: Added the PCI_VENDOR_ID_AI quirk.]
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'kgdb-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: Turn off tracing while in the debugger
kgdb: use atomic_inc and atomic_dec instead of atomic_set
kgdb: eliminate kgdb_wait(), all cpus enter the same way
kgdbts,sh: Add in breakpoint pc offset for superh
kgdb: have ebin2mem call probe_kernel_write once
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
Freezer: Fix buggy resume test for tasks frozen with cgroup freezer
Freezer: Only show the state of tasks refusing to freeze
|
|
release_one_tty(tty) can be called when tty still has a reference
to pgrp/session. In this case we leak the pid.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Building chokes with:
In file included from /usr/include/gelf.h:53,
from /usr/include/elfutils/libdw.h:53,
from util/probe-finder.h:61,
from util/probe-finder.c:39:
/usr/include/libelf.h:98: error: expected specifier-qualifier-list before 'off64_t'
[...]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <20100329164755.GA16034@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The kernel debugger should turn off kernel tracing any time the
debugger is active and restore it on resume.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Memory barriers should be used for the kgdb cpu synchronization. The
atomic_set() does not imply a memory barrier.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
|
|
This is a kgdb architectural change to have all the cpus (master or
slave) enter the same function.
A cpu that hits an exception (wants to be the master cpu) will call
kgdb_handle_exception() from the trap handler and then invoke a
kgdb_roundup_cpu() to synchronize the other cpus and bring them into
the kgdb_handle_exception() as well.
A slave cpu will enter kgdb_handle_exception() from the
kgdb_nmicallback() and set the exception state to note that the
processor is a slave.
Previously the salve cpu would have called kgdb_wait(). This change
allows the debug core to change cpus without resuming the system in
order to inspect arch specific cpu information.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
|
|
The kgdb test suite mimics the behavior of gdb. For the sh
architecture the pc must be decremented by 2 for software breakpoint.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
|
|
Rather than call probe_kernel_write() one byte at a time, process the
whole buffer locally and pass the entire result in one go. This way,
architectures that need to do special handling based on the length can
do so, or we only end up calling memcpy() once.
[sonic.zhang@analog.com: Reported original problem and preliminary patch]
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
|
|
This is a fix to the signed/unsigned field handling in the
Python scripting engine, based on a patch from Roel Kluin.
Basically, Python wants to use a PyInt (which is internally a
long) if it can i.e. if the value will fit into that type. If
not, it stores it into a PyLong, which isn't actually a long,
but an arbitrary-precision integer variable.
The code below is similar to to what Python does internally, and
it seems to work as expected on the x86 and x86_64 sytems I
tested it on.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Roel Kluin <roel.kluin@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: rostedt@goodmis.org
LKML-Reference: <1270184305.6422.10.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
We used to free all the Tx queues memory when interface is brought
down and reallocate them again in interface up. This requires
order-4 allocation for txq->cmd[]. In situations like s2ram, this
usually leads to allocation failure in the memory subsystem. The
patch fixed this problem by allocating the Tx queues memory only at
the first time. Later iwl_down/iwl_up only initialize but don't
free and reallocate them. The memory is freed at the device removal
time. BTW, we have already done this for the Rx queue.
This fixed bug https://bugzilla.kernel.org/show_bug.cgi?id=15551
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
|
|
Jan Grossmann reported kernel boot panic while booting SMP
kernel on his system with a single core cpu. SMP kernels call
enable_IR_x2apic() from native_smp_prepare_cpus() and on
platforms where the kernel doesn't find SMP configuration we
ended up again calling enable_IR_x2apic() from the
APIC_init_uniprocessor() call in the smp_sanity_check(). Thus
leading to kernel panic.
Don't call enable_IR_x2apic() and default_setup_apic_routing()
from APIC_init_uniprocessor() in CONFIG_SMP case.
NOTE: this kind of non-idempotent and assymetric initialization
sequence is rather fragile and unclean, we'll clean that up
in v2.6.35. This is the minimal fix for v2.6.34.
Reported-by: Jan.Grossmann@kielnet.net
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <jbarnes@virtuousgeek.org>
Cc: <david.woodhouse@intel.com>
Cc: <weidong.han@intel.com>
Cc: <youquan.song@intel.com>
Cc: <Jan.Grossmann@kielnet.net>
Cc: <stable@kernel.org> # [v2.6.32.x, v2.6.33.x]
LKML-Reference: <1270083887.7835.78.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
When collecting tx data for non-aggregation packets in rate scaling, if
the tx data matches "other table", it still uses current table to update
the stats and calculate average throughput in function rs_collect_tx_data().
This can mess up the rate scaling data structure and cause a kernel panic
in a BUG_ON statement in rs_rate_scale_perform().
To fix this bug, we pass table pointer instead of window pointer (pointed
to by table pointer) to function rs_collect_tx_data() so that the table
being used is consistent.
Signed-off-by: Shanyu Zhao <shanyu.zhao@intel.com>
Signed-off-by: Henry Zhang <hongx.c.zhang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
|
|
Below warning is triggered sometimes at module removal time when
CONFIG_DMA_API_DEBUG is enabled. This should be caused by we didn't
unmap pending commands (enqueued, but no complete notification
received) for the Tx command queue.
[ 1583.107469] ------------[ cut here ]------------
[ 1583.107539] WARNING: at lib/dma-debug.c:688
dma_debug_device_change+0x13c/0x180()
[ 1583.107617] Hardware name: ...
[ 1583.107664] pci 0000:04:00.0: DMA-API: device driver has pending DMA
allocations while released from device [count=1]
[ 1583.107713] Modules linked in: ...
[ 1583.111661] Pid: 16970, comm: modprobe Tainted: G W
2.6.34-rc1-wl #33
[ 1583.111727] Call Trace:
[ 1583.111779] [<c02a281c>] ? dma_debug_device_change+0x13c/0x180
[ 1583.111833] [<c02a281c>] ? dma_debug_device_change+0x13c/0x180
[ 1583.111908] [<c0138e11>] warn_slowpath_common+0x71/0xd0
[ 1583.111963] [<c02a281c>] ? dma_debug_device_change+0x13c/0x180
[ 1583.112016] [<c0138ebb>] warn_slowpath_fmt+0x2b/0x30
[ 1583.112086] [<c02a281c>] dma_debug_device_change+0x13c/0x180
[ 1583.112142] [<c03e6c33>] notifier_call_chain+0x53/0x90
[ 1583.112198] [<c03e1ebe>] ? down_read+0x6e/0x90
[ 1583.112271] [<c015b229>] __blocking_notifier_call_chain+0x49/0x70
[ 1583.112326] [<c015b26f>] blocking_notifier_call_chain+0x1f/0x30
[ 1583.112380] [<c031931c>] __device_release_driver+0x8c/0xa0
[ 1583.112451] [<c03193bf>] driver_detach+0x8f/0xa0
[ 1583.112538] [<c0318382>] bus_remove_driver+0x82/0x100
[ 1583.112595] [<c0319ad9>] driver_unregister+0x49/0x80
[ 1583.112671] [<c024feb2>] ? sysfs_remove_file+0x12/0x20
[ 1583.112727] [<c02aa292>] pci_unregister_driver+0x32/0x80
[ 1583.112791] [<fc13a3c1>] iwl_exit+0x12/0x19 [iwlagn]
[ 1583.112848] [<c017940a>] sys_delete_module+0x15a/0x210
[ 1583.112870] [<c015a5db>] ? up_read+0x1b/0x30
[ 1583.112893] [<c029600c>] ? trace_hardirqs_off_thunk+0xc/0x10
[ 1583.112924] [<c0295ffc>] ? trace_hardirqs_on_thunk+0xc/0x10
[ 1583.112947] [<c03e6a1f>] ? do_page_fault+0x1ff/0x3c0
[ 1583.112978] [<c03e36f6>] ? restore_all_notrace+0x0/0x18
[ 1583.113002] [<c016aa70>] ? trace_hardirqs_on_caller+0x20/0x190
[ 1583.113025] [<c0102d58>] sysenter_do_call+0x12/0x38
[ 1583.113054] ---[ end trace fc23e059cc4c2ced ]---
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
|
|
In order to reduce the dependency on TASK_WAKING rework the enqueue
interface to support a proper flags field.
Replace the int wakeup, bool head arguments with an int flags argument
and create the following flags:
ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task,
ENQUEUE_WAKING - the enqueue has relative vruntime due to
having sched_class::task_waking() called,
ENQUEUE_HEAD - the waking task should be places on the head
of the priority queue (where appropriate).
For symmetry also convert sched_class::dequeue() to a flags scheme.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The cpuload calculation in calc_load_account_active() assumes
rq->nr_uninterruptible will not change on an offline cpu after
migrate_nr_uninterruptible(). However the recent migrate on wakeup
changes broke that and would result in decrementing the offline cpu's
rq->nr_uninterruptible.
Fix this by accounting the nr_uninterruptible on the waking cpu.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Now that we hold the rq->lock over set_task_cpu() again, we can do
away with most of the TASK_WAKING checks and reduce them again to
set_cpus_allowed_ptr().
Removes some conditionals from scheduling hot-paths.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Oleg noticed a few races with the TASK_WAKING usage on fork.
- since TASK_WAKING is basically a spinlock, it should be IRQ safe
- since we set TASK_WAKING (*) without holding rq->lock it could
be there still is a rq->lock holder, thereby not actually
providing full serialization.
(*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.
Cure the second issue by not setting TASK_WAKING in sched_fork(), but
only temporarily in wake_up_new_task() while calling select_task_rq().
Cure the first by holding rq->lock around the select_task_rq() call,
this will disable IRQs, this however requires that we push down the
rq->lock release into select_task_rq_fair()'s cgroup stuff.
Because select_task_rq_fair() still needs to drop the rq->lock we
cannot fully get rid of TASK_WAKING.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
with select_fallback_rq(). It can be called from any context and can't use
any cpuset locks including task_lock(). It is called when the task doesn't
have online cpus in ->cpus_allowed but ttwu/etc must be able to find a
suitable cpu.
I am not proud of this patch. Everything which needs such a fat comment
can't be good even if correct. But I'd prefer to not change the locking
rules in the code I hardly understand, and in any case I believe this
simple change make the code much more correct compared to deadlocks we
currently have.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091027.GA9155@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
_cpu_down() changes the current task's affinity and then recovers it at
the end. The problems are well known: we can't restore old_allowed if it
was bound to the now-dead-cpu, and we can race with the userspace which
can change cpu-affinity during unplug.
_cpu_down() should not play with current->cpus_allowed at all. Instead,
take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
removes the dying cpu from cpu_online_mask.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091023.GA9148@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
sched_exec()->select_task_rq() reads/updates ->cpus_allowed lockless.
This can race with other CPUs updating our ->cpus_allowed, and this
looks meaningless to me.
The task is current and running, it must have online cpus in ->cpus_allowed,
the fallback mode is bogus. And, if ->sched_class returns the "wrong" cpu,
this likely means we raced with set_cpus_allowed() which was called
for reason, why should sched_exec() retry and call ->select_task_rq()
again?
Change the code to call sched_class->select_task_rq() directly and do
nothing if the returned cpu is wrong after re-checking under rq->lock.
From now task_struct->cpus_allowed is always stable under TASK_WAKING,
select_fallback_rq() is always called under rq-lock or the caller or
the caller owns TASK_WAKING (select_task_rq).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091019.GA9141@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The previous patch preserved the retry logic, but it looks unneeded.
__migrate_task() can only fail if we raced with migration after we dropped
the lock, but in this case the caller of set_cpus_allowed/etc must initiate
migration itself if ->on_rq == T.
We already fixed p->cpus_allowed, the changes in active/online masks must
be visible to racer, it should migrate the task to online cpu correctly.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091014.GA9138@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091010.GA9131@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
cpuset_lock/cpuset_cpus_allowed_locked code
This patch just states the fact the cpusets/cpuhotplug interaction is
broken and removes the deadlockable code which only pretends to work.
- cpuset_lock() doesn't really work. It is needed for
cpuset_cpus_allowed_locked() but we can't take this lock in
try_to_wake_up()->select_fallback_rq() path.
- cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex
stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
cpuset_lock() and hangs forever because CPU is already dead and thus
T can't be scheduled.
- cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
which is not irq-safe, but try_to_wake_up() can be called from irq.
Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
we currently do without CONFIG_CPUSETS.
Also, with or without this patch, with or without CONFIG_CPUSETS, the
callers of select_fallback_rq() can race with each other or with
set_cpus_allowed() pathes.
The subsequent patches try to to fix these problems.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091003.GA9123@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
USER_SCHED has been removed, so update the documentation
accordingly.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
LKML-Reference: <4BA9A07E.8070508@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This is left over from commit 7c9414385e ("sched: Remove USER_SCHED"")
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Howells <dhowells@redhat.com>
LKML-Reference: <4BA9A05F.7010407@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Trivial typo fix. rq->migration_thread can be NULL after
task_rq_unlock(), this is why we have "mt" which should be
used instead.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100330165829.GA18284@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks
task_times(). Other places in kernel use nvcsw and nivcsw, which are
being cleared as well, Clear task statistics only.
Reported-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269940193.19286.14.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|