summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2010-04-04ALSA: hda - Add MSI blacklist for Aopen MZ915-MTakashi Iwai
The device needs MSI disablement. Added to the quirk list. Reported-by: Harald Dunkel <harri@afaics.de> Cc: <stable@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-04-04sunxvr500: Ignore secondary output PCI devices.David S. Miller
These just represent the secondary and further heads attached to the card, and they have different sets of PCI bar registers to map. So don't try to drive them in the main driver. Reported-by: Frans van Berckel <fberckel@xs4all.nl> Tested-by: Frans van Berckel <fberckel@xs4all.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03sparc64: Implement perf_arch_fetch_caller_regsDavid S. Miller
We provide regs->tstate, regs->tpc, regs->tnpc and regs->u_regs[UREG_FP]. regs->tstate is necessary for: user_mode() (via perf_exclude_event()) perf_misc_flags() (via perf_prepare_sample()) regs->tpc is necessary for: perf_instruction_pointer() (via perf_prepare_sample()) and regs->u_regs[UREG_FP] is necessary for: perf_callchain() (via perf_prepare_sample()) The regs->tnpc value is provided just to be tidy. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-04PNPACPI: truncate _CRS windows with _LEN > _MAX - _MIN + 1Bjorn Helgaas
The ACPI spec (sec 6.4.3.5 in v4.0) requires that for Address Space Resource Descriptors, _LEN <= _MAX - _MIN + 1 in all cases, but there are BIOSes that violate this. We experimentally determined that Windows truncates the resource so it doesn't extend past _MAX, so let's do the same thing in Linux. http://bugzilla.kernel.org/show_bug.cgi?id=15480 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>
2010-04-04ACPI: Don't send KEY_UNKNOWN for random video notificationsMatthew Garrett
I have a machine here that's sending 0xD1 notifications on the video device once every second or so. I have no idea why (it's a prototype, it may be broken), but sending KEY_UNKNOWN is unhelpful and results in the console becoming unusable. Let's not report keys unless we have something useful to say about them. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2010-04-04ACPI: NUMA: map pxms to low node idsDavid Rientjes
pxms are mapped to low node ids to maintain generic kernel use of functions such as pxm_to_node() that are used to determine device affinity. Otherwise, there is no pxm-to-node and node-to-pxm matching rule for x86_64 users of NUMA emulation where a single pxm may be bound to multiple NUMA nodes. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Len Brown <len.brown@intel.com>
2010-04-03ACPI: use _HID when supplied by root-level devicesBjorn Helgaas
Previously, we assumed the only Device object immediately below the root was the \_SB Scope (which the ACPI CA treats as a Device), so we forced the HID of all such objects to ACPI_BUS_HID ("LNXSYBUS"). However, there are DSDTs that supply root-level Device objects with _HIDs. This patch makes us pay attention to those _HIDs and only add the synthetic ACPI_BUS_HID for root-level objects that do not supply their own _HID. For example, this DSDT: https://bugzilla.kernel.org/show_bug.cgi?id=15605 contains: Scope (_SB) { ... } Device (AMW0) { Name (_HID, EisaId ("PNP0C14")) ... } and we should use "PNP0C14" for the AMW0 device, not "LNXSYBUS". Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Yong Wang <yong.y.wang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2010-04-03sparc64: Update defconfig.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03[WATCHDOG] doc: watchdog simple example: don't fail on fsync()Giel van Schijndel
Don't terminate the watchdog daemon when fsync() fails because no watchdog driver actually implements the fsync() syscall. Signed-off-by: Giel van Schijndel <me@mortis.eu> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2010-04-03Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller
2010-04-03[WATCHDOG] set max63xx driver as ARM onlyMarc Zyngier
Use of ioremap() causes build failure on S390. Restrict the driver to ARM until another architecture comes along and enables the driver for its own use. Signed-off-by: Marc Zyngier <maz@misterjones.org> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2010-04-03[WATCHDOG] powerpc: pika_wdt ident cannot be constSean MacLennan
The watchdog_info struct cannot be a const since we dynamically fill in the firmware version. Signed-off-by: Sean MacLennan <smaclennan@pikatech.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2010-04-03smc91c92_cs: fix the problem of "Unable to find hardware address"Ken Kawasaki
smc91c92_cs: *cvt_ascii_address returns 0, if success. *call free_netdev, if we can't find hardware address. Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03sparc64: Fix array size reported by vmemmap_populate()Ben Hutchings
vmemmap_populate() attempts to report the used index and total size of vmemmap_table, but it wrongly shifts the total size so that it is always shown as 0. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03cifs: Fix a kernel BUG with remote OS/2 server (try #3)Suresh Jayaraman
While chasing a bug report involving a OS/2 server, I noticed the server sets pSMBr->CountHigh to a incorrect value even in case of normal writes. This results in 'nbytes' being computed wrongly and triggers a kernel BUG at mm/filemap.c. void iov_iter_advance(struct iov_iter *i, size_t bytes) { BUG_ON(i->count < bytes); <--- BUG here Why the server is setting 'CountHigh' is not clear but only does so after writing 64k bytes. Though this looks like the server bug, the client side crash may not be acceptable. The workaround is to mask off high 16 bits if the number of bytes written as returned by the server is greater than the bytes requested by the client as suggested by Jeff Layton. CC: Stable <stable@kernel.org> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-04-03[CIFS] initialize nbytes at the beginning of CIFSSMBWrite()Steve French
By doing this we always overwrite nbytes value that is being passed on to CIFSSMBWrite() and need not rely on the callers to initialize. CIFSSMBWrite2 is doing this already. CC: Stable <stable@kernel.org> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-04-03perf: Always build the powerpc perf_arch_fetch_caller_regs versionFrederic Weisbecker
Now that software events use perf_arch_fetch_caller_regs() too, we need the powerpc version to be always built. Fixes the following build error: (.text+0x3210): undefined reference to `perf_arch_fetch_caller_regs' (.text+0x3324): undefined reference to `perf_arch_fetch_caller_regs' (.text+0x33bc): undefined reference to `perf_arch_fetch_caller_regs' (.text+0x33ec): undefined reference to `perf_arch_fetch_caller_regs' (.text+0xd4a0): undefined reference to `perf_arch_fetch_caller_regs' arch/powerpc/kernel/built-in.o:(.text+0xd528): more undefined references to `perf_arch_fetch_caller_regs' follow make[1]: *** [.tmp_vmlinux1] Error 1 make: *** [sub-make] Error 2 Reported-by: Michael Ellerman <michael@ellerman.id.au> Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org>
2010-04-03perf: Always build the stub perf_arch_fetch_caller_regs versionFrederic Weisbecker
Now that software events use perf_arch_fetch_caller_regs() too, we need the stub version to be always built in for archs that don't implement it. Fixes the following build error in PARISC: kernel/built-in.o: In function `perf_event_task_sched_out': (.text.perf_event_task_sched_out+0x54): undefined reference to `perf_arch_fetch_caller_regs' Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org>
2010-04-02Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* master.kernel.org:/home/rmk/linux-2.6-arm: ARM: 5965/1: Fix soft lockup in at91 udc driver ARM: 6006/1: ARM: Use the correct NOP size in memmove for Thumb-2 kernel builds ARM: 6005/1: arm: kprobes: fix register corruption with jprobes ARM: 6003/1: removing compilation warning from pl061.h ARM: 6001/1: removing compilation warning comming from clkdev.h ARM: 6000/1: removing compilation warning comming from <asm/irq.h> ARM: 5999/1: Including device.h and resource.h header files in linux/amba/bus.h ARM: 5997/1: ARM: Correct the VFPv3 detection ARM: 5996/1: ARM: Change the mandatory barriers implementation (4/4) ARM: 5995/1: ARM: Add L2x0 outer_sync() support (3/4) ARM: 5994/1: ARM: Add outer_cache_fns.sync function pointer (2/4) ARM: 5993/1: ARM: Move the outer_cache definitions into a separate file (1/4)
2010-04-02Merge branch 'merge' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
* 'merge' of git://git.secretlab.ca/git/linux-2.6: powerpc/5200: in lpbfifo, flag DMA irqs as enabled after requesting them powerpc/fsl: add device tree binding for QE firmware of/flattree: Fix unhandled OF_DT_NOP tag when unflattening the device tree
2010-04-02Merge branch 'reiserfs/kill-bkl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: reiserfs: Fix locking BUG during mount failure
2010-04-02PCI quirk: RS780/RS880: disable MSI behind the PCI bridgeClemens Ladisch
The missing initialization of the nb_cntl.strap_msi_enable does not seem to be the only problem that prevents MSI, so that quirk is not sufficient to enable MSI on all machines. To be safe, disable MSI unconditionally for the internal graphics and HDMI audio on these chipsets. [rjw: Added the PCI_VENDOR_ID_AI quirk.] Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-02Merge branch 'kgdb-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'kgdb-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: Turn off tracing while in the debugger kgdb: use atomic_inc and atomic_dec instead of atomic_set kgdb: eliminate kgdb_wait(), all cpus enter the same way kgdbts,sh: Add in breakpoint pc offset for superh kgdb: have ebin2mem call probe_kernel_write once
2010-04-02Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: Freezer: Fix buggy resume test for tasks frozen with cgroup freezer Freezer: Only show the state of tasks refusing to freeze
2010-04-02tty: release_one_tty() forgets to put pidsOleg Nesterov
release_one_tty(tty) can be called when tty still has a reference to pgrp/session. In this case we leak the pid. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Catalin Marinas <catalin.marinas@arm.com> Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-02perf, probe-finder: Build fix on DebianBorislav Petkov
Building chokes with: In file included from /usr/include/gelf.h:53, from /usr/include/elfutils/libdw.h:53, from util/probe-finder.h:61, from util/probe-finder.c:39: /usr/include/libelf.h:98: error: expected specifier-qualifier-list before 'off64_t' [...] Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <20100329164755.GA16034@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02kgdb: Turn off tracing while in the debuggerJason Wessel
The kernel debugger should turn off kernel tracing any time the debugger is active and restore it on resume. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2010-04-02kgdb: use atomic_inc and atomic_dec instead of atomic_setJason Wessel
Memory barriers should be used for the kgdb cpu synchronization. The atomic_set() does not imply a memory barrier. Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-04-02kgdb: eliminate kgdb_wait(), all cpus enter the same wayJason Wessel
This is a kgdb architectural change to have all the cpus (master or slave) enter the same function. A cpu that hits an exception (wants to be the master cpu) will call kgdb_handle_exception() from the trap handler and then invoke a kgdb_roundup_cpu() to synchronize the other cpus and bring them into the kgdb_handle_exception() as well. A slave cpu will enter kgdb_handle_exception() from the kgdb_nmicallback() and set the exception state to note that the processor is a slave. Previously the salve cpu would have called kgdb_wait(). This change allows the debug core to change cpus without resuming the system in order to inspect arch specific cpu information. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-04-02kgdbts,sh: Add in breakpoint pc offset for superhJason Wessel
The kgdb test suite mimics the behavior of gdb. For the sh architecture the pc must be decremented by 2 for software breakpoint. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Paul Mundt <lethal@linux-sh.org>
2010-04-02kgdb: have ebin2mem call probe_kernel_write onceJason Wessel
Rather than call probe_kernel_write() one byte at a time, process the whole buffer locally and pass the entire result in one go. This way, architectures that need to do special handling based on the length can do so, or we only end up calling memcpy() once. [sonic.zhang@analog.com: Reported original problem and preliminary patch] Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-04-02perf/scripts: Tuple was set from long in both branches in python_process_event()Tom Zanussi
This is a fix to the signed/unsigned field handling in the Python scripting engine, based on a patch from Roel Kluin. Basically, Python wants to use a PyInt (which is internally a long) if it can i.e. if the value will fit into that type. If not, it stores it into a PyLong, which isn't actually a long, but an arbitrary-precision integer variable. The code below is similar to to what Python does internally, and it seems to work as expected on the x86 and x86_64 sytems I tested it on. Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Roel Kluin <roel.kluin@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: rostedt@goodmis.org LKML-Reference: <1270184305.6422.10.camel@tropicana> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02iwlwifi: avoid Tx queue memory allocation in interface downZhu Yi
We used to free all the Tx queues memory when interface is brought down and reallocate them again in interface up. This requires order-4 allocation for txq->cmd[]. In situations like s2ram, this usually leads to allocation failure in the memory subsystem. The patch fixed this problem by allocating the Tx queues memory only at the first time. Later iwl_down/iwl_up only initialize but don't free and reallocate them. The memory is freed at the device removal time. BTW, we have already done this for the Rx queue. This fixed bug https://bugzilla.kernel.org/show_bug.cgi?id=15551 Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-02x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boardsSuresh Siddha
Jan Grossmann reported kernel boot panic while booting SMP kernel on his system with a single core cpu. SMP kernels call enable_IR_x2apic() from native_smp_prepare_cpus() and on platforms where the kernel doesn't find SMP configuration we ended up again calling enable_IR_x2apic() from the APIC_init_uniprocessor() call in the smp_sanity_check(). Thus leading to kernel panic. Don't call enable_IR_x2apic() and default_setup_apic_routing() from APIC_init_uniprocessor() in CONFIG_SMP case. NOTE: this kind of non-idempotent and assymetric initialization sequence is rather fragile and unclean, we'll clean that up in v2.6.35. This is the minimal fix for v2.6.34. Reported-by: Jan.Grossmann@kielnet.net Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: <jbarnes@virtuousgeek.org> Cc: <david.woodhouse@intel.com> Cc: <weidong.han@intel.com> Cc: <youquan.song@intel.com> Cc: <Jan.Grossmann@kielnet.net> Cc: <stable@kernel.org> # [v2.6.32.x, v2.6.33.x] LKML-Reference: <1270083887.7835.78.camel@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02iwlwifi: use consistent table for tx data collectShanyu Zhao
When collecting tx data for non-aggregation packets in rate scaling, if the tx data matches "other table", it still uses current table to update the stats and calculate average throughput in function rs_collect_tx_data(). This can mess up the rate scaling data structure and cause a kernel panic in a BUG_ON statement in rs_rate_scale_perform(). To fix this bug, we pass table pointer instead of window pointer (pointed to by table pointer) to function rs_collect_tx_data() so that the table being used is consistent. Signed-off-by: Shanyu Zhao <shanyu.zhao@intel.com> Signed-off-by: Henry Zhang <hongx.c.zhang@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-02iwlwifi: fix DMA allocation warningsZhu Yi
Below warning is triggered sometimes at module removal time when CONFIG_DMA_API_DEBUG is enabled. This should be caused by we didn't unmap pending commands (enqueued, but no complete notification received) for the Tx command queue. [ 1583.107469] ------------[ cut here ]------------ [ 1583.107539] WARNING: at lib/dma-debug.c:688 dma_debug_device_change+0x13c/0x180() [ 1583.107617] Hardware name: ... [ 1583.107664] pci 0000:04:00.0: DMA-API: device driver has pending DMA allocations while released from device [count=1] [ 1583.107713] Modules linked in: ... [ 1583.111661] Pid: 16970, comm: modprobe Tainted: G W 2.6.34-rc1-wl #33 [ 1583.111727] Call Trace: [ 1583.111779] [<c02a281c>] ? dma_debug_device_change+0x13c/0x180 [ 1583.111833] [<c02a281c>] ? dma_debug_device_change+0x13c/0x180 [ 1583.111908] [<c0138e11>] warn_slowpath_common+0x71/0xd0 [ 1583.111963] [<c02a281c>] ? dma_debug_device_change+0x13c/0x180 [ 1583.112016] [<c0138ebb>] warn_slowpath_fmt+0x2b/0x30 [ 1583.112086] [<c02a281c>] dma_debug_device_change+0x13c/0x180 [ 1583.112142] [<c03e6c33>] notifier_call_chain+0x53/0x90 [ 1583.112198] [<c03e1ebe>] ? down_read+0x6e/0x90 [ 1583.112271] [<c015b229>] __blocking_notifier_call_chain+0x49/0x70 [ 1583.112326] [<c015b26f>] blocking_notifier_call_chain+0x1f/0x30 [ 1583.112380] [<c031931c>] __device_release_driver+0x8c/0xa0 [ 1583.112451] [<c03193bf>] driver_detach+0x8f/0xa0 [ 1583.112538] [<c0318382>] bus_remove_driver+0x82/0x100 [ 1583.112595] [<c0319ad9>] driver_unregister+0x49/0x80 [ 1583.112671] [<c024feb2>] ? sysfs_remove_file+0x12/0x20 [ 1583.112727] [<c02aa292>] pci_unregister_driver+0x32/0x80 [ 1583.112791] [<fc13a3c1>] iwl_exit+0x12/0x19 [iwlagn] [ 1583.112848] [<c017940a>] sys_delete_module+0x15a/0x210 [ 1583.112870] [<c015a5db>] ? up_read+0x1b/0x30 [ 1583.112893] [<c029600c>] ? trace_hardirqs_off_thunk+0xc/0x10 [ 1583.112924] [<c0295ffc>] ? trace_hardirqs_on_thunk+0xc/0x10 [ 1583.112947] [<c03e6a1f>] ? do_page_fault+0x1ff/0x3c0 [ 1583.112978] [<c03e36f6>] ? restore_all_notrace+0x0/0x18 [ 1583.113002] [<c016aa70>] ? trace_hardirqs_on_caller+0x20/0x190 [ 1583.113025] [<c0102d58>] sysenter_do_call+0x12/0x38 [ 1583.113054] ---[ end trace fc23e059cc4c2ced ]--- Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-02sched: Add enqueue/dequeue flagsPeter Zijlstra
In order to reduce the dependency on TASK_WAKING rework the enqueue interface to support a proper flags field. Replace the int wakeup, bool head arguments with an int flags argument and create the following flags: ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task, ENQUEUE_WAKING - the enqueue has relative vruntime due to having sched_class::task_waking() called, ENQUEUE_HEAD - the waking task should be places on the head of the priority queue (where appropriate). For symmetry also convert sched_class::dequeue() to a flags scheme. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Fix nr_uninterruptible countPeter Zijlstra
The cpuload calculation in calc_load_account_active() assumes rq->nr_uninterruptible will not change on an offline cpu after migrate_nr_uninterruptible(). However the recent migrate on wakeup changes broke that and would result in decrementing the offline cpu's rq->nr_uninterruptible. Fix this by accounting the nr_uninterruptible on the waking cpu. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Optimize task_rq_lock()Peter Zijlstra
Now that we hold the rq->lock over set_task_cpu() again, we can do away with most of the TASK_WAKING checks and reduce them again to set_cpus_allowed_ptr(). Removes some conditionals from scheduling hot-paths. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Fix TASK_WAKING vs fork deadlockPeter Zijlstra
Oleg noticed a few races with the TASK_WAKING usage on fork. - since TASK_WAKING is basically a spinlock, it should be IRQ safe - since we set TASK_WAKING (*) without holding rq->lock it could be there still is a rq->lock holder, thereby not actually providing full serialization. (*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING. Cure the second issue by not setting TASK_WAKING in sched_fork(), but only temporarily in wake_up_new_task() while calling select_task_rq(). Cure the first by holding rq->lock around the select_task_rq() call, this will disable IRQs, this however requires that we push down the rq->lock release into select_task_rq_fair()'s cgroup stuff. Because select_task_rq_fair() still needs to drop the rq->lock we cannot fully get rid of TASK_WAKING. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Make select_fallback_rq() cpuset friendlyOleg Nesterov
Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems with select_fallback_rq(). It can be called from any context and can't use any cpuset locks including task_lock(). It is called when the task doesn't have online cpus in ->cpus_allowed but ttwu/etc must be able to find a suitable cpu. I am not proud of this patch. Everything which needs such a fat comment can't be good even if correct. But I'd prefer to not change the locking rules in the code I hardly understand, and in any case I believe this simple change make the code much more correct compared to deadlocks we currently have. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091027.GA9155@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: _cpu_down(): Don't play with current->cpus_allowedOleg Nesterov
_cpu_down() changes the current task's affinity and then recovers it at the end. The problems are well known: we can't restore old_allowed if it was bound to the now-dead-cpu, and we can race with the userspace which can change cpu-affinity during unplug. _cpu_down() should not play with current->cpus_allowed at all. Instead, take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable() removes the dying cpu from cpu_online_mask. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091023.GA9148@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: sched_exec(): Remove the select_fallback_rq() logicOleg Nesterov
sched_exec()->select_task_rq() reads/updates ->cpus_allowed lockless. This can race with other CPUs updating our ->cpus_allowed, and this looks meaningless to me. The task is current and running, it must have online cpus in ->cpus_allowed, the fallback mode is bogus. And, if ->sched_class returns the "wrong" cpu, this likely means we raced with set_cpus_allowed() which was called for reason, why should sched_exec() retry and call ->select_task_rq() again? Change the code to call sched_class->select_task_rq() directly and do nothing if the returned cpu is wrong after re-checking under rq->lock. From now task_struct->cpus_allowed is always stable under TASK_WAKING, select_fallback_rq() is always called under rq-lock or the caller or the caller owns TASK_WAKING (select_task_rq). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091019.GA9141@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: move_task_off_dead_cpu(): Remove retry logicOleg Nesterov
The previous patch preserved the retry logic, but it looks unneeded. __migrate_task() can only fail if we raced with migration after we dropped the lock, but in this case the caller of set_cpus_allowed/etc must initiate migration itself if ->on_rq == T. We already fixed p->cpus_allowed, the changes in active/online masks must be visible to racer, it should migrate the task to online cpu correctly. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091014.GA9138@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq()Oleg Nesterov
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed lockless. We can race with set_cpus_allowed() running in parallel. Change it to take rq->lock around select_fallback_rq(). Note that it is not trivial to move this spin_lock() into select_fallback_rq(), we must recheck the task was not migrated after we take the lock and other callers do not need this lock. To avoid the races with other callers of select_fallback_rq() which rely on TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise. The owner of TASK_WAKING must update ->cpus_allowed and choose the correct CPU anyway, and the subsequent __migrate_task() is just meaningless because p->se.on_rq must be false. Alternatively, we could change select_task_rq() to take rq->lock right after it calls sched_class->select_task_rq(), but this looks a bit ugly. Also, change it to not assume irqs are disabled and absorb __migrate_task_irq(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091010.GA9131@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Kill the broken and deadlockable ↵Oleg Nesterov
cpuset_lock/cpuset_cpus_allowed_locked code This patch just states the fact the cpusets/cpuhotplug interaction is broken and removes the deadlockable code which only pretends to work. - cpuset_lock() doesn't really work. It is needed for cpuset_cpus_allowed_locked() but we can't take this lock in try_to_wake_up()->select_fallback_rq() path. - cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take cpuset_lock() and hangs forever because CPU is already dead and thus T can't be scheduled. - cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock() which is not irq-safe, but try_to_wake_up() can be called from irq. Kill them, and change select_fallback_rq() to use cpu_possible_mask, like we currently do without CONFIG_CPUSETS. Also, with or without this patch, with or without CONFIG_CPUSETS, the callers of select_fallback_rq() can race with each other or with set_cpus_allowed() pathes. The subsequent patches try to to fix these problems. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091003.GA9123@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Remove USER_SCHED from documentationLi Zefan
USER_SCHED has been removed, so update the documentation accordingly. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Serge E. Hallyn <serue@us.ibm.com> LKML-Reference: <4BA9A07E.8070508@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Remove remaining USER_SCHED codeLi Zefan
This is left over from commit 7c9414385e ("sched: Remove USER_SCHED"") Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Dhaval Giani <dhaval.giani@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Howells <dhowells@redhat.com> LKML-Reference: <4BA9A05F.7010407@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: set_cpus_allowed_ptr(): Don't use rq->migration_thread after unlockOleg Nesterov
Trivial typo fix. rq->migration_thread can be NULL after task_rq_unlock(), this is why we have "mt" which should be used instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100330165829.GA18284@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Fix proc_sched_set_task()Mike Galbraith
Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks task_times(). Other places in kernel use nvcsw and nivcsw, which are being cleared as well, Clear task statistics only. Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269940193.19286.14.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>