summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2010-10-26include/linux/pageblock-flags.h: fix set_pageblock_flags() macro definitonzeal
The presently-unused macro was missing one parameter. Signed-off-by: zeal <zealcook@gmail.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26writeback: remove nonblocking/encountered_congestion referencesWu Fengguang
This removes more dead code that was somehow missed by commit 0d99519efef (writeback: remove unused nonblocking and congestion checks). There are no behavior change except for the removal of two entries from one of the ext4 tracing interface. The nonblocking checks in ->writepages are no longer used because the flusher now prefer to block on get_request_wait() than to skip inodes on IO congestion. The latter will lead to more seeky IO. The nonblocking checks in ->writepage are no longer used because it's redundant with the WB_SYNC_NONE check. We no long set ->nonblocking in VM page out and page migration, because a) it's effectively redundant with WB_SYNC_NONE in current code b) it's old semantic of "Don't get stuck on request queues" is mis-behavior: that would skip some dirty inodes on congestion and page out others, which is unfair in terms of LRU age. Inspired by Christoph Hellwig. Thanks! Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: Steve French <sfrench@samba.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26oom: fix locking for oom_adj and oom_score_adjDavid Rientjes
The locking order in oom_adjust_write() and oom_score_adj_write() for task->alloc_lock and task->sighand->siglock is reversed, and lockdep notices that irqs could encounter an ABBA scenario. This fixes the locking order so that we always take task_lock(task) prior to lock_task_sighand(task). Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26oom: rewrite error handling for oom_adj and oom_score_adj tunablesDavid Rientjes
It's better to use proper error handling in oom_adjust_write() and oom_score_adj_write() instead of duplicating the locking order on various exit paths. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26oom: kill all threads sharing oom killed task's mmDavid Rientjes
It's necessary to kill all threads that share an oom killed task's mm if the goal is to lead to future memory freeing. This patch reintroduces the code removed in 8c5cd6f3 (oom: oom_kill doesn't kill vfork parent (or child)) since it is obsoleted. It's now guaranteed that any task passed to oom_kill_task() does not share an mm with any thread that is unkillable. Thus, we're safe to issue a SIGKILL to any thread sharing the same mm. This is especially necessary to solve an mm->mmap_sem livelock issue whereas an oom killed thread must acquire the lock in the exit path while another thread is holding it in the page allocator while trying to allocate memory itself (and will preempt the oom killer since a task was already killed). Since tasks with pending fatal signals are now granted access to memory reserves, the thread holding the lock may quickly allocate and release the lock so that the oom killed task may exit. This mainly is for threads that are cloned with CLONE_VM but not CLONE_THREAD, so they are in a different thread group. Non-NPTL threads exist in the wild and this change is necessary to prevent the livelock in such cases. We care more about preventing the livelock than incurring the additional tasklist in the oom killer when a task has been killed. Systems that are sufficiently large to not want the tasklist scan in the oom killer in the first place already have the option of enabling /proc/sys/vm/oom_kill_allocating_task, which was designed specifically for that purpose. This code had existed in the oom killer for over eight years dating back to the 2.4 kernel. [akpm@linux-foundation.org: add nice comment] Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26oom: avoid killing a task if a thread sharing its mm cannot be killedDavid Rientjes
The oom killer's goal is to kill a memory-hogging task so that it may exit, free its memory, and allow the current context to allocate the memory that triggered it in the first place. Thus, killing a task is pointless if other threads sharing its mm cannot be killed because of its /proc/pid/oom_adj or /proc/pid/oom_score_adj value. This patch checks whether any other thread sharing p->mm has an oom_score_adj of OOM_SCORE_ADJ_MIN. If so, the thread cannot be killed and oom_badness(p) returns 0, meaning it's unkillable. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26oom: add per-mm oom disable countYing Han
It's pointless to kill a task if another thread sharing its mm cannot be killed to allow future memory freeing. A subsequent patch will prevent kills in such cases, but first it's necessary to have a way to flag a task that shares memory with an OOM_DISABLE task that doesn't incur an additional tasklist scan, which would make select_bad_process() an O(n^2) function. This patch adds an atomic counter to struct mm_struct that follows how many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN. They cannot be killed by the kernel, so their memory cannot be freed in oom conditions. This only requires task_lock() on the task that we're operating on, it does not require mm->mmap_sem since task_lock() pins the mm and the operation is atomic. [rientjes@google.com: changelog and sys_unshare() code] [rientjes@google.com: protect oom_disable_count with task_lock in fork] [rientjes@google.com: use old_mm for oom_disable_count in exec] Signed-off-by: Ying Han <yinghan@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26Documentation/filesystems/proc.txt: improve smaps field documentationMatt Mackall
Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26vmcore: it is not experimental any moreWANG Cong
We use vmcore in our production kernel for a long time, it is pretty stable now. So I don't think we need to mark it as experimental any more. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26um: fix IRQ flag handling namingRichard Weinberger
Commit df9ee292 ("Fix IRQ flag handling naming") changed the IRQ flag handling naming scheme and broke UML: In file included from arch/um/include/asm/fixmap.h:5, from arch/um/include/shared/um_uaccess.h:10, from arch/um/include/asm/uaccess.h:41, from arch/um/include/asm/thread_info.h:13, from include/linux/thread_info.h:56, from include/linux/preempt.h:9, from include/linux/spinlock.h:50, from include/linux/seqlock.h:29, from include/linux/time.h:8, from include/linux/stat.h:60, from include/linux/module.h:10, from init/main.c:13: arch/um/include/asm/system.h:11:1: warning: "local_save_flags" redefined This patch brings the new scheme to UML and makes it work again. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26percpu: fix list_head init bug in __percpu_counter_init()Masanori ITOH
WARNING: at lib/list_debug.c:26 __list_add+0x3f/0x81() Hardware name: Express5800/B120a [N8400-085] list_add corruption. next->prev should be prev (ffffffff81a7ea00), but was dead000000200200. (next=ffff88080b872d58). Modules linked in: aoe ipt_MASQUERADE iptable_nat nf_nat autofs4 sunrpc bridge 8021q garp stp llc ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin dm_multipath kvm_intel kvm uinput lpfc scsi_transport_fc igb ioatdma scsi_tgt i2c_i801 i2c_core dca iTCO_wdt iTCO_vendor_support pcspkr shpchp megaraid_sas [last unloaded: aoe] Pid: 54, comm: events/3 Tainted: G W 2.6.34-vanilla1 #1 Call Trace: [<ffffffff8104bd77>] warn_slowpath_common+0x7c/0x94 [<ffffffff8104bde6>] warn_slowpath_fmt+0x41/0x43 [<ffffffff8120fd2e>] __list_add+0x3f/0x81 [<ffffffff81212a12>] __percpu_counter_init+0x59/0x6b [<ffffffff810d8499>] bdi_init+0x118/0x17e [<ffffffff811f2c50>] blk_alloc_queue_node+0x79/0x143 [<ffffffff811f2d2b>] blk_alloc_queue+0x11/0x13 [<ffffffffa02a931d>] aoeblk_gdalloc+0x8e/0x1c9 [aoe] [<ffffffffa02aa655>] aoecmd_sleepwork+0x25/0xa8 [aoe] [<ffffffff8106186c>] worker_thread+0x1a9/0x237 [<ffffffffa02aa630>] ? aoecmd_sleepwork+0x0/0xa8 [aoe] [<ffffffff81065827>] ? autoremove_wake_function+0x0/0x39 [<ffffffff810616c3>] ? worker_thread+0x0/0x237 [<ffffffff810653ad>] kthread+0x7f/0x87 [<ffffffff8100aa24>] kernel_thread_helper+0x4/0x10 [<ffffffff8106532e>] ? kthread+0x0/0x87 [<ffffffff8100aa20>] ? kernel_thread_helper+0x0/0x10 It's because there is no initialization code for a list_head contained in the struct backing_dev_info under CONFIG_HOTPLUG_CPU, and the bug comes up when block device drivers calling blk_alloc_queue() are used. In case of me, I got them by using aoe. Signed-off-by: Masanori Itoh <itoumsn@nttdata.co.jp> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26kfifo: disable __kfifo_must_check_helper()Andrew Morton
This helper is wrong: it coerces signed values into unsigned ones, so code such as if (kfifo_alloc(...) < 0) { error } will fail to detect the error. So let's disable __kfifo_must_check_helper() for 2.6.36. Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26hostfs: fix UML crash: remove f_spare from hostfsRichard Weinberger
365b1818 ("add f_flags to struct statfs(64)") resized f_spare within struct statfs which caused a UML crash. There is no need to copy f_spare. Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Dike <jdike@addtoit.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26ipmi: proper spinlock initializationEric Dumazet
Unloading ipmi module can trigger following error. (if CONFIG_DEBUG_SPINLOCK=y) [ 9633.779590] BUG: spinlock bad magic on CPU#1, rmmod/7170 [ 9633.779606] lock: f41f5414, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 9633.779626] Pid: 7170, comm: rmmod Not tainted 2.6.36-rc7-11474-gb71eb1e-dirty #328 [ 9633.779644] Call Trace: [ 9633.779657] [<c13921cc>] ? printk+0x18/0x1c [ 9633.779672] [<c11a1f33>] spin_bug+0xa3/0xf0 [ 9633.779685] [<c11a1ffd>] do_raw_spin_lock+0x7d/0x160 [ 9633.779702] [<c1131537>] ? release_sysfs_dirent+0x47/0xb0 [ 9633.779718] [<c1131b78>] ? sysfs_addrm_finish+0xa8/0xd0 [ 9633.779734] [<c1394bac>] _raw_spin_lock_irqsave+0xc/0x20 [ 9633.779752] [<f99d93da>] cleanup_one_si+0x6a/0x200 [ipmi_si] [ 9633.779768] [<c11305b2>] ? sysfs_hash_and_remove+0x72/0x80 [ 9633.779786] [<f99dcf26>] ipmi_pnp_remove+0xd/0xf [ipmi_si] [ 9633.779802] [<c11f622b>] pnp_device_remove+0x1b/0x40 Fix this by initializing spinlocks in a smi_info_alloc() helper function, right after memory allocation and clearing. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: Corey Minyard <cminyard@mvista.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26drivers/misc/ad525x_dpot.c: fix typo in spi write16 and write24 transfer countsMichael Hennerich
This is a bug fix. Some SPI connected devices using 16/24 bit accesses, previously failed, now work. This typo slipped in after testing, during some restructuring. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Chris Verges <chrisv@cyberswitching.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26um: remove PAGE_SIZE alignment in linker script causing kernel segfault.Richard Weinberger
The linker script cleanup that I did in commit 5d150a97f93 ("um: Clean up linker script using standard macros.") (2.6.32) accidentally introduced an ALIGN(PAGE_SIZE) when converting to use INIT_TEXT_SECTION; Richard Weinberger reported that this causes the kernel to segfault with CONFIG_STATIC_LINK=y. I'm not certain why this extra alignment is a problem, but it seems likely it is because previously __init_begin = _stext = _text = _sinittext and with the extra ALIGN(PAGE_SIZE), _sinittext becomes different from the rest. So there is likely a bug here where something is assuming that _sinittext is the same as one of those other symbols. But reverting the accidental change fixes the regression, so it seems worth committing that now. Signed-off-by: Tim Abbott <tabbott@ksplice.com> Reported-by: Richard Weinberger <richard@nod.at> Cc: Jeff Dike <jdike@addtoit.com> Tested by: Antoine Martin <antoine@nagafix.co.uk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26sgi-xp: incoming XPC channel messages can come in after the channel's ↵Robin Holt
partition structures have been torn down Under some workloads, some channel messages have been observed being delayed on the sending side past the point where the receiving side has been able to tear down its partition structures. This condition is already detected in xpc_handle_activate_IRQ_uv(), but that information is not given to xpc_handle_activate_mq_msg_uv(). As a result, xpc_handle_activate_mq_msg_uv() assumes the structures still exist and references them, causing a NULL-pointer deref. Signed-off-by: Robin Holt <holt@sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26um: fix global timer issue when using CONFIG_NO_HZRichard Weinberger
This fixes a issue which was introduced by fe2cc53e ("uml: track and make up lost ticks"). timeval_to_ns() returns long long and not int. Due to that UML's timer did not work properlt and caused timer freezes. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26mm, page-allocator: do not check the state of a non-existant buddy during freeMel Gorman
There is a bug in commit 6dda9d55 ("page allocator: reduce fragmentation in buddy allocator by adding buddies that are merging to the tail of the free lists") that means a buddy at order MAX_ORDER is checked for merging. A page of this order never exists so at times, an effectively random piece of memory is being checked. Alan Curry has reported that this is causing memory corruption in userspace data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32). It is not clear why this is happening. It could be a cache coherency problem where pages mapped in both user and kernel space are getting different cache lines due to the bad read from kernel space (http://lkml.org/lkml/2010/10/13/179). It could also be that there are some special registers being io-remapped at the end of the memmap array and that a read has special meaning on them. Compiler bugs have been ruled out because the assembly before and after the patch looks relatively harmless. This patch fixes the problem by ensuring we are not reading a possibly invalid location of memory. It's not clear why the read causes corruption but one way or the other it is a buggy read. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Corrado Zoccolo <czoccolo@gmail.com> Reported-by: Alan Curry <pacman@kosh.dhis.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26types.h: move misplaced commentAndrew Morton
This comment landed in the wrong place. Cc: Andi Kleen <andi@firstfloor.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Miller <davem@davemloft.net> Cc: Eric Paris <eparis@redhat.com> Cc: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26mm: fix return value of scan_lru_pages in memory unplugKAMEZAWA Hiroyuki
scan_lru_pages returns pfn. So, it's type should be "unsigned long" not "int". Note: I guess this has been work until now because memory hotplug tester's machine has not very big memory.... physical address < 32bit << PAGE_SHIFT. Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', ↵Roland Dreier
'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next
2010-10-26IB/qib: clean up properly if pci_set_consistent_dma_mask() failsJason Gunthorpe
Clean up properly if pci_set_consistent_dma_mask() fails. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26IB/qib: Allow driver to load if PCIe AER failsRalph Campbell
Some PCIe root complex chip sets don't support advanced error reporting. Allow the driver to load OK if pci_enable_pcie_error_reporting() fails. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not setRalph Campbell
If CONFIG_PCI_MSI is not set, and a QLE7140 is present, the pointer "dd" is uninitialized. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26IB/qib: Fix extra log level in qib_early_err()Jason Gunthorpe
Noticed this odd looking thing in dmesg: ib_qib 0000:02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5 which is due to a bad use of dev_info. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26x86: allocate space within a region top-downBjorn Helgaas
Request that allocate_resource() use available space from high addresses first, rather than the default of using low addresses first. The most common place this makes a difference is when we move or assign new PCI device resources. Low addresses are generally scarce, so it's better to use high addresses when possible. This follows Windows practice for PCI allocation. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c42 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26x86: update iomem_resource end based on CPU physical address capabilitiesBjorn Helgaas
The iomem_resource map reflects the available physical address space. We statically initialize the end to -1, i.e., 0xffffffff_ffffffff, but of course we can only use as much as the CPU can address. This patch updates the end based on the CPU capabilities, so we don't mistakenly allocate space that isn't usable, as we're likely to do when allocating from the top-down. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26x86/PCI: allocate space from the end of a region, not the beginningBjorn Helgaas
Allocate from the end of a region, not the beginning. For example, if we need to allocate 0x800 bytes for a device on bus 0000:00 given these resources: [mem 0xbff00000-0xdfffffff] PCI Bus 0000:00 [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02 the available space at [mem 0xbff00000-0xbfffffff] is passed to the alignment callback (pcibios_align_resource()). Prior to this patch, we would put the new 0x800 byte resource at the beginning of that available space, i.e., at [mem 0xbff00000-0xbff007ff]. With this patch, we put it at the end, at [mem 0xbffff800-0xbfffffff]. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c41 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26PCI: allocate bus resources from the top downBjorn Helgaas
Allocate space from the highest-address PCI bus resource first, then work downward. Previously, we looked for space in PCI host bridge windows in the order we discovered the windows. For example, given the following windows (discovered via an ACPI _CRS method): pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff] pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff] pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xf7ffffff] pci_root PNP0A03:00: host bridge window [mem 0xff980000-0xff980fff] pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff] pci_root PNP0A03:00: host bridge window [mem 0xfed20000-0xfed9ffff] we attempted to allocate from [mem 0x000a0000-0x000bffff] first, then [mem 0x000c0000-0x000effff], and so on. With this patch, we allocate from [mem 0xff980000-0xff980fff] first, then [mem 0xff97c000-0xff97ffff], [mem 0xfed20000-0xfed9ffff], etc. Allocating top-down follows Windows practice, so we're less likely to trip over BIOS defects in the _CRS description. On the machine above (a Dell T3500), the [mem 0xbff00000-0xbfffffff] region doesn't actually work and is likely a BIOS defect. The symptom is that we move the AHCI controller to 0xbff00000, which leads to "Boot has failed, sleeping forever," a BUG in ahci_stop_engine(), or some other boot failure. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c43 Reference: https://bugzilla.redhat.com/show_bug.cgi?id=620313 Reference: https://bugzilla.redhat.com/show_bug.cgi?id=629933 Reported-by: Brian Bloniarz <phunge0@hotmail.com> Reported-and-tested-by: Stefan Becker <chemobejk@gmail.com> Reported-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: support allocating space within a region from the top downBjorn Helgaas
Allocate space from the top of a region first, then work downward, if an architecture desires this. When we allocate space from a resource, we look for gaps between children of the resource. Previously, we always looked at gaps from the bottom up. For example, given this: [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00 [mem 0xbff00000-0xbfffffff] gap -- available [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02 [mem 0xe0000000-0xf7ffffff] gap -- available we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first, then the [mem 0xe0000000-0xf7ffffff] gap. With this patch an architecture can choose to allocate from the top gap [mem 0xe0000000-0xf7ffffff] first. We can't do this across the board because iomem_resource.end is initialized to 0xffffffff_ffffffff on 64-bit architectures, and most machines can't address the entire 64-bit physical address space. Therefore, we only allocate top-down if the arch requests it by clearing "resource_alloc_from_bottom". Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: handle overflow when aligning start of available areaBjorn Helgaas
If tmp.start is near ~0, ALIGN(tmp.start) may overflow, which would make us think there's more available space than there really is. We would likely return something that conflicts with a previous resource, which would cause a failure when allocate_resource() requests the newly- allocated region. Reference: https://bugzilla.redhat.com/show_bug.cgi?id=646027 Reported-by: Fabrice Bellet <fabrice@bellet.info> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: ensure callback doesn't allocate outside available spaceBjorn Helgaas
The alignment callback returns a proposed location, which may have been adjusted to avoid ISA aliases or for other architecture-specific reasons. We already had a check ("tmp.start < tmp.end") to make sure the callback doesn't return an area that extends past the available area. This patch reworks the check to make sure it doesn't return an area that extends either below or above the available area. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: factor out resource_clip() to simplify find_resource()Bjorn Helgaas
This factors out the min/max clipping to simplify find_resource(). No functional change. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: add a default alignf to simplify find_resource()Bjorn Helgaas
This removes a test from find_resource(), which is getting cluttered. No functional change. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26RDMA/cxgb4: Remove unnecessary KERN_<level> useJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26RDMA/cxgb3: Remove unnecessary KERN_<level> useJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26intel_idle: do not use the LAPIC timer for ATOM C2Len Brown
If we use the LAPIC timer during ATOM C2 on some nvidia chisets, the system stalls. https://bugzilla.kernel.org/show_bug.cgi?id=21032 Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-26IPv6: Temp addresses are immediately deleted.Glenn Wurster
There is a bug in the interaction between ipv6_create_tempaddr and addrconf_verify. Because ipv6_create_tempaddr uses the cstamp and tstamp from the public address in creating a private address, if we have not received a router advertisement in a while, tstamp + temp_valid_lft might be < now. If this happens, the new address is created inside ipv6_create_tempaddr, then the loop within addrconf_verify starts again and the address is immediately deleted. We are left with no temporary addresses on the interface, and no more will be created until the public IP address is updated. To avoid this, set the expiry time to be the minimum of the time left on the public address or the config option PLUS the current age of the public interface. Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26IPv6: Create temporary address if none exists.Glenn Wurster
If privacy extentions are enabled, but no current temporary address exists, then create one when we get a router advertisement. Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26xen: initialize cpu masks for pv guests in xen_smp_initStefano Stabellini
Pv guests don't have ACPI and need the cpu masks to be set correctly as early as possible so we call xen_fill_possible_map from xen_smp_init. On the other hand the initial domain supports ACPI so in this case we skip xen_fill_possible_map and rely on it. However Xen might limit the number of cpus usable by the domain, so we filter those masks during smp initialization using the VCPUOP_is_up hypercall. It is important that the filtering is done before xen_setup_vcpu_info_placement. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-10-26ASoC: fsl - fix build error in pcm030-audio-fabric.cLiam Girdwood
Fix build error:- sound/soc/fsl/pcm030-audio-fabric.c:27:33: fatal error: sound/soc-of-simple.h: No such file or directory Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-10-26sound/oss/sb_ess.c: delete double assignmentJulia Lawall
Delete successive assignments to the same location. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression i; @@ *i = ...; i = ...; // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-10-26perf python scripting: Add futex-contention scriptArnaldo Carvalho de Melo
The equivalent to this SystemTAP script: http://sourceware.org/systemtap/wiki/WSFutexContention [root@doppio ~]# perf trace futex-contention Press control+C to stop and show the summary ^Cnpviewer.bin[15242] lock 7f0a8be19104 contended 29 times, 72806 avg ns npviewer.bin[15242] lock 7f0a8be19130 contended 2 times, 1355 avg ns synergyc[17245] lock f127f4 contended 1 times, 1830569 avg ns firefox[15116] lock 7f2b7238af0c contended 168 times, 1230390 avg ns synergyc[17245] lock f2fc20 contended 1 times, 33149 avg ns npviewer.bin[15255] lock 7f0a8be19074 contended 155 times, 73047 avg ns npviewer.bin[15255] lock 7f0a8be190a0 contended 127 times, 7088 avg ns synergyc[17247] lock f12854 contended 1 times, 46741 avg ns synergyc[17245] lock f12610 contended 1 times, 7358 avg ns [root@doppio ~]# Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-10-26Merge branch 'misc' into releaseLen Brown
2010-10-26Merge branch 'acpi-mmio' into releaseLen Brown
Conflicts: drivers/acpi/osl.c Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-26fib_hash: fix rcu sparse and logical errorsEric Dumazet
While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to fz->fz_hash for real. - &fz->fz_hash[fn_hash(f->fn_key, fz)] + rcu_dereference(fz->fz_hash) + fn_hash(f->fn_key, fz) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26fib: fix fib_nl_newrule()Eric Dumazet
Some panic reports in fib_rules_lookup() show a rule could have a NULL pointer as a next pointer in the rules_list. This can actually happen because of a bug in fib_nl_newrule() : It checks if current rule is the destination of unresolved gotos. (Other rules have gotos to this about to be inserted rule) Problem is it does the resolution of the gotos before the rule is inserted in the rules_list (and has a valid next pointer) Fix this by moving the rules_list insertion before the changes on gotos. A lockless reader can not any more follow a ctarget pointer, unless destination is ready (has a valid next pointer) Reported-by: Oleg A. Arkhangelsky <sysoleg@yandex.ru> Reported-by: Joe Buehler <aspam@cox.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26NTLM auth and sign - Use kernel crypto apis to calculate hashes and smb ↵Shirish Pargaonkar
signatures Use kernel crypto sync hash apis insetead of cifs crypto functions. The calls typically corrospond one to one except that insead of key init, setkey is used. Use crypto apis to generate smb signagtures also. Use hmac-md5 to genereate ntlmv2 hash, ntlmv2 response, and HMAC (CR1 of ntlmv2 auth blob. User crypto apis to genereate signature and to verify signature. md5 hash is used to calculate signature. Use secondary key to calculate signature in case of ntlmssp. For ntlmv2 within ntlmssp, during signature calculation, only 16 bytes key (a nonce) stored within session key is used. during smb signature calculation. For ntlm and ntlmv2 without extended security, 16 bytes key as well as entire response (24 bytes in case of ntlm and variable length in case of ntlmv2) is used for smb signature calculation. For kerberos, there is no distinction between key and response. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-10-26Merge branch 'ima-memory-use-fixes'Linus Torvalds
* ima-memory-use-fixes: IMA: fix the ToMToU logic IMA: explicit IMA i_flag to remove global lock on inode_delete IMA: drop refcnt from ima_iint_cache since it isn't needed IMA: only allocate iint when needed IMA: move read counter into struct inode IMA: use i_writecount rather than a private counter IMA: use inode->i_lock to protect read and write counters IMA: convert internal flags from long to char IMA: use unsigned int instead of long for counters IMA: drop the inode opencount since it isn't needed for operation IMA: use rbtree instead of radix tree for inode information cache