summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2008-12-21powerpc: Add reboot notifier to Collaborative Memory ManagerBrian King
When running Active Memory Sharing, pages can get marked as "loaned" with the hypervisor by the CMM driver. This state gets cleared by the system firmware when rebooting the partition. When using kexec to boot a new kernel, this state never gets cleared and the hypervisor and CMM driver can get out of sync with respect to the number of pages currently marked "loaned". Fix this by adding a reboot notifier to the CMM driver to deflate the balloon and mark all pages as active. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc: Disable Collaborative Memory Manager for kdumpBrian King
When running Active Memory Sharing, the Collaborative Memory Manager (CMM) may mark some pages as "loaned" with the hypervisor. Periodically, the CMM will query the hypervisor for a loan request, which is a single signed value. When kexec'ing into a kdump kernel, the CMM driver in the kdump kernel is not aware of the pages the previous kernel had marked as "loaned", so the hypervisor and the CMM driver are out of sync. This results in the CMM driver getting a negative loan request, which can then get treated as a large unsigned value and can cause kdump to hang due to the CMM driver inflating too large. Since there really is no clean way for the CMM driver in the kdump kernel to clean this up, simply disable CMM in the kdump kernel. This fixes hangs we were seeing doing kdump with AMS. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc/iseries: viodasd needs to depend on CONFIG_BLOCKStephen Rothwell
Otherwise you get lot of errors like these: drivers/block/viodasd.c:72: error: dereferencing pointer to incomplete type drivers/block/viodasd.c: In function 'viodasd_open': drivers/block/viodasd.c:135: error: dereferencing pointer to incomplete type drivers/block/viodasd.c: In function 'viodasd_release': drivers/block/viodasd.c:184: error: dereferencing pointer to incomplete type drivers/block/viodasd.c: In function 'viodasd_getgeo': drivers/block/viodasd.c:209: error: dereferencing pointer to incomplete type drivers/block/viodasd.c:214: error: implicit declaration of function 'get_capacity' drivers/block/viodasd.c: At top level: drivers/block/viodasd.c:222: error: variable 'viodasd_fops' has initializer but incomplete type drivers/block/viodasd.c:223: error: unknown field 'owner' specified in initializer Discovered by a randconfig build. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc: Pass a valid token to rtas_call() in phyp-dump codeTony Breeds
ibm_configure_kernel_dump is passed as the token to rtas_call() is never initialised. This sets it to something sane. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Acked-by: Nathan Lynch <ntl@pobox.com> Acked-by: Manish Ahuja <mahujam@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc: Protect against NULL pointer deref in phyp-dump codeTony Breeds
print_dump_header() will be called at least once with a NULL pointer in a normal boot sequence. If DEBUG is defined then we will dereference the pointer and crash. Add a quick fix to exit early in the NULL pointer case. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Acked-by: Manish Ahuja <mahujam@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc: Rename struct vm_region to avoid conflict with NOMMUDavid Howells
Rename PowerPC's struct vm_region so that I can introduce my own global version for NOMMU. It's feasible that the PowerPC version may wish to use my global one instead. The NOMMU vm_region struct defines areas of the physical memory map that are under mmap. This may include chunks of RAM or regions of memory mapped devices, such as flash. It is also used to retain copies of file content so that shareable private memory mappings of files can be made. As such, it may be compatible with what is described in the banner comment for PowerPC's vm_region struct. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc: Convert sysfs cache code to of_find_next_cache_node()Nathan Lynch
Using the common code means that more complete cache information will provided in sysfs on platforms that don't use the l2-cache property convention. Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc: Convert cpu_to_l2cache() to of_find_next_cache_node()Nathan Lynch
The smp code uses cache information to populate cpu_core_map; change it to use common code for cache lookup. Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc: Add of_find_next_cache_node()Nathan Lynch
We have more than one piece of code that looks up cache nodes manually using the "l2-cache" property. Add a common helper routine which does this and handles ePAPR's "next-level-cache" property as well as powermac. Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-20Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix resume (S2R) broken by Intel microcode module, on A110L x86 gart: don't complain if no AMD GART found AMD IOMMU: panic if completion wait loop fails AMD IOMMU: set cmd buffer pointers to zero manually x86: re-enable MCE on secondary CPUS after suspend/resume AMD IOMMU: allocate rlookup_table with __GFP_ZERO
2008-12-20Merge git://git.marvell.com/orion into develRussell King
2008-12-20[ARM] mv78xx0: implement GPIO and GPIO interrupt supportLennert Buytenhek
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Nicolas Pitre <nico@marvell.com>
2008-12-20[ARM] Kirkwood: implement GPIO and GPIO interrupt supportLennert Buytenhek
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Nicolas Pitre <nico@marvell.com>
2008-12-20[ARM] Orion: share GPIO IRQ handling codeLennert Buytenhek
Split off Orion GPIO IRQ handling code into plat-orion/. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Nicolas Pitre <nico@marvell.com>
2008-12-20[ARM] Orion: share GPIO handling codeLennert Buytenhek
Split off Orion GPIO handling code into plat-orion/, and add support for multiple sets of (32) GPIO pins. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Nicolas Pitre <nico@marvell.com>
2008-12-20x86: fix resume (S2R) broken by Intel microcode module, on A110LDmitry Adamushko
Impact: fix deadlock This is in response to the following bug report: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100 Subject : resume (S2R) broken by Intel microcode module, on A110L Submitter : Andreas Mohr <andi@lisas.de> Date : 2008-11-25 08:48 (19 days old) Handled-By : Dmitry Adamushko <dmitry.adamushko@gmail.com> [ The deadlock scenario has been discovered by Andreas Mohr ] I think I might have a logical explanation why the system: (http://bugzilla.kernel.org/show_bug.cgi?id=12100) might hang upon resuming, OTOH it should have likely hanged each and every time. (1) possible deadlock in microcode_resume_cpu() if either 'if' section is taken; (2) now, I don't see it in spec. and can't experimentally verify it (newer ucodes don't seem to be available for my Core2duo)... but logically-wise, I'd think that when read upon resuming, the 'microcode revision' (MSR 0x8B) should be back to its original one (we need to reload ucode anyway so it doesn't seem logical if a cpu doesn't drop the version)... if so, the comparison with memcmp() for the full 'struct cpu_signature' is wrong... and that's how one of the aforementioned 'if' sections might have been triggered - leading to a deadlock. Obviously, in my tests I simulated loading/resuming with the ucode of the same version (just to see that the file is loaded/re-loaded upon resuming) so this issue has never popped up. I'd appreciate if someone with an appropriate system might give a try to the 2nd patch (titled "fix a comparison && deadlock..."). In any case, the deadlock situation is a must-have fix. Reported-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Tested-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-20x86, bts: memory accountingMarkus Metzger
Impact: move the BTS buffer accounting to the mlock bucket Add alloc_locked_buffer() and free_locked_buffer() functions to mm/mlock.c to kalloc a buffer and account the locked memory to current. Account the memory for the BTS buffer to the tracer. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-20x86, bts: add fork and exit handlingMarkus Metzger
Impact: introduce new ptrace facility Add arch_ptrace_untrace() function that is called when the tracer detaches (either voluntarily or when the tracing task dies); ptrace_disable() is only called on a voluntary detach. Add ptrace_fork() and arch_ptrace_fork(). They are called when a traced task is forked. Clear DS and BTS related fields on fork. Release DS resources and reclaim memory in ptrace_untrace(). This releases resources already when the tracing task dies. We used to do that when the traced task dies. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19x86: PAT: move track untrack pfnmap stubs to asm-genericvenkatesh.pallipadi@intel.com
Impact: Cleanup and branch hints only. Move the track and untrack pfn stub routines from memory.c to asm-generic. Also add unlikely to pfnmap related calls in fork and exit path. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-19x86: PAT: remove follow_pfnmap_pte in favor of follow_physvenkatesh.pallipadi@intel.com
Impact: Cleanup - removes a new function in favor of a recently modified older one. Replace follow_pfnmap_pte in pat code with follow_phys. follow_phys lso returns protection eliminating the need of pte_pgprot call. Using follow_phys also eliminates the need for pte_pa. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-19x86: ia32_signal: remove unnecessary declarationHiroshi Shimamoto
Impact: cleanup No need to declare do_signal(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19x86: common.c boot_cpu_stack and boot_exception_stacks should be staticJaswinder Singh
Impact: cleanup, avoid sparse warnings, reduce kernel size a bit Fixes these sparse warnings: arch/x86/kernel/cpu/common.c:869:6: warning: symbol 'boot_cpu_stack' was not declared. Should it be static? arch/x86/kernel/cpu/common.c:910:6: warning: symbol 'boot_exception_stacks' was not declared. Should it be static? Signed-off-by: Jaswinder Singh <jaswinder@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19sparseirq: fix numa_migrate_irq_desc dependency and commentsYinghai Lu
Impact: reduce kconfig variable scope and clean up Bartlomiej pointed out that the config dependencies and comments are not right. update it depend to NUMA, and fix some comments Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19allow stripping of generated symbols under CONFIG_KALLSYMS_ALLJan Beulich
Building upon parts of the module stripping patch, this patch introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y. Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64) kernels I tested with. The patch also does away with the need to special case the kallsyms- internal symbols by making them available even in the first linking stage. While it is a generated file, the patch includes the changes to scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure here is. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-12-19Merge branch 'for-rmk' of git://git.pengutronix.de/git/ukl/linux-2.6 into develRussell King
2008-12-19ACPI hibernate: Introduce new kernel parameter acpi_sleep=s4_nonvsRafael J. Wysocki
On some machines it may be necessary to disable the saving/restoring of the ACPI NVS memory region during hibernation/resume. For this purpose, introduce new ACPI kernel command line option acpi_sleep=s4_nonvs. Based on a patch by Zhang Rui. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-19x86 hibernate: Mark ACPI NVS memory region at startupRafael J. Wysocki
Introduce new initcall for marking the ACPI NVS memory at startup, so that it can be saved/restored during hibernation/resume. Based on a patch by Zhang Rui. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-19Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' ↵Ingo Molnar
into tracing/core Conflicts: include/linux/ftrace.h
2008-12-19x86: fix warning in arch/x86/kernel/io_apic.cIngo Molnar
this warning: arch/x86/kernel/io_apic.c: In function ‘ir_set_msi_irq_affinity’: arch/x86/kernel/io_apic.c:3373: warning: ‘cfg’ may be used uninitialized in this function triggers because the variable was truly uninitialized. We'd crash on entering this code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19x86: fix intel x86_64 llc_shared_map/cpu_llc_id anomoliesSuresh Siddha
Impact: fix wrong cache sharing detection on platforms supporting > 8 bit apicid's In the presence of extended topology eumeration leaf 0xb provided by cpuid, 32bit extended initial_apicid in cpuinfo_x86 struct will be updated by detect_extended_topology(). At this instance, we should also reinit the apicid (which could also potentially be extended to 32bit). With out this there will potentially be duplicate apicid's populated in the per cpu's cpuinfo_x86 struct, resulting in wrong cache sharing topology etc detected by init_intel_cacheinfo(). Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>
2008-12-19x86: fix warning in arch/x86/kernel/microcode_amd.cIngo Molnar
this warning: arch/x86/kernel/microcode_amd.c: In function ‘apply_microcode_amd’: arch/x86/kernel/microcode_amd.c:163: warning: cast from pointer to integer of different size arch/x86/kernel/microcode_amd.c:163: warning: cast from pointer to integer of different size triggers because we want to pass the address to the microcode MSR, which is 64-bit even on 32-bit. Cast it explicitly to express this. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18[ARM] s3c: define __io using the typesafe versionRussell King
as per 0560cf5aa51216b06874333a2fa26ca034d97bdb Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-12-18x86: ia32.h: remove unused struct sigfram32 and rt_sigframe32Hiroshi Shimamoto
Impact: cleanup Remove struct sigfram32 and rt_sigframe32 because there is no user. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18x86: asm-offset_64: use rt_sigframe_ia32Hiroshi Shimamoto
Impact: cleanup Use rt_sigframe_ia32 instead of rt_sigframe32. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18x86: sigframe.h: include headers for dependencyHiroshi Shimamoto
Impact: cleanup Include following headers for dependency. asm/sigcontext.h asm/siginfo.h asm/ucontext.h Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18Merge branch 'for-rmk' of git://git.pengutronix.de/git/imx/linux-2.6 into develRussell King
2008-12-18Merge branch 'next-merged' of git://aeryn.fluff.org.uk/bjdooks/linux into develRussell King
2008-12-18[ARM] S3C64XX: Ensure CPU_V6 is selectedBen Dooks
Select CPU_V6 with the S3C64XX series. Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2008-12-18x86: traps.c declare functions before they get usedJaswinder Singh
Impact: cleanup In asm/traps.h :- do_double_fault : added under X86_64 sync_regs : added under X86_64 math_error : moved out from X86_32 as it is common for both 32 and 64 bit math_emulate : moved from X86_32 as it is common for both 32 and 64 bit smp_thermal_interrupt : added under X86_64 mce_threshold_interrupt : added under X86_64 Signed-off-by: Jaswinder Singh <jaswinder@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18x86: PAT: add pgprot_writecombine() interface for drivers - v3venkatesh.pallipadi@intel.com
Impact: New mm functionality. Add pgprot_writecombine. pgprot_writecombine will be aliased to pgprot_noncached when not supported by the architecture. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18x86: PAT: change pgprot_noncached to uc_minus instead of strong uc - v3venkatesh.pallipadi@intel.com
Impact: mm behavior change. Make pgprot_noncached uc_minus instead of strong UC. This will make pgprot_noncached to be in line with ioremap_nocache() and all the other APIs that map page uc_minus on uc request. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18x86: PAT: implement track/untrack of pfnmap regions for x86 - v3venkatesh.pallipadi@intel.com
Impact: New mm functionality. Hookup remap_pfn_range and vm_insert_pfn and corresponding copy and free routines with reserve and free tracking. reserve and free here only takes care of non RAM region mapping. For RAM region, driver should use set_memory_[uc|wc|wb] to set the cache type and then setup the mapping for user pte. We can bypass below reserve/free in that case. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18[ARM] 5349/1: VFP: Add PM code to save and restore current VFP stateBen Dooks
When CONFIG_PM is selected, the VFP code does not have any handler installed to deal with either saving the VFP state of the current task, nor does it do anything to try and restore the VFP after a resume. On resume, the VFP will have been reset and the co-processor access control registers are in an indeterminate state (very probably the CP10 and CP11 the VFP uses will have been disabled by the ARM core reset). When this happens, resume will break as soon as it tries to unfreeze the tasks and restart scheduling. Add a sys device to allow us to hook the suspend call to save the current thread state if the thread is using VFP and a resume hook which restores the CP10/CP11 access and ensures the VFP is disabled so that the lazy swapping will take place on next access. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-12-18x86: process.c declare c1e_remove_cpu before they get usedJaswinder Singh
Impact: cleanup, avoid sparse warning Included asm/idle.h for c1e_remove_cpu() declaration. Fixes this sparse warning: CHECK arch/x86/kernel/process.c arch/x86/kernel/process.c:284:6: warning: symbol 'c1e_remove_cpu' was not declared. Should it be static? Signed-off-by: Jaswinder Singh <jaswinder@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18x86: sigframe.h: add guard macroHiroshi Shimamoto
Impact: cleanup Add missing guard macro _ASM_X86_SIGFRAME_H. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18"Tree RCU": scalable classic RCU implementationPaul E. McKenney
This patch fixes a long-standing performance bug in classic RCU that results in massive internal-to-RCU lock contention on systems with more than a few hundred CPUs. Although this patch creates a separate flavor of RCU for ease of review and patch maintenance, it is intended to replace classic RCU. This patch still handles stress better than does mainline, so I am still calling it ready for inclusion. This patch is against the -tip tree. Nevertheless, experience on an actual 1000+ CPU machine would still be most welcome. Most of the changes noted below were found while creating an rcutiny (which should permit ejecting the current rcuclassic) and while doing detailed line-by-line documentation. Updates from v9 (http://lkml.org/lkml/2008/12/2/334): o Fixes from remainder of line-by-line code walkthrough, including comment spelling, initialization, undesirable narrowing due to type conversion, removing redundant memory barriers, removing redundant local-variable initialization, and removing redundant local variables. I do not believe that any of these fixes address the CPU-hotplug issues that Andi Kleen was seeing, but please do give it a whirl in case the machine is smarter than I am. A writeup from the walkthrough may be found at the following URL, in case you are suffering from terminal insomnia or masochism: http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf o Made rcutree tracing use seq_file, as suggested some time ago by Lai Jiangshan. o Added a .csv variant of the rcudata debugfs trace file, to allow people having thousands of CPUs to drop the data into a spreadsheet. Tested with oocalc and gnumeric. Updated documentation to suit. Updates from v8 (http://lkml.org/lkml/2008/11/15/139): o Fix a theoretical race between grace-period initialization and force_quiescent_state() that could occur if more than three jiffies were required to carry out the grace-period initialization. Which it might, if you had enough CPUs. o Apply Ingo's printk-standardization patch. o Substitute local variables for repeated accesses to global variables. o Fix comment misspellings and redundant (but harmless) increments of ->n_rcu_pending (this latter after having explicitly added it). o Apply checkpatch fixes. Updates from v7 (http://lkml.org/lkml/2008/10/10/291): o Fixed a number of problems noted by Gautham Shenoy, including the cpu-stall-detection bug that he was having difficulty convincing me was real. ;-) o Changed cpu-stall detection to wait for ten seconds rather than three in order to reduce false positive, as suggested by Ingo Molnar. o Produced a design document (http://lwn.net/Articles/305782/). The act of writing this document uncovered a number of both theoretical and "here and now" bugs as noted below. o Fix dynticks_nesting accounting confusion, simplify WARN_ON() condition, fix kerneldoc comments, and add memory barriers in dynticks interface functions. o Add more data to tracing. o Remove unused "rcu_barrier" field from rcu_data structure. o Count calls to rcu_pending() from scheduling-clock interrupt to use as a surrogate timebase should jiffies stop counting. o Fix a theoretical race between force_quiescent_state() and grace-period initialization. Yes, initialization does have to go on for some jiffies for this race to occur, but given enough CPUs... Updates from v6 (http://lkml.org/lkml/2008/9/23/448): o Fix a number of checkpatch.pl complaints. o Apply review comments from Ingo Molnar and Lai Jiangshan on the stall-detection code. o Fix several bugs in !CONFIG_SMP builds. o Fix a misspelled config-parameter name so that RCU now announces at boot time if stall detection is configured. o Run tests on numerous combinations of configurations parameters, which after the fixes above, now build and run correctly. Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line): o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a changeset some time ago, and finally got around to retesting this option). o Fix some tracing bugs in rcupreempt that caused incorrect totals to be printed. o I now test with a more brutal random-selection online/offline script (attached). Probably more brutal than it needs to be on the people reading it as well, but so it goes. o A number of optimizations and usability improvements: o Make rcu_pending() ignore the grace-period timeout when there is no grace period in progress. o Make force_quiescent_state() avoid going for a global lock in the case where there is no grace period in progress. o Rearrange struct fields to improve struct layout. o Make call_rcu() initiate a grace period if RCU was idle, rather than waiting for the next scheduling clock interrupt. o Invoke rcu_irq_enter() and rcu_irq_exit() only when idle, as suggested by Andi Kleen. I still don't completely trust this change, and might back it out. o Make CONFIG_RCU_TRACE be the single config variable manipulated for all forms of RCU, instead of the prior confusion. o Document tracing files and formats for both rcupreempt and rcutree. Updates from v4 for those missing v5 given its bad subject line: o Separated dynticks interface so that NMIs and irqs call separate functions, greatly simplifying it. In particular, this code no longer requires a proof of correctness. ;-) o Separated dynticks state out into its own per-CPU structure, avoiding the duplicated accounting. o The case where a dynticks-idle CPU runs an irq handler that invokes call_rcu() is now correctly handled, forcing that CPU out of dynticks-idle mode. o Review comments have been applied (thank you all!!!). For but one example, fixed the dynticks-ordering issue that Manfred pointed out, saving me much debugging. ;-) o Adjusted rcuclassic and rcupreempt to handle dynticks changes. Attached is an updated patch to Classic RCU that applies a hierarchy, greatly reducing the contention on the top-level lock for large machines. This passes 10-hour concurrent rcutorture and online-offline testing on 128-CPU ppc64 without dynticks enabled, and exposes some timekeeping bugs in presence of dynticks (exciting working on a system where "sleep 1" hangs until interrupted...), which were fixed in the 2.6.27 kernel. It is getting more reliable than mainline by some measures, so the next version will be against -tip for inclusion. See also Manfred Spraul's recent patches (or his earlier work from 2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2). We will converge onto a common patch in the fullness of time, but are currently exploring different regions of the design space. That said, I have already gratefully stolen quite a few of Manfred's ideas. This patch provides CONFIG_RCU_FANOUT, which controls the bushiness of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on 64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT, there is no hierarchy. By default, the RCU initialization code will adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable this balancing, allowing the hierarchy to be exactly aligned to the underlying hardware. Up to two levels of hierarchy are permitted (in addition to the root node), allowing up to 16,384 CPUs on 32-bit systems and up to 262,144 CPUs on 64-bit systems. I just know that I am going to regret saying this, but this seems more than sufficient for the foreseeable future. (Some architectures might wish to set CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs. If this becomes a real problem, additional levels can be added, but I doubt that it will make a significant difference on real hardware.) In the common case, a given CPU will manipulate its private rcu_data structure and the rcu_node structure that it shares with its immediate neighbors. This can reduce both lock and memory contention by multiple orders of magnitude, which should eliminate the need for the strange manipulations that are reported to be required when running Linux on very large systems. Some shortcomings: o More bugs will probably surface as a result of an ongoing line-by-line code inspection. Patches will be provided as required. o There are probably hangs, rcutorture failures, &c. Seems quite stable on a 128-CPU machine, but that is kind of small compared to 4096 CPUs. However, seems to do better than mainline. Patches will be provided as required. o The memory footprint of this version is several KB larger than rcuclassic. A separate UP-only rcutiny patch will be provided, which will reduce the memory footprint significantly, even compared to the old rcuclassic. One such patch passes light testing, and has a memory footprint smaller even than rcuclassic. Initial reaction from various embedded guys was "it is not worth it", so am putting it aside. Credits: o Manfred Spraul for ideas, review comments, and bugs spotted, as well as some good friendly competition. ;-) o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton for reviews and comments. o Thomas Gleixner for much-needed help with some timer issues (see patches below). o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos, Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines alive despite my heavy abuse^Wtesting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6: avr32: favr-32 build fix ATSTK1006: Fix boot from NAND flash avr32: remove .note.gnu.build-id section when making vmlinux.bin avr32: Enable pullup on USART TX lines
2008-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc: We need to implement arch_ptrace_stop().
2008-12-18x86: revert CONFIG_RELOCATABLE=y defconfig changeVegard Nossum
This commit: commit 5cb04df8d3f03e37a19f2502591a84156be71772 Author: Ingo Molnar <mingo@elte.hu> Date: Sun May 4 19:49:04 2008 +0200 x86: defconfig updates changed CONFIG_RELOCATABLE from n to y, which may lead to a mismatch between the vmlinux debug information and the runtime location of the kernel, even when the bootloader does not relocate the kernel. Revert the specific change. Works for me with GRUB and qemu. Reference: http://lkml.org/lkml/2008/11/25/243 Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18[ARM] S3C: Remove unnecessary <linux/delay.h> includesBen Dooks
As per Russell King's last review comment, find and remove all unnecessary includes of <linux/delay.h> in the files that do not need them. Signed-off-by: Ben Dooks <ben-linux@fluff.org>