summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2006-09-26[PATCH] update to the kernel kmap/kunmap APIJames Bottomley
Give non-highmem architectures access to the kmap API for the purposes of overriding (this is what the attached patch does). The proposal is that we should now require all architectures with coherence issues to manage data coherence via the kmap/kunmap API. Thus driver writers never have to write code like kmap(page) modify data in page flush_kernel_dcache_page(page) kunmap(page) instead, kmap/kunmap will manage the coherence and driver (and filesystem) writers don't need to worry about how to flush between kmap and kunmap. For most architectures, the page only needs to be flushed if it was actually written to *and* there are user mappings of it, so the best implementation looks to be: clear the page dirty pte bit in the kernel page tables on kmap and on kunmap, check page->mappings for user maps, and then the dirty bit, and only flush if it both has user mappings and is dirty. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] jbd: fix commit of ordered data buffersJan Kara
Original commit code assumes, that when a buffer on BJ_SyncData list is locked, it is being written to disk. But this is not true and hence it can lead to a potential data loss on crash. Also the code didn't count with the fact that journal_dirty_data() can steal buffers from committing transaction and hence could write buffers that no longer belong to the committing transaction. Finally it could possibly happen that we tried writing out one buffer several times. The patch below tries to solve these problems by a complete rewrite of the data commit code. We go through buffers on t_sync_datalist, lock buffers needing write out and store them in an array. Buffers are also immediately refiled to BJ_Locked list or unfiled (if the write out is completed). When the array is full or we have to block on buffer lock, we submit all accumulated buffers for IO. [suitable for 2.6.18.x around the 2.6.19-rc2 timeframe] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] trigger a syntax error if percpu macros are incorrectly usedJan Blunck
get_cpu_var()/per_cpu()/__get_cpu_var() arguments must be simple identifiers. Otherwise the arch dependent implementations might break. This patch enforces the correct usage of the macros by producing a syntax error if the variable is not a simple identifier. Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] Fix longstanding load balancing bug in the schedulerChristoph Lameter
The scheduler will stop load balancing if the most busy processor contains processes pinned via processor affinity. The scheduler currently only does one search for busiest cpu. If it cannot pull any tasks away from the busiest cpu because they were pinned then the scheduler goes into a corner and sulks leaving the idle processors idle. F.e. If you have processor 0 busy running four tasks pinned via taskset, there are none on processor 1 and one just started two processes on processor 2 then the scheduler will not move one of the two processes away from processor 2. This patch fixes that issue by forcing the scheduler to come out of its corner and retrying the load balancing by considering other processors for load balancing. This patch was originally developed by John Hawkes and discussed at http://marc.theaimsgroup.com/?l=linux-kernel&m=113901368523205&w=2. I have removed extraneous material and gone back to equipping struct rq with the cpu the queue is associated with since this makes the patch much easier and it is likely that others in the future will have the same difficulty of figuring out which processor owns which runqueue. The overhead added through these patches is a single word on the stack if the kernel is configured to support 32 cpus or less (32 bit). For 32 bit environments the maximum number of cpus that can be configued is 255 which would result in the use of 32 bytes additional on the stack. On IA64 up to 1k cpus can be configured which will result in the use of 128 additional bytes on the stack. The maximum additional cache footprint is one cacheline. Typically memory use will be much less than a cacheline and the additional cpumask will be placed on the stack in a cacheline that already contains other local variable. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: John Hawkes <hawkes@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] Don't set calgary iommu as default yAndi Kleen
Most systems don't need it. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386/x86-64: New Intel feature flagsDave Jones
Add supplemental SSE3 instructions flag, and Direct Cache Access flag. As described in "Intel Processor idenfication and the CPUID instruction AP485 Sept 2006" AK: also added for x86-64 Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: Add a cumulative thermal throttle event counter.Dmitriy Zavin
The counter is exported to /sys that keeps track of the number of thermal events, such that the user knows how bad the thermal problem might be (since the logging to syslog and mcelog is rate limited). AK: Fixed cpu hotplug locking Signed-off-by: Dmitriy Zavin <dmitriyz@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386: Make the jiffies compares use the 64bit safe macros.Dmitriy Zavin
Signed-off-by: Dmitriy Zavin <dmitriyz@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: Refactor thermal throttle processingDmitriy Zavin
Refactor the event processing (syslog messaging and rate limiting) into separate file therm_throt.c. This allows consistent reporting of CPU thermal throttle events. After ACK'ing the interrupt, if the event is current, the user (p4.c/mce_intel.c) calls therm_throt_process to log (and rate limit) the event. If that function returns 1, the user has the option to log things further (such as to mce_log in x86_64). AK: minor cleanup Signed-off-by: Dmitriy Zavin <dmitriyz@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)Dmitriy Zavin
The current time_before/time_after macros will fail typechecks when passed u64 values (as returned by get_jiffies_64()). On 64bit systems, this will just result in a warning about mismatching types without explicit casts, but since unsigned long and u64 (unsigned long long) are of same size, it will still work. On 32bit systems, a long is 32bits, so the value from get_jiffies_64() will be truncated by the cast and thus lose all the precision gained by 64bit jiffies. Signed-off-by: Dmitriy Zavin <dmitriyz@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Fix unwinder warning in traps.cAndi Kleen
Fix linux/arch/x86_64/kernel/traps.c: In function 'dump_trace': linux/arch/x86_64/kernel/traps.c:275: warning: cast to pointer from integer of different size with allnoconfig Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing ↵Andi Kleen
conf1 Some buggy systems can machine check when config space accesses happen for some non existent devices. i386/x86-64 do some early device scans that might trigger this. Allow pci=noearly to disable this. Also when type 1 is disabling also don't do any early accesses which are always type1. This moves the pci= configuration parsing to be a early parameter. I don't think this can break anything because it only changes a single global that is only used by PCI. Cc: gregkh@suse.de Cc: Trammell Hudson <hudson@osresearch.net> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: Move direct PCI scanning functions out of lineAndi Kleen
Saves about 200 bytes of code space. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCIAndi Kleen
This is useful on systems with broken PCI bus. Affects various scans in x86-64 and i386's early ACPI quirk scan. Cc: gregkh@suse.de Cc: len.brown@intel.com Cc: Trammell Hudson <hudson@osresearch.net> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Don't leak NT bit into next taskAndi Kleen
SYSENTER can cause a NT to be set which might cause crashes on the IRET in the next task. Following similar i386 patch from Linus. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinderJan Beulich
Current gcc generates calls not jumps to noreturn functions. When that happens the return address can point to the next function, which confuses the unwinder. This patch works around it by marking asynchronous exception frames in contrast normal call frames in the unwind information. Then teach the unwinder to decode this. For normal call frames the unwinder now subtracts one from the address which avoids this problem. The standard libgcc unwinder uses the same trick. It doesn't include adjustment of the printed address (i.e. for the original example, it'd still be kernel_math_error+0 that gets displayed, but the unwinder wouldn't get confused anymore. This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16 unfortunately because earlier binutils don't support .cfi_signal_frame [AK: added automatic detection of the new binutils and wrote description] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Fix some broken white space in ia32_signal.cAndi Kleen
No functional changes Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Initialize argument registers for 32bit signal handlers.Andi Kleen
In case the user space was compiled with -mregparm=3 Following i386. Pointed out by Albert Cahalan Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Remove all traces of signal number conversionAndi Kleen
This was old code that was needed for iBCS and x86-64 never supported that. Pointed out by Albert Cahalan Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Don't synchronize time reading on single core AMD systemsAndi Kleen
We do some additional CPU synchronization in gettimeofday et.al. to make sure the time stamps are always monotonic over multiple CPUs. But on single core systems that is not needed. So don't do it. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Remove outdated comment in x86-64 mmconfig codeAndi Kleen
Cc: gregkh@suse.de Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Use string instructions for Core2 copy/clearAndi Kleen
It is faster than using a unrolled loop for the use cases the kernel cares about (cached, sizes typically < 4K) Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] x86: - restore i8259A eoi status on resumeMatthew Garrett
Got it. i8259A_resume calls init_8259A(0) unconditionally, even if auto_eoi has been set. Keep track of the current status and restore that on resume. This fixes it for AMD64 and i386. Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386: Split multi-line printk in oops output.Dave Jones
Sometimes, bug reports come in where we've had an oops, and the only record we have is what the reporter saw on screen shortly before the system locked up completely. Unfortunatly, syslog only prints lines beginning with KERN_EMERG to the console, so some lines get lost. An example of this can be seen at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=203723 Some of this information isn't vital to diagnosis, but some parts are useful, such as the tainted flag. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386: Fix pack_descriptor()Jeremy Fitzhardinge
Fix pack_descriptor: 1. flags are bits 20-23 in the high word 2. limit's 4 msb are bits 16-19 in the high word These haven't mattered so far, because all users have had small limits and a flags setting of 0. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> ===================================================================
2006-09-26[PATCH] i386: Add MMCFG resources to i386 tooAndi Kleen
Following earlier x86-64 patch Cc: gregkh@suse.de Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] MMCONFIG and new Intel motherboardsAaron Durbin
On Sat, Sep 09, 2006 at 04:14:29PM +0200, Andi Kleen wrote: > [patch] Looks reasonable, but probably not for 2.6.18 because this stuff > is already too fragile and it is probably too risky to do any big changes now > since not enough testing time is left. Can you please resubmit > it with proper description and signed-off-by line? I can queue it for .19 then > > -Andi Patch inserts PCI memory mapped config region(s) into the resource map. This will allow for the MMCCONFIG regions to be marked as busy in the iomem address space as well as the regions(s) showing up in /proc/iomem. Signed-off-by: Aaron Durbin <adurbin@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Insert GART region into resource mapAaron Durbin
Patch inserts the GART region into the iomem resource map. The GART will then be visible within /proc/iomem. It will also allow for other users utilizing the GART to subreserve the region (agp or IOMMU). Signed-off-by: Aaron Durbin <adurbin@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386/x86-64: Only do MCFG e820 check when type 1 worksAndi Kleen
Needs earlier patch to split type 1 probing from use. This patch should fix the x86 macs where type 1 PCI config space access doesn't work, but MCFG does. They also don't have a usable e820 table so the e820 sanity check failed. Instead assume now that if type 1 doesn't work then MCFG must work and don't do the e820 check. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386/x86-64: PCI: split probing and initialization of type 1 config ↵Andi Kleen
space access First probe if type1/2 accesses work, but then only initialize them at the end. This is useful for a later patch that needs this information inbetween. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Fix idle notifiersAndi Kleen
Previously exit_idle would be called more often than enter_idle Now instead of using complicated tests just keep track of it using the per CPU variable as a flip flop. I moved the idle state into the PDA to make the access more efficient. Original bug report and an initial patch from Stephane Eranian, but redone by AK. Cc: Stephane Eranian <eranian@hpl.hp.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Remove experimental mark of kexecEric W. Biederman
kexec has been marked experimental for a year now and all of the serious problems have been worked through. So it is time (if not past time) to remove the experimental mark. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386: Remove experimental mark of kexecEric W. Biederman
kexec has been marked experimental for a year now and all of the serious problems have been worked through. So it is time (if not past time) to remove the experimental mark. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Mark per cpu data initialization __initdata againAndi Kleen
Before 2.6.16 this was changed to work around code that accessed CPUs not in the possible map. But that code should be all fixed now, so mark it __initdata again. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Remove unused asm-x86_64/mmx.hAndi Kleen
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Define __bad_pda_field as noreturnAndi Kleen
This quietens so warnings about uninitialized use of the return value of the pda read operations. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Reindent macros in pda.hAndi Kleen
Reindent the macros in x86-64 pda.h, making them much more readable. Follows Jeremy's i386 version of this. No functional changes Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Fix some stylistic issues in uaccess.hAndi Kleen
- Replace some broken white space. - Replace __ keywords with standard names No functional changes. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Check return values of __copy_to_user in uname emulationAndi Kleen
Quietens some new warnings Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Check return value of copy_to_user in compat_sys_pselect7Andi Kleen
Fix linux/fs/compat.c: In function compat_sys_pselect7 linux/fs/compat.c:1869: warning: ignoring return value of copy_to_user, declared with attribute warn_unused_result To make it easier to handle I changed to semantics to not try to write out a timespec if an error occurred. I hope that's ok. Cc: dwmw2@infradead.org Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Add __must_check to copy_*_userAndi Kleen
Following i386. And also fix the two occurrences that caused warnings in arch/x86_64/* Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Fix zeroing on exception in copy_*_userAndi Kleen
- Don't zero for __copy_from_user_inatomic following i386. This will prevent spurious zeros for parallel file system writers when one does a exception - The string instruction version didn't zero the output on exception. Oops. Also I cleaned up the code a bit while I was at it and added a minor optimization to the string instruction path. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386: add HPET(s) into resource mapadurbin@google.com
Add HPET(s) into resource map. This will allow for the HPET(s) to be visibile within /proc/iomem. Signed-off-by: Aaron Durbin <adurbin@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] insert IOAPIC(s) and Local APIC into resource mapadurbin@google.com
This patch places the IOAPIC(s) and the Local APIC specified by ACPI tables into the resource map. The APICs will then be visible within /proc/iomem Signed-off-by: Aaron Durbin <adurbin@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386: Do better early exception handlersChuck Ebbert
Add early i386 fault handlers with debug information for common faults. Handles: divide error invalid opcode protection fault page fault Also adds code to detect early recursive/multiple faults and halt the system when they happen (taken from x86_64.) Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@muc.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26[PATCH] i386: Allow a kernel not to be in ring 0Rusty Russell
We allow for the fact that the guest kernel may not run in ring 0. This requires some abstraction in a few places when setting %cs or checking privilege level (user vs kernel). This is Chris' [RFC PATCH 15/33] move segment checks to subarch, except rather than using #define USER_MODE_MASK which depends on a config option, we use Zach's more flexible approach of assuming ring 3 == userspace. I also used "get_kernel_rpl()" over "get_kernel_cs()" because I think it reads better in the code... 1) Remove the hardcoded 3 and introduce #define SEGMENT_RPL_MASK 3 2) Add a get_kernel_rpl() macro, and don't assume it's zero. And: Clean up of patch for letting kernel run other than ring 0: a. Add some comments about the SEGMENT_IS_*_CODE() macros. b. Add a USER_RPL macro. (Code was comparing a value to a mask in some places and to the magic number 3 in other places.) c. Add macros for table indicator field and use them. d. Change the entry.S tests for LDT stack segment to use the macros Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] i386: Abstract sensitive instructionsRusty Russell
Abstract sensitive instructions in assembler code, replacing them with macros (which currently are #defined to the native versions). We use long names: assembler is case-insensitive, so if something goes wrong and macros do not expand, it would assemble anyway. Resulting object files are exactly the same as before. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Fix a PDA warning uncovered by the new type checkingAndi Kleen
Fix linux/arch/x86_64/kernel/process.c: In function __switch_to: linux/arch/x86_64/kernel/process.c:626: warning: assignment makes integer from pointer without a cast Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Type checking for write_pda()Jeremy Fitzhardinge
I just added type checking for assignments the PDA in the i386 PDA code. Here's the x86-64 equivalent. (Obviously this doesn't contain the latest x86-64 PDA change.) Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26[PATCH] Use %c instead of %P modifier in pda accessAndi Kleen
Apparently that is the more official way to get numbers without $ in inline assembly Signed-off-by: Andi Kleen <ak@suse.de>