summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2010-08-26x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changesHaicheng Li
When memory hotplug-adding happens for a large enough area that a new PGD entry is needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some CPUs oopsing like below when they have to access the unmapped areas. [ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534] [ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243229] irq event stamp: 8538759 [ 1139.243230] hardirqs last enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30 [ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146 [ 1139.243240] softirqs last enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146 [ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34 [ 1139.243249] CPU 0: [ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243284] Pid: 6534, comm: bash Tainted: G M 2.6.32-haicheng-cpuhp #7 QSSC-S4R [ 1139.243287] RIP: 0010:[<ffffffff810ace35>] [<ffffffff810ace35>] alloc_arraycache+0x35/0x69 [ 1139.243292] RSP: 0018:ffff8802799f9d78 EFLAGS: 00010286 [ 1139.243295] RAX: ffff8884ffc00000 RBX: ffff8802799f9d98 RCX: 0000000000000000 [ 1139.243297] RDX: 0000000000190018 RSI: 0000000000000001 RDI: ffff8884ffc00010 [ 1139.243300] RBP: ffffffff8100c34e R08: 0000000000000002 R09: 0000000000000000 [ 1139.243303] R10: ffffffff8246dda0 R11: 000000d08246dda0 R12: ffff8802599bfff0 [ 1139.243305] R13: ffff88027904c040 R14: ffff8802799f8000 R15: 0000000000000001 [ 1139.243308] FS: 00007fe81bfe86e0(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000 [ 1139.243311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1139.243313] CR2: ffff8884ffc00000 CR3: 000000026cf2d000 CR4: 00000000000006f0 [ 1139.243316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1139.243318] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1139.243321] Call Trace: [ 1139.243324] [<ffffffff810ace29>] ? alloc_arraycache+0x29/0x69 [ 1139.243328] [<ffffffff8135004e>] ? cpuup_callback+0x1b0/0x32a [ 1139.243333] [<ffffffff8105385d>] ? notifier_call_chain+0x33/0x5b [ 1139.243337] [<ffffffff810538a4>] ? __raw_notifier_call_chain+0x9/0xb [ 1139.243340] [<ffffffff8134ecfc>] ? cpu_up+0xb3/0x152 [ 1139.243344] [<ffffffff813388ce>] ? store_online+0x4d/0x75 [ 1139.243348] [<ffffffff811e53f3>] ? sysdev_store+0x1b/0x1d [ 1139.243351] [<ffffffff8110589f>] ? sysfs_write_file+0xe5/0x121 [ 1139.243355] [<ffffffff810b539d>] ? vfs_write+0xae/0x14a [ 1139.243358] [<ffffffff810b587f>] ? sys_write+0x47/0x6f [ 1139.243362] [<ffffffff8100b9ab>] ? system_call_fastpath+0x16/0x1b This patch makes sure to always replicate new direct mapping PGD entries to the PGDs of all processes, as well as ensures corresponding vmemmap mapping gets synced. V1: initial code by Andi Kleen. V2: fix several issues found in testing. V3: as suggested by Wu Fengguang, reuse common code of vmalloc_sync_all(). [ hpa: changed pgd_change from int to bool ] Originally-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4FD8.6080100@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, mm: Separate x86_64 vmalloc_sync_all() into separate functionsHaicheng Li
No behavior change. Move some of vmalloc_sync_all() code into a new function sync_global_pgds() that will be useful for memory hotplug. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4ECD.1090607@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25x86, bios: Make the x86 early memory reservation a kernel optionH. Peter Anvin
Add a kernel command-line option so the x86 early memory reservation size can be adjusted at runtime instead of only at compile time. Suggested-by: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <tip-d0cd7425fab774a480cce17c2f649984312d0b55@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25x86, tsc: Remove CPU frequency calibration on AMDBorislav Petkov
6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency calibration code for AMD CPUs whose TSCs didn't increment with the core's P0 frequency. From F10h, revB onward, however, the TSC increment rate is denoted by MSRC001_0015[24] and when this bit is set (which should be done by the BIOS) the TSC increments with the P0 frequency so the calibration is not needed and booting can be a couple of mcecs faster on those machines. Besides, there should be virtually no machines out there which don't have this bit set, therefore this calibration can be safely removed. It is a shaky hack anyway since it assumes implicitly that the core is in P0 when BIOS hands off to the OS, which might not always be the case. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100825162823.GE26438@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag tracing/trace_stack: Fix stack trace on ppc64
2010-08-25Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states sched: Fix rq->clock synchronization when migrating tasks
2010-08-25perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flagLin Ming
If on Pentium4 CPUs the FORCE_OVF flag is set then an NMI happens on every event, which can generate a flood of NMIs. Clear it. Reported-by: Vince Weaver <vweaver1@eecs.utk.edu> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25Merge branch 'linus' into perf/coreIngo Molnar
Merge reason: pick up perf fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25perf: Remove unused variableLin Ming
This fixes the following build warning introduced by the callchain rework: arch/x86/kernel/cpu/perf_event.c:1574: warning: ‘perf_callchain_entry_nmi’ defined but not used Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1282718949.16443.75.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-24x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampolineHugh Dickins
rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT laptop: whereas cloning from kernel to low mappings pgd range does need to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY. Signed-off-by: Hugh Dickins <hughd@google.com> LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-24x86, bios: By default, reserve the low 64K for all BIOSesH. Peter Anvin
The laundry list of BIOSes that need the low 64K reserved is getting very long, so make it the default across all BIOSes. This also allows the code to be simplified and unified with the reservation code for the first 4K. This resolves kernel bugzilla 16661 and who knows what else... Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-*@git.kernel.org>
2010-08-23Merge branch 'for-upstream/pvhvm' of ↵Linus Torvalds
git://xenbits.xensource.com/people/ianc/linux-2.6 * 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6: xen: pvhvm: make it clearer that XEN_UNPLUG_* define bits in a bitfield xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary xen: pvhvm: allow user to request no emulated device unplug
2010-08-23x86, paravirt: Remove alloc_pmd_clone hook, only used by VMIAlok Kataria
VMI was the only user of the alloc_pmd_clone hook, given that VMI is now removed we can also remove this hook. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23x86, vmware: Remove deprecated VMI kernel supportAlok Kataria
With the recent innovations in CPU hardware acceleration technologies from Intel and AMD, VMware ran a few experiments to compare these techniques to guest paravirtualization technique on VMware's platform. These hardware assisted virtualization techniques have outperformed the performance benefits provided by VMI in most of the workloads. VMware expects that these hardware features will be ubiquitous in a couple of years, as a result, VMware has started a phased retirement of this feature from the hypervisor. Please note that VMI has always been an optimization and non-VMI kernels still work fine on VMware's platform. Latest versions of VMware's product which support VMI are, Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence releases for these products will continue supporting VMI. For more details about VMI retirement take a look at this, http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html This feature removal was scheduled for 2.6.37 back in September 2009. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1282600151.19396.22.camel@ank32.eng.vmware.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23x86, mem: Optimize memcpy by avoiding memory false dependeceMa Ling
All read operations after allocation stage can run speculatively, all write operation will run in program order, and if addresses are different read may run before older write operation, otherwise wait until write commit. However CPU don't check each address bit, so read could fail to recognize different address even they are in different page.For example if rsi is 0xf004, rdi is 0xe008, in following operation there will generate big performance latency. 1. movq (%rsi), %rax 2. movq %rax, (%rdi) 3. movq 8(%rsi), %rax 4. movq %rax, 8(%rdi) If %rsi and rdi were in really the same meory page, there are TRUE read-after-write dependence because instruction 2 write 0x008 and instruction 3 read 0x00c, the two address are overlap partially. Actually there are in different page and no any issues, but without checking each address bit CPU could think they are in the same page, and instruction 3 have to wait for instruction 2 to write data into cache from write buffer, then load data from cache, the cost time read spent is equal to mfence instruction. We may avoid it by tuning operation sequence as follow. 1. movq 8(%rsi), %rax 2. movq %rax, 8(%rdi) 3. movq (%rsi), %rax 4. movq %rax, (%rdi) Instruction 3 read 0x004, instruction 2 write address 0x010, no any dependence. At last on Core2 we gain 1.83x speedup compared with original instruction sequence. In this patch we first handle small size(less 20bytes), then jump to different copy mode. Based on our micro-benchmark small bytes from 1 to 127 bytes, we got up to 2X improvement, and up to 1.5X improvement for 1024 bytes on Corei7. (We use our micro-benchmark, and will do further test according to your requirment) Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-23x86, mem: Don't implement forward memmove() as memcpy()Ma, Ling
memmove() allow source and destination address to be overlap, but there is no such limitation for memcpy(). Therefore, explicitly implement memmove() in both the forwards and backward directions, to give us the ability to optimize memcpy(). Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23x86, mm: Avoid unnecessary TLB flushShaohua Li
In x86, access and dirty bits are set automatically by CPU when CPU accesses memory. When we go into the code path of below flush_tlb_fix_spurious_fault(), we already set dirty bit for pte and don't need flush tlb. This might mean tlb entry in some CPUs hasn't dirty bit set, but this doesn't matter. When the CPUs do page write, they will automatically check the bit and no software involved. On the other hand, flush tlb in below position is harmful. Test creates CPU number of threads, each thread writes to a same but random address in same vma range and we measure the total time. Under a 4 socket system, original time is 1.96s, while with the patch, the time is 0.8s. Under a 2 socket system, there is 20% time cut too. perf shows a lot of time are taking to send ipi/handle ipi for tlb flush. Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <20100816011655.GA362@sli10-desk.sh.intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andrea Archangeli <aarcange@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-23xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessaryIan Campbell
It is not immediately clear what this option causes to become ignored. The actual meaning is that it is not necessary to unplug the emulated devices to safely use the PV ones, even if the platform does not support the unplug protocol. (pressumably the user will only add this option if they have ensured that their domain configuration is safe). I think xen_emul_unplug=unnecessary better captures this. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
2010-08-23xen: pvhvm: allow user to request no emulated device unplugIan Campbell
this allows the user to disable pvhvm and revert to emulated devices in case of a system misconfiguration (e.g. initramfs with only emulated drivers in it). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
2010-08-22Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PIT: free irq source id in handling error path KVM: destroy workqueue on kvm_create_pit() failures KVM: fix poison overwritten caused by using wrong xstate size
2010-08-21Replace Configure with Enable in description of MAXSMPSamuel Thibault
The "Configure" word tends to make user believe they have to say 'yes' to be able to choose the number of procs/nodes. "Enable" should be unambiguous enough. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_devSergey Senozhatsky
Fix BUG: using smp_processor_id() in preemptible thermal_throttle_add_dev. We know the cpu number when calling thermal_throttle_add_dev, so we can remove smp_processor_id call in thermal_throttle_add_dev by supplying the cpu number as argument. This should resolve kernel bugzilla 16615/16629. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> LKML-Reference: <20100820073634.GB5209@swordfish.minsk.epam.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-20Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, apic: Fix apic=debug boot crash x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues x86-32: Fix dummy trampoline-related inline stubs x86-32: Separate 1:1 pagetables from swapper_pg_dir x86, cpu: Fix regression in AMD errata checking code
2010-08-20perf: Remove superfluous return values from perf_callchain_*()Peter Zijlstra
Fixes these build warnings introduced by the callchain rework: arch/x86/kernel/cpu/perf_event.c: In function ‘perf_callchain_kernel’: arch/x86/kernel/cpu/perf_event.c:1646: warning: ‘return’ with a value, in function returning void arch/x86/kernel/cpu/perf_event.c: In function ‘perf_callchain_user’: arch/x86/kernel/cpu/perf_event.c:1699: warning: ‘return’ with a value, in function returning void arch/x86/kernel/cpu/perf_event.c: At top level: arch/x86/kernel/cpu/perf_event.c:1607: warning: ‘perf_callchain_entry_nmi’ defined but not used Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-20x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep statesSuresh Siddha
TSC's get reset after suspend/resume (even on cpu's with invariant TSC which runs at a constant rate across ACPI P-, C- and T-states). And in some systems BIOS seem to reinit TSC to arbitrary large value (still sync'd across cpu's) during resume. This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less than rq->age_stamp (introduced in 2.6.32). This leads to a big value returned by scale_rt_power() and the resulting big group power set by the update_group_power() is causing improper load balancing between busy and idle cpu's after suspend/resume. This resulted in multi-threaded workloads (like kernel-compilation) go slower after suspend/resume cycle on core i5 laptops. Fix this by recomputing cyc2ns_offset's during resume, so that sched_clock() continues from the point where it was left off during suspend. Reported-by: Florian Pritz <flo@xssn.at> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: <stable@kernel.org> # [v2.6.32+] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-20x86, apic: Fix apic=debug boot crashDaniel Kiper
Fix a boot crash when apic=debug is used and the APIC is not properly initialized. This issue appears during Xen Dom0 kernel boot but the fix is generic and the crash could occur on real hardware as well. Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: jeremy@goop.org Cc: <stable@kernel.org> # .35.x, .34.x, .33.x, .32.x LKML-Reference: <20100819224616.GB9967@router-fw-old.local.net-space.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-19x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issuesBorislav Petkov
When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d: Stuck ??" message due to multiple cores concurrently accessing the cpu_callin_mask, among others. Since these codepaths are not protected from concurrent access due to the fact that there's no sane reason for making an already complex code unnecessarily more complex - we hit the issue only when insanely switching cores off- and online - serialize hotplugging cores on the sysfs level and be done with it. [ v2.1: fix !HOTPLUG_CPU build ] Cc: <stable@kernel.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100819181029.GC17171@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-19Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kprobes/x86: Fix the return address of multiple kretprobes perf tools: Fix build error on read only source. perf, x86: Fix Intel-nhm PMU programming errata workaround
2010-08-19kprobes/x86: Fix the return address of multiple kretprobesKUMANO Syuhei
Fix the return address of subsequent kretprobes when multiple kretprobes are set on the same function. For example: # cd /sys/kernel/debug/tracing # echo "r:event1 sys_symlink" > kprobe_events # echo "r:event2 sys_symlink" >> kprobe_events # echo 1 > events/kprobes/enable # ln -s /tmp/foo /tmp/bar (without this patch) # cat trace ln-897 [000] 20404.133727: event1: (kretprobe_trampoline+0x0/0x4c <- sys_symlink) ln-897 [000] 20404.133747: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink) (with this patch) # cat trace ln-740 [000] 13799.491076: event1: (system_call_fastpath+0x16/0x1b <- sys_symlink) ln-740 [000] 13799.491096: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink) Signed-off-by: KUMANO Syuhei <kumano.prog@gmail.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> LKML-Reference: <1281853084.3254.11.camel@camp10-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-19Merge branch 'tip/perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2010-08-19perf: Fix race in callchainsFrederic Weisbecker
Now that software events don't have interrupt disabled anymore in the event path, callchains can nest on any context. So seperating nmi and others contexts in two buffers has become racy. Fix this by providing one buffer per nesting level. Given the size of the callchain entries (2040 bytes * 4), we now need to allocate them dynamically. v2: Fixed put_callchain_entry call after recursion. Fix the type of the recursion, it must be an array. v3: Use a manual pr cpu allocation (temporary solution until NMIs can safely access vmalloc'ed memory). Do a better separation between callchain reference tracking and allocation. Make the "put" path lockless for non-release cases. v4: Protect the callchain buffers with rcu. v5: Do the cpu buffers allocations node affine. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David Miller <davem@davemloft.net> Cc: Borislav Petkov <bp@amd64.org>
2010-08-19perf: Factorize callchain context handlingFrederic Weisbecker
Store the kernel and user contexts from the generic layer instead of archs, this gathers some repetitive code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@samba.org> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>
2010-08-19perf: Generalize some arch callchain codeFrederic Weisbecker
- Most archs use one callchain buffer per cpu, except x86 that needs to deal with NMIs. Provide a default perf_callchain_buffer() implementation that x86 overrides. - Centralize all the kernel/user regs handling and invoke new arch handlers from there: perf_callchain_user() / perf_callchain_kernel() That avoid all the user_mode(), current->mm checks and so... - Invert some parameters in perf_callchain_*() helpers: entry to the left, regs to the right, following the traditional (dst, src). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@samba.org> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>
2010-08-19perf: Generalize callchain_store()Frederic Weisbecker
callchain_store() is the same on every archs, inline it in perf_event.h and rename it to perf_callchain_store() to avoid any collision. This removes repetitive code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@samba.org> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>
2010-08-19perf: Drop unappropriate tests on arch callchainsFrederic Weisbecker
Drop the TASK_RUNNING test on user tasks for callchains as this check doesn't seem to make any sense. Also remove the tests for !current that is not supposed to happen and current->pid as this should be handled at the generic level, with exclude_idle attribute. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>
2010-08-18x86-32: Fix dummy trampoline-related inline stubsH. Peter Anvin
Fix dummy inline stubs for trampoline-related functions when no trampolines exist (until we get rid of the no-trampoline case entirely.) Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <4C6C294D.3030404@zytor.com>
2010-08-18x86-32: Separate 1:1 pagetables from swapper_pg_dirJoerg Roedel
This patch fixes machine crashes which occur when heavily exercising the CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by AMD Erratum 383 and result in a fatal machine check exception. Here's the scenario: 1. On 32-bit, the swapper_pg_dir page table is used as the initial page table for booting a secondary CPU. 2. To make this work, swapper_pg_dir needs a direct mapping of physical memory in it (the low mappings). By adding those low, large page (2M) mappings (PAE kernel), we create the necessary conditions for Erratum 383 to occur. 3. Other CPUs which do not participate in the off- and onlining game may use swapper_pg_dir while the low mappings are present (when leave_mm is called). For all steps below, the CPU referred to is a CPU that is using swapper_pg_dir, and not the CPU which is being onlined. 4. The presence of the low mappings in swapper_pg_dir can result in TLB entries for addresses below __PAGE_OFFSET to be established speculatively. These TLB entries are marked global and large. 5. When the CPU with such TLB entry switches to another page table, this TLB entry remains because it is global. 6. The process then generates an access to an address covered by the above TLB entry but there is a permission mismatch - the TLB entry covers a large global page not accessible to userspace. 7. Due to this permission mismatch a new 4kb, user TLB entry gets established. Further, Erratum 383 provides for a small window of time where both TLB entries are present. This results in an uncorrectable machine check exception signalling a TLB multimatch which panics the machine. There are two ways to fix this issue: 1. Always do a global TLB flush when a new cr3 is loaded and the old page table was swapper_pg_dir. I consider this a hack hard to understand and with performance implications 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit does. This patch implements solution 2. It introduces a trampoline_pg_dir which has the same layout as swapper_pg_dir with low_mappings. This page table is used as the initial page table of the booting CPU. Later in the bringup process, it switches to swapper_pg_dir and does a global TLB flush. This fixes the crashes in our test cases. -v2: switch to swapper_pg_dir right after entering start_secondary() so that we are able to access percpu data which might not be mapped in the trampoline page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <20100816123833.GB28147@aftab> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-18x86, cpu: Fix regression in AMD errata checking codeHans Rosenfeld
A bug in the family-model-stepping matching code caused the presence of errata to go undetected when OSVW was not used. This causes hangs on some K8 systems because the E400 workaround is not enabled. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-18perf, x86: Fix Intel-nhm PMU programming errata workaroundZhang, Yanmin
Fix the Errata AAK100/AAP53/BD53 workaround, the officialy documented workaround we implemented in: 11164cd: perf, x86: Add Nehelem PMU programming errata workaround doesn't actually work fully and causes a stuck PMU state under load and non-functioning perf profiling. A functional workaround was found by trial & error. Affects all Nehalem-class Intel PMUs. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1281073148.2125.63.camel@ymzhang.sh.intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@kernel.org> # .35.x Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-17Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: vt,console,kdb: preserve console_blanked while in kdb vt: fix regression warnings from KMS merge arm,kgdb: fix GDB_MAX_REGS no longer used kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c kdb: fix compile error without CONFIG_KALLSYMS
2010-08-17Make do_execve() take a const filename pointerDavid Howells
Make do_execve() take a const filename pointer so that kernel_execve() compiles correctly on ARM: arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type This also requires the argv and envp arguments to be consted twice, once for the pointer array and once for the strings the array points to. This is because do_execve() passes a pointer to the filename (now const) to copy_strings_kernel(). A simpler alternative would be to cast the filename pointer in do_execve() when it's passed to copy_strings_kernel(). do_execve() may not change any of the strings it is passed as part of the argv or envp lists as they are some of them in .rodata, so marking these strings as const should be fine. Further kernel_execve() and sys_execve() need to be changed to match. This has been test built on x86_64, frv, arm and mips. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-17x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are setJesse Barnes
Otherwise we'll duplicate definitions with the pci.h stubs. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-17KVM: PIT: free irq source id in handling error pathXiao Guangrong
Free irq source id if create pit workqueue fail Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-16kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.cNamhyung Kim
breakinfo->pev is a pointer to percpu pointer but was missing __percpu markup. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-08-15Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: gcc-4.6: ACPI: fix unused but set variables in ACPI ACPI thermal: make procfs I/F depend on CONFIG_ACPI_PROCFS ACPI video: make procfs I/F depend on CONFIG_ACPI_PROCFS ACPI processor: remove deprecated ACPI procfs I/F ACPI power_resource: remove unused procfs I/F ACPI: remove deprecated ACPI procfs I/F ACPI: introduce drivers/acpi/sysfs.c ACPI: introduce module parameter acpi.aml_debug_output ACPI: introduce drivers/acpi/debugfs.c ACPI, APEI, ERST debug support ACPI, APEI, Manage GHES as platform devices ACPI, APEI, Rename CPER and GHES severity constants ACPI, APEI, Fix a typo of error path of apei_resources_request ACPI / ACPICA: Fix reference counting problems with GPE handlers ACPI: Add the check of ADR flag in course of finding ACPI handle for PCI device ACPI / Sleep: Drop acpi_suspend_finish() ACPI / Sleep: Consolidate suspend and hibernation routines ACPI / Wakeup: Simplify enabling of wakeup devices ACPI / Sleep: Rework enabling wakeup devices ACPI / Sleep: Free NVS copy if suspending of devices fails Fixed up totally buggered "ACPI: fix unused but set variables in ACPI" patch that doesn't even compile in the merge. Thanks to Sedat Dilek <sedat.dilek@googlemail.com> for noticing the breakage before I even pulled. And a big "Grrr.." at Len for not even bothering to compile the tree before asking me to pull.
2010-08-15KVM: destroy workqueue on kvm_create_pit() failuresXiaotian Feng
kernel needs to destroy workqueue if kvm_create_pit() fails, otherwise after pit is freed, the workqueue is leaked. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-15KVM: fix poison overwritten caused by using wrong xstate sizeXiaotian Feng
fpu.state is allocated from task_xstate_cachep, the size of task_xstate_cachep is xstate_size. xstate_size is set from cpuid instruction, which is often smaller than sizeof(struct xsave_struct). kvm is using sizeof(struct xsave_struct) to fill in/out fpu.state.xsave, as what we allocated for fpu.state is xstate_size, kernel will write out of memory and caused poison/redzone/padding overwritten warnings. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Reviewed-by: Sheng Yang <sheng@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Sheng Yang <sheng@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-15Merge branch 'linus' into releaseLen Brown
Conflicts: drivers/acpi/debug.c Signed-off-by: Len Brown <len.brown@intel.com>
2010-08-14defconfig reductionSam Ravnborg
Use the defconfig files generated by "make savedefconfig" for remaining defconfig files. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2010-08-14archs: replace unifdef-y with header-ySam Ravnborg
unifdef-y and header-y have same semantic, so drop unifdef-y Signed-off-by: Sam Ravnborg <sam@ravnborg.org>