summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2011-12-05Merge branch 'ucode' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp ↵Ingo Molnar
into x86/urgent
2011-12-05x86: Fix "Acer Aspire 1" reboot hangPeter Chubb
Looks like on some Acer Aspire 1s with older bioses, reboot via bios fails. It works on my machine, (with BIOS version 0.3310) but not on some others (BIOS version 0.3309). There's a log of problems at: https://bbs.archlinux.org/viewtopic.php?id=124136 This patch adds a different callback to the reboot quirk table, to allow rebooting via keybaord controller. Reported-by: Uroš Vampl <mobile.leecher@gmail.com> Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1323093233-9481-1-git-send-email-anarsoul@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/mtrr: Resolve inconsistency with Intel processor manualAjaykumar Hotchandani
Following is from Notes of section 11.5.3 of Intel processor manual available at: http://www.intel.com/Assets/PDF/manual/325384.pdf For the Pentium 4 and Intel Xeon processors, after the sequence of steps given above has been executed, the cache lines containing the code between the end of the WBINVD instruction and before the MTRRS have actually been disabled may be retained in the cache hierarchy. Here, to remove code from the cache completely, a second WBINVD instruction must be executed after the MTRRs have been disabled. This patch provides resolution for that. Ideally, I will like to make changes only for Pentium 4 and Xeon processors. But, I am not finding easier way to do it. And, extra wbinvd() instruction does not hurt much for other processors. Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Link: http://lkml.kernel.org/r/4EBD1CC5.3030008@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86, microcode: Fix the failure path of microcode update driver init codeSrivatsa S. Bhat
The microcode update driver's initialization code does not handle failures correctly. This patch fixes this issue. Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111107123530.12164.31227.stgit@srivatsabhat.in.ibm.com Link: http://lkml.kernel.org/r/4ED8E2270200007800065120@nat28.tlf.novell.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-12-05x86/config: Revamp configuration for MID devicesAlan Cox
This follows on from the patch applied in 3.2rc1 which creates an INTEL_MID configuration. We can now add the entry for Medfield specific code. After this is merged the final patch will be submitted which moves the rest of the device Kconfig dependancies to MRST/MEDFIELD/INTEL_MID as appropriate. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Use kmemdup() in copy_thread(), rather than duplicating its implementationThomas Meyer
The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Link: http://lkml.kernel.org/r/1321569820.1624.275.camel@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Replace the EVT_TO_HPET_DEV() macro with an inline functionFerenc Wagner
The original macro worked only when applied to variables named 'evt'. While this could have been fixed by simply renaming the macro argument, a more type-safe replacement is preferred. Signed-off-by: Ferenc Wagner <wferi@niif.hu> Cc: Venkatesh Pallipadi \(Venki\) <venki@google.com> Link: http://lkml.kernel.org/r/8ed5c66c02041226e8cf8b4d5d6b41e543d90bd6.1321626272.git.wferi@niif.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05Add TAINT_FIRMWARE_WORKAROUND on MTRR fixupPrarit Bhargava
TAINT_FIRMWARE_WORKAROUND should be set when an MTRR fixup is done. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1318958650-12447-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/mpparse: Account for bus types other than ISA and PCIBjorn Helgaas
In commit f8924e770e04 ("x86: unify mp_bus_info"), the 32-bit and 64-bit versions of MP_bus_info were rearranged to match each other better. Unfortunately it introduced a regression: prior to that change we used to always set the mp_bus_not_pci bit, then clear it if we found a PCI bus. After it, we set mp_bus_not_pci for ISA buses, clear it for PCI buses, and leave it alone otherwise. In the cases of ISA and PCI, there's not much difference. But ISA is not the only non-PCI bus, so it's better to always set mp_bus_not_pci and clear it only for PCI. Without this change, Dan's Dell PowerEdge 4200 panics on boot with a log indicating interrupt routing trouble unless the "noapic" option is supplied. With this change, the machine boots reliably without "noapic". Fixes http://bugs.debian.org/586494 Reported-bisected-and-tested-by: Dan McGrath <troubledaemon@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # 2.6.26+ Cc: Dan McGrath <troubledaemon@gmail.com> Cc: Alexey Starikovskiy <aystarik@gmail.com> [jrnieder@gmail.com: clarified commit message] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Link: http://lkml.kernel.org/r/20111122215000.GA9151@elie.hsd1.il.comcast.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Fix the 32-bit stackoverflow-debug buildIngo Molnar
The panic_on_stackoverflow variable needs to be avilable on the 32-bit side as well ... Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/20111129060836.11076.12323.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI rebootRafael J. Wysocki
Dell OptiPlex 990 is known to require PCI reboot, so add it to the reboot blacklist in pci_reboot_dmi_table[]. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Link: http://lkml.kernel.org/r/201111160019.51303.rjw@sisk.pl Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Default to vsyscall=emulateAndy Lutomirski
This essentially reverts: 2b666859ec32: x86: Default to vsyscall=native for now The ABI breakage should now be fixed by: commit 48c4206f5b02f28c4c78a1f5b491d3772fb64fb9 Author: Andy Lutomirski <luto@mit.edu> Date: Thu Oct 20 08:48:19 2011 -0700 x86-64: Set siginfo and context on vsyscall emulation faults Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Adrian Bunk <bunk@stusta.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/93154af3b2b6d208906ae02d80d92cf60c6fa94f.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86-64: Set siginfo and context on vsyscall emulation faultsAndy Lutomirski
To make this work, we teach the page fault handler how to send signals on failed uaccess. This only works for user addresses (kernel addresses will never hit the page fault handler in the first place), so we need to generate signals for those separately. This gets the tricky case right: if the user buffer spans multiple pages and only the second page is invalid, we set cr2 and si_addr correctly. UML relies on this behavior to "fault in" pages as needed. We steal a bit from thread_info.uaccess_err to enable this. Before this change, uaccess_err was a 32-bit boolean value. This fixes issues with UML when vsyscall=emulate. Reported-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86, NMI: Add knob to disable using NMI IPIs to stop cpusDon Zickus
Some machines may exhibit problems using the NMI to stop other cpus. This knob just allows one to revert back to the original behaviour to help diagnose the problem. V2: make function static Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: seiji.aguchi@hds.com Cc: vgoyal@redhat.com Cc: mjg@redhat.com Cc: tony.luck@intel.com Cc: gong.chen@intel.com Cc: satoru.moriya@hds.com Cc: avi@redhat.com Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/1318533267-18880-4-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86, NMI: Add NMI IPI selftestDon Zickus
The previous patch modified the stop cpus path to use NMI instead of IRQ as the way to communicate to the other cpus to shutdown. There were some concerns that various machines may have problems with using an NMI IPI. This patch creates a selftest to check if NMI is working at boot. The idea is to help catch any issues before the machine panics and we learn the hard way. Loosely based on the locking-selftest.c file, this separate file runs a couple of simple tests and reports the results. The output looks like: ... Brought up 4 CPUs ---------------- | NMI testsuite: -------------------- remote IPI: ok | local IPI: ok | -------------------- Good, all 2 testcases passed! | --------------------------------- Total of 4 processors activated (21330.61 BogoMIPS). ... Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: seiji.aguchi@hds.com Cc: vgoyal@redhat.com Cc: mjg@redhat.com Cc: tony.luck@intel.com Cc: gong.chen@intel.com Cc: satoru.moriya@hds.com Cc: avi@redhat.com Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/1318533267-18880-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpusDon Zickus
A recent discussion started talking about the locking on the pstore fs and how it relates to the kmsg infrastructure. We noticed it was possible for userspace to r/w to the pstore fs (grabbing the locks in the process) and block the panic path from r/w to the same fs. The reason was the cpu with the lock could be doing work while the crashing cpu is panic'ing. Busting those spinlocks might cause those cpus to step on each other's data. Fine, fair enough. It was suggested it would be nice to serialize the panic path (ie stop the other cpus) and have only one cpu running. This would allow us to bust the spinlocks and not worry about another cpu stepping on the data. Of course, smp_send_stop() does this in the panic case. kmsg_dump() would have to be moved to be called after it. Easy enough. The only problem is on x86 the smp_send_stop() function calls the REBOOT_VECTOR. Any cpu with irqs disabled (which pstore and its backend ERST would do), block this IPI and thus do not stop. This makes it difficult to reliably log data to the pstore fs. The patch below switches from the REBOOT_VECTOR to NMI (and mimics what kdump does). Switching to NMI allows us to deliver the IPI when irqs are disabled, increasing the reliability of this function. However, Andi carefully noted that on some machines this approach does not work because of broken BIOSes or whatever. To help accomodate this, the next couple of patches will run a selftest and provide a knob to disable. V2: uses atomic ops to serialize the cpu that shuts everyone down V3: comment cleanup Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: seiji.aguchi@hds.com Cc: vgoyal@redhat.com Cc: mjg@redhat.com Cc: tony.luck@intel.com Cc: gong.chen@intel.com Cc: satoru.moriya@hds.com Cc: avi@redhat.com Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/1318533267-18880-2-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86, UV: Fix UV2 hub part numberJack Steiner
There was a mixup when the SGI UV2 hub chip was sent to be fabricated, and it ended up with the wrong part number in the HRP_NODE_ID mmr. Future versions of the chip will (may) have the correct part number. Change the UV infrastructure to recognize both part numbers as valid IDs of a UV2 hub chip. Signed-off-by: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/20111129210058.GA20452@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Clean up the range of stack overflow checkingMitsuo Hayasaka
The overflow checking of kernel stack checks if the stack pointer points to the available kernel stack range, which is derived from the original overflow checking. It is clear that curbase address is always less than low boundary of available kernel stack. So, this patch removes the first condition that checks if the pointer is higher than curbase. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Randy Dunlap <rdunlap@xenotime.net> Link: http://lkml.kernel.org/r/20111129060845.11076.40916.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-05x86: Panic on detection of stack overflowMitsuo Hayasaka
Currently, messages are just output on the detection of stack overflow, which is not sufficient for systems that need a high reliability. This is because in general the overflow may corrupt data, and the additional corruption may occur due to reading them unless systems stop. This patch adds the sysctl parameter kernel.panic_on_stackoverflow and causes a panic when detecting the overflows of kernel, IRQ and exception stacks except user stack according to the parameter. It is disabled by default. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/20111129060836.11076.12323.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Check stack overflow in detailMitsuo Hayasaka
Currently, only kernel stack is checked for the overflow, which is not sufficient for systems that need a high reliability. To enhance it, it is required to check the IRQ and exception stacks, as well. This patch checks all the stack types and will cause messages of stacks in detail when free stack space drops below a certain limit except user stack. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Randy Dunlap <rdunlap@xenotime.net> Link: http://lkml.kernel.org/r/20111129060829.11076.51733.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-05x86: Add user_mode_vm check in stack_overflow_checkMitsuo Hayasaka
The kernel stack overflow is checked in stack_overflow_check(), which may wrongly detect the overflow if the stack pointer in user space points to the kernel stack intentionally or accidentally. So, the actual overflow is never detected after this misdetection because WARN_ONCE() is used on the detection of it. This patch adds user-mode-vm checking before it to avoid this problem and bails out early if the user stack is used. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Randy Dunlap <rdunlap@xenotime.net> Link: http://lkml.kernel.org/r/20111129060821.11076.55315.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-05perf, x86: Force IBS LVT offset assignment for family 10hRobert Richter
On AMD family 10h we see firmware bug messages like the following: [Firmware Bug]: cpu 6, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu [Firmware Bug]: cpu 6, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100) [Firmware Bug]: using offset 1 for IBS interrupts [Firmware Bug]: workaround enabled for IBS LVT offset perf: AMD IBS detected (0x00000007) We always see this, since the offsets are not assigned by the BIOS for this family. Force LVT offset assignment in this case. If the OS assignment fails, fallback to BIOS settings and try to setup this. The fallback to BIOS settings weakens the family check since force_ibs_eilvt_setup() may fail e.g. in case of virtual machines. But setup may still succeed if BIOS offsets are correct. Other families don't have a workaround implemented that assigns LVT offsets. It's ok, to drop calling force_ibs_eilvt_setup() for that families. With the patch the [Firmware Bug] messages vanish. We see now: IBS: LVT offset 1 assigned perf: AMD IBS detected (0x00000007) Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111109162225.GO12451@erda.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05perf, x86: Disable PEBS on SandyBridge chipsPeter Zijlstra
Cc: Stephane Eranian <eranian@google.com> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-04x86: Fix boot failures on older AMD CPU'sLinus Torvalds
People with old AMD chips are getting hung boots, because commit bcb80e53877c ("x86, microcode, AMD: Add microcode revision to /proc/cpuinfo") moved the microcode detection too early into "early_init_amd()". At that point we are *so* early in the booth that the exception tables haven't even been set up yet, so the whole rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy); doesn't actually work: if the rdmsr does a GP fault (due to non-existant MSR register on older CPU's), we can't fix it up yet, and the boot fails. Fix it by simply moving the code to a slightly later point in the boot (init_amd() instead of early_init_amd()), since the kernel itself doesn't even really care about the microcode patchlevel at this point (or really ever: it's made available to user space in /proc/cpuinfo, and updated if you do a microcode load). Reported-tested-and-bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Bob Tracy <rct@gherkin.frus.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-03xen/pm_idle: Make pm_idle be default_idle under Xen.Konrad Rzeszutek Wilk
The idea behind commit d91ee5863b71 ("cpuidle: replace xen access to x86 pm_idle and default_idle") was to have one call - disable_cpuidle() which would make pm_idle not be molested by other code. It disallows cpuidle_idle_call to be set to pm_idle (which is excellent). But in the select_idle_routine() and idle_setup(), the pm_idle can still be set to either: amd_e400_idle, mwait_idle or default_idle. This depends on some CPU flags (MWAIT) and in AMD case on the type of CPU. In case of mwait_idle we can hit some instances where the hypervisor (Amazon EC2 specifically) sets the MWAIT and we get: Brought up 2 CPUs invalid opcode: 0000 [#1] SMP Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1 RIP: e030:[<ffffffff81015d1d>] [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4 ... Call Trace: [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8 [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10 RIP [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4 RSP <ffff8801d28ddf10> In the case of amd_e400_idle we don't get so spectacular crashes, but we do end up making an MSR which is trapped in the hypervisor, and then follow it up with a yield hypercall. Meaning we end up going to hypervisor twice instead of just once. The previous behavior before v3.0 was that pm_idle was set to default_idle regardless of select_idle_routine/idle_setup. We want to do that, but only for one specific case: Xen. This patch does that. Fixes RH BZ #739499 and Ubuntu #881076 Reported-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-28Merge branch 'master' into x86/memblockTejun Heo
Conflicts & resolutions: * arch/x86/xen/setup.c dc91c728fd "xen: allow extra memory to be in multiple regions" 24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..." conflicted on xen_add_extra_mem() updates. The resolution is trivial as the latter just want to replace memblock_x86_reserve_range() with memblock_reserve(). * drivers/pci/intel-iommu.c 166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/" 5dfe8660a3d "bootmem: Replace work_with_active_regions() with..." conflicted as the former moved the file under drivers/iommu/. Resolved by applying the chnages from the latter on the moved file. * mm/Kconfig 6661672053a "memblock: add NO_BOOTMEM config symbol" c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option" conflicted trivially. Both added config options. Just letting both add their own options resolves the conflict. * mm/memblock.c d1f0ece6cdc "mm/memblock.c: small function definition fixes" ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()" confliected. The former updates function removed by the latter. Resolution is trivial. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-26Merge 3.2-rc3 into tty-next to handle merge conflict in tty_ldisc.cGreg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-21time: x86: Remove CLOCK_TICK_RATE from tsc codeDeepak Saxena
The tsc code uses CLOCK_TICK_RATE which on x86 is defined to just be the same as PIT_TICK_RATE. This patch updates the code use the later as we want to depecrate and remove the global CLOCK_TICK_RATE symbol. Signed-off-by: Deepak Saxena <dsaxena@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-20Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM guest: prevent tracing recursion with kvmclock Revert "KVM: PPC: Add support for explicit HIOR setting" KVM: VMX: Check for automatic switch msr table overflow KVM: VMX: Add support for guest/host-only profiling KVM: VMX: add support for switching of PERF_GLOBAL_CTRL KVM: s390: announce SYNC_MMU KVM: s390: Fix tprot locking KVM: s390: handle SIGP sense running intercepts KVM: s390: Fix RUNNING flag misinterpretation
2011-11-20KVM guest: prevent tracing recursion with kvmclockAvi Kivity
Prevent tracing of preempt_disable() in get_cpu_var() in kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled, preempt_disable/enable() are traced and this causes the function_graph tracer to go into an infinite recursion. By open coding the preempt_disable() around the get_cpu_var(), we can use the notrace version which prevents preempt_disable/enable() from being traced and prevents the recursion. Based on a similar patch for Xen from Jeremy Fitzhardinge. Tested-by: Gleb Natapov <gleb@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-18x86, syscall: Re-fix typo in commentH. Peter Anvin
Fix the same typo as was fixed in: b7641d2c x86-64, syscall: Adjust comment spacing and remove typo ... for the new versions of this file (32-bit and IA32 compat). Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1321569446-20433-4-git-send-email-hpa@linux.intel.com
2011-11-17x86: Generate system call tables and unistd_*.h from tablesH. Peter Anvin
Generate system call tables and unistd_*.h automatically from the tables in arch/x86/syscalls. All other information, like NR_syscalls, is auto-generated, some of which is in asm-offsets_*.c. This allows us to keep all the system call information in one place, and allows for kernel space and user space to see different information; this is currently used for the ia32 system call numbers when building the 64-bit kernel, but will be used by the x32 ABI in the near future. This also removes some gratuitious differences between i386, x86-64 and ia32; in particular, now all system call tables are generated with the same mechanism. Cc: H. J. Lu <hjl.tools@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-11-17x86-64, syscall: Adjust comment spacing and remove typoH. Peter Anvin
Adjust spacing for comment so that it matches the multiline comment style used in the rest of the kernel, and remove word duplication. It is not really clear what version of gcc this refers to, but the extra & doesn't cause any harm, so there is no reason to remove it. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-11-15serial, mfd: don't hardcode the consoleMika Westerberg
Add support to specify which HSU port to use as an early console. This can be selected by passing "earlyprintk=hsu<n>" on the kernel command line. By default port 0 is still used. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-15iommu: Add option to group multi-function devicesAlex Williamson
The option iommu=group_mf indicates the that the iommu driver should expose all functions of a multi-function PCI device as the same iommu_device_group. This is useful for disallowing individual functions being exposed as independent devices to userspace as there are often hidden dependencies. Virtual functions are not affected by this option. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-14x86: Call stop_machine_text_poke() on all CPUsRabin Vincent
It appears that stop_machine_text_poke() wants to be called on all CPUs, like it's done from text_poke_smp(). Fix text_poke_smp_batch() to do this. Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/r/1319702072-32676-1-git-send-email-rabin@rab.in Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14perf/x86: Enable raw event access to Intel offcore eventsPeter Zijlstra
Now that the core offcore support is fixed up (thanks Stephane) and we have sane generic events utilizing them, re-enable the raw access to the feature as well. Note that it doesn't matter if you use event 0x1b7 or 0x1bb to specify an offcore event, either one works and neither guarantees you'll end up on a particular offcore MSR. Based on original patch from: Vince Weaver <vweaver1@eecs.utk.edu>. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vince Weaver <vweaver1@eecs.utk.edu>. Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1108031200390.703@cl320.eecs.utk.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14perf: Don't use -ENOSPC for out of PMU resourcesPeter Zijlstra
People (Linus) objected to using -ENOSPC to signal not having enough resources on the PMU to satisfy the request. Use -EINVAL. Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-xv8geaz2zpbjhlx0svmpp28n@git.kernel.org [ merged to newer kernel, fixed up MIPS impact ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14perf/x86: Fix PEBS instruction unwindPeter Zijlstra
Masami spotted that we always try to decode the instruction stream as 64bit instructions when running a 64bit kernel, this doesn't work for ia32-compat proglets. Use TIF_IA32 to detect if we need to use the 32bit instruction decoder. Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10x86, ioapic: Only print ioapic debug information for IRQs belonging to an ↵Mathias Nyman
ioapic chip with "apic=verbose" the print_IO_APIC() function tries to print IRQ to pin mappings for every active irq. It assumes chip_data is of type irq_cfg and may cause an oops if not. As the print_IO_APIC() is called from a late_initcall other chained irq chips may already be registered with custom chip_data information, causing an oops. This is the case with intel MID SoC devices with gpio demuxers registered as irq_chips. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> [ -v2: fixed build failure ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10x86/mrst: Avoid reporting wrong nmi statusJacob Pan
Moorestown/Medfield platform does not have port 0x61 to report NMI status, nor does it have external NMI sources. The only NMI sources are from lapic, as results of perf counter overflow or IPI, e.g. NMI watchdog or spin lock debug. Reading port 0x61 on Moorestown will return 0xff which misled NMI handlers to false critical errors such memory parity error. The subsequent ioport access for NMI handling can also cause undefined behavior on Moorestown. This patch allows kernel process NMI due to watchdog or backrace dump without unnecessary hangs. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> [hand applied] Signed-off-by: Alan Cox <alan@linux.intel.com>
2011-11-10x86/apic: Allow use of lapic timer early calibration resultJacob Pan
lapic timer calibration can be combined with tsc in platform specific calibration functions. if such calibration result is obtained early, we can skip the redundant calibration loops. Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10x86/apic: Do not clear nr_irqs_gsi if no legacy irqsJacob Pan
nr_legacy_irqs is set in probe_nr_irqs_gsi, we should not clear it after that. Otherwise, the result is that MSI irqs will be allocated from the wrong range for the systems without legacy PIC. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-10x86/platform: Add a wallclock_init func to x86_platforms opsFeng Tang
Some wall clock devices use MMIO based HW register, this new function will give them a chance to do some initialization work before their get/set_time service get called. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-08x86/mce: Make mce_chrdev_ops 'static const'Luck, Tony
Arjan would like to make struct file_operations const, but mce-inject directly writes to the mce_chrdev_ops to install its write handler. In an ideal world mce-inject would have its own character device, but we have a sizable legacy of test scripts that hardwire "/dev/mcelog", so it would be painful to switch to a separate device now. Instead, this patch switches to a stub function in the mce code, with a registration helper that mce-inject can call when it is loaded. Note that this would also allow for a sane process to allow mce-inject to be unloaded again (with an unregister function, and appropriate module_{get,put}() calls), but that is left for potential future patches. Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4eb2e1971326651a3b@agluck-desktop.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-06Merge branch 'upstream/jump-label-noearly' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: jump-label: initialize jump-label subsystem much earlier x86/jump_label: add arch_jump_label_transform_static() s390/jump-label: add arch_jump_label_transform_static() jump_label: add arch_jump_label_transform_static() to optimise non-live code updates sparc/jump_label: drop arch_jump_label_text_poke_early() x86/jump_label: drop arch_jump_label_text_poke_early() jump_label: if a key has already been initialized, don't nop it out stop_machine: make stop_machine safe and efficient to call early jump_label: use proper atomic_t initializer Conflicts: - arch/x86/kernel/jump_label.c Added __init_or_module to arch_jump_label_text_poke_early vs removal of that function entirely - kernel/stop_machine.c same patch ("stop_machine: make stop_machine safe and efficient to call early") merged twice, with whitespace fix in one version
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-11-02Merge branch 'linux_next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits) MAINTAINERS: add an entry for Edac Sandy Bridge driver edac: tag sb_edac as EXPERIMENTAL, as it requires more testing EDAC: Fix incorrect edac mode reporting in sb_edac edac: sb_edac: Add it to the building system edac: Add an experimental new driver to support Sandy Bridge CPU's i7300_edac: Fix error cleanup logic i7core_edac: Initialize memory name with cpu, channel, bank i7core_edac: Fix compilation on 32 bits arch i7core_edac: scrubbing fixups EDAC: Correct Kconfig dependencies i7core_edac: return -ENODEV if no MC is found i7core_edac: use edac's own way to print errors MAINTAINERS: remove dropped edac_mce.* from the file i7core_edac: Drop the edac_mce facility x86, MCE: Use notifier chain only for MCE decoding EDAC i7core: Use mce socketid for better compatibility i7core_edac: Don't enable memory scrubbing for Xeon 35xx i7core_edac: Add scrubbing support edac: Move edac main structs to include/linux/edac.h i7core_edac: Fix oops when trying to inject errors ...
2011-11-01i7core_edac: Drop the edac_mce facilityBorislav Petkov
Remove edac_mce pieces and use the normal MCE decoder notifier chain by retaining the same functionality with considerably less code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31Cross Memory AttachChristopher Yeoh
The basic idea behind cross memory attach is to allow MPI programs doing intra-node communication to do a single copy of the message rather than a double copy of the message via shared memory. The following patch attempts to achieve this by allowing a destination process, given an address and size from a source process, to copy memory directly from the source process into its own address space via a system call. There is also a symmetrical ability to copy from the current process's address space into a destination process's address space. - Use of /proc/pid/mem has been considered, but there are issues with using it: - Does not allow for specifying iovecs for both src and dest, assuming preadv or pwritev was implemented either the area read from or written to would need to be contiguous. - Currently mem_read allows only processes who are currently ptrace'ing the target and are still able to ptrace the target to read from the target. This check could possibly be moved to the open call, but its not clear exactly what race this restriction is stopping (reason appears to have been lost) - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix domain socket is a bit ugly from a userspace point of view, especially when you may have hundreds if not (eventually) thousands of processes that all need to do this with each other - Doesn't allow for some future use of the interface we would like to consider adding in the future (see below) - Interestingly reading from /proc/pid/mem currently actually involves two copies! (But this could be fixed pretty easily) As mentioned previously use of vmsplice instead was considered, but has problems. Since you need the reader and writer working co-operatively if the pipe is not drained then you block. Which requires some wrapping to do non blocking on the send side or polling on the receive. In all to all communication it requires ordering otherwise you can deadlock. And in the example of many MPI tasks writing to one MPI task vmsplice serialises the copying. There are some cases of MPI collectives where even a single copy interface does not get us the performance gain we could. For example in an MPI_Reduce rather than copy the data from the source we would like to instead use it directly in a mathops (say the reduce is doing a sum) as this would save us doing a copy. We don't need to keep a copy of the data from the source. I haven't implemented this, but I think this interface could in the future do all this through the use of the flags - eg could specify the math operation and type and the kernel rather than just copying the data would apply the specified operation between the source and destination and store it in the destination. Although we don't have a "second user" of the interface (though I've had some nibbles from people who may be interested in using it for intra process messaging which is not MPI). This interface is something which hardware vendors are already doing for their custom drivers to implement fast local communication. And so in addition to this being useful for OpenMPI it would mean the driver maintainers don't have to fix things up when the mm changes. There was some discussion about how much faster a true zero copy would go. Here's a link back to the email with some testing I did on that: http://marc.info/?l=linux-mm&m=130105930902915&w=2 There is a basic man page for the proposed interface here: http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt This has been implemented for x86 and powerpc, other architecture should mainly (I think) just need to add syscall numbers for the process_vm_readv and process_vm_writev. There are 32 bit compatibility versions for 64-bit kernels. For arch maintainers there are some simple tests to be able to quickly verify that the syscalls are working correctly here: http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <linux-man@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>