summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2011-12-11nohz: Separate out irq exit and idle loop dyntick logicFrederic Weisbecker
The tick_nohz_stop_sched_tick() function, which tries to delay the next timer tick as long as possible, can be called from two places: - From the idle loop to start the dytick idle mode - From interrupt exit if we have interrupted the dyntick idle mode, so that we reprogram the next tick event in case the irq changed some internal state that requires this action. There are only few minor differences between both that are handled by that function, driven by the ts->inidle cpu variable and the inidle parameter. The whole guarantees that we only update the dyntick mode on irq exit if we actually interrupted the dyntick idle mode, and that we enter in RCU extended quiescent state from idle loop entry only. Split this function into: - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters dynticks idle mode unconditionally if it can, and enters into RCU extended quiescent state. - tick_nohz_irq_exit() which only updates the dynticks idle mode when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called). To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed into tick_nohz_idle_exit(). This simplifies the code and micro-optimize the irq exit path (no need for local_irq_save there). This also prepares for the split between dynticks and rcu extended quiescent state logics. We'll need this split to further fix illegal uses of RCU in extended quiescent states in the idle loop. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-09x86: Don't use magic strings for EFI loader signatureMatt Fleming
Introduce a symbol, EFI_LOADER_SIGNATURE instead of using the magic strings, which also helps to reduce the amount of ifdeffery. Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-09x86, efi: Calling __pa() with an ioremap()ed address is invalidMatt Fleming
If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set in ->attribute we currently call set_memory_uc(), which in turn calls __pa() on a potentially ioremap'd address. On CONFIG_X86_32 this is invalid, resulting in the following oops on some machines: BUG: unable to handle kernel paging request at f7f22280 IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210 [...] Call Trace: [<c104f8ca>] ? page_is_ram+0x1a/0x40 [<c1025aff>] reserve_memtype+0xdf/0x2f0 [<c1024dc9>] set_memory_uc+0x49/0xa0 [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa [<c19216d4>] start_kernel+0x291/0x2f2 [<c19211c7>] ? loglevel+0x1b/0x1b [<c19210bf>] i386_start_kernel+0xbf/0xc8 A better approach to this problem is to map the memory region with the correct attributes from the start, instead of modifying it after the fact. The uncached case can be handled by ioremap_nocache() and the cached by ioremap_cache(). Despite first impressions, it's not possible to use ioremap_cache() to map all cached memory regions on CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really don't like being mapped into the vmalloc space, as detailed in the following bug report, https://bugzilla.redhat.com/show_bug.cgi?id=748516 Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA regions are covered by the direct kernel mapping table on CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI regions via the direct kernel mapping with the initial call to init_memory_mapping() in setup_arch(), whereas previously these regions wouldn't be mapped if they were after the last E820_RAM region until efi_ioremap() was called. Doing it this way allows us to delete efi_ioremap() completely. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-09x86, CPU: Drop superfluous get_cpu_cap() prototypeBorislav Petkov
The get_cpu_cap() external function prototype was declared twice so lose one of them. Clean up the header guard while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1322594083-14507-1-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-08x86, hpet: Immediately disable HPET timer 1 if rtc irq is maskedMark Langsdorf
When HPET is operating in RTC mode, the TN_ENABLE bit on timer1 controls whether the HPET or the RTC delivers interrupts to irq8. When the system goes into suspend, the RTC driver sends a signal to the HPET driver so that the HPET releases control of irq8, allowing the RTC to wake the system from suspend. The switchover is accomplished by a write to the HPET configuration registers which currently only occurs while servicing the HPET interrupt. On some systems, I have seen the system suspend before an HPET interrupt occurs, preventing the write to the HPET configuration register and leaving the HPET in control of the irq8. As the HPET is not active during suspend, it does not generate a wake signal and RTC alarms do not work. This patch forces the HPET driver to immediately transfer control of the irq8 channel to the RTC instead of waiting until the next interrupt event. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
2011-12-08memblock: s/memblock_analyze()/memblock_allow_resize()/ and update usersTejun Heo
The only function of memblock_analyze() is now allowing resize of memblock region arrays. Rename it to memblock_allow_resize() and update its users. * The following users remain the same other than renaming. arm/mm/init.c::arm_memblock_init() microblaze/kernel/prom.c::early_init_devtree() powerpc/kernel/prom.c::early_init_devtree() openrisc/kernel/prom.c::early_init_devtree() sh/mm/init.c::paging_init() sparc/mm/init_64.c::paging_init() unicore32/mm/init.c::uc32_memblock_init() * In the following users, analyze was used to update total size which is no longer necessary. powerpc/kernel/machine_kexec.c::reserve_crashkernel() powerpc/kernel/prom.c::early_init_devtree() powerpc/mm/init_32.c::MMU_init() powerpc/mm/tlb_nohash.c::__early_init_mmu() powerpc/platforms/ps3/mm.c::ps3_mm_add_memory() powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups() sh/kernel/machine_kexec.c::reserve_crashkernel() * x86/kernel/e820.c::memblock_x86_fill() was directly setting memblock_can_resize before populating memblock and calling analyze afterwards. Call memblock_allow_resize() before start populating. memblock_can_resize is now static inside memblock.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08memblock: Kill memblock_init()Tejun Heo
memblock_init() initializes arrays for regions and memblock itself; however, all these can be done with struct initializers and memblock_init() can be removed. This patch kills memblock_init() and initializes memblock with struct initializer. The only difference is that the first dummy entries don't have .nid set to MAX_NUMNODES initially. This doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-07x86, NMI: Add to_cpumask() to silence compile warningDan Carpenter
Gcc complains if we don't cast this to a struct cpumask pointer. arch/x86/kernel/nmi_selftest.c:93:2: warning: passing argument 1 of ‘cpumask_empty’ from incompatible pointer type [enabled by default] Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/20111207110612.GA3437@mwanda Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-07x86: Add stack top margin for stack overflow checkingMitsuo Hayasaka
It seems that a margin for stack overflow checking is added to top of a kernel stack but is not added to IRQ and exception stacks in stack_overflow_check(). Therefore, the overflows of IRQ and exception stacks are always detected only after they actually occurred and data corruption might occur due to them. This patch adds the margin to top of IRQ and exception stacks as well as a kernel stack to enhance reliability. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20111207082910.9847.3359.stgit@ltc219.sdl.hitachi.co.jp [ removed the #undef - we typically don't do that for uncommon names ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86, NMI: NMI-selftest should handle the UP case properlyDon Zickus
If no remote cpus are online, then just quietly skip the remote IPI test for now. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: andi@firstfloor.org Cc: torvalds@linux-foundation.org Cc: peterz@infradead.org Cc: robert.richter@amd.com Link: http://lkml.kernel.org/r/20111206180859.GR1669@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Expose perf capability to other modulesGleb Natapov
KVM needs to know perf capability to decide which PMU it can expose to a guest. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320929850-10480-8-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Implement arch event mask as quirkPeter Zijlstra
Implement the disabling of arch events as a quirk so that we can print a message along with it. This creates some visibility into the problem space and could allow us to work on adding more work-around like the AAJ80 one. Requested-by: Ingo Molnar <mingo@elte.hu> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-wcja2z48wklzu1b0nkz0a5y7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86, perf: Disable non available architectural eventsGleb Natapov
Intel CPUs report non-available architectural events in cpuid leaf 0AH.EBX. Use it to disable events that are not available according to CPU. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320929850-10480-7-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06jump_label, x86: Fix section mismatchPeter Zijlstra
WARNING: arch/x86/kernel/built-in.o(.text+0x4c71): Section mismatch in reference from the function arch_jump_label_transform_static() to the function .init.text:text_poke_early() The function arch_jump_label_transform_static() references the function __init text_poke_early(). This is often because arch_jump_label_transform_static lacks a __init annotation or the annotation of text_poke_early is wrong. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/n/tip-9lefe89mrvurrwpqw5h8xm8z@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86: Fix rflags in FAKE_STACK_FRAMESeiichi Ikarashi
The x86_64 kernel pushes the fake kernel stack in arch/x86/kernel/entry_64.S:FAKE_STACK_FRAME, and rflags register in it does not conform to the specification. Although Intel's manual[1] says bit 1 of it shall be set to 1, this bit is cleared to 0 on pushing the fake stack. [1] Intel(R) 64 and IA-32 Architectures Software Developer's Manual Vol.1 3-21 Figure 3-8. EFLAGS Register If it is not on purpose, it is better to be fixed, because it can lead some tools misunderstanding the stack frame. For example, "crash" utility[2] actually detects it and warns you like below: RIP: ffffffff8005dfa2 RSP: ffff8104ce0c7f58 RFLAGS: 00000200 [...] bt: WARNING: possibly bogus exception frame Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com> Tested-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Prefer fixed-purpose counters when schedulingPeter Zijlstra
This avoids a scheduling failure for cases like: cycles, cycles, instructions, instructions (on Core2) Which would end up being programmed like: PMC0, PMC1, FP-instructions, fail Because all events will have the same weight. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Fix event scheduler for constraints with overlapping countersRobert Richter
The current x86 event scheduler fails to resolve scheduling problems of certain combinations of events and constraints. This happens if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight, e.g. constraints of the AMD family 15h pmu: counter mask weight amd_f15_PMC30 0x09 2 <--- overlapping counters amd_f15_PMC20 0x07 3 amd_f15_PMC53 0x38 3 The scheduler does not find then an existing solution. Here is an example: event code counter failure possible solution 0x02E PMC[3,0] 0 3 0x043 PMC[2:0] 1 0 0x045 PMC[2:0] 2 1 0x046 PMC[2:0] FAIL 2 The event scheduler may not select the correct counter in the first cycle because it needs to know which subsequent events will be scheduled. It may fail to schedule the events then. To solve this, we now save the scheduler state of events with overlapping counter counstraints. If we fail to schedule the events we rollback to those states and try to use another free counter. Constraints with overlapping counters are marked with a new introduced overlap flag. We set the overlap flag for such constraints to give the scheduler a hint which events to select for counter rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this. Care must be taken as the rescheduling algorithm is O(n!) which will increase scheduling cycles for an over-commited system dramatically. The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter masks must be kept at a minimum. Thus, the current stack is limited to 2 states to limit the number of loops the algorithm takes in the worst case. On systems with no overlapping-counter constraints, this implementation does not increase the loop count compared to the previous algorithm. V2: * Renamed redo -> overlap. * Reimplementation using perf scheduling helper functions. V3: * Added WARN_ON_ONCE() if out of save states. * Changed function interface of perf_sched_restore_state() to use bool as return value. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Implement event scheduler helper functionsRobert Richter
This patch introduces x86 perf scheduler code helper functions. We need this to later add more complex functionality to support overlapping counter constraints (next patch). The algorithm is modified so that the range of weight values is now generated from the constraints. There shouldn't be other functional changes. With the helper functions the scheduler is controlled. There are functions to initialize, traverse the event list, find unused counters etc. The scheduler keeps its own state. V3: * Added macro for_each_set_bit_cont(). * Changed functions interfaces of perf_sched_find_counter() and perf_sched_next_event() to use bool as return value. * Added some comments to make code better understandable. V4: * Fix broken event assignment if weight of the first event is not wmin (perf_sched_init()). Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86: Clean up and extend do_int3()Srikar Dronamraju
Since there is a possibility of !KPROBES int3 listeners (such as kgdb) and since DIE_TRAP is currently not being used by anybody, notify all listeners with DIE_INT3. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20111025142159.GB21225@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86: Call do_notify_resume() with interrupts enabledSrikar Dronamraju
do_notify_resume() gets called with interrupts disabled on x86_32. This is different from the x86_64 behavior, where interrupts are enabled at the time. Queries on lkml on this issue hasn't yielded any clear answer. Lets make x86_32 behave the same as x86_64, unless there is a real reason to maintain status quo. Please refer https://lkml.org/lkml/2011/9/27/130 for more details. A similar change was suggested in ARM: https://lkml.org/lkml/2011/8/25/231 My 32-bit machine works fine (tm) with this patch. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20111025141812.GA21225@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code supportSteffen Persvold
I used "ifdef CONFIG_NUMA" simply because it doesn't make sense in a non-numa configuration even with SMP enabled. Besides, the only place where it is called right now is in kernel/cpu/amd.c:srat_detect_node() within the "CONFIG_NUMA" protected part. Signed-off-by: Steffen Persvold <sp@numascale.com> Cc: Daniel J Blueman <daniel@numascale-asia.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intr_remapping: Fix section mismatch in ir_dev_scope_init() intel-iommu: Fix section mismatch in dmar_parse_rmrr_atsr_dev() x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions x86, AMD: Correct align_va_addr documentation x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms x86/mrst: Battery fixes x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode x86: Fix "Acer Aspire 1" reboot hang x86/mtrr: Resolve inconsistency with Intel processor manual x86: Document rdmsr_safe restrictions x86, microcode: Fix the failure path of microcode update driver init code Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup x86/mpparse: Account for bus types other than ISA and PCI x86, mrst: Change the pmic_gpio device type to IPC mrst: Added some platform data for the SFI translations x86,mrst: Power control commands update x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot x86, UV: Fix UV2 hub part number x86: Add user_mode_vm check in stack_overflow_check
2011-12-05Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix loss of notification with multi-event perf, x86: Force IBS LVT offset assignment for family 10h perf, x86: Disable PEBS on SandyBridge chips trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter perf session: Fix crash with invalid CPU list perf python: Fix undefined symbol problem perf/x86: Enable raw event access to Intel offcore events perf: Don't use -ENOSPC for out of PMU resources perf: Do not set task_ctx pointer in cpuctx if there are no events in the context perf/x86: Fix PEBS instruction unwind oprofile, x86: Fix crash when unloading module (nmi timer mode) oprofile: Fix crash when unloading module (hr timer mode)
2011-12-05Merge branch 'fortglx/3.3/tip/timers/core' of ↵Thomas Gleixner
git://git.linaro.org/people/jstultz/linux into timers/core
2011-12-05x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh ↵Andreas Herrmann
northbridge functions I've received complaints that the numa_node attribute for family 15h model 00-0fh (e.g. Interlagos) northbridge functions shows -1 instead of the proper node ID. Correct this with attached quirks (similar to quirks for other AMD CPU families used in multi-socket systems). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Frank Arnold <frank.arnold@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/20111202072143.GA31916@alberich.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86, tsc: Skip TSC synchronization checks for tsc=reliableSuresh Siddha
tsc=reliable boot parameter is supposed to skip all the TSC stablility checks during boot time. On a 8-socket system where we want to run an experiment with the "tsc=reliable" boot option, TSC synchronization checks are not getting skipped and marking the TSC as not stable. Check for tsc_clocksource_reliable (which is set via tsc=reliable or for platforms supporting synthetic TSC_RELIABLE feature bit etc) and when set, skip the TSC synchronization tests during boot. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: John Stultz <johnstul@us.ibm.com> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1320446537.15071.14.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86-64: Cleanup some assembly entry pointsJan Beulich
system_call_after_swapgs doesn't really benefit from forcing alignment from it - quite the opposite, native code needlessly so far got a big NOP instruction inserted in front of it. Xen being the only user of the separate entry point can well live with the branch going to three bytes into a cache line. The compatibility mode ptregs entry points for one can make use of the GLOBAL() macro, and should be suitably aligned. Their shared continuation point (ia32_ptregs_common) otoh doesn't need to be global at all, but should continue to be properly aligned. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/4ED4CEEA020000780006407D@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86-64: Slightly shorten line system call entry and exit pathsJan Beulich
GET_THREAD_INFO() involves a memory read immediately followed by an "sub" on the value read, in turn (in several cases) immediately followed by a use of the calculated value as the base address of a memory access. This combination of instructions has a non-negligible potential for stalls. In the system call entry point code, however, the (fixed) offset of the stack pointer from the end of the stack is generally known, and hence we can instead avoid the memory load and subtract, and instead do the memory reference using %rsp as the base register. To do so in a legible fashion, introduce a THREAD_INFO() macro which, provided a register (generally %rsp) and the known offset from the end of the stack, produces a suitable memory access operand. The patch attempts to only touch the fast paths (no auditing and alike), but manages to do so only in the 64-bit entry point case; the compatibility mode entry points have so many interdependencies between their various branch targets that it was necessary to also adjust the slow paths to eliminate the risk of having missed some register dependency during code analysis. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86-64: Reduce amount of redundant code generated for invalidate_interruptNNJan Beulich
Previously these up to 32 entry points, consisting of all the same code except for their very first instruction, consumed 0x70 bytes per instance. Just like for device interrupt entry points, fold them together so that they all use a single instance of the code after having pushed their vector indicator (resulting in 0x10 bytes per instance, to retain 16-byte alignment of the individual entry points). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/4ED4CA230200007800064065@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86-64: Slightly shorten int_ret_from_sys_callJan Beulich
Testing for a return to ring 0 was necessary here solely because of the branch out of ret_from_fork. That branch, however, can be directed to retint_restore_args, and thus the test-and-branch can be eliminated here. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/4ED4C7EE0200007800064028@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Add NumaChip supportSteffen Persvold
Adds support for Numascale NumaChip large-SMP systems. It is needed to enable the booting of more than ~168 cores. v2: - [Steffen] enumerate only accessible northbridges - [Daniel] rediffed and validated against 3.1-rc10 v3: - [Daniel] use x86_init core numbering override - [Daniel] cleanups as per feedback v4: - [Daniel] use updated x86_cpuinit override v5: - drop disabling interrupts locally, as ISR write is atomic; drop delay - added read-mostly annotations where appropriate - require CONFIG_SMP, so drop conditional path Workload tested on 96 cores/16 sockets. Signed-off-by: Steffen Persvold <sp@numascale.com> Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323101246-2400-1-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Add x86_init platform override to fix up NUMA core numberingDaniel J Blueman
Add an x86_init vector for handling inconsistent core numbering. This is useful for multi-fabric platforms, such as Numascale NumaConnect. v2: - use struct x86_cpuinit_ops - provide default fall-back function to warn Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: Steffen Persvold <sp@numascale.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Make flat_init_apic_ldr() availableDaniel J Blueman
Allow flat_init_apic_ldr() to be used outside the compilation unit for similar APIC implementations. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: Steffen Persvold <sp@numascale.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-1-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Reduce clock calibration time during slave cpu startupJack Steiner
Reduce the startup time for slave cpus. Adds hooks for an arch-specific function for clock calibration. These hooks are used on x86. If a newly started cpu has the same phys_proc_id as a core already active, uses the TSC for the delay loop and has a CONSTANT_TSC, use the already-calculated value of loops_per_jiffy. This patch reduces the time required to start slave cpus on a 4096 cpu system from: 465 sec OLD 62 sec NEW This reduces boot time on a 4096p system by almost 7 minutes. Nice... Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Stultz <john.stultz@linaro.org> [fix CONFIG_SMP=n build] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointerH Hartley Sweeten
The last parameter to sort() is a pointer to the function used to swap items. This parameter should be NULL, not 0, when not used. This quiets the following sparse warning: warning: Using plain integer as NULL pointer Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: hartleys@visionengravers.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platformsMathias Nyman
Intel MID x86 platforms have a memory mapped virtual RTC instead. No MID platform have the default ports (and accessing them may do weird stuff). Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: feng.tang@intel.com Cc: Feng Tang <feng.tang@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map()Mike Ditto
Replace the bubble sort in sanitize_e820_map() with a call to the generic kernel sort function to avoid pathological performance with large maps. On large (thousands of entries) E820 maps, the previous code took minutes to run; with this change it's now milliseconds. Signed-off-by: Mike Ditto <mditto@google.com> Cc: sassmann@kpanic.de Cc: yuenn@google.com Cc: Stefan Assmann <sassmann@kpanic.de> Cc: Nancy Yuen <yuenn@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05arch/x86/kernel/ptrace.c: Quiet sparse noiseH Hartley Sweeten
ptrace_set_debugreg() is only used in this file and should be static. This also quiets the following sparse warning: warning: symbol 'ptrace_set_debugreg' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: hartleys@visionengravers.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05Merge branch 'ucode' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp ↵Ingo Molnar
into x86/urgent
2011-12-05x86: Fix "Acer Aspire 1" reboot hangPeter Chubb
Looks like on some Acer Aspire 1s with older bioses, reboot via bios fails. It works on my machine, (with BIOS version 0.3310) but not on some others (BIOS version 0.3309). There's a log of problems at: https://bbs.archlinux.org/viewtopic.php?id=124136 This patch adds a different callback to the reboot quirk table, to allow rebooting via keybaord controller. Reported-by: Uroš Vampl <mobile.leecher@gmail.com> Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1323093233-9481-1-git-send-email-anarsoul@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/mtrr: Resolve inconsistency with Intel processor manualAjaykumar Hotchandani
Following is from Notes of section 11.5.3 of Intel processor manual available at: http://www.intel.com/Assets/PDF/manual/325384.pdf For the Pentium 4 and Intel Xeon processors, after the sequence of steps given above has been executed, the cache lines containing the code between the end of the WBINVD instruction and before the MTRRS have actually been disabled may be retained in the cache hierarchy. Here, to remove code from the cache completely, a second WBINVD instruction must be executed after the MTRRs have been disabled. This patch provides resolution for that. Ideally, I will like to make changes only for Pentium 4 and Xeon processors. But, I am not finding easier way to do it. And, extra wbinvd() instruction does not hurt much for other processors. Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Link: http://lkml.kernel.org/r/4EBD1CC5.3030008@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86, microcode: Fix the failure path of microcode update driver init codeSrivatsa S. Bhat
The microcode update driver's initialization code does not handle failures correctly. This patch fixes this issue. Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111107123530.12164.31227.stgit@srivatsabhat.in.ibm.com Link: http://lkml.kernel.org/r/4ED8E2270200007800065120@nat28.tlf.novell.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-12-05x86/config: Revamp configuration for MID devicesAlan Cox
This follows on from the patch applied in 3.2rc1 which creates an INTEL_MID configuration. We can now add the entry for Medfield specific code. After this is merged the final patch will be submitted which moves the rest of the device Kconfig dependancies to MRST/MEDFIELD/INTEL_MID as appropriate. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Use kmemdup() in copy_thread(), rather than duplicating its implementationThomas Meyer
The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Link: http://lkml.kernel.org/r/1321569820.1624.275.camel@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Replace the EVT_TO_HPET_DEV() macro with an inline functionFerenc Wagner
The original macro worked only when applied to variables named 'evt'. While this could have been fixed by simply renaming the macro argument, a more type-safe replacement is preferred. Signed-off-by: Ferenc Wagner <wferi@niif.hu> Cc: Venkatesh Pallipadi \(Venki\) <venki@google.com> Link: http://lkml.kernel.org/r/8ed5c66c02041226e8cf8b4d5d6b41e543d90bd6.1321626272.git.wferi@niif.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05Add TAINT_FIRMWARE_WORKAROUND on MTRR fixupPrarit Bhargava
TAINT_FIRMWARE_WORKAROUND should be set when an MTRR fixup is done. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1318958650-12447-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/mpparse: Account for bus types other than ISA and PCIBjorn Helgaas
In commit f8924e770e04 ("x86: unify mp_bus_info"), the 32-bit and 64-bit versions of MP_bus_info were rearranged to match each other better. Unfortunately it introduced a regression: prior to that change we used to always set the mp_bus_not_pci bit, then clear it if we found a PCI bus. After it, we set mp_bus_not_pci for ISA buses, clear it for PCI buses, and leave it alone otherwise. In the cases of ISA and PCI, there's not much difference. But ISA is not the only non-PCI bus, so it's better to always set mp_bus_not_pci and clear it only for PCI. Without this change, Dan's Dell PowerEdge 4200 panics on boot with a log indicating interrupt routing trouble unless the "noapic" option is supplied. With this change, the machine boots reliably without "noapic". Fixes http://bugs.debian.org/586494 Reported-bisected-and-tested-by: Dan McGrath <troubledaemon@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # 2.6.26+ Cc: Dan McGrath <troubledaemon@gmail.com> Cc: Alexey Starikovskiy <aystarik@gmail.com> [jrnieder@gmail.com: clarified commit message] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Link: http://lkml.kernel.org/r/20111122215000.GA9151@elie.hsd1.il.comcast.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Fix the 32-bit stackoverflow-debug buildIngo Molnar
The panic_on_stackoverflow variable needs to be avilable on the 32-bit side as well ... Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/20111129060836.11076.12323.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI rebootRafael J. Wysocki
Dell OptiPlex 990 is known to require PCI reboot, so add it to the reboot blacklist in pci_reboot_dmi_table[]. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Link: http://lkml.kernel.org/r/201111160019.51303.rjw@sisk.pl Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Default to vsyscall=emulateAndy Lutomirski
This essentially reverts: 2b666859ec32: x86: Default to vsyscall=native for now The ABI breakage should now be fixed by: commit 48c4206f5b02f28c4c78a1f5b491d3772fb64fb9 Author: Andy Lutomirski <luto@mit.edu> Date: Thu Oct 20 08:48:19 2011 -0700 x86-64: Set siginfo and context on vsyscall emulation faults Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Adrian Bunk <bunk@stusta.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/93154af3b2b6d208906ae02d80d92cf60c6fa94f.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@elte.hu>