summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2010-09-03x86: Use {push,pop}{l,q}_cfi in more placesJan Beulich
... plus additionally introduce {push,pop}f{l,q}_cfi. All in the hope that the code becomes better readable this way (it gets quite a bit smaller in any case). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBDA40200007800013FAF@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03i386: Add unwind directives to syscall ptregs stubsJan Beulich
When these stubs are actual functions (i.e. having a return instruction) and have stack manipulation instructions in them, they should also be annotated to allow unwinding through them. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBCF00200007800013F99@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03x86-64: Use symbolics instead of raw numbers in entry_64.SJan Beulich
... making the code a little less fragile. Also use pushq_cfi instead of raw CFI annotations in two more places, and add two missing annotations after stack pointer adjustments which got modified here anyway. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBACF0200007800013F6A@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03x86-64: Adjust frame type at paranoid_exit:Jan Beulich
As this isn't an exception or interrupt entry point, it doesn't have any of the hardware provide frame layouts active. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBAA80200007800013F67@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03x86-64: Fix unwind annotations in syscall stubsJan Beulich
With the return address removed from the stack, these should really refer to their caller's register state. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBA3D0200007800013F61@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03perf, x86: Try to handle unknown nmis with an enabled PMURobert Richter
When the PMU is enabled it is valid to have unhandled nmis, two events could trigger 'simultaneously' raising two back-to-back NMIs. If the first NMI handles both, the latter will be empty and daze the CPU. The solution to avoid an 'unknown nmi' massage in this case was simply to stop the nmi handler chain when the PMU is enabled by stating the nmi was handled. This has the drawback that a) we can not detect unknown nmis anymore, and b) subsequent nmi handlers are not called. This patch addresses this. Now, we check this unknown NMI if it could be a PMU back-to-back NMI. Otherwise we pass it and let the kernel handle the unknown nmi. This is a debug log: cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430 cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616 cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320 cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139 cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100 cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607 cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416 cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032 cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830 cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743 cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552 cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224 cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677 cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772 cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818 cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591 Uhhuh. NMI received for unknown reason 00 on CPU 6. Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue Deltas: nmi #32334 340186 nmi #32335 1327704 nmi #32336 1819 <<<< back-to-back nmi [1] nmi #32337 85961 nmi #32338 284507 nmi #32339 1578809 nmi #32340 217616 nmi #32341 1798 <<<< back-to-back nmi [2] nmi #32342 240913 nmi #32343 1512809 nmi #32344 116672 nmi #32345 412453 nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3] For back-to-back nmi detection there are the following rules: The PMU nmi handler was handling more than one counter and no counter was handled in the subsequent nmi (see [1] and [2] above). There is another case if there are two subsequent back-to-back nmis [3]. The 2nd is detected as back-to-back because the first handled more than one counter. If the second handles one counter and the 3rd handles nothing, we drop the 3rd nmi because it could be a back-to-back nmi. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03perf, x86: Fix handle_irq return valuesPeter Zijlstra
Now that we rely on the number of handled overflows, ensure all handle_irq implementations actually return the right number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: robert.richter@amd.com Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com LKML-Reference: <1283454469-1909-4-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03perf, x86: Fix accidentally ack'ing a second event on intel perf counterDon Zickus
During testing of a patch to stop having the perf subsytem swallow nmis, it was uncovered that Nehalem boxes were randomly getting unknown nmis when using the perf tool. Moving the ack'ing of the PMI closer to when we get the status allows the hardware to properly re-set the PMU bit signaling another PMI was triggered during the processing of the first PMI. This allows the new logic for dealing with the shortcomings of multiple PMIs to handle the extra NMI by 'eat'ing it later. Now one can wonder why are we getting a second PMI when we disable all the PMUs in the begining of the NMI handler to prevent such a case, for that I do not know. But I know the fix below helps deal with this quirk. Tested on multiple Nehalems where the problem was occuring. With the patch, the code now loops a second time to handle the second PMI (whereas before it was not). Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: robert.richter@amd.com Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com LKML-Reference: <1283454469-1909-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-01Merge branch 'urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent
2010-09-01oprofile, x86: fix init_sysfs() function stubRobert Richter
The use of the return value of init_sysfs() with commit 10f0412 oprofile, x86: fix init_sysfs error handling discovered the following build error for !CONFIG_PM: .../linux/arch/x86/oprofile/nmi_int.c: In function ‘op_nmi_init’: .../linux/arch/x86/oprofile/nmi_int.c:784: error: expected expression before ‘do’ make[2]: *** [arch/x86/oprofile/nmi_int.o] Error 1 make[1]: *** [arch/x86/oprofile] Error 2 This patch fixes this. Reported-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-09-01perf, x86, Pentium4: Add RAW events verificationCyrill Gorcunov
Implements verification of - Bits of ESCR EventMask field (meaningful bits in field are hardware predefined and others bits should be set to zero) - INSTR_COMPLETED event (it is available on predefined cpu model only) - Thread shared events (they should be guarded by "perf_event_paranoid" sysctl due to security reason). The side effect of this action is that PERF_COUNT_HW_BUS_CYCLES become a "paranoid" general event. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Tested-by: Lin Ming <ming.m.lin@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100825182334.GB14874@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31oprofile, x86: fix init_sysfs error handlingRobert Richter
On failure init_sysfs() might not properly free resources. The error code of the function is not checked. And, when reinitializing the exit function might be called twice. This patch fixes all this. Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-08-31Merge commit 'v2.6.36-rc3' into x86/memblockIngo Molnar
Conflicts: arch/x86/kernel/trampoline.c mm/memblock.c Merge reason: Resolve the conflicts, update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31x86, iommu: Fix IOMMU_INIT alignment rulesKonrad Rzeszutek Wilk
This boot crash was observed: DMA-API: preallocated 32768 debug entries DMA-API: debugging enabled by kernel config BUG: unable to handle kernel paging request at 19da8955 IP: [<f4ffffff>] 0xf4ffffff *pde = 00000000 The crux of the failure was that even if we did not use any of the .iommu_table section, the linker would still insert it in the vmlinux file. This patch fixes that and also fixes the runtime crash where we would try to access the array. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <1283191802-25086-1-git-send-email-konrad.wilk@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-30x86, kmemcheck: Remove double testJulia Lawall
The opcodes 0x2e and 0x3e are tested for in the first Group 2 line as well. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @expression@ expression E; @@ ( * E || ... || E | * E && ... && E ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegardno@ifi.uio.no> LKML-Reference: <1283010066-20935-5-git-send-email-julia@diku.dk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-27x86, doc: Adding comments about .iommu_table and its neighbors.Konrad Rzeszutek Wilk
Updating the linker section with comments about .iommu_table and some other ones that I know of. CC: Sam Ravnborg <sam@ravnborg.org> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282933173-19960-1-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86: Remove old bootmem codeYinghai Lu
Requested by Ingo, Thomas and HPA. The old bootmem code is no longer necessary, and the transition is complete. Remove it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get ↵Yinghai Lu
correct dma_reserve memblock_memory_size() will return memory size in memblock.memory.region. memblock_free_memory_size() will return free memory size in memblock.memory.region. So We can get exact reseved size in specified range. Set the size right after initmem_init(), because later bootmem API will get area above 16M. (except some fallback). Later after we remove the bootmem, We could call that just before paging_init(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86: Remove not used early_res codeYinghai Lu
and some functions in e820.c that are not used anymore Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Replace e820_/_early string with memblock_Yinghai Lu
1.include linux/memblock.h directly. so later could reduce e820.h reference. 2 this patch is done by sed scripts mainly -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86: Use memblock to replace early_resYinghai Lu
1. replace find_e820_area with memblock_find_in_range 2. replace reserve_early with memblock_x86_reserve_range 3. replace free_early with memblock_x86_free_range. 4. NO_BOOTMEM will switch to use memblock too. 5. use _e820, _early wrap in the patch, in following patch, will replace them all 6. because memblock_x86_free_range support partial free, we can remove some special care 7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill() so adjust some calling later in setup.c::setup_arch() -- corruption_check and mptable_update -v2: Move reserve_brk() early Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range() that could happen We have more then 128 RAM entry in E820 tables, and memblock_x86_fill() could use memblock_find_in_range() to find a new place for memblock.memory.region array. and We don't need to use extend_brk() after fill_memblock_area() So move reserve_brk() early before fill_memblock_area(). -v3: Move find_smp_config early To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable in right place. -v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in memblock.reserved already.. use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later. -v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit active_region for 32bit does include high pages need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped() -v6: Use current_limit instead -v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L -v8: Set memblock_can_resize early to handle EFI with more RAM entries -v9: update after kmemleak changes in mainline Suggested-by: David S. Miller <davem@davemloft.net> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Use memblock_debug to control debug message print outYinghai Lu
Also let memblock_x86_reserve_range/memblock_x86_free_range could print out name if memblock=debug is specified will also print ther name when reserve_memblock_area/free_memblock_area are called. -v2: according to Ingo, put " if (memblock_debug) " in one place Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Add memblock_x86_memory_in_range()Yinghai Lu
It will return memory size in specified range according to memblock.memory.region Try to share some code with memblock_x86_free_memory_in_range() by passing get_free to __memblock_x86_memory_in_range(). -v2: Ben want _in_range in the name instead of size Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Add memblock_x86_free_memory_in_range()Yinghai Lu
It will return free memory size in specified range. We can not use memory_size - reserved_size here, because some reserved area may not be in the scope of memblock.memory.region. Use memblock.memory.region subtracting memblock.reserved.region to get free range array. then count size of all free ranges. -v2: Ben insist on using _in_range Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Add memblock_x86_find_in_range_node()Yinghai Lu
It can be used to find NODE_DATA for numa. Need to make sure early_node_map[] is filled before it is called, otherwise it will fallback to memblock_find_in_range(), with node range. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Add memblock_x86_register_active_regions() and ↵Yinghai Lu
memblock_x86_hole_size() memblock_x86_register_active_regions() will be used to fill early_node_map, the result will be memblock.memory.region AND numa data memblock_x86_hole_size will be used to find hole size on memblock.memory.region with specified range. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Add get_free_all_memory_range()Yinghai Lu
get_free_all_memory_range is for CONFIG_NO_BOOTMEM=y, and will be called by free_all_memory_core_early(). It will use early_node_map aka active ranges subtract memblock.reserved to get all free range, and those ranges will convert to slab pages. -v4: increase range size Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Add memblock_x86_reserve_range/memblock_x86_free_rangeYinghai Lu
They are wrappers for core versions, which take start/end/name instead of base/size. This will make x86 conversion eaasier. could add more debug print out -v2: change get_max_mapped() to memblock.default_alloc_limit according to Michael Ellerman and Ben change to memblock_x86_reserve_range and memblock_x86_free_range according to Michael Ellerman -v3: call check_and_double after reserve/free, so could avoid to use find_memblock_area. Suggested by Michael Ellerman Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Add memblock_x86_to_bootmem()Yinghai Lu
memblock_x86_to_bootmem() will reserve memblock.reserved.region in bootmem after bootmem is set up. We can use it to with all arches that support memblock later. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27bootmem, x86: Add weak version of reserve_bootmem_genericYinghai Lu
It will be used memblock_x86_to_bootmem converting It is an wrapper for reserve_bootmem, and x86 64bit is using special one. Also clean up that version for x86_64. We don't need to take care of numa path for that, bootmem can handle it how Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27x86, memblock: Add memblock_x86_find_in_range_size()Yinghai Lu
size is returned according free range. Will be used to find free ranges for early_memtest and memory corruption check Do not mess it up with lib/memblock.c yet. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27Merge branch 'perf/urgent' into perf/coreFrederic Weisbecker
Conflicts: tools/perf/util/callchain.h Merge reason: Fix a non-trivial conflict with latest fixes
2010-08-26x86, mm: Make spurious_fault check explicitly check the PRESENT bitShaohua Li
pte_present() returns true even present bit isn't set but _PAGE_PROTNONE (global bit) bit is set. While with CONFIG_DEBUG_PAGEALLOC, free pages have global bit set but present bit clear. This patch makes we could catch free pages access with CONFIG_DEBUG_PAGEALLOC enabled. [ hpa: added a comment in the code as a warning to janitors ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <1280217988.32400.75.camel@sli10-desk.sh.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, iommu: Utilize the IOMMU_INIT macros functionality.Konrad Rzeszutek Wilk
We remove all of the sub-platform detection/init routines and instead use on the .iommu_table array of structs to call the .early_init if .detect returned a positive value. Also we can stop detecting other IOMMUs if the IOMMU used the _FINISH type macro. During the 'pci_iommu_init' stage, we call .init for the second-stage initialization if it was defined. Currently only SWIOTLB has this defined and it used to de-allocate the SWIOTLB if the other detected IOMMUs have deemed it unnecessary to use SWIOTLB. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-11-git-send-email-konrad.wilk@oracle.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, GART/AMD-VI: Make AMD GART and IOMMU use IOMMU_INIT_* macros.Konrad Rzeszutek Wilk
We utilize the IOMMU_INIT macros to create this dependency: [null] | [pci_xen_swiotlb_detect] | [pci_swiotlb_detect_override] | [pci_swiotlb_detect_4gb] | +-------+--------+ / \ [detect_calgary] [gart_iommu_hole_init] | [amd_iommu_detect] Meaning that 'amd_iommu_detect' will be called after 'gart_iommu_hole_init'. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-9-git-send-email-konrad.wilk@oracle.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Joerg Roedel <joerg.roedel@amd.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, calgary: Make Calgary IOMMU use IOMMU_INIT_* macros.Konrad Rzeszutek Wilk
We utilize the IOMMU_INIT macros to create this dependency: [pci_xen_swiotlb_detect] | [pci_swiotlb_detect_override] | [pci_swiotlb_detect_4gb] | [detect_calgary] Meaning that 'detect_calgary' is going to be called after 'pci_swiotlb_detect'. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-8-git-send-email-konrad.wilk@oracle.com> CC: Muli Ben-Yehuda <muli@il.ibm.com> CC: "Jon D. Mason" <jdmason@kudzu.us> CC: "Darrick J. Wong" <djwong@us.ibm.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, xen-swiotlb: Make Xen-SWIOTLB use IOMMU_INIT_* macros.Konrad Rzeszutek Wilk
We utilize the IOMMU_INIT macros to create this dependency: [null] | [pci_xen_swiotlb_detect] | [pci_swiotlb_detect_override] | [pci_swiotlb_detect_4gb] In other words, we set 'pci_xen_swiotlb_detect' to be the first detection to be run during start. CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-7-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros.Konrad Rzeszutek Wilk
We utilize the IOMMU_INIT macros to create this dependency: [pci_xen_swiotlb_detect] | [pci_swiotlb_detect_override] | [pci_swiotlb_detect_4gb] And set the SWIOTLB IOMMU_INIT to utilize 'pci_swiotlb_init' for .init and 'pci_swiotlb_late_init' for .late_init. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-6-git-send-email-konrad.wilk@oracle.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, swiotlb: Simplify SWIOTLB pci_swiotlb_detect routine.Konrad Rzeszutek Wilk
In 'pci_swiotlb_detect' we used to do two different things: a). If user provided 'iommu=soft' or 'swiotlb=force' we would set swiotlb=1 and return 1 (and forcing pci-dma.c to call pci_swiotlb_init() immediately). b). If 4GB or more would be detected and if user did not specify iommu=off, we would set 'swiotlb=1' and return whatever 'a)' figured out. We simplify this by splitting a) and b) in two different routines. CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-5-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, iommu: Add proper dependency sort routine (and sanity check).Konrad Rzeszutek Wilk
We are using a very simple sort routine which sorts the .iommu_table array in the order of dependencies. Specifically each structure of iommu_table_entry has a field 'depend' which contains the function pointer to the IOMMU that MUST be run before us. We sort the array of structures so that the struct iommu_table_entry with no 'depend' field are first, and then the subsequent ones are the ones for which the 'depend' function has been already invoked (in other words, precede us). Using the kernel's version 'sort', which is a mergeheap is feasible, but would require making the comparison operator scan recursivly the array to satisfy the "heapify" process: setting the levels properly. The end result would much more complex than it should be an it is just much simpler to utilize this simple sort routine. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-4-git-send-email-konrad.wilk@oracle.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, iommu: Make all IOMMU's detection routines return a value.Konrad Rzeszutek Wilk
We return 1 if the IOMMU has been detected. Zero or an error number if we failed to find it. This is in preperation of using the IOMMU_INIT so that we can detect whether an IOMMU is present. I have not tested this for regression on Calgary, nor on AMD Vi chipsets as I don't have that hardware. CC: Muli Ben-Yehuda <muli@il.ibm.com> CC: "Jon D. Mason" <jdmason@kudzu.us> CC: "Darrick J. Wong" <djwong@us.ibm.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org> CC: David Woodhouse <David.Woodhouse@intel.com> CC: Chris Wright <chrisw@sous-sol.org> CC: Yinghai Lu <yinghai@kernel.org> CC: Joerg Roedel <joerg.roedel@amd.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-3-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and ↵Konrad Rzeszutek Wilk
iommu_table_entry structure This patch set adds a mechanism to "modularize" the IOMMUs we have on X86. Currently the count of IOMMUs is up to six and they have a complex relationship that requires careful execution order. 'pci_iommu_alloc' does that today, but most folks are unhappy with how it does it. This patch set addresses this and also paves a mechanism to jettison unused IOMMUs during run-time. For details that sparked this, please refer to: http://lkml.org/lkml/2010/8/2/282 The first solution that comes to mind is to convert wholesale the IOMMU detection routines to be called during initcall time frame. Unfortunately that misses the dependency relationship that some of the IOMMUs have (for example: for AMD-Vi IOMMU to work, GART detection MUST run first, and before all of that SWIOTLB MUST run). The second solution would be to introduce a registration call wherein the IOMMU would provide its detection/init routines and as well on what MUST run before it. That would work, except that the 'pci_iommu_alloc' which would run through this list, is called during mem_init. This means we don't have any memory allocator, and it is so early that we haven't yet started running through the initcall_t list. This solution borrows concepts from the 2nd idea and from how MODULE_INIT works. A macro is provided that each IOMMU uses to define it's detect function and early_init (before the memory allocate is active), and as well what other IOMMU MUST run before us. Since most IOMMUs depend on having SWIOTLB run first ("pci_swiotlb_detect") a convenience macro to depends on that is also provided. This macro is similar in design to MODULE_PARAM macro wherein we setup a .iommu_table section in which we populate it with the values that match a struct iommu_table_entry. During bootup we will sort through the array so that the IOMMUs that MUST run before us are first elements in the array. And then we just iterate through them calling the detection routine and if appropiate, the init routines. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-2-git-send-email-konrad.wilk@oracle.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changesHaicheng Li
When memory hotplug-adding happens for a large enough area that a new PGD entry is needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some CPUs oopsing like below when they have to access the unmapped areas. [ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534] [ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243229] irq event stamp: 8538759 [ 1139.243230] hardirqs last enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30 [ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146 [ 1139.243240] softirqs last enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146 [ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34 [ 1139.243249] CPU 0: [ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243284] Pid: 6534, comm: bash Tainted: G M 2.6.32-haicheng-cpuhp #7 QSSC-S4R [ 1139.243287] RIP: 0010:[<ffffffff810ace35>] [<ffffffff810ace35>] alloc_arraycache+0x35/0x69 [ 1139.243292] RSP: 0018:ffff8802799f9d78 EFLAGS: 00010286 [ 1139.243295] RAX: ffff8884ffc00000 RBX: ffff8802799f9d98 RCX: 0000000000000000 [ 1139.243297] RDX: 0000000000190018 RSI: 0000000000000001 RDI: ffff8884ffc00010 [ 1139.243300] RBP: ffffffff8100c34e R08: 0000000000000002 R09: 0000000000000000 [ 1139.243303] R10: ffffffff8246dda0 R11: 000000d08246dda0 R12: ffff8802599bfff0 [ 1139.243305] R13: ffff88027904c040 R14: ffff8802799f8000 R15: 0000000000000001 [ 1139.243308] FS: 00007fe81bfe86e0(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000 [ 1139.243311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1139.243313] CR2: ffff8884ffc00000 CR3: 000000026cf2d000 CR4: 00000000000006f0 [ 1139.243316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1139.243318] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1139.243321] Call Trace: [ 1139.243324] [<ffffffff810ace29>] ? alloc_arraycache+0x29/0x69 [ 1139.243328] [<ffffffff8135004e>] ? cpuup_callback+0x1b0/0x32a [ 1139.243333] [<ffffffff8105385d>] ? notifier_call_chain+0x33/0x5b [ 1139.243337] [<ffffffff810538a4>] ? __raw_notifier_call_chain+0x9/0xb [ 1139.243340] [<ffffffff8134ecfc>] ? cpu_up+0xb3/0x152 [ 1139.243344] [<ffffffff813388ce>] ? store_online+0x4d/0x75 [ 1139.243348] [<ffffffff811e53f3>] ? sysdev_store+0x1b/0x1d [ 1139.243351] [<ffffffff8110589f>] ? sysfs_write_file+0xe5/0x121 [ 1139.243355] [<ffffffff810b539d>] ? vfs_write+0xae/0x14a [ 1139.243358] [<ffffffff810b587f>] ? sys_write+0x47/0x6f [ 1139.243362] [<ffffffff8100b9ab>] ? system_call_fastpath+0x16/0x1b This patch makes sure to always replicate new direct mapping PGD entries to the PGDs of all processes, as well as ensures corresponding vmemmap mapping gets synced. V1: initial code by Andi Kleen. V2: fix several issues found in testing. V3: as suggested by Wu Fengguang, reuse common code of vmalloc_sync_all(). [ hpa: changed pgd_change from int to bool ] Originally-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4FD8.6080100@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26x86, mm: Separate x86_64 vmalloc_sync_all() into separate functionsHaicheng Li
No behavior change. Move some of vmalloc_sync_all() code into a new function sync_global_pgds() that will be useful for memory hotplug. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4ECD.1090607@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25x86, bios: Make the x86 early memory reservation a kernel optionH. Peter Anvin
Add a kernel command-line option so the x86 early memory reservation size can be adjusted at runtime instead of only at compile time. Suggested-by: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <tip-d0cd7425fab774a480cce17c2f649984312d0b55@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25x86, tsc: Remove CPU frequency calibration on AMDBorislav Petkov
6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency calibration code for AMD CPUs whose TSCs didn't increment with the core's P0 frequency. From F10h, revB onward, however, the TSC increment rate is denoted by MSRC001_0015[24] and when this bit is set (which should be done by the BIOS) the TSC increments with the P0 frequency so the calibration is not needed and booting can be a couple of mcecs faster on those machines. Besides, there should be virtually no machines out there which don't have this bit set, therefore this calibration can be safely removed. It is a shaky hack anyway since it assumes implicitly that the core is in P0 when BIOS hands off to the OS, which might not always be the case. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100825162823.GE26438@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag tracing/trace_stack: Fix stack trace on ppc64
2010-08-25Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states sched: Fix rq->clock synchronization when migrating tasks
2010-08-25perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flagLin Ming
If on Pentium4 CPUs the FORCE_OVF flag is set then an NMI happens on every event, which can generate a flood of NMIs. Clear it. Reported-by: Vince Weaver <vweaver1@eecs.utk.edu> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25Merge branch 'linus' into perf/coreIngo Molnar
Merge reason: pick up perf fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>