path: root/arch/x86
2013-08-07nEPT: MMU context for nested EPTNadav Har'El
KVM's existing shadow MMU code already supports nested TDP. To use it, we need to set up a new "MMU context" for nested EPT, and create a few callbacks for it (nested_ept_*()). This context should also use the EPT versions of the page table access functions (defined in the previous patch). Then, we need to switch back and forth between this nested context and the regular MMU context when switching between L1 and L2 (when L1 runs this L2 with EPT). Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Add nEPT violation/misconfiguration supportYang Zhang
Inject nEPT faults into the L1 guest. This patch originally comes from Xinhao. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: correctly check if remote tlb flush is needed for shadowed EPT tablesGleb Natapov
need_remote_flush() assumes that a shadow page is in PT64 format, but with the addition of nested EPT this is no longer always true. Fix it by using bit definitions that depend on the host shadow page type. Reported-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Redefine EPT-specific link_shadow_page()Yang Zhang
Since nEPT doesn't support A/D bits, we should not set those bits when building the shadow page table. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Add EPT tables support to paging_tmpl.hNadav Har'El
This is the first patch in a series which adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set its own cr3 and take its own page faults without either of L0 or L1 getting involved. This often significantly improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables). This patch adds EPT support to paging_tmpl.h. paging_tmpl.h contains the code for reading and writing page tables. The code for 32-bit and 64-bit tables is very similar, but not identical, so paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once with PTTYPE=64, and this generates the two sets of similar functions. There are subtle but important differences between the format of EPT tables and that of ordinary x86 64-bit page tables, so for nested EPT we need a third set of functions to read the guest EPT table and to write the shadow EPT table. So this patch adds a third PTTYPE, PTTYPE_EPT, which creates functions (prefixed with "EPT") which correctly read and write EPT tables. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
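The multiple-inclusion trick described above can be illustrated with a small self-contained sketch. In the real tree the template lives in a separate header (paging_tmpl.h) included by mmu.c; here the file includes itself so the whole thing fits in one compilable unit, and the is_present() helper and the PTTYPE_EPT value are hypothetical stand-ins, not the actual KVM code:

  /* pt_demo.c -- compile from this file's directory: gcc -Wall pt_demo.c */
  #ifndef PT_DEMO_DRIVER
  #define PT_DEMO_DRIVER
  #include <stdio.h>

  #define PTTYPE_EPT 18                /* arbitrary tag distinct from 32/64 */

  #define PTTYPE 64
  #include __FILE__                    /* instantiates paging64_is_present() */
  #undef PTTYPE

  #define PTTYPE 32
  #include __FILE__                    /* instantiates paging32_is_present() */
  #undef PTTYPE

  #define PTTYPE PTTYPE_EPT
  #include __FILE__                    /* instantiates ept_is_present() */
  #undef PTTYPE

  int main(void)
  {
      /* Bit 2 set: present for an EPT-style entry (any of R/W/X),
       * not present for a legacy entry, where only bit 0 counts. */
      unsigned long long pte = 0x4;

      printf("paging64 present=%d  ept present=%d\n",
             paging64_is_present(pte), ept_is_present(pte));
      return 0;
  }

  #else /* template body: compiled once per PTTYPE defined above */

  #if PTTYPE == 64
  #define FNAME(name) paging64_##name
  #elif PTTYPE == 32
  #define FNAME(name) paging32_##name
  #elif PTTYPE == PTTYPE_EPT
  #define FNAME(name) ept_##name
  #endif

  static inline int FNAME(is_present)(unsigned long long pte)
  {
  #if PTTYPE == PTTYPE_EPT
      return !!(pte & 0x7);            /* EPT has no Present bit; R/W/X imply it */
  #else
      return !!(pte & 0x1);            /* 32-bit/64-bit formats: Present is bit 0 */
  #endif
  }

  #undef FNAME
  #endif

Each inclusion re-expands the template with a different function-name prefix, which mirrors how mmu.c ends up with 32-bit, 64-bit and EPT variants of the same page-table walker code.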
2013-08-07nEPT: Support shadow paging for guest paging without A/D bitsGleb Natapov
Some guest paging modes do not support A/D bits. Add support for such modes in the shadow paging code. For such modes, PT_GUEST_DIRTY_MASK, PT_GUEST_ACCESSED_MASK, PT_GUEST_DIRTY_SHIFT and PT_GUEST_ACCESSED_SHIFT should be set to zero. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
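A minimal sketch of why zero masks are enough (the helper below is illustrative; only the mask names come from the text above):

  #define PT_GUEST_ACCESSED_MASK 0ULL   /* guest paging mode without A/D bits */
  #define PT_GUEST_DIRTY_MASK    0ULL

  unsigned long long mark_accessed_dirty(unsigned long long gpte, int is_write)
  {
      /* With both masks zero these ORs change nothing, so the shadow
       * paging code keeps a single code path for all guest modes. */
      gpte |= PT_GUEST_ACCESSED_MASK;
      if (is_write)
          gpte |= PT_GUEST_DIRTY_MASK;
      return gpte;
  }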
2013-08-07nEPT: make guest's A/D bits depends on guest's paging modeGleb Natapov
This patch makes the guest A/D bit definitions depend on the paging mode, so that when EPT support is added they can be defined differently. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Move common code to paging_tmpl.hNadav Har'El
In preparation, we just move gpte_access(), prefetch_invalid_gpte(), is_rsvd_bits_set(), protect_clean_gpte() and is_dirty_gpte() from mmu.c to paging_tmpl.h. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Fix wrong test in kvm_set_cr3Nadav Har'El
kvm_set_cr3() attempts to check if the new cr3 is a valid guest physical address. The problem is that with nested EPT, cr3 is an *L2* physical address, not an L1 physical address as this test expects. As the comment above this test explains, it isn't necessary, and doesn't correspond to anything a real processor would do. So this patch removes it. Note that this wrong test could have also theoretically caused problems in nested NPT, not just in nested EPT. However, in practice, the problem was avoided: nested_svm_vmexit()/vmrun() do not call kvm_set_cr3 in the nested NPT case, and instead set the vmcb (and arch.cr3) directly, thus circumventing the problem. Additional potential calls to the buggy function are avoided in that we don't trap cr3 modifications when nested NPT is enabled. However, because in nested VMX we did want to use kvm_set_cr3() (as requested in Avi Kivity's review of the original nested VMX patches), we can't avoid this problem and need to fix it. Reviewed-by: Orit Wasserman <owasserm@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Fix cr3 handling in nested exit and entryNadav Har'El
The existing code for handling cr3 and related VMCS fields during nested exit and entry wasn't correct in all cases: If L2 is allowed to control cr3 (and this is indeed the case in nested EPT), during nested exit we must copy the modified cr3 from vmcs02 to vmcs12, and we forgot to do so. This patch adds this copy. If L0 isn't controlling cr3 when running L2 (i.e., L0 is using EPT), and whoever does control cr3 (L1 or L2) is using PAE, the processor might have saved PDPTEs and we should also save them in vmcs12 (and restore later). Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Support LOAD_IA32_EFER entry/exit controls for L1Nadav Har'El
Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577, switches the EFER MSR when EPT is used and the host and guest have different NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) and want to be able to run recent KVM as L1, we need to allow L1 to use this EFER switching feature. To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former (the latter is still unsupported). Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state, respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in this patch is to properly advertise this feature to L1. Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using vmx_set_efer (which itself sets one of several vmcs02 fields), so we always support this feature, regardless of whether the host supports it. Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07KVM: MMU: fix check the reserved bits on the gpte of L2Xiao Guangrong
Current code always uses arch.mmu to check the reserved bits on a guest gpte, which is valid only for the L1 guest; we should use arch.nested_mmu instead when we translate a gva to a gpa for the L2 guest. Fix it by using @mmu instead, since it is adapted to the current mmu mode automatically. The bug can be triggered when nested NPT is used and the L1 guest and L2 guest use different mmu modes. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07KVM: nVMX: correctly set tr base on nested vmexit emulationGleb Natapov
After commit 21feb4eb64e21f8dc91136b91ee886b978ce6421 tr base is zeroed during vmexit. Set it to L1's HOST_TR_BASE. This should fix https://bugzilla.kernel.org/show_bug.cgi?id=60679 Reported-by: Yongjie Ren <yongjie.ren@intel.com> Reviewed-by: Arthur Chunqi Li <yzt356@gmail.com> Tested-by: Yongjie Ren <yongjie.ren@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-06x86/jump-label: Show where and what was wrong on errorsSteven Rostedt
When modifying text sections for jump labels, a paranoid check is performed. If the check fails, the system "bugs", but why it failed is not shown. The BUG_ON()s in the jump label update code are replaced with bug_at(ip). This is a function that shows which pointer failed and what was at the location of the failure that made the jump label code panic. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
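A userspace sketch of such a diagnostic helper, assuming a hypothetical bug_at() with the same intent (the message format is illustrative, not the kernel's exact output):

  #include <stdio.h>
  #include <stdlib.h>

  /* Dump the five bytes actually found at the failing jump-label site
   * before giving up, so the report shows *what* was unexpected. */
  void bug_at(const unsigned char *ip, int line)
  {
      fprintf(stderr,
              "unexpected op at %p (%02x %02x %02x %02x %02x), line %d\n",
              (void *)ip, ip[0], ip[1], ip[2], ip[3], ip[4], line);
      abort();
  }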
2013-08-06x86/jump-label: Add safety checks to jump label conversionsSteven Rostedt
As with all modification of kernel text, we need to be very paranoid. When converting jump label locations between nops and jumps, a check has been added to make sure what we are replacing is what we expect; otherwise we bug. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-06x86/jump-label: Do not bother updating nops if they are correctSteven Rostedt
On boot up, the jump label init function scans all the jump label locations and converts them to the best nop for the machine. If the nop is already the ideal nop, do not bother with changing it. Cc: Jason Baron <jbaron@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
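A small sketch of the "already ideal" test, assuming the 5-byte P6 atomic nop quoted in the next entry (the helper name and array are illustrative, not the kernel's):

  #include <string.h>

  /* The ideal 5-byte atomic nop picked for this machine at boot
   * (P6_NOP5_ATOMIC on 64-bit). */
  static const unsigned char ideal_nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

  /* Only rewrite the jump-label site when it does not already contain
   * the ideal nop; most sites then need no text patching at all. */
  int nop_needs_update(const unsigned char *ip)
  {
      return memcmp(ip, ideal_nop5, sizeof(ideal_nop5)) != 0;
  }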
2013-08-06x86/jump-label: Use best default nops for inital jump label callsSteven Rostedt
As specified by H. Peter Anvin, the best nops for x86 without knowing the running computer are: 32bit: 0x3e, 0x8d, 0x74, 0x26, 0x00 also known as GENERIC_NOP5_ATOMIC 64bit: 0x0f, 0x1f, 0x44, 0x00, 0x00 also known as P6_NOP5_ATOMIC Currently the default nop that is used by jump label is: 0xe9 0x00 0x00 0x00 0x00 which is really a 5-byte jump to the next position. It's better to use a real nop than a jmp. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-06x86, asmlinkage, vdso: Mark vdso variables __visibleAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-17-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage, power: Make various symbols used by the suspend asm code visibleAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-16-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage: Make 64bit checksum functions visibleAndi Kleen
They are implemented in assembler. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-14-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt opsAndi Kleen
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-13-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage, apm: Make APM data structure used from assembler visibleAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-12-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage: Make syscall tables visibleAndi Kleen
They are referenced from entry*.S. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-11-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage: Make several variables used from assembler/linker script visibleAndi Kleen
Plus one function, load_gs_index(). Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-10-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage: Make kprobes code visible and fix assembler codeAndi Kleen
- Make all the external assembler template symbols __visible - Move the templates' inline assembler code into a top-level assembler statement, not inside a function. This avoids it being optimized away or cloned. Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-8-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage: Make various syscalls asmlinkageAndi Kleen
FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use standard SYSCALL wrappers. But I didn't do that change in this patch. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-7-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage: Make 32bit/64bit __switch_to visibleAndi Kleen
This function is called from inline assembler, so has to be visible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-6-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage: Make _*_start_kernel visibleAndi Kleen
Obviously these functions have to be visible, otherwise the whole kernel could be optimized away. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-5-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage: Make all interrupt handlers asmlinkage / __visibleAndi Kleen
These handlers are all referenced from assembler stubs, so they need to be visible. The handlers without arguments become asmlinkage, the others __visible so as not to force regparm(0) on x86-32. I put it all into a single patch; please let me know if you want it split up. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-4-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
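For context, a hedged sketch of roughly what these annotations amount to on an x86-32 build (the macro definitions are simplified from the kernel's linkage/compiler headers, and the two handlers are made-up examples, not real kernel symbols):

  /* build with: gcc -m32 -c linkage_demo.c */

  /* LTO may discard or clone functions it believes have no C callers;
   * __visible tells the compiler the symbol is referenced externally
   * (here: from an assembler stub). */
  #define __visible   __attribute__((externally_visible))

  /* On x86-32, asmlinkage additionally forces all arguments onto the
   * stack (regparm(0)) so assembler callers know the calling convention. */
  #define asmlinkage  __attribute__((regparm(0))) __visible

  struct pt_regs;                       /* opaque here */

  /* Takes an argument and is called from an asm stub: asmlinkage. */
  asmlinkage void do_example_irq(struct pt_regs *regs) { (void)regs; }

  /* No arguments: __visible is enough; regparm(0) is not forced. */
  __visible void example_notifier(void) { }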
2013-08-06x86, asmlinkage: Change dotraplinkage into __visible on 32bitAndi Kleen
Mark 32bit dotraplinkage functions as __visible for LTO. 64bit already is using asmlinkage which includes it. v2: Clean up (M.Marek) Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86: Fix sys_call_table type in asm/syscall.hAndi Kleen
Make the sys_call_table type defined in asm/syscall.h match the definition in syscall_64.c. v2: include asm/syscall.h in syscall_64.c too. I left uml alone because it doesn't have a syscall.h of its own and including the native one leads to other errors. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-2-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Richard Weinberger <richard@nod.at>
2013-08-06Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull misc x86 fixes from Peter Anvin. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, amd, microcode: Fix error path in apply_microcode_amd() x86, fpu: correct the asm constraints for fxsave, unbreak mxcsr.daz x86, efi: correct call to free_pages x86/iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
2013-08-06x86, insn: Add new opcodes as of June, 2013Masami Hiramatsu
Add TSX-NI related instructions and new instructions to x86-opcode-map.txt according to the Intel(R) 64 and IA-32 Architectures Software Developer's Manual Vol2C (June, 2013). This also includes the updates below. - Fix a typo of MWAIT (the lack of (11B)). - Change NOP Ev to prefetchw Ev - Add CRC32 new prefix style (66&F2) - Add ADCX, ADOX, RDSEED, CLAC and STAC instructions Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20130806073750.4049.12365.stgit@udc4-manage.rcp.hitachi.co.jp Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-05x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errorsTony Luck
The 0x1000 bit of the MCACOD field of machine check MCi_STATUS registers is only defined for corrected errors (where it means that hardware may be filtering errors; see SDM section 15.9.2.1). For uncorrected errors it may or may not be set, so we should mask it out when checking for the architecturally defined recoverable error signatures (see SDM 15.9.3.1 and 15.9.3.2). Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
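A hedged sketch of the masking step (macro and helper names are illustrative, not the kernel's mce-severity table; 0x0134 is the architectural recoverable data-load signature described in those SDM sections):

  #include <stdint.h>
  #include <stdbool.h>

  #define MCACOD(status)       ((uint16_t)((status) & 0xffff))
  #define MCACOD_FILTER_BIT    0x1000    /* only meaningful for corrected errors */
  #define SRAR_DATA_LOAD       0x0134    /* recoverable data-load signature */

  bool is_srar_data_load(uint64_t mci_status)
  {
      /* The filter bit is undefined for uncorrected errors, so drop it
       * before comparing against the recoverable-error signature. */
      return (MCACOD(mci_status) & ~MCACOD_FILTER_BIT) == SRAR_DATA_LOAD;
  }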
2013-08-05x86: Correctly detect hypervisorJason Wang
We try to handle hypervisor compatibility mode by detecting the hypervisor in a specific order. This is not robust, since hypervisors may implement each other's features. This patch tries to handle this situation by always choosing the last one in the CPUID leaves. This is done by letting .detect() return a priority instead of true/false and just re-using the CPUID leaf where the signature was found as the priority (or 1 if it was found by DMI). Then we can just pick the hypervisor with the highest priority. Other, more sophisticated detection methods could also be implemented on top. Suggested by H. Peter Anvin and Paolo Bonzini. Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Doug Covelli <dcovelli@vmware.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
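A self-contained sketch of picking the highest-priority detector (the struct and detector functions are simplified stand-ins for the kernel's hypervisor table, not its actual API):

  #include <stdio.h>

  struct hv_detector {
      const char *name;
      unsigned int (*detect)(void);   /* 0 = not found, else priority,
                                         e.g. the CPUID leaf of the signature */
  };

  static unsigned int detect_none(void)  { return 0; }
  static unsigned int detect_inner(void) { return 0x40000000; }
  static unsigned int detect_outer(void) { return 0x40000100; } /* also emulates inner */

  int main(void)
  {
      struct hv_detector all[] = {
          { "none",  detect_none  },
          { "inner", detect_inner },
          { "outer", detect_outer },
      };
      const struct hv_detector *best = NULL;
      unsigned int best_prio = 0;

      for (unsigned int i = 0; i < sizeof(all) / sizeof(all[0]); i++) {
          unsigned int prio = all[i].detect();
          if (prio > best_prio) {       /* highest priority wins */
              best_prio = prio;
              best = &all[i];
          }
      }
      printf("running on: %s\n", best ? best->name : "bare metal");
      return 0;
  }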
2013-08-05x86, kvm: Switch to use hypervisor_cpuid_base()Jason Wang
Switch to use hypervisor_cpuid_base() to detect KVM. Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-3-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-05xen: Switch to use hypervisor_cpuid_base()Jason Wang
Switch to use hypervisor_cpuid_base() to detect Xen. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-2-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-05x86: Introduce hypervisor_cpuid_base()Jason Wang
This patch introduces hypervisor_cpuid_base(), which loops over the hypervisor CPUID leaves until the signature matches, and checks the number of leaves if required. This can be used by Xen/KVM guests to detect the existence of a hypervisor. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-1-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
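A userspace sketch of that scan, assuming GCC/Clang's <cpuid.h> and the conventional hypervisor CPUID range starting at 0x40000000 (the helper name, the exact bounds and the KVM signature in main() are illustrative here, not taken from the kernel header):

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>
  #include <cpuid.h>                   /* GCC/Clang __cpuid() macro */

  /* Scan the hypervisor CPUID range for a 12-byte signature; return the
   * base leaf on a match, optionally requiring `leaves` extra leaves. */
  uint32_t hv_cpuid_base(const char *sig, uint32_t leaves)
  {
      for (uint32_t base = 0x40000000; base < 0x40010000; base += 0x100) {
          uint32_t eax, signature[3];

          /* The signature is returned in EBX, ECX, EDX. */
          __cpuid(base, eax, signature[0], signature[1], signature[2]);
          if (!memcmp(sig, signature, 12) &&
              (!leaves || eax - base >= leaves))
              return base;
      }
      return 0;                        /* no hypervisor signature found */
  }

  int main(void)
  {
      printf("KVM base leaf: 0x%x\n", hv_cpuid_base("KVMKVMKVM\0\0\0", 0));
      return 0;
  }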
2013-08-05Merging v3.10-rc2 as I need to apply a fix forStefano Stabellini
3cc8e40e8ff8e232a9dd672da81beabd09f87366 "xen/arm: rename xen_secondary_init and run it on every online cpu" The commit is in v3.10-rc2, the current branch is based on v3.10-rc1.
2013-08-05perf/x86: Fix intel QPI uncore event definitionsVince Weaver
John McCalpin reports that the "drs_data" and "ncb_data" QPI uncore events are missing the "extra bit" and always return zero values unless the bit is properly set. More details from him: According to the Xeon E5-2600 Product Family Uncore Performance Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events (including the ones that "perf" calls "drs_data" and "ncb_data") require that the "extra bit" be set. This was confusing for a while -- a note at the bottom of page 94 says that the "extra bit" is bit 16 of the control register. Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must be zero. Looking around a bit, I found that bit 21 appears to be the correct "extra bit", and further investigation shows that "perf" actually agrees with me: [root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event config:0-7,21 So the command # perf -e "uncore_qpi_0/event=drs_data/" Is the same as # perf -e "uncore_qpi_0/event=0x02,umask=0x08/" While it should be # perf -e "uncore_qpi_0/event=0x102,umask=0x08/" I confirmed that this last version gives results that agree with the amount of data that I expected the STREAM benchmark to move across the QPI link in the second (cross-chip) test of the original script. Reported-by: John McCalpin <mccalpin@tacc.utexas.edu> Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Cc: zheng.z.yan@intel.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Paul Mackerras <paulus@samba.org> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-02x86: sysfb: move EFI quirks from efifb to sysfbDavid Herrmann
The EFI FB quirks from efifb.c are useful for simple-framebuffer devices as well. Apply them by default so we can convert efifb.c to use efi-framebuffer platform devices. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Link: http://lkml.kernel.org/r/1375445127-15480-5-git-send-email-dh.herrmann@gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-02x86: provide platform-devices for boot-framebuffersDavid Herrmann
The current situation regarding boot-framebuffers (VGA, VESA/VBE, EFI) on x86 causes troubles when loading multiple fbdev drivers. The global "struct screen_info" does not provide any state-tracking about which drivers use the FBs. request_mem_region() theoretically works, but unfortunately vesafb/efifb ignore it due to quirks for broken boards. Avoid this by creating a platform framebuffer devices with a pointer to the "struct screen_info" as platform-data. Drivers can now create platform-drivers and the driver-core will refuse multiple drivers being active simultaneously. We keep the screen_info available for backwards-compatibility. Drivers can be converted in follow-up patches. Different devices are created for VGA/VESA/EFI FBs to allow multiple drivers to be loaded on distro kernels. We create: - "vesa-framebuffer" for VBE/VESA graphics FBs - "efi-framebuffer" for EFI FBs - "platform-framebuffer" for everything else This allows to load vesafb, efifb and others simultaneously and each picks up only the supported FB types. Apart from platform-framebuffer devices, this also introduces a compatibility option for "simple-framebuffer" drivers which recently got introduced for OF based systems. If CONFIG_X86_SYSFB is selected, we try to match the screen_info against a simple-framebuffer supported format. If we succeed, we create a "simple-framebuffer" device instead of a platform-framebuffer. This allows to reuse the simplefb.c driver across architectures and also to introduce a SimpleDRM driver. There is no need to have vesafb.c, efifb.c, simplefb.c and more just to have architecture specific quirks in their setup-routines. Instead, we now move the architecture specific quirks into x86-setup and provide a generic simple-framebuffer. For backwards-compatibility (if strange formats are used), we still allow vesafb/efifb to be loaded simultaneously and pick up all remaining devices. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Link: http://lkml.kernel.org/r/1375445127-15480-4-git-send-email-dh.herrmann@gmail.com Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-02Merge tag 'please-pull-fix-mce-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/rasLinus Torvalds
Pull MCE fix from Tony Luck: "Fix a regression in mce-severity.c" * tag 'please-pull-fix-mce-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Fix mce regression from recent cleanup
2013-08-01sched/x86: Optimize switch_mm() for multi-threaded workloadsRik van Riel
Dick Fowles, Don Zickus and Joe Mario have been working on improvements to perf, and noticed heavy cache line contention on the mm_cpumask, running linpack on a 60 core / 120 thread system. The cause turned out to be unnecessary atomic accesses to the mm_cpumask. When in lazy TLB mode, the CPU is only removed from the mm_cpumask if there is a TLB flush event. Most of the time, no such TLB flush happens, and the kernel skips the TLB reload. It can also skip the atomic memory set & test. Here is a summary of Joe's test results: * The __schedule function dropped from 24% of all program cycles down to 5.5%. * The cacheline contention/hotness for accesses to that bitmask went from being the 1st/2nd hottest - down to the 84th hottest (0.3% of all shared misses which is now quite cold) * The average load latency for the bit-test-n-set instruction in __schedule dropped from 10k-15k cycles down to an average of 600 cycles. * The linpack program results improved from 133 GFlops to 144 GFlops. Peak GFlops rose from 133 to 153. Reported-by: Don Zickus <dzickus@redhat.com> Reported-by: Joe Mario <jmario@redhat.com> Tested-by: Joe Mario <jmario@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Paul Turner <pjt@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130731221421.616d3d20@annuminas.surriel.com [ Made the comments consistent around the modified code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
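A userspace analogue of the optimization using C11 atomics (this is not the kernel's switch_mm(); it only illustrates why a plain read before the atomic RMW removes most of the cache-line bouncing):

  #include <stdatomic.h>

  /* One bit per CPU, a stand-in for mm_cpumask with a single word of CPUs. */
  static atomic_ulong cpu_mask;

  void mark_cpu_active(unsigned int cpu)
  {
      unsigned long bit = 1UL << cpu;

      /* Cheap shared read first: in the common lazy-TLB case the bit is
       * already set, so we skip the atomic OR that would bounce the
       * cache line between cores. */
      if (!(atomic_load_explicit(&cpu_mask, memory_order_relaxed) & bit))
          atomic_fetch_or(&cpu_mask, bit);
  }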
2013-07-31arch/x86/platform/ce4100/ce4100.c: include reboot.hAndrew Morton
Fix the build: arch/x86/platform/ce4100/ce4100.c: In function 'x86_ce4100_early_setup': arch/x86/platform/ce4100/ce4100.c:165:2: error: 'reboot_type' undeclared (first use in this function) Reported-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31x86, amd, microcode: Fix error path in apply_microcode_amd()Torsten Kaiser
Return -1 (like Intel's apply_microcode) when the loading fails; also, do not set the active microcode level on failure. Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com> Link: http://lkml.kernel.org/r/20130723225823.2e4e7588@googlemail.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-31x86 / tboot / ACPI: Fail extended mode reduced hardware sleepBen Guthro
Register for the extended sleep callback from ACPI. As tboot currently does not support the reduced hardware sleep interface, fail this extended sleep call. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com> Cc: tboot-devel@lists.sourceforge.net Cc: Gang Wei <gang.wei@intel.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-30Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgentIngo Molnar
Pull EFI fix from Matt Fleming: * The size of memory that gets freed by free_pages() needs to be specified in pages, not bytes - by Roy Franz. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-29x86 / cpu topology: remove the stale macro arch_provides_topology_pointersHanjun Guo
Macro arch_provides_topology_pointers is pointless now, remove it. Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-29x86/mce: Fix mce regression from recent cleanupTony Luck
In commit 33d7885b594e169256daef652e8d3527b2298e75 ("x86/mce: Update MCE severity condition check") we simplified the rules for recognising each classification of recoverable machine check, combining the instruction and data fetch rules into a single entry, based on clarifications in the June 2013 SDM that all recoverable events would be reported on the unaffected processor with MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1. Unfortunately the simplified rule has a couple of bugs. Fix them here. Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>