summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2014-05-30x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)H. Peter Anvin
The XSAVE instruction family takes a memory argment. The macros use (%edi)/(%rdi) as that memory argument - make that clear to the reader. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-7-git-send-email-fenghua.yu@intel.com
2014-05-29Define kernel API to get address of each state in xsave areaFenghua Yu
In standard form, each state is saved in the xsave area in fixed offset. But in compacted form, offset of each saved state only can be calculated during run time because some xstates may not be enabled and saved. We define kernel API get_xsave_addr() returns address of a given state saved in a xsave area. It can be called in kernel to get address of each xstate in xsave area in either standard format or compacted format. It's useful when kernel wants to directly access each state in xsave area. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-17-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting timeFenghua Yu
__save_fpu() can be called during early booting time when cpu caps are not enabled and alternative can not be used yet. Therefore, it calls xsave_state_booting() during booting time to save xstate to task's xsave area. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-14-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Add xsaves and xrstors support for booting timeFenghua Yu
Since boot_cpu_data and cpu capabilities are not enabled yet during early booting time, alternative can not be used in some functions to access xsave area. Therefore, we define two new functions xrstor_state_booting() and xsave_state_booting() to access xsave area just during early booting time. xrstor_state_booting restores xstate from xsave area during early booting time. xsave_state_booting saves xstate to xsave area during early booting time. The two functions are similar to xrstor_state and xsave_state respectively. But the two functions don't use alternatives because alternatives are not enabled when they are called in such early booting time. xrstor_state_booting is called only by functions defined as __init. So it's defined as __init and will be removed from memory after booting time. There is no extra memory cost caused by this function during running time. But because xsave_state_booting can be called by run-time function __save_fpu(), it's not defined as __init and will stay in memory during running time although it will not be called anymore during running time. It is not ideal to have this function stay in memory during running time. But it's a pretty small function and the memory cost will be small. By doing in this way, we can avoid to change a lot of code to just remove this small function and save a bit memory for running time. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-13-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Use xsave/xrstor for saving and restoring user space contextFenghua Yu
We use legacy xsave/xrstor to save and restore standard form of xsave area in user space context. No xsaveopt or xsaves is used here for two reasons. First, we don't want to use modified optimization which is implemented in xsaveopt and xsaves because xrstor/xrstors might track a wrong user space application. Secondly, we don't use compacted format of xsave area for backward compatibility because legacy user space applications only don't understand the compacted format of the xsave area. Using standard form of the xsave area may allocate more memory for user context than compacted form, but preserves compatibility with legacy applications. Furthermore, even with holes, the relevant cache lines don't get touched and thus the performance impact is limited. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-11-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Use xsaves/xrstors for context switchFenghua Yu
If xsaves is eanbled, use xsaves/xrstors for context switch to support compacted format xsave area to occupy less memory and modified optimization to improve saving performance. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-10-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Use xsaves/xrstors to save and restore xsave areaFenghua Yu
If xsaves is eanbled, use xsaves/xrstors instrucitons to save and restore xstate. xsaves and xrstors support compacted format, init optimization, modified optimization, and supervisor states. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-9-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Define a macro for handling xsave/xrstor instruction faultFenghua Yu
Define a macro to handle fault generated by xsave, xsaveopt, xsaves, xrstor, and xrstors instructions. It is used in functions like xsave_state() etc. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-8-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Define macros for xsave instructionsFenghua Yu
Define macros for xsave, xsaveopt, xsaves, xrstor, and xrstors inline instructions. The instructions will be used for saving and restoring xstate. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-7-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Change compacted format xsave area headerFenghua Yu
The XSAVE area header is changed to support both compacted format and standard format of xsave area. The XSAVE header of an xsave area comprises the 64 bytes starting at offset 512 from the area base address: - Bytes 7:0 of the xsave header is a state-component bitmap called xstate_bv. It identifies the state components in the xsave area. - Bytes 15:8 of the xsave header is a state-component bitmap called xcomp_bv. It is used as follows: - xcomp_bv[63] indicates the format of the extended region of the xsave area. If it is clear, the standard format is used. If it is set, the compacted format is used. - xcomp_bv[62:0] indicate which features (starting at feature 2) have space allocated for them in the compacted format. - Bytes 63:16 of the xsave header are reserved. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-6-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/alternative: Add alternative_input_2 to support alternative with two ↵Fenghua Yu
features and input alternative_input_2() replaces old instruction with new instructions with input based on two features. In alternative_input_2(oldinstr, newinstr1, feature1, newinstr2, feature2, input...), feature2 has higher priority to replace oldinstr than feature1. If CPU has feature2, newinstr2 replaces oldinstr and newinstr2 is executed during run time. If CPU doesn't have feature2, but it has feature1, newinstr1 replaces oldinstr and newinstr1 is executed during run time. If CPU doesn't have feature2 and feature1, oldinstr is executed during run time. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-5-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/xsaves: Detect xsaves/xrstors featureFenghua Yu
Detect the xsaveopt, xsavec, xgetbv, and xsaves features in processor extended state enumberation sub-leaf (eax=0x0d, ecx=1): Bit 00: XSAVEOPT is available Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set Bit 02: Supports XGETBV with ECX = 1 if set Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set The above features are defined in the new word 10 in cpu features. The IA32_XSS MSR (index DA0H) contains a state-component bitmap that specifies the state components that software has enabled xsaves and xrstors to manage. If the bit corresponding to a state component is clear in XCR0 | IA32_XSS, xsaves and xrstors will not operate on that state component, regardless of the value of the instruction mask. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-3-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29x86/cpufeature.h: Reformat x86 feature macrosFenghua Yu
In each X86 feature macro definition, add one space in front of the word number which is a one-digit number currently. The purpose of reformatting the macros is to align one-digit and two-digit word numbers. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-2-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-27PCI: Turn pcibios_penalize_isa_irq() into a weak functionHanjun Guo
pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA is not used by some architectures. Make pcibios_penalize_isa_irq() a __weak function to simplify the code. This removes the need for new platforms to add stub implementations of pcibios_penalize_isa_irq(). [bhelgaas: changelog, comments] Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
2014-05-27ACPICA: Clean up redudant definitions already defined elsewhereLv Zheng
Since mis-order issues have been solved, we can cleanup redundant definitions that already have defaults in <acpi/platform/acenv.h>. This patch removes redudant environments for __KERNEL__ surrounded code. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-27ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of ↵Lv Zheng
<asm/acpi.h> There is a mis-order inclusion for <asm/acpi.h>. As we will enforce including <linux/acpi.h> for all Linux ACPI users, we can find the inclusion order is as follows: <linux/acpi.h> <acpi/acpi.h> <acpi/platform/acenv.h> (acenv.h before including aclinux.h) <acpi/platform/aclinux.h> ........................................................................... (aclinux.h before including asm/acpi.h) <asm/acpi.h> @Redundant@ (ACPICA specific stuff) ........................................................................... ........................................................................... (Linux ACPI specific stuff) ? - - - - - - - - - - - - + (aclinux.h after including asm/acpi.h) @Invisible@ | (acenv.h after including aclinux.h) @Invisible@ | other ACPICA headers @Invisible@ | ............................................................|.............. <acpi/acpi_bus.h> | <acpi/acpi_drivers.h> | <asm/acpi.h> (Excluded) | (Linux ACPI specific stuff) ! <- - - - - - - - - - - - - + NOTE that, in ACPICA, <acpi/platform/acenv.h> is more like Kconfig generated <generated/autoconf.h> for Linux, it is meant to be included before including any ACPICA code. In the above figure, there is a question mark for "Linux ACPI specific stuff" in <asm/acpi.h> which should be included after including all other ACPICA header files. Thus they really need to be moved to the position marked with exclaimation mark or the definitions in the blocks marked with "@Invisible@" will be invisible to such architecture specific "Linux ACPI specific stuff" header blocks. This leaves 2 issues: 1. All environmental definitions in these blocks should have a copy in the area marked with "@Redundant@" if they are required by the "Linux ACPI specific stuff". 2. We cannot use any ACPICA defined types in <asm/acpi.h>. This patch splits architecture specific ACPICA stuff from <asm/acpi.h> to fix this issue. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22x86: fix page fault tracing when KVM guest support enabledDave Hansen
I noticed on some of my systems that page fault tracing doesn't work: cd /sys/kernel/debug/tracing echo 1 > events/exceptions/enable cat trace; # nothing shows up I eventually traced it down to CONFIG_KVM_GUEST. At least in a KVM VM, enabling that option breaks page fault tracing, and disabling fixes it. I tried on some old kernels and this does not appear to be a regression: it never worked. There are two page-fault entry functions today. One when tracing is on and another when it is off. The KVM code calls do_page_fault() directly instead of calling the traced version: > dotraplinkage void __kprobes > do_async_page_fault(struct pt_regs *regs, unsigned long > error_code) > { > enum ctx_state prev_state; > > switch (kvm_read_and_reset_pf_reason()) { > default: > do_page_fault(regs, error_code); > break; > case KVM_PV_REASON_PAGE_NOT_PRESENT: I'm also having problems with the page fault tracing on bare metal (same symptom of no trace output). I'm unsure if it's related. Steven had an alternative to this which has zero overhead when tracing is off where this includes the standard noops even when tracing is disabled. I'm unconvinced that the extra complexity of his apporach: http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home is worth it, expecially considering that the KVM code is already making page fault entry slower here. This solution is dirt-simple. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gleb Natapov <gleb@redhat.com> Cc: kvm@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22KVM: x86: get CPL from SS.DPLPaolo Bonzini
CS.RPL is not equal to the CPL in the few instructions between setting CR0.PE and reloading CS. And CS.DPL is also not equal to the CPL for conforming code segments. However, SS.DPL *is* always equal to the CPL except for the weird case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the value in the STAR MSR, but force CPL=3 (Intel instead forces SS.DPL=SS.RPL=CPL=3). So this patch: - modifies SVM to update the CPL from SS.DPL rather than CS.RPL; the above case with SYSRET is not broken further, and the way to fix it would be to pass the CPL to userspace and back - modifies VMX to always return the CPL from SS.DPL (except forcing it to 0 if we are emulating real mode via vm86 mode; in vm86 mode all DPLs have to be 3, but real mode does allow privileged instructions). It also removes the CPL cache, which becomes a duplicate of the SS access rights cache. This fixes doing KVM_IOCTL_SET_SREGS exactly after setting CR0.PE=1 but before CS has been reloaded. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22KVM: x86: drop set_rflags callbackPaolo Bonzini
Not needed anymore now that the CPL is computed directly during task switch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-21Merge remote-tracking branch 'origin/x86/espfix' into x86/vdsoH. Peter Anvin
Merge x86/espfix into x86/vdso, due to changes in the vdso setup code that otherwise cause conflicts. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21Merge commit '7ed6fb9b5a5510e4ef78ab27419184741169978a' into x86/espfixH. Peter Anvin
Merge in Linus' tree with: fa81511bb0bb x86-64, modify_ldt: Make support for 16-bit segments a runtime option ... reverted, to avoid a conflict. This commit is no longer necessary with the proper fix in place. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-20x86, microcode: Add a disable chicken bitBorislav Petkov
Add a cmdline param which disables the microcode loader. This is useful mostly in debugging situations where we want to turn off microcode loading, both early from the initrd and late, as a means to be able to rule out its influence on the machine. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1400525957-11525-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-20x86, boot: Carve out early cmdline parsing functionBorislav Petkov
Carve out early cmdline parsing function into .../lib/cmdline.c so it can be used by early code in the kernel proper as well. Adapted from arch/x86/boot/cmdline.c. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1400525957-11525-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-20x86, mm: Improve _install_special_mapping and fix x86 vdso namingAndy Lutomirski
Using arch_vma_name to give special mappings a name is awkward. x86 currently implements it by comparing the start address of the vma to the expected address of the vdso. This requires tracking the start address of special mappings and is probably buggy if a special vma is split or moved. Improve _install_special_mapping to just name the vma directly. Use it to give the x86 vvar area a name, which should make CRIU's life easier. As a side effect, the vvar area will show up in core dumps. This could be considered weird and is fixable. [hpa: I say we accept this as-is but be prepared to deal with knocking out the vvars from core dumps if this becomes a problem.] Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-16iommu: dmar: Provide arch specific irq allocationThomas Gleixner
ia64 and x86 share this driver. x86 is moving to a different irq allocation and ia64 keeps its private irq_create/destroy stuff. Use macros to redirect to one or the other. Yes, macros to avoid include hell. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Grant Likely <grant.likely@linaro.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Joerg Roedel <joro@8bytes.org> Cc: x86@kernel.org Cc: linux-ia64@vger.kernel.org Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20140507154336.372289825@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16x86: Get rid of get_nr_irqs_gsi()Thomas Gleixner
No need to expose this outside of the ioapic code. The dynamic allocations are guaranteed not to happen in the gsi space. See commit 62a08ae2a. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Grant Likely <grant.likely@linaro.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20140507154335.959870037@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14x86/traps: Make math_error() staticOleg Nesterov
Trivial, make math_error() static. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14uprobes/x86: Simplify rip-relative handlingDenys Vlasenko
It is possible to replace rip-relative addressing mode with addressing mode of the same length: (reg+disp32). This eliminates the need to fix up immediate and correct for changing instruction length. And we can kill arch_uprobe->def.riprel_target. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-13x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()Anthony Iliopoulos
The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp Suggested-by: Shay Goikhman <shay.goikhman@huawei.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
2014-05-09x86, iosf: Added Quark MBI identifiersOng Boon Leong
Added all the MBI units below and their associated read/write opcodes: - Host Bridge Arbiter - Host Bridge - Remote Management Unit - Memory Manager & eSRAM - SoC Unit Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-3-git-send-email-david.e.box@linux.intel.com Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09x86, iosf: Make IOSF driver modular and usable by more driversDavid E. Box
Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF driver on SOC's without selecting it which forces an unnecessary and limiting dependency. Provides dummy functions to allow these modules to conditionally use the driver on IOSF equipped platforms without impacting their ability to compile and load on non-IOSF platforms. Build default m to ensure availability on x86 SOC's. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macroAndres Freund
The spuriously added semicolon didn't have any effect because the macro isn't currently in use. c0a639ad0bc6b178b46996bd1f821a04643e2bde Signed-off-by: Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-08sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAGPeter Zijlstra
Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word. This will allow us, using fetch_or(), to both set NEED_RESCHED and check for POLLING_NRFLAG in a single operation and avoid pointless wakeups. Changing from the non-atomic thread_info::status flags to the atomic thread_info::flags shouldn't be a big issue since most polling state changes were followed/preceded by a full memory barrier anyway. Also, fix up the apm_32 idle function, clearly that was forgotten in the last conversion. The default idle state is !POLLING so just kill the lot. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Link: http://lkml.kernel.org/n/tip-7yksmqtlv4nfowmlqr1rifoi@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08x86/hpet: Make boot_hpet_disable externFeng Tang
HPET on some platform has accuracy problem. Making "boot_hpet_disable" extern so that we can runtime disable the HPET timer by using quirk to check the platform. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-05x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSOAndy Lutomirski
This makes the 64-bit and x32 vdsos use the same mechanism as the 32-bit vdso. Most of the churn is deleting all the old fixmap code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Move the 32-bit vdso special pages after the textAndy Lutomirski
This unifies the vdso mapping code and teaches it how to map special pages at addresses corresponding to symbols in the vdso image. The new code is used for all vdso variants, but so far only the 32-bit variants use the new vvar page position. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/b6d7858ad7b5ac3fd3c29cab6d6d769bc45d195e.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Reimplement vdso.so preparation in build-time CAndy Lutomirski
Currently, vdso.so files are prepared and analyzed by a combination of objcopy, nm, some linker script tricks, and some simple ELF parsers in the kernel. Replace all of that with plain C code that runs at build time. All five vdso images now generate .c files that are compiled and linked in to the kernel image. This should cause only one userspace-visible change: the loaded vDSO images are stripped more heavily than they used to be. Everything outside the loadable segment is dropped. In particular, this causes the section table and section name strings to be missing. This should be fine: real dynamic loaders don't load or inspect these tables anyway. The result is roughly equivalent to eu-strip's --strip-sections option. The purpose of this change is to enable the vvar and hpet mappings to be moved to the page following the vDSO load segment. Currently, it is possible for the section table to extend into the page after the load segment, so, if we map it, it risks overlapping the vvar or hpet page. This happens whenever the load segment is just under a multiple of PAGE_SIZE. The only real subtlety here is that the old code had a C file with inline assembler that did 'call VDSO32_vsyscall' and a linker script that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most likely worked by accident: the linker script entry defines a symbol associated with an address as opposed to an alias for the real dynamic symbol __kernel_vsyscall. That caused ld to relocate the reference at link time instead of leaving an interposable dynamic relocation. Since the VDSO32_vsyscall hack is no longer needed, I now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it work. vdso2c will generate an error and abort the build if the resulting image contains any dynamic relocations, so we won't silently generate bad vdso images. (Dynamic relocations are a problem because nothing will even attempt to relocate the vdso.) Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.cAndy Lutomirski
This code is used during CPU setup, and it isn't strictly speaking related to the 32-bit vdso. It's easier to understand how this works when the code is closer to its callers. This also lets syscall32_cpu_init be static, which might save some trivial amount of kernel text. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/4e466987204e232d7b55a53ff6b9739f12237461.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Clean up 32-bit vs 64-bit vdso paramsAndy Lutomirski
Rather than using 'vdso_enabled' and an awful #define, just call the parameters vdso32_enabled and vdso64_enabled. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/87913de56bdcbae3d93917938302fc369b05caee.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05net: Change x86_64 add32_with_carry to allow memory operandTom Herbert
Note add32_with_carry(a, b) is suboptimal, as it forces a and b in registers. b could be a memory or a register operand. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05x86_64: csum_add for x86_64Tom Herbert
Add csum_add function for x86_64. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02x86, espfix: Fix broken header guardH. Peter Anvin
Header guard is #ifndef, not #ifdef... Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-01x86, espfix: Move espfix definitions into a separate header fileH. Peter Anvin
Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-04-30x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stackH. Peter Anvin
The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # consider after upstream merge
2014-04-30uprobes/x86: Kill adjust_ret_addr(), simplify UPROBE_FIX_CALL logicOleg Nesterov
The only insn which could have both UPROBE_FIX_IP and UPROBE_FIX_CALL was 0xe8 "call relative", and now it is handled by branch_xol_ops. So we can change default_post_xol_op(UPROBE_FIX_CALL) to simply push the address of next insn == utask->vaddr + insn.length, just we need to record insn.length into the new auprobe->def.ilen member. Note: if/when we teach branch_xol_ops to support jcxz/loopz we can remove the "correction" logic, UPROBE_FIX_IP can use the same address. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30uprobes/x86: Cleanup the usage of arch_uprobe->def.fixups, make it u8Oleg Nesterov
handle_riprel_insn() assumes that nobody else could modify ->fixups before. This is correct but fragile, change it to use "|=". Also make ->fixups u8, we are going to add the new members into the union. It is not clear why UPROBE_FIX_RIP_.X lived in the upper byte, redefine them so that they can fit into u8. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30uprobes/x86: Move default_xol_ops's data into arch_uprobe->defOleg Nesterov
Finally we can move arch_uprobe->fixups/rip_rela_target_address into the new "def" struct and place this struct in the union, they are only used by default_xol_ops paths. The patch also renames rip_rela_target_address to riprel_target just to make this name shorter. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-24arm: xen: implement multicall hypercall support.Ian Campbell
As part of this make the usual change to xen_ulong_t in place of unsigned long. This change has no impact on x86. The Linux definition of struct multicall_entry.result differs from the Xen definition, I think for good reasons, and used a long rather than an unsigned long. Therefore introduce a xen_long_t, which is a long on x86 architectures and a signed 64-bit integer on ARM. Use uint32_t nr_calls on x86 for consistency with the ARM definition. Build tested on amd64 and i386 builds. Runtime tested on ARM. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>