summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2014-05-14x86/traps: Make math_error() staticOleg Nesterov
Trivial, make math_error() static. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14uprobes/x86: Fix scratch register selection for rip-relative fixupsDenys Vlasenko
Before this patch, instructions such as div, mul, shifts with count in CL, cmpxchg are mishandled. This patch adds vex prefix handling. In particular, it avoids colliding with register operand encoded in vex.vvvv field. Since we need to avoid two possible register operands, the selection of scratch register needs to be from at least three registers. After looking through a lot of CPU docs, it looks like the safest choice is SI,DI,BX. Selecting BX needs care to not collide with implicit use of BX by cmpxchg8b. Test-case: #include <stdio.h> static const char *const pass[] = { "FAIL", "pass" }; long two = 2; void test1(void) { long ax = 0, dx = 0; asm volatile("\n" " xor %%edx,%%edx\n" " lea 2(%%edx),%%eax\n" // We divide 2 by 2. Result (in eax) should be 1: " probe1: .globl probe1\n" " divl two(%%rip)\n" // If we have a bug (eax mangled on entry) the result will be 2, // because eax gets restored by probe machinery. : "=a" (ax), "=d" (dx) /*out*/ : "0" (ax), "1" (dx) /*in*/ : "memory" /*clobber*/ ); dprintf(2, "%s: %s\n", __func__, pass[ax == 1] ); } long val2 = 0; void test2(void) { long old_val = val2; long ax = 0, dx = 0; asm volatile("\n" " mov val2,%%eax\n" // eax := val2 " lea 1(%%eax),%%edx\n" // edx := eax+1 // eax is equal to val2. cmpxchg should store edx to val2: " probe2: .globl probe2\n" " cmpxchg %%edx,val2(%%rip)\n" // If we have a bug (eax mangled on entry), val2 will stay unchanged : "=a" (ax), "=d" (dx) /*out*/ : "0" (ax), "1" (dx) /*in*/ : "memory" /*clobber*/ ); dprintf(2, "%s: %s\n", __func__, pass[val2 == old_val + 1] ); } long val3[2] = {0,0}; void test3(void) { long old_val = val3[0]; long ax = 0, dx = 0; asm volatile("\n" " mov val3,%%eax\n" // edx:eax := val3 " mov val3+4,%%edx\n" " mov %%eax,%%ebx\n" // ecx:ebx := edx:eax + 1 " mov %%edx,%%ecx\n" " add $1,%%ebx\n" " adc $0,%%ecx\n" // edx:eax is equal to val3. cmpxchg8b should store ecx:ebx to val3: " probe3: .globl probe3\n" " cmpxchg8b val3(%%rip)\n" // If we have a bug (edx:eax mangled on entry), val3 will stay unchanged. // If ecx:edx in mangled, val3 will get wrong value. : "=a" (ax), "=d" (dx) /*out*/ : "0" (ax), "1" (dx) /*in*/ : "cx", "bx", "memory" /*clobber*/ ); dprintf(2, "%s: %s\n", __func__, pass[val3[0] == old_val + 1 && val3[1] == 0] ); } int main(int argc, char **argv) { test1(); test2(); test3(); return 0; } Before this change all tests fail if probe{1,2,3} are probed. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14uprobes/x86: Simplify rip-relative handlingDenys Vlasenko
It is possible to replace rip-relative addressing mode with addressing mode of the same length: (reg+disp32). This eliminates the need to fix up immediate and correct for changing instruction length. And we can kill arch_uprobe->def.riprel_target. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-13Merge branch 'dt-bus-name' into for-nextRob Herring
2014-05-13x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()Anthony Iliopoulos
The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp Suggested-by: Shay Goikhman <shay.goikhman@huawei.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
2014-05-13net: filter: x86: fix JIT address randomizationAlexei Starovoitov
bpf_alloc_binary() adds 128 bytes of room to JITed program image and rounds it up to the nearest page size. If image size is close to page size (like 4000), it is rounded to two pages: round_up(4000 + 4 + 128) == 8192 then 'hole' is computed as 8192 - (4000 + 4) = 4188 If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header) then kernel will crash during bpf_jit_free(): kernel BUG at arch/x86/mm/pageattr.c:887! Call Trace: [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0 [<ffffffff8106c90c>] worker_thread+0x11c/0x370 since bpf_jit_free() does: unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; struct bpf_binary_header *header = (void *)addr; to compute start address of 'bpf_binary_header' and header->pages will pass junk to: set_memory_rw(addr, header->pages); Fix it by making sure that &header->image[prandom_u32() % hole] and &header are in the same page Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13x86/gpu: Sprinkle const, __init and __initconst to stolen memory quirksVille Syrjälä
gen8_stolen_size() is missing __init, so add it. Also all the intel_stolen_funcs structures can be marked __initconst. intel_stolen_ids[] can also be made const if we replace the __initdata with __initconst. Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-13x86/gpu: Implement stolen memory size early quirk for CHVDamien Lespiau
CHV uses the same bits as SNB/VLV to code the Graphics Mode Select field (GFX stolen memory size) with the addition of finer granularity modes: 4MB increments from 0x11 (8MB) to 0x1d. Values strictly above 0x1d are either reserved or not supported. v2: 4MB increments, not 8MB. 32MB has been omitted from the list of new values (Ville Syrjälä) v3: Also correctly interpret GGMS (GTT Graphics Memory Size) (Ville Syrjälä) v4: Don't assign a value that needs 20bits or more to a u16 (Rafael Barbalho) [vsyrjala: v5: Split from i915 changes and add chv_stolen_funcs] Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Tested-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-12KVM: x86: Fix CR3 reserved bits check in long modeJan Kiszka
Regression of 346874c9: PAE is set in long mode, but that does not mean we have valid PDPTRs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-12xen: refactor suspend pre/post hooksDavid Vrabel
New architectures currently have to provide implementations of 5 different functions: xen_arch_pre_suspend(), xen_arch_post_suspend(), xen_arch_hvm_post_suspend(), xen_mm_pin_all(), and xen_mm_unpin_all(). Refactor the suspend code to only require xen_arch_pre_suspend() and xen_arch_post_suspend(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2014-05-11x86, rdrand: When nordrand is specified, disable RDSEED as wellH. Peter Anvin
One can logically expect that when the user has specified "nordrand", the user doesn't want any use of the CPU random number generator, neither RDRAND nor RDSEED, so disable both. Reported-by: Stephan Mueller <smueller@chronox.de> Cc: Theodore Ts'o <tytso@mit.edu> Link: http://lkml.kernel.org/r/21542339.0lFnPSyGRS@myon.chronox.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09x86, iosf: Add PCI ID macros for better readabilityOng Boon Leong
Introduce PCI IDs macro for the list of supported product: BayTrail & Quark X1000. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-5-git-send-email-david.e.box@linux.intel.com Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09x86, iosf: Add Quark X1000 PCI IDOng Boon Leong
Add PCI device ID, i.e. that of the Host Bridge, for IOSF MBI driver. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-4-git-send-email-david.e.box@linux.intel.com Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09x86, iosf: Added Quark MBI identifiersOng Boon Leong
Added all the MBI units below and their associated read/write opcodes: - Host Bridge Arbiter - Host Bridge - Remote Management Unit - Memory Manager & eSRAM - SoC Unit Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-3-git-send-email-david.e.box@linux.intel.com Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09x86, iosf: Make IOSF driver modular and usable by more driversDavid E. Box
Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF driver on SOC's without selecting it which forces an unnecessary and limiting dependency. Provides dummy functions to allow these modules to conditionally use the driver on IOSF equipped platforms without impacting their ability to compile and load on non-IOSF platforms. Build default m to ensure availability on x86 SOC's. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall()Boris Ostrovsky
With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall() may lose upper bits or, worse, add them since compiler will do this: (u64)(tk->wall_to_monotonic.tv_nsec << tk->shift) instead of ((u64)tk->wall_to_monotonic.tv_nsec << tk->shift) So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in the subsequent 'while' loop. We need an explicit cast. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.com Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-09x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macroAndres Freund
The spuriously added semicolon didn't have any effect because the macro isn't currently in use. c0a639ad0bc6b178b46996bd1f821a04643e2bde Signed-off-by: Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-09x86: Fix typo preventing msr_set/clear_bit from having an effectAndres Freund
Due to a typo the msr accessor function introduced in 22085a66c2fab6cf9b9393c056a3600a6b4735de didn't have any lasting effects because they accidentally wrote the old value back. After c0a639ad0bc6b178b46996bd1f821a04643e2bde this at the very least this causes cpuid limits not to be lifted on some cpus leading to missing capabilities for those. Signed-off-by: Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-2-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-08x86, boot: Remove misc.h inclusion from compressed/string.cVivek Goyal
Given the fact that we removed inclusion of boot.h from boot/string.c does not look like we need misc.h inclusion in compressed/string.c. So remove it. misc.h was also pulling in string_32.h which in turn had macros for memcmp and memcpy. So we don't need to #undef memcmp and memcpy anymore. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1398447972-27896-3-git-send-email-vgoyal@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-08x86, boot: Do not include boot.h in string.cVivek Goyal
string.c does not require whole of boot.h. Just inclusion of linux/types.h and ctypes.h seems to be sufficient. Keep list of stuff being included in string.c to bare minimal so that string.c can be included in other places easily. For example, Currently boot/compressed/string.c includes boot/string.c but looks like it does not want boot/boot.h. Hence there is a define in boot/compressed/misc.h "define BOOT_BOOT_H" which prevents inclusion of boot.h in compressed/string.c. And compressed/string.c is forced to include misc.h just for that reason. So by removing inclusion of boot.h, we can also get rid of inclusion of misch.h in compressed/misc.c. This also enables including of boot/string.c in purgatory/ code relatively easily. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1398447972-27896-2-git-send-email-vgoyal@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-08kvm: x86: emulate monitor and mwait instructions as nopGabriel L. Somlo
Treat monitor and mwait instructions as nop, which is architecturally correct (but inefficient) behavior. We do this to prevent misbehaving guests (e.g. OS X <= 10.7) from crashing after they fail to check for monitor/mwait availability via cpuid. Since mwait-based idle loops relying on these nop-emulated instructions would keep the host CPU pegged at 100%, do NOT advertise their presence via cpuid, to prevent compliant guests from using them inadvertently. Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-08sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAGPeter Zijlstra
Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word. This will allow us, using fetch_or(), to both set NEED_RESCHED and check for POLLING_NRFLAG in a single operation and avoid pointless wakeups. Changing from the non-atomic thread_info::status flags to the atomic thread_info::flags shouldn't be a big issue since most polling state changes were followed/preceded by a full memory barrier anyway. Also, fix up the apm_32 idle function, clearly that was forgotten in the last conversion. The default idle state is !POLLING so just kill the lot. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Link: http://lkml.kernel.org/n/tip-7yksmqtlv4nfowmlqr1rifoi@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08x86/intel: Add quirk to disable HPET for the Baytrail platformFeng Tang
HPET on current Baytrail platform has accuracy problem to be used as reliable clocksource/clockevent, so add a early quirk to disable it. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-2-git-send-email-feng.tang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08x86/hpet: Make boot_hpet_disable externFeng Tang
HPET on some platform has accuracy problem. Making "boot_hpet_disable" extern so that we can runtime disable the HPET timer by using quirk to check the platform. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07x86-64, build: Fix stack protector Makefile breakage with 32-bit userlandGeorge Spelvin
If you are using a 64-bit kernel with 32-bit userland, then scripts/gcc-x86_64-has-stack-protector.sh invokes 32-bit gcc with -mcmodel=kernel, which produces: <stdin>:1:0: error: code model 'kernel' not supported in the 32 bit mode and trips the "broken compiler" test at arch/x86/Makefile:120. There are several places a fix is possible, but the following seems cleanest. (But it's minimal; it would also be possible to factor out a bunch of stuff from the two branches of the if.) Signed-off-by: George Spelvin <linux@horizon.com> Link: http://lkml.kernel.org/r/20140507210552.7581.qmail@ns.horizon.com Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-07kvm/x86: implement hv EOI assistMichael S. Tsirkin
It seems that it's easy to implement the EOI assist on top of the PV EOI feature: simply convert the page address to the format expected by PV EOI. Notes: -"No EOI required" is set only if interrupt injected is edge triggered; this is true because level interrupts are going through IOAPIC which disables PV EOI. In any case, if guest triggers EOI the bit will get cleared on exit. -For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE seems sufficient In any case, bit is cleared on exit so worst case it's never re-enabled -no handling of PV EOI data is performed at HV_X64_MSR_EOI write; HV_X64_MSR_EOI is a separate optimization - it's an X2APIC replacement that lets you do EOI with an MSR and not IO. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07KVM: x86: Mark bit 7 in long-mode PDPTE according to 1GB pages supportNadav Amit
In long-mode, bit 7 in the PDPTE is not reserved only if 1GB pages are supported by the CPU. Currently the bit is considered by KVM as always reserved. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07KVM: vmx: handle_dr does not handle RSP correctlyNadav Amit
The RSP register is not automatically cached, causing mov DR instruction with RSP to fail. Instead the regular register accessing interface should be used. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07KVM: vmx: disable APIC virtualization in nested guestsPaolo Bonzini
While running a nested guest, we should disable APIC virtualization controls (virtualized APIC register accesses, virtual interrupt delivery and posted interrupts), because we do not expose them to the nested guest. Reported-by: Hu Yaohui <loki2441@gmail.com> Suggested-by: Abel Gordon <abel@stratoscale.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07Merge branch 'perf/urgent' into perf/core, to avoid conflictsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07perf/x86/intel: Fix Silvermont's event constraintsYan, Zheng
Event 0x013c is not the same as fixed counter2, remove it from Silvermont's event constraints. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07x86/reboot: Add reboot quirk for Certec BPC600Christian Gmeiner
Certec BPC600 needs reboot=pci to actually reboot. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Li Aubrey <aubrey.li@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Jones <davej@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1399446114-2147-1-git-send-email-christian.gmeiner@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-06KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptrBandan Das
Some checks are common to all, and moreover, according to the spec, the check for whether any bits beyond the physical address width are set are also applicable to all of them Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: fail on invalid vmclear/vmptrld pointerBandan Das
The spec mandates that if the vmptrld or vmclear address is equal to the vmxon region pointer, the instruction should fail with error "VMPTRLD with VMXON pointer" or "VMCLEAR with VMXON pointer" Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: additional checks on vmxon regionBandan Das
Currently, the vmxon region isn't used in the nested case. However, according to the spec, the vmxon instruction performs additional sanity checks on this region and the associated pointer. Modify emulated vmxon to better adhere to the spec requirements Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: rearrange get_vmx_mem_addressBandan Das
Our common function for vmptr checks (in 2/4) needs to fetch the memory address Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-05asmlinkage, x86: Add explicit __visible to arch/x86/*Andi Kleen
As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, build: Don't get confused by local symbolsH. Peter Anvin
arch/x86/crypto/sha1_avx2_x86_64_asm.S introduced _end as a local symbol, which broke the build under certain circumstances. Although the wisdom of _end as a local symbol can definitely be questioned, the build should not break for that reason. Thus, filter the output of nm to only get global symbols of appropriate type. Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-uxm3j3w3odglcwhafwq5tjqu@git.kernel.org
2014-05-05KVM: x86: improve the usability of the 'kvm_pio' tracepointUlrich Obergfell
This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated() and emulator_pio_out_emulated(), and it adds an argument (a pointer to the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched from 'pio_data' (depending on 'size'), and the value is included in the trace record ('val'). If 'count' is greater than one, this is indicated by the string "(...)" in the trace output. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-05x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated commentsAndy Lutomirski
These definitions had no effect. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/946c104e40c47319f8ab406e54118799cb55bd99.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSOAndy Lutomirski
This makes the 64-bit and x32 vdsos use the same mechanism as the 32-bit vdso. Most of the churn is deleting all the old fixmap code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Move the 32-bit vdso special pages after the textAndy Lutomirski
This unifies the vdso mapping code and teaches it how to map special pages at addresses corresponding to symbols in the vdso image. The new code is used for all vdso variants, but so far only the 32-bit variants use the new vvar page position. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/b6d7858ad7b5ac3fd3c29cab6d6d769bc45d195e.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Reimplement vdso.so preparation in build-time CAndy Lutomirski
Currently, vdso.so files are prepared and analyzed by a combination of objcopy, nm, some linker script tricks, and some simple ELF parsers in the kernel. Replace all of that with plain C code that runs at build time. All five vdso images now generate .c files that are compiled and linked in to the kernel image. This should cause only one userspace-visible change: the loaded vDSO images are stripped more heavily than they used to be. Everything outside the loadable segment is dropped. In particular, this causes the section table and section name strings to be missing. This should be fine: real dynamic loaders don't load or inspect these tables anyway. The result is roughly equivalent to eu-strip's --strip-sections option. The purpose of this change is to enable the vvar and hpet mappings to be moved to the page following the vDSO load segment. Currently, it is possible for the section table to extend into the page after the load segment, so, if we map it, it risks overlapping the vvar or hpet page. This happens whenever the load segment is just under a multiple of PAGE_SIZE. The only real subtlety here is that the old code had a C file with inline assembler that did 'call VDSO32_vsyscall' and a linker script that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most likely worked by accident: the linker script entry defines a symbol associated with an address as opposed to an alias for the real dynamic symbol __kernel_vsyscall. That caused ld to relocate the reference at link time instead of leaving an interposable dynamic relocation. Since the VDSO32_vsyscall hack is no longer needed, I now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it work. vdso2c will generate an error and abort the build if the resulting image contains any dynamic relocations, so we won't silently generate bad vdso images. (Dynamic relocations are a problem because nothing will even attempt to relocate the vdso.) Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.cAndy Lutomirski
This code is used during CPU setup, and it isn't strictly speaking related to the 32-bit vdso. It's easier to understand how this works when the code is closer to its callers. This also lets syscall32_cpu_init be static, which might save some trivial amount of kernel text. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/4e466987204e232d7b55a53ff6b9739f12237461.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Clean up 32-bit vs 64-bit vdso paramsAndy Lutomirski
Rather than using 'vdso_enabled' and an awful #define, just call the parameters vdso32_enabled and vdso64_enabled. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/87913de56bdcbae3d93917938302fc369b05caee.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, mm: Ensure correct alignment of the fixmapAndy Lutomirski
The early_ioremap code requires that its buffers not span a PMD boundary. The logic for ensuring that only works if the fixmap is aligned, so assert that it's aligned correctly. To make this work reliably, reserve_top_address needs to be adjusted. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e59a5f4362661f75dd4841fa74e1f2448045e245.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05net: Change x86_64 add32_with_carry to allow memory operandTom Herbert
Note add32_with_carry(a, b) is suboptimal, as it forces a and b in registers. b could be a memory or a register operand. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05x86_64: csum_add for x86_64Tom Herbert
Add csum_add function for x86_64. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-04x86, x32: Use compat shims for io_{setup,submit}Mike Frysinger
The io_setup takes a pointer to a context id of type aio_context_t. This in turn is typed to a __kernel_ulong_t. We could tweak the exported headers to define this as a 64bit quantity for specific ABIs, but since we already have a 32bit compat shim for the x86 ABI, let's just re-use that logic. The libaio package is also written to expect this as a pointer type, so a compat shim would simplify that. The io_submit func operates on an array of pointers to iocb structs. Padding out the array to be 64bit aligned is a huge pain, so convert it over to the existing compat shim too. We don't convert io_getevents to the compat func as its only purpose is to handle the timespec struct, and the x32 ABI uses 64bit times. With this change, the libaio package can now pass its testsuite when built for the x32 ABI. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Link: http://lkml.kernel.org/r/1399250595-5005-1-git-send-email-vapier@gentoo.org Cc: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org> # v3.4+