summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2017-02-16x86/mm/ptdump: Add address marker for KASAN shadow regionAndrey Ryabinin
Annotate the KASAN shadow with address markers in page table dump output: $ cat /sys/kernel/debug/kernel_page_tables ... ---[ Vmemmap ]--- 0xffffea0000000000-0xffffea0003000000 48M RW PSE GLB NX pmd 0xffffea0003000000-0xffffea0004000000 16M pmd 0xffffea0004000000-0xffffea0005000000 16M RW PSE GLB NX pmd 0xffffea0005000000-0xffffea0040000000 944M pmd 0xffffea0040000000-0xffffea8000000000 511G pud 0xffffea8000000000-0xffffec0000000000 1536G pgd ---[ KASAN shadow ]--- 0xffffec0000000000-0xffffed0000000000 1T ro GLB NX pte 0xffffed0000000000-0xffffed0018000000 384M RW PSE GLB NX pmd 0xffffed0018000000-0xffffed0020000000 128M pmd 0xffffed0020000000-0xffffed0028200000 130M RW PSE GLB NX pmd 0xffffed0028200000-0xffffed0040000000 382M pmd 0xffffed0040000000-0xffffed8000000000 511G pud 0xffffed8000000000-0xfffff50000000000 7680G pgd 0xfffff50000000000-0xfffffbfff0000000 7339776M ro GLB NX pte 0xfffffbfff0000000-0xfffffbfff0200000 2M pmd 0xfffffbfff0200000-0xfffffbfff0a00000 8M RW PSE GLB NX pmd 0xfffffbfff0a00000-0xfffffbffffe00000 244M pmd 0xfffffbffffe00000-0xfffffc0000000000 2M ro GLB NX pte ---[ KASAN shadow end ]--- 0xfffffc0000000000-0xffffff0000000000 3T pgd ---[ ESPfix Area ]--- ... Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kasan-dev@googlegroups.com Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170214100839.17186-2-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-16x86/mm/ptdump: Optimize check for W+X mappings for CONFIG_KASAN=yAndrey Ryabinin
Enabling both DEBUG_WX=y and KASAN=y options significantly increases boot time (dozens of seconds at least). KASAN fills kernel page tables with repeated values to map several TBs of the virtual memory to the single kasan_zero_page: kasan_zero_pud -> kasan_zero_pmd-> kasan_zero_pte-> kasan_zero_page So, the page table walker used to find W+X mapping check the same kasan_zero_p?d page table entries a lot more than once. With patch pud walker will skip the pud if it has the same value as the previous one . Skipping done iff we search for W+X mappings, so this optimization won't affect the page table dump via debugfs. This dropped time spend in W+X check from ~30 sec to reasonable 0.1 sec: Before: [ 4.579991] Freeing unused kernel memory: 1000K [ 35.257523] x86/mm: Checked W+X mappings: passed, no W+X pages found. After: [ 5.138756] Freeing unused kernel memory: 1000K [ 5.266496] x86/mm: Checked W+X mappings: passed, no W+X pages found. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kasan-dev@googlegroups.com Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170214100839.17186-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-16Merge branch 'linus' into x86/mmThomas Gleixner
Make sure to get the latest fixes before applying the ptdump enhancements.
2017-02-16KVM: Support vCPU-based gfn->hva cacheCao, Lei
Provide versions of struct gfn_to_hva_cache functions that take vcpu as a parameter instead of struct kvm. The existing functions are not needed anymore, so delete them. This allows dirty pages to be logged in the vcpu dirty ring, instead of the global dirty ring, for ring-based dirty memory tracking. Signed-off-by: Lei Cao <lei.cao@stratus.com> Message-Id: <CY1PR08MB19929BD2AC47A291FD680E83F04F0@CY1PR08MB1992.namprd08.prod.outlook.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-16KVM: VMX: use vmcs_set/clear_bits for CPU-based execution controlsPaolo Bonzini
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-16x86/cpufeature: Move RING3MWAIT feature to avoid conflictsThomas Gleixner
The original feature bit is used in a different branch already. Move it to scattered bits. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-16x86/platform/UV/NMI: Fix uneccessary kABI breakagetravis@sgi.com
The addition of support for UV Hubless systems unneccessarily broke the kABI for a symbol that is not used by external kernel modules. Remove the symbol from the EXPORT list. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Link: http://lkml.kernel.org/r/20170215001129.068078379@asylum.americas.sgi.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-15x86/platform/goldfish: Prevent unconditional loadingThomas Gleixner
The goldfish platform code registers the platform device unconditionally which causes havoc in several ways if the goldfish_pdev_bus driver is enabled: - Access to the hardcoded physical memory region, which is either not available or contains stuff which is completely unrelated. - Prevents that the interrupt of the serial port can be requested - In case of a spurious interrupt it goes into a infinite loop in the interrupt handler of the pdev_bus driver (which needs to be fixed seperately). Add a 'goldfish' command line option to make the registration opt-in when the platform is compiled in. I'm seriously grumpy about this engineering trainwreck, which has seven SOBs from Intel developers for 50 lines of code. And none of them figured out that this is broken. Impressive fail! Fixes: ddd70cf93d78 ("goldfish: platform device for x86") Reported-by: Gabriel C <nix.or.die@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-15KVM: svm: inititalize hash table structures directlyDavid Hildenbrand
The hashtable and guarding spinlock are global data structures, we can inititalize them statically. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170124212116.4568-1-david@redhat.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: nVMX: Refactor nested_vmx_run()Jim Mattson
Nested_vmx_run is split into two parts: the part that handles the VMLAUNCH/VMRESUME instruction, and the part that modifies the vcpu state to transition from VMX root mode to VMX non-root mode. The latter will be used when restoring the checkpointed state of a vCPU that was in VMX operation when a snapshot was taken. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: nVMX: Split VMCS checks from nested_vmx_run()Jim Mattson
The checks performed on the contents of the vmcs12 are extracted from nested_vmx_run so that they can be used to validate a vmcs12 that has been restored from a checkpoint. Signed-off-by: Jim Mattson <jmattson@google.com> [Change prepare_vmcs02 and nested_vmx_load_cr3's last argument to u32, to match check_vmentry_postreqs. Update comments for singlestep handling. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: nVMX: Refactor nested_get_vmcs12_pages()Jim Mattson
Perform the checks on vmcs12 state early, but defer the gpa->hpa lookups until after prepare_vmcs02. Later, when we restore the checkpointed state of a vCPU in guest mode, we will not be able to do the gpa->hpa lookups when the restore is done. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: nVMX: Refactor handle_vmptrld()Jim Mattson
Handle_vmptrld is split into two parts: the part that handles the VMPTRLD instruction, and the part that establishes the current VMCS pointer. The latter will be used when restoring the checkpointed state of a vCPU that had a valid VMCS pointer when a snapshot was taken. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: nVMX: Refactor handle_vmon()Jim Mattson
Handle_vmon is split into two parts: the part that handles the VMXON instruction, and the part that modifies the vcpu state to transition from legacy mode to VMX operation. The latter will be used when restoring the checkpointed state of a vCPU that was in VMX operation when a snapshot was taken. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: nVMX: Prepare for checkpointing L2 stateJim Mattson
Split prepare_vmcs12 into two parts: the part that stores the current L2 guest state and the part that sets up the exit information fields. The former will be used when checkpointing the vCPU's VMX state. Modify prepare_vmcs02 so that it can construct a vmcs02 midway through L2 execution, using the checkpointed L2 guest state saved into the cached vmcs12 above. Signed-off-by: Jim Mattson <jmattson@google.com> [Rebasing: add from_vmentry argument to prepare_vmcs02 instead of using vmx->nested.nested_run_pending, because it is no longer 1 at the point prepare_vmcs02 is called. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injectionPaolo Bonzini
Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked", 2015-09-18) the posted interrupt descriptor is checked unconditionally for PIR.ON. Therefore we don't need KVM_REQ_EVENT to trigger the scan and, if NMIs or SMIs are not involved, we can avoid the complicated event injection path. Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been there since APICv was introduced. However, without the KVM_REQ_EVENT safety net KVM needs to be much more careful about races between vmx_deliver_posted_interrupt and vcpu_enter_guest. First, the IPI for posted interrupts may be issued between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts. If that happens, kvm_trigger_posted_interrupt returns true, but smp_kvm_posted_intr_ipi doesn't do anything about it. The guest is entered with PIR.ON, but the posted interrupt IPI has not been sent and the interrupt is only delivered to the guest on the next vmentry (if any). To fix this, disable interrupts before setting vcpu->mode. This ensures that the IPI is delayed until the guest enters non-root mode; it is then trapped by the processor causing the interrupt to be injected. Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu) and vcpu->mode = IN_GUEST_MODE. In this case, kvm_vcpu_kick is called but it (correctly) doesn't do anything because it sees vcpu->mode == OUTSIDE_GUEST_MODE. Again, the guest is entered with PIR.ON but no posted interrupt IPI is pending; this time, the fix for this is to move the RVI update after IN_GUEST_MODE. Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT, though the second could actually happen with VT-d posted interrupts. In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting in another vmentry which would inject the interrupt. This saves about 300 cycles on the self_ipi_* tests of vmexit.flat. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15KVM: x86: do not scan IRR twice on APICv vmentryPaolo Bonzini
Calls to apic_find_highest_irr are scanning IRR twice, once in vmx_sync_pir_from_irr and once in apic_search_irr. Change sync_pir_from_irr to get the new maximum IRR from kvm_apic_update_irr; now that it does the computation, it can also do the RVI write. In order to avoid complications in svm.c, make the callback optional. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15KVM: vmx: move sync_pir_to_irr from apic_find_highest_irr to callersPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15KVM: x86: preparatory changes for APICv cleanupsPaolo Bonzini
Add return value to __kvm_apic_update_irr/kvm_apic_update_irr. Move vmx_sync_pir_to_irr around. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: nVMX: move nested events check to kvm_vcpu_runningPaolo Bonzini
vcpu_run calls kvm_vcpu_running, not kvm_arch_vcpu_runnable, and the former does not call check_nested_events. Once KVM_REQ_EVENT is removed from the APICv interrupt injection path, however, this would leave no place to trigger a vmexit from L2 to L1, causing a missed interrupt delivery while in guest mode. This is caught by the "ack interrupt on exit" test in vmx.flat. [This does not change the calls to check_nested_events in inject_pending_event. That is material for a separate cleanup.] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15KVM: vmx: clear pending interrupts on KVM_SET_LAPICPaolo Bonzini
Pending interrupts might be in the PI descriptor when the LAPIC is restored from an external state; we do not want them to be injected. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15kvm: vmx: Use the hardware provided GPA instead of page walkPaolo Bonzini
As in the SVM patch, the guest physical address is passed by VMX to x86_emulate_instruction already, so mark the GPA as available in vcpu->arch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-14xen/privcmd: Add IOCTL_PRIVCMD_DM_OPPaul Durrant
Recently a new dm_op[1] hypercall was added to Xen to provide a mechanism for restricting device emulators (such as QEMU) to a limited set of hypervisor operations, and being able to audit those operations in the kernel of the domain in which they run. This patch adds IOCTL_PRIVCMD_DM_OP as gateway for __HYPERVISOR_dm_op. NOTE: There is no requirement for user-space code to bounce data through locked memory buffers (as with IOCTL_PRIVCMD_HYPERCALL) since privcmd has enough information to lock the original buffers directly. [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=524a98c2 Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-02-14Merge tag 'v4.10-rc8' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-13x86/vm86: Fix unused variable warning if THP is disabledKirill A. Shutemov
GCC complains about unused variable 'vma' in mark_screen_rdonly() if THP is disabled: arch/x86/kernel/vm86_32.c: In function ‘mark_screen_rdonly’: arch/x86/kernel/vm86_32.c:180:26: warning: unused variable ‘vma’ [-Wunused-variable] struct vm_area_struct *vma = find_vma(mm, 0xA0000); That's silly. pmd_trans_huge() resolves to 0 when THP is disabled, so the whole block should be eliminated. Moving the variable declaration outside the if() block shuts GCC up. Reported-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: Carlos O'Donell <carlos@redhat.com> Link: http://lkml.kernel.org/r/20170213125228.63645-1-kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-11perf/x86/intel: Add Kaby Lake supportSrinivas Pandruvada
Add Kaby Lake mobile and desktop models for RAPL, CSTATE and UNCORE matching Skylake. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: peterz@infradead.org Cc: kan.liang@intel.com Cc: bigeasy@linutronix.de Cc: dave.hansen@linux.intel.com Cc: piotr.luc@intel.com Cc: davidcc@google.com Cc: bp@suse.de Link: http://lkml.kernel.org/r/1486755517-17812-1-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Last minute x86 fixes: - Fix a softlockup detector warning and long delays if using ptdump with KASAN enabled. - Two more TSC-adjust fixes for interesting firmware interactions. - Two commits to fix an AMD CPU topology enumeration bug that caused a measurable gaming performance regression" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/ptdump: Fix soft lockup in page table walker x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST x86/CPU/AMD: Fix Zen SMT topology x86/CPU/AMD: Bring back Compute Unit ID
2017-02-11crypto: sha512-mb - Protect sha512 mb ctx mgr accessTim Chen
The flusher and regular multi-buffer computation via mcryptd may race with another. Add here a lock and turn off interrupt to to access multi-buffer computation state cstate->mgr before a round of computation. This should prevent the flusher code jumping in. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-10PCI/MSI: Remove pci_msi_domain_{alloc,free}_irqs()Christoph Hellwig
Just call the msi_* version directly instead of having trivial wrappers for one or two callsites. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10drivers: hv: Turn off write permission on the hypercall pageK. Y. Srinivasan
The hypercall page only needs to be executable but currently it is setup to be writable as well. Fix the issue. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Tested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-10hv: export current Hyper-V clocksourceVitaly Kuznetsov
As a preparation to implementing Hyper-V PTP device supporting .getcrosststamp we need to export a reference to the current Hyper-V clocksource in use (MSR or TSC page). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-10Drivers: hv: Fix the bug in generating the guest IDK. Y. Srinivasan
Fix the bug in the generation of the guest ID. Without this fix the host side telemetry code is broken. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Fixes: 352c9624242d ("Drivers: hv: vmbus: Move the definition of generate_guest_id()") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-10x86/mm/ptdump: Fix soft lockup in page table walkerAndrey Ryabinin
CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow. In that case ptdump_walk_pgd_level_core() takes a lot of time to walk across all page tables and doing this without a rescheduling causes soft lockups: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1] ... Call Trace: ptdump_walk_pgd_level_core+0x40c/0x550 ptdump_walk_pgd_level_checkwx+0x17/0x20 mark_rodata_ro+0x13b/0x150 kernel_init+0x2f/0x120 ret_from_fork+0x2c/0x40 I guess that this issue might arise even without KASAN on huge machines with several terabytes of RAM. Stick cond_resched() in pgd loop to fix this. Reported-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170210095405.31802-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliableThomas Gleixner
When the TSC is marked reliable then the synchronization check is skipped, but that also skips the TSC ADJUST sanitizing code. So on a machine with a wreckaged BIOS the TSC deviation between CPUs might go unnoticed. Let the TSC adjust sanitizing code run unconditionally and just skip the expensive synchronization checks when TSC is marked reliable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Olof Johansson <olof@lixom.net> Link: http://lkml.kernel.org/r/20170209151231.491189912@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10x86/tsc: Avoid the large time jump when sanitizing TSC ADJUSTThomas Gleixner
Olof reported that on a machine which has a BIOS wreckaged TSC the timestamps in dmesg are making a large jump because the TSC value is jumping forward after resetting the TSC ADJUST register to a sane value. This can be avoided by calling the TSC ADJUST saniziting function before initializing the per cpu sched clock machinery. That takes the offset into account and avoid the time jump. What cannot be avoided is that the 'Firmware Bug' warnings on the secondary CPUs are printed with the large time offsets because it would be too much effort and ugly hackery to print those warnings into a buffer and emit them after the adjustemt on the starting CPUs. It's a firmware bug and should be fixed in firmware. The weird timestamps are collateral damage and just illustrate the sillyness of the BIOS folks: [ 0.397445] smp: Bringing up secondary CPUs ... [ 0.402100] x86: Booting SMP configuration: [ 0.406343] .... node #0, CPUs: #1 [1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU1: -2978888639183101 [1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2978888639183101 [ 0.508119] #2 [1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU2: -2978888639183677 [1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2978888639183677 [ 0.607643] #3 [1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU3: -2978888639184530 [1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2978888639184530 [ 0.707108] smp: Brought up 1 node, 4 CPUs [ 0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS) Reported-by: Olof Johansson <olof@lixom.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170209151231.411460506@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-09KVM: x86: hide KVM_HC_CLOCK_PAIRING on 32 bitArnd Bergmann
The newly added hypercall doesn't work on x86-32: arch/x86/kvm/x86.c: In function 'kvm_pv_clock_pairing': arch/x86/kvm/x86.c:6163:6: error: implicit declaration of function 'kvm_get_walltime_and_clockread';did you mean 'kvm_get_time_scale'? [-Werror=implicit-function-declaration] This adds an #ifdef around it, matching the one around the related functions that are also only implemented on 64-bit systems. Fixes: 55dd00a73a51 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-09Merge tag 'kvmarm-for-4.11' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD kvmarm updates for 4.11 - GICv3 save restore - Cache flushing fixes - MSI injection fix for GICv3 ITS - Physical timer emulation support
2017-02-08Revert "x86/ioapic: Restore IO-APIC irq_chip retrigger callback"Linus Torvalds
This reverts commit 020eb3daaba2857b32c4cf4c82f503d6a00a67de. Gabriel C reports that it causes his machine to not boot, and we haven't tracked down the reason for it yet. Since the bug it fixes has been around for a longish time, we're better off reverting the fix for now. Gabriel says: "It hangs early and freezes with a lot RCU warnings. I bisected it down to : > Ruslan Ruslichenko (1): > x86/ioapic: Restore IO-APIC irq_chip retrigger callback Reverting this one fixes the problem for me.. The box is a PRIMERGY TX200 S5 , 2 socket , 2 x E5520 CPU(s) installed" and Ruslan and Thomas are currently stumped. Reported-and-bisected-by: Gabriel C <nix.or.die@gmail.com> Cc: Ruslan Ruslichenko <rruslich@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org # for the backport of the original commit Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-08kvmclock: export kvmclock clocksource and data pointersMarcelo Tosatti
To be used by KVM PTP driver. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-08KVM: x86: fix compilationPaolo Bonzini
Fix rebase breakage from commit 55dd00a73a51 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall", 2017-01-24), courtesy of the "I could have sworn I had pushed the right branch" department. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-07arch: Move CONFIG_DEBUG_RODATA and CONFIG_SET_MODULE_RONX to be commonLaura Abbott
There are multiple architectures that support CONFIG_DEBUG_RODATA and CONFIG_SET_MODULE_RONX. These options also now have the ability to be turned off at runtime. Move these to an architecture independent location and make these options def_bool y for almost all of those arches. Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-02-07KVM: x86: add KVM_HC_CLOCK_PAIRING hypercallMarcelo Tosatti
Add a hypercall to retrieve the host realtime clock and the TSC value used to calculate that clock read. Used to implement clock synchronization between host and guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-07KVM: nVMX: vmx_complete_nested_posted_interrupt() can't failDavid Hildenbrand
vmx_complete_nested_posted_interrupt() can't fail, let's turn it into a void function. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-07KVM: nVMX: kmap() can't failDavid Hildenbrand
kmap() can't fail, therefore it will always return a valid pointer. Let's just get rid of the unnecessary checks. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-07xen/pvh: Use Xen's emergency_restart op for PVH guestsBoris Ostrovsky
Using native_machine_emergency_restart (called during reboot) will lead PVH guests to machine_real_restart() where we try to use real_mode_header which is not initialized. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
2017-02-07xen/pvh: PVH guests always have PV devicesBoris Ostrovsky
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-02-07xen/pvh: Make sure we don't use ACPI_IRQ_MODEL_PIC for SCIBoris Ostrovsky
Since we are not using PIC and (at least currently) don't have IOAPIC we want to make sure that acpi_irq_model doesn't stay set to ACPI_IRQ_MODEL_PIC (which is the default value). If we allowed it to stay then acpi_os_install_interrupt_handler() would try (and fail) to request_irq() for PIC. Instead we set the model to ACPI_IRQ_MODEL_PLATFORM which will prevent this from happening. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
2017-02-07xen/pvh: Bootstrap PVH guestBoris Ostrovsky
Start PVH guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall page, initialize boot_params, enable early page tables. Since this stub is executed before kernel entry point we cannot use variables in .bss which is cleared by kernel. We explicitly place variables that are initialized here into .data. While adjusting xen_hvm_init_shared_info() make it use cpuid_e?x() instead of cpuid() (wherever possible). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
2017-02-07xen/x86: Remove PVH supportBoris Ostrovsky
We are replacing existing PVH guests with new implementation. We are keeping xen_pvh_domain() macro (for now set to zero) because when we introduce new PVH implementation later in this series we will reuse current PVH-specific code (xen_pvh_gnttab_setup()), and that code is conditioned by 'if (xen_pvh_domain())'. (We will also need a noop xen_pvh_domain() for !CONFIG_XEN_PVH). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-02-07x86/boot/32: Convert the 32-bit pgtable setup code from assembly to CBoris Ostrovsky
The new Xen PVH entry point requires page tables to be setup by the kernel since it is entered with paging disabled. Pull the common code out of head_32.S so that mk_early_pgtbl_32() can be invoked from both the new Xen entry point and the existing startup_32() code. Convert resulting common code to C. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: matt@codeblueprint.co.uk Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1481215471-9639-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>