Age | Commit message (Collapse) | Author |
|
During nested vmentry into vm86 mode a vcpu state is found to be incorrect
because rflags does not have VM flag set since it is read from the cache
and has L1's value instead of L2's. If emulate_invalid_guest_state=1 L0
KVM tries to emulate it, but emulation does not work for nVMX and it
never should happen anyway. Fix that by using vmx_set_rflags() to set
rflags during nested vmentry which takes care of updating register cache.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This lets debugging work better during emulation of invalid
guest state.
This time the check is done after emulation, but before writeback
of the flags; we need to check the flags *before* execution of the
instruction, we cannot check singlestep_rip because the CS base may
have already been modified.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Conflicts:
arch/x86/kvm/x86.c
|
|
This lets debugging work better during emulation of invalid
guest state.
The check is done before emulating the instruction, and (in the case
of guest debugging) reuses EMULATE_DO_MMIO to exit with KVM_EXIT_DEBUG.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The next patch will reuse it for other userspace exits than MMIO,
namely debug events.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The implementation of function __acpi_map_table() has been
changed long time ago, and now it directly invokes
early_ioremap() to setup the temporarily acpi table mappings.
So correct its out-of-date comment.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: len.brown@intel.com
Cc: pavel@ucw.cz
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/51EE7F1C.9020506@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We can check for addr being zero earlier and thus avoid the mutex_unlock()
cleanup path.
[bhelgaas: drop warning printk]
Signed-off-by: ethan.zhao <ethan.zhao@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
|
|
GCC will optimize mxcsr_feature_mask_init in arch/x86/kernel/i387.c:
memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
asm volatile("fxsave %0" : : "m" (fx_scratch));
mask = fx_scratch.mxcsr_mask;
if (mask == 0)
mask = 0x0000ffbf;
to
memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
asm volatile("fxsave %0" : : "m" (fx_scratch));
mask = 0x0000ffbf;
since asm statement doesn’t say it will update fx_scratch. As the
result, the DAZ bit will be cleared. This patch fixes it. This bug
dates back to at least kernel 2.6.12.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
|
|
Specify memory size in pages, not bytes.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
The target frequency calculation method in the ondemand governor has
changed and it is now independent of the measured average frequency.
Consequently, the APERF/MPERF support in cpufreq is not used any
more, so drop it.
[rjw: Changelog]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This patch fixes warning and errors found by checkpatch.pl:
* replace asm/acpi.h, asm/io.h and asm/smp.h with linux/acpi.h,
linux/io.h and linux/smp.h respectively
* remove explicit initialization to 0 of a static global variable
* replace printk(KERN_INFO ...) with pr_info
* use tabs instead of spaces for indentation
* arrange comments so that they adhere to Documentation/CodingStyle
[bhelgaas: capitalize "PCI", "Langwell", "Lincroft" consistently]
Signed-off-by: Valentina Manea <valentina.manea.m@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
|
|
Both have no users anymore.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
If posted interrupts are enabled, we can no longer track if an IRQ was
coalesced based on IRR. So drop this logic also from the classic
software path and simplify apic_test_and_set_irr to apic_set_irr.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Pull crypto fixes from Herbert Xu:
"This push fixes a memory corruption issue in caam, as well as
reverting the new optimised crct10dif implementation as it breaks boot
on initrd systems.
Hopefully crct10dif will be reinstated once the supporting code is
added so that it doesn't break boot"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
Revert "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework"
crypto: caam - Fixed the memory out of bound overwrite issue
|
|
On some PAE architectures, the entire range of physical memory could reside
outside the 32-bit limit. These systems need the ability to specify the
initrd location using 64-bit numbers.
This patch globally modifies the early_init_dt_setup_initrd_arch() function to
use 64-bit numbers instead of the current unsigned long.
There has been quite a bit of debate about whether to use u64 or phys_addr_t.
It was concluded to stick to u64 to be consistent with rest of the device
tree code. As summarized by Geert, "The address to load the initrd is decided
by the bootloader/user and set at that point later in time. The dtb should not
be tied to the kernel you are booting"
More details on the discussion can be found here:
https://lkml.org/lkml/2013/6/20/690
https://lkml.org/lkml/2012/9/13/544
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
|
|
transform framework"
This reverts commits
67822649d7305caf3dd50ed46c27b99c94eff996
39761214eefc6b070f29402aa1165f24d789b3f7
0b95a7f85718adcbba36407ef88bba0a7379ed03
31d939625a9a20b1badd2d4e6bf6fd39fa523405
2d31e518a42828df7877bca23a958627d60408bc
Unfortunately this change broke boot on some systems that used an
initrd which does not include the newly created crct10dif modules.
As these modules are required by sd_mod under certain configurations
this is a serious problem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
For modern CPUs, perf clock is directly related to TSC. TSC
can be calculated from perf clock and vice versa using a simple
calculation. Two of the three componenets of that calculation
are already exported in struct perf_event_mmap_page. This patch
exports the third.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Recently we added an early quirk to detect 5500/5520 chipsets
with early revisions that had problems with irq draining with
interrupt remapping enabled:
commit 03bbcb2e7e292838bb0244f5a7816d194c911d62
Author: Neil Horman <nhorman@tuxdriver.com>
Date: Tue Apr 16 16:38:32 2013 -0400
iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
It turns out this same problem is present in the intel X58
chipset as well. See errata 69 here:
http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html
This patch extends the pci early quirk so that the chip
devices/revisions specified in the above update are also covered
in the same way:
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Donald Dutile <ddutile@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Malcolm Crossley <malcolm.crossley@citrix.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1374059639-8631-1-git-send-email-nhorman@tuxdriver.com
[ Small edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 3fe26fa ("x86: get rid of pt_regs argument in sigreturn variants",
from 2012-11-12) changed the body of PTREGSCALL to drop arg, and
updated the callsites; unfortunately, it forgot to update the
macro argument list, leaving an unused argument. Fix this.
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1373479468-7175-1-git-send-email-artagnon@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based
instruction patching"), the mechanism that was introduced for
notifying alternatives code from int3 exception handler that and
exception occured was die_notifier.
This is however problematic, as early code might be using jump
labels even before the notifier registration has been performed,
which will then lead to an oops due to unhandled exception. One
of such occurences has been encountered by Fengguang:
int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
Stack:
ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
Call Trace:
<IRQ>
[<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
[<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
[<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
[<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
[<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
<EOI>
[<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
[<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
[<ffffffff81015f10>] default_idle+0x147/0x282
[<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
[<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
[<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
[<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
[...]
Fix this by directly calling poke_int3_handler() from the int3
exception handler (analogically to what ftrace has been doing
already), instead of relying on notifier, registration of which
might not have yet been finalized by the time of the first trap.
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We wanted to check if the APIC ID is out of range. It should be:
if (id >= MAX_LOCAL_APIC)
There's no known bad effect of this bug.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: pavel@ucw.cz
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/1374566419-21120-1-git-send-email-tangchen@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In hotplug case (especially with Thunderbolt enabled systems) we might need
to call pcibios_resource_survey_bus() several times for a bus. The function
ends up calling pci_claim_resource() for each bridge resource that then
fails claiming that the resource exists already (which it does). Once this
happens the resource is invalidated thus preventing devices behind the
bridge to allocate their resources.
To fix this we do what has been done in pcibios_allocate_dev_resources()
and check 'parent' of the given resource. If it is non-NULL it means that
the resource has been allocated already and we can skip it. We do the same
for ROM resources as well.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
CONFIG_ARCH_MEMORY_PROBE enables the
/sys/devices/system/memory/probe interface, which allows a given
memory address to be hot-added as follows:
# echo start_address_of_new_memory > /sys/devices/system/memory/probe
(See Documentation/memory-hotplug.txt for more details.)
This probe interface is required on powerpc. On x86, however,
ACPI notifies a memory hotplug event to the kernel, which
performs its hotplug operation as the result.
Therefore, regular users do not need this interface on x86. This probe
interface is also error-prone and misleading that the kernel blindly
adds a given memory address without checking if the memory is present
on the system; no probing is done despite of its name.
The kernel crashes when a user requests to online a memory block
that is not present on the system. This interface is currently
used for testing as it can fake a hotplug event.
This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86,
adds its Kconfig menu entry on x86, and clarifies its use in
Documentation/ memory-hotplug.txt.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: linux-mm@kvack.org
Cc: dave@sr71.net
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: tangchen@cn.fujitsu.com
Cc: vasilis.liaskovitis@profitbricks.com
Link: http://lkml.kernel.org/r/1374256068-26016-1-git-send-email-toshi.kani@hp.com
[ Edited it slightly. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull UML fixes from Richard Weinberger:
"Special thanks goes to Toralf Föster for continuously testing UML and
reporting issues!"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove dead code
um: siginfo cleanup
uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases
um: Fix wait_stub_done() error handling
um: Mark stub pages mapping with VM_PFNMAP
um: Fix return value of strnlen_user()
|
|
Pull KVM fix from Paolo Bonzini:
"This single patch fixes a regression caused by one of the
optimizations introduced in 3.11, which is generally visible only on
AMD processors"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: avoid fast page fault fixing mmio page fault
|
|
[KVM maintainers:
The underlying support for this is in perf/core now. So please merge
this patch into the KVM tree.]
This is not arch perfmon, but older CPUs will just ignore it. This makes
it possible to do at least some TSX measurements from a KVM guest
v2: Various fixes to address review feedback
v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits.
v4: Use reserved bits for #GP
v5: Remove obsolete argument
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
"me" is not used.
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Since introducing the text_poke_bp() for all text_poke_smp*()
callers, text_poke_smp*() are now unused. This patch basically
reverts:
3d55cc8a058e ("x86: Add text_poke_smp for SMP cross modifying code")
7deb18dcf047 ("x86: Introduce text_poke_smp_batch() for batch-code modifying")
and related commits.
This patch also fixes a Kconfig dependency issue on STOP_MACHINE
in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Borislav Petkov <bpetkov@suse.de>
Link: http://lkml.kernel.org/r/20130718114753.26675.18714.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Use text_poke_bp() for optimizing kprobes instead of
text_poke_smp*(). Since the number of kprobes is usually not so
large (<100) and text_poke_bp() is much lighter than
text_poke_smp() [which uses stop_machine()], this just stops
using batch processing.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Borislav Petkov <bpetkov@suse.de>
Link: http://lkml.kernel.org/r/20130718114750.26675.9174.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Remove a comment about an int3 issue in NMI/MCE, since
commit:
3f3c8b8c4b2a ("x86: Add workaround to NMI iret woes")
already fixed that. Keeping this incorrect comment can mislead developers.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Borislav Petkov <bpetkov@suse.de>
Link: http://lkml.kernel.org/r/20130718114747.26675.84110.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Upcoming kprobes patches rely on the int3 code-patching machinery introduced by:
fd4363fff3d9 x86: Introduce int3 (breakpoint)-based instruction patching
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Trying again to get the fixes queue, including the fixed IDT alignment
patch.
The UEFI patch is by far the biggest issue at hand: it is currently
causing quite a few machines to boot. Which is sad, because the only
reason they would is because their BIOSes touch memory that has
already been freed. The other major issue is that we finally have
tracked down the root cause of a significant number of machines
failing to suspend/resume"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Make sure IDT is page aligned
x86, suspend: Handle CPUs which fail to #GP on RDMSR
x86/platform/ce4100: Add header file for reboot type
Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()"
efivars: check for EFI_RUNTIME_SERVICES
|
|
When L2 exits to L1, segment infomations of L1 are not set correctly.
According to Intel SDM 27.5.2(Loading Host Segment and Descriptor
Table Registers), segment base/limit/access right of L1 should be
set to some designed value when L2 exits to L1. This patch fixes
this.
Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
Reviewed-by: Gleb Natapov <gnatapov@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Linux as a guest on KVM hypervisor, the only user of the pvclock
vsyscall interface, does not require notification on task migration
because:
1. cpu ID number maps 1:1 to per-CPU pvclock time info.
2. per-CPU pvclock time info is updated if the
underlying CPU changes.
3. that version is increased whenever underlying CPU
changes.
Which is sufficient to guarantee nanoseconds counter
is calculated properly.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
This patch simulate this MSR in nested_vmx and the default value is
0x0. BIOS should set it to 0x5 before VMXON. After setting the lock
bit, write to it will cause #GP(0).
Another QEMU patch is also needed to handle emulation of reset
and migration. Reset to vCPU should clear this MSR and migration
should reserve value of it.
This patch is based on Nadav's previous commit.
http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478
Signed-off-by: Nadav Har'El <nyh@math.technion.ac.il>
Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Void pointers don't need no casting, drop it.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Use a const pointer type instead of casting away the const qualifier
from const arrays. Keep the pointer array on the stack, nonetheless.
Making it static just increases the object size.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Set rflags after successfully emulateing VMXON/VMXOFF in VMX.
Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move nested_vmx_succeed/nested_vmx_failInvalid/nested_vmx_failValid
ahead of handle_vmon to eliminate double declaration in the same
file
Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that kvm_arch_memslots_updated() catches every increment of the
memslots->generation, checking if the mmio generation has reached its
maximum value is enough.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This is called right after the memslots is updated, i.e. when the result
of update_memslots() gets installed in install_new_memslots(). Since
the memslots needs to be updated twice when we delete or move a memslot,
kvm_arch_commit_memory_region() does not correspond to this exactly.
In the following patch, x86 will use this new API to check if the mmio
generation has reached its maximum value, in which case mmio sptes need
to be flushed out.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Acked-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, fast page fault incorrectly tries to fix mmio page fault when
the generation number is invalid (spte.gen != kvm.gen). It then returns
to guest to retry the fault since it sees the last spte is nonpresent.
This causes an infinite loop.
Since fast page fault only works for direct mmu, the issue exists when
1) tdp is enabled. It is only triggered only on AMD host since on Intel host
the mmio page fault is recognized as ept-misconfig whose handler call
fault-page path with error_code = 0
2) guest paging is disabled. Under this case, the issue is hardly discovered
since paging disable is short-lived and the sptes will be invalid after
memslot changed for 150 times
Fix it by filtering out MMIO page faults in page_fault_can_be_fast.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Make jump labels use text_poke_bp() for text patching instead of
text_poke_smp(), avoiding the need for stop_machine().
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307121120250.29788@pobox.suse.cz
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Introduce a method for run-time instruction patching on a live SMP kernel
based on int3 breakpoint, completely avoiding the need for stop_machine().
The way this is achieved:
- add a int3 trap to the address that will be patched
- sync cores
- update all but the first byte of the patched range
- sync cores
- replace the first byte (int3) by the first byte of
replacing opcode
- sync cores
According to
http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html
synchronization after replacing "all but first" instructions should not
be necessary (on Intel hardware), as the syncing after the subsequent
patching of the first byte provides enough safety.
But there's not only Intel HW out there, and we'd rather be on a safe
side.
If any CPU instruction execution would collide with the patching,
it'd be trapped by the int3 breakpoint and redirected to the provided
"handler" (which would typically mean just skipping over the patched
region, acting as "nop" has been there, in case we are doing nop -> jump
and jump -> nop transitions).
Ftrace has been using this very technique since 08d636b ("ftrace/x86:
Have arch x86_64 use breakpoints instead of stop machine") for ages
already, and jump labels are another obvious potential user of this.
Based on activities of Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
a few years ago.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307121102440.29788@pobox.suse.cz
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Change the bitops operation to be naturally "long", i.e. 63 bits on
the 64-bit kernel. Additional bugs are likely to crop up in the
future.
We already have bugs which machines with > 16 TiB of memory in a
single node, as can happen if memory is interleaved. The x86 bitop
operations take a signed index, so using an unsigned type is not an
option.
Jim Kukunas measured the effect of this patch on kernel size: it adds
2779 bytes to the allyesconfig kernel. Some of that probably could be
elided by replacing the inline functions with macros which select the
32-bit type if the index is a 32-bit value, something like:
In that case we could also use "Jr" constraints for the 64-bit
version.
However, this would more than double the amount of code for a
relatively small gain.
Note that we can't use ilog2() for _BITOPS_LONG_SHIFT, as that causes
a recursive header inclusion problem.
The change to constant_test_bit() should both generate better code and
give correct result for negative bit indicies. As previously written
the compiler had to generate extra code to create the proper wrong
result for negative values.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jim Kukunas <james.t.kukunas@intel.com>
Link: http://lkml.kernel.org/n/tip-z61ofiwe90xeyb461o72h8ya@git.kernel.org
|
|
Since the IDT is referenced from a fixmap, make sure it is page aligned.
Merge with 32-bit one, since it was already aligned to deal with F00F
bug. Since bss is cleared before IDT setup, it can live there. This also
moves the other *_idt_table variables into common locations.
This avoids the risk of the IDT ever being moved in the bss and having
the mapping be offset, resulting in calling incorrect handlers. In the
current upstream kernel this is not a manifested bug, but heavily patched
kernels (such as those using the PaX patch series) did encounter this bug.
The tables other than idt_table technically do not need to be page
aligned, at least not at the current time, but using a common
declaration avoids mistakes. On 64 bits the table is exactly one page
long, anyway.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.net
Reported-by: PaX Team <pageexec@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
There are CPUs which have errata causing RDMSR of a nonexistent MSR to
not fault. We would then try to WRMSR to restore the value of that
MSR, causing a crash. Specifically, some Pentium M variants would
have this problem trying to save and restore the non-existent EFER,
causing a crash on resume.
Work around this by making sure we can write back the result at
suspend time.
Huge thanks to Christian Sünkenberg for finding the offending erratum
that finally deciphered the mystery.
Reported-and-tested-by: Johan Heinrich <onny@project-insanity.org>
Debugged-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Link: http://lkml.kernel.org/r/51DDC972.3010005@student.kit.edu
Cc: <stable@vger.kernel.org> # v3.7+
|
|
The code was moved in 07fe9977b6234ede1bd29e10e0323e478860c871 but
the comment was not updated. The reference in drivers/vhost/vhost.c
is left alone as it is historic.
Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.
This removes all the arch/x86 uses of the __cpuinit macros from
all C files. x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
- fix for do_div() abuse on x86
- locking fix in perf core
- a pile of (build) fixes and cleanups in perf tools
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
perf/x86: Fix incorrect use of do_div() in NMI warning
perf: Fix perf_lock_task_context() vs RCU
perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario
perf: Clone child context from parent context pmu
perf script: Fix broken include in Context.xs
perf tools: Fix -ldw/-lelf link test when static linking
perf tools: Revert regression in configuration of Python support
perf tools: Fix perf version generation
perf stat: Fix per-socket output bug for uncore events
perf symbols: Fix vdso list searching
perf evsel: Fix missing increment in sample parsing
perf tools: Update symbol_conf.nr_events when processing attribute events
perf tools: Fix new_term() missing free on error path
perf tools: Fix parse_events_terms() segfault on error path
perf evsel: Fix count parameter to read call in event_format__new
perf tools: fix a typo of a Power7 event name
perf tools: Fix -x/--exclude-other option for report command
perf evlist: Enhance perf_evlist__start_workload()
perf record: Remove -f/--force option
perf record: Remove -A/--append option
...
|
|
I completely botched understanding the calling conventions of
do_div(). I assumed that do_div() returned the result instead
of realizing that it modifies its argument and returns a
remainder. The side-effect from this would be bogus numbers
for the "msecs" value in the warning messages:
INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.114 msecs
Note, there was a second fix posted by Stephane Eranian for
a separate patch which I also botched:
http://lkml.kernel.org/r/20130704223010.GA30625@quad
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20130708214404.B0B6EA66@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|