Age | Commit message (Collapse) | Author |
|
Use the new helper function kvm_arch_fault_in_page() for faulting-in
the guest pages and only inject addressing errors when we've really
hit a bad address (and return other error codes to userspace instead).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Rework the function kvm_arch_fault_in_sync() to become a proper helper
function for faulting-in a guest page. Now it takes the guest address as
a parameter and does not ignore the possible error code from gmap_fault()
anymore (which could cause undetected error conditions before).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
If the new PSW for program interrupts is invalid, the VM ends up
in an endless loop of specification exceptions. Since there is not
much left we can do in this case, we should better drop to userspace
instead so that the crash can be reported to the user.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
As a program status word is also invalid (and thus generates an
specification exception) if the instruction address is not even,
we should test this in is_valid_psw(), too. This patch also exports
the function so that it becomes available for other parts of the
S390 KVM code as well.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Use the mm semaphore to serialize multiple invocations of s390_enable_skey.
The second CPU faulting on a storage key operation needs to wait for the
completion of the page table update. Taking the mm semaphore writable
has the positive side-effect that it prevents any host faults from
taking place which does have implications on keys vs PGSTE.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
PSCIv0.2 adds a new function called AFFINITY_INFO, which
can be used to query if a specified CPU has actually gone
offline. Calling this function via cpu_kill ensures that
a CPU has quiesced after a call to cpu_die. This helps
prevent the CPU from doing arbitrary bad things when data
or instructions are clobbered (as happens with kexec)
in the window between a CPU announcing that it is dead
and said CPU leaving the kernel.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The PSCI v0.2+ spec defines standard values for PSCI function IDs.
Add a new binding entry so that pre v0.2 implementations can
use DT entries for function IDs and v0.2+ implementations use
standard entries as defined by the PSCIv0.2 specification.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The PSCIv0.2 spec defines standard values of function IDs
and introduces a few new functions. Detect version of PSCI
and appropriately select the right PSCI functions.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Regression of 346874c9: PAE is set in long mode, but that does not mean
we have valid PDPTRs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Treat monitor and mwait instructions as nop, which is architecturally
correct (but inefficient) behavior. We do this to prevent misbehaving
guests (e.g. OS X <= 10.7) from crashing after they fail to check for
monitor/mwait availability via cpuid.
Since mwait-based idle loops relying on these nop-emulated instructions
would keep the host CPU pegged at 100%, do NOT advertise their presence
via cpuid, to prevent compliant guests from using them inadvertently.
Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
It seems that it's easy to implement the EOI assist
on top of the PV EOI feature: simply convert the
page address to the format expected by PV EOI.
Notes:
-"No EOI required" is set only if interrupt injected
is edge triggered; this is true because level interrupts are going
through IOAPIC which disables PV EOI.
In any case, if guest triggers EOI the bit will get cleared on exit.
-For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets
KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE
seems sufficient
In any case, bit is cleared on exit so worst case it's never re-enabled
-no handling of PV EOI data is performed at HV_X64_MSR_EOI write;
HV_X64_MSR_EOI is a separate optimization - it's an X2APIC
replacement that lets you do EOI with an MSR and not IO.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In long-mode, bit 7 in the PDPTE is not reserved only if 1GB pages are
supported by the CPU. Currently the bit is considered by KVM as always
reserved.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The RSP register is not automatically cached, causing mov DR instruction with
RSP to fail. Instead the regular register accessing interface should be used.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Some checks are common to all, and moreover,
according to the spec, the check for whether any bits
beyond the physical address width are set are also
applicable to all of them
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The spec mandates that if the vmptrld or vmclear
address is equal to the vmxon region pointer, the
instruction should fail with error "VMPTRLD with
VMXON pointer" or "VMCLEAR with VMXON pointer"
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, the vmxon region isn't used in the nested case.
However, according to the spec, the vmxon instruction performs
additional sanity checks on this region and the associated
pointer. Modify emulated vmxon to better adhere to the spec
requirements
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Our common function for vmptr checks (in 2/4) needs to fetch
the memory address
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
1. Fixes an error return code for the breakpoint setup
2. External interrupt fixes
2.1. Some interrupt conditions like cpu timer or clock comparator
stay pending even after the interrupt is injected. If the external
new PSW is enabled for interrupts this will result in an endless
loop. Usually this indicates a programming error in the guest OS.
Lets detect such situations and go to userspace. We will provide
a QEMU patch that sets the guest in panicked/crashed state to avoid
wasting CPU cycles.
2.2 Resend external interrupts back to the guest if the HW could
not do it.
-
|
|
The external interrupt interception can only occur in rare cases, e.g.
when the PSW of the interrupt handler has a bad value. The old handler
for this interception simply ignored these events (except for increasing
the exit_external_interrupt counter), but for proper operation we either
have to inject the interrupts manually or we should drop to userspace in
case of errors.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Add an interface to inject clock comparator and CPU timer interrupts
into the guest. This is needed for handling the external interrupt
interception.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
When copy_from_user() fails, this code returns the number of bytes
remaining instead of a negative error code. The positive number is
returned to the user but otherwise it is harmless.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated()
and emulator_pio_out_emulated(), and it adds an argument (a pointer to
the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched
from 'pio_data' (depending on 'size'), and the value is included in the
trace record ('val'). If 'count' is greater than one, this is indicated
by the string "(...)" in the trace output.
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When starting lots of dataplane devices the bootup takes very long on
Christian's s390 with irqfd patches. With larger setups he is even
able to trigger some timeouts in some components. Turns out that the
KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
when having multiple CPUs. This is caused by the synchronize_rcu and
the HZ=100 of s390. By changing the code to use a private srcu we can
speed things up. This patch reduces the boot time till mounting root
from 8 to 2 seconds on my s390 guest with 100 disks.
Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
uses rcu_dereference_raw rather than rcu_dereference, and write-sides
do not do rcu lockdep at all).
Note that we're hardly relying on the "sleepable" part of srcu. We just
want SRCU's faster detection of grace periods.
Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
and RR. The difference between results "before" and "after" the patch
has mean -0.2% and standard deviation 0.6%. Using a paired t-test on the
data points says that there is a 2.5% probability that the patch is the
cause of the performance difference (rather than a random fluctuation).
(Restricting the t-test to RR, which is the most likely to be affected,
changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
probability that the numbers actually say something about the patch.
The probability increases mostly because there are fewer data points).
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We have PSCI v0.2 emulation available in KVM ARM/ARM64
hence advertise this to user space (i.e. QEMU or KVMTOOL)
via KVM_CHECK_EXTENSION ioctl.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
This patch adds emulation of PSCI v0.2 CPU_SUSPEND function call for
KVM ARM/ARM64. This is a CPU-level function call which can suspend
current CPU or current CPU cluster. We don't have VCPU clusters in
KVM so we only suspend the current VCPU.
The CPU_SUSPEND emulation is not tested much because currently there
is no CPUIDLE driver in Linux kernel that uses PSCI CPU_SUSPEND. The
PSCI CPU_SUSPEND implementation in ARM64 kernel was tested using a
Simple CPUIDLE driver which is not published due to unstable DT-bindings
for PSCI.
(For more info, http://lwn.net/Articles/574950/)
For simplicity, we implement CPU_SUSPEND emulation similar to WFI
(Wait-for-interrupt) emulation and we also treat power-down request
to be same as stand-by request. This is consistent with section
5.4.1 and section 5.4.2 of PSCI v0.2 specification.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
As-per PSCI v0.2, the source CPU provides physical address of
"entry point" and "context id" for starting a target CPU. Also,
if target CPU is already running then we should return ALREADY_ON.
Current emulation of CPU_ON function does not consider physical
address of "context id" and returns INVALID_PARAMETERS if target
CPU is already running.
This patch updates kvm_psci_vcpu_on() such that it works for both
PSCI v0.1 and PSCI v0.2.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
This patch adds emulation of PSCI v0.2 MIGRATE, MIGRATE_INFO_TYPE, and
MIGRATE_INFO_UP_CPU function calls for KVM ARM/ARM64.
KVM ARM/ARM64 being a hypervisor (and not a Trusted OS), we cannot provide
this functions hence we emulate these functions in following way:
1. MIGRATE - Returns "Not Supported"
2. MIGRATE_INFO_TYPE - Return 2 i.e. Trusted OS is not present
3. MIGRATE_INFO_UP_CPU - Returns "Not Supported"
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
This patch adds emulation of PSCI v0.2 AFFINITY_INFO function call
for KVM ARM/ARM64. This is a VCPU-level function call which will be
used to determine current state of given affinity level.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The PSCI v0.2 SYSTEM_OFF and SYSTEM_RESET functions are system-level
functions hence cannot be fully emulated by in-kernel PSCI emulation code.
To tackle this, we forward PSCI v0.2 SYSTEM_OFF and SYSTEM_RESET function
calls from vcpu to user space (i.e. QEMU or KVMTOOL) via kvm_run structure
using KVM_EXIT_SYSTEM_EVENT exit reasons.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Currently, we don't have an exit reason to notify user space about
a system-level event (for e.g. system reset or shutdown) triggered
by the VCPU. This patch adds exit reason KVM_EXIT_SYSTEM_EVENT for
this purpose. We can also inform user space about the 'type' and
architecture specific 'flags' of a system-level event using the
kvm_run structure.
This newly added KVM_EXIT_SYSTEM_EVENT will be used by KVM ARM/ARM64
in-kernel PSCI v0.2 support to reset/shutdown VMs.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Currently, the kvm_psci_call() returns 'true' or 'false' based on whether
the PSCI function call was handled successfully or not. This does not help
us emulate system-level PSCI functions where the actual emulation work will
be done by user space (QEMU or KVMTOOL). Examples of such system-level PSCI
functions are: PSCI v0.2 SYSTEM_OFF and SYSTEM_RESET.
This patch updates kvm_psci_call() to return three types of values:
1) > 0 (success)
2) = 0 (success but exit to user space)
3) < 0 (errors)
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
We have in-kernel emulation of PSCI v0.2 in KVM ARM/ARM64. To provide
PSCI v0.2 interface to VCPUs, we have to enable KVM_ARM_VCPU_PSCI_0_2
feature when doing KVM_ARM_VCPU_INIT ioctl.
The patch updates documentation of KVM_ARM_VCPU_INIT ioctl to provide
info regarding KVM_ARM_VCPU_PSCI_0_2 feature.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Currently, the in-kernel PSCI emulation provides PSCI v0.1 interface to
VCPUs. This patch extends current in-kernel PSCI emulation to provide
PSCI v0.2 interface to VCPUs.
By default, ARM/ARM64 KVM will always provide PSCI v0.1 interface for
keeping the ABI backward-compatible.
To select PSCI v0.2 interface for VCPUs, the user space (i.e. QEMU or
KVMTOOL) will have to set KVM_ARM_VCPU_PSCI_0_2 feature when doing VCPU
init using KVM_ARM_VCPU_INIT ioctl.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
We need a common place to share PSCI related defines among ARM kernel,
ARM64 kernel, KVM ARM/ARM64 PSCI emulation, and user space.
We introduce uapi/linux/psci.h for this purpose. This newly added
header will be first used by KVM ARM/ARM64 in-kernel PSCI emulation
and user space (i.e. QEMU or KVMTOOL).
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
User space (i.e. QEMU or KVMTOOL) should be able to check whether KVM
ARM/ARM64 supports in-kernel PSCI v0.2 emulation. For this purpose, we
define KVM_CAP_ARM_PSCI_0_2 in KVM user space interface header.
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
1. Guest handling fixes
The handling of MVPG, PFMF and Test Block is fixed to better follow
the architecture. None of these fixes is critical for any current
Linux guests, but let's play safe.
2. Optimization for single CPU guests
We can enable the IBS facility if only one VCPU is running (!STOPPED
state). We also enable this optimization for guest > 1 VCPU as soon
as all but one VCPU is in stopped state. Thus will help guests that
have tools like cpuplugd (from s390-utils) that do dynamic offline/
online of CPUs.
3. NOTES
There is one non-s390 change in include/linux/kvm_host.h that
introduces 2 defines for VCPU requests:
define KVM_REQ_ENABLE_IBS 23
define KVM_REQ_DISABLE_IBS 24
|
|
Invariant TSC is a property of TSC, no additional
support code necessary.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This patch enables the IBS facility when a single VCPU is running.
The facility is dynamically turned on/off as soon as other VCPUs
enter/leave the stopped state.
When this facility is operating, some instructions can be executed
faster for single-cpu guests.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
This patch introduces two new functions to set/clear the CPUSTAT_STOPPED bit and
makes use of it at all applicable places. These functions prepare the additional
execution of code when starting/stopping a vcpu.
The CPUSTAT_STOPPED bit should not be touched outside of these functions.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
TEST BLOCK is also subject to the low-address protection, so we need
to check the destination address in our handler.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Add a check for low-address protection to the PFMF handler and
convert real-addresses to absolute if necessary, as it is defined
in the Principles of Operations specification.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
The s390 architecture has a special protection mechanism that can
be used to prevent write access to the vital data in the low-core
memory area. This patch adds a new helper function that can be used
to check for such write accesses and in case of protection, it also
sets up the exception data accordingly.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
When the guest executes the MVPG instruction with DAT disabled,
and the source or destination page is not mapped in the host,
the so-called partial execution interception occurs. We need to
handle this event by setting up a mapping for the corresponding
user pages.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
async_pf_execute() passes tsk == current to gup(), this is doesn't
hurt but unnecessary and misleading. "tsk" is only used to account
the number of faults and current is the random workqueue thread.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
async_pf_execute() has no reasons to adopt apf->mm, gup(current, mm)
should work just fine even if current has another or NULL ->mm.
Recently kvm_async_page_present_sync() was added insedie the "use_mm"
section, but it seems that it doesn't need current->mm too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now we can flush all the TLBs out of the mmu lock without TLB corruption when
write-proect the sptes, it is because:
- we have marked large sptes readonly instead of dropping them that means we
just change the spte from writable to readonly so that we only need to care
the case of changing spte from present to present (changing the spte from
present to nonpresent will flush all the TLBs immediately), in other words,
the only case we need to care is mmu_spte_update()
- in mmu_spte_update(), we haved checked
SPTE_HOST_WRITEABLE | PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that
means it does not depend on PT_WRITABLE_MASK anymore
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Relax the tlb flush condition since we will write-protect the spte out of mmu
lock. Note lockless write-protection only marks the writable spte to readonly
and the spte can be writable only if both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are set (that are tested by spte_is_locklessly_modifiable)
This patch is used to avoid this kind of race:
VCPU 0 VCPU 1
lockless wirte protection:
set spte.w = 0
lock mmu-lock
write protection the spte to sync shadow page,
see spte.w = 0, then without flush tlb
unlock mmu-lock
!!! At this point, the shadow page can still be
writable due to the corrupt tlb entry
Flush all TLB
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Currently, kvm zaps the large spte if write-protected is needed, the later
read can fault on that spte. Actually, we can make the large spte readonly
instead of making them un-present, the page fault caused by read access can
be avoided
The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter. This removes the need for the return value.
This version has fixed the issue reported in 6b73a9606, the reason of that
issue is that fast_page_fault() directly sets the readonly large spte to
writable but only dirty the first page into the dirty-bitmap that means
other pages are missed. Fixed it by only the normal sptes (on the
PT_PAGE_TABLE_LEVEL level) can be fast fixed
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Using sp->role.level instead of @level since @level is not got from the
page table hierarchy
There is no issue in current code since the fast page fault currently only
fixes the fault caused by dirty-log that is always on the last level
(level = 1)
This patch makes the code more readable and avoids potential issue in the
further development
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
This reverts commit 5befdc385ddb2d5ae8995ad89004529a3acf58fc.
Since we will allow flush tlb out of mmu-lock in the later
patch
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|