From 3af4a9e61e71117d5df2df3e1605fa062e04b869 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 30 Nov 2022 23:08:59 +0000 Subject: KVM: x86: Serialize vendor module initialization (hardware setup) Acquire a new mutex, vendor_module_lock, in kvm_x86_vendor_init() while doing hardware setup to ensure that concurrent calls are fully serialized. KVM rejects attempts to load vendor modules if a different module has already been loaded, but doesn't handle the case where multiple vendor modules are loaded at the same time, and module_init() doesn't run under the global module_mutex. Note, in practice, this is likely a benign bug as no platform exists that supports both SVM and VMX, i.e. barring a weird VM setup, one of the vendor modules is guaranteed to fail a support check before modifying common KVM state. Alternatively, KVM could perform an atomic CMPXCHG on .hardware_enable, but that comes with its own ugliness as it would require setting .hardware_enable before success is guaranteed, e.g. attempting to load the "wrong" could result in spurious failure to load the "right" module. Introduce a new mutex as using kvm_lock is extremely deadlock prone due to kvm_lock being taken under cpus_write_lock(), and in the future, under under cpus_read_lock(). Any operation that takes cpus_read_lock() while holding kvm_lock would potentially deadlock, e.g. kvm_timer_init() takes cpus_read_lock() to register a callback. In theory, KVM could avoid such problematic paths, i.e. do less setup under kvm_lock, but avoiding all calls to cpus_read_lock() is subtly difficult and thus fragile. E.g. updating static calls also acquires cpus_read_lock(). Inverting the lock ordering, i.e. always taking kvm_lock outside cpus_read_lock(), is not a viable option as kvm_lock is taken in various callbacks that may be invoked under cpus_read_lock(), e.g. x86's kvmclock_cpufreq_notifier(). The lockdep splat below is dependent on future patches to take cpus_read_lock() in hardware_enable_all(), but as above, deadlock is already is already possible. ====================================================== WARNING: possible circular locking dependency detected 6.0.0-smp--7ec93244f194-init2 #27 Tainted: G O ------------------------------------------------------ stable/251833 is trying to acquire lock: ffffffffc097ea28 (kvm_lock){+.+.}-{3:3}, at: hardware_enable_all+0x1f/0xc0 [kvm] but task is already holding lock: ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (cpu_hotplug_lock){++++}-{0:0}: cpus_read_lock+0x2a/0xa0 __cpuhp_setup_state+0x2b/0x60 __kvm_x86_vendor_init+0x16a/0x1870 [kvm] kvm_x86_vendor_init+0x23/0x40 [kvm] 0xffffffffc0a4d02b do_one_initcall+0x110/0x200 do_init_module+0x4f/0x250 load_module+0x1730/0x18f0 __se_sys_finit_module+0xca/0x100 __x64_sys_finit_module+0x1d/0x20 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (kvm_lock){+.+.}-{3:3}: __lock_acquire+0x16f4/0x30d0 lock_acquire+0xb2/0x190 __mutex_lock+0x98/0x6f0 mutex_lock_nested+0x1b/0x20 hardware_enable_all+0x1f/0xc0 [kvm] kvm_dev_ioctl+0x45e/0x930 [kvm] __se_sys_ioctl+0x77/0xc0 __x64_sys_ioctl+0x1d/0x20 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug_lock); lock(kvm_lock); lock(cpu_hotplug_lock); lock(kvm_lock); *** DEADLOCK *** 1 lock held by stable/251833: #0: ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm] Signed-off-by: Sean Christopherson Message-Id: <20221130230934.1014142-16-seanjc@google.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/locking.rst | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index a3ca76f9be75..0f6067d13622 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -291,3 +291,9 @@ time it will be set using the Dirty tracking mechanism described above. wakeup notification event since external interrupts from the assigned devices happens, we will find the vCPU on the list to wakeup. + +``vendor_module_lock`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +:Type: mutex +:Arch: x86 +:Protects: loading a vendor module (kvm_amd or kvm_intel) -- cgit From 0bf50497f03b3d892c470c7d1a10a3e9c3c95821 Mon Sep 17 00:00:00 2001 From: Isaku Yamahata Date: Wed, 30 Nov 2022 23:09:28 +0000 Subject: KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock now that KVM hooks CPU hotplug during the ONLINE phase, which can sleep. Previously, KVM hooked the STARTING phase, which is not allowed to sleep and thus could not take kvm_lock (a mutex). This effectively allows the task that's initiating hardware enabling/disabling to preempted and/or migrated. Note, the Documentation/virt/kvm/locking.rst statement that kvm_count_lock is "raw" because hardware enabling/disabling needs to be atomic with respect to migration is wrong on multiple fronts. First, while regular spinlocks can be preempted, the task holding the lock cannot be migrated. Second, preventing migration is not required. on_each_cpu() disables preemption, which ensures that cpus_hardware_enabled correctly reflects hardware state. The task may be preempted/migrated between bumping kvm_usage_count and invoking on_each_cpu(), but that's perfectly ok as kvm_usage_count is still protected, e.g. other tasks that call hardware_enable_all() will be blocked until the preempted/migrated owner exits its critical section. KVM does have lockless accesses to kvm_usage_count in the suspend/resume flows, but those are safe because all tasks must be frozen prior to suspending CPUs, and a task cannot be frozen while it holds one or more locks (userspace tasks are frozen via a fake signal). Preemption doesn't need to be explicitly disabled in the hotplug path. The hotplug thread is pinned to the CPU that's being hotplugged, and KVM only cares about having a stable CPU, i.e. to ensure hardware is enabled on the correct CPU. Lockep, i.e. check_preemption_disabled(), plays nice with this state too, as is_percpu_thread() is true for the hotplug thread. Signed-off-by: Isaku Yamahata Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Message-Id: <20221130230934.1014142-45-seanjc@google.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/locking.rst | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index 0f6067d13622..1147836742cc 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -9,6 +9,8 @@ KVM Lock Overview The acquisition orders for mutexes are as follows: +- cpus_read_lock() is taken outside kvm_lock + - kvm->lock is taken outside vcpu->mutex - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock @@ -225,15 +227,10 @@ time it will be set using the Dirty tracking mechanism described above. :Type: mutex :Arch: any :Protects: - vm_list - -``kvm_count_lock`` -^^^^^^^^^^^^^^^^^^ - -:Type: raw_spinlock_t -:Arch: any -:Protects: - hardware virtualization enable/disable -:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt - migration. + - kvm_usage_count + - hardware virtualization enable/disable +:Comment: KVM also disables CPU hotplug via cpus_read_lock() during + enable/disable. ``kvm->mn_invalidate_lock`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -297,3 +294,7 @@ time it will be set using the Dirty tracking mechanism described above. :Type: mutex :Arch: x86 :Protects: loading a vendor module (kvm_amd or kvm_intel) +:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is + taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and + many operations need to take cpu_hotplug_lock when loading a vendor module, + e.g. updating static calls. -- cgit From 5b84b0291702ea84db2c9e55696cdcdd95f9cf1a Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 6 Jan 2023 01:12:55 +0000 Subject: KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to honor x86 architectural behavior if multiple vCPUs shared a physical APIC ID. As called out in the changelog that added the hack, all CPUs whose (possibly truncated) APIC ID matches the target are supposed to receive the IPI. KVM intentionally differs from real hardware, because real hardware (Knights Landing) does just "x2apic_id & 0xff" to decide whether to accept the interrupt in xAPIC mode and it can deliver one interrupt to more than one physical destination, e.g. 0x123 to 0x123 and 0x23. Applying the hack even when x2APIC is not fully enabled means KVM doesn't correctly handle scenarios where the guest has aliased xAPIC IDs across multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any interrupts. It's extremely unlikely any real world guest aliases APIC IDs, or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which vCPU is "normal". Furthermore, the hack is _not_ guaranteed to work! The hack works if and only if the optimized APIC map is successfully allocated. If the map allocation fails (unlikely), KVM will fall back to its unoptimized behavior, which _does_ honor the architectural behavior. Pivot on 32-bit x2APIC IDs being enabled as that is required to take advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't break existing setups unless they are way, way off in the weeds. And an entry in KVM's errata to document the hack. Alternatively, KVM could provide an actual x2APIC quirk and document the hack that way, but there's unlikely to ever be a use case for disabling the quirk. Go the errata route to avoid having to validate a quirk no one cares about. Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff") Reviewed-by: Maxim Levitsky Signed-off-by: Sean Christopherson Message-Id: <20230106011306.85230-23-seanjc@google.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/x86/errata.rst | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst index 410e0aa63493..49a05f24747b 100644 --- a/Documentation/virt/kvm/x86/errata.rst +++ b/Documentation/virt/kvm/x86/errata.rst @@ -37,3 +37,14 @@ Nested virtualization features ------------------------------ TBD + +x2APIC +------ +When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that +allows sending events to a single vCPU using its x2APIC ID even if the target +vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI +on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs +have the same physical APIC ID, KVM will deliver events targeting that APIC ID +only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is +not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs +matching the target APIC ID receive the interrupt). -- cgit From 14329b825ffb7f2710c13fdcc37fc2e7c67b6781 Mon Sep 17 00:00:00 2001 From: Aaron Lewis Date: Tue, 20 Dec 2022 16:12:33 +0000 Subject: KVM: x86/pmu: Introduce masked events to the pmu event filter When building a list of filter events, it can sometimes be a challenge to fit all the events needed to adequately restrict the guest into the limited space available in the pmu event filter. This stems from the fact that the pmu event filter requires each event (i.e. event select + unit mask) be listed, when the intention might be to restrict the event select all together, regardless of it's unit mask. Instead of increasing the number of filter events in the pmu event filter, add a new encoding that is able to do a more generalized match on the unit mask. Introduce masked events as another encoding the pmu event filter understands. Masked events has the fields: mask, match, and exclude. When filtering based on these events, the mask is applied to the guest's unit mask to see if it matches the match value (i.e. umask & mask == match). The exclude bit can then be used to exclude events from that match. E.g. for a given event select, if it's easier to say which unit mask values shouldn't be filtered, a masked event can be set up to match all possible unit mask values, then another masked event can be set up to match the unit mask values that shouldn't be filtered. Userspace can query to see if this feature exists by looking for the capability, KVM_CAP_PMU_EVENT_MASKED_EVENTS. This feature is enabled by setting the flags field in the pmu event filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS. Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY(). It is an error to have a bit set outside the valid bits for a masked event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in such cases, including the high bits of the event select (35:32) if called on Intel. With these updates the filter matching code has been updated to match on a common event. Masked events were flexible enough to handle both event types, so they were used as the common event. This changes how guest events get filtered because regardless of the type of event used in the uAPI, they will be converted to masked events. Because of this there could be a slight performance hit because instead of matching the filter event with a lookup on event select + unit mask, it does a lookup on event select then walks the unit masks to find the match. This shouldn't be a big problem because I would expect the set of common event selects to be small, and if they aren't the set can likely be reduced by using masked events to generalize the unit mask. Using one type of event when filtering guest events allows for a common code path to be used. Signed-off-by: Aaron Lewis Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.com Signed-off-by: Sean Christopherson --- Documentation/virt/kvm/api.rst | 78 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 71 insertions(+), 7 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 9807b05a1b57..83e3acc9e321 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -5005,6 +5005,15 @@ using this ioctl. :Parameters: struct kvm_pmu_event_filter (in) :Returns: 0 on success, -1 on error +Errors: + + ====== ============================================================ + EFAULT args[0] cannot be accessed + EINVAL args[0] contains invalid data in the filter or filter events + E2BIG nevents is too large + EBUSY not enough memory to allocate the filter + ====== ============================================================ + :: struct kvm_pmu_event_filter { @@ -5016,14 +5025,69 @@ using this ioctl. __u64 events[0]; }; -This ioctl restricts the set of PMU events that the guest can program. -The argument holds a list of events which will be allowed or denied. -The eventsel+umask of each event the guest attempts to program is compared -against the events field to determine whether the guest should have access. -The events field only controls general purpose counters; fixed purpose -counters are controlled by the fixed_counter_bitmap. +This ioctl restricts the set of PMU events the guest can program by limiting +which event select and unit mask combinations are permitted. + +The argument holds a list of filter events which will be allowed or denied. + +Filter events only control general purpose counters; fixed purpose counters +are controlled by the fixed_counter_bitmap. + +Valid values for 'flags':: + +``0`` + +To use this mode, clear the 'flags' field. + +In this mode each event will contain an event select + unit mask. + +When the guest attempts to program the PMU the guest's event select + +unit mask is compared against the filter events to determine whether the +guest should have access. + +``KVM_PMU_EVENT_FLAG_MASKED_EVENTS`` +:Capability: KVM_CAP_PMU_EVENT_MASKED_EVENTS + +In this mode each filter event will contain an event select, mask, match, and +exclude value. To encode a masked event use:: + + KVM_PMU_ENCODE_MASKED_ENTRY() + +An encoded event will follow this layout:: + + Bits Description + ---- ----------- + 7:0 event select (low bits) + 15:8 umask match + 31:16 unused + 35:32 event select (high bits) + 36:54 unused + 55 exclude bit + 63:56 umask mask + +When the guest attempts to program the PMU, these steps are followed in +determining if the guest should have access: + + 1. Match the event select from the guest against the filter events. + 2. If a match is found, match the guest's unit mask to the mask and match + values of the included filter events. + I.e. (unit mask & mask) == match && !exclude. + 3. If a match is found, match the guest's unit mask to the mask and match + values of the excluded filter events. + I.e. (unit mask & mask) == match && exclude. + 4. + a. If an included match is found and an excluded match is not found, filter + the event. + b. For everything else, do not filter the event. + 5. + a. If the event is filtered and it's an allow list, allow the guest to + program the event. + b. If the event is filtered and it's a deny list, do not allow the guest to + program the event. -No flags are defined yet, the field must be zero. +When setting a new pmu event filter, -EINVAL will be returned if any of the +unused fields are set or if any of the high bits (35:32) in the event +select are set when called on Intel. Valid values for 'action':: -- cgit From f2d3155e2a6bac44d16f04415a321e8707d895c6 Mon Sep 17 00:00:00 2001 From: Nico Boehr Date: Fri, 27 Jan 2023 15:05:32 +0100 Subject: KVM: s390: disable migration mode when dirty tracking is disabled Migration mode is a VM attribute which enables tracking of changes in storage attributes (PGSTE). It assumes dirty tracking is enabled on all memslots to keep a dirty bitmap of pages with changed storage attributes. When enabling migration mode, we currently check that dirty tracking is enabled for all memslots. However, userspace can disable dirty tracking without disabling migration mode. Since migration mode is pointless with dirty tracking disabled, disable migration mode whenever userspace disables dirty tracking on any slot. Also update the documentation to clarify that dirty tracking must be enabled when enabling migration mode, which is already enforced by the code in kvm_s390_vm_start_migration(). Also highlight in the documentation for KVM_S390_GET_CMMA_BITS that it can now fail with -EINVAL when dirty tracking is disabled while migration mode is on. Move all the error codes to a table so this stays readable. To disable migration mode, slots_lock should be held, which is taken in kvm_set_memory_region() and thus held in kvm_arch_prepare_memory_region(). Restructure the prepare code a bit so all the sanity checking is done before disabling migration mode. This ensures migration mode isn't disabled when some sanity check fails. Cc: stable@vger.kernel.org Fixes: 190df4a212a7 ("KVM: s390: CMMA tracking, ESSA emulation, migration mode") Signed-off-by: Nico Boehr Reviewed-by: Janosch Frank Reviewed-by: Claudio Imbrenda Link: https://lore.kernel.org/r/20230127140532.230651-2-nrb@linux.ibm.com Message-Id: <20230127140532.230651-2-nrb@linux.ibm.com> [frankja@linux.ibm.com: fixed commit message typo, moved api.rst error table upwards] Signed-off-by: Janosch Frank --- Documentation/virt/kvm/api.rst | 18 ++++++++++++------ Documentation/virt/kvm/devices/vm.rst | 4 ++++ 2 files changed, 16 insertions(+), 6 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index deb494f759ed..8cd7fd05d53b 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -4449,6 +4449,18 @@ not holding a previously reported uncorrected error). :Parameters: struct kvm_s390_cmma_log (in, out) :Returns: 0 on success, a negative value on error +Errors: + + ====== ============================================================= + ENOMEM not enough memory can be allocated to complete the task + ENXIO if CMMA is not enabled + EINVAL if KVM_S390_CMMA_PEEK is not set but migration mode was not enabled + EINVAL if KVM_S390_CMMA_PEEK is not set but dirty tracking has been + disabled (and thus migration mode was automatically disabled) + EFAULT if the userspace address is invalid or if no page table is + present for the addresses (e.g. when using hugepages). + ====== ============================================================= + This ioctl is used to get the values of the CMMA bits on the s390 architecture. It is meant to be used in two scenarios: @@ -4529,12 +4541,6 @@ mask is unused. values points to the userspace buffer where the result will be stored. -This ioctl can fail with -ENOMEM if not enough memory can be allocated to -complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if -KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with --EFAULT if the userspace address is invalid or if no page table is -present for the addresses (e.g. when using hugepages). - 4.108 KVM_S390_SET_CMMA_BITS ---------------------------- diff --git a/Documentation/virt/kvm/devices/vm.rst b/Documentation/virt/kvm/devices/vm.rst index 60acc39e0e93..147efec626e5 100644 --- a/Documentation/virt/kvm/devices/vm.rst +++ b/Documentation/virt/kvm/devices/vm.rst @@ -302,6 +302,10 @@ Allows userspace to start migration mode, needed for PGSTE migration. Setting this attribute when migration mode is already active will have no effects. +Dirty tracking must be enabled on all memslots, else -EINVAL is returned. When +dirty tracking is disabled on any memslot, migration mode is automatically +stopped. + :Parameters: none :Returns: -ENOMEM if there is not enough free memory to start migration mode; -EINVAL if the state of the VM is invalid (e.g. no memory defined); -- cgit From a7b041732802db0dab0a7740a6c8338b803bf933 Mon Sep 17 00:00:00 2001 From: Janis Schoetterl-Glausch Date: Mon, 6 Feb 2023 17:46:01 +0100 Subject: Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for absolute vm write memops which allows user space to perform (storage key checked) cmpxchg operations on guest memory. Signed-off-by: Janis Schoetterl-Glausch Reviewed-by: Janosch Frank Link: https://lore.kernel.org/r/20230206164602.138068-14-scgl@linux.ibm.com Message-Id: <20230206164602.138068-14-scgl@linux.ibm.com> [frankja@de.ibm.com: Removed a line from an earlier version] Signed-off-by: Janosch Frank --- Documentation/virt/kvm/api.rst | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 8cd7fd05d53b..a4c9dbccc7c2 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -3728,7 +3728,7 @@ The fields in each entry are defined as follows: :Parameters: struct kvm_s390_mem_op (in) :Returns: = 0 on success, < 0 on generic error (e.g. -EFAULT or -ENOMEM), - > 0 if an exception occurred while walking the page tables + 16 bit program exception code if the access causes such an exception Read or write data from/to the VM's memory. The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is @@ -3746,6 +3746,8 @@ Parameters are specified via the following structure:: struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */ + __u8 pad1[6]; /* ignored */ + __u64 old_addr; /* ignored if flag unset */ }; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */ @@ -3773,6 +3775,7 @@ Possible operations are: * ``KVM_S390_MEMOP_ABSOLUTE_WRITE`` * ``KVM_S390_MEMOP_SIDA_READ`` * ``KVM_S390_MEMOP_SIDA_WRITE`` + * ``KVM_S390_MEMOP_ABSOLUTE_CMPXCHG`` Logical read/write: ^^^^^^^^^^^^^^^^^^^ @@ -3821,7 +3824,7 @@ the checks required for storage key protection as one operation (as opposed to user space getting the storage keys, performing the checks, and accessing memory thereafter, which could lead to a delay between check and access). Absolute accesses are permitted for the VM ioctl if KVM_CAP_S390_MEM_OP_EXTENSION -is > 0. +has the KVM_S390_MEMOP_EXTENSION_CAP_BASE bit set. Currently absolute accesses are not permitted for VCPU ioctls. Absolute accesses are permitted for non-protected guests only. @@ -3829,7 +3832,26 @@ Supported flags: * ``KVM_S390_MEMOP_F_CHECK_ONLY`` * ``KVM_S390_MEMOP_F_SKEY_PROTECTION`` -The semantics of the flags are as for logical accesses. +The semantics of the flags common with logical accesses are as for logical +accesses. + +Absolute cmpxchg: +^^^^^^^^^^^^^^^^^ + +Perform cmpxchg on absolute guest memory. Intended for use with the +KVM_S390_MEMOP_F_SKEY_PROTECTION flag. +Instead of doing an unconditional write, the access occurs only if the target +location contains the value pointed to by "old_addr". +This is performed as an atomic cmpxchg with the length specified by the "size" +parameter. "size" must be a power of two up to and including 16. +If the exchange did not take place because the target value doesn't match the +old value, the value "old_addr" points to is replaced by the target value. +User space can tell if an exchange took place by checking if this replacement +occurred. The cmpxchg op is permitted for the VM ioctl if +KVM_CAP_S390_MEM_OP_EXTENSION has flag KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG set. + +Supported flags: + * ``KVM_S390_MEMOP_F_SKEY_PROTECTION`` SIDA read/write: ^^^^^^^^^^^^^^^^ -- cgit