summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-01-24KVM: selftests: Stop assuming stats are contiguous in kvm_binary_stats_testJing Zhang
Remove the assumption from kvm_binary_stats_test that all stats are laid out contiguously in memory. The current stats in KVM are contiguously laid out in memory, but that may change in the future and the ABI specifically allows holes in the stats data (since each stat exposes its own offset). While here drop the check that each stats' offset is less than size_data, as that is now always true by construction. Link: https://lore.kernel.org/kvm/20221208193857.4090582-9-dmatlack@google.com/ Fixes: 0b45d58738cd ("KVM: selftests: Add selftest for KVM statistics data binary interface") Signed-off-by: Jing Zhang <jingzhangos@google.com> [dmatlack: Re-worded the commit message.] Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20230117222707.3949974-1-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/xen: Remove unneeded semicolonzhang songyi
The semicolon after the "}" is unneeded. Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn> Link: https://lore.kernel.org/r/202212191432274558936@zte.com.cn Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: x86: Use host's native hypercall instruction in kvm_hypercall()Vishal Annapurve
Use the host CPU's native hypercall instruction, i.e. VMCALL vs. VMMCALL, in kvm_hypercall(), as relying on KVM to patch in the native hypercall on a #UD for the "wrong" hypercall requires KVM_X86_QUIRK_FIX_HYPERCALL_INSN to be enabled and flat out doesn't work if guest memory is encrypted with a private key, e.g. for SEV VMs. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Vishal Annapurve <vannapurve@google.com> Link: https://lore.kernel.org/r/20230111004445.416840-4-vannapurve@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: x86: Cache host CPU vendor (AMD vs. Intel)Vishal Annapurve
Cache the host CPU vendor for userspace and share it with guest code. All the current callers of this_cpu* actually care about host cpu so they are updated to check host_cpu_is*. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Vishal Annapurve <vannapurve@google.com> Link: https://lore.kernel.org/r/20230111004445.416840-3-vannapurve@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: x86: Use "this_cpu" prefix for cpu vendor queriesVishal Annapurve
Replace is_intel/amd_cpu helpers with this_cpu_* helpers to better convey the intent of querying vendor of the current cpu. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Vishal Annapurve <vannapurve@google.com> Link: https://lore.kernel.org/r/20230111004445.416840-2-vannapurve@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Fix a typo in the vcpu_msrs_set assertAaron Lewis
The assert incorrectly identifies the ioctl being called. Switch it from KVM_GET_MSRS to KVM_SET_MSRS. Fixes: 6ebfef83f03f ("KVM: selftest: Add proper helpers for x86-specific save/restore ioctls") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221209201326.2781950-1-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: kvm_vm_elf_load() and elfhdr_get() should close fdReiji Watanabe
kvm_vm_elf_load() and elfhdr_get() open one file each, but they never close the opened file descriptor. If a test repeatedly creates and destroys a VM with __vm_create(), which (directly or indirectly) calls those two functions, the test might end up getting a open failure with EMFILE. Fix those two functions to close the file descriptor. Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221220170921.2499209-2-reijiw@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Test masked events in PMU filterAaron Lewis
Add testing to show that a pmu event can be filtered with a generalized match on it's unit mask. These tests set up test cases to demonstrate various ways of filtering a pmu event that has multiple unit mask values. It does this by setting up the filter in KVM with the masked events provided, then enabling three pmu counters in the guest. The test then verifies that the pmu counters agree with which counters should be counting and which counters should be filtered for both a sparse filter list and a dense filter list. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-8-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Add testing for KVM_SET_PMU_EVENT_FILTERAaron Lewis
Test that masked events are not using invalid bits, and if they are, ensure the pmu event filter is not accepted by KVM_SET_PMU_EVENT_FILTER. The only valid bits that can be used for masked events are set when using KVM_PMU_ENCODE_MASKED_ENTRY() with one exception: If any of the high bits (35:32) of the event select are set when using Intel, the pmu event filter will fail. Also, because validation was not being done prior to the introduction of masked events, only expect validation to fail when masked events are used. E.g. in the first test a filter event with all its bits set is accepted by KVM_SET_PMU_EVENT_FILTER when flags = 0. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-7-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Add flags when creating a pmu event filterAaron Lewis
Now that the flags field can be non-zero, pass it in when creating a pmu event filter. This is needed in preparation for testing masked events. No functional change intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-6-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/pmu: Introduce masked events to the pmu event filterAaron Lewis
When building a list of filter events, it can sometimes be a challenge to fit all the events needed to adequately restrict the guest into the limited space available in the pmu event filter. This stems from the fact that the pmu event filter requires each event (i.e. event select + unit mask) be listed, when the intention might be to restrict the event select all together, regardless of it's unit mask. Instead of increasing the number of filter events in the pmu event filter, add a new encoding that is able to do a more generalized match on the unit mask. Introduce masked events as another encoding the pmu event filter understands. Masked events has the fields: mask, match, and exclude. When filtering based on these events, the mask is applied to the guest's unit mask to see if it matches the match value (i.e. umask & mask == match). The exclude bit can then be used to exclude events from that match. E.g. for a given event select, if it's easier to say which unit mask values shouldn't be filtered, a masked event can be set up to match all possible unit mask values, then another masked event can be set up to match the unit mask values that shouldn't be filtered. Userspace can query to see if this feature exists by looking for the capability, KVM_CAP_PMU_EVENT_MASKED_EVENTS. This feature is enabled by setting the flags field in the pmu event filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS. Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY(). It is an error to have a bit set outside the valid bits for a masked event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in such cases, including the high bits of the event select (35:32) if called on Intel. With these updates the filter matching code has been updated to match on a common event. Masked events were flexible enough to handle both event types, so they were used as the common event. This changes how guest events get filtered because regardless of the type of event used in the uAPI, they will be converted to masked events. Because of this there could be a slight performance hit because instead of matching the filter event with a lookup on event select + unit mask, it does a lookup on event select then walks the unit masks to find the match. This shouldn't be a big problem because I would expect the set of common event selects to be small, and if they aren't the set can likely be reduced by using masked events to generalize the unit mask. Using one type of event when filtering guest events allows for a common code path to be used. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/pmu: prepare the pmu event filter for masked eventsAaron Lewis
Refactor check_pmu_event_filter() in preparation for masked events. No functional changes intended Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-4-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/pmu: Remove impossible events from the pmu event filterAaron Lewis
If it's not possible for an event in the pmu event filter to match a pmu event being programmed by the guest, it's pointless to have it in the list. Opt for a shorter list by removing those events. Because this is established uAPI the pmu event filter can't outright rejected these events as garbage and return an error. Instead, play nice and remove them from the list. Also, opportunistically rewrite the comment when the filter is set to clarify that it guards against *all* TOCTOU attacks on the verified data. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-3-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/pmu: Correct the mask used in a pmu event filter lookupAaron Lewis
When checking if a pmu event the guest is attempting to program should be filtered, only consider the event select + unit mask in that decision. Use an architecture specific mask to mask out all other bits, including bits 35:32 on Intel. Those bits are not part of the event select and should not be considered in that decision. Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/mmu: Use kstrtobool() instead of strtobool()Christophe JAILLET
strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/670882aa04dbdd171b46d3b20ffab87158454616.1673689135.git.christophe.jaillet@wanadoo.fr Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/mmu: Cleanup range-based flushing for given pageHou Wenlong
Use the new kvm_flush_remote_tlbs_gfn() helper to cleanup the call sites of range-based flushing for given page, which makes the code clear. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/593ee1a876ece0e819191c0b23f56b940d6686db.1665214747.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/mmu: Fix wrong gfn range of tlb flushing in validate_direct_spte()Hou Wenlong
The spte pointing to the children SP is dropped, so the whole gfn range covered by the children SP should be flushed. Although, Hyper-V may treat a 1-page flush the same if the address points to a huge page, it still would be better to use the correct size of huge page. Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/5f297c566f7d7ff2ea6da3c66d050f69ce1b8ede.1665214747.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/mmu: Fix wrong start gfn of tlb flushing with rangeHou Wenlong
When a spte is dropped, the start gfn of tlb flushing should be the gfn of spte not the base gfn of SP which contains the spte. Also introduce a helper function to do range-based flushing when a spte is dropped, which would help prevent future buggy use of kvm_flush_remote_tlbs_with_address() in such case. Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.") Suggested-by: David Matlack <dmatlack@google.com> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/72ac2169a261976f00c1703e88cda676dfb960f5.1665214747.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/mmu: Reduce gfn range of tlb flushing in ↵Hou Wenlong
tdp_mmu_map_handle_target_level() Since the children SP is zapped, the gfn range of tlb flushing should be the range covered by children SP not parent SP. Replace sp->gfn which is the base gfn of parent SP with iter->gfn and use the correct size of gfn range for children SP to reduce tlb flushing range. Fixes: bb95dfb9e2df ("KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages") Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/528ab9c784a486e9ce05f61462ad9260796a8732.1665214747.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/mmu: Fix wrong gfn range of tlb flushing in kvm_set_pte_rmapp()Hou Wenlong
When the spte of hupe page is dropped in kvm_set_pte_rmapp(), the whole gfn range covered by the spte should be flushed. However, rmap_walk_init_level() doesn't align down the gfn for new level like tdp iterator does, then the gfn used in kvm_set_pte_rmapp() is not the base gfn of huge page. And the size of gfn range is wrong too for huge page. Use the base gfn of huge page and the size of huge page for flushing tlbs for huge page. Also introduce a helper function to flush the given page (huge or not) of guest memory, which would help prevent future buggy use of kvm_flush_remote_tlbs_with_address() in such case. Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.") Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/0ce24d7078fa5f1f8d64b0c59826c50f32f8065e.1665214747.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/mmu: Move round_gfn_for_level() helper into mmu_internal.hHou Wenlong
Rounding down the GFN to a huge page size is a common pattern throughout KVM, so move round_gfn_for_level() helper in tdp_iter.c to mmu_internal.h for common usage. Also rename it as gfn_round_for_level() to use gfn_* prefix and clean up the other call sites. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/415c64782f27444898db650e21cf28eeb6441dfa.1665214747.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/mmu: fix an incorrect comment in kvm_mmu_new_pgd()Wei Liu
There is no function named kvm_mmu_ensure_valid_pgd(). Fix the comment and remove the pair of braces to conform to Linux kernel coding style. Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221128214709.224710-1-wei.liu@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24kvm: x86/mmu: Don't clear write flooding for direct SPLai Jiangshan
Although there is no harm, but there is no point to clear write flooding for direct SP. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230105100310.6700-1-jiangshanlai@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24kvm: x86/mmu: Rename SPTE_TDP_AD_ENABLED_MASK to SPTE_TDP_AD_ENABLEDLai Jiangshan
SPTE_TDP_AD_ENABLED_MASK, SPTE_TDP_AD_DISABLED_MASK and SPTE_TDP_AD_WRPROT_ONLY_MASK are actual value, not mask. Remove "MASK" from their names. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230105100204.6521-1-jiangshanlai@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24x86/reboot: Disable SVM, not just VMX, when stopping CPUsSean Christopherson
Disable SVM and more importantly force GIF=1 when halting a CPU or rebooting the machine. Similar to VMX, SVM allows software to block INITs via CLGI, and thus can be problematic for a crash/reboot. The window for failure is smaller with SVM as INIT is only blocked while GIF=0, i.e. between CLGI and STGI, but the window does exist. Fixes: fba4f472b33a ("x86/reboot: Turn off KVM when halting a CPU") Cc: stable@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130233650.1404148-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24x86/reboot: Disable virtualization in an emergency if SVM is supportedSean Christopherson
Disable SVM on all CPUs via NMI shootdown during an emergency reboot. Like VMX, SVM can block INIT, e.g. if the emergency reboot is triggered between CLGI and STGI, and thus can prevent bringing up other CPUs via INIT-SIPI-SIPI. Cc: stable@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130233650.1404148-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24x86/virt: Force GIF=1 prior to disabling SVM (for reboot flows)Sean Christopherson
Set GIF=1 prior to disabling SVM to ensure that INIT is recognized if the kernel is disabling SVM in an emergency, e.g. if the kernel is about to jump into a crash kernel or may reboot without doing a full CPU RESET. If GIF is left cleared, the new kernel (or firmware) will be unabled to awaken APs. Eat faults on STGI (due to EFER.SVME=0) as it's possible that SVM could be disabled via NMI shootdown between reading EFER.SVME and executing STGI. Link: https://lore.kernel.org/all/cbcb6f35-e5d7-c1c9-4db9-fe5cc4de579a@amd.com Cc: stable@vger.kernel.org Cc: Andrew Cooper <Andrew.Cooper3@citrix.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130233650.1404148-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24x86/crash: Disable virt in core NMI crash handler to avoid double shootdownSean Christopherson
Disable virtualization in crash_nmi_callback() and rework the emergency_vmx_disable_all() path to do an NMI shootdown if and only if a shootdown has not already occurred. NMI crash shootdown fundamentally can't support multiple invocations as responding CPUs are deliberately put into halt state without unblocking NMIs. But, the emergency reboot path doesn't have any work of its own, it simply cares about disabling virtualization, i.e. so long as a shootdown occurred, emergency reboot doesn't care who initiated the shootdown, or when. If "crash_kexec_post_notifiers" is specified on the kernel command line, panic() will invoke crash_smp_send_stop() and result in a second call to nmi_shootdown_cpus() during native_machine_emergency_restart(). Invoke the callback _before_ disabling virtualization, as the current VMCS needs to be cleared before doing VMXOFF. Note, this results in a subtle change in ordering between disabling virtualization and stopping Intel PT on the responding CPUs. While VMX and Intel PT do interact, VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one another, which is all that matters when panicking. Harden nmi_shootdown_cpus() against multiple invocations to try and capture any such kernel bugs via a WARN instead of hanging the system during a crash/dump, e.g. prior to the recent hardening of register_nmi_handler(), re-registering the NMI handler would trigger a double list_add() and hang the system if CONFIG_BUG_ON_DATA_CORRUPTION=y. list_add double add: new=ffffffff82220800, prev=ffffffff8221cfe8, next=ffffffff82220800. WARNING: CPU: 2 PID: 1319 at lib/list_debug.c:29 __list_add_valid+0x67/0x70 Call Trace: __register_nmi_handler+0xcf/0x130 nmi_shootdown_cpus+0x39/0x90 native_machine_emergency_restart+0x1c9/0x1d0 panic+0x237/0x29b Extract the disabling logic to a common helper to deduplicate code, and to prepare for doing the shootdown in the emergency reboot path if SVM is supported. Note, prior to commit ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported"), nmi_shootdown_cpus() was subtly protected against a second invocation by a cpu_vmx_enabled() check as the kdump handler would disable VMX if it ran first. Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported") Cc: stable@vger.kernel.org Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130233650.1404148-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/xen: update Xen CPUID Leaf 4 (tsc info) sub-leaves, if presentPaul Durrant
The scaling information in subleaf 1 should match the values set by KVM in the 'vcpu_info' sub-structure 'time_info' (a.k.a. pvclock_vcpu_time_info) which is shared with the guest, but is not directly available to the VMM. The offset values are not set since a TSC offset is already applied. The TSC frequency should also be set in sub-leaf 2. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230106103600.528-3-pdurrant@amazon.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/cpuid: generalize kvm_update_kvm_cpuid_base() and also capture limitPaul Durrant
A subsequent patch will need to acquire the CPUID leaf range for emulated Xen so explicitly pass the signature of the hypervisor we're interested in to the new function. Also introduce a new kvm_hypervisor_cpuid structure so we can neatly store both the base and limit leaf indices. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230106103600.528-2-pdurrant@amazon.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86: Replace cpu_dirty_logging_count with nr_memslots_dirty_loggingDavid Matlack
Drop cpu_dirty_logging_count in favor of nr_memslots_dirty_logging. Both fields count the number of memslots that have dirty-logging enabled, with the only difference being that cpu_dirty_logging_count is only incremented when using PML. So while nr_memslots_dirty_logging is not a direct replacement for cpu_dirty_logging_count, it can be combined with enable_pml to get the same information. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20230105214303.2919415-1-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86: Replace 0-length arrays with flexible arraysKees Cook
Zero-length arrays are deprecated[1]. Replace struct kvm_nested_state's "data" union 0-length arrays with flexible arrays. (How are the sizes of these arrays verified?) Detected with GCC 13, using -fstrict-flex-arrays=3: arch/x86/kvm/svm/nested.c: In function 'svm_get_nested_state': arch/x86/kvm/svm/nested.c:1536:17: error: array subscript 0 is outside array bounds of 'struct kvm_svm_nested_state_data[0]' [-Werror=array-bounds=] 1536 | &user_kvm_nested_state->data.svm[0]; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from include/uapi/linux/kvm.h:15, from include/linux/kvm_host.h:40, from arch/x86/kvm/svm/nested.c:18: arch/x86/include/uapi/asm/kvm.h:511:50: note: while referencing 'svm' 511 | struct kvm_svm_nested_state_data svm[0]; | ^~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230105190548.never.323-kees@kernel.org Link: https://lore.kernel.org/r/20230118195905.gonna.693-kees@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86: Advertise fast REP string features inherent to the CPUJim Mattson
Fast zero-length REP MOVSB, fast short REP STOSB, and fast short REP {CMPSB,SCASB} are inherent features of the processor that cannot be hidden by the hypervisor. When these features are present on the host, enumerate them in KVM_GET_SUPPORTED_CPUID. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220901211811.2883855-2-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24x86/cpufeatures: Add macros for Intel's new fast rep string featuresJim Mattson
KVM_GET_SUPPORTED_CPUID should reflect these host CPUID bits. The bits are already cached in word 12. Give the bits X86_FEATURE names, so that they can be easily referenced. Hide these bits from /proc/cpuinfo, since the host kernel makes no use of them at present. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220901211811.2883855-1-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24kvm_host.h: fix spelling typo in function declarationWang Liang
Make parameters in function declaration consistent with those in function definition for better cscope-ability Signed-off-by: Wang Liang <wangliangzz@inspur.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220920060210.4842-1-wangliangzz@126.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: account allocation in generic version of kvm_arch_alloc_vm()Alexey Dobriyan
Account the allocation of VMs in the generic version of kvm_arch_alloc_vm(), the VM is tied to the current task/process. Note, x86 already accounts its allocation. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/Y3aay2u2KQgiR0un@p183 [sean: reworded changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: PPC: Fix refactoring goof in kvmppc_e500mc_init()Sean Christopherson
Fix a build error due to a mixup during a recent refactoring. The error was reported during code review, but the fixed up patch didn't make it into the final commit. Fixes: 474856bad921 ("KVM: PPC: Move processor compatibility check to module init") Link: https://lore.kernel.org/all/87cz93snqc.fsf@mpe.ellerman.id.au Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230119182158.4026656-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-24usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_waitUdipto Goswami
__ffs_ep0_queue_wait executes holding the spinlock of &ffs->ev.waitq.lock and unlocks it after the assignments to usb_request are done. However in the code if the request is already NULL we bail out returning -EINVAL but never unlocked the spinlock. Fix this by adding spin_unlock_irq &ffs->ev.waitq.lock before returning. Fixes: 6a19da111057 ("usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait") Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com> Link: https://lore.kernel.org/r/20230124091149.18647-1-quic_ugoswami@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24usb: dwc3: qcom: enable vbus override when in OTG dr-modeNeil Armstrong
With vbus override enabled when in OTG dr_mode, Host<->Peripheral switch now works on SM8550, otherwise the DWC3 seems to be stuck in Host mode only. Fixes: a4333c3a6ba9 ("usb: dwc3: Add Qualcomm DWC3 glue driver") Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20230123-topic-sm8550-upstream-dwc3-qcom-otg-v2-1-2d400e598463@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24MAINTAINERS: Add myself as UVC Gadget MaintainerDaniel Scally
Add myself as a second maintainer for the UVC Gadget. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Link: https://lore.kernel.org/r/20230124153909.867202-1-dan.scally@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24Revert "Merge branch 'ethtool-mac-merge'"Paolo Abeni
This reverts commit 0ad999c1eec879f06cc52ef7df4d0dbee4a2d7eb, reversing changes made to e38553bdc377e3e7a6caa9dd9770d8b644d8dac3. It was not intended for net. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-24tracing: Make sure trace_printk() can output as soon as it can be usedSteven Rostedt (Google)
Currently trace_printk() can be used as soon as early_trace_init() is called from start_kernel(). But if a crash happens, and "ftrace_dump_on_oops" is set on the kernel command line, all you get will be: [ 0.456075] <idle>-0 0dN.2. 347519us : Unknown type 6 [ 0.456075] <idle>-0 0dN.2. 353141us : Unknown type 6 [ 0.456075] <idle>-0 0dN.2. 358684us : Unknown type 6 This is because the trace_printk() event (type 6) hasn't been registered yet. That gets done via an early_initcall(), which may be early, but not early enough. Instead of registering the trace_printk() event (and other ftrace events, which are not trace events) via an early_initcall(), have them registered at the same time that trace_printk() can be used. This way, if there is a crash before early_initcall(), then the trace_printk()s will actually be useful. Link: https://lkml.kernel.org/r/20230104161412.019f6c55@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: e725c731e3bb1 ("tracing: Split tracing initialization into two for early initialization") Reported-by: "Joel Fernandes (Google)" <joel@joelfernandes.org> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24ftrace: Export ftrace_free_filter() to modulesMark Rutland
Setting filters on an ftrace ops results in some memory being allocated for the filter hashes, which must be freed before the ops can be freed. This can be done by removing every individual element of the hash by calling ftrace_set_filter_ip() or ftrace_set_filter_ips() with `remove` set, but this is somewhat error prone as it's easy to forget to remove an element. Make it easier to clean this up by exporting ftrace_free_filter(), which can be used to clean up all of the filter hashes after an ftrace_ops has been unregistered. Using this, fix the ftrace-direct* samples to free hashes prior to being unloaded. All other code either removes individual filters explicitly or is built-in and already calls ftrace_free_filter(). Link: https://lkml.kernel.org/r/20230103124912.2948963-3-mark.rutland@arm.com Cc: stable@vger.kernel.org Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: e1067a07cfbc ("ftrace/samples: Add module to test multi direct modify interface") Fixes: 5fae941b9a6f ("ftrace/samples: Add multi direct interface test module") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24media: videobuf2: set q->streaming laterHans Verkuil
Commit a10b21532574 ("media: vb2: add (un)prepare_streaming queue ops") moved up the q->streaming = 1 assignment to before the call to vb2_start_streaming(). This does make sense since q->streaming indicates that VIDIOC_STREAMON is called, and the call to start_streaming happens either at that time or later if q->min_buffers_needed > 0. So q->streaming should be 1 before start_streaming is called. However, it turned out that some drivers use vb2_is_streaming() in buf_queue, and if q->min_buffers_needed == 0, then that will now return true instead of false. So for the time being revert to the original behavior. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: a10b21532574 ("media: vb2: add (un)prepare_streaming queue ops") Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Tested-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-01-24fuse: fixes after adapting to new posix acl apiChristian Brauner
This cycle we ported all filesystems to the new posix acl api. While looking at further simplifications in this area to remove the last remnants of the generic dummy posix acl handlers we realized that we regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use of posix acls. With the change to a dedicated posix acl api interacting with posix acls doesn't go through the old xattr codepaths anymore and instead only relies the get acl and set acl inode operations. Before this change fuse daemons that don't set FUSE_POSIX_ACL were able to get and set posix acl albeit with two caveats. First, that posix acls aren't cached. And second, that they aren't used for permission checking in the vfs. We regressed that use-case as we currently refuse to retrieve any posix acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons would see a change in behavior. We can restore the old behavior in multiple ways. We could change the new posix acl api and look for a dedicated xattr handler and if we find one prefer that over the dedicated posix acl api. That would break the consistency of the new posix acl api so we would very much prefer not to do that. We could introduce a new ACL_*_CACHE sentinel that would instruct the vfs permission checking codepath to not call into the filesystem and ignore acls. But a more straightforward fix for v6.2 is to do the same thing that Overlayfs does and give fuse a separate get acl method for permission checking. Overlayfs uses this to express different needs for vfs permission lookup and acl based retrieval via the regular system call path as well. Let fuse do the same for now. This way fuse can continue to refuse to retrieve posix acls for daemons that don't set FUSE_POSXI_ACL for permission checking while allowing a fuse server to retrieve it via the usual system calls. In the future, we could extend the get acl inode operation to not just pass a simple boolean to indicate rcu lookup but instead make it a flag argument. Then in addition to passing the information that this is an rcu lookup to the filesystem we could also introduce a flag that tells the filesystem that this is a request from the vfs to use these acls for permission checking. Then fuse could refuse the get acl request for permission checking when the daemon doesn't have FUSE_POSIX_ACL set in the same get acl method. This would also help Overlayfs and allow us to remove the second method for it as well. But since that change is more invasive as we need to update the get acl inode operation for multiple filesystems we should not do this as a fix for v6.2. Instead we will do this for the v6.3 merge window. Fwiw, since posix acls are now always correctly translated in the new posix acl api we could also allow them to be used for daemons without FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral change and again if dones should be done for v6.3. For now, let's just restore the original behavior. A nice side-effect of this change is that for fuse daemons with and without FUSE_POSIX_ACL the same code is used for posix acls in a backwards compatible way. This also means we can remove the legacy xattr handlers completely. We've also added comments to explain the expected behavior for daemons without FUSE_POSIX_ACL into the code. Fixes: 318e66856dde ("xattr: use posix acl api") Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-24ACPI: video: Fix apple gmux detectionHans de Goede
Some apple laptop models have an ACPI device with a HID of APP000B and that device has an IO resource (so it does not describe the new unsupported MMIO based gmux type), but there actually is no gmux in the laptop at all. The gmux_probe() function of the actual apple-gmux driver has code to detect this, this code has been factored out into a new apple_gmux_detect() helper in apple-gmux.h. Use this new function to fix acpi_video_get_backlight_type() wrongly returning apple_gmux as type on the following laptops: MacBookPro5,4 https://pastebin.com/8Xjq7RhS MacBookPro8,1 https://linux-hardware.org/?probe=e513cfbadb&log=dmesg MacBookPro9,2 https://bugzilla.kernel.org/attachment.cgi?id=278961 MacBookPro10,2 https://lkml.org/lkml/2014/9/22/657 MacBookPro11,2 https://forums.fedora-fr.org/viewtopic.php?id=70142 MacBookPro11,4 https://raw.githubusercontent.com/im-0/investigate-card-reader-suspend-problem-on-mbp11.4/mast Fixes: 21245df307cb ("ACPI: video: Add Apple GMUX brightness control detection") Link: https://lore.kernel.org/platform-driver-x86/20230123113750.462144-1-hdegoede@redhat.com/ Reported-by: Emmanouil Kouroupakis <kartebi@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230124105754.62167-4-hdegoede@redhat.com
2023-01-24platform/x86: apple-gmux: Add apple_gmux_detect() helperHans de Goede
Add a new (static inline) apple_gmux_detect() helper to apple-gmux.h which can be used for gmux detection instead of apple_gmux_present(). The latter is not really reliable since an ACPI device with a HID of APP000B is present on some devices without a gmux at all, as well as on devices with a newer (unsupported) MMIO based gmux model. This causes apple_gmux_present() to return false-positives on a number of different Apple laptop models. This new helper uses the same probing as the actual apple-gmux driver, so that it does not return false positives. To avoid code duplication the gmux_probe() function of the actual driver is also moved over to using the new apple_gmux_detect() helper. This avoids false positives (vs _HID + IO region detection) on: MacBookPro5,4 https://pastebin.com/8Xjq7RhS MacBookPro8,1 https://linux-hardware.org/?probe=e513cfbadb&log=dmesg MacBookPro9,2 https://bugzilla.kernel.org/attachment.cgi?id=278961 MacBookPro10,2 https://lkml.org/lkml/2014/9/22/657 MacBookPro11,2 https://forums.fedora-fr.org/viewtopic.php?id=70142 MacBookPro11,4 https://raw.githubusercontent.com/im-0/investigate-card-reader-suspend-problem-on-mbp11.4/master/test-16/dmesg Fixes: 21245df307cb ("ACPI: video: Add Apple GMUX brightness control detection") Link: https://lore.kernel.org/platform-driver-x86/20230123113750.462144-1-hdegoede@redhat.com/ Reported-by: Emmanouil Kouroupakis <kartebi@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230124105754.62167-3-hdegoede@redhat.com
2023-01-24platform/x86: apple-gmux: Move port defines to apple-gmux.hHans de Goede
This is a preparation patch for adding a new static inline apple_gmux_detect() helper which actually checks a supported gmux is present, rather then only checking an ACPI device with the HID is there as apple_gmux_present() does. Fixes: 21245df307cb ("ACPI: video: Add Apple GMUX brightness control detection") Link: https://lore.kernel.org/platform-driver-x86/20230123113750.462144-1-hdegoede@redhat.com/ Reported-by: Emmanouil Kouroupakis <kartebi@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230124105754.62167-2-hdegoede@redhat.com
2023-01-24platform/x86: hp-wmi: Fix cast to smaller integer type warningHans de Goede
Fix the following compiler warning: drivers/platform/x86/hp/hp-wmi.c:551:24: warning: cast to smaller integer type 'enum hp_wmi_radio' from 'void *' [-Wvoid-pointer-to-enum-cast] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230123132824.660062-1-hdegoede@redhat.com
2023-01-24platform/x86/amd: pmc: Add a module parameter to disable workaroundsMario Limonciello
Some users may want to live with the bugs that exist in platform firmware and have workarounds in AMD PMC driver. To allow them to bypass these workarounds, introduce a module parameter. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20230120191519.15926-2-mario.limonciello@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>