summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-10-22Merge branch 'bnxt_en-bug-fixes'Jakub Kicinski
Michael Chan says: ==================== Devlink and error recovery bug fix patches. Most of the work is by Vasundhara Volam. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22bnxt_en: Avoid disabling pci device in bnxt_remove_one() for already ↵Vasundhara Volam
disabled device. With the recently added error recovery logic, the device may already be disabled if the firmware recovery is unsuccessful. In bnxt_remove_one(), check that the device is still enabled first before calling pci_disable_device(). Fixes: 3bc7d4a352ef ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22bnxt_en: Minor formatting changes in FW devlink_health_reporterVasundhara Volam
Minor formatting changes to diagnose cb for FW devlink health reporter. Suggested-by: Jiri Pirko <jiri@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22bnxt_en: Adjust the time to wait before polling firmware readiness.Vasundhara Volam
When firmware indicates that driver needs to invoke firmware reset which is common for both error recovery and live firmware reset path, driver needs a different time to wait before polling for firmware readiness. Modify the wait time to fw_reset_min_dsecs, which is initialised to correct timeout for error recovery and firmware reset. Fixes: 4037eb715680 ("bnxt_en: Add a new BNXT_FW_RESET_STATE_POLL_FW_DOWN state.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22bnxt_en: Fix devlink NVRAM related byte order related issues.Michael Chan
The current code does not do endian swapping between the devlink parameter and the internal NVRAM representation. Define a union to represent the little endian NVRAM data and add 2 helper functions to copy to and from the NVRAM data with the proper byte swapping. Fixes: 782a624d00fa ("bnxt_en: Add bnxt_en initial port params table and register it") Cc: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22bnxt_en: Fix the size of devlink MSIX parameters.Vasundhara Volam
The current code that rounds up the NVRAM parameter bit size to the next byte size for the devlink parameter is not always correct. The MSIX devlink parameters are 4 bytes and we don't get the correct size using this method. Fix it by adding a new dl_num_bytes member to the bnxt_dl_nvm_param structure which statically provides bytesize information according to the devlink parameter type definition. Fixes: 782a624d00fa ("bnxt_en: Add bnxt_en initial port params table and register it") Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: stmmac: Fix the problem of tso_xmityuqi jin
When the address width of DMA is greater than 32, the packet header occupies a BD descriptor. The starting address of the data should be added to the header length. Fixes: a993db88d17d ("net: stmmac: Enable support for > 32 Bits addressing in XGMAC") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: Jose Abreu <joabreu@synopsys.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: yuqi jin <jinyuqi@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22dynamic_debug: provide dynamic_hex_dump stubArnd Bergmann
The ionic driver started using dymamic_hex_dump(), but that is not always defined: drivers/net/ethernet/pensando/ionic/ionic_main.c:229:2: error: implicit declaration of function 'dynamic_hex_dump' [-Werror,-Wimplicit-function-declaration] Add a dummy implementation to use when CONFIG_DYNAMIC_DEBUG is disabled, printing nothing. Fixes: 938962d55229 ("ionic: Add adminq action") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22bpf: Fix use after free in subprog's jited symbol removalDaniel Borkmann
syzkaller managed to trigger the following crash: [...] BUG: unable to handle page fault for address: ffffc90001923030 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline] RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline] RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline] RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline] RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline] RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline] RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709 [...] Call Trace: kernel_text_address kernel/extable.c:147 [inline] __kernel_text_address+0x9a/0x110 kernel/extable.c:102 unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19 arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26 stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123 save_stack mm/kasan/common.c:69 [inline] set_track mm/kasan/common.c:77 [inline] __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518 slab_post_alloc_hook mm/slab.h:584 [inline] slab_alloc mm/slab.c:3319 [inline] kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483 getname_flags+0xba/0x640 fs/namei.c:138 getname+0x19/0x20 fs/namei.c:209 do_sys_open+0x261/0x560 fs/open.c:1091 __do_sys_open fs/open.c:1115 [inline] __se_sys_open fs/open.c:1110 [inline] __x64_sys_open+0x87/0x90 fs/open.c:1110 do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] After further debugging it turns out that we walk kallsyms while in parallel we tear down a BPF program which contains subprograms that have been JITed though the program itself has not been fully exposed and is eventually bailing out with error. The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes the symbols, however, bpf_prog_free() tears down the JIT memory too early via scheduled work. Instead, it needs to properly respect RCU grace period as the kallsyms walk for BPF is under RCU. Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error path where we defer final destruction when we have subprogs in the program. Fixes: 7d1982b4e335 ("bpf: fix panic in prog load calls cleanup") Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
2019-10-22RDMA/uverbs: Prevent potential underflowDan Carpenter
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that: if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) { But we don't check if "attr.comp_vector" is negative. It could potentially lead to an array underflow. My concern would be where cq->vector is used in the create_cq() function from the cxgb4 driver. And really "attr.comp_vector" is appears as a u32 to user space so that's the right type to use. Fixes: 9ee79fce3642 ("IB/core: Add completion queue (cq) object actions") Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22KVM: nVMX: Don't leak L1 MMIO regions to L2Jim Mattson
If the "virtualize APIC accesses" VM-execution control is set in the VMCS, the APIC virtualization hardware is triggered when a page walk in VMX non-root mode terminates at a PTE wherein the address of the 4k page frame matches the APIC-access address specified in the VMCS. On hardware, the APIC-access address may be any valid 4k-aligned physical address. KVM's nVMX implementation enforces the additional constraint that the APIC-access address specified in the vmcs12 must be backed by a "struct page" in L1. If not, L0 will simply clear the "virtualize APIC accesses" VM-execution control in the vmcs02. The problem with this approach is that the L1 guest has arranged the vmcs12 EPT tables--or shadow page tables, if the "enable EPT" VM-execution control is clear in the vmcs12--so that the L2 guest physical address(es)--or L2 guest linear address(es)--that reference the L2 APIC map to the APIC-access address specified in the vmcs12. Without the "virtualize APIC accesses" VM-execution control in the vmcs02, the APIC accesses in the L2 guest will directly access the APIC-access page in L1. When there is no mapping whatsoever for the APIC-access address in L1, the L2 VM just loses the intended APIC virtualization. However, when the APIC-access address is mapped to an MMIO region in L1, the L2 guest gets direct access to the L1 MMIO device. For example, if the APIC-access address specified in the vmcs12 is 0xfee00000, then L2 gets direct access to L1's APIC. Since this vmcs12 configuration is something that KVM cannot faithfully emulate, the appropriate response is to exit to userspace with KVM_INTERNAL_ERROR_EMULATION. Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12") Reported-by: Dan Cross <dcross@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22ARC: perf: Accommodate big-endian CPUAlexey Brodkin
8-letter strings representing ARC perf events are stores in two 32-bit registers as ASCII characters like that: "IJMP", "IALL", "IJMPTAK" etc. And the same order of bytes in the word is used regardless CPU endianness. Which means in case of big-endian CPU core we need to swap bytes to get the same order as if it was on little-endian CPU. Otherwise we're seeing the following error message on boot: ------------------------->8---------------------- ARC perf : 8 counters (32 bits), 40 conditions, [overflow IRQ support] sysfs: cannot create duplicate filename '/devices/arc_pct/events/pmji' CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3 Stack Trace: arc_unwind_core+0xd4/0xfc dump_stack+0x64/0x80 sysfs_warn_dup+0x46/0x58 sysfs_add_file_mode_ns+0xb2/0x168 create_files+0x70/0x2a0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at kernel/events/core.c:12144 perf_event_sysfs_init+0x70/0xa0 Failed to register pmu: arc_pct, reason -17 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3 Stack Trace: arc_unwind_core+0xd4/0xfc dump_stack+0x64/0x80 __warn+0x9c/0xd4 warn_slowpath_fmt+0x22/0x2c perf_event_sysfs_init+0x70/0xa0 ---[ end trace a75fb9a9837bd1ec ]--- ------------------------->8---------------------- What happens here we're trying to register more than one raw perf event with the same name "PMJI". Why? Because ARC perf events are 4 to 8 letters and encoded into two 32-bit words. In this particular case we deal with 2 events: * "IJMP____" which counts all jump & branch instructions * "IJMPC___" which counts only conditional jumps & branches Those strings are split in two 32-bit words this way "IJMP" + "____" & "IJMP" + "C___" correspondingly. Now if we read them swapped due to CPU core being big-endian then we read "PMJI" + "____" & "PMJI" + "___C". And since we interpret read array of ASCII letters as a null-terminated string on big-endian CPU we end up with 2 events of the same name "PMJI". Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: stable@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-10-22ARC: [plat-hsdk]: Enable on-boardi SPI ADC ICEugeniy Paltsev
HSDK board has adc108s102 SPI ADC IC installed, enable it. Acked-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-10-22ARC: [plat-hsdk]: Enable on-board SPI NOR flash ICEugeniy Paltsev
HSDK board has sst26wf016b SPI NOR flash IC installed, enable it. Acked-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-10-22KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_updateMiaohe Lin
Guest physical APIC ID may not equal to vcpu->vcpu_id in some case. We may set the wrong physical id in avic_handle_ldr_update as we always use vcpu->vcpu_id. Get physical APIC ID from vAPIC page instead. Export and use kvm_xapic_id here and in avic_handle_apic_id_update as suggested by Vitaly. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22Merge branch 'misc' into fixesRussell King
2019-10-22cpufreq: Cancel policy update work scheduled before freeingSudeep Holla
Scheduled policy update work may end up racing with the freeing of the policy and unregistering the driver. One possible race is as below, where the cpufreq_driver is unregistered, but the scheduled work gets executed at later stage when, cpufreq_driver is NULL (i.e. after freeing the policy and driver). Unable to handle kernel NULL pointer dereference at virtual address 0000001c pgd = (ptrval) [0000001c] *pgd=80000080204003, *pmd=00000000 Internal error: Oops: 206 [#1] SMP THUMB2 Modules linked in: CPU: 0 PID: 34 Comm: kworker/0:1 Not tainted 5.4.0-rc3-00006-g67f5a8081a4b #86 Hardware name: ARM-Versatile Express Workqueue: events handle_update PC is at cpufreq_set_policy+0x58/0x228 LR is at dev_pm_qos_read_value+0x77/0xac Control: 70c5387d Table: 80203000 DAC: fffffffd Process kworker/0:1 (pid: 34, stack limit = 0x(ptrval)) (cpufreq_set_policy) from (refresh_frequency_limits.part.24+0x37/0x48) (refresh_frequency_limits.part.24) from (handle_update+0x2f/0x38) (handle_update) from (process_one_work+0x16d/0x3cc) (process_one_work) from (worker_thread+0xff/0x414) (worker_thread) from (kthread+0xff/0x100) (kthread) from (ret_from_fork+0x11/0x28) Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework") Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> [ rjw: Cancel the work before dropping the QoS requests ] Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-10-22s390/kaslr: add support for R_390_GLOB_DAT relocation typeGerald Schaefer
Commit "bpf: Process in-kernel BTF" in linux-next introduced an undefined __weak symbol, which results in an R_390_GLOB_DAT relocation type. That is not yet handled by the KASLR relocation code, and the kernel stops with the message "Unknown relocation type". Add code to detect and handle R_390_GLOB_DAT relocation types and undefined symbols. Fixes: 805bc0bc238f ("s390/kernel: build a relocatable kernel") Cc: <stable@vger.kernel.org> # v5.2+ Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-22s390/zcrypt: fix memleak at releaseJohan Hovold
If a process is interrupted while accessing the crypto device and the global ap_perms_mutex is contented, release() could return early and fail to free related resources. Fixes: 00fab2350e6b ("s390/zcrypt: multiple zcrypt device nodes support") Cc: <stable@vger.kernel.org> # 4.19 Cc: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-22ALSA: usb-audio: Fix copy&paste error in the validatorTakashi Iwai
The recently introduced USB-audio descriptor validator had a stupid copy&paste error that may lead to an unexpected overlook of too short descriptors for processing and extension units. It's likely the cause of the report triggered by syzkaller fuzzer. Let's fix it. Fixes: 57f8770620e9 ("ALSA: usb-audio: More validations of descriptor units") Reported-by: syzbot+0620f79a1978b1133fd7@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/s5hsgnkdbsl.wl-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-10-22KVM: Add separate helper for putting borrowed reference to kvmSean Christopherson
Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motiviation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken. [*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes file descriptor. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: tests: Add test to verify MSR_IA32_XSSAaron Lewis
Ensure that IA32_XSS appears in KVM_GET_MSR_INDEX_LIST if it can be set to a non-zero value. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Ia2d644f69e2d6d8c27d7e0a7a45c2bf9c42bf5ff Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: svm: Update svm_xsaves_supportedAaron Lewis
AMD CPUs now support XSAVES in a limited fashion (they require IA32_XSS to be zero). AMD has no equivalent of Intel's "Enable XSAVES/XRSTORS" VM-execution control. Instead, XSAVES is always available to the guest when supported on the host. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I40dc2c682eb0d38c2208d95d5eb7bbb6c47f6317 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: x86: Move IA32_XSS to kvm_{get,set}_msr_commonAaron Lewis
Hoist support for RDMSR/WRMSR of IA32_XSS from vmx into common code so that it can be used for svm as well. Right now, kvm only allows the guest IA32_XSS to be zero, so the guest's usage of XSAVES will be exactly the same as XSAVEC. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Ie4b0f777d71e428fbee6e82071ac2d7618e9bb40 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Move IA32_XSS-swapping on VM-entry/VM-exit to common x86 codeAaron Lewis
Hoist the vendor-specific code related to loading the hardware IA32_XSS MSR with guest/host values on VM-entry/VM-exit to common x86 code. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Ic6e3430833955b98eb9b79ae6715cf2a3fdd6d82 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Use wrmsr for switching between guest and host IA32_XSS on IntelAaron Lewis
When the guest can execute the XSAVES/XRSTORS instructions, use wrmsr to set the hardware IA32_XSS MSR to guest/host values on VM-entry/VM-exit, rather than the MSR-load areas. By using the same approach as AMD, we will be able to use a common implementation for both (in the next patch). Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I9447d104b2615c04e39e4af0c911e1e7309bf464 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Use wrmsr for switching between guest and host IA32_XSS on AMDAaron Lewis
When the guest can execute the XSAVES/XRSTORS instructions, set the hardware IA32_XSS MSR to guest/host values on VM-entry/VM-exit. Note that vcpu->arch.ia32_xss is currently guaranteed to be 0 on AMD, since there is no way to change it. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Id51a782462086e6d7a3ab621838e200f1c005afd Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Remove unneeded kvm_vcpu variable, guest_xcr0_loadedAaron Lewis
The kvm_vcpu variable, guest_xcr0_loaded, is a waste of an 'int' and a conditional branch. VMX and SVM are the only users, and both unconditionally pair kvm_load_guest_xcr0() with kvm_put_guest_xcr0() making this check unnecessary. Without this variable, the predicates in kvm_load_guest_xcr0 and kvm_put_guest_xcr0 should match. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I7b1eb9b62969d7bbb2850f27e42f863421641b23 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Fix conditions for guest IA32_XSS supportAaron Lewis
Volume 4 of the SDM says that IA32_XSS is supported if CPUID(EAX=0DH,ECX=1):EAX.XSS[bit 3] is set, so only the X86_FEATURE_XSAVES check is necessary (X86_FEATURE_XSAVES is the Linux name for CPUID(EAX=0DH,ECX=1):EAX.XSS[bit 3]). Fixes: 4d763b168e9c5 ("KVM: VMX: check CPUID before allowing read/write of IA32_XSS") Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I9059b9f2e3595e4b09a4cdcf14b933b22ebad419 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Introduce vcpu->arch.xsaves_enabledAaron Lewis
Cache whether XSAVES is enabled in the guest by adding xsaves_enabled to vcpu->arch. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: If4638e0901c28a4494dad2e103e2c075e8ab5d68 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22tools: gpio: Use !building_out_of_srctree to determine srctreeShuah Khan
make TARGETS=gpio kselftest fails with: Makefile:23: tools/build/Makefile.include: No such file or directory When the gpio tool make is invoked from tools Makefile, srctree is cleared and the current logic check for srctree equals to empty string to determine srctree location from CURDIR. When the build in invoked from selftests/gpio Makefile, the srctree is set to "." and the same logic used for srctree equals to empty is needed to determine srctree. Check building_out_of_srctree undefined as the condition for both cases to fix "make TARGETS=gpio kselftest" build failure. Cc: stable@vger.kernel.org Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2019-10-22perf/aux: Fix AUX output stoppingAlexander Shishkin
Commit: 8a58ddae2379 ("perf/core: Fix exclusive events' grouping") allows CAP_EXCLUSIVE events to be grouped with other events. Since all of those also happen to be AUX events (which is not the case the other way around, because arch/s390), this changes the rules for stopping the output: the AUX event may not be on its PMU's context any more, if it's grouped with a HW event, in which case it will be on that HW event's context instead. If that's the case, munmap() of the AUX buffer can't find and stop the AUX event, potentially leaving the last reference with the atomic context, which will then end up freeing the AUX buffer. This will then trip warnings: Fix this by using the context's PMU context when looking for events to stop, instead of the event's PMU context. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22KVM: VMX: Rename {vmx,nested_vmx}_vcpu_setup()Xiaoyao Li
Rename {vmx,nested_vmx}_vcpu_setup() to match what they really do. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Initialize vmx->guest_msrs[] right after allocationXiaoyao Li
Move the initialization of vmx->guest_msrs[] from vmx_vcpu_setup() to vmx_create_vcpu(), and put it right after its allocation. This also is the preperation for next patch. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22iio: imu: inv_mpu6050: fix no data on MPU6050Jean-Baptiste Maneyrol
Some chips have a fifo overflow bit issue where the bit is always set. The result is that every data is dropped. Change fifo overflow management by checking fifo count against a maximum value. Add fifo size in chip hardware set of values. Fixes: f5057e7b2dba ("iio: imu: inv_mpu6050: better fifo overflow handling") Cc: stable@vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol@invensense.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2019-10-22KVM: VMX: Remove vmx->hv_deadline_tsc initialization from vmx_vcpu_setup()Xiaoyao Li
... It can be removed here because the same code is called later in vmx_vcpu_reset() as the flow: kvm_arch_vcpu_setup() -> kvm_vcpu_reset() -> vmx_vcpu_reset() Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Write VPID to vmcs when creating vcpuXiaoyao Li
Move the code that writes vmx->vpid to vmcs from vmx_vcpu_reset() to vmx_vcpu_setup(), because vmx->vpid is allocated when creating vcpu and never changed. So we don't need to update the vmcs.vpid when resetting vcpu. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86/vPMU: Declare kvm_pmu->reprogram_pmi field using DECLARE_BITMAPLike Xu
Replace the explicit declaration of "u64 reprogram_pmi" with the generic macro DECLARE_BITMAP for all possible appropriate number of bits. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: remove redundant code in kvm_arch_vm_ioctlMiaohe Lin
If we reach here with r = 0, we will reassign r = 0 unnecesarry, then do the label set_irqchip_out work. If we reach here with r != 0, then we will do the label work directly. So this if statement and r = 0 assignment is redundant. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: x86: Modify kvm_x86_ops.get_enable_apicv() to use struct kvm parameterSuthikulpanit, Suravee
Generally, APICv for all vcpus in the VM are enable/disable in the same manner. So, get_enable_apicv() should represent APICv status of the VM instead of each VCPU. Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as parameter instead of struct kvm_vcpu. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Fold decache_cr3() into cache_reg()Sean Christopherson
Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common cache_reg() callback and drop the dedicated decache_cr3(). The name decache_cr3() is somewhat confusing as the caching behavior of CR3 follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and has nothing in common with the caching behavior of CR0/CR4 (whose decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage). This would effectivel adds a BUG() if KVM attempts to cache CR3 on SVM. Change it to a WARN_ON_ONCE() -- if the cache never requires filling, the value is already in the right place -- and opportunistically add one in VMX to provide an equivalent check. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Add helpers to test/mark reg availability and dirtinessSean Christopherson
Add helpers to prettify code that tests and/or marks whether or not a register is available and/or dirty. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'Sean Christopherson
Now that indexing into arch.regs is either protected by WARN_ON_ONCE or done with hardcoded enums, combine all definitions for registers that are tracked by regs_avail and regs_dirty into 'enum kvm_reg'. Having a single enum type will simplify additional cleanup related to regs_avail and regs_dirty. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Add WARNs to detect out-of-bounds register indicesSean Christopherson
Add WARN_ON_ONCE() checks in kvm_register_{read,write}() to detect reg values that would cause KVM to overflow vcpu->arch.regs. Change the reg param to an 'int' to make it clear that the reg index is unverified. Regarding the overhead of WARN_ON_ONCE(), now that all fixed GPR reads and writes use dedicated accessors, e.g. kvm_rax_read(), the overhead is limited to flows where the reg index is generated at runtime. And there is at least one historical bug where KVM has generated an out-of- bounds access to arch.regs (see commit b68f3cc7d9789, "KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels"). Adding the WARN_ON_ONCE() protection paves the way for additional cleanup related to kvm_reg and kvm_reg_ex. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Optimize vmx_set_rflags() for unrestricted guestSean Christopherson
Rework vmx_set_rflags() to avoid the extra code need to handle emulation of real mode and invalid state when unrestricted guest is disabled. The primary reason for doing so is to avoid the call to vmx_get_rflags(), which will incur a VMREAD when RFLAGS is not already available. When running nested VMs, the majority of calls to vmx_set_rflags() will occur without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS during transitions between vmcs01 and vmcs02. Note, vmx_get_rflags() guarantees RFLAGS is marked available. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Replace "else" with early "return" in the unrestricted guest branch. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessorsSean Christopherson
Capture struct vcpu_vmx in a local variable to improve the readability of vmx_{g,s}et_rflags(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-dateSean Christopherson
Skip the VMWRITE to update GUEST_CR3 if CR3 is not available, i.e. has not been read from the VMCS since the last VM-Enter. If vcpu->arch.cr3 is stale, kvm_read_cr3(vcpu) will refresh vcpu->arch.cr3 from the VMCS, meaning KVM will do a VMREAD and then VMWRITE the value it just pulled from the VMCS. Note, this is a purely theoretical change, no instances of skipping the VMREAD+VMWRITE have been observed with this change. Tested-by: Reto Buerki <reet@codelabs.ch> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Reduce WBINVD/DF_FLUSH invocationsTom Lendacky
Performing a WBINVD and DF_FLUSH are expensive operations. Currently, a WBINVD/DF_FLUSH is performed every time an SEV guest terminates. However, the WBINVD/DF_FLUSH is only required when an ASID is being re-allocated to a new SEV guest. Also, a single WBINVD/DF_FLUSH can enable all ASIDs that have been disassociated from guests through DEACTIVATE. To reduce the number of WBINVD/DF_FLUSH invocations, introduce a new ASID bitmap to track ASIDs that need to be reclaimed. When an SEV guest is terminated, add its ASID to the reclaim bitmap instead of clearing the bitmap in the existing SEV ASID bitmap. This delays the need to perform a WBINVD/DF_FLUSH invocation when an SEV guest terminates until all of the available SEV ASIDs have been used. At that point, the WBINVD/DF_FLUSH invocation can be performed and all ASIDs in the reclaim bitmap moved to the available ASIDs bitmap. The semaphore around DEACTIVATE can be changed to a read semaphore with the semaphore taken in write mode before performing the WBINVD/DF_FLUSH. Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Remove unneeded WBINVD and DF_FLUSH when starting SEV guestsTom Lendacky
Performing a WBINVD and DF_FLUSH are expensive operations. The SEV support currently performs this WBINVD/DF_FLUSH combination when an SEV guest is terminated, so there is no need for it to be done before LAUNCH. However, when the SEV firmware transitions the platform from UNINIT state to INIT state, all ASIDs will be marked invalid across all threads. Therefore, as part of transitioning the platform to INIT state, perform a WBINVD/DF_FLUSH after a successful INIT in the PSP/SEV device driver. Since the PSP/SEV device driver is x86 only, it can reference and use the WBINVD related functions directly. Cc: Gary Hook <gary.hook@amd.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-EnterSean Christopherson
Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter instead of deferring the VMWRITE until vmx_set_cr3(). If the VMWRITE is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated VM-Exit occurs without actually entering L2, e.g. if the nested run is squashed because nested VM-Enter (from L1) is putting L2 into HLT. Note, the above scenario can occur regardless of whether L1 is intercepting HLT, e.g. L1 can intercept HLT and then re-enter L2 with vmcs.GUEST_ACTIVITY_STATE=HALTED. But practically speaking, a VMM will likely put a guest into HALTED if and only if it's not intercepting HLT. In an ideal world where EPT *requires* unrestricted guest (and vice versa), VMX could handle CR3 similar to how it handles RSP and RIP, e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run(). But the unrestricted guest silliness complicates the dirty tracking logic to the point that explicitly handling vmcs02.GUEST_CR3 during nested VM-Enter is a simpler overall implementation. Cc: stable@vger.kernel.org Reported-and-tested-by: Reto Buerki <reet@codelabs.ch> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>