path: root/arch/x86

2019-08-22 KVM: X86: Add pv tlb shootdown tracepoint (Wanpeng Li)

Add pv tlb shootdown tracepoint.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 KVM: x86: Unconditionally call x86 ops that are always implemented (Sean Christopherson)

Remove a few stale checks for non-NULL ops now that the ops in question are implemented by both VMX and SVM. Note, this is **not** stable material; the Fixes tags are there purely to show when a particular op was first supported by both VMX and SVM.

Fixes: 74f169090b6f ("kvm/svm: Setup MCG_CAP on AMD properly")
Fixes: b31c114b82b2 ("KVM: X86: Provide a capability to disable PAUSE intercepts")
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 KVM: x86/mmu: Consolidate "is MMIO SPTE" code (Sean Christopherson)

Replace the open-coded "is MMIO SPTE" checks in the MMU warnings related to software-based access/dirty tracking to make the code slightly more self-documenting. No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 KVM: x86/mmu: Add explicit access mask for MMIO SPTEs (Sean Christopherson)

When shadow paging is enabled, KVM tracks the allowed access type for MMIO SPTEs so that it can do a permission check on an MMIO GVA cache hit without having to walk the guest's page tables. The tracking is done by retaining the WRITE and USER bits of the access when inserting the MMIO SPTE (read access is implicitly allowed), which allows the MMIO page fault handler to retrieve and cache the WRITE/USER bits from the SPTE.

Unfortunately for EPT, the mask used to retain the WRITE/USER bits is hardcoded using the x86 paging versions of the bits. This funkiness happens to work because KVM uses a completely different mask/value for MMIO SPTEs when EPT is enabled, and the EPT mask/value just happens to overlap exactly with the x86 WRITE/USER bits[*].

Explicitly define the access mask for MMIO SPTEs to accurately reflect that EPT does not want to incorporate any access bits into the SPTE, and so that KVM isn't subtly relying on EPT's WX bits always being set in MMIO SPTEs, e.g. attempting to use other bits for experimentation breaks horribly.

Note, vcpu_match_mmio_gva() explicitly prevents matching GVA==0, and all TDP flows explicitly set mmio_gva to 0, i.e. zeroing vcpu->arch.access for EPT has no (known) functional impact.

[*] Using WX to generate EPT misconfigurations (equivalent to a reserved bit page fault) ensures KVM can employ its MMIO page fault tricks even on platforms without reserved address bits.

Fixes: ce88decffd17 ("KVM: MMU: mmio page fault support")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

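A minimal standalone sketch of the masking idea (names and bit positions here are illustrative, not the exact KVM symbols): the access bits folded into an MMIO SPTE pass through a configurable mask, so the EPT case can simply configure a mask of 0.

    #include <stdint.h>

    #define PT_WRITABLE_MASK (1ULL << 1)  /* x86 paging W bit */
    #define PT_USER_MASK     (1ULL << 2)  /* x86 paging U bit */

    /* PT_WRITABLE_MASK | PT_USER_MASK for shadow paging, 0 for EPT. */
    static uint64_t mmio_access_mask;

    /* Fold only the permitted access bits into the cached MMIO SPTE. */
    static uint64_t mark_mmio_spte(uint64_t spte, uint64_t access)
    {
        return spte | (access & mmio_access_mask);
    }

    /* Recover the cached WRITE/USER permissions on an MMIO cache hit. */
    static uint64_t get_mmio_spte_access(uint64_t spte)
    {
        return spte & mmio_access_mask;
    }
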
2019-08-22 KVM: x86: Rename access permissions cache member in struct kvm_vcpu_arch (Sean Christopherson)

Rename "access" to "mmio_access" to match the other MMIO cache members and to make it more obvious that it's tracking the access permissions for the MMIO cache.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception() (Vitaly Kuznetsov)

Just like we do with other intercepts, in vmrun_interception() we should be doing kvm_skip_emulated_instruction() and not just RIP += 3. Also, it is wrong to increment RIP before nested_svm_vmrun() as it can result in kvm_inject_gp().

We can't call kvm_skip_emulated_instruction() after nested_svm_vmrun() so move it inside.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 x86: KVM: svm: eliminate weird goto from vmrun_interception() (Vitaly Kuznetsov)

Regardless of whether or not nested_svm_vmrun_msrpm() fails, we return 1 from vmrun_interception(), so there's no point in using a goto. Also, the nested_svm_vmrun_msrpm() call can be made from nested_svm_vmrun(), where other nested launch issues are handled.

nested_svm_vmrun() returns a bool; however, its result is ignored in vmrun_interception() as we always return '1'. As a preparatory change to putting kvm_skip_emulated_instruction() inside nested_svm_vmrun(), make nested_svm_vmrun() return an int (always '1' for now).

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 x86: KVM: svm: remove hardcoded instruction length from intercepts (Vitaly Kuznetsov)

Various intercepts hard-code the respective instruction lengths to optimize skip_emulated_instruction(): when next_rip is pre-set we skip kvm_emulate_instruction(vcpu, EMULTYPE_SKIP). The optimization is, however, incorrect: different (redundant) prefixes could be used to enlarge the instruction. We can't really avoid decoding.

svm->next_rip is not used when the CPU supports the 'nrips' (X86_FEATURE_NRIPS) feature: the next RIP is provided in the VMCB. The feature is not really new (Opteron G3s had it already) and the change should have zero effect.

Remove manual svm->next_rip setting with hard-coded instruction lengths. The only case where we now use svm->next_rip is EXIT_IOIO: the instruction length is provided to us by hardware.

Hardcoded RIP advancement remains in vmrun_interception(); this is going to be taken care of separately.

Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

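A hedged sketch of the resulting skip logic (simplified types and names, not the actual svm.c code): use the hardware-provided next RIP when nrips is available, otherwise fall back to the emulator, since prefixes make the length variable.

    /* Illustrative types; the real code operates on struct vcpu_svm. */
    struct vcpu {
        unsigned long rip;
        unsigned long next_rip; /* from the VMCB when nrips is present */
        int has_nrips;
    };

    int emulate_skip(struct vcpu *v); /* decodes the insn; 1 on success */

    static int skip_emulated_instruction(struct vcpu *v)
    {
        if (v->has_nrips && v->next_rip) {
            /* Hardware told us where the instruction ends. */
            v->rip = v->next_rip;
            return 1;
        }
        /*
         * No shortcut: redundant prefixes can enlarge the instruction,
         * so the length must come from actually decoding it.
         */
        return emulate_skip(v);
    }
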
2019-08-22 x86: KVM: add xsetbv to the emulator (Vitaly Kuznetsov)

To avoid hardcoding xsetbv length to '3' we need to support decoding it in the emulator.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 x86: KVM: clear interrupt shadow on EMULTYPE_SKIP (Vitaly Kuznetsov)

When doing x86_emulate_instruction(EMULTYPE_SKIP) the interrupt shadow has to be cleared if and only if the skipping is successful. There are two immediate issues:

- In SVM skip_emulated_instruction() we are not zapping the interrupt shadow in case kvm_emulate_instruction(EMULTYPE_SKIP) is used to advance RIP (the !nrips case).
- In VMX handle_ept_misconfig(), when running as a nested hypervisor (the static_cpu_has(X86_FEATURE_HYPERVISOR) case), we forget to clear the interrupt shadow.

Note that we intentionally don't handle the case when the skipped instruction is supposed to prolong the interrupt shadow ("MOV/POP SS") as skip-emulation of those instructions should not happen under normal circumstances.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 x86: kvm: svm: propagate errors from skip_emulated_instruction() (Vitaly Kuznetsov)

On AMD, kvm_x86_ops->skip_emulated_instruction(vcpu) can, in theory, fail: in the !nrips case we call kvm_emulate_instruction(EMULTYPE_SKIP). Currently, we only do printk(KERN_DEBUG) when this happens, which is not ideal. Propagate the error up the stack.

On VMX, skip_emulated_instruction() doesn't fail; we have two call sites calling it explicitly, handle_exception_nmi() and handle_task_switch(), and we can just ignore the result there.

On SVM, we also have two explicit call sites: in svm_queue_exception() it seems we don't need to do anything, as we check whether RIP was advanced or not. In task_switch_interception(), however, we are better off not proceeding to kvm_task_switch() in case skip_emulated_instruction() failed.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

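A sketch of how a caller consumes the propagated result (hypothetical signatures that follow the commit description, not the literal diff):

    struct vcpu; /* opaque here */

    int skip_emulated_instruction(struct vcpu *v); /* now 0 on failure */
    int kvm_task_switch(struct vcpu *v, int reason);

    static int task_switch_interception(struct vcpu *v, int reason)
    {
        /* Propagate a failed skip instead of pressing on regardless. */
        if (!skip_emulated_instruction(v))
            return 0;

        return kvm_task_switch(v, reason);
    }
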
2019-08-22 x86: KVM: svm: don't pretend to advance RIP in case wrmsr_interception() results in #GP (Vitaly Kuznetsov)

svm->next_rip is only used by skip_emulated_instruction(), and in case kvm_set_msr() fails we rightfully don't do that. Move the svm->next_rip advancement to the 'else' branch to avoid creating the false impression that it's always advanced (and to make it look like rdmsr_interception()).

This is a preparatory change for removing hardcoded RIP advancement from instruction intercepts; no functional change.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 KVM: x86: Fix x86_decode_insn() return when fetching insn bytes fails (Sean Christopherson)

Jump to the common error handling in x86_decode_insn() if __do_insn_fetch_bytes() fails so that its error code is converted to the appropriate return type. Although the various helpers used by x86_decode_insn() return X86EMUL_* values, x86_decode_insn() itself returns EMULATION_FAILED or EMULATION_OK.

This doesn't cause a functional issue as the sole caller, x86_emulate_instruction(), currently only cares about success vs. failure, and success is indicated by '0' for both types (X86EMUL_CONTINUE and EMULATION_OK).

Fixes: 285ca9e948fa ("KVM: emulate: speed up do_insn_fetch")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

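In sketch form, the pattern the fix restores (the success values match the convention described above; the real function is far larger):

    #define X86EMUL_CONTINUE 0  /* helper-level success */
    #define EMULATION_OK     0  /* decode-level success */
    #define EMULATION_FAILED 1

    int do_insn_fetch_bytes(void); /* returns an X86EMUL_* value */

    int x86_decode_insn_sketch(void)
    {
        int rc = do_insn_fetch_bytes();

        if (rc != X86EMUL_CONTINUE)
            goto done; /* was "return rc", leaking an X86EMUL_* value */

        /* ... the rest of decode, each step setting rc ... */

    done:
        /* The single point converting X86EMUL_* to EMULATION_*. */
        return (rc != X86EMUL_CONTINUE) ? EMULATION_FAILED : EMULATION_OK;
    }
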
2019-08-22 KVM: x86: use Intel speculation bugs and features as derived in generic x86 code (Paolo Bonzini)

Similar to AMD bits, set the Intel bits from the vendor-independent feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care about the vendor and they should be set on AMD processors as well.

Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 KVM: x86: always expose VIRT_SSBD to guests (Paolo Bonzini)

Even though it is preferable to use SPEC_CTRL (represented by X86_FEATURE_AMD_SSBD) instead of VIRT_SPEC, VIRT_SPEC is always supported anyway because otherwise it would be impossible to migrate from old to new CPUs. Make this apparent in the result of KVM_GET_SUPPORTED_CPUID as well.

However, we need to hide the bit on Intel processors, so move the setting to svm_set_supported_cpuid.

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 KVM: x86: fix reporting of AMD speculation bug CPUID leaf (Paolo Bonzini)

The AMD_* bits have to be set from the vendor-independent feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care about the vendor and they should be set on Intel processors as well. On top of this, the SSBD, STIBP and AMD_SSB_NO bits were not set, and VIRT_SSBD does not have to be added manually because it is a cpufeature that comes directly from the host's CPUID bit.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-22 crypto: sha256 - Make lib/crypto/sha256.c suitable for generic use (Hans de Goede)

Before this commit lib/crypto/sha256.c has only been used in the s390 and x86 purgatory code. Make it suitable for generic use:

* Export interesting symbols
* Add -D__DISABLE_EXPORTS to CFLAGS_sha256.o for purgatory builds to avoid the exports for the purgatory builds
* Add to lib/crypto/Makefile and crypto/Kconfig

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2019-08-22 crypto: sha256 - Move lib/sha256.c to lib/crypto (Hans de Goede)

Generic crypto implementations belong under lib/crypto not directly in lib, likewise the header should be in include/crypto, not include/linux.

Note that the code in lib/crypto/sha256.c is not yet available for generic use after this commit, it is still only used by the s390 and x86 purgatory code. Making it suitable for generic use is done in further patches in this series.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2019-08-22 crypto: x86/xts - implement support for ciphertext stealing (Ard Biesheuvel)

Align the x86 code with the generic XTS template, which now supports ciphertext stealing as described by the IEEE XTS-AES spec P1619.

Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2019-08-22 crypto: x86/des - switch to library interface (Ard Biesheuvel)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2019-08-22 crypto: des - split off DES library from generic DES cipher driver (Ard Biesheuvel)

Another one for the cipher museum: split off DES core processing into a separate module so other drivers (mostly for crypto accelerators) can reuse the code without pulling in the generic DES cipher itself. This will also permit the cipher interface to be made private to the crypto API itself once we move the only user in the kernel (CIFS) to this library interface.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

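A sketch of what a driver-side user of the split-out library could look like (a plausible shape inferred from the commit message; consult include/crypto/des.h for the real signatures):

    #include <crypto/des.h> /* library interface after the split */

    static int des_lib_example(const u8 *key, unsigned int keylen)
    {
        struct des_ctx ctx;
        u8 in[DES_BLOCK_SIZE] = { 0 }, out[DES_BLOCK_SIZE];
        int err;

        /* Key expansion without pulling in the generic cipher. */
        err = des_expand_key(&ctx, key, keylen);
        if (err)
            return err; /* e.g. a weak key was rejected */

        des_encrypt(&ctx, out, in);
        return 0;
    }
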
2019-08-22 crypto: 3des - move verification out of exported routine (Ard Biesheuvel)

In preparation of moving the shared key expansion routine into the DES library, move the verification done by __des3_ede_setkey() into its callers.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2019-08-21 x86/boot: Fix boot regression caused by bootparam sanitizing (John Hubbard)

commit a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else") had two errors:

* It preserved boot_params.acpi_rsdp_addr, and
* It failed to preserve boot_params.hdr

Therefore, zero out acpi_rsdp_addr, and preserve hdr.

Fixes: a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else")
Reported-by: Neil MacLeod <neil@nmacleod.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neil MacLeod <neil@nmacleod.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190821192513.20126-1-jhubbard@nvidia.com

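A condensed sketch of the corrected save/zero/restore pattern (field layout abbreviated; the real struct boot_params preserves several more fields):

    #include <string.h>

    struct setup_header { unsigned char bytes[119]; }; /* abbreviated */

    struct boot_params {
        unsigned long long acpi_rsdp_addr; /* must be zeroed (error 1) */
        struct setup_header hdr;           /* must survive (error 2) */
        /* ... many more fields ... */
    };

    static void sanitize_boot_params(struct boot_params *bp)
    {
        /* Save the fields meant to survive... */
        struct setup_header saved_hdr = bp->hdr;

        /* ...zero out everything else... */
        memset(bp, 0, sizeof(*bp));

        /* ...and restore. Note: acpi_rsdp_addr deliberately stays 0. */
        bp->hdr = saved_hdr;
    }
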
2019-08-21 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds)

Pull KVM fixes from Paolo Bonzini:
 "A couple bugfixes, and mostly selftests changes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  selftests/kvm: make platform_info_test pass on AMD
  Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"
  selftests: kvm: fix state save/load on processors without XSAVE
  selftests: kvm: fix vmx_set_nested_state_test
  selftests: kvm: provide common function to enable eVMCS
  selftests: kvm: do not try running the VM in vmx_set_nested_state_test
  KVM: x86: svm: remove redundant assignment of var new_entry
  MAINTAINERS: add KVM x86 reviewers
  MAINTAINERS: change list for KVM/s390
  kvm: x86: skip populating logical dest map if apic is not sw enabled

2019-08-22 kbuild: add CONFIG_ASM_MODVERSIONS (Masahiro Yamada)

Add CONFIG_ASM_MODVERSIONS. This makes it possible to remove one level of if-conditional nesting in scripts/Makefile.build.

scripts/Makefile.build is run every time Kbuild descends into a sub-directory, so I want to avoid $(wildcard ...) evaluation where possible, although computing $(wildcard ...) is so cheap that it may not make a measurable performance difference.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

2019-08-21Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"Paolo Bonzini
This reverts commit 4e103134b862314dc2f2f18f2fb0ab972adc3f5f. Alex Williamson reported regressions with device assignment with this patch. Even though the bug is probably elsewhere and still latent, this is needed to fix the regression. Fixes: 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot", 2019-02-05) Reported-by: Alex Willamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20 x86/PCI: Remove superfluous returns from void functions (Krzysztof Wilczynski)

Remove unnecessary empty return statements at the end of void functions in arch/x86/kernel/quirks.c.

Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-pci@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190820065121.16594-1-kw@linux.com

2019-08-19 x86/mmiotrace: Lock down the testmmiotrace module (David Howells)

The testmmiotrace module shouldn't be permitted when the kernel is locked down as it can be used to arbitrarily read and write MMIO space. This is a runtime check rather than a build-time one in order to allow configurations where the same kernel may be run in both locked down or permissive modes depending on local policy.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: Thomas Gleixner <tglx@linutronix.de>
cc: Steven Rostedt <rostedt@goodmis.org>
cc: Ingo Molnar <mingo@kernel.org>
cc: "H. Peter Anvin" <hpa@zytor.com>
cc: x86@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>

2019-08-19 acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down (Josh Boyer)

This option allows userspace to pass the RSDP address to the kernel, which makes it possible for a user to modify the workings of hardware. Reject the option when the kernel is locked down. This requires some reworking of the existing RSDP command line logic, since the early boot code also makes use of a command-line passed RSDP when locating the SRAT table before the lockdown code has been initialised. This is achieved by separating the command line RSDP path in the early boot code from the generic RSDP path, and then copying the command line RSDP into boot params in the kernel proper if lockdown is not enabled. If lockdown is enabled and an RSDP is provided on the command line, this will only be used when parsing SRAT (which shouldn't permit kernel code execution) and will be ignored in the rest of the kernel.

(Modified by Matthew Garrett in order to handle the early boot RSDP environment)

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: Dave Young <dyoung@redhat.com>
cc: linux-acpi@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>

2019-08-19 x86/msr: Restrict MSR access when the kernel is locked down (Matthew Garrett)

Writing to MSRs should not be allowed if the kernel is locked down, since it could lead to execution of arbitrary code in kernel mode. Based on a patch by Kees Cook.

Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
cc: x86@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>

2019-08-19 x86: Lock down IO port access when the kernel is locked down (Matthew Garrett)

IO port access would permit users to gain access to PCI configuration registers, which in turn (on a lot of hardware) give access to MMIO register space. This would potentially permit root to trigger arbitrary DMA, so lock it down by default.

This also implicitly locks down the KDADDIO, KDDELIO, KDENABIO and KDDISABIO console ioctls.

Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: x86@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>

2019-08-19 kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE (Jiri Bohac)

This is a preparatory patch for kexec_file_load() lockdown. A locked down kernel needs to prevent unsigned kernel images from being loaded with kexec_file_load(). Currently, the only way to force the signature verification is compiling with KEXEC_VERIFY_SIG. This prevents loading unsigned images even when the kernel is not locked down at runtime.

This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE. Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG turns on the signature verification but allows unsigned images to be loaded. KEXEC_SIG_FORCE disallows images without a valid signature.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
cc: kexec@lists.infradead.org
Signed-off-by: James Morris <jmorris@namei.org>

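A sketch of how the two options differ at image-load time (illustrative; the real check sits in the kexec_file_load() signature-verification path):

    /* sig_err is the result of the signature check, 0 if valid. */
    static int handle_sig_result(int sig_err)
    {
        if (!sig_err)
            return 0;               /* valid signature: always OK */

    #ifdef CONFIG_KEXEC_SIG_FORCE
        return sig_err;             /* unsigned/invalid: hard failure */
    #else
        /* KEXEC_SIG alone: verification runs, unsigned images still load. */
        return 0;
    #endif
    }
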
2019-08-19 lockdown: Copy secure_boot flag in boot params across kexec reboot (Dave Young)

A kexec reboot with secure boot enabled does not keep the secure boot mode in the new kernel, so one can later load an unsigned kernel via the legacy kexec_load. In this state, the system is missing the protections provided by secure boot. Fix this by retaining the secure_boot flag from the original kernel.

The secure_boot flag in boot_params is set in the EFI stub, but kexec bypasses the stub. Fix the issue by copying the secure_boot flag across the kexec reboot.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: kexec@lists.infradead.org
Signed-off-by: James Morris <jmorris@namei.org>

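The core of the fix is a one-line copy; a sketch with a pared-down structure (illustrative type; the real one is struct boot_params):

    #include <stdint.h>

    struct boot_params_sketch { uint8_t secure_boot; /* ... */ };

    /* The running kernel's copy, set by the EFI stub at cold boot. */
    extern struct boot_params_sketch boot_params;

    static void carry_secure_boot(struct boot_params_sketch *params)
    {
        /* kexec bypasses the EFI stub, so copy the flag explicitly. */
        params->secure_boot = boot_params.secure_boot;
    }
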
2019-08-19 x86/irq: Check for VECTOR_UNUSED directly (Heiner Kallweit)

It's simpler and more intuitive to directly check for VECTOR_UNUSED than checking whether the other error codes are not set.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/caeaca93-5ee1-cea1-8894-3aa0d5b19241@gmail.com

2019-08-19 x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code (Heiner Kallweit)

Both the 64bit and the 32bit handle_irq() implementations check the irq descriptor pointer with IS_ERR_OR_NULL() and return failure. That can be done more simply in the common do_IRQ() code.

This reduces the 64bit handle_irq() function to a wrapper around generic_handle_irq_desc(). Invoke it directly from do_IRQ() to spare the extra function call.

[ tglx: Got rid of the #ifdef and massaged changelog ]

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/2ec758c7-9aaa-73ab-f083-cc44c86aa741@gmail.com

2019-08-19 x86/irq: Improve definition of VECTOR_SHUTDOWN et al (Heiner Kallweit)

These values are used with IS_ERR(), so it's more intuitive to define them like a standard PTR_ERR() of a negative errno.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/146835e8-c086-4e85-7ece-bcba6795e6db@gmail.com

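A sketch of the before/after shape (the bit patterns are identical, only the spelling changes; treat the exact values as illustrative):

    /* Before: open-coded all-ones patterns. */
    #define VECTOR_SHUTDOWN_OLD    ((void *)~0UL)
    #define VECTOR_RETRIGGERED_OLD ((void *)~1UL)

    /* After: the same bits written as PTR_ERR-style negative errnos,
     * which makes checks such as IS_ERR(desc) self-explanatory. */
    #define VECTOR_UNUSED      NULL
    #define VECTOR_SHUTDOWN    ((void *)-1L)
    #define VECTOR_RETRIGGERED ((void *)-2L)
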
2019-08-19 x86/fixmap: Cleanup outdated comments (Cao jin)

Remove stale comments and fix the no longer valid pagetable entry reference.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190809114612.2569-1-caoj.fnst@cn.fujitsu.com

2019-08-19 x86/platform/intel/iosf_mbi Rewrite locking (Hans de Goede)

There are 2 problems with the old iosf PMIC I2C bus arbitration code which need to be addressed:

1. The lockdep code complains about a possible deadlock in the iosf_mbi_[un]block_punit_i2c_access code:

[ 6.712662] ======================================================
[ 6.712673] WARNING: possible circular locking dependency detected
[ 6.712685] 5.3.0-rc2+ #79 Not tainted
[ 6.712692] ------------------------------------------------------
[ 6.712702] kworker/0:1/7 is trying to acquire lock:
[ 6.712712] 00000000df1c5681 (iosf_mbi_block_punit_i2c_access_count_mutex){+.+.}, at: iosf_mbi_unblock_punit_i2c_access+0x13/0x90
[ 6.712739] but task is already holding lock:
[ 6.712749] 0000000067cb23e7 (iosf_mbi_punit_mutex){+.+.}, at: iosf_mbi_block_punit_i2c_access+0x97/0x186
[ 6.712768] which lock already depends on the new lock.
[ 6.712780] the existing dependency chain (in reverse order) is:
[ 6.712792] -> #1 (iosf_mbi_punit_mutex){+.+.}:
[ 6.712808]        __mutex_lock+0xa8/0x9a0
[ 6.712818]        iosf_mbi_block_punit_i2c_access+0x97/0x186
[ 6.712831]        i2c_dw_acquire_lock+0x20/0x30
[ 6.712841]        i2c_dw_set_reg_access+0x15/0xb0
[ 6.712851]        i2c_dw_probe+0x57/0x473
[ 6.712861]        dw_i2c_plat_probe+0x33e/0x640
[ 6.712874]        platform_drv_probe+0x38/0x80
[ 6.712884]        really_probe+0xf3/0x380
[ 6.712894]        driver_probe_device+0x59/0xd0
[ 6.712905]        bus_for_each_drv+0x84/0xd0
[ 6.712915]        __device_attach+0xe4/0x170
[ 6.712925]        bus_probe_device+0x9f/0xb0
[ 6.712935]        deferred_probe_work_func+0x79/0xd0
[ 6.712946]        process_one_work+0x234/0x560
[ 6.712957]        worker_thread+0x50/0x3b0
[ 6.712967]        kthread+0x10a/0x140
[ 6.712977]        ret_from_fork+0x3a/0x50
[ 6.712986] -> #0 (iosf_mbi_block_punit_i2c_access_count_mutex){+.+.}:
[ 6.713004]        __lock_acquire+0xe07/0x1930
[ 6.713015]        lock_acquire+0x9d/0x1a0
[ 6.713025]        __mutex_lock+0xa8/0x9a0
[ 6.713035]        iosf_mbi_unblock_punit_i2c_access+0x13/0x90
[ 6.713047]        i2c_dw_set_reg_access+0x4d/0xb0
[ 6.713058]        i2c_dw_probe+0x57/0x473
[ 6.713068]        dw_i2c_plat_probe+0x33e/0x640
[ 6.713079]        platform_drv_probe+0x38/0x80
[ 6.713089]        really_probe+0xf3/0x380
[ 6.713099]        driver_probe_device+0x59/0xd0
[ 6.713109]        bus_for_each_drv+0x84/0xd0
[ 6.713119]        __device_attach+0xe4/0x170
[ 6.713129]        bus_probe_device+0x9f/0xb0
[ 6.713140]        deferred_probe_work_func+0x79/0xd0
[ 6.713150]        process_one_work+0x234/0x560
[ 6.713160]        worker_thread+0x50/0x3b0
[ 6.713170]        kthread+0x10a/0x140
[ 6.713180]        ret_from_fork+0x3a/0x50
[ 6.713189] other info that might help us debug this:
[ 6.713202]  Possible unsafe locking scenario:
[ 6.713212]        CPU0                    CPU1
[ 6.713221]        ----                    ----
[ 6.713229]   lock(iosf_mbi_punit_mutex);
[ 6.713239]                                lock(iosf_mbi_block_punit_i2c_access_count_mutex);
[ 6.713253]                                lock(iosf_mbi_punit_mutex);
[ 6.713265]   lock(iosf_mbi_block_punit_i2c_access_count_mutex);
[ 6.713276]  *** DEADLOCK ***

In practice this can never happen because only the first caller which increments iosf_mbi_block_punit_i2c_access_count will also take iosf_mbi_punit_mutex; that is the whole purpose of the counter, which itself is protected by iosf_mbi_block_punit_i2c_access_count_mutex. But there is no way to tell the lockdep code about this and we really want to be able to run a kernel with lockdep enabled without these warnings being triggered.

2. The lockdep warning also points out another real problem: if 2 threads are both in a block of code protected by iosf_mbi_block_punit_i2c_access and the first thread to acquire the block exits before the second thread, then the second thread will call mutex_unlock on iosf_mbi_punit_mutex, but it is not the thread which took the mutex, and unlocking by another thread is not allowed.

Fix this by getting rid of the notion of holding a mutex for the entire duration of the PMIC accesses, be it either from the PUnit side, or from an in-kernel I2C driver. In general holding a mutex after exiting a function is a bad idea, and the above problems show this case is no different.

Instead, 2 counters are now used, one for PMIC accesses from the PUnit and one for accesses from in-kernel I2C code. When access is requested, the code now waits (using a waitqueue) for the counter of the other type of access to reach 0; on release, if the counter reaches 0 the waitqueue is woken.

Note that the counter approach is necessary to allow nested calls. The main reason for this is so that a series of I2C transfers can be done with the PUnit blocked from accessing the bus the whole time. This is necessary to be able to safely read/modify/write a PMIC register without racing with the PUnit doing the same thing. Allowing nested iosf_mbi_block_punit_i2c_access() calls is also desirable from a performance pov, since the whole dance necessary to block the PUnit from accessing the PMIC I2C bus is somewhat expensive.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lkml.kernel.org/r/20190812102113.95794-1-hdegoede@redhat.com

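A heavily simplified sketch of the counter-plus-waitqueue scheme (illustrative names, locking condensed; the real code lives in arch/x86/platform/intel/iosf_mbi.c):

    static DECLARE_WAIT_QUEUE_HEAD(pmic_access_wq);
    static DEFINE_SPINLOCK(access_lock);
    static unsigned int punit_users, i2c_users;

    static bool try_get_i2c_access(void)
    {
        bool ok;

        spin_lock(&access_lock);
        ok = (punit_users == 0);
        if (ok)
            i2c_users++; /* counting allows nested callers */
        spin_unlock(&access_lock);
        return ok;
    }

    void block_punit_i2c_access_sketch(void)
    {
        /* Sleep until no PUnit-side user holds the PMIC bus. */
        wait_event(pmic_access_wq, try_get_i2c_access());
    }

    void unblock_punit_i2c_access_sketch(void)
    {
        spin_lock(&access_lock);
        if (--i2c_users == 0)
            wake_up(&pmic_access_wq); /* let the PUnit side in */
        spin_unlock(&access_lock);
    }
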
2019-08-19 x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h (Tom Lendacky)

There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. This issue stems from a BIOS not performing the proper steps during resume to ensure RDRAND continues to function properly.

RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported.

Update the CPU initialization to clear the RDRAND CPUID bit for any family 15h and 16h processor that supports RDRAND. If it is known that the family 15h or family 16h system does not have an RDRAND resume issue or that the system will not be placed in suspend, the "rdrand=force" kernel parameter can be used to stop the clearing of the RDRAND CPUID bit.

Additionally, update the suspend and resume path to save and restore the MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in place after resuming from suspend.

Note, that clearing the RDRAND CPUID bit does not prevent a processor that normally supports the RDRAND instruction from executing it. So any code that determined the support based on family and model won't #UD.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com

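A sketch of the CPUID-bit clearing (the MSR number and bit follow the text above; the guard around the "rdrand=force" parameter is paraphrased):

    #include <asm/msr.h>

    #define MSR_AMD64_CPUID_FN_1 0xc0011004 /* backs CPUID Fn0000_0001 */

    static bool rdrand_force; /* set by the "rdrand=force" parameter */

    static void clear_rdrand_cpuid_bit_sketch(void)
    {
        if (rdrand_force)
            return; /* user vouched that resume keeps RDRAND intact */

        /* Bit 62 mirrors CPUID Fn0000_0001_ECX[30] (RDRAND). */
        msr_clear_bit(MSR_AMD64_CPUID_FN_1, 62);
    }
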
2019-08-19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Linus Torvalds)

Pull networking fixes from David Miller:

 1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov.
 2) Several kTLS fixes in mlx5 driver, from Tariq Toukan.
 3) Fix severe performance regression due to lack of SKB coalescing of fragments during local delivery, from Guillaume Nault.
 4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk.
 5) Fix batched events in skbedit packet action, from Roman Mashak.
 6) Propagate VLAN TX offload to hw_enc_features in bond and team drivers, from Yue Haibing.
 7) RXRPC local endpoint refcounting fix and read after free in rxrpc_queue_local(), from David Howells.
 8) Fix endian bug in ibmveth multicast list handling, from Thomas Falcon.
 9) Oops, make nlmsg_parse() wrap around the correct function, __nlmsg_parse not __nla_parse(). Fix from David Ahern.
10) Memleak in sctp_send_reset_streams(), from Zheng Bin.
11) Fix memory leak in cxgb4, from Wenwen Wang.
12) Yet another race in AF_PACKET, from Eric Dumazet.
13) Fix false detection of retransmit failures in tipc, from Tuong Lien.
14) Use after free in ravb_tstamp_skb, from Tho Vu.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
  ravb: Fix use-after-free ravb_tstamp_skb
  netfilter: nf_tables: map basechain priority to hardware priority
  net: sched: use major priority number as hardware priority
  wimax/i2400m: fix a memory leak bug
  net: cavium: fix driver name
  ibmvnic: Unmap DMA address of TX descriptor buffers after use
  bnxt_en: Fix to include flow direction in L2 key
  bnxt_en: Use correct src_fid to determine direction of the flow
  bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
  bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
  bnxt_en: Improve RX doorbell sequence.
  bnxt_en: Fix VNIC clearing logic for 57500 chips.
  net: kalmia: fix memory leaks
  cx82310_eth: fix a memory leak bug
  bnx2x: Fix VF's VLAN reconfiguration in reload.
  Bluetooth: Add debug setting for changing minimum encryption key size
  tipc: fix false detection of retransmit failures
  lan78xx: Fix memory leaks
  MAINTAINERS: r8169: Update path to the driver
  MAINTAINERS: PHY LIBRARY: Update files in the record
  ...

2019-08-19 x86/boot/compressed/64: Fix boot on machines with broken E820 table (Kirill A. Shutemov)

The BIOS on the Samsung 500C Chromebook reports a very rudimentary E820 table that consists of 2 entries:

BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] usable
BIOS-e820: [mem 0x00000000fffff000-0x00000000ffffffff] reserved

This breaks the logic in find_trampoline_placement(): bios_start lands at the end of the first 4k page and the trampoline start gets placed below 0.

Detect the underflow and don't touch bios_start in such cases. This makes the kernel ignore the E820 table on machines that don't have two usable pages below BIOS_START_MAX.

Fixes: 1b3a62643660 ("x86/boot/compressed/64: Validate trampoline placement against E820")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203463
Link: https://lkml.kernel.org/r/20190813131654.24378-1-kirill.shutemov@linux.intel.com

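Conceptually the guard amounts to the following (simplified; the real find_trampoline_placement() walks the E820 table and applies more constraints):

    #define TRAMPOLINE_32BIT_SIZE 0x2000UL /* illustrative size */

    /* Place the trampoline just below bios_start unless that underflows. */
    static unsigned long place_trampoline(unsigned long bios_start,
                                          unsigned long fallback)
    {
        if (bios_start < TRAMPOLINE_32BIT_SIZE)
            return fallback; /* rudimentary E820: don't wrap below 0 */

        return bios_start - TRAMPOLINE_32BIT_SIZE;
    }
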
2019-08-19 x86/apic: Handle missing global clockevent gracefully (Thomas Gleixner)

Some newer machines do not advertise legacy timers. The kernel can handle that situation if the TSC and the CPU frequency are enumerated by CPUID or MSRs and the CPU supports the TSC deadline timer. If the CPU does not support the TSC deadline timer the local APIC timer frequency has to be known as well.

Some Ryzen machines do not advertise legacy timers, but there is no reliable way to determine the bus frequency which feeds the local APIC timer when the machine allows overclocking of that frequency. As there is no legacy timer, the local APIC timer calibration crashes due to a NULL pointer dereference when accessing the not-installed global clock event device.

Switch the calibration loop to a non-interrupt-based one, which polls either TSC (if the frequency is known) or jiffies. The latter requires a global clockevent. As the machines which do not have a global clockevent installed have a known TSC frequency, this is a non-issue. For older machines where the TSC frequency is not known, there is no known case where the legacy timers do not exist, as that would have been reported long ago.

Reported-by: Daniel Drake <drake@endlessm.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de
Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12

2019-08-19 perf/x86: Fix typo in comment (Su Yanjun)

No functional change.

Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1565945001-4413-1-git-send-email-suyj.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2019-08-19 x86/msr-index: Move AMD MSRs where they belong (Borislav Petkov)

... sort them in and fix up the comment, while at it. No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190819070140.23708-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2019-08-17 x86/cpu: Use constant definitions for CPU models (Rahul Tanwar)

Replace model numbers with their respective macro definitions when comparing CPU models.

Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: alan@linux.intel.com
Cc: cheol.yong.kim@intel.com
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: qi-ming.wu@intel.com
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/f7a0e142faa953a53d5f81f78055e1b3c793b134.1565940653.git.rahul.tanwar@linux.intel.com

2019-08-17 x86/cpu: Explain Intel model naming convention (Tony Luck)

Dave Hansen spelled out the rules in an e-mail:

  https://lkml.kernel.org/r/91eefbe4-e32b-d762-be4d-672ff915db47@intel.com

Copy those right into the <asm/intel-family.h> file to make it easy for people to find them.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190815224704.GA10025@agluck-desk2.amr.corp.intel.com

2019-08-16 x86/boot: Save fields explicitly, zero out everything else (John Hubbard)

Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds memset, if the memset goes across several fields of a struct. This generated a couple of warnings on x86_64 builds in sanitize_boot_params().

Fix this by explicitly saving the fields in struct boot_params that are intended to be preserved, and zeroing all the rest.

[ tglx: Tagged for stable as it breaks the warning free build there as well ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com

2019-08-16 x86/boot: Use common BUILD_BUG_ON (Rikard Falkeborn)

Defining BUILD_BUG_ON causes redefinition warnings when adding includes of include/linux/build_bug.h in files unrelated to x86/boot. For example, adding an include of build_bug.h to include/linux/bits.h shows the following warnings:

  CC      arch/x86/boot/cpucheck.o
In file included from ./include/linux/bits.h:22,
                 from ./arch/x86/include/asm/msr-index.h:5,
                 from arch/x86/boot/cpucheck.c:28:
./include/linux/build_bug.h:49: warning: "BUILD_BUG_ON" redefined
   49 | #define BUILD_BUG_ON(condition) \
      |
In file included from arch/x86/boot/cpucheck.c:22:
arch/x86/boot/boot.h:31: note: this is the location of the previous definition
   31 | #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
      |

The macro was added to boot.h in commit 62bd0337d0c4 ("Top header file for new x86 setup code"). At that time, BUILD_BUG_ON was defined in kernel.h. Presumably BUILD_BUG_ON was redefined to avoid pulling in kernel.h. Since then, BUILD_BUG_ON and similar macros have been split to a separate header file.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20190811184938.1796-2-rikard.falkeborn@gmail.com

2019-08-14 KVM: x86: svm: remove redundant assignment of var new_entry (Miaohe Lin)

new_entry is reassigned a new value on the next line, so the first assignment is redundant; remove it.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2019-08-14 kvm: x86: skip populating logical dest map if apic is not sw enabled (Radim Krcmar)

recalculate_apic_map does not sanitize ldr and it's possible that multiple bits are set. In that case, a previous valid entry can potentially be overwritten by an invalid one.

This condition is hit when booting a 32 bit, >8 CPU, RHEL6 guest and then triggering a crash to boot a kdump kernel. This is the sequence of events:

1. Linux boots in bigsmp mode and enables PhysFlat; however, it still writes to the LDR, which probably will never be used.
2. However, when booting into kdump, the stale LDR values remain as they are not cleared by the guest and there isn't an APIC reset.
3. kdump boots with 1 cpu, and uses Logical Destination Mode, but the logical map has been overwritten and points to an inactive vcpu.

Signed-off-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

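A sketch of the resulting guard in map recalculation (the helper name matches the commit title's description; the surrounding loop is elided):

    /* For each vcpu while rebuilding the APIC map (sketch): */
    struct kvm_lapic;
    bool kvm_apic_sw_enabled(struct kvm_lapic *apic);

    static void populate_logical_entry(struct kvm_lapic *apic)
    {
        if (!kvm_apic_sw_enabled(apic))
            return; /* stale LDR from a previous kernel: skip it */

        /* ... decode the LDR and fill the logical destination map ... */
    }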