summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2020-11-29Merge tag 'locking-urgent-2020-11-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two more places which invoke tracing from RCU disabled regions in the idle path. Similar to the entry path the low level idle functions have to be non-instrumentable" * tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Fix intel_idle() vs tracing sched/idle: Fix arch_cpu_idle() vs tracing
2020-11-29Merge tag 'x86_urgent_for_v5.10-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "A couple of urgent fixes which accumulated this last week: - Two resctrl fixes to prevent refcount leaks when manipulating the resctrl fs (Xiaochen Shen) - Correct prctl(PR_GET_SPECULATION_CTRL) reporting (Anand K Mistry) - A fix to not lose already seen MCE severity which determines whether the machine can recover (Gabriele Paoloni)" * tag 'x86_urgent_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Do not overwrite no_way_out if mce_end() fails x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak
2020-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix alignment of the new HYP sections - Fix GICR_TYPER access from userspace S390: - do not reset the global diag318 data for per-cpu reset - do not mark memory as protected too early - fix for destroy page ultravisor call x86: - fix for SEV debugging - fix incorrect return code - fix for 'noapic' with PIC in userspace and LAPIC in kernel - fix for 5-level paging" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT KVM: x86: Fix split-irqchip vs interrupt injection window request KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint MAINTAINERS: Update email address for Sean Christopherson MAINTAINERS: add uv.c also to KVM/s390 s390/uv: handle destroy page legacy interface KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace KVM: SVM: fix error return code in svm_create_vcpu() KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt(). KVM: arm64: Correctly align nVHE percpu data KVM: s390: remove diag318 reset code KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup
2020-11-27Merge tag 'iommu-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull iommu fixes from Will Deacon: "Here's another round of IOMMU fixes for -rc6 consisting mainly of a bunch of independent driver fixes. Thomas agreed for me to take the x86 'tboot' fix here, as it fixes a regression introduced by a vt-d change. - Fix intel iommu driver when running on devices without VCCAP_REG - Fix swiotlb and "iommu=pt" interaction under TXT (tboot) - Fix missing return value check during device probe() - Fix probe ordering for Qualcomm SMMU implementation - Ensure page-sized mappings are used for AMD IOMMU buffers with SNP RMP" * tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: iommu/vt-d: Don't read VCCAP register unless it exists x86/tboot: Don't disable swiotlb when iommu is forced on iommu: Check return of __iommu_attach_device() arm-smmu-qcom: Ensure the qcom_scm driver has finished probing iommu/amd: Enforce 4k mapping for certain IOMMU data structures
2020-11-27x86/mce: Do not overwrite no_way_out if mce_end() failsGabriele Paoloni
Currently, if mce_end() fails, no_way_out - the variable denoting whether the machine can recover from this MCE - is determined by whether the worst severity that was found across the MCA banks associated with the current CPU, is of panic severity. However, at this point no_way_out could have been already set by mca_start() after looking at all severities of all CPUs that entered the MCE handler. If mce_end() fails, check first if no_way_out is already set and, if so, stick to it, otherwise use the local worst value. [ bp: Massage. ] Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201127161819.3106432-2-gabriele.paoloni@intel.com
2020-11-27kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PTVitaly Kuznetsov
Commit 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused the following WARNING on an Intel Ice Lake CPU: get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy: ------ spte 0xb80a0 level 5. ------ spte 0xfcd210107 level 4. ------ spte 0x1004c40107 level 3. ------ spte 0x1004c41107 level 2. ------ spte 0x1db00000000b83b6 level 1. WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm] ... Call Trace: ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm] vcpu_enter_guest+0xaa1/0x16a0 [kvm] ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel] ? skip_emulated_instruction+0xaa/0x150 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm] The guest triggering this crashes. Note, this happens with the traditional MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out, there was a subtle change in the above mentioned commit. Previously, walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level' which is returned by shadow_walk_init() and this equals to 'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to 'int root = vcpu->arch.mmu->root_level'. The difference between 'root_level' and 'shadow_root_level' on CPUs supporting 5-level page tables is that in some case we don't want to use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48' kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used, the corresponding SPTE will fail '__is_rsvd_bits_set()' check. Revert to using 'shadow_root_level'. Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-27KVM: x86: Fix split-irqchip vs interrupt injection window requestPaolo Bonzini
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are a hodge-podge of conditions, hacked together to get something that more or less works. But what is actually needed is much simpler; in both cases the fundamental question is, do we have a place to stash an interrupt if userspace does KVM_INTERRUPT? In userspace irqchip mode, that is !vcpu->arch.interrupt.injected. Currently kvm_event_needs_reinjection(vcpu) covers it, but it is unnecessarily restrictive. In split irqchip mode it's a bit more complicated, we need to check kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK cycle and thus requires ExtINTs not to be masked) as well as !pending_userspace_extint(vcpu). However, there is no need to check kvm_event_needs_reinjection(vcpu), since split irqchip keeps pending ExtINT state separate from event injection state, and checking kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher priority than APIC interrupts. In fact the latter fixes a bug: when userspace requests an IRQ window vmexit, an interrupt in the local APIC can cause kvm_cpu_has_interrupt() to be true and thus kvm_vcpu_ready_for_interrupt_injection() to return false. When this happens, vcpu_run does not exit to userspace but the interrupt window vmexits keep occurring. The VM loops without any hope of making progress. Once we try to fix these with something like return kvm_arch_interrupt_allowed(vcpu) && - !kvm_cpu_has_interrupt(vcpu) && - !kvm_event_needs_reinjection(vcpu) && - kvm_cpu_accept_dm_intr(vcpu); + (!lapic_in_kernel(vcpu) + ? !vcpu->arch.interrupt.injected + : (kvm_apic_accept_pic_intr(vcpu) + && !pending_userspace_extint(v))); we realize two things. First, thanks to the previous patch the complex conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt window request in vcpu_enter_guest() bool req_int_win = dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); should be kept in sync with kvm_vcpu_ready_for_interrupt_injection(): it is unnecessary to ask the processor for an interrupt window if we would not be able to return to userspace. Therefore, kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu) ANDed with the existing check for masked ExtINT. It all makes sense: - we can accept an interrupt from userspace if there is a place to stash it (and, for irqchip split, ExtINTs are not masked). Interrupts from userspace _can_ be accepted even if right now EFLAGS.IF=0. - in order to tell userspace we will inject its interrupt ("IRQ window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both KVM and the vCPU need to be ready to accept the interrupt. ... and this is what the patch implements. Reported-by: David Woodhouse <dwmw@amazon.co.uk> Analyzed-by: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: David Woodhouse <dwmw@amazon.co.uk>
2020-11-27KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extintPaolo Bonzini
Centralize handling of interrupts from the userspace APIC in kvm_cpu_has_extint and kvm_cpu_get_extint, since userspace APIC interrupts are handled more or less the same as ExtINTs are with split irqchip. This removes duplicated code from kvm_cpu_has_injectable_intr and kvm_cpu_has_interrupt, and makes the code more similar between kvm_cpu_has_{extint,interrupt} on one side and kvm_cpu_get_{extint,interrupt} on the other. Cc: stable@vger.kernel.org Reviewed-by: Filippo Sironi <sironi@amazon.de> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-25x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpbAnand K Mistry
When spectre_v2_user={seccomp,prctl},ibpb is specified on the command line, IBPB is force-enabled and STIPB is conditionally-enabled (or not available). However, since 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP} instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour. Because the issuing of IBPB relies on the switch_mm_*_ibpb static branches, the mitigations behave as expected. Since 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") this discrepency caused the misreporting of IB speculation via prctl(). On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb, prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL | PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB speculation mode, even though the flag is ignored. Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not available. [ bp: Massage commit message. ] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") Signed-off-by: Anand K Mistry <amistry@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid
2020-11-25x86/tboot: Don't disable swiotlb when iommu is forced onLu Baolu
After commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA domain"), swiotlb could also be used for direct memory access if IOMMU is enabled but a device is configured to pass through the DMA translation. Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work if "iommu=pt" kernel parameter is used. Fixes: 327d5b2fee91 ("iommu/vt-d: Allow 32bit devices to uses DMA domain") Reported-and-tested-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201125014124.4070776-1-baolu.lu@linux.intel.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210237 Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24sched/idle: Fix arch_cpu_idle() vs tracingPeter Zijlstra
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
2020-11-24x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leakXiaochen Shen
On resource group creation via a mkdir an extra kernfs_node reference is obtained by kernfs_get() to ensure that the rdtgroup structure remains accessible for the rdtgroup_kn_unlock() calls where it is removed on deletion. Currently the extra kernfs_node reference count is only dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup structure is removed in a few other locations that lack the matching reference drop. In call paths of rmdir and umount, when a control group is removed, kernfs_remove() is called to remove the whole kernfs nodes tree of the control group (including the kernfs nodes trees of all child monitoring groups), and then rdtgroup structure is freed by kfree(). The rdtgroup structures of all child monitoring groups under the control group are freed by kfree() in free_all_child_rdtgrp(). Before calling kfree() to free the rdtgroup structures, the kernfs node of the control group itself as well as the kernfs nodes of all child monitoring groups still take the extra references which will never be dropped to 0 and the kernfs nodes will never be freed. It leads to reference count leak and kernfs_node_cache memory leak. For example, reference count leak is observed in these two cases: (1) mount -t resctrl resctrl /sys/fs/resctrl mkdir /sys/fs/resctrl/c1 mkdir /sys/fs/resctrl/c1/mon_groups/m1 umount /sys/fs/resctrl (2) mkdir /sys/fs/resctrl/c1 mkdir /sys/fs/resctrl/c1/mon_groups/m1 rmdir /sys/fs/resctrl/c1 The same reference count leak issue also exists in the error exit paths of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon(). Fix this issue by following changes to make sure the extra kernfs_node reference on rdtgroup is dropped before freeing the rdtgroup structure. (1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up kernfs_put() and kfree(). (2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup structure is about to be freed by kfree(). (3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error exit paths of mkdir where an extra reference is taken by kernfs_get(). Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files") Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@intel.com
2020-11-24x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leakXiaochen Shen
Willem reported growing of kernfs_node_cache entries in slabtop when repeatedly creating and removing resctrl subdirectories as well as when repeatedly mounting and unmounting the resctrl filesystem. On resource group (control as well as monitoring) creation via a mkdir an extra kernfs_node reference is obtained to ensure that the rdtgroup structure remains accessible for the rdtgroup_kn_unlock() calls where it is removed on deletion. The kernfs_node reference count is dropped by kernfs_put() in rdtgroup_kn_unlock(). With the above explaining the need for one kernfs_get()/kernfs_put() pair in resctrl there are more places where a kernfs_node reference is obtained without a corresponding release. The excessive amount of reference count on kernfs nodes will never be dropped to 0 and the kernfs nodes will never be freed in the call paths of rmdir and umount. It leads to reference count leak and kernfs_node_cache memory leak. Remove the superfluous kernfs_get() calls and expand the existing comments surrounding the remaining kernfs_get()/kernfs_put() pair that remains in use. Superfluous kernfs_get() calls are removed from two areas: (1) In call paths of mount and mkdir, when kernfs nodes for "info", "mon_groups" and "mon_data" directories and sub-directories are created, the reference count of newly created kernfs node is set to 1. But after kernfs_create_dir() returns, superfluous kernfs_get() are called to take an additional reference. (2) kernfs_get() calls in rmdir call paths. Fixes: 17eafd076291 ("x86/intel_rdt: Split resource group removal in two") Fixes: 4af4a88e0c92 ("x86/intel_rdt/cqm: Add mount,umount support") Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Fixes: 5dc1d5c6bac2 ("x86/intel_rdt: Simplify info and base file lists") Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Willem de Bruijn <willemb@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1604085053-31639-1-git-send-email-xiaochen.shen@intel.com
2020-11-22Merge tag 'perf-urgent-2020-11-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix for the x86 perf sysfs interfaces which used kobject attributes instead of device attributes and therefore making clang's control flow integrity checker upset" * tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: fix sysfs type mismatches
2020-11-22Merge tag 'efi-urgent-for-v5.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Borislav Petkov: "Forwarded EFI fixes from Ard Biesheuvel: - fix memory leak in efivarfs driver - fix HYP mode issue in 32-bit ARM version of the EFI stub when built in Thumb2 mode - avoid leaking EFI pgd pages on allocation failure" * tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Free efi_pgd with free_pages() efivarfs: fix memory leak in efivarfs_create() efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
2020-11-22Merge tag 'x86_urgent_for_v5.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of same because the proper one is going through the IOMMU tree (Thomas Gleixner) - An Intel microcode loader fix to save the correct microcode patch to apply during resume (Chen Yu) - A fix to not access user memory of other processes when dumping opcode bytes (Thomas Gleixner) * tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account" x86/dumpstack: Do not try to access user space code of other tasks x86/microcode/intel: Check patch signature before saving microcode for early loading iommu/vt-d: Take CONFIG_PCI_ATS into account
2020-11-22mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exportsDan Williams
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-20Merge tag 'for-linus-5.10b-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "A single fix for avoiding WARN splats when booting a Xen guest with nosmt" * tag 'for-linus-5.10b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: don't unbind uninitialized lock_kicker_irq
2020-11-20Merge tag 'iommu-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull iommu fixes from Will Deacon: "Two straightforward vt-d fixes: - Fix boot when intel iommu initialisation fails under TXT (tboot) - Fix intel iommu compilation error when DMAR is enabled without ATS and temporarily update IOMMU MAINTAINERs entry" * tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: MAINTAINERS: Temporarily add myself to the IOMMU entry iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set iommu/vt-d: Avoid panic if iommu init fails in tboot system
2020-11-19Merge tag 'x86-urgent-2020-11-15' of ↵Will Deacon
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-next/iommu/fixes Pull in x86 fixes from Thomas, as they include a change to the Intel DMAR code on which we depend: * tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: iommu/vt-d: Cure VF irqdomain hickup x86/platform/uv: Fix copied UV5 output archtype x86/platform/uv: Drop last traces of uv_flush_tlb_others
2020-11-18iommu/vt-d: Avoid panic if iommu init fails in tboot systemZhenzhong Duan
"intel_iommu=off" command line is used to disable iommu but iommu is force enabled in a tboot system for security reason. However for better performance on high speed network device, a new option "intel_iommu=tboot_noforce" is introduced to disable the force on. By default kernel should panic if iommu init fail in tboot for security reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off". Fix the code setting force_on and move intel_iommu_tboot_noforce from tboot code to intel iommu code. Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Tested-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-18x86/dumpstack: Do not try to access user space code of other tasksThomas Gleixner
sysrq-t ends up invoking show_opcodes() for each task which tries to access the user space code of other processes, which is obviously bogus. It either manages to dump where the foreign task's regs->ip points to in a valid mapping of the current task or triggers a pagefault and prints "Code: Bad RIP value.". Both is just wrong. Add a safeguard in copy_code() and check whether the @regs pointer matches currents pt_regs. If not, do not even try to access it. While at it, add commentary why using copy_from_user_nmi() is safe in copy_code() even if the function name suggests otherwise. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de
2020-11-17perf/x86: fix sysfs type mismatchesSami Tolvanen
This change switches rapl to use PMU_FORMAT_ATTR, and fixes two other macros to use device_attribute instead of kobj_attribute to avoid callback type mismatches that trip indirect call checking with Clang's Control-Flow Integrity (CFI). Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20201113183126.1239404-1-samitolvanen@google.com
2020-11-17x86/microcode/intel: Check patch signature before saving microcode for early ↵Chen Yu
loading Currently, scan_microcode() leverages microcode_matches() to check if the microcode matches the CPU by comparing the family and model. However, the processor stepping and flags of the microcode signature should also be considered when saving a microcode patch for early update. Use find_matching_signature() in scan_microcode() and get rid of the now-unused microcode_matches() which is a good cleanup in itself. Complete the verification of the patch being saved for early loading in save_microcode_patch() directly. This needs to be done there too because save_mc_for_early() will call save_microcode_patch() too. The second reason why this needs to be done is because the loader still tries to support, at least hypothetically, mixed-steppings systems and thus adds all patches to the cache that belong to the same CPU model albeit with different steppings. For example: microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6 microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23 The patch which is being saved for early loading, however, can only be the one which fits the CPU this runs on so do the signature verification before saving. [ bp: Do signature verification in save_microcode_patch() and rewrite commit message. ] Fixes: ec400ddeff20 ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU") Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535 Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com
2020-11-17KVM: SVM: fix error return code in svm_create_vcpu()Chen Zhou
Fix to return a negative error code from the error handling case instead of 0 in function svm_create_vcpu(), as done elsewhere in this function. Fixes: f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Message-Id: <20201117025426.167824-1-chenzhou10@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().Ashish Kalra
Fix offset computation in __sev_dbg_decrypt() to include the source paddr before it is rounded down to be aligned to 16 bytes as required by SEV API. This fixes incorrect guest memory dumps observed when using qemu monitor. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Fixes for ARM and x86, the latter especially for old processors without two-dimensional paging (EPT/NPT)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch KVM: x86: clflushopt should be treated as a no-op by emulation KVM: arm64: Handle SCXTNUM_ELx traps KVM: arm64: Unify trap handlers injecting an UNDEF KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace
2020-11-15Merge tag 'x86-urgent-2020-11-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of fixes for x86: - Cure the fallout from the MSI irqdomain overhaul which missed that the Intel IOMMU does not register virtual function devices and therefore never reaches the point where the MSI interrupt domain is assigned. This made the VF devices use the non-remapped MSI domain which is trapped by the IOMMU/remap unit - Remove an extra space in the SGI_UV architecture type procfs output for UV5 - Remove a unused function which was missed when removing the UV BAU TLB shootdown handler" * tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: iommu/vt-d: Cure VF irqdomain hickup x86/platform/uv: Fix copied UV5 output archtype x86/platform/uv: Drop last traces of uv_flush_tlb_others
2020-11-15Merge tag 'perf-urgent-2020-11-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A set of fixes for perf: - A set of commits which reduce the stack usage of various perf event handling functions which allocated large data structs on stack causing stack overflows in the worst case - Use the proper mechanism for detecting soft interrupts in the recursion protection - Make the resursion protection simpler and more robust - Simplify the scheduling of event groups to make the code more robust and prepare for fixing the issues vs. scheduling of exclusive event groups - Prevent event multiplexing and rotation for exclusive event groups - Correct the perf event attribute exclusive semantics to take pinned events, e.g. the PMU watchdog, into account - Make the anythread filtering conditional for Intel's generic PMU counters as it is not longer guaranteed to be supported on newer CPUs. Check the corresponding CPUID leaf to make sure - Fixup a duplicate initialization in an array which was probably caused by the usual 'copy & paste - forgot to edit' mishap" * tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix Add BW copypasta perf/x86/intel: Make anythread filter support conditional perf: Tweak perf_event_attr::exclusive semantics perf: Fix event multiplexing for exclusive groups perf: Simplify group_sched_in() perf: Simplify group_sched_out() perf/x86: Make dummy_iregs static perf/arch: Remove perf_sample_data::regs_user_copy perf: Optimize get_recursion_context() perf: Fix get_recursion_context() perf/x86: Reduce stack usage for x86_pmu::drain_pebs() perf: Reduce stack usage of perf_output_begin()
2020-11-15kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in usePaolo Bonzini
In some cases where shadow paging is in use, the root page will be either mmu->pae_root or vcpu->arch.mmu->lm_root. Then it will not have an associated struct kvm_mmu_page, because it is allocated with alloc_page instead of kvm_mmu_alloc_page. Just return false quickly from is_tdp_mmu_root if the TDP MMU is not in use, which also includes the case where shadow paging is enabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guestsBabu Moger
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask the memory encryption bit in reserved bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_archBabu Moger
SEV guests fail to boot on a system that supports the PCID feature. While emulating the RSM instruction, KVM reads the guest CR3 and calls kvm_set_cr3(). If the vCPU is in the long mode, kvm_set_cr3() does a sanity check for the CR3 value. In this case, it validates whether the value has any reserved bits set. The reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory encryption is enabled, the memory encryption bit is set in the CR3 value. The memory encryption bit may fall within the KVM reserved bit range, causing the KVM emulation failure. Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will cache the reserved bits in the CR3 value. This will be initialized to rsvd_bits(cpuid_maxphyaddr(vcpu), 63). If the architecture has any special bits(like AMD SEV encryption bit) that needs to be masked from the reserved bits, should be cleared in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler. Fixes: a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to CR3") Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13KVM: x86: clflushopt should be treated as a no-op by emulationDavid Edmondson
The instruction emulator ignores clflush instructions, yet fails to support clflushopt. Treat both similarly. Fixes: 13e457e0eebf ("KVM: x86: Emulator does not decode clflush well") Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20201103120400.240882-1-david.edmondson@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13x86/platform/uv: Fix copied UV5 output archtypeMike Travis
A test shows that the output contains a space: # cat /proc/sgi_uv/archtype NSGI4 U/UVX Remove that embedded space by copying the "trimmed" buffer instead of the untrimmed input character list. Use sizeof to remove size dependency on copy out length. Increase output buffer size by one character just in case BIOS sends an 8 character string for archtype. Fixes: 1e61f5a95f19 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20201111010418.82133-1-mike.travis@hpe.com
2020-11-11x86/platform/uv: Drop last traces of uv_flush_tlb_othersJiri Slaby
Commit 39297dde7390 ("x86/platform/uv: Remove UV BAU TLB Shootdown Handler") removed uv_flush_tlb_others. Its declaration was removed also from asm/uv/uv.h. But only for the CONFIG_X86_UV=y case. The inline definition (!X86_UV case) is still in place. So remove this implementation with everything what was added to support uv_flush_tlb_others: * include of asm/tlbflush.h * forward declarations of struct cpumask, mm_struct, and flush_tlb_info Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <mike.travis@hpe.com> Acked-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20201109093653.2042-1-jslaby@suse.cz
2020-11-10efi/x86: Free efi_pgd with free_pages()Arvind Sankar
Commit d9e9a6418065 ("x86/mm/pti: Allocate a separate user PGD") changed the PGD allocation to allocate PGD_ALLOCATION_ORDER pages, so in the error path it should be freed using free_pages() rather than free_page(). Commit 06ace26f4e6f ("x86/efi: Free efi_pgd with free_pages()") fixed one instance of this, but missed another. Move the freeing out-of-line to avoid code duplication and fix this bug. Fixes: d9e9a6418065 ("x86/mm/pti: Allocate a separate user PGD") Link: https://lore.kernel.org/r/20201110163919.1134431-1-nivedita@alum.mit.edu Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-11-10perf/x86/intel/uncore: Fix Add BW copypastaArnd Bergmann
gcc -Wextra points out a duplicate initialization of one array member: arch/x86/events/intel/uncore_snb.c:478:37: warning: initialized field overwritten [-Woverride-init] 478 | [SNB_PCI_UNCORE_IMC_DATA_READS] = { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE, The only sensible explanation is that a duplicate 'READS' was used instead of the correct 'WRITES', so change it back. Fixes: 24633d901ea4 ("perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201026215203.3893972-1-arnd@kernel.org
2020-11-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - fix compilation error when PMD and PUD are folded - fix regression in reads-as-zero behaviour of ID_AA64ZFR0_EL1 - add aarch64 get-reg-list test x86: - fix semantic conflict between two series merged for 5.10 - fix (and test) enforcement of paravirtual cpuid features selftests: - various cleanups to memory management selftests - new selftests testcase for performance of dirty logging" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits) KVM: selftests: allow two iterations of dirty_log_perf_test KVM: selftests: Introduce the dirty log perf test KVM: selftests: Make the number of vcpus global KVM: selftests: Make the per vcpu memory size global KVM: selftests: Drop pointless vm_create wrapper KVM: selftests: Add wrfract to common guest code KVM: selftests: Simplify demand_paging_test with timespec_diff_now KVM: selftests: Remove address rounding in guest code KVM: selftests: Factor code out of demand_paging_test KVM: selftests: Use a single binary for dirty/clear log test KVM: selftests: Always clear dirty bitmap after iteration KVM: selftests: Add blessed SVE registers to get-reg-list KVM: selftests: Add aarch64 get-reg-list test selftests: kvm: test enforcement of paravirtual cpuid features selftests: kvm: Add exception handling to selftests selftests: kvm: Clear uc so UCALL_NONE is being properly reported selftests: kvm: Fix the segment descriptor layout to match the actual layout KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs kvm: x86: request masterclock update any time guest uses different msr kvm: x86: ensure pv_cpuid.features is initialized when enabling cap ...
2020-11-09perf/x86/intel: Make anythread filter support conditionalStephane Eranian
Starting with Arch Perfmon v5, the anythread filter on generic counters may be deprecated. The current kernel was exporting the any filter without checking. On Icelake, it means you could do cpu/event=0x3c,any/ even though the filter does not exist. This patch corrects the problem by relying on the CPUID 0xa leaf function to determine if anythread is supported or not as described in the Intel SDM Vol3b 18.2.5.1 AnyThread Deprecation section. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201028194247.3160610-1-eranian@google.com
2020-11-09perf/x86: Make dummy_iregs staticPeter Zijlstra
Having pt_regs on-stack is unfortunate, it's 168 bytes. Since it isn't actually used, make it a static variable. This both gets if off the stack and ensures it gets 0 initialized, just in case someone does look at it. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151955.324273677@infradead.org
2020-11-09perf/arch: Remove perf_sample_data::regs_user_copyPeter Zijlstra
struct perf_sample_data lives on-stack, we should be careful about it's size. Furthermore, the pt_regs copy in there is only because x86_64 is a trainwreck, solve it differently. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
2020-11-09perf/x86: Reduce stack usage for x86_pmu::drain_pebs()Peter Zijlstra
intel_pmu_drain_pebs_*() is typically called from handle_pmi_common(), both have an on-stack struct perf_sample_data, which is *big*. Rewire things so that drain_pebs() can use the one handle_pmi_common() has. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151955.054099690@infradead.org
2020-11-09perf: Reduce stack usage of perf_output_begin()Peter Zijlstra
__perf_output_begin() has an on-stack struct perf_sample_data in the unlikely case it needs to generate a LOST record. However, every call to perf_output_begin() must already have a perf_sample_data on-stack. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org
2020-11-09x86/xen: don't unbind uninitialized lock_kicker_irqBrian Masney
When booting a hyperthreaded system with the kernel parameter 'mitigations=auto,nosmt', the following warning occurs: WARNING: CPU: 0 PID: 1 at drivers/xen/events/events_base.c:1112 unbind_from_irqhandler+0x4e/0x60 ... Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006 ... Call Trace: xen_uninit_lock_cpu+0x28/0x62 xen_hvm_cpu_die+0x21/0x30 takedown_cpu+0x9c/0xe0 ? trace_suspend_resume+0x60/0x60 cpuhp_invoke_callback+0x9a/0x530 _cpu_up+0x11a/0x130 cpu_up+0x7e/0xc0 bringup_nonboot_cpus+0x48/0x50 smp_init+0x26/0x79 kernel_init_freeable+0xea/0x229 ? rest_init+0xaa/0xaa kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The secondary CPUs are not activated with the nosmt mitigations and only the primary thread on each CPU core is used. In this situation, xen_hvm_smp_prepare_cpus(), and more importantly xen_init_lock_cpu(), is not called, so the lock_kicker_irq is not initialized for the secondary CPUs. Let's fix this by exiting early in xen_uninit_lock_cpu() if the irq is not set to avoid the warning from above for each secondary CPU. Signed-off-by: Brian Masney <bmasney@redhat.com> Link: https://lore.kernel.org/r/20201107011119.631442-1-bmasney@redhat.com Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-11-08Merge tag 'x86-urgent-2020-11-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes: - Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs integrated assembler upset - Correct the mitigation selection logic which prevented the related prctl to work correctly - Make the UV5 hubless system work correctly by fixing up the malformed table entries and adding the missing ones" * tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Recognize UV5 hubless system identifier x86/platform/uv: Remove spaces from OEM IDs x86/platform/uv: Fix missing OEM_TABLE_ID x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S
2020-11-08KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrsPankaj Gupta
Windows2016 guest tries to enable LBR by setting the corresponding bits in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and spams the host kernel logs with error messages like: kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop" This patch fixes this by enabling error logging only with 'report_ignored_msrs=1'. Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Message-Id: <20201105153932.24316-1-pankaj.gupta.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08kvm: x86: request masterclock update any time guest uses different msrOliver Upton
Commit 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21) subtly changed the behavior of guest writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update the masterclock any time the guest uses a different msr than before. Fixes: 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21) Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201027231044.655110-6-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08kvm: x86: ensure pv_cpuid.features is initialized when enabling capOliver Upton
Make the paravirtual cpuid enforcement mechanism idempotent to ioctl() ordering by updating pv_cpuid.features whenever userspace requests the capability. Extract this update out of kvm_update_cpuid_runtime() into a new helper function and move its other call site into kvm_vcpu_after_set_cpuid() where it more likely belongs. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201027231044.655110-5-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08kvm: x86: reads of restricted pv msrs should also result in #GPOliver Upton
commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") only protects against disallowed guest writes to KVM paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing KVM_CPUID_FEATURES for msr reads as well. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201027231044.655110-4-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08KVM: x86: use positive error values for msr emulation that causes #GPMaxim Levitsky
Recent introduction of the userspace msr filtering added code that uses negative error codes for cases that result in either #GP delivery to the guest, or handled by the userspace msr filtering. This breaks an assumption that a negative error code returned from the msr emulation code is a semi-fatal error which should be returned to userspace via KVM_RUN ioctl and usually kill the guest. Fix this by reusing the already existing KVM_MSR_RET_INVALID error code, and by adding a new KVM_MSR_RET_FILTERED error code for the userspace filtered msrs. Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace") Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>