path: root/arch/x86
Age | Commit message | Author
2018-12-19 | x86/mtrr: Don't copy uninitialized gentry fields back to userspace | Colin Ian King
Currently the copy_to_user of data in the gentry struct is copying uninitialized data in field _pad from the stack to userspace. Fix this by explicitly memset'ing gentry to zero; this will also zero any compiler-added padding fields that may be in the struct (currently there are none). Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable") Fixes: b263b31e8ad6 ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: security@kernel.org Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com
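The fix described above follows a common pattern: zero the whole stack object before filling it, so that fields nobody assigns (and any padding) cannot leak stale stack bytes. A minimal user-space sketch, assuming a hypothetical `struct gentry_sketch` layout and using `memcpy()` as a stand-in for `copy_to_user()`:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the ioctl result struct; the real layout may differ. */
struct gentry_sketch {
    uint64_t base;
    uint32_t size;
    uint32_t regnum;
    uint32_t type;
    uint32_t _pad;   /* never assigned by fill_entry() below */
};

/* Fill only the meaningful fields, as an ioctl handler might. */
static void fill_entry(struct gentry_sketch *g, uint32_t regnum)
{
    g->regnum = regnum;
    g->base = 0x100000;
    g->size = 0x1000;
    g->type = 6;
}

/* memcpy() stands in for copy_to_user(): it copies every byte, _pad included. */
void get_entry(void *user_buf, uint32_t regnum)
{
    struct gentry_sketch g;

    memset(&g, 0, sizeof(g));   /* zero all bytes, including _pad and padding */
    fill_entry(&g, regnum);
    memcpy(user_buf, &g, sizeof(g));
}
```

Without the `memset()`, `_pad` would carry whatever happened to be on the stack when `get_entry()` was called.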
2018-12-18 | kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs | Eduardo Habkost
Some guest OSes (including Windows 10) write to MSR 0xc001102c in some cases (possibly while trying to apply a CPU erratum). Make KVM ignore reads and writes to that MSR, so the guest won't crash. The MSR is documented as "Execution Unit Configuration (EX_CFG)" in AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors". Cc: stable@vger.kernel.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-18 | KVM: X86: Fix NULL deref in vcpu_scan_ioapic | Wanpeng Li
Reported by syzkaller:

  CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline]
  RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline]
  RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline]
  RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline]
  RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074
  Call Trace:
   kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596
   vfs_ioctl fs/ioctl.c:46 [inline]
   file_ioctl fs/ioctl.c:509 [inline]
   do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
   ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
   __do_sys_ioctl fs/ioctl.c:720 [inline]
   __se_sys_ioctl fs/ioctl.c:718 [inline]
   __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

The reason is that the testcase writes the hyperv synic HV_X64_MSR_SINT14 msr and triggers the scan ioapic logic to load synic vectors into the EOI exit bitmap. However, the irqchip is not initialized by this simple testcase, so the ioapic/apic objects should not be accessed. This patch fixes it by also considering whether or not the apic is present. Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-18 | KVM: Fix UAF in nested posted interrupt processing | Cfir Cohen
nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It caches the kmap()ed page object and pointer; however, it doesn't handle errors correctly: it's possible to cache a valid pointer, then release the page and later dereference the dangling pointer. I was able to reproduce with the following steps:
1. Call vmlaunch with a valid posted_intr_desc_addr but an invalid MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed pi_desc_page and pi_desc. Later the invalid EFER value fails check_vmentry_postreqs(), which fails the first vmlaunch.
2. Call vmlaunch with a valid EFER but an invalid posted_intr_desc_addr (I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages(), pi_desc_page is unmapped and released and pi_desc_page is set to NULL (the "shouldn't happen" clause). Due to the invalid posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and nested_get_vmcs12_pages() returns. It doesn't return an error value so vmlaunch proceeds. Note that at this time we have a dangling pointer in vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs.
3. Issue an IPI in L2 guest code. This triggers a call to vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on(), which dereferences the dangling pointer.
Vulnerable code requires nested and enable_apicv variables to be set to true. The host CPU must also support posted interrupts. Fixes: 5e2f30b756a37 "KVM: nVMX: get rid of nested_get_page()" Cc: stable@vger.kernel.org Reviewed-by: Andy Honig <ahonig@google.com> Signed-off-by: Cfir Cohen <cfir@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
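The bug class described above can be sketched in user space: a cache holds both a page and a pointer derived from it, so releasing the page must invalidate the derived pointer too. This is only an illustration of the pattern, not the actual KVM fix; `struct nested_cache`, `map_page()` and the 1 GiB limit are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct nested_cache {
    void *pi_desc_page;   /* mapped page, or NULL */
    void *pi_desc;        /* pointer into pi_desc_page, or NULL */
};

/* Hypothetical mapper: addresses at or above 1 GiB are treated as invalid. */
static void *map_page(unsigned long addr)
{
    if (addr >= 0x40000000UL)
        return NULL;
    return malloc(4096);
}

/* Re-cache the descriptor for addr; returns 0 on success, -1 on failure. */
int cache_pi_desc(struct nested_cache *c, unsigned long addr)
{
    if (c->pi_desc_page) {
        free(c->pi_desc_page);
        c->pi_desc_page = NULL;
        c->pi_desc = NULL;    /* drop the derived pointer with the page */
    }

    c->pi_desc_page = map_page(addr);
    if (!c->pi_desc_page)
        return -1;            /* cache is left empty, never dangling */

    c->pi_desc = (char *)c->pi_desc_page + (addr & 0xfff);
    return 0;
}
```

If the line that clears `pi_desc` were missing, the second call with an invalid address would leave `pi_desc` pointing into freed memory, which mirrors step 2 of the reproduction.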
2018-12-18 | x86/fsgsbase/64: Fix the base write helper functions | Chang S. Bae
Andy spotted a regression in the fs/gs base helpers after the patch series was committed. The helper functions which write fs/gs base are not just writing the base, they are also changing the index. That's wrong and needs to be separated because writing the base must not modify the index. While the regression is not causing any harm right now because the only caller depends on that behaviour, it's a guarantee for subtle breakage down the road. Make the index explicitly changed from the caller, instead of including the code in the helpers. Subsequently, the task write helpers no longer handle the current task. The range check for a base value is also factored out, to minimize code redundancy in the caller. Fixes: b1378a561fd1 ("x86/fsgsbase/64: Introduce FS/GS base helper functions") Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/20181126195524.32179-1-chang.seok.bae@intel.com
2018-12-18 | x86/speculation: Add support for STIBP always-on preferred mode | Thomas Lendacky
Different AMD processors may have different implementations of STIBP. When STIBP is conditionally enabled, some implementations would benefit from having STIBP always on instead of toggling the STIBP bit through MSR writes. This preference is advertised through a CPUID feature bit. When conditional STIBP support is requested at boot and the CPU advertises STIBP always-on mode as preferred, switch to STIBP "on" support. To show that this transition has occurred, create a new spectre_v2_user_mitigation value and a new spectre_v2_user_strings message. The new mitigation value is used in spectre_v2_user_select_mitigation() to print the new mitigation message as well as to return a new string from stibp_state(). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20181213230352.6937.74943.stgit@tlendack-t1.amdoffice.net
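The selection rule described above can be sketched as a small pure function. The enum and function names below are hypothetical and chosen for illustration; only the rule itself (conditional requested plus CPU preference for always-on yields a distinct "preferred" mode) comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical mode names, for illustration only. */
enum stibp_mode_sketch {
    STIBP_SK_NONE,
    STIBP_SK_CONDITIONAL,       /* toggle STIBP per task via MSR writes */
    STIBP_SK_STRICT,            /* always on, requested explicitly */
    STIBP_SK_STRICT_PREFERRED   /* always on because the CPU prefers it */
};

enum stibp_mode_sketch select_stibp_mode(enum stibp_mode_sketch requested,
                                         bool cpu_prefers_always_on)
{
    /* Only a conditional request is upgraded; other requests are kept. */
    if (requested == STIBP_SK_CONDITIONAL && cpu_prefers_always_on)
        return STIBP_SK_STRICT_PREFERRED;
    return requested;
}
```

Keeping the upgraded mode distinct from the explicit strict mode is what lets the boot message and the reported state show that the transition happened.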
2018-12-18 | x86/topology: Use total_cpus for max logical packages calculation | Hui Wang
nr_cpu_ids can be limited on the command line via nr_cpus=. This can break the logical package management because it results in a smaller number of packages in the kdump kernel. Consider the following case: a two-socket system in which each socket has 8 cores, giving 16 logical CPUs per socket with HT turned on.

   0  1  2  3  4  5  6  7 | 16 17 18 19 20 21 22 23
   cores on socket 0        threads on socket 0
   8  9 10 11 12 13 14 15 | 24 25 26 27 28 29 30 31
   cores on socket 1        threads on socket 1

If a panic is triggered on one of the CPUs 24-31, e.g. 26, and the kdump kernel is started with the command line option nr_cpus=16, then the online CPUs will be 1-15 and 26 (CPU 0 is disabled in kdump), ncpus will be 16 and __max_logical_packages will be 1, although two packages were actually booted. This issue can be reproduced by setting the kdump option nr_cpus=<real physical core numbers> and then triggering a panic on a thread of the last socket, for example: taskset -c 26 echo c > /proc/sysrq-trigger Use total_cpus, which is not limited by the nr_cpus command line option, to calculate the value of __max_logical_packages. Signed-off-by: Hui Wang <john.wanghui@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <guijianfeng@huawei.com> Cc: <wencongyang2@huawei.com> Cc: <douliyang1@huawei.com> Cc: <qiaonuohan@huawei.com> Link: https://lkml.kernel.org/r/20181107023643.22174-1-john.wanghui@huawei.com
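The arithmetic in the example above can be checked with a short sketch. It assumes the package count is a round-up division of a CPU count by the number of logical CPUs per package; the function names are hypothetical.

```c
/* Round-up integer division, the same shape as a DIV_ROUND_UP() macro. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

/*
 * cpu_count:        how many CPUs the calculation is based on
 * cpus_per_package: logical CPUs per package ("ncpus" in the message above)
 */
unsigned int max_logical_packages(unsigned int cpu_count,
                                  unsigned int cpus_per_package)
{
    return div_round_up(cpu_count, cpus_per_package);
}
```

With the numbers from the commit message, basing the calculation on the limited count (16) gives `max_logical_packages(16, 16)`, which is 1, while basing it on all 32 CPUs gives `max_logical_packages(32, 16)`, which is 2 and matches the two sockets that actually booted.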
2018-12-18 | x86/mm/dump_pagetables: Use DEFINE_SHOW_ATTRIBUTE() | Yangtao Li
Use DEFINE_SHOW_ATTRIBUTE() instead of open coding it. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: keescook@chromium.org Cc: luto@kernel.org Cc: peterz@infradead.org Cc: bp@alien8.de Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20181119154334.18265-1-tiny.windzz@gmail.com
2018-12-17 | Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next-integrity | James Morris
From Mimi: In Linux 4.19, a new LSM hook named security_kernel_load_data was upstreamed, allowing LSMs and IMA to prevent the kexec_load syscall. Different signature verification methods exist for verifying the kexec'ed kernel image. This pull request adds additional support in IMA to prevent loading unsigned kernel images via the kexec_load syscall, independently of the IMA policy rules, based on the runtime "secure boot" flag. An initial IMA kselftest is included. In addition, this pull request defines a new, separate keyring named ".platform" for storing the preboot/firmware keys needed for verifying the kexec'ed kernel image's signature and includes the associated IMA kexec usage of the ".platform" keyring. (David Howells' and Josh Boyer's patches for reading the preboot/firmware keys, which were previously posted for a different use case scenario, are included here.)
2018-12-17 | x86/mm/cpa: Rename @addrinarray to @numpages | Peter Zijlstra
The CPA_ARRAY interface works in single pages, and everywhere except in these 'few' locations this variable is called 'numpages'. Remove this 'addrinarray' aberration and use 'numpages' consistently. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.695039210@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | x86/mm/cpa: Better use CLFLUSHOPT | Peter Zijlstra
Currently we issue an MFENCE before and after flushing a range. This means that if we flush a bunch of single page ranges -- like with the cpa array -- we issue a whole bunch of superfluous MFENCEs. Reorganize the code a little to avoid this. [ mingo: capitalize instructions, tweak changelog and comments. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.626999883@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function | Peter Zijlstra
Note that the cache flush loop in cpa_flush_*() is identical when we use __cpa_addr(); further observe that flush_tlb_kernel_range() is a special case of the cpa_flush_array() TLB invalidation code. This then means the two functions are virtually identical. Fold these two functions into a single cpa_flush() call. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.559855600@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | x86/mm/cpa: Make cpa_data::numpages invariant | Peter Zijlstra
Make sure __change_page_attr_set_clr() doesn't modify cpa->numpages. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.493000228@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation | Peter Zijlstra
Instead of punting and doing tlb_flush_all(), do the same as flush_tlb_kernel_range() does and use single page invalidations. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.430001980@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | x86/mm/cpa: Simplify the code after making cpa->vaddr invariant | Peter Zijlstra
Since cpa->vaddr is invariant, this means we can remove all workarounds that deal with it changing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.366619025@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | x86/mm/cpa: Make cpa_data::vaddr invariant | Peter Zijlstra
Currently __change_page_attr_set_clr() will modify cpa->vaddr when !(CPA_ARRAY | CPA_PAGES_ARRAY), whereas in the array cases it will increment cpa->curpage. Change __cpa_addr() such that its @idx argument also works in the !array case and use cpa->curpage increments for all cases. NOTE: since cpa_data::numpages is 'unsigned long' so should cpa_data::curpage be. NOTE: after this only cpa->numpages is still modified. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.295174892@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
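A helper whose index argument works for both the array cases and the plain range case could look roughly like the sketch below. The struct, flag values and field names are hypothetical simplifications (the pages-array case is modelled as a second address array), so this shows the idea rather than the kernel's definition.

```c
#include <stddef.h>

#define SK_CPA_ARRAY        0x1u     /* hypothetical flag values */
#define SK_CPA_PAGES_ARRAY  0x2u
#define SK_PAGE_SIZE        4096UL   /* assumed page size */

struct cpa_sketch {
    unsigned long *vaddr;       /* address array, or pointer to one base address */
    unsigned long *page_addrs;  /* stand-in for the pages-array case */
    unsigned int flags;
};

/* Return the virtual address of the idx-th page described by cpa. */
unsigned long cpa_addr_sketch(const struct cpa_sketch *cpa, unsigned long idx)
{
    if (cpa->flags & SK_CPA_PAGES_ARRAY)
        return cpa->page_addrs[idx];

    if (cpa->flags & SK_CPA_ARRAY)
        return cpa->vaddr[idx];

    /* Plain range: the base never changes, only the index advances. */
    return *cpa->vaddr + idx * SK_PAGE_SIZE;
}
```

Because the range case is computed from an unchanged base plus an index, a caller can advance a single "current page" counter in all three cases instead of modifying the base address.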
2018-12-17 | x86/mm/cpa: Add __cpa_addr() helper | Peter Zijlstra
The code to compute the virtual address of a cpa_data is duplicated; introduce a helper before more copies happen. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.229119497@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests | Peter Zijlstra
The current pageattr-test code only uses the regular range interface; add code that also tests the array and pages interfaces. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.162771364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | Merge branch 'x86/urgent' into x86/mm, to pick up dependent fix | Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | x86/mm/cpa: Fix cpa_flush_array() TLB invalidation | Peter Zijlstra
In commit: a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()") I misread the CPA array code and incorrectly used flush_tlb_kernel_range(), resulting in missing TLB flushes and consequent failures. Instead do a full invalidate in this case -- for now. Reported-by: StDenis, Tom <Tom.StDenis@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Fixes: a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()") Link: http://lkml.kernel.org/r/20181203171043.089868285@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | kprobes/x86: Remove unneeded arch_within_kprobe_blacklist from x86 | Masami Hiramatsu
Remove x86 specific arch_within_kprobe_blacklist(). Since we have already added all blacklisted symbols to the kprobe blacklist by arch_populate_kprobe_blacklist(), we don't need arch_within_kprobe_blacklist() on x86 anymore. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503491354.26176.13903264647254766066.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | kprobes/x86: Show x86-64 specific blacklisted symbols correctly | Masami Hiramatsu
Show x86-64 specific blacklisted symbols in debugfs. Since x86-64 prohibits probing on symbols which are in entry text, those should be shown. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503488425.26176.17136784384033608516.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 | kprobes/x86/xen: blacklist non-attachable xen interrupt functions | Andrea Righi
Blacklist symbols in Xen probe-prohibited areas, so that users can see these prohibited symbols in debugfs. See also: a50480cb6d61. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-15 | x86/vdso: Pass --eh-frame-hdr to the linker | Alistair Strachan
Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") accidentally broke unwinding from userspace, because ld would strip the .eh_frame sections when linking. Originally, the compiler would implicitly add --eh-frame-hdr when invoking the linker, but when this Makefile was converted from invoking ld via the compiler, to invoking it directly (like vmlinux does), the flag was missed. (The EH_FRAME section is important for the VDSO shared libraries, but not for vmlinux.) Fix the problem by explicitly specifying --eh-frame-hdr, which restores parity with the old method. See relevant bug reports for additional info: https://bugzilla.kernel.org/show_bug.cgi?id=201741 https://bugzilla.redhat.com/show_bug.cgi?id=1659295 Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: Carlos O'Donell <carlos@redhat.com> Reported-by: "H. J. Lu" <hjl.tools@gmail.com> Signed-off-by: Alistair Strachan <astrachan@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: kernel-team@android.com Cc: Laura Abbott <labbott@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.com
2018-12-14 | kvm: x86: Dynamically allocate guest_fpu | Marc Orr
Previously, the guest_fpu field was embedded in the kvm_vcpu_arch struct. Unfortunately, the field is quite large (e.g., 4352 bytes on my current setup). This bloats the kvm_vcpu_arch struct for x86 into an order 3 memory allocation, which can become a problem on overcommitted machines. Thus, this patch moves the fpu state outside of the kvm_vcpu_arch struct. With this patch applied, the kvm_vcpu_arch struct is reduced to 15168 bytes for vmx on my setup when building the kernel with kvmconfig. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | kvm: x86: Use task structs fpu field for user | Marc Orr
Previously, x86's instantiation of 'struct kvm_vcpu_arch' added an fpu field to save/restore fpu-related architectural state, which will differ from kvm's fpu state. However, this is redundant to the 'struct fpu' field, called fpu, embedded in the task struct, via the thread field. Thus, this patch removes the user_fpu field from the kvm_vcpu_arch struct and replaces it with the task struct's fpu field. This change is significant because the fpu struct is actually quite large. For example, on the system used to develop this patch, this change reduces the size of the vcpu_vmx struct from 23680 bytes down to 19520 bytes, when building the kernel with kvmconfig. This reduction in the size of the vcpu_vmx struct moves us closer to being able to allocate the struct at order 2, rather than order 3. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
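The relation between a struct size and its allocation order can be worked through with a short sketch. It assumes 4 KiB pages and that an order-n allocation spans 2^n contiguous pages; `alloc_order()` is a hypothetical helper written for this example.

```c
#include <stddef.h>

#define SK_PAGE_SIZE 4096UL   /* assumed 4 KiB pages */

/* Smallest order n such that 2^n pages hold `size` bytes. */
unsigned int alloc_order(size_t size)
{
    size_t pages = (size + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under these assumptions 23680 bytes needs 6 pages and 19520 bytes needs 5 pages, so both still round up to 8 pages (order 3). An order 2 allocation only becomes possible once the struct fits in 4 pages, that is 16384 bytes or less, which is why the reduction is described as moving closer to order 2 rather than reaching it.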
2018-12-14 | KVM: nVMX: Move the checks for Guest Non-Register States to a separate helper function | Krish Sadhukhan
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | KVM: nVMX: Move the checks for Host Control Registers and MSRs to a separate helper function | Krish Sadhukhan
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | KVM: nVMX: Move the checks for VM-Entry Control Fields to a separate helper function | Krish Sadhukhan
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | KVM: nVMX: Move the checks for VM-Exit Control Fields to a separate helper function | Krish Sadhukhan
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | KVM: nVMX: Remove param indirection from nested_vmx_check_msr_switch() | Sean Christopherson
Passing the enum and doing an indirect lookup is silly when we can simply pass the field directly. Remove the "fast path" code in nested_vmx_check_msr_switch_controls() as it's now nothing more than a redundant check. Remove the debug message rather than continue passing the enum for the address field. Having debug messages for the MSRs themselves is useful as MSR legality is a huge space, whereas messing up a physical address means the VMM is fundamentally broken. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | KVM: nVMX: Move the checks for VM-Execution Control Fields to a separate helper function | Krish Sadhukhan
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | KVM: nVMX: Prepend "nested_vmx_" to check_vmentry_{pre,post}reqs() | Krish Sadhukhan
.. as they are used only in nested vmx context. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | KVM/VMX: Check ept_pointer before flushing ept tlb | Lan Tianyu
This patch is to initialize ept_pointer to INVALID_PAGE and check it before flushing ept tlb. If ept_pointer is invalid, bypass the flush request. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
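The guard described above is a sentinel check: start from a value that can never be a real pointer and skip the flush while that value is still in place. A minimal sketch, with a hypothetical `struct vcpu_sketch` and an all-ones sentinel assumed for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_INVALID_PAGE (~(uint64_t)0)   /* assumed sentinel: all bits set */

struct vcpu_sketch {
    uint64_t ept_pointer;
    unsigned int flush_count;   /* counts flushes, for demonstration only */
};

void vcpu_sketch_init(struct vcpu_sketch *v)
{
    v->ept_pointer = SK_INVALID_PAGE;   /* nothing to flush yet */
    v->flush_count = 0;
}

/* Returns true if a flush was issued, false if the request was bypassed. */
bool flush_ept_tlb(struct vcpu_sketch *v)
{
    if (v->ept_pointer == SK_INVALID_PAGE)
        return false;

    v->flush_count++;   /* stand-in for the real flush operation */
    return true;
}
```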
2018-12-14 | KVM nVMX: MSRs should not be stored if VM-entry fails during or after loading guest state | Krish Sadhukhan
According to section "VM-entry Failures During or After Loading Guest State" in Intel SDM vol 3C, "No MSRs are saved into the VM-exit MSR-store area." when bit 31 of the exit reason is set. Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
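The rule quoted above reduces to a single bit test on the exit reason. A minimal sketch, where the macro and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the exit reason marks a failed VM entry. */
#define SK_FAILED_VMENTRY (UINT32_C(1) << 31)

/* MSRs go to the VM-exit MSR-store area only if the VM entry succeeded. */
bool should_store_exit_msrs(uint32_t exit_reason)
{
    return !(exit_reason & SK_FAILED_VMENTRY);
}
```

A caller would check this before writing the MSR-store area, so that a failed entry leaves that area untouched.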
2018-12-14 | kvm: x86: Don't modify MSR_PLATFORM_INFO on vCPU reset | Jim Mattson
If userspace has provided a different value for this MSR (e.g. with the turbo bits set), the userspace-provided value should survive a vCPU reset. For backwards compatibility, MSR_PLATFORM_INFO is initialized in kvm_arch_vcpu_setup. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Drew Schmitt <dasch@google.com> Cc: Abhiroop Dabral <adabral@paloaltonetworks.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | kvm: vmx: add cpu into VMX preemption timer bug list | Wei Huang
This patch adds Intel "Xeon CPU E3-1220 V2", with CPUID.01H.EAX=0x000306A8, into the list of known broken CPUs which fail to support VMX preemption timer. This bug was found while running the APIC timer test of kvm-unit-test on this specific CPU, even though the errata info can't be located in the public domain for this CPU. Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
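The signature value quoted above can be decoded with the standard CPUID leaf 1 EAX layout (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20). The sketch below only illustrates that decoding; it is not the kernel's matching code, and the struct and function names are hypothetical.

```c
#include <stdint.h>

struct cpu_sig {
    unsigned int family;
    unsigned int model;
    unsigned int stepping;
};

/* Decode the family/model/stepping signature from CPUID.01H:EAX. */
struct cpu_sig decode_cpuid_eax(uint32_t eax)
{
    struct cpu_sig s;
    unsigned int base_family = (eax >> 8) & 0xf;
    unsigned int base_model  = (eax >> 4) & 0xf;
    unsigned int ext_family  = (eax >> 20) & 0xff;
    unsigned int ext_model   = (eax >> 16) & 0xf;

    s.stepping = eax & 0xf;

    /* The extended family is only added when the base family is 0xf. */
    s.family = base_family;
    if (base_family == 0xf)
        s.family += ext_family;

    /* The extended model is the high nibble for families 6 and 0xf. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        s.model |= ext_model << 4;

    return s;
}
```

For 0x000306A8 this yields family 6, model 0x3A, stepping 8.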
2018-12-14 | kvm: x86: Report STIBP on GET_SUPPORTED_CPUID | Eduardo Habkost
Months ago, we added code to allow the guest direct access to MSR_IA32_SPEC_CTRL, which makes STIBP available to guests. This was implemented by commits d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") and b2ac58f90540 ("KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL"). However, we never updated GET_SUPPORTED_CPUID to let userspace know that STIBP can be enabled in CPUID. Fix that by updating kvm_cpuid_8000_0008_ebx_x86_features and kvm_cpuid_7_0_edx_x86_features. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | x86/hyper-v: Stop caring about EOI for direct stimers | Vitaly Kuznetsov
Turns out we over-engineered Direct Mode for stimers a bit: unlike traditional stimers where we may want to try to re-inject the message upon EOI, Direct Mode stimers just set the irq in APIC and kvm_apic_set_irq() fails only when APIC is disabled (see APIC_DM_FIXED case in __apic_accept_irq()). Remove the redundant part. Suggested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | x86/kvm/hyper-v: avoid open-coding stimer_mark_pending() in kvm_hv_notify_acked_sint() | Vitaly Kuznetsov
The stimers_pending optimization only helps us avoid multiple kvm_make_request() calls. This doesn't happen very often and these calls are very cheap in the first place, so remove the open-coded version of stimer_mark_pending() from kvm_hv_notify_acked_sint(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 | x86/kvm/hyper-v: direct mode for synthetic timers | Vitaly Kuznetsov
Turns out Hyper-V on KVM (as of 2016) will only use synthetic timers if direct mode is available. With direct mode we notify the guest by asserting an APIC irq instead of sending a SynIC message. The implementation uses the existing vec_bitmap for letting lapic code know that we're interested in the particular IRQ's EOI request. We assume that the same APIC irq won't be used by the guest for both direct mode stimer and as sint source (especially with AutoEOI semantics). It is unclear how things should be handled if that's not true. Direct mode is also somewhat less expensive; in my testing stimer_send_msg() takes no less than 1500 cpu cycles and stimer_notify_direct() can usually be done in 300-400. WS2016 without Hyper-V, however, always sticks to the non-direct version. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/kvm/hyper-v: use stimer config definition from hyperv-tlfs.hVitaly Kuznetsov
In preparation for implementing Direct Mode for Hyper-V synthetic timers, switch to using the stimer config definition from hyperv-tlfs.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/hyper-v: move synic/stimer control structures definitions to hyperv-tlfs.hVitaly Kuznetsov
We implement Hyper-V SynIC and synthetic timers in KVM too so there's some room for code sharing. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUIDVitaly Kuznetsov
With every new Hyper-V Enlightenment we implement we're forced to add a KVM_CAP_HYPERV_* capability. While this approach works, it is fairly inconvenient: the majority of the enlightenments we implement have corresponding CPUID feature bit(s), and userspace has to know this anyway to be able to expose the feature to the guest. Add the KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one cap to rule them all!") returning all Hyper-V CPUID feature leaves. Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible: Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000, 0x40000001) and we would probably confuse userspace if we decided to return these twice. KVM_CAP_HYPERV_CPUID's number is interim: we intend to drop KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helperVitaly Kuznetsov
The upcoming KVM_GET_SUPPORTED_HV_CPUID ioctl will need to return Enlightened VMCS version in HYPERV_CPUID_NESTED_FEATURES.EAX when it was enabled. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/hyper-v: Drop HV_X64_CONFIGURE_PROFILER definitionVitaly Kuznetsov
BIT(13) in HYPERV_CPUID_FEATURES.EBX is described as "ConfigureProfiler" in TLFS v4.0, but starting with v5.0 it is replaced with 'Reserved'. As we don't currently use it in the kernel, it can just be dropped. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/hyper-v: Do some housekeeping in hyperv-tlfs.hVitaly Kuznetsov
hyperv-tlfs.h is a bit messy: CPUID feature bits are not always sorted, it's hard to tell which CPUID leaf they belong to, and some items are duplicated (e.g. HV_X64_MSR_CRASH_CTL_NOTIFY/HV_CRASH_CTL_CRASH_NOTIFY). Do some housekeeping work. While at it, replace all (1 << X) with the BIT(X) macro. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
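For readers unfamiliar with the BIT() conversion mentioned in the message above, here is a minimal userspace sketch. The BIT() definition is a local stand-in modelled on the kernel macro, and the DEMO_* names are hypothetical; they are not taken from hyperv-tlfs.h.

```c
#include <assert.h>

/* Local stand-in for the kernel's BIT() macro: an unsigned long with bit nr set. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical feature bit written in the old, open-coded style ... */
#define DEMO_FEATURE_OLD (1 << 3)
/* ... and in the BIT() style, which states the bit number directly. */
#define DEMO_FEATURE     BIT(3)

/* Returns nonzero if the given feature bit is set in a feature word. */
static inline int demo_has_feature(unsigned long features, unsigned long bit)
{
	return (features & bit) != 0;
}
```

Both spellings produce the same value for small bit numbers. One practical difference is that BIT(31) is a well-defined unsigned value, whereas (1 << 31) overflows a signed int.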
2018-12-14x86/hyper-v: Mark TLFS structures packedVitaly Kuznetsov
The TLFS structures are used for hypervisor-guest communication and must exactly meet the specification. Compilers can add alignment padding to structures or reorder struct members for randomization and optimization, which would break the hypervisor ABI. Mark the structures as packed to prevent this. 'struct hv_vp_assist_page' and 'struct hv_enlightened_vmcs' need to be properly padded to support the change. Suggested-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
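The effect of packing described above can be illustrated with a small sketch. The structures below are hypothetical and are not real TLFS definitions; the sketch only assumes a GCC or Clang compatible compiler and a typical ABI where uint64_t is aligned to more than one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with natural alignment: the compiler is free to
 * insert padding between 'type' and 'payload'. */
struct demo_natural {
	uint8_t  type;
	uint64_t payload;
};

/* Same members, packed: no padding is inserted, so the layout is exactly
 * what the declaration spells out. */
struct demo_packed {
	uint8_t  type;
	uint64_t payload;
} __attribute__((packed));

/* Packed structure with explicit padding, so that 'payload' keeps an
 * 8-byte offset while the layout remains fully specified. */
struct demo_packed_padded {
	uint8_t  type;
	uint8_t  reserved[7];
	uint64_t payload;
} __attribute__((packed));

/* Compile-time checks that pin the layout down. */
static_assert(sizeof(struct demo_packed) == 9, "packed: no hidden padding");
static_assert(offsetof(struct demo_packed, payload) == 1, "packed: payload follows type");
static_assert(sizeof(struct demo_packed_padded) == 16, "explicit padding is part of the layout");
static_assert(offsetof(struct demo_packed_padded, payload) == 8, "payload stays 8-byte aligned");
```

The third structure shows why some definitions need explicit padding once they are packed: any gap that the compiler used to supply implicitly has to be declared as a reserved field to keep the offsets unchanged.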
2018-12-14x86: kvm: hyperv: don't retry message delivery for periodic timersRoman Kagan
The SynIC message delivery protocol allows the message originator to request, should the message slot be busy, to be notified when it's free. However, this is unnecessary and even undesirable for messages generated by SynIC timers in periodic mode: if the period is so short compared to the time the guest spends in the timer interrupt handler that the timer ticks start piling up, the excessive interactions due to this notification and retried message delivery only make things worse. [This was observed, in particular, with Windows L2 guests setting (temporarily) the periodic timer to 2 kHz, and spending hundreds of microseconds in the timer interrupt handler due to several L2->L1 exits; under some load in L0 this could exceed 500 us so the timer ticks started to pile up and the guest livelocked.] Relieve the situation somewhat by not retrying message delivery for periodic SynIC timers. This appears to remain within the "lazy" lost ticks policy for SynIC timers as implemented in KVM. Note that it doesn't solve the fundamental problem of livelocking the guest with a periodic timer whose period is smaller than the time needed to process a tick, but it makes it a bit less likely to be triggered. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
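The policy described above can be modelled in a few lines. This is a simplified sketch, not the KVM implementation: the demo_* types and the function are hypothetical, and the slot is reduced to two flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of trying to post one timer tick. */
enum demo_result { DEMO_DELIVERED, DEMO_DROPPED, DEMO_RETRY_LATER };

/* Hypothetical, heavily simplified message slot. */
struct demo_slot {
	bool busy;        /* guest has not consumed the previous message yet */
	bool msg_pending; /* originator asked to be notified when the slot frees up */
};

/* Try to deliver a tick. A busy slot causes a periodic timer to drop the
 * tick (the next period brings a new one anyway), while a one-shot timer
 * requests a notification so that delivery can be retried later. */
static enum demo_result demo_deliver_tick(struct demo_slot *slot, bool periodic)
{
	if (!slot->busy) {
		slot->busy = true;
		return DEMO_DELIVERED;
	}
	if (periodic)
		return DEMO_DROPPED;
	slot->msg_pending = true;
	return DEMO_RETRY_LATER;
}
```

For scale, a 2 kHz periodic timer has a period of 1,000,000 us / 2000 = 500 us, which is why a handler that takes longer than 500 us lets ticks pile up in the scenario the message describes.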
2018-12-14x86: kvm: hyperv: simplify SynIC message deliveryRoman Kagan
SynIC message delivery is somewhat overengineered: it pretends to follow the ordering rules when grabbing the message slot, using atomic operations and all that, but does it incorrectly and unnecessarily. The correct order would be to first set .msg_pending, then atomically replace .message_type if it was zero, and then clear .msg_pending if the previous step was successful. But all of this is done in vcpu context, so the whole update looks atomic to the guest (it's assumed to only access the message page from this cpu), and therefore it can be done in whatever order is most convenient (which is also the reason why the incorrect order didn't trigger any bugs so far). While at it, also switch to kvm_vcpu_{read,write}_guest_page, and drop the no longer needed synic_clear_sint_msg_pending. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
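The "correct order" spelled out in the message above can be sketched with C11 atomics. This is an illustration of the three steps only, under the assumption of a slot that really is accessed concurrently; the structure and function names are hypothetical, the payload is omitted, and this is not the code the commit adds (the commit argues the ordering is unnecessary in vcpu context).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message slot header; message_type == 0 means the slot is free. */
struct demo_msg_slot {
	_Atomic uint32_t message_type;
	_Atomic uint8_t  msg_pending;
};

/* Returns true if the slot was claimed for 'type'. On failure the slot was
 * busy and msg_pending is left set, asking the consumer to signal when the
 * slot becomes free. */
static bool demo_claim_slot(struct demo_msg_slot *slot, uint32_t type)
{
	uint32_t expected = 0;

	/* Step 1: announce interest before looking at the slot. */
	atomic_store(&slot->msg_pending, 1);

	/* Step 2: atomically replace message_type only if it was zero. */
	if (!atomic_compare_exchange_strong(&slot->message_type, &expected, type))
		return false;

	/* Step 3: the claim succeeded, so no notification is needed. */
	atomic_store(&slot->msg_pending, 0);
	return true;
}
```

Setting msg_pending first closes the window in which the consumer could free the slot between the check and the request for notification, which is the reason this order matters when the two sides truly run concurrently.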