summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2017-04-10x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMIBorislav Petkov
Apparently, some machines used to report DRAM errors through a PCI SERR NMI. This is why we have a call into EDAC in the NMI handler. See c0d121720220 ("drivers/edac: add new nmi rescan"). From looking at the patch above, that's two drivers: e752x_edac.c and e7xxx_edac.c. Now, I wanna say those are old machines which are probably decommissioned already. Tony says that "[t]the newest CPU supported by either of those drivers is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some folks are still using these ... but people that hold onto h/w for 7 years generally cling to old s/w too ... so I'd guess it unlikely that we will get complaints for breaking these in upstream." So even if there is a small number still in use, we did load EDAC with edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact) which means a default EDAC setup without any parameters supplied on the command line or otherwise would never even log the error in the NMI handler because we're polling by default: inline int edac_handler_set(void) { if (edac_op_state == EDAC_OPSTATE_POLL) return 0; return atomic_read(&edac_handlers); } So, long story short, I'd like to get rid of that nastiness called edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If we ever have to do stuff like that again, it should be notifiers we're using and not some insanity like this one. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com>
2017-04-08Drivers: hv: Issue explicit EOI when autoeoi is not enabledK. Y. Srinivasan
When auto EOI is not enabled; issue an explicit EOI for hyper-v interrupts. Fixes: 6c248aad81c8 ("Drivers: hv: Base autoeoi enablement based on hypervisor hints") Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - Fix a problem with GICv3 userspace save/restore - Clarify GICv2 userspace save/restore ABI - Be more careful in clearing GIC LRs - Add missing synchronization primitive to our MMU handling code PPC: - Check for a NULL return from kzalloc s390: - Prevent translation exception errors on valid page tables for the instruction-exection-protection support x86: - Fix Page-Modification Logging when running a nested guest" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl KVM: nVMX: initialize PML fields in vmcs02 KVM: nVMX: do not leak PML full vmexit to L1 KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI KVM: arm64: Ensure LRs are clear when they should be kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd KVM: s390: remove change-recording override support arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
2017-04-08Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"Thomas Gleixner
This reverts commit 474aeffd88b87746a75583f356183d5c6caa4213 due to testing failures. Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20170406124459.dwn5zhpr2xqg3lqm@node.shutemov.name
2017-04-07kvm: nVMX: Disallow userspace-injected exceptions in guest modeJim Mattson
The userspace exception injection API and code path are entirely unprepared for exceptions that might cause a VM-exit from L2 to L1, so the best course of action may be to simply disallow this for now. 1. The API provides no mechanism for userspace to specify the new DR6 bits for a #DB exception or the new CR2 value for a #PF exception. Presumably, userspace is expected to modify these registers directly with KVM_SET_SREGS before the next KVM_RUN ioctl. However, in the event that L1 intercepts the exception, these registers should not be changed. Instead, the new values should be provided in the exit_qualification field of vmcs12 (Intel SDM vol 3, section 27.1). 2. In the case of a userspace-injected #DB, inject_pending_event() clears DR7.GD before calling vmx_queue_exception(). However, in the event that L1 intercepts the exception, this is too early, because DR7.GD should not be modified by a #DB that causes a VM-exit directly (Intel SDM vol 3, section 27.1). 3. If the injected exception is a #PF, nested_vmx_check_exception() doesn't properly check whether or not L1 is interested in the associated error code (using the #PF error code mask and match fields from vmcs12). It may either return 0 when it should call nested_vmx_vmexit() or vice versa. 4. nested_vmx_check_exception() assumes that it is dealing with a hardware-generated exception intercept from L2, with some of the relevant details (the VM-exit interruption-information and the exit qualification) live in vmcs02. For userspace-injected exceptions, this is not the case. 5. prepare_vmcs12() assumes that when its exit_intr_info argument specifies valid information with a valid error code that it can VMREAD the VM-exit interruption error code from vmcs02. For userspace-injected exceptions, this is not the case. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07KVM: x86: fix user triggerable warning in kvm_apic_accept_events()David Hildenbrand
If we already entered/are about to enter SMM, don't allow switching to INIT/SIPI_RECEIVED, otherwise the next call to kvm_apic_accept_events() will report a warning. Same applies if we are already in MP state INIT_RECEIVED and SMM is requested to be turned on. Refuse to set the VCPU events in this case. Fixes: cd7764fe9f73 ("KVM: x86: latch INITs while in system management mode") Cc: stable@vger.kernel.org # 4.2+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET publicPaolo Bonzini
Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07kvm: make KVM_CAP_COALESCED_MMIO architecture agnosticPaolo Bonzini
Remove code from architecture files that can be moved to virt/kvm, since there is already common code for coalesced MMIO. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed a pointless 'break' after 'return'.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07KVM: nVMX: support RDRAND and RDSEED exitingPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07KVM: VMX: add missing exit reasonsPaolo Bonzini
In order to simplify adding exit reasons in the future, the array of exit reason names is now also sorted by exit reason code. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07kvm: nVMX: support EPT accessed/dirty bitsPaolo Bonzini
Now use bit 6 of EPTP to optionally enable A/D bits for EPTP. Another thing to change is that, when EPT accessed and dirty bits are not in use, VMX treats accesses to guest paging structures as data reads. When they are in use (bit 6 of EPTP is set), they are treated as writes and the corresponding EPT dirty bit is set. The MMU didn't know this detail, so this patch adds it. We also have to fix up the exit qualification. It may be wrong because KVM sets bit 6 but the guest might not. L1 emulates EPT A/D bits using write permissions, so in principle it may be possible for EPT A/D bits to be used by L1 even though not available in hardware. The problem is that guest page-table walks will be treated as reads rather than writes, so they would not cause an EPT violation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Fixed typo in walk_addr_generic() comment and changed bit clear + conditional-set pattern in handle_ept_violation() to conditional-clear] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07kvm: x86: MMU support for EPT accessed/dirty bitsPaolo Bonzini
This prepares the MMU paging code for EPT accessed and dirty bits, which can be enabled optionally at runtime. Code that updates the accessed and dirty bits will need a pointer to the struct kvm_mmu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07KVM: VMX: remove bogus check for invalid EPT violationPaolo Bonzini
handle_ept_violation is checking for "guest-linear-address invalid" + "not a paging-structure walk". However, _all_ EPT violations without a valid guest linear address are paging structure walks, because those EPT violations happen when loading the guest PDPTEs. Therefore, the check can never be true, and even if it were, KVM doesn't care about the guest linear address; it only uses the guest *physical* address VMCS field. So, remove the check altogether. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07KVM: nVMX: we support 1GB EPT pagesPaolo Bonzini
Large pages at the PDPE level can be emulated by the MMU, so the bit can be set unconditionally in the EPT capabilities MSR. The same is true of 2MB EPT pages, though all Intel processors with EPT in practice support those. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-07KVM: x86: drop legacy device assignmentPaolo Bonzini
Legacy device assignment has been deprecated since 4.2 (released 1.5 years ago). VFIO is better and everyone should have switched to it. If they haven't, this should convince them. :) Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-07KVM: VMX: require virtual NMI supportPaolo Bonzini
Virtual NMIs are only missing in Prescott and Yonah chips. Both are obsolete for virtualization usage---Yonah is 32-bit only even---so drop vNMI emulation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-07kvm/svm: Setup MCG_CAP on AMD properlyBorislav Petkov
MCG_CAP[63:9] bits are reserved on AMD. However, on an AMD guest, this MSR returns 0x100010a. More specifically, bit 24 is set, which is simply wrong. That bit is MCG_SER_P and is present only on Intel. Thus, clean up the reserved bits in order not to confuse guests. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-07KVM: nVMX: single function for switching between vmcsDavid Hildenbrand
Let's combine it in a single function vmx_switch_vmcs(). Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07kvm: vmx: Don't use INVVPID when EPT is enabledJim Mattson
According to the Intel SDM, volume 3, section 28.3.2: Creating and Using Cached Translation Information, "No linear mappings are used while EPT is in use." INVEPT will invalidate both the guest-physical mappings and the combined mappings in the TLBs and paging-structure caches, so an INVVPID is superfluous. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-06Merge commit 'b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e' into uaccess.ia64Al Viro
backmerge of mainline ia64 fix
2017-04-06Merge commit 'fc69910f329d' into uaccess.mipsAl Viro
backmerge of a build fix from mainline
2017-04-05x86/intel_rdt: Update schemata read to show data in tabular formatVikas Shivappa
The schemata file displays data from different resources on all domains. Its cumbersome to read since they are not tabular and data/names could be of different widths. Make the schemata file to display data in a tabular format thereby making it nice and simple to read. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: h.peter.anvin@intel.com Link: http://lkml.kernel.org/r/1491255857-17213-4-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-05x86/intel_rdt: Implement "update" mode when writing schemata fileTony Luck
The schemata file can have multiple lines and it is cumbersome to update all lines. Remove code that requires that the user provides values for every resource (in the right order). If the user provides values for just a few resources, update them and leave the rest unchanged. Side benefit: we now check which values were updated and only send IPIs to cpus that actually have updates. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: h.peter.anvin@intel.com Link: http://lkml.kernel.org/r/1491255857-17213-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-05crypto: glue_helper - remove the le128_gf128mul_x_ble functionOndrej Mosnáček
The le128_gf128mul_x_ble function in glue_helper.h is now obsolete and can be replaced with the gf128mul_x_ble function from gf128mul.h. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Reviewd-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-05crypto: gf128mul - switch gf128mul_x_ble to le128Ondrej Mosnáček
Currently, gf128mul_x_ble works with pointers to be128, even though it actually interprets the words as little-endian. Consequently, it uses cpu_to_le64/le64_to_cpu on fields of type __be64, which is incorrect. This patch fixes that by changing the function to accept pointers to le128 and updating all users accordingly. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Reviewd-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-05x86/efi: Clean up a minor mistake in commentBaoquan He
EFI allocates runtime services regions from EFI_VA_START, -4G, down to -68G, EFI_VA_END - 64G altogether, top-down. The mechanism was introduced in commit: d2f7cbe7b26a7 ("x86/efi: Runtime services virtual mapping") Fix the comment that still says bottom-up. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170404160245.27812-10-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-05efi/bgrt: Enable ACPI BGRT handling on arm64Bhupesh Sharma
Now that the ACPI BGRT handling code has been made generic, we can enable it for arm64. Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> [ Updated commit log to reflect that BGRT is only enabled for arm64, and added missing 'return' statement to the dummy acpi_parse_bgrt() function. ] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170404160245.27812-8-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-05x86/efi/bgrt: Move efi-bgrt handling out of arch/x86Bhupesh Sharma
Now with open-source boot firmware (EDK2) supporting ACPI BGRT table addition even for architectures like AARCH64, it makes sense to move out the 'efi-bgrt.c' file and supporting infrastructure from 'arch/x86' directory and house it inside 'drivers/firmware/efi', so that this common code can be used across architectures. Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170404160245.27812-7-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-05x86/signals: Fix lower/upper bound reporting in compat siginfoJoerg Roedel
Put the right values from the original siginfo into the userspace compat-siginfo. This fixes the 32-bit MPX "tabletest" testcase on 64-bit kernels. Signed-off-by: Joerg Roedel <jroedel@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.8+ Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: a4455082dc6f0 ('x86/signals: Add missing signal_compat code for x86 features') Link: http://lkml.kernel.org/r/1491322501-5054-1-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04usercopy: Move enum for arch_within_stack_frames()Sahara
This patch moves the arch_within_stack_frames() return value enum up in the header files so that per-architecture implementations can reuse the same return values. Signed-off-by: Sahara <keun-o.park@darkmatter.ae> Signed-off-by: James Morse <james.morse@arm.com> [kees: adjusted naming and commit log] Signed-off-by: Kees Cook <keescook@chromium.org>
2017-04-04Annotate hardware config module parameters in arch/x86/mm/David Howells
When the kernel is running in secure boot mode, we lock down the kernel to prevent userspace from modifying the running kernel image. Whilst this includes prohibiting access to things like /dev/mem, it must also prevent access by means of configuring driver modules in such a way as to cause a device to access or modify the kernel image. To this end, annotate module_param* statements that refer to hardware configuration and indicate for future reference what type of parameter they specify. The parameter parser in the core sees this information and can skip such parameters with an error message if the kernel is locked down. The module initialisation then runs as normal, but just sees whatever the default values for those parameters is. Note that we do still need to do the module initialisation because some drivers have viable defaults set in case parameters aren't specified and some drivers support automatic configuration (e.g. PNP or PCI) in addition to manually coded parameters. This patch annotates drivers in arch/x86/mm/. [Note: With respect to testmmiotrace, an additional patch will be added separately that makes the module refuse to load if the kernel is locked down.] Suggested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> cc: Ingo Molnar <mingo@kernel.org> cc: Thomas Gleixner <tglx@linutronix.de> cc: "H. Peter Anvin" <hpa@zytor.com> cc: x86@kernel.org cc: linux-kernel@vger.kernel.org cc: nouveau@lists.freedesktop.org
2017-04-04KVM: nVMX: initialize PML fields in vmcs02Ladi Prosek
L2 was running with uninitialized PML fields which led to incomplete dirty bitmap logging. This manifested as all kinds of subtle erratic behavior of the nested guest. Fixes: 843e4330573c ("KVM: VMX: Add PML support in VMX") Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-04KVM: nVMX: do not leak PML full vmexit to L1Ladi Prosek
The PML feature is not exposed to guests so we should not be forwarding the vmexit either. This commit fixes BSOD 0x20001 (HYPERVISOR_ERROR) when running Hyper-V enabled Windows Server 2016 in L1 on hardware that supports PML. Fixes: 843e4330573c ("KVM: VMX: Add PML support in VMX") Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-04x86/espfix: Add support for 5-level pagingKirill A. Shutemov
We don't need extra virtual address space for ESPFIX, so it stays within one PUD page table for both 4- and 5-level paging. Redefining ESPFIX_BASE_ADDR using P4D_SHIFT instead of PGDIR_SHIFT would make it stay in the same place regarding of paging mode. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-8-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04x86/kasan: Extend KASAN to support 5-level pagingKirill A. Shutemov
This patch bring support for a non-folded additional page table level. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=yKirill A. Shutemov
Extends pagetable headers to support the new paging mode. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04x86/paravirt: Add 5-level support to the paravirt codeKirill A. Shutemov
Add operations to allocate/release p4ds. Xen requires more work. We will need to come back to it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04x86/mm: Define virtual memory map for 5-level pagingKirill A. Shutemov
The first part of memory map (up to %esp fixup) simply scales existing map for 4-level paging by factor of 9 -- number of bits addressed by the additional page table level. The rest of the map is unchanged. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assertKirill A. Shutemov
We don't need the assert anymore, as: 17be0aec74fb ("x86/asm/entry/64: Implement better check for canonical addresses") made canonical address checks generic wrt. address width. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04x86/boot: Detect 5-level paging supportKirill A. Shutemov
In this initial implementation we force-require 5-level paging support from the hardware, when compiled with CONFIG_X86_5LEVEL=y. (The kernel will panic during boot on CPUs that don't support 5-level paging.) We will implement boot-time switch between 4- and 5-level paging later. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Thomas Gleixner: "Prevent dmesg from being spammed when MCE logging is active" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Don't print MCEs when mcelog is active
2017-04-03Merge tag 'v4.11-rc5' into x86/mm, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03x86/mm/numa: Remove numa_nodemask_from_meminfo()Wei Yang
numa_nodemask_from_meminfo() generates a nodemask of nodes which have memory according to a meminfo descriptor. The two callsites of that function both set bits in copies of the numa_nodes_parsed nodemask. In both cases, the information in supplied numa_meminfo is a subset of numa_nodes_parsed. So setting those bits again is not really necessary. Here are the three call paths which show that the supplied numa_meminfo argument describes memory regions in nodes which are already in numa_nodes_parsed: x86_numa_init() numa_init() Case 1: acpi_numa_init() acpi_parse_memory_affinity() numa_add_memblk() node_set(numa_nodes_parsed) acpi_parse_slit() acpi_numa_slit_init() numa_set_distance() numa_alloc_distance() numa_nodemask_from_meminfo() Case 2: amd_numa_init() numa_add_memblk() node_set(numa_nodes_parsed) Case 3 dummy_numa_init() node_set(numa_nodes_parsed) numa_add_memblk() numa_register_memblks() numa_nodemask_from_meminfo() Thus, in all three cases, the respective bit in numa_nodes_parsed is set, which means it is not necessary to set it again in a copy of numa_nodes_parsed. So remove that function. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20170314030801.13656-2-richard.weiyang@gmail.com [ Heavily massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-03x86/mm/numa: Improve alloc_node_data() error path messageWei Yang
alloc_node_data() tries to allocate from the local node first and, if that attempt fails, falls back to any node. Improve the error message to issue the initial node for ease during debugging. Fix a typo in the comments, while at it. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Link: http://lkml.kernel.org/r/20170314030801.13656-1-richard.weiyang@gmail.com [ Masssage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-03debug: Fix __bug_table[] in arch linker scriptsPeter Zijlstra
The kbuild test robot reported this build failure on a number of architectures: > make.cross ARCH=arm > lib/lib.a(bug.o): In function `find_bug': > >> lib/bug.c:135: undefined reference to `__start___bug_table' > >> lib/bug.c:135: undefined reference to `__stop___bug_table' Caused by: 19d436268dde ("debug: Add _ONCE() logic to report_bug()") Which moved the BUG_TABLE from RO_DATA_SECTION() to RW_DATA_SECTION(), but a number of architectures don't use RW_DATA_SECTION(), so they ended up with no __bug_table[] ... Ideally all those would use RW_DATA_SECTION() in their linker scripts, but that's for another day. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild test robot <fengguang.wu@intel.com> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Link: http://lkml.kernel.org/r/20170330154927.o6qmgfp4bdhrajbm@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This update provides: - prevent KASLR from randomizing EFI regions - restrict the usage of -maccumulate-outgoing-args and document when and why it is required. - make the Global Physical Address calculation for UV4 systems work correctly. - address a copy->paste->forgot-edit problem in the MCE exception table entries. - assign a name to AMD MCA bank 3, so the sysfs file registration works. - add a missing include in the boot code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Include missing header file x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs x86/build: Mostly disable '-maccumulate-outgoing-args' x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization x86/mce: Fix copy/paste error in exception table entries x86/platform/uv: Fix calculation of Global Physical Address
2017-04-02Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "This update provides: - make the scheduler clock switch to unstable mode smooth so the timestamps stay at microseconds granularity instead of switching to tick granularity. - unbreak perf test tsc by taking the new offset into account which was added in order to proveide better sched clock continuity - switching sched clock to unstable mode runs all clock related computations which affect the sched clock output itself from a work queue. In case of preemption sched clock uses half updated data and provides wrong timestamps. Keep the math in the protected context and delegate only the static key switch to workqueue context. - remove a duplicate header include" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/headers: Remove duplicate #include <linux/sched/debug.h> line sched/clock: Fix broken stable to unstable transfer sched/clock, x86/perf: Fix "perf test tsc" sched/clock: Fix clear_sched_clock_stable() preempt wobbly
2017-04-02Merge branch 'parisc-4.11-3' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc
2017-03-31kasan: do not sanitize kexec purgatoryMike Galbraith
Fixes this: kexec: Undefined symbol: __asan_load8_noabort kexec-bzImage64: Loading purgatory failed Link: http://lkml.kernel.org/r/1489672155.4458.7.camel@gmx.de Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31x86/mm: Make in_compat_syscall() work during execDmitry Safonov
The x86 mmap() code selects the mmap base for an allocation depending on the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and for 32bit mm->mmap_compat_base. On execve the registers of the task invoking exec() are copied to the child pt_regs. So child->pt_regs->orig_ax contains the execve syscall number of the parent. exec() calls mmap() which in turn uses in_compat_syscall() to check whether the mapping is for a 32bit or a 64bit task. The decision is made on the following criteria: ia32 child->thread.status & TS_COMPAT x32 child->pt_regs.orig_ax & __X32_SYSCALL_BIT ia64 !ia32 && !x32 child->thread.status is corretly set up in set_personality_*(), but the syscall number in child->pt_regs.orig_ax is left unmodified. Therefore the parent/child combinations work or fail in the following way: Parent Child Child->thread_status child->pt_regs.orig_ax in_compat() Works ia64 ia64 TS_COMPAT == 0 __X32_SYSCALL_BIT == 0 false Y ia64 ia32 TS_COMPAT == 1 __X32_SYSCALL_BIT == 0 true Y ia64 x32 TS_COMPAT == 0 __X32_SYSCALL_BIT == 0 false N ia32 ia64 TS_COMPAT == 0 __X32_SYSCALL_BIT == 0 false Y ia32 ia32 TS_COMPAT == 1 __X32_SYSCALL_BIT == 0 true Y ia32 x32 TS_COMPAT == 0 __X32_SYSCALL_BIT == 0 false N x32 ia64 TS_COMPAT == 0 __X32_SYSCALL_BIT == 1 true N x32 ia32 TS_COMPAT == 1 __X32_SYSCALL_BIT == 1 true Y x32 x32 TS_COMPAT == 0 __X32_SYSCALL_BIT == 1 true Y Make set_personality_*() store the syscall number incl. __X32_SYSCALL_BIT which corresponds to the newly started ELF executable in the childs pt_regs, i.e. pretend that the exec was invoked from a task with the same executable format. So both thread.status and pt_regs.orig_ax correspond to the new ELF format and in_compat_syscall() returns the correct result. [ tglx: Rewrote changelog ] Fixes: commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Reported-by: Adam Borowski <kilobyte@angband.pl> Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: Andrei Vagin <avagin@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170331111137.28170-1-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>