summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2018-03-16KVM: VMX: don't configure EPT identity map for unrestricted guestSean Christopherson
An unrestricted guest can run with hardware CR0.PG==0, i.e. IA32 paging disabled, in which case there is no need to load the guest's CR3 with identity mapped IA32 page tables since hardware will effectively ignore CR3. If unrestricted guest is enabled, don't configure the identity mapped IA32 page table and always load the guest's desired CR3. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16KVM: VMX: don't configure RM TSS for unrestricted guestSean Christopherson
An unrestricted guest can run with CR0.PG==0 and/or CR0.PE==0, e.g. it can run in Real Mode without requiring host emulation. The RM TSS is only used for emulating RM, i.e. it will never be used when unrestricted guest is enabled and so doesn't need to be configured. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16x86/kvm/hyper-v: inject #GP only when invalid SINTx vector is unmaskedVitaly Kuznetsov
Hyper-V 2016 on KVM with SynIC enabled doesn't boot with the following trace: kvm_entry: vcpu 0 kvm_exit: reason MSR_WRITE rip 0xfffff8000131c1e5 info 0 0 kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000090 data 0x10000 host 0 kvm_msr: msr_write 40000090 = 0x10000 (#GP) kvm_inj_exception: #GP (0x0) KVM acts according to the following statement from TLFS: " 11.8.4 SINTx Registers ... Valid values for vector are 16-255 inclusive. Specifying an invalid vector number results in #GP. " However, I checked and genuine Hyper-V doesn't #GP when we write 0x10000 to SINTx. I checked with Microsoft and they confirmed that if either the Masked bit (bit 16) or the Polling bit (bit 18) is set to 1, then they ignore the value of Vector. Make KVM act accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16x86/kvm/hyper-v: remove stale entries from vec_bitmap/auto_eoi_bitmap on ↵Vitaly Kuznetsov
vector change When a new vector is written to SINx we update vec_bitmap/auto_eoi_bitmap but we forget to remove old vector from these masks (in case it is not present in some other SINTx). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16x86/kvm/hyper-v: add reenlightenment MSRs supportVitaly Kuznetsov
Nested Hyper-V/Windows guest running on top of KVM will use TSC page clocksource in two cases: - L0 exposes invariant TSC (CPUID.80000007H:EDX[8]). - L0 provides Hyper-V Reenlightenment support (CPUID.40000003H:EAX[13]). Exposing invariant TSC effectively blocks migration to hosts with different TSC frequencies, providing reenlightenment support will be needed when we start migrating nested workloads. Implement rudimentary support for reenlightenment MSRs. For now, these are just read/write MSRs with no effect. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16KVM: x86: Update the exit_qualification access bits while walking an addressKarimAllah Ahmed
... to avoid having a stale value when handling an EPT misconfig for MMIO regions. MMIO regions that are not passed-through to the guest are handled through EPT misconfigs. The first time a certain MMIO page is touched it causes an EPT violation, then KVM marks the EPT entry to cause an EPT misconfig instead. Any subsequent accesses to the entry will generate an EPT misconfig. Things gets slightly complicated with nested guest handling for MMIO regions that are not passed through from L0 (i.e. emulated by L0 user-space). An EPT violation for one of these MMIO regions from L2, exits to L0 hypervisor. L0 would then look at the EPT12 mapping for L1 hypervisor and realize it is not present (or not sufficient to serve the request). Then L0 injects an EPT violation to L1. L1 would then update its EPT mappings. The EXIT_QUALIFICATION value for L1 would come from exit_qualification variable in "struct vcpu". The problem is that this variable is only updated on EPT violation and not on EPT misconfig. So if an EPT violation because of a read happened first, then an EPT misconfig because of a write happened afterwards. The L0 hypervisor will still contain exit_qualification value from the previous read instead of the write and end up injecting an EPT violation to the L1 hypervisor with an out of date EXIT_QUALIFICATION. The EPT violation that is injected from L0 to L1 needs to have the correct EXIT_QUALIFICATION specially for the access bits because the individual access bits for MMIO EPTs are updated only on actual access of this specific type. So for the example above, the L1 hypervisor will keep updating only the read bit in the EPT then resume the L2 guest. The L2 guest would end up causing another exit where the L0 *again* will inject another EPT violation to L1 hypervisor with *again* an out of date exit_qualification which indicates a read and not a write. Then this ping-pong just keeps happening without making any forward progress. The behavior of mapping MMIO regions changed in: commit a340b3e229b24 ("kvm: Map PFN-type memory regions as writable (if possible)") ... where an EPT violation for a read would also fixup the write bits to avoid another EPT violation which by acciddent would fix the bug mentioned above. This commit fixes this situation and ensures that the access bits for the exit_qualifcation is up to date. That ensures that even L1 hypervisor running with a KVM version before the commit mentioned above would still work. ( The description above assumes EPT to be available and used by L1 hypervisor + the L1 hypervisor is passing through the MMIO region to the L2 guest while this MMIO region is emulated by the L0 user-space ). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16KVM: x86: Make enum conversion explicit in kvm_pdptr_read()Matthias Kaehlcke
The type 'enum kvm_reg_ex' is an extension of 'enum kvm_reg', however the extension is only semantical and the compiler doesn't know about the relationship between the two types. In kvm_pdptr_read() a value of the extended type is passed to kvm_x86_ops->cache_reg(), which expects a value of the base type. Clang raises the following warning about the type mismatch: arch/x86/kvm/kvm_cache_regs.h:44:32: warning: implicit conversion from enumeration type 'enum kvm_reg_ex' to different enumeration type 'enum kvm_reg' [-Wenum-conversion] kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR); Cast VCPU_EXREG_PDPTR to 'enum kvm_reg' to make the compiler happy. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in useVitaly Kuznetsov
Devices which use level-triggered interrupts under Windows 2016 with Hyper-V role enabled don't work: Windows disables EOI broadcast in SPIV unconditionally. Our in-kernel IOAPIC implementation emulates an old IOAPIC version which has no EOI register so EOI never happens. The issue was discovered and discussed a while ago: https://www.spinics.net/lists/kvm/msg148098.html While this is a guest OS bug (it should check that IOAPIC has the required capabilities before disabling EOI broadcast) we can workaround it in KVM: advertising DIRECTED_EOI with in-kernel IOAPIC makes little sense anyway. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16KVM: x86: Add support for AMD Core Perf Extension in guestJanakarajan Natarajan
Add support for AMD Core Performance counters in the guest. The base event select and counter MSRs are changed. In addition, with the core extension, there are 2 extra counters available for performance measurements for a total of 6. With the new MSRs, the logic to map them to the gp_counters[] is changed. New functions are added to check the validity of the get/set MSRs. If the guest has the X86_FEATURE_PERFCTR_CORE cpuid flag set, the number of counters available to the vcpu is set to 6. It the flag is not set then it is 4. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> [Squashed "Expose AMD Core Perf Extension flag to guests" - Radim.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16x86/msr: Add AMD Core Perf Extension MSRsJanakarajan Natarajan
Add the EventSelect and Counter MSRs for AMD Core Perf Extension. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16x86/microcode: Fix CPU synchronization routineBorislav Petkov
Emanuel reported an issue with a hang during microcode update because my dumb idea to use one atomic synchronization variable for both rendezvous - before and after update - was simply bollocks: microcode: microcode_reload_late: late_cpus: 4 microcode: __reload_late: cpu 2 entered microcode: __reload_late: cpu 1 entered microcode: __reload_late: cpu 3 entered microcode: __reload_late: cpu 0 entered microcode: __reload_late: cpu 1 left microcode: Timeout while waiting for CPUs rendezvous, remaining: 1 CPU1 above would finish, leave and the others will still spin waiting for it to join. So do two synchronization atomics instead, which makes the code a lot more straightforward. Also, since the update is serialized and it also takes quite some time per microcode engine, increase the exit timeout by the number of CPUs on the system. That's ok because the moment all CPUs are done, that timeout will be cut short. Furthermore, panic when some of the CPUs timeout when returning from a microcode update: we can't allow a system with not all cores updated. Also, as an optimization, do not do the exit sync if microcode wasn't updated. Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20180314183615.17629-2-bp@alien8.de
2018-03-16x86/microcode: Attempt late loading only when new microcode is presentBorislav Petkov
Return UCODE_NEW from the scanning functions to denote that new microcode was found and only then attempt the expensive synchronization dance. Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20180314183615.17629-1-bp@alien8.de
2018-03-16perf: Fix sibling iterationPeter Zijlstra
Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") Reported-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net
2018-03-16x86/tsc: Convert ART in nanoseconds to TSCRajvi Jingar
Device drivers use get_device_system_crosststamp() to produce precise system/device cross-timestamps. The PHC clock and ALSA interfaces, for example, make the cross-timestamps available to user applications. On Intel platforms, get_device_system_crosststamp() requires a TSC value derived from ART (Always Running Timer) to compute the monotonic raw and realtime system timestamps. Starting with Intel Goldmont platforms, the PCIe root complex supports the PTM time sync protocol. PTM requires all timestamps to be in units of nanoseconds. The Intel root complex hardware propagates system time derived from ART in units of nanoseconds performing the conversion as follows: ART_NS = ART * 1e9 / <crystal frequency> When user software requests a cross-timestamp, the system timestamps (generally read from device registers) must be converted to TSC by the driver software as follows: TSC = ART_NS * TSC_KHZ / 1e6 This is valid when CPU feature flag X86_FEATURE_TSC_KNOWN_FREQ is set indicating that tsc_khz is derived from CPUID[15H]. Drivers should check whether this flag is set before conversion to TSC is attempted. Suggested-by: Christopher S. Hall <christopher.s.hall@intel.com> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/1520530116-4925-1-git-send-email-rajvi.jingar@intel.com
2018-03-16KVM: x86: Fix device passthrough when SME is activeTom Lendacky
When using device passthrough with SME active, the MMIO range that is mapped for the device should not be mapped encrypted. Add a check in set_spte() to insure that a page is not mapped encrypted if that page is a device MMIO page as indicated by kvm_is_mmio_pfn(). Cc: <stable@vger.kernel.org> # 4.14.x- Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-16x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklistAlexander Sergeyev
In accordance with Intel's microcode revision guidance from March 6 MCU rev 0xc2 is cleared on both Skylake H/S and Skylake Xeon E3 processors that share CPUID 506E3. Signed-off-by: Alexander Sergeyev <sergeev917@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jia Zhang <qianyue.zj@alibaba-inc.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kyle Huey <me@kylehuey.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180313193856.GA8580@localhost.localdomain
2018-03-15signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptionsDave Martin
Some architectures cannot always report accurately what kind of floating-point exception triggered a floating-point exception trap. This can occur with fp exceptions occurring on lanes in a vector instruction on arm64 for example. Rather than have every architecture come up with its own way of describing such a condition, this patch adds a common FPE_FLTUNK si_code value to report that an fp exception caused a trap but we cannot be certain which kind of fp exception it was. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-03-15x86/mm: Remove pointless checks in vmalloc_faultToshi Kani
vmalloc_fault() sets user's pgd or p4d from the kernel page table. Once it's set, all tables underneath are identical. There is no point of following the same page table with two separate pointers and make sure they see the same with BUG(). Remove the pointless checks in vmalloc_fault(). Also rename the kernel pgd/p4d pointers to pgd_k/p4d_k so that their names are consistent in the file. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: Borislav Petkov <bp@alien8.de> Cc: Gratian Crisan <gratian.crisan@ni.com> Link: https://lkml.kernel.org/r/20180314205932.7193-1-toshi.kani@hpe.com
2018-03-15x86/rtc: Stop using deprecated functionsBenjamin Gaignard
rtc_time_to_tm() and rtc_tm_to_time() are deprecated because they rely on 32bits variables and that will make rtc break in y2038/2016. Use the proper y2038 safe functions. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Link: https://lkml.kernel.org/r/1520620971-9567-5-git-send-email-john.stultz@linaro.org
2018-03-14x86, memremap: fix altmap accounting at freeDan Williams
Commit 24b6d4164348 "mm: pass the vmem_altmap to vmemmap_free" converted the vmemmap_free() path to pass the altmap argument all the way through the call chain rather than looking it up based on the page. Unfortunately that ends up over freeing altmap allocated pages in some cases since free_pagetable() is used to free both memmap space and pte space, where only the memmap stored in huge pages uses altmap allocations. Given that altmap allocations for memmap space are special cased in vmemmap_populate_hugepages() add a symmetric / special case free_hugepage_table() to handle altmap freeing, and cleanup the unneeded passing of altmap to leaf functions that do not require it. Without this change the sanity check accounting in devm_memremap_pages_release() will throw a warning with the following signature. nd_pmem pfn10.1: devm_memremap_pages_release: failed to free all reserved pages WARNING: CPU: 44 PID: 3539 at kernel/memremap.c:310 devm_memremap_pages_release+0x1c7/0x220 CPU: 44 PID: 3539 Comm: ndctl Tainted: G L 4.16.0-rc1-linux-stable #7 RIP: 0010:devm_memremap_pages_release+0x1c7/0x220 [..] Call Trace: release_nodes+0x225/0x270 device_release_driver_internal+0x15d/0x210 bus_remove_device+0xe2/0x160 device_del+0x130/0x310 ? klist_release+0x56/0x100 ? nd_region_notify+0xc0/0xc0 [libnvdimm] device_unregister+0x16/0x60 This was missed in testing since not all configurations will trigger this warning. Fixes: 24b6d4164348 ("mm: pass the vmem_altmap to vmemmap_free") Reported-by: Jane Chu <jane.chu@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-14Merge branch 'x86/urgent' into x86/mm to pick up dependenciesThomas Gleixner
2018-03-14x86/mm: Fix vmalloc_fault to use pXd_largeToshi Kani
Gratian Crisan reported that vmalloc_fault() crashes when CONFIG_HUGETLBFS is not set since the function inadvertently uses pXn_huge(), which always return 0 in this case. ioremap() does not depend on CONFIG_HUGETLBFS. Fix vmalloc_fault() to call pXd_large() instead. Fixes: f4eafd8bcd52 ("x86/mm: Fix vmalloc_fault() to handle large pages properly") Reported-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20180313170347.3829-2-toshi.kani@hpe.com
2018-03-14x86/speculation, objtool: Annotate indirect calls/jumps for objtool on ↵Andy Whitcroft
32-bit kernels In the following commit: 9e0e3c5130e9 ("x86/speculation, objtool: Annotate indirect calls/jumps for objtool") ... we added annotations for CALL_NOSPEC/JMP_NOSPEC on 64-bit x86 kernels, but we did not annotate the 32-bit path. Annotate it similarly. Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180314112427.22351-1-apw@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-14x86/vm86/32: Fix POPF emulationAndy Lutomirski
POPF would trap if VIP was set regardless of whether IF was set. Fix it. Suggested-by: Stas Sergeev <stsp@list.ru> Reported-by: Bart Oldeman <bartoldeman@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 5ed92a8ab71f ("x86/vm86: Use the normal pt_regs area for vm86") Link: http://lkml.kernel.org/r/ce95f40556e7b2178b6bc06ee9557827ff94bd28.1521003603.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12perf/core: Remove perf_event::group_entryPeter Zijlstra
Now that all the grouping is done with RB trees, we no longer need group_entry and can replace the whole thing with sibling_list. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valery Cherepennikov <valery.cherepennikov@intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/platform/intel-mid: Add special handling for ACPI HW reduced platformsAndy Shevchenko
When switching to ACPI HW reduced platforms we still want to initialize timers. Override x86_init.acpi.reduced_hw_init to achieve that. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Link: http://lkml.kernel.org/r/20180220180506.65523-3-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callbackAndy Shevchenko
Some ACPI hardware reduced platforms need to initialize certain devices defined by the ACPI hardware specification even though in principle those devices should not be present in an ACPI hardware reduced platform. To allow that to happen, make it possible to override the generic x86_init callbacks and provide a custom legacy_pic value, add a new ->reduced_hw_early_init() callback to struct x86_init_acpi and make acpi_reduced_hw_init() use it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Link: http://lkml.kernel.org/r/20180220180506.65523-2-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and exportAndy Shevchenko
This is a preparation patch to allow override the hardware reduced initialization on ACPI enabled platforms. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Link: http://lkml.kernel.org/r/20180220180506.65523-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12locking/atomic, asm-generic, x86: Add comments for atomic instrumentationDmitry Vyukov
The comments are factored out from the code changes to make them easier to read. Add them separately to explain some non-obvious aspects. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/cc595efc644bb905407012d82d3eb8bac3368e7a.1517246437.git.dvyukov@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12locking/atomic/x86: Switch atomic.h to use atomic-instrumented.hDmitry Vyukov
Add arch_ prefix to all atomic operations and include <asm-generic/atomic-instrumented.h>. This will allow to add KASAN instrumentation to all atomic ops. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/54f0eb64260b84199e538652e079a89b5423ad41.1517246437.git.dvyukov@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leafKirill A. Shutemov
MKTME_KEY_PROG allows to manipulate MKTME keys in the CPU. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/pconfig: Detect PCONFIG targetsKirill A. Shutemov
Intel PCONFIG targets are enumerated via new CPUID leaf 0x1b. This patch detects all supported targets of PCONFIG and implements helper to check if the target is supported. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/tme: Detect if TME and MKTME is activated by BIOSKirill A. Shutemov
IA32_TME_ACTIVATE MSR (0x982) can be used to check if BIOS has enabled TME and MKTME. It includes which encryption policy/algorithm is selected for TME or available for MKTME. For MKTME, the MSR also enumerates how many KeyIDs are available. We would need to exclude KeyID bits from physical address bits. detect_tme() would adjust cpuinfo_x86::x86_phys_bits accordingly. We have to do this even if we are not going to use KeyID bits ourself. VM guests still have to know that these bits are not usable for physical address. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12Merge branch 'x86/pti' into x86/mm, to pick up dependenciesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/cpufeatures: Add Intel PCONFIG cpufeatureKirill A. Shutemov
CPUID.0x7.0x0:EDX[18] indicates whether Intel CPU support PCONFIG instruction. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/cpufeatures: Add Intel Total Memory Encryption cpufeatureKirill A. Shutemov
CPUID.0x7.0x0:ECX[13] indicates whether CPU supports Intel Total Memory Encryption. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kai Huang <kai.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180305162610.37510-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4GKirill A. Shutemov
This patch addresses a shortcoming in current boot process on machines that supports 5-level paging. If a bootloader enables 64-bit mode with 4-level paging, we might need to switch over to 5-level paging. The switching requires the disabling paging. It works fine if kernel itself is loaded below 4G. But if the bootloader put the kernel above 4G (not sure if anybody does this), we would lose control as soon as paging is disabled, because the code becomes unreachable to the CPU. This patch implements a trampoline in lower memory to handle this situation. We only need the memory for a very short time, until the main kernel image sets up own page tables. We go through the trampoline even if we don't have to: if we're already in 5-level paging mode or if we don't need to switch to it. This way the trampoline gets tested on every boot. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180312100246.89175-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/boot/compressed/64: Use page table in trampoline memoryKirill A. Shutemov
If a bootloader enables 64-bit mode with 4-level paging, we might need to switch over to 5-level paging. The switching requires the disabling paging. It works fine if kernel itself is loaded below 4G. But if the bootloader put the kernel above 4G (i.e. in kexec() case), we would lose control as soon as paging is disabled, because the code becomes unreachable to the CPU. To handle the situation, we need a trampoline in lower memory that would take care of switching on 5-level paging. Apart from the trampoline code itself we also need a place to store top-level page table in lower memory as we don't have a way to load 64-bit values into CR3 in 32-bit mode. We only really need 8 bytes there as we only use the very first entry of the page table. But we allocate a whole page anyway. This patch switches 32-bit code to use page table in trampoline memory. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180312100246.89175-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/boot/compressed/64: Use stack from trampoline memoryKirill A. Shutemov
As the first step on using trampoline memory, let's make 32-bit code use stack there. Separate stack is required to return back from trampoline and we cannot user stack from 64-bit mode as it may be above 4G. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180312100246.89175-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/boot/compressed/64: Make sure we have a 32-bit code segmentKirill A. Shutemov
When kernel starts in 64-bit mode we inherit the GDT from the bootloader. It may cause a problem if the GDT doesn't have a 32-bit code segment where we expect it to be. Load our own GDT with known segments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180312100246.89175-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3Sai Praneeth
Use helper function efi_switch_mm() to switch to/from efi_mm when invoking any UEFI runtime services. Likewise, we need to switch back to previous mm (mm context stolen by efi_mm) after the above calls return successfully. We can use efi_switch_mm() helper function only with x86_64 kernel and "efi=old_map" disabled because, x86_32 and efi=old_map do not use efi_pgd, rather they use swapper_pg_dir. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> [ardb: add #include of sched/task.h for task_lock/_unlock] Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-efi@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/efi: Replace efi_pgd with efi_mm.pgdSai Praneeth
Since the previous patch added support for efi_mm, let's handle efi_pgd through efi_mm and remove global variable efi_pgd. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-efi@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/mm: Do not use paravirtualized calls in native_set_p4d()Kirill A. Shutemov
In 4-level paging mode, native_set_p4d() updates the entry in the top-level page table. With PTI, update to the top-level kernel page table requires update to the userspace copy of the table as well, using pti_set_user_pgd(). native_set_p4d() uses p4d_val() and pgd_val() to convert types between p4d_t and pgd_t. p4d_val() and pgd_val() are paravirtualized and we must not use them in native helpers, as they crash the boot in paravirtualized environments. Replace p4d_val() and pgd_val() with native_p4d_val() and native_pgd_val() in native_set_p4d(). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 91f606a8fa68 ("x86/mm: Replace compile-time checks for 5-level paging with runtime-time checks") Link: http://lkml.kernel.org/r/20180305081641.4290-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12efi: Use string literals for efi_char16_t variable initializersArd Biesheuvel
Now that we unambiguously build the entire kernel with -fshort-wchar, it is no longer necessary to open code efi_char16_t[] initializers as arrays of characters, and we can move to the L"xxx" notation instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180312084500.10764-6-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12efi: Use efi_mm in x86 as well as ARMSai Praneeth
Presently, only ARM uses mm_struct to manage EFI page tables and EFI runtime region mappings. As this is the preferred approach, let's make this data structure common across architectures. Specially, for x86, using this data structure improves code maintainability and readability. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> [ardb: don't #include the world to get a declaration of struct mm_struct] Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180312084500.10764-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12Merge branch 'x86/mm' into efi/coreIngo Molnar
This commit in x86/mm changed EFI code: 116fef640859: x86/mm/dump_pagetables: Add the EFI pagetable to the debugfs 'page_tables' directory So merge in that commit plus its dependencies, before continuing with EFI work. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12kdump, vmcoreinfo: Export pgtable_l5_enabled valueBaoquan He
User-space utilities examining crash-kernels need to know if the crashed kernel was in 5-level paging mode or not. So write 'pgtable_l5_enabled' to vmcoreinfo, which covers these three cases: pgtable_l5_enabled == 0 when: - Compiled with !CONFIG_X86_5LEVEL - Compiled with CONFIG_X86_5LEVEL=y while CPU has no 'la57' flag pgtable_l5_enabled != 0 when: - Compiled with CONFIG_X86_5LEVEL=y and CPU has 'la57' flag Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: douly.fnst@cn.fujitsu.com Cc: dyoung@redhat.com Cc: ebiederm@xmission.com Cc: kirill.shutemov@linux.intel.com Cc: vgoyal@redhat.com Link: http://lkml.kernel.org/r/20180302051801.19594-1-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/boot/compressed/64: Prepare new top-level page table for trampolineKirill A. Shutemov
If trampoline code would need to switch between 4- and 5-level paging modes, we have to use a page table in trampoline memory. Having it in trampoline memory guarantees that it's below 4G and we can point CR3 to it from 32-bit trampoline code. We only use the page table if the desired paging mode doesn't match the mode we are in. Otherwise the page table is unused and trampoline code wouldn't touch CR3. For 4- to 5-level paging transition, we set up current (4-level paging) CR3 as the first and the only entry in a new top-level page table. For 5- to 4-level paging transition, copy page table pointed by first entry in the current top-level page table as our new top-level page table. If the page table is used by trampoline we would need to copy it to new page table outside trampoline and update CR3 before restoring trampoline memory. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180226180451.86788-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/boot/compressed/64: Set up trampoline memoryKirill A. Shutemov
This patch clears up trampoline memory and copies trampoline code in place. It's not yet used though. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180226180451.86788-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12x86/boot/compressed/64: Save and restore trampoline memoryKirill A. Shutemov
The memory area we found for trampoline shouldn't contain anything useful. But let's preserve the data anyway. Just to be on safe side. paging_prepare() would save the data into a buffer. cleanup_trampoline() would restore it back once we are done with the trampoline. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180226180451.86788-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>