summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2018-10-04x66/vdso: Add CLOCK_TAI supportThomas Gleixner
With the storage array in place it's now trivial to support CLOCK_TAI in the vdso. Extend the base time storage array and add the update code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Matt Rickard <matt@softrans.com.au> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.823878601@linutronix.de
2018-10-04x86/vdso: Introduce and use vgtod_tsThomas Gleixner
It's desired to support more clocks in the VDSO, e.g. CLOCK_TAI. This results either in indirect calls due to the larger switch case, which then requires retpolines or when the compiler is forced to avoid jump tables it results in even more conditionals. To avoid both variants which are bad for performance the high resolution functions and the coarse grained functions will be collapsed into one for each. That requires to store the clock specific base time in an array. Introcude struct vgtod_ts for storage and convert the data store, the update function and the individual clock functions over to use it. The new storage does not longer use gtod_long_t for seconds depending on 32 or 64 bit compile because this needs to be the full 64bit value even for 32bit when a Y2038 function is added. No point in keeping the distinction alive in the internal representation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.324679401@linutronix.de
2018-10-04x86/vdso: Use unsigned int consistently for vsyscall_gtod_data:: SeqThomas Gleixner
The sequence count in vgtod_data is unsigned int, but the call sites use unsigned long, which is a pointless exercise. Fix the call sites and replace 'unsigned' with unsinged 'int' while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.236250416@linutronix.de
2018-10-04x86/paravirt: Work around GCC inlining bugs when compiling paravirt opsNadav Amit
As described in: 77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it isn't, but that's the best we can get.) In this patch we wrap the paravirt call section tricks in a macro, to hide it from GCC. The effect of the patch is a more aggressive inlining, which also causes a size increase of kernel. text data bss dec hex filename 18147336 10226688 2957312 31331336 1de1408 ./vmlinux before 18162555 10226288 2957312 31346155 1de4deb ./vmlinux after (+14819) The number of static text symbols (non-inlined functions) goes down: Before: 40053 After: 39942 (-111) [ mingo: Rewrote the changelog. ] Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20181003213100.189959-8-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-04x86/bug: Macrofy the BUG table section handling, to work around GCC inlining ↵Nadav Amit
bugs As described in: 77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it isn't, but that's the best we can get.) This patch increases the kernel size: text data bss dec hex filename 18146889 10225380 2957312 31329581 1de0d2d ./vmlinux before 18147336 10226688 2957312 31331336 1de1408 ./vmlinux after (+1755) But enables more aggressive inlining (and probably better branch decisions). The number of static text symbols in vmlinux is much lower: Before: 40218 After: 40053 (-165) The assembly code gets harder to read due to the extra macro layer. [ mingo: Rewrote the changelog. ] Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181003213100.189959-7-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-04x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugsNadav Amit
As described in: 77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block - i.e. to macrify the affected block. As a result GCC considers the inline assembly block as a single instruction. This patch handles the LOCK prefix, allowing more aggresive inlining: text data bss dec hex filename 18140140 10225284 2957312 31322736 1ddf270 ./vmlinux before 18146889 10225380 2957312 31329581 1de0d2d ./vmlinux after (+6845) This is the reduction in non-inlined functions: Before: 40286 After: 40218 (-68) Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181003213100.189959-6-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-04x86/refcount: Work around GCC inlining bugNadav Amit
As described in: 77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it isn't, but that's the best we can get.) This patch allows GCC to inline simple functions such as __get_seccomp_filter(). To no-one's surprise the result is that GCC performs more aggressive (read: correct) inlining decisions in these senarios, which reduces the kernel size and presumably also speeds it up: text data bss dec hex filename 18140970 10225412 2957312 31323694 1ddf62e ./vmlinux before 18140140 10225284 2957312 31322736 1ddf270 ./vmlinux after (-958) 16 fewer static text symbols: Before: 40302 After: 40286 (-16) these got inlined instead. Functions such as kref_get(), free_user(), fuse_file_get() now get inlined. Hurray! [ mingo: Rewrote the changelog. ] Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181003213100.189959-5-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-04Merge branch 'linus' into x86/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-03signal: Distinguish between kernel_siginfo and siginfoEric W. Biederman
Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-10-03signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZEEric W. Biederman
Rework the defintion of struct siginfo so that the array padding struct siginfo to SI_MAX_SIZE can be placed in a union along side of the rest of the struct siginfo members. The result is that we no longer need the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-10-03x86, hibernate: Rename temp_level4_pgt to temp_pgtZhimin Gu
As 32bit system is not using 4-level page, rename it to temp_pgt so that it can be reused for both 32bit and 64bit hibernation. No functional change. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-03x86, hibernate: Extract the common code of 64/32 bit systemZhimin Gu
Reduce the hibernation code duplication between x86-32 and x86-64 by extracting the common code into hibernate.c. Currently only pfn_is_nosave() is the activated common function in hibernate.c No functional change. Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-03x86-32/asm/power: Create stack frames in hibernate_asm_32.SZhimin Gu
swsusp_arch_suspend() is callable non-leaf function which doesn't honor CONFIG_FRAME_POINTER, which can result in bad stack traces. Also it's not annotated as ELF callable function which can confuse tooling. Create a stack frame for it when CONFIG_FRAME_POINTER is enabled and give it proper ELF function annotation. Also in this patch introduces the restore_registers() symbol and gives it ELF function annotation, thus to prepare for later register restore. Analogous changes were made for 64bit before in commit ef0f3ed5a4ac (x86/asm/power: Create stack frames in hibernate_asm_64.S) and commit 4ce827b4cc58 (x86/power/64: Fix hibernation return address corruption). Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-02x86/platform/uv: Provide is_early_uv_system()Mike Travis
Introduce is_early_uv_system() which uses efi.uv_systab to decide early in the boot process whether the kernel runs on a UV system. This is needed to skip other early setup/init code that might break the UV platform if done too early such as before necessary ACPI tables parsing takes place. Suggested-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Link: https://lkml.kernel.org/r/20181002180144.801700401@stormcage.americas.sgi.com
2018-10-02x86/cpu: Sanitize FAM6_ATOM namingPeter Zijlstra
Going primarily by: https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors with additional information gleaned from other related pages; notably: - Bonnell shrink was called Saltwell - Moorefield is the Merriefield refresh which makes it Airmont The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE for i in `git grep -l FAM6_ATOM` ; do sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \ -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \ -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \ -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \ -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \ -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \ -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \ -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \ -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \ -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \ -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dave.hansen@linux.intel.com Cc: len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02perf/x86/intel: Add a separate Arch Perfmon v4 PMI handlerAndi Kleen
Implements counter freezing for Arch Perfmon v4 (Skylake and newer). This allows to speed up the PMI handler by avoiding unnecessary MSR writes and make it more accurate. The Arch Perfmon v4 PMI handler is substantially different than the older PMI handler. Differences to the old handler: - It relies on counter freezing, which eliminates several MSR writes from the PMI handler and lowers the overhead significantly. It makes the PMI handler more accurate, as all counters get frozen atomically as soon as any counter overflows. So there is much less counting of the PMI handler itself. With the freezing we don't need to disable or enable counters or PEBS. Only BTS which does not support auto-freezing still needs to be explicitly managed. - The PMU acking is done at the end, not the beginning. This makes it possible to avoid manual enabling/disabling of the PMU, instead we just rely on the freezing/acking. - The APIC is acked before reenabling the PMU, which avoids problems with LBRs occasionally not getting unfreezed on Skylake. - Looping is only needed to workaround a corner case which several PMIs are very close to each other. For common cases, the counters are freezed during PMI handler. It doesn't need to do re-check. This patch: - Adds code to enable v4 counter freezing - Fork <=v3 and >=v4 PMI handlers into separate functions. - Add kernel parameter to disable counter freezing. It took some time to debug counter freezing, so in case there are new problems we added an option to turn it off. Would not expect this to be used until there are new bugs. - Only for big core. The patch for small core will be posted later separately. Performance: When profiling a kernel build on Kabylake with different perf options, measuring the length of all NMI handlers using the nmi handler trace point: V3 is without counter freezing. V4 is with counter freezing. The value is the average cost of the PMI handler. (lower is better) perf options ` V3(ns) V4(ns) delta -c 100000 1088 894 -18% -g -c 100000 1862 1646 -12% --call-graph lbr -c 100000 3649 3367 -8% --c.g. dwarf -c 100000 2248 1982 -12% Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02Merge branch 'x86/cache' into perf/core, to resolve conflictsIngo Molnar
Avoid conflict with upcoming perf/core patches, merge in the RDT perf work. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf eventsNatarajan, Janakarajan
In Family 17h, some L3 Cache Performance events require the ThreadMask and SliceMask to be set. For other events, these fields do not affect the count either way. Set ThreadMask and SliceMask to 0xFF and 0xF respectively. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H . Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-01Merge tag 'v4.19-rc6' into for-4.20/blockJens Axboe
Merge -rc6 in, for two reasons: 1) Resolve a trivial conflict in the blk-mq-tag.c documentation 2) A few important regression fixes went into upstream directly, so they aren't in the 4.20 branch. Signed-off-by: Jens Axboe <axboe@kernel.dk> * tag 'v4.19-rc6': (780 commits) Linux 4.19-rc6 MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c cpufreq: qcom-kryo: Fix section annotations perf/core: Add sanity check to deal with pinned event failure xen/blkfront: correct purging of persistent grants Revert "xen/blkfront: When purging persistent grants, keep them in the buffer" selftests/powerpc: Fix Makefiles for headers_install change blk-mq: I/O and timer unplugs are inverted in blktrace dax: Fix deadlock in dax_lock_mapping_entry() x86/boot: Fix kexec booting failure in the SEV bit detection code bcache: add separate workqueue for journal_write to avoid deadlock drm/amd/display: Fix Edid emulation for linux drm/amd/display: Fix Vega10 lightup on S3 resume drm/amdgpu: Fix vce work queue was not cancelled when suspend Revert "drm/panel: Add device_link from panel device to DRM device" xen/blkfront: When purging persistent grants, keep them in the buffer clocksource/drivers/timer-atmel-pit: Properly handle error cases block: fix deadline elevator drain for zoned block devices ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-01x86/asm: Use CC_SET()/CC_OUT() in __cmpxchg_double()Uros Bizjak
Replace open-coded use of the SETcc instruction with CC_SET()/CC_OUT() in __cmpxchg_double(). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/CAFULd4YdvwwhXWHqqPsGk5+TLG71ozgSscTZNsqmrm+Jzg941w@mail.gmail.com
2018-09-28perf/x86: Add helper to obtain performance counter indexReinette Chatre
perf_event_read_local() is the safest way to obtain measurements associated with performance events. In some cases the overhead introduced by perf_event_read_local() affects the measurements and the use of rdpmcl() is needed. rdpmcl() requires the index of the performance counter used so a helper is introduced to determine the index used by a provided performance event. The index used by a performance event may change when interrupts are enabled. A check is added to ensure that the index is only accessed with interrupts disabled. Even with this check the use of this counter needs to be done with care to ensure it is queried and used within the same disabled interrupts section. This change introduces a new checkpatch warning: CHECK: extern prototypes should be avoided in .h files +extern int x86_perf_rdpmc_index(struct perf_event *event); This warning was discussed and designated as a false positive in http://lkml.kernel.org/r/20180919091759.GZ24124@hirez.programming.kicks-ass.net Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/b277ffa78a51254f5414f7b1bc1923826874566e.1537377064.git.reinette.chatre@intel.com
2018-09-27x86/kvm: Add Hygon Dhyana support to KVMPu Wen
The Hygon Dhyana CPU has the SVM feature as AMD family 17h does. So enable the KVM infrastructure support to it. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/654dd12876149fba9561698eaf9fc15d030301f8.1537533369.git.puwen@hygon.cn
2018-09-27x86/mce: Add Hygon Dhyana support to the MCA infrastructurePu Wen
The machine check architecture for Hygon Dhyana CPU is similar to the AMD family 17h one. Add vendor checking for Hygon Dhyana to share the code path of AMD family 17h. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: tony.luck@intel.com Cc: thomas.lendacky@amd.com Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/87d8a4f16bdea0bfe0c0cf2e4a8d2c2a99b1055c.1537533369.git.puwen@hygon.cn
2018-09-27x86/amd_nb: Check vendor in AMD-only functionsPu Wen
Exit early in functions which are meant to run on AMD only but which get run on different vendor (VMs, etc). [ bp: rewrite commit message. ] Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhelgaas@google.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: helgaas@kernel.org Link: https://lkml.kernel.org/r/487d8078708baedaf63eb00a82251e228b58f1c2.1537885177.git.puwen@hygon.cn
2018-09-27x86/cpu: Get cache info and setup cache cpumap for Hygon DhyanaPu Wen
The Hygon Dhyana CPU has a topology extensions bit in CPUID. With this bit, the kernel can get the cache information. So add support in cpuid4_cache_lookup_regs() to get the correct cache size. The Hygon Dhyana CPU also discovers num_cache_leaves via CPUID leaf 0x8000001d, so add support to it in find_num_cache_leaves(). Also add cacheinfo_hygon_init_llc_id() and init_hygon_cacheinfo() functions to initialize Dhyana cache info. Setup cache cpumap in the same way as AMD does. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: bp@alien8.de Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/2a686b2ac0e2f5a1f2f5f101124d9dd44f949731.1537533369.git.puwen@hygon.cn
2018-09-27x86/jump_table: Use relative referencesArd Biesheuvel
Similar to the arm64 case, 64-bit x86 can benefit from using relative references rather than absolute ones when emitting struct jump_entry instances. Not only does this reduce the memory footprint of the entries themselves by 33%, it also removes the need for carrying relocation metadata on relocatable builds (i.e., for KASLR) which saves a fair chunk of .init space as well (although the savings are not as dramatic as on arm64) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jessica Yu <jeyu@kernel.org> Link: https://lkml.kernel.org/r/20180919065144.25010-7-ard.biesheuvel@linaro.org
2018-09-27x86: Add support for 64-bit place relative relocationsArd Biesheuvel
Add support for R_X86_64_PC64 relocations, which operate on 64-bit quantities holding a relative symbol reference. Also remove the definition of R_X86_64_NUM: given that it is currently unused, it is unclear what the new value should be. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jessica Yu <jeyu@kernel.org> Link: https://lkml.kernel.org/r/20180919065144.25010-5-ard.biesheuvel@linaro.org
2018-09-27Merge tag 'efi-next' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core Pull EFI updates for v4.20 from Ard Biesheuvel: - Add support for enlisting the help of the EFI firmware to create memory reservations that persist across kexec. - Add page fault handling to the runtime services support code on x86 so we can gracefully recover from buggy EFI firmware. - Fix command line handling on x86 for the boot path that omits the stub's PE/COFF entry point. - Other assorted fixes.
2018-09-27x86/cpu: Create Hygon Dhyana architecture support filePu Wen
Add x86 architecture support for a new processor: Hygon Dhyana Family 18h. Carve out initialization code needed by Dhyana into a separate compilation unit. To identify Hygon Dhyana CPU, add a new vendor type X86_VENDOR_HYGON. Since Dhyana uses AMD functionality to a large degree, select CPU_SUP_AMD which provides that functionality. [ bp: drop explicit license statement as it has an SPDX tag already. ] Signed-off-by: Pu Wen <puwen@hygon.cn> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/1a882065223bacbde5726f3beaa70cebd8dcd814.1537533369.git.puwen@hygon.cn
2018-09-27x86/mce: Add macros for the corrected error count bit fieldQiuxu Zhuo
The bit field [52:38] of MCi_STATUS contains the corrected error count. Add {*_SHIFT|*_MASK|*_CEC(c)} macros for it. [ bp: use GENMASK_ULL. ] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20180925000343.GB5998@agluck-desk
2018-09-27x86/mce: Use BIT_ULL(x) for bit mask definitionsQiuxu Zhuo
Current coding style is to use the BIT_ULL() macro. [ bp: Align the MCG_STATUS defines vertically too. ] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20180925000127.GA5998@agluck-desk
2018-09-26xen: don't include <xen/xen.h> from <asm/io.h> and <asm/dma-mapping.h>Christoph Hellwig
Nothing Xen specific in these headers, which get included from a lot of code in the kernel. So prune the includes and move them to the Xen-specific files that actually use them instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26block: remove ARCH_BIOVEC_PHYS_MERGEABLEChristoph Hellwig
Take the Xen check into the core code instead of delegating it to the architectures. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26xen: provide a prototype for xen_biovec_phys_mergeable in xen.hChristoph Hellwig
Having multiple externs in arch headers is not a good way to provide a common interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26efi/x86: Handle page faults occurring while running EFI runtime servicesSai Praneeth
Memory accesses performed by UEFI runtime services should be limited to: - reading/executing from EFI_RUNTIME_SERVICES_CODE memory regions - reading/writing from/to EFI_RUNTIME_SERVICES_DATA memory regions - reading/writing by-ref arguments - reading/writing from/to the stack. Accesses outside these regions may cause the kernel to hang because the memory region requested by the firmware isn't mapped in efi_pgd, which causes a page fault in ring 0 and the kernel fails to handle it, leading to die(). To save kernel from hanging, add an EFI specific page fault handler which recovers from such faults by 1. If the efi runtime service is efi_reset_system(), reboot the machine through BIOS. 2. If the efi runtime service is _not_ efi_reset_system(), then freeze efi_rts_wq and schedule a new process. The EFI page fault handler offers us two advantages: 1. Avoid potential hangs caused by buggy firmware. 2. Shout loud that the firmware is buggy and hence is not a kernel bug. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Suggested-by: Matt Fleming <matt@codeblueprint.co.uk> Based-on-code-from: Ricardo Neri <ricardo.neri@intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [ardb: clarify commit log] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2018-09-25iommu/vt-d: Relocate struct/function declarations to its header filesSohil Mehta
To reuse the static functions and the struct declarations, move them to corresponding header files and export the needed functions. Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-09-24block: simplify BIOVEC_PHYS_MERGEABLEChristoph Hellwig
Turn the macro into an inline, move it to blk.h and simplify the arch hooks a bit. Also rename the function to biovec_phys_mergeable as there is no need to shout. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-23x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variantZhenzhong Duan
..so that they match their asm counterpart. Add the missing ANNOTATE_NOSPEC_ALTERNATIVE in CALL_NOSPEC, while at it. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wang YanQing <udknight@gmail.com> Cc: dhaval.giani@oracle.com Cc: srinivas.eeda@oracle.com Link: http://lkml.kernel.org/r/c3975665-173e-4d70-8dee-06c926ac26ee@default
2018-09-23Merge branch 'x86-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Thomas writes: "A set of fixes for x86: - Resolve the kvmclock regression on AMD systems with memory encryption enabled. The rework of the kvmclock memory allocation during early boot results in encrypted storage, which is not shareable with the hypervisor. Create a new section for this data which is mapped unencrypted and take care that the later allocations for shared kvmclock memory is unencrypted as well. - Fix the build regression in the paravirt code introduced by the recent spectre v2 updates. - Ensure that the initial static page tables cover the fixmap space correctly so early console always works. This worked so far by chance, but recent modifications to the fixmap layout can - depending on kernel configuration - move the relevant entries to a different place which is not covered by the initial static page tables. - Address the regressions and issues which got introduced with the recent extensions to the Intel Recource Director Technology code. - Update maintainer entries to document reality" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Expand static page table for fixmap space MAINTAINERS: Add X86 MM entry x86/intel_rdt: Add Reinette as co-maintainer for RDT MAINTAINERS: Add Borislav to the x86 maintainers x86/paravirt: Fix some warning messages x86/intel_rdt: Fix incorrect loop end condition x86/intel_rdt: Fix exclusive mode handling of MBA resource x86/intel_rdt: Fix incorrect loop end condition x86/intel_rdt: Do not allow pseudo-locking of MBA resource x86/intel_rdt: Fix unchecked MSR access x86/intel_rdt: Fix invalid mode warning when multiple resources are managed x86/intel_rdt: Global closid helper to support future fixes x86/intel_rdt: Fix size reporting of MBA resource x86/intel_rdt: Fix data type in parsing callbacks x86/kvm: Use __bss_decrypted attribute in shared variables x86/mm: Add .bss..decrypted section to hold shared variables
2018-09-20x86/mm: Expand static page table for fixmap spaceFeng Tang
We met a kernel panic when enabling earlycon, which is due to the fixmap address of earlycon is not statically setup. Currently the static fixmap setup in head_64.S only covers 2M virtual address space, while it actually could be in 4M space with different kernel configurations, e.g. when VSYSCALL emulation is disabled. So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build time check to ensure that the fixmap is covered by the initial static page tables. Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: kernel test robot <rong.a.chen@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts) Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
2018-09-20KVM: x86: Control guest reads of MSR_PLATFORM_INFODrew Schmitt
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access to reads of MSR_PLATFORM_INFO. Disabling access to reads of this MSR gives userspace the control to "expose" this platform-dependent information to guests in a clear way. As it exists today, guests that read this MSR would get unpopulated information if userspace hadn't already set it (and prior to this patch series, only the CPUID faulting information could have been populated). This existing interface could be confusing if guests don't handle the potential for incorrect/incomplete information gracefully (e.g. zero reported for base frequency). Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICvLiran Alon
In case L1 do not intercept L2 HLT or enter L2 in HLT activity-state, it is possible for a vCPU to be blocked while it is in guest-mode. According to Intel SDM 26.6.5 Interrupt-Window Exiting and Virtual-Interrupt Delivery: "These events wake the logical processor if it just entered the HLT state because of a VM entry". Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU should be waken from the HLT state and injected with the interrupt. In addition, if while the vCPU is blocked (while it is in guest-mode), it receives a nested posted-interrupt, then the vCPU should also be waken and injected with the posted interrupt. To handle these cases, this patch enhances kvm_vcpu_has_events() to also check if there is a pending interrupt in L2 virtual APICv provided by L1. That is, it evaluates if there is a pending virtual interrupt for L2 by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1 Evaluation of Pending Interrupts. Note that this also handles the case of nested posted-interrupt by the fact RVI is updated in vmx_complete_nested_posted_interrupt() which is called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_running() -> vmx_check_nested_events() -> vmx_complete_nested_posted_interrupt(). Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20x86/hyper-v: rename ipi_arg_{ex,non_ex} structuresVitaly Kuznetsov
These structures are going to be used from KVM code so let's make their names reflect their Hyper-V origin. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: VMX: use preemption timer to force immediate VMExitSean Christopherson
A VMX preemption timer value of '0' is guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. Use the preemption timer (if it's supported) to trigger immediate VMExit in place of the current method of sending a self-IPI. This ensures that pending VMExit injection to L1 occurs prior to executing any instructions in the guest (regardless of nesting level). When deferring VMExit injection, KVM generates an immediate VMExit from the (possibly nested) guest by sending itself an IPI. Because hardware interrupts are blocked prior to VMEnter and are unblocked (in hardware) after VMEnter, this results in taking a VMExit(INTR) before any guest instruction is executed. But, as this approach relies on the IPI being received before VMEnter executes, it only works as intended when KVM is running as L0. Because there are no architectural guarantees regarding when IPIs are delivered, when running nested the INTR may "arrive" long after L2 is running e.g. L0 KVM doesn't force an immediate switch to L1 to deliver an INTR. For the most part, this unintended delay is not an issue since the events being injected to L1 also do not have architectural guarantees regarding their timing. The notable exception is the VMX preemption timer[1], which is architecturally guaranteed to cause a VMExit prior to executing any instructions in the guest if the timer value is '0' at VMEnter. Specifically, the delay in injecting the VMExit causes the preemption timer KVM unit test to fail when run in a nested guest. Note: this approach is viable even on CPUs with a broken preemption timer, as broken in this context only means the timer counts at the wrong rate. There are no known errata affecting timer value of '0'. [1] I/O SMIs also have guarantees on when they arrive, but I have no idea if/how those are emulated in KVM. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Use a hook for SVM instead of leaving the default in x86.c - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20x86/kvm/lapic: always disable MMIO interface in x2APIC modeVitaly Kuznetsov
When VMX is used with flexpriority disabled (because of no support or if disabled with module parameter) MMIO interface to lAPIC is still available in x2APIC mode while it shouldn't be (kvm-unit-tests): PASS: apic_disable: Local apic enabled in x2APIC mode PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set FAIL: apic_disable: *0xfee00030: 50014 The issue appears because we basically do nothing while switching to x2APIC mode when APIC access page is not used. apic_mmio_{read,write} only check if lAPIC is disabled before proceeding to actual write. When APIC access is virtualized we correctly manipulate with VMX controls in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes in x2APIC mode so there's no issue. Disabling MMIO interface seems to be easy. The question is: what do we do with these reads and writes? If we add apic_x2apic_mode() check to apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will go to userspace. When lAPIC is in kernel, Qemu uses this interface to inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This somehow works with disabled lAPIC but when we're in xAPIC mode we will get a real injected MSI from every write to lAPIC. Not good. The simplest solution seems to be to just ignore writes to the region and return ~0 for all reads when we're in x2APIC mode. This is what this patch does. However, this approach is inconsistent with what currently happens when flexpriority is enabled: we allocate APIC access page and create KVM memory region so in x2APIC modes all reads and writes go to this pre-allocated page which is, btw, the same for all vCPUs. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-19signal/x86: Move mpx siginfo generation into do_boundsEric W. Biederman
This separates the logic of generating the signal from the logic of gathering the information about the bounds violation. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-19signal/x86: In trace_mpx_bounds_register_exception add __user annotationsEric W. Biederman
The value passed in to addr_referenced is of type void __user *, so update the addr_referenced parameter in trace_mpx_bounds_register_exception to match. Also update the addr_referenced paramater in TP_STRUCT__entry as it again holdes the same value. I don't know why this was missed earlier but sparse was complaining when testing test branch so fix this now. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-19signal: Simplify tracehook_report_syscall_exitEric W. Biederman
Replace user_single_step_siginfo with user_single_step_report that allocates siginfo structure on the stack and sends it. This allows tracehook_report_syscall_exit to become a simple if statement that calls user_single_step_report or ptrace_report_syscall depending on the value of step. Update the default helper function now called user_single_step_report to explicitly set si_code to SI_USER and to set si_uid and si_pid to 0. The default helper has always been doing this (using memset) but it was far from obvious. The powerpc helper can now just call force_sig_fault. The x86 helper can now just call send_sigtrap. Unfortunately the default implementation of user_single_step_report can not use force_sig_fault as it does not use a SIGTRAP si_code. So it has to carefully setup the siginfo and use use force_sig_info. The net result is code that is easier to understand and simpler to maintain. Ref: 85ec7fd9f8e5 ("ptrace: introduce user_single_step_siginfo() helper") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-15x86/mm: Add .bss..decrypted section to hold shared variablesBrijesh Singh
kvmclock defines few static variables which are shared with the hypervisor during the kvmclock initialization. When SEV is active, memory is encrypted with a guest-specific key, and if the guest OS wants to share the memory region with the hypervisor then it must clear the C-bit before sharing it. Currently, we use kernel_physical_mapping_init() to split large pages before clearing the C-bit on shared pages. But it fails when called from the kvmclock initialization (mainly because the memblock allocator is not ready that early during boot). Add a __bss_decrypted section attribute which can be used when defining such shared variable. The so-defined variables will be placed in the .bss..decrypted section. This section will be mapped with C=0 early during boot. The .bss..decrypted section has a big chunk of memory that may be unused when memory encryption is not active, free it when memory encryption is not active. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Radim Krčmář<rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com
2018-09-14Revert "x86/mm/legacy: Populate the user page-table with user pgd's"Joerg Roedel
This reverts commit 1f40a46cf47c12d93a5ad9dccd82bd36ff8f956a. It turned out that this patch is not sufficient to enable PTI on 32 bit systems with legacy 2-level page-tables. In this paging mode the huge-page PTEs are in the top-level page-table directory, where also the mirroring to the user-space page-table happens. So every huge PTE exits twice, in the kernel and in the user page-table. That means that accessed/dirty bits need to be fetched from two PTEs in this mode to be safe, but this is not trivial to implement because it needs changes to generic code just for the sake of enabling PTI with 32-bit legacy paging. As all systems that need PTI should support PAE anyway, remove support for PTI when 32-bit legacy paging is used. Fixes: 7757d607c6b3 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org