path: root/arch/x86
2020-01-17  x86/speculation/swapgs: Exclude Zhaoxin CPUs from SWAPGS vulnerability (Tony W Wang-oc)

New Zhaoxin family 7 CPUs are not affected by the SWAPGS vulnerability. So mark these CPUs in the CPU vulnerability whitelist accordingly.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579227872-26972-3-git-send-email-TonyWWang-oc@zhaoxin.com

2020-01-17  x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2 (Tony W Wang-oc)

New Zhaoxin family 7 CPUs are not affected by SPECTRE_V2. So define a separate cpu_vuln_whitelist bit NO_SPECTRE_V2 and add these CPUs to the CPU vulnerability whitelist.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579227872-26972-2-git-send-email-TonyWWang-oc@zhaoxin.com

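For reference, a minimal sketch of what these two whitelist changes amount to in arch/x86/kernel/cpu/common.c. The VULNWL() macro follows the kernel's existing pattern; the exact entry shown here is illustrative rather than copied from the patches:

#define VULNWL(_vendor, _family, _model, _whitelist)	\
	{ X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist }

static const struct x86_cpu_id cpu_vuln_whitelist[] __initconst = {
	/* Zhaoxin family 7: not affected by Spectre v2 or SWAPGS */
	VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS),
	{}
};
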
2020-01-17  x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR (Pawan Gupta)

/proc/cpuinfo currently reports the Hardware Lock Elision (HLE) feature as present on the boot CPU even if it was disabled during bootup. This is because the cpuinfo_x86->x86_capability HLE bit is not updated after the TSX state is changed via the new MSR IA32_TSX_CTRL.

Update the cached HLE bit as well, since it is expected to change after an update to the CPUID_CLEAR bit in MSR IA32_TSX_CTRL.

Fixes: 95c5824f75f3 ("x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default")
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/2529b99546294c893dfa1c89e2b3e46da3369a59.1578685425.git.pawan.kumar.gupta@linux.intel.com

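A minimal sketch of the idea, assuming a helper local to arch/x86/kernel/cpu/tsx.c (the function name here is hypothetical; the MSR and bit names are the ones the commit references):

/* Hypothetical helper: refresh the cached HLE bit after TSX is toggled
 * via MSR IA32_TSX_CTRL, so /proc/cpuinfo reflects the real state. */
static void tsx_update_cached_hle(struct cpuinfo_x86 *c)
{
	u64 tsx_ctrl;

	rdmsrl(MSR_IA32_TSX_CTRL, tsx_ctrl);
	if (tsx_ctrl & TSX_CTRL_CPUID_CLEAR)
		clear_cpu_cap(c, X86_FEATURE_HLE);
	else
		set_cpu_cap(c, X86_FEATURE_HLE);
}
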
2020-01-17  x86/hyper-v: Add "polling" bit to hv_synic_sint (Wei Liu)

That bit is documented in TLFS 5.0c as follows:

  Setting the polling bit will have the effect of unmasking an
  interrupt source, except that an actual interrupt is not generated.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20191222233404.1629-1-wei.liu@kernel.org

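The resulting register layout, sketched in the style of the hyperv-tlfs.h unions (field widths per the TLFS; treat this as illustrative rather than a verbatim copy of the patch):

union hv_synic_sint {
	u64 as_uint64;
	struct {
		u64 vector:8;
		u64 reserved1:8;
		u64 masked:1;
		u64 auto_eoi:1;
		u64 polling:1;	/* new: unmask the source, but generate no interrupt */
		u64 reserved2:45;
	} __packed;
};
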
2020-01-17  x86/apic/uv: Avoid unused variable warning (Arnd Bergmann)

When CONFIG_PROC_FS is disabled, the compiler warns about an unused variable:

  arch/x86/kernel/apic/x2apic_uv_x.c: In function 'uv_setup_proc_files':
  arch/x86/kernel/apic/x2apic_uv_x.c:1546:8: error: unused variable 'name' [-Werror=unused-variable]
    char *name = hubless ? "hubless" : "hubbed";

Simplify the code so this variable is no longer needed.

Fixes: 8785968bce1c ("x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191212140419.315264-1-arnd@arndb.de

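A sketch of the shape of such a simplification (the proc directory and show-function names are hypothetical; the point is folding the ternary into its single use so nothing is left unreferenced):

	/* Before: 'name' only ever feeds proc_create_single() */
	char *name = hubless ? "hubless" : "hubbed";
	proc_create_single(name, 0, proc_uv_dir, uv_proc_show);

	/* After: no variable left to warn about */
	proc_create_single(hubless ? "hubless" : "hubbed", 0,
			   proc_uv_dir, uv_proc_show);
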
2020-01-17  perf/x86/intel/uncore: Remove PCIe3 unit for SNR (Kan Liang)

The PCIe Root Port driver for CPU Complex PCIe Root Ports is not loaded on SNR. The device ID for the SNR PCIe3 unit is used by both the uncore driver and the PCIe Root Port driver. If the uncore driver is loaded, the PCIe Root Port driver will never be probed.

Remove the PCIe3 unit for SNR for now; support for it will be added later separately.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200116200210.18937-2-kan.liang@linux.intel.com

2020-01-17  perf/x86/intel/uncore: Fix missing marker for snr_uncore_imc_freerunning_events (Kan Liang)

An Oops during boot has been observed on some SNR machines. It turns out this is because the snr_uncore_imc_freerunning_events[] array was missing an end marker.

Fixes: ee49532b38dd ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge")
Reported-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Like Xu <like.xu@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200116200210.18937-1-kan.liang@linux.intel.com

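The uncore code iterates these tables until it hits an all-zero entry, so the fix is just the terminator. Sketched below with abbreviated/illustrative event descriptors:

static struct uncore_event_desc snr_uncore_imc_freerunning_events[] = {
	INTEL_UNCORE_EVENT_DESC(dclk,  "event=0xff,umask=0x10"),
	INTEL_UNCORE_EVENT_DESC(read,  "event=0xff,umask=0x20"),
	INTEL_UNCORE_EVENT_DESC(write, "event=0xff,umask=0x21"),
	{ /* end: all zeroes */ },	/* <-- the missing marker */
};
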
2020-01-17  perf/x86/intel/uncore: Add PCI ID of IMC for Xeon E3 V5 Family (Kan Liang)

IMC uncore support is missing for the E3-1585 v5 CPU. The Intel Xeon E3 V5 family is based on Skylake. Add the PCI ID of the IMC for the Intel Xeon E3 V5 family.

Reported-by: Rosales-fernandez, Carlos <carlos.rosales-fernandez@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Rosales-fernandez, Carlos <carlos.rosales-fernandez@intel.com>
Link: https://lkml.kernel.org/r/1578687311-158748-1-git-send-email-kan.liang@linux.intel.com

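For illustration, a hedged sketch of such a table entry; the device ID below is a placeholder (the real value comes from the E3-1585 v5 host bridge), and the surrounding structure follows the existing client IMC tables:

#define PCI_DEVICE_ID_INTEL_SKL_E3_IMC	0x1918	/* assumption: illustrative ID */

static const struct pci_device_id skl_uncore_pci_ids[] = {
	{ /* IMC */
		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_SKL_E3_IMC),
		.driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
	},
	{ /* end: all zeroes */ },
};
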
2020-01-17  perf/x86/amd: Add support for Large Increment per Cycle Events (Kim Phillips)

Description of hardware operation
---------------------------------

The core AMD PMU has a 4-bit wide per-cycle increment for each performance monitor counter. That works for most events, but now with AMD Family 17h and above processors, some events can occur more than 15 times in a cycle. Those events are called "Large Increment per Cycle" events. In order to count these events, two adjacent h/w PMCs get their count signals merged to form 8 bits per cycle total. In addition, the PERF_CTR count registers are merged to be able to count up to 64 bits.

Normally, events like instructions retired get programmed on a single counter like so:

  PERF_CTL0 (MSR 0xc0010200) 0x000000000053ff0c # event 0x0c, umask 0xff
  PERF_CTR0 (MSR 0xc0010201) 0x0000800000000001 # r/w 48-bit count

The next counter at MSRs 0xc0010202-3 remains unused, or can be used independently to count something else.

When counting Large Increment per Cycle events, such as FLOPs, however, we now have to reserve the next counter and program the PERF_CTL (config) register with the Merge event (0xFFF), like so:

  PERF_CTL0 (MSR 0xc0010200) 0x000000000053ff03 # FLOPs event, umask 0xff
  PERF_CTR0 (MSR 0xc0010201) 0x0000800000000001 # rd 64-bit cnt, wr lo 48b
  PERF_CTL1 (MSR 0xc0010202) 0x0000000f004000ff # Merge event, enable bit
  PERF_CTR1 (MSR 0xc0010203) 0x0000000000000000 # wr hi 16-bits count

The count is widened from the normal 48 bits to 64 bits by having the second counter carry the higher 16 bits of the count in the lower 16 bits of its counter register. The odd counter, e.g., PERF_CTL1, is programmed with the enabled Merge event before the even counter, PERF_CTL0.

The Large Increment feature is available starting with Family 17h. For more details, search any Family 17h PPR for the "Large Increment per Cycle Events" section, e.g., section 2.1.15.3 on p. 173 in this version:

  https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip

Description of software operation
---------------------------------

The following steps are taken in order to support reserving and enabling the extra counter for Large Increment per Cycle events:

1. In the main x86 scheduler, we reduce the number of available counters by the number of Large Increment per Cycle events being scheduled, tracked by a new cpuc variable 'n_pair' and a new amd_put_event_constraints_f17h(). This improves the counter scheduler success rate.

2. In perf_assign_events(), if a counter is assigned to a Large Increment event, we increment the current counter variable, so the counter used for the Merge event is removed from assignment consideration by upcoming event assignments.

3. In find_counter(), if a counter has been found for the Large Increment event, we set the next counter as used, to prevent other events from using it.

4. We perform steps 2 & 3 also in the x86 scheduler fastpath, i.e., we add Merge event accounting to the existing used_mask logic.

5. Finally, we add the programming of the Merge event to the neighbouring PMC counters in the counter enable/disable{_all} code paths; a hedged sketch of that pairing follows this entry.

Currently, software does not support a single PMU with mixed 48- and 64-bit counting, so Large Increment event counts are limited to 48 bits. In set_period, we zero out the upper 16 bits of the count, so the hardware doesn't copy them to the even counter's higher bits.

Simple invocation example showing counting 8 FLOPs per 256-bit/%ymm vaddps instruction executed in a loop 100 million times:

  perf stat -e cpu/fp_ret_sse_avx_ops.all/,cpu/instructions/ <workload>

   Performance counter stats for '<workload>':

         800,000,000      cpu/fp_ret_sse_avx_ops.all/u
         300,042,101      cpu/instructions/u

Prior to this patch, the reported SSE/AVX FLOPs retired count would be wrong.

[peterz: lots of renames and edits to the code]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

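A hedged sketch of the pairing in the enable path. The helper name is hypothetical; AMD_MERGE_EVENT_ENABLE is assumed to encode event 0xFFF plus the enable bit, matching the 0x0000000f004000ff value shown above:

static void amd_pmu_enable_pair(int idx, u64 config)
{
	/* Program the odd neighbour with the Merge event first ... */
	wrmsrl(x86_pmu_config_addr(idx + 1), AMD_MERGE_EVENT_ENABLE);
	/* ... then enable the Large Increment event on the even PMC */
	wrmsrl(x86_pmu_config_addr(idx), config | ARCH_PERFMON_EVENTSEL_ENABLE);
}
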
2020-01-17  perf/x86/amd: Constrain Large Increment per Cycle events (Kim Phillips)

AMD Family 17h processors and above gain support for Large Increment per Cycle events. Unfortunately there is no CPUID or equivalent bit that indicates whether the feature exists or not, so we continue to determine eligibility based on a CPU family number comparison.

For Large Increment per Cycle events, we add a f17h-and-compatibles get_event_constraints_f17h() that returns an even counter bitmask: Large Increment per Cycle events can only be placed on PMCs 0, 2, and 4 out of the currently available 0-5 (see the constraint sketch after this entry). The only currently public event that requires this feature to report valid counts is PMCx003 "Retired SSE/AVX Operations".

Note that the CPU family logic in amd_core_pmu_init() is changed so as to be able to selectively add initialization for features available in ranges of backward-compatible CPU families. This Large Increment per Cycle feature is expected to be retained in future families.

A side effect of assigning a new get_constraints function for f17h is that it disables calling the old (prior to f15h) amd_get_event_constraints implementation left enabled by commit e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors"), which is no longer necessary since those North Bridge event codes are obsoleted.

Also fix a spelling mistake whilst in the area (calulating -> calculating).

Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191114183720.19887-2-kim.phillips@amd.com

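A sketch of the constraint in its assumed form: with six core counters, restricting placement to PMCs 0, 2 and 4 is the even bitmask 0b010101 = 0x15 (weight 3). The exact constraint flags and helper internals may differ from the patch:

static struct event_constraint pair_constraint =
	__EVENT_CONSTRAINT(0, 0x15, 0, 3, 0, PERF_X86_EVENT_PAIR);

static struct event_constraint *
amd_get_event_constraints_f17h(struct cpu_hw_events *cpuc, int idx,
			       struct perf_event *event)
{
	/* amd_is_pair_event_code() is assumed to match PMCx003 and friends */
	if (amd_is_pair_event_code(&event->hw))
		return &pair_constraint;
	return &unconstrained;
}
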
2020-01-17  perf/x86/intel/rapl: Add Comet Lake support (Harry Pan)

Comet Lake supports the same RAPL counters as Kaby Lake and Skylake. With this change, the energy counters appear in perf list on CML machines.

Signed-off-by: Harry Pan <harry.pan@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191227171944.1.Id6f3ab98474d7d1dba5b95390b24e0a67368d364@changeid

2020-01-16  x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained (Tom Lendacky)

If the SME and SEV features are present via CPUID, but memory encryption support is not enabled (MSR 0xC001_0010[23]), the feature flags are cleared using clear_cpu_cap(). However, if get_cpu_cap() is later called, these feature flags will be reset back to present, which is not desired.

Change from using clear_cpu_cap() to setup_clear_cpu_cap() so that the clearing of the flags is maintained.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.16.x-
Link: https://lkml.kernel.org/r/226de90a703c3c0be5a49565047905ac4e94e8f3.1579125915.git.thomas.lendacky@amd.com

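A condensed sketch of the idea, assuming the amd.c detection helper (simplified; error paths and surrounding logic omitted):

static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
{
	u64 msr;

	rdmsrl(MSR_K8_SYSCFG, msr);	/* MSR 0xC001_0010 */
	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT)) {
		/*
		 * setup_clear_cpu_cap() records the bits in
		 * cpu_caps_cleared[], so a later get_cpu_cap() cannot
		 * resurrect them; clear_cpu_cap(c, ...) did not.
		 */
		setup_clear_cpu_cap(X86_FEATURE_SME);
		setup_clear_cpu_cap(X86_FEATURE_SEV);
	}
}
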
2020-01-16  x86/amd_nb: Add Family 19h PCI IDs (Yazen Ghannam)

Add the new PCI Device 18h IDs for AMD Family 19h systems. Note that Family 19h systems will not have a new PCI root device ID.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200110015651.14887-4-Yazen.Ghannam@amd.com

2020-01-16  x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType (Yazen Ghannam)

Add support for a new version of the Load Store unit bank type, as indicated by its McaType value, which will be present in future SMCA systems. Add the new (HWID, MCATYPE) tuple. Reuse the same name, since this is logically the same to the user.

Also, add the new error descriptions to edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200110015651.14887-2-Yazen.Ghannam@amd.com

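Sketched as it would appear in the SMCA bank table; the HWID/McaType values and error-code masks below are illustrative assumptions, not copied from the patch:

static struct smca_hwid smca_hwid_mcatypes[] = {
	/* Load Store Unit MCA banks: same name, two McaType revisions */
	{ SMCA_LS,	HWID_MCATYPE(0xb0, 0x0),  0x1FFFFF },
	{ SMCA_LS_V2,	HWID_MCATYPE(0xb0, 0x10), 0xFFFFFF },
};
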
2020-01-16  crypto: x86/poly1305 - wire up faster implementations for kernel (Jason A. Donenfeld)

These x86_64 vectorized implementations support AVX, AVX-2, and AVX512F. The AVX-512F implementation is disabled on Skylake, due to throttling, but it is quite fast on >= Cannonlake.

On the left are cycle counts on a Core i7 6700HQ using the AVX-2 codepath, comparing this implementation ("new") to the implementation in the current crypto API ("old"). On the right are benchmarks on a Xeon Gold 5120 using the AVX-512 codepath. The new implementation is faster on all benchmarks.

         AVX-2                    AVX-512
  ------------------       ------------------
  size   old    new        size   old    new
  ----   ----   ----       ----   ----   ----
     0     70     68          0     74     70
    16     92     90         16     96     92
    32    134    104         32    136    106
    48    172    120         48    184    124
    64    218    136         64    218    138
    80    254    158         80    260    160
    96    298    174         96    300    176
   112    342    192        112    342    194
   128    388    212        128    384    212
   144    428    228        144    420    226
   160    466    246        160    464    248
   176    510    264        176    504    264
   192    550    282        192    544    282
   208    594    302        208    582    300
   224    628    316        224    624    318
   240    676    334        240    662    338
   256    716    354        256    708    358
   272    764    374        272    748    372
   288    802    352        288    788    358
   304    420    366        304    422    370
   320    428    360        320    432    364
   336    484    378        336    486    380
   352    426    384        352    434    390
   368    478    400        368    480    408
   384    488    394        384    490    398
   400    542    408        400    542    412
   416    486    416        416    492    426
   432    534    430        432    538    436
   448    544    422        448    546    432
   464    600    438        464    600    448
   480    540    448        480    548    456
   496    594    464        496    594    476
   512    602    456        512    606    470
   528    656    476        528    656    480
   544    600    480        544    606    498
   560    650    494        560    652    512
   576    664    490        576    662    508
   592    714    508        592    716    522
   608    656    514        608    664    538
   624    708    532        624    710    552
   640    716    524        640    720    516
   656    770    536        656    772    526
   672    716    548        672    722    544
   688    770    562        688    768    556
   704    774    552        704    778    556
   720    826    568        720    832    568
   736    768    574        736    780    584
   752    822    592        752    826    600
   768    830    584        768    836    560
   784    884    602        784    888    572
   800    828    610        800    838    588
   816    884    628        816    884    604
   832    888    618        832    894    598
   848    942    632        848    946    612
   864    884    644        864    896    628
   880    936    660        880    942    644
   896    948    652        896    952    608
   912   1000    664        912   1004    616
   928    942    676        928    954    634
   944    994    690        944   1000    646
   960   1002    680        960   1008    646
   976   1054    694        976   1062    658
   992   1002    706        992   1012    674
  1008   1052    720       1008   1058    690

This commit wires in the prior implementation from Andy, and makes the following changes to be suitable for kernel land:

- Some cosmetic and structural changes, like renaming labels to .Lname, constants, and other Linux conventions, as well as making the code easy for us to maintain moving forward.

- CPU feature checking is done in C by the glue code (a hedged sketch follows this entry).

- We avoid jumping into the middle of functions, to appease objtool, and instead parameterize shared code.

- We maintain frame pointers so that stack traces make sense.

- We remove the dependency on the perl xlate code, which transforms the output into things that assemblers we don't care about use.

Importantly, none of our changes affect the arithmetic or core code, but just involve the differing environment of kernel space.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

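A hedged sketch of the glue-code feature selection, condensed in the style of arch/x86/crypto poly1305 glue (static-key names and the exact Skylake check are assumptions):

static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx);
static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx2);
static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx512);

static int __init poly1305_simd_mod_init(void)
{
	if (boot_cpu_has(X86_FEATURE_AVX) &&
	    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
		static_branch_enable(&poly1305_use_avx);
	if (boot_cpu_has(X86_FEATURE_AVX2) &&
	    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
		static_branch_enable(&poly1305_use_avx2);
	if (IS_ENABLED(CONFIG_AS_AVX512) &&
	    boot_cpu_has(X86_FEATURE_AVX512F) &&
	    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
			      XFEATURE_MASK_AVX512, NULL) &&
	    /* Skylake-X throttles on 512-bit ops; stay on AVX2 there */
	    boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X)
		static_branch_enable(&poly1305_use_avx512);
	return 0;
}
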
2020-01-16  crypto: x86/poly1305 - import unmodified cryptogams implementation (Jason A. Donenfeld)

These x86_64 vectorized implementations come from Andy Polyakov's CRYPTOGAMS implementation, and are included here in raw form without modification, so that subsequent commits that fix these up for the kernel can see how it has changed.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-01-16  crypto: poly1305 - add new 32 and 64-bit generic versions (Jason A. Donenfeld)

These two C implementations from Zinc -- a 32x32 one and a 64x64 one, depending on the platform -- come from Andrew Moon's public domain poly1305-donna portable code, modified for usage in the kernel. The precomputation in the 32-bit version and the use of 64x64 multiplies in the 64-bit version make these perform better than the code they replace. Moon's code is also very widespread and has received many eyeballs of scrutiny.

There's a bit of interference with the x86 implementation, which relies on internal details of the old scalar implementation. In the next commit, the x86 implementation will be replaced with a faster one that doesn't rely on this, so none of this matters much. But for now, to keep this passing the tests, we inline the bits of the old implementation that the x86 implementation relied on. Also, since we now support a slightly larger key space, via the union, some offsets had to be fixed up.

Nonce calculation was folded in with the emit function, to take advantage of 64x64 arithmetic. However, Adiantum appeared to rely on no nonce handling in emit, so this path was conditionalized. We also introduced a new struct, poly1305_core_key, to represent the precise amount of space that a particular implementation uses.

Testing with kbench9000, depending on the CPU, the update function for the 32x32 version has been improved by 4%-7%, and for the 64x64 by 19%-30%. The 32x32 gains are small, but I think there's great value in having a parallel implementation to the 64x64 one so that the two can be compared side-by-side as nice stand-alone units.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2020-01-15  x86/cpu: Print "VMX disabled" error message iff KVM is enabled (Sean Christopherson)

Don't print an error message about VMX being disabled by BIOS if KVM, the sole user of VMX, is disabled. E.g. if KVM is disabled and the MSR is unlocked, the kernel will intentionally disable VMX when locking feature control and then complain that "BIOS" disabled VMX.

Fixes: ef4d3bf19855 ("x86/cpu: Clear VMX feature flag if VMX is not fully enabled")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200114202545.20296-1-sean.j.christopherson@intel.com

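A minimal sketch of the guard, assuming a hypothetical helper around the feature-control check (the message text follows the style of the existing code):

static void check_vmx_disabled_by_bios(u64 msr)
{
	if (IS_ENABLED(CONFIG_KVM_INTEL) &&
	    !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))
		pr_err_once("VMX (outside TXT) disabled by BIOS\n");
}
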
2020-01-15  x86/mce/therm_throt: Do not access uninitialized therm_work (Chuansheng Liu)

It is relatively easy to trigger the following boot splat on an Ice Lake client platform:

  kernel BUG at kernel/timer/timer.c:1152!
  Call Trace:
   __queue_delayed_work
   queue_delayed_work_on
   therm_throt_process
   intel_thermal_interrupt
   ...

The reason is that a CPU's thermal interrupt is enabled prior to executing its hotplug onlining callback which will initialize the throttling workqueues. Such a race can lead to therm_throt_process() accessing an uninitialized therm_work, leading to the above BUG at a very early bootup stage.

Therefore, unmask the thermal interrupt vector only after having set up the workqueues completely.

[ bp: Heavily massage commit message and correct comment formatting. ]

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200107004116.59353-1-chuansheng.liu@intel.com

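A condensed sketch of the corrected ordering (function and structure names approximate the therm_throt code; the point is initializing therm_work before the interrupt can fire):

static void intel_thermal_setup(struct thermal_state *state)
{
	INIT_DELAYED_WORK(&state->core_throttle.therm_work,
			  throttle_active_work);
	INIT_DELAYED_WORK(&state->package_throttle.therm_work,
			  throttle_active_work);

	/* Unmask LVTTHMR only now that therm_work can be queued safely */
	apic_write(APIC_LVTTHMR,
		   apic_read(APIC_LVTTHMR) & ~APIC_LVT_MASKED);
}
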
2020-01-14  arch/x86/setup: Drop dummy_con initialization (Arvind Sankar)

con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset. Drop it from the arch setup code.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20191218214506.49252-24-nivedita@alum.mit.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2020-01-14  x86/vdso: Zap vvar pages when switching to a time namespace (Dmitry Safonov)

The VVAR page layout depends on whether a task belongs to the root or a non-root time namespace. Whenever a task changes its namespace, the VVAR page tables are cleared and then re-faulted with the corresponding layout.

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-27-dima@arista.com

2020-01-14  x86/vdso: On timens page fault prefault also VVAR page (Dmitry Safonov)

As the timens page has offsets to data on the VVAR page, the VVAR page is going to be accessed shortly afterwards. Set it up together with the timens page in one page fault as an optimization.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-26-dima@arista.com

2020-01-14  x86/vdso: Handle faults on timens page (Dmitry Safonov)

If a task belongs to a time namespace, then the VVAR page which contains the system-wide VDSO data is replaced with a namespace-specific page which has the same layout as the VVAR page.

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-25-dima@arista.com

2020-01-14  x86/vdso: Add time namespace page (Dmitry Safonov)

To support time namespaces in the VDSO with a minimal impact on regular tasks not affected by time namespaces, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin().

If a task belongs to a time namespace, then the VVAR page which contains the system-wide VDSO data is replaced with a namespace-specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. when a concurrent update of the VDSO data is in progress, does not really affect regular tasks which are not part of a time namespace, as such a task simply spin-waits for the update to finish and for vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding time getter function, which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock, which is stored in the special VVAR page. A condensed sketch of this read-side logic follows this entry.

Allocate the time namespace page among the VVAR pages and place vdso_data on it. Provide the __arch_get_timens_vdso_data() helper for VDSO code to get the code-relative position of VVARs on that special page.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-23-dima@arista.com

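A condensed sketch of the resulting read side, assuming the generic vDSO shape of this series; do_hres_timens() and read_clock_and_fill() stand in for the real helpers:

static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
				   struct __kernel_timespec *ts)
{
	u32 seq;

	while (1) {
		seq = READ_ONCE(vd->seq);
		if (unlikely(seq & 1)) {
			if (IS_ENABLED(CONFIG_TIME_NS) &&
			    vd->clock_mode == VCLOCK_TIMENS)
				/* timens page: seq is pinned to 1; read the
				 * real VVAR page and add the per-namespace
				 * clock offset */
				return do_hres_timens(vd, clk, ts);
			cpu_relax();	/* real update in progress, retry */
			continue;
		}
		smp_rmb();
		read_clock_and_fill(vd, clk, ts);	/* hypothetical */
		smp_rmb();
		if (likely(seq == READ_ONCE(vd->seq)))
			return 0;	/* consistent snapshot */
	}
}
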
2020-01-14  x86/vdso: Provide vdso_data offset on vvar_page (Dmitry Safonov)

VDSO support for time namespaces needs to set up a page with the same layout as the VVAR page. That timens page will be placed at the position of the VVAR page inside the namespace. It has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path.

To prepare the time namespace page, the kernel needs to know the vdso_data offset. Provide the arch_get_vdso_data() helper for locating vdso_data on the VVAR page.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com

2020-01-14  x86/vdso: Restrict splitting VVAR VMA (Dmitry Safonov)

Forbid splitting the VVAR VMA, resulting in a stricter ABI and reducing the number of corner cases to consider while working further on VDSO time namespace support. As the offset from the timens page to the VVAR page is computed at compile time, the VVAR pages need to stay together and must not be partially mremap()'ed.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-20-dima@arista.com

2020-01-14  x86/vdso: Remove unused VDSO_HAS_32BIT_FALLBACK (Vincenzo Frascino)

VDSO_HAS_32BIT_FALLBACK has been removed from the core since the architectures that support the generic vDSO library have been converted to support the 32-bit fallbacks. Remove the unused VDSO_HAS_32BIT_FALLBACK from the x86 vDSO.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190830135902.20861-9-vincenzo.frascino@arm.com

2020-01-13  arch: wire up pidfd_getfd syscall (Sargun Dhillon)

This wires up the pidfd_getfd syscall for all architectures.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

2020-01-13  KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs (Sean Christopherson)

Change the dependency for KVM_INTEL, i.e. KVM w/ VMX, from Intel CPUs to any CPU that supports the IA32_FEAT_CTL MSR and thus VMX functionality. This effectively allows building KVM_INTEL for Centaur and Zhaoxin CPUs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-20-sean.j.christopherson@intel.com

2020-01-13  perf/x86: Provide stubs of KVM helpers for non-Intel CPUs (Sean Christopherson)

Provide stubs for perf_guest_get_msrs() and intel_pt_handle_vmx() when building without support for Intel CPUs, i.e. CPU_SUP_INTEL=n. Lack of stubs is not currently a problem, as the only user, KVM_INTEL, takes a dependency on CPU_SUP_INTEL=y. Provide the stubs for all CPUs so that KVM_INTEL can be built for any CPU with compatible hardware support, e.g. Centaur and Zhaoxin CPUs.

Note, the existing stub for perf_guest_get_msrs() is essentially dead code, as KVM selects CONFIG_PERF_EVENTS, i.e. the only user guarantees the full implementation is built.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-19-sean.j.christopherson@intel.com

2020-01-13  KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits (Sean Christopherson)

Define the VMCS execution control flags (consumed by KVM) using their associated VMX_FEATURE_* flags to provide a strong hint that new VMX features are expected to be added to VMX_FEATURE and considered for reporting via /proc/cpuinfo.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-18-sean.j.christopherson@intel.com

2020-01-13  KVM: VMX: Check for full VMX support when verifying CPU compatibility (Sean Christopherson)

Explicitly check the current CPU's IA32_FEAT_CTL and VMX feature flags when verifying compatibility across physical CPUs. This effectively adds a check on IA32_FEAT_CTL to ensure that VMX is fully enabled on all CPUs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-17-sean.j.christopherson@intel.com

2020-01-13  KVM: VMX: Use VMX feature flag to query BIOS enabling (Sean Christopherson)

Replace KVM's manual checks on IA32_FEAT_CTL with a query of the boot CPU's MSR_IA32_FEAT_CTL and VMX feature flags. The MSR_IA32_FEAT_CTL flag indicates that IA32_FEAT_CTL has been configured and that dependent features are accurately reflected in cpufeatures, e.g. the VMX flag is now cleared during boot if VMX isn't fully enabled via IA32_FEAT_CTL, including the case where the MSR isn't supported.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-16-sean.j.christopherson@intel.com

2020-01-13  KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR (Sean Christopherson)

Remove KVM's code to initialize the IA32_FEAT_CTL MSR when KVM is loaded, now that the MSR is initialized during boot on all CPUs that support VMX, i.e. on all CPUs that can possibly load kvm_intel.

Note, don't WARN if IA32_FEAT_CTL is unlocked, even though the MSR is unconditionally locked by init_ia32_feat_ctl(). KVM isn't tied directly to CPU vendor detection, whereas init_ia32_feat_ctl() is invoked if and only if the CPU vendor is recognized and known to support VMX. As a result, vmx_disabled_by_bios() may be reached without going through init_ia32_feat_ctl() and thus without locking IA32_FEAT_CTL. This quirk will be eliminated in a future patch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lkml.kernel.org/r/20191221044513.21680-15-sean.j.christopherson@intel.com

2020-01-13  x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured (Sean Christopherson)

Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether IA32_FEAT_CTL has been initialized. This will allow KVM, and any future subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures to query platform support, e.g. it allows a future patch to remove KVM's manual IA32_FEAT_CTL MSR checks.

Various features (on platforms that support IA32_FEAT_CTL) are dependent on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The MSR is always configured during boot, but only if the CPU vendor is recognized by the kernel. Because CPUID doesn't incorporate the current IA32_FEAT_CTL value in its reporting of relevant features, it's possible for a feature to be reported as supported in cpufeatures but not truly enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize the CPU.

As a result, without the flag, KVM would see VMX as supported even if IA32_FEAT_CTL hasn't been initialized, and so would need to manually read the MSR and check the various enabling bits to avoid taking an unexpected #GP on VMXON.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-14-sean.j.christopherson@intel.com

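A sketch of the consumer-side check this enables (the helper name is hypothetical; the flag names are the ones the commit introduces or references):

static bool vmx_usable(void)
{
	/*
	 * X86_FEATURE_MSR_IA32_FEAT_CTL is set only once the MSR has been
	 * configured and locked by init_ia32_feat_ctl(), so the VMX flag
	 * can be trusted without re-reading the MSR.
	 */
	return boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) &&
	       boot_cpu_has(X86_FEATURE_VMX);
}
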
2020-01-13  x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl() (Sean Christopherson)

Set the synthetic VMX cpufeatures, which need to be kept to preserve /proc/cpuinfo's ABI, in the common IA32_FEAT_CTL initialization code. Remove the vendor code that manually sets the synthetic flags.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-13-sean.j.christopherson@intel.com

2020-01-13  x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_* (Sean Christopherson)

Add support for generating VMX feature names in capflags.c and use the resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo. Don't print VMX flags if no bits are set in word 0, which holds Pin Controls; Pin Control's INTR and NMI exiting are fundamental pillars of VMX, so if they are not supported, then the CPU is broken, it does not actually support VMX, or the kernel wasn't built with support for the target CPU.

Print the features in a dedicated "vmx flags" line to avoid polluting the common "flags" and to avoid having to prefix all flags with "vmx_", which results in horrendously long names.

Keep the synthetic VMX flags in cpufeatures to preserve /proc/cpuinfo's ABI for those flags. This means that "flags" and "vmx flags" will have duplicate entries for tpr_shadow (virtual_tpr), vnmi, ept, flexpriority, vpid and ept_ad, but it caps the pollution of "flags" at those six VMX features. The vendor-specific code that populates the synthetic flags will be consolidated in a future patch to further minimize the lasting damage.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-12-sean.j.christopherson@intel.com

2020-01-13  x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs (Sean Christopherson)

Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill the capabilities during IA32_FEAT_CTL MSR initialization.

Make the VMX capabilities dependent on IA32_FEAT_CTL and X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't possibly support VMX, or when /proc/cpuinfo is not available.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com

2020-01-13  x86/vmx: Introduce VMX_FEATURES_* (Sean Christopherson)

Add a VMX-specific variant of X86_FEATURE_* flags, which will eventually supplant the synthetic VMX flags defined in cpufeatures word 8. Use the Intel-defined layouts for the major VMX execution controls so that their word entries can be directly populated from their respective MSRs, and so that the VMX_FEATURE_* flags can be used to define the existing bit definitions in asm/vmx.h, i.e. force developers to define a VMX_FEATURE flag when adding support for a new hardware feature.

The majority of Intel's (and compatible CPUs') VMX capabilities are enumerated via MSRs and not CPUID, i.e. querying /proc/cpuinfo doesn't naturally provide any insight into the virtualization capabilities of VMX-enabled CPUs. Commit e38e05a85828d ("x86: extended "flags" to show virtualization HW feature in /proc/cpuinfo") attempted to address the issue by synthesizing select VMX features into a Linux-defined word in cpufeatures.

Lack of reporting of VMX capabilities via /proc/cpuinfo is problematic because there is no sane way for a user to query the capabilities of their platform, e.g. when trying to find a platform to test a feature or debug an issue that has a hardware dependency. Lack of reporting is especially problematic when the user isn't familiar with VMX, e.g. the format of the MSRs is non-standard, existence of some MSRs is reported by bits in other MSRs, several "features" from KVM's point of view are enumerated as 3+ distinct features by hardware, etc.

The synthetic cpufeatures approach has several flaws:

- The set of synthesized VMX flags has become extremely stale with respect to the full set of VMX features, e.g. only one new flag (EPT A/D) has been added in the decade since the introduction of the synthetic VMX features. Failure to keep the VMX flags up to date is likely due to the lack of a mechanism that forces developers to consider whether or not a new feature is worth reporting.

- The synthetic flags may incorrectly be misinterpreted as affecting kernel behavior, i.e. KVM, the kernel's sole consumer of VMX, completely ignores the synthetic flags.

- New CPU vendors that support VMX have duplicated the hideous code that propagates VMX features from MSRs to cpufeatures. Bringing the synthetic VMX flags up to date would exacerbate the copy+paste trainwreck.

Define separate VMX_FEATURE flags to set the stage for enumerating VMX capabilities outside of the cpu_has() framework, and for adding functional usage of VMX_FEATURE_* to help ensure the features reported via /proc/cpuinfo are up to date with respect to kernel recognition of VMX capabilities.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-10-sean.j.christopherson@intel.com

2020-01-13  x86/cpu: Clear VMX feature flag if VMX is not fully enabled (Sean Christopherson)

Now that IA32_FEAT_CTL is always configured and locked for CPUs that are known to support VMX[*], clear the VMX capability flag if the MSR is unsupported or if BIOS disabled VMX, i.e. locked IA32_FEAT_CTL without setting the appropriate VMX enable bit.

[*] Because init_ia32_feat_ctl() is called from the vendors' ->c_init(), it's still possible for IA32_FEAT_CTL to be left unlocked when VMX is supported by the CPU. This is not fatal, and will be addressed in a future patch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-9-sean.j.christopherson@intel.com

2020-01-13  x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization (Sean Christopherson)

Use the recently added IA32_FEAT_CTL MSR initialization sequence to opportunistically enable VMX support when running on a Zhaoxin CPU.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-8-sean.j.christopherson@intel.com

2020-01-13  x86/centaur: Use common IA32_FEAT_CTL MSR initialization (Sean Christopherson)

Use the recently added IA32_FEAT_CTL MSR initialization sequence to opportunistically enable VMX support when running on a Centaur CPU.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-7-sean.j.christopherson@intel.com

2020-01-13  x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked (Sean Christopherson)

WARN if the IA32_FEAT_CTL MSR is somehow left unlocked now that CPU initialization unconditionally locks the MSR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-6-sean.j.christopherson@intel.com

2020-01-13  x86/intel: Initialize IA32_FEAT_CTL MSR at boot (Sean Christopherson)

Opportunistically initialize IA32_FEAT_CTL to enable VMX when the MSR is left unlocked by BIOS. Configuring feature control at boot time paves the way for similar enabling of other features, e.g. Software Guard Extensions (SGX).

Temporarily leave the equivalent KVM code in place in order to avoid introducing a regression on Centaur and Zhaoxin CPUs, e.g. removing KVM's code would leave the MSR unlocked on those CPUs and would break existing functionality if people are loading kvm_intel on Centaur and/or Zhaoxin. Defer enablement of the boot-time configuration on Centaur and Zhaoxin to future patches to aid bisection.

Note, Local Machine Check Exceptions (LMCE) are also supported by the kernel and enabled via feature control, but the kernel currently uses LMCE if and only if the feature is explicitly enabled by BIOS. Keep the current behavior to avoid introducing bugs; future patches can opt in to opportunistic enabling if it's deemed desirable to do so.

Always lock IA32_FEAT_CTL if it exists, even if the CPU doesn't support VMX, so that other existing and future kernel code that queries the MSR can assume it's locked.

Start from a clean slate when constructing the value to write to IA32_FEAT_CTL, i.e. ignore whatever value BIOS left in the MSR, so as not to enable random features or fault on the WRMSR. A condensed sketch of the resulting sequence follows this entry.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-5-sean.j.christopherson@intel.com

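The sketch, condensed from the sequence described above (LMCE handling, error paths and the VMX-flag clearing details are omitted or reduced to comments):

void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
{
	u64 msr;

	if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr))
		return;		/* MSR not supported on this CPU */

	if (msr & FEAT_CTL_LOCKED)
		goto update_caps;	/* BIOS already made the decision */

	/* Start from a clean slate; don't trust leftover BIOS bits */
	msr = FEAT_CTL_LOCKED;
	if (cpu_has(c, X86_FEATURE_VMX)) {
		msr |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
		if (tboot_enabled())
			msr |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX;
	}
	wrmsrl(MSR_IA32_FEAT_CTL, msr);

update_caps:
	set_cpu_cap(c, X86_FEATURE_MSR_IA32_FEAT_CTL);
	/* clear X86_FEATURE_VMX here if the relevant enable bit is unset */
}
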
2020-01-13  x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR (Sean Christopherson)

As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL are quite a mouthful, especially the VMX bits, which must differentiate between enabling VMX inside and outside SMX (TXT) operation. Rename the MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to make them a little friendlier on the eyes.

Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name to match Intel's SDM, but a future patch will add a dedicated Kconfig, file and functions for the MSR. Using the full name for those assets is rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its nomenclature is consistent throughout the kernel.

Opportunistically, fix a few other annoyances with the defines:

- Relocate the bit defines so that they immediately follow the MSR define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.

- Add whitespace around the block of feature control defines to make it clear they're all related.

- Use BIT() instead of manually encoding the bit shift.

- Use "VMX" instead of "VMXON" to match the SDM.

- Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to be consistent with the kernel's verbiage used for all other feature control bits. Note, the SDM refers to the LMCE bit as LMCE_ON, likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore the (literal) one-off usage of _ON, the SDM is simply "wrong".

The resulting defines are sketched after this entry.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com

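For reference, the shape of the renamed block in asm/msr-index.h (bit positions follow the SDM's IA32_FEATURE_CONTROL layout: lock in bit 0, VMX-in-SMX in bit 1, VMX-outside-SMX in bit 2, LMCE in bit 20):

#define MSR_IA32_FEAT_CTL			0x0000003a
#define FEAT_CTL_LOCKED				BIT(0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX		BIT(1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX	BIT(2)
#define FEAT_CTL_LMCE_ENABLED			BIT(20)
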
2020-01-13  x86/resctrl: Do not reconfigure exiting tasks (Xiaochen Shen)

When writing a pid to the "tasks" file, a callback function move_myself() is queued to that task, to be called when the task returns from kernel mode or exits. The purpose of move_myself() is to activate the newly assigned closid and/or rmid associated with this task. This activation is done by calling resctrl_sched_in() from move_myself(), the same function that is called when switching to this task.

If this work is successfully queued but the task then enters PF_EXITING status (e.g., upon receiving signal SIGKILL or SIGTERM) prior to the execution of the callback, move_myself() still calls resctrl_sched_in(), since the task status is not currently considered.

When a task is exiting, its data structures will be freed soon. Calling resctrl_sched_in() to write the register that controls the task's resources is unnecessary and implies extra performance overhead.

Add a check on the task status in move_myself() and return immediately if the task is PF_EXITING; a condensed sketch follows this entry.

[ bp: Massage. ]

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1578500026-21152-1-git-send-email-xiaochen.shen@intel.com

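A condensed sketch of the callback with the added check (the rdtgroup refcount handling of the real function is omitted):

static void move_myself(struct callback_head *head)
{
	struct task_move_callback *callback;

	callback = container_of(head, struct task_move_callback, work);

	/*
	 * If the task is exiting, its data structures are about to be
	 * freed; writing the CLOSID/RMID MSR would be pointless overhead.
	 */
	if (!(current->flags & PF_EXITING))
		resctrl_sched_in();

	kfree(callback);
}
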
2020-01-13  Merge back power capping changes for v5.6. (Rafael J. Wysocki)

2020-01-13  x86/mce: Fix use of uninitialized MCE message string (Jan H. Schönherr)

The function mce_severity() is not required to update its msg argument. In fact, mce_severity_amd() does not, which makes mce_no_way_out() return uninitialized data, which may be used later for printing.

Assuming that implementations of mce_severity() either always or never update the msg argument (which is currently the case), it is sufficient to initialize the temporary variable in mce_no_way_out(). While at it, avoid printing a useless "Unknown".

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200103150722.20313-4-jschoenh@amazon.de

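A heavily condensed sketch of the idea (the real mce_no_way_out() iterates over all banks; this reduction only illustrates the initialization):

static int mce_no_way_out(struct mce *m, char **msg, unsigned long validp,
			  struct pt_regs *regs)
{
	char *tmp = *msg;	/* was uninitialized before the fix */

	if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
		*msg = tmp;	/* safe even if mce_severity() never wrote tmp */
		return 1;
	}
	return 0;
}
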
2020-01-13  x86/mce: Fix mce=nobootlog (Jan H. Schönherr)

Since commit 8b38937b7ab5 ("x86/mce: Do not enter deferred errors into the generic pool twice") the mce=nobootlog option has become mostly ineffective (after being only slightly ineffective before), as the code is taking actions on MCEs left over from boot when they have a usable address.

Move the check for MCP_DONTLOG a bit outward to make it effective again.

Also, since commit 011d82611172 ("RAS: Add a Corrected Errors Collector") the two branches of the remaining "if" at the bottom of machine_check_poll() do the same thing. Unify them.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200103150722.20313-3-jschoenh@amazon.de

2020-01-13  x86/mce: Take action on UCNA/Deferred errors again (Jan H. Schönherr)

Commit fa92c5869426 ("x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll") added handling of UCNA and Deferred errors by adding them to the ring for SRAO errors. Later, commit fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors") switched storage from the SRAO ring to the unified pool that is still in use today. In order to only act on the intended errors, a filter for MCE_AO_SEVERITY is used -- effectively removing handling of UCNA/Deferred errors again.

Extend the severity filter to include UCNA/Deferred errors again. Also, generalize the naming of the notifier from SRAO to UC to capture the extended scope.

Note that this change may cause a message like the following to appear, as the same address may be reported as SRAO and as UCNA:

  Memory failure: 0x5fe3284: already hardware poisoned

Technically, this is a return to previous behavior.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de