summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2018-10-02Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf eventsNatarajan, Janakarajan
In Family 17h, some L3 Cache Performance events require the ThreadMask and SliceMask to be set. For other events, these fields do not affect the count either way. Set ThreadMask and SliceMask to 0xFF and 0xF respectively. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H . Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKXKan Liang
The counters on M3UPI Link 0 and Link 3 don't count properly, and writing 0 to these counters may causes system crash on some machines. The PCI BDF addresses of the M3UPI in the current code are incorrect. The correct addresses should be: D18:F1 0x204D D18:F2 0x204E D18:F5 0x204D Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1537538826-55489-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded ↵Masayoshi Mizuma
physical package ID 0 Physical package id 0 doesn't always exist, we should use boot_cpu_data.phys_proc_id here. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20180910144750.6782-1-msys.mizuma@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02x86/vdso: Fix asm constraints on vDSO syscall fallbacksAndy Lutomirski
The syscall fallbacks in the vDSO have incorrect asm constraints. They are not marked as writing to their outputs -- instead, they are marked as clobbering "memory", which is useless. In particular, gcc is smart enough to know that the timespec parameter hasn't escaped, so a memory clobber doesn't clobber it. And passing a pointer as an asm *input* does not tell gcc that the pointed-to value is changed. Add in the fact that the asm instructions weren't volatile, and gcc was free to omit them entirely unless their sole output (the return value) is used. Which it is (phew!), but that stops happening with some upcoming patches. As a trivial example, the following code: void test_fallback(struct timespec *ts) { vdso_fallback_gettime(CLOCK_MONOTONIC, ts); } compiles to: 00000000000000c0 <test_fallback>: c0: c3 retq To add insult to injury, the RCX and R11 clobbers on 64-bit builds were missing. The "memory" clobber is also unnecessary -- no ordering with respect to other memory operations is needed, but that's going to be fixed in a separate not-for-stable patch. Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.1538422295.git.luto@kernel.org
2018-10-01Merge tag 'v4.19-rc6' into for-4.20/blockJens Axboe
Merge -rc6 in, for two reasons: 1) Resolve a trivial conflict in the blk-mq-tag.c documentation 2) A few important regression fixes went into upstream directly, so they aren't in the 4.20 branch. Signed-off-by: Jens Axboe <axboe@kernel.dk> * tag 'v4.19-rc6': (780 commits) Linux 4.19-rc6 MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c cpufreq: qcom-kryo: Fix section annotations perf/core: Add sanity check to deal with pinned event failure xen/blkfront: correct purging of persistent grants Revert "xen/blkfront: When purging persistent grants, keep them in the buffer" selftests/powerpc: Fix Makefiles for headers_install change blk-mq: I/O and timer unplugs are inverted in blktrace dax: Fix deadlock in dax_lock_mapping_entry() x86/boot: Fix kexec booting failure in the SEV bit detection code bcache: add separate workqueue for journal_write to avoid deadlock drm/amd/display: Fix Edid emulation for linux drm/amd/display: Fix Vega10 lightup on S3 resume drm/amdgpu: Fix vce work queue was not cancelled when suspend Revert "drm/panel: Add device_link from panel device to DRM device" xen/blkfront: When purging persistent grants, keep them in the buffer clocksource/drivers/timer-atmel-pit: Properly handle error cases block: fix deadline elevator drain for zoned block devices ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-01KVM: x86: fix L1TF's MMIO GFN calculationSean Christopherson
One defense against L1TF in KVM is to always set the upper five bits of the *legal* physical address in the SPTEs for non-present and reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the MMIO SPTE may overlap with the upper five bits that are being usurped to defend against L1TF. To preserve the GFN, the bits of the GFN that overlap with the repurposed bits are shifted left into the reserved bits, i.e. the GFN in the SPTE will be split into high and low parts. When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO access, get_mmio_spte_gfn() unshifts the affected bits and restores the original GFN for comparison. Unfortunately, get_mmio_spte_gfn() neglects to mask off the reserved bits in the SPTE that were used to store the upper chunk of the GFN. As a result, KVM fails to detect MMIO accesses whose GPA overlaps the repurprosed bits, which in turn causes guest panics and hangs. Fix the bug by generating a mask that covers the lower chunk of the GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The alternative approach would be to explicitly zero the five reserved bits that are used to store the upper chunk of the GFN, but that requires additional run-time computation and makes an already-ugly bit of code even more inscrutable. I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to warn if GENMASK_ULL() generated a nonsensical value, but that seemed silly since that would mean a system that supports VMX has less than 18 bits of physical address space... Reported-by: Sakari Ailus <sakari.ailus@iki.fi> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Reviewed-by: Junaid Shahid <junaids@google.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGSLiran Alon
L2 IA32_BNDCFGS should be updated with vmcs12->guest_bndcfgs only when VM_ENTRY_LOAD_BNDCFGS is specified in vmcs12->vm_entry_controls. Otherwise, L2 IA32_BNDCFGS should be set to vmcs01->guest_bndcfgs which is L1 IA32_BNDCFGS. Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01KVM: x86: Do not use kvm_x86_ops->mpx_supported() directlyLiran Alon
Commit a87036add092 ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") introduced kvm_mpx_supported() to return true iff MPX is enabled in the host. However, that commit seems to have missed replacing some calls to kvm_x86_ops->mpx_supported() to kvm_mpx_supported(). Complete original commit by replacing remaining calls to kvm_mpx_supported(). Fixes: a87036add092 ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabledLiran Alon
Before this commit, KVM exposes MPX VMX controls to L1 guest only based on if KVM and host processor supports MPX virtualization. However, these controls should be exposed to guest only in case guest vCPU supports MPX. Without this change, a L1 guest running with kernel which don't have commit 691bd4340bef ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS") asserts in QEMU on the following: qemu-kvm: error: failed to set MSR 0xd90 to 0x0 qemu-kvm: .../qemu-2.10.0/target/i386/kvm.c:1801 kvm_put_msrs: Assertion 'ret == cpu->kvm_msr_buf->nmsrs failed' This is because L1 KVM kvm_init_msr_list() will see that vmx_mpx_supported() (As it only checks MPX VMX controls support) and therefore KVM_GET_MSR_INDEX_LIST IOCTL will include MSR_IA32_BNDCFGS. However, later when L1 will attempt to set this MSR via KVM_SET_MSRS IOCTL, it will fail because !guest_cpuid_has_mpx(vcpu). Therefore, fix the issue by exposing MPX VMX controls to L1 guest only when vCPU supports MPX. Fixes: 36be0b9deb23 ("KVM: x86: Add nested virtualization support for MPX") Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01x86/build: Remove unused CONFIG_AS_CRC32Masahiro Yamada
CONFIG_AS_CRC32 is not used anywhere. Its last user was removed by 0cb6c969ed9d ("net, lib: kill arch_fast_hash library bits") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/1538389443-28514-1-git-send-email-yamada.masahiro@socionext.com
2018-10-01x86/asm: Use CC_SET()/CC_OUT() in __cmpxchg_double()Uros Bizjak
Replace open-coded use of the SETcc instruction with CC_SET()/CC_OUT() in __cmpxchg_double(). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/CAFULd4YdvwwhXWHqqPsGk5+TLG71ozgSscTZNsqmrm+Jzg941w@mail.gmail.com
2018-09-29Merge branch 'x86-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Thomas writes: "A single fix for the AMD memory encryption boot code so it does not read random garbage instead of the cached encryption bit when a kexec kernel is allocated above the 32bit address limit." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Fix kexec booting failure in the SEV bit detection code
2018-09-28x86/intel_rdt: Use perf infrastructure for measurementsReinette Chatre
The success of a cache pseudo-locked region is measured using performance monitoring events that are programmed directly at the time the user requests a measurement. Modifying the performance event registers directly is not appropriate since it circumvents the in-kernel perf infrastructure that exists to manage these resources and provide resource arbitration to the performance monitoring hardware. The cache pseudo-locking measurements are modified to use the in-kernel perf infrastructure. Performance events are created and validated with the appropriate perf API. The performance counters are still read as directly as possible to avoid the additional cache hits. This is done safely by first ensuring with the perf API that the counters have been programmed correctly and only accessing the counters in an interrupt disabled section where they are not able to be moved. As part of the transition to the in-kernel perf infrastructure the L2 and L3 measurements are split into two separate measurements that can be triggered independently. This separation prevents additional cache misses incurred during the extra testing code used to decide if a L2 and/or L3 measurement should be made. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/fc24e728b446404f42c78573c506e98cd0599873.1537468643.git.reinette.chatre@intel.com
2018-09-28x86/intel_rdt: Create required perf event attributesReinette Chatre
A perf event has many attributes that are maintained in a separate structure that should be provided when a new perf_event is created. In preparation for the transition to perf_events the required attribute structures are created for all the events that may be used in the measurements. Most attributes for all the events are identical. The actual configuration, what specifies what needs to be measured, is what will be different between the events used. This configuration needs to be done with X86_CONFIG that cannot be used as part of the designated initializers used here, this will be introduced later. Although they do look identical at this time the attribute structures needs to be maintained separately since a perf_event will maintain a pointer to its unique attributes. In support of patch testing the new structs are given the unused attribute until their use in later patches. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/1822f6164e221a497648d108913d056ab675d5d0.1537377064.git.reinette.chatre@intel.com
2018-09-28x86/intel_rdt: Remove local register variablesReinette Chatre
Local register variables were used in an effort to improve the accuracy of the measurement of cache residency of a pseudo-locked region. This was done to ensure that only the cache residency of the memory is measured and not the cache residency of the variables used to perform the measurement. While local register variables do accomplish the goal they do require significant care since different architectures have different registers available. Local register variables also cannot be used with valuable developer tools like KASAN. Significant testing has shown that similar accuracy in measurement results can be obtained by replacing local register variables with regular local variables. Make use of local variables in the critical code but do so with READ_ONCE() to prevent the compiler from merging or refetching reads. Ensure these variables are initialized before the measurement starts, and ensure it is only the local variables that are accessed during the measurement. With the removal of the local register variables and using READ_ONCE() there is no longer a motivation for using a direct wrmsr call (that avoids the additional tracing code that may clobber the local register variables). Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/f430f57347414e0691765d92b144758ab93d8407.1537377064.git.reinette.chatre@intel.com
2018-09-28perf/x86: Add helper to obtain performance counter indexReinette Chatre
perf_event_read_local() is the safest way to obtain measurements associated with performance events. In some cases the overhead introduced by perf_event_read_local() affects the measurements and the use of rdpmcl() is needed. rdpmcl() requires the index of the performance counter used so a helper is introduced to determine the index used by a provided performance event. The index used by a performance event may change when interrupts are enabled. A check is added to ensure that the index is only accessed with interrupts disabled. Even with this check the use of this counter needs to be done with care to ensure it is queried and used within the same disabled interrupts section. This change introduces a new checkpatch warning: CHECK: extern prototypes should be avoided in .h files +extern int x86_perf_rdpmc_index(struct perf_event *event); This warning was discussed and designated as a false positive in http://lkml.kernel.org/r/20180919091759.GZ24124@hirez.programming.kicks-ass.net Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/b277ffa78a51254f5414f7b1bc1923826874566e.1537377064.git.reinette.chatre@intel.com
2018-09-28x86: DT: use for_each_of_cpu_node iteratorRob Herring
Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This has the side effect of defaulting to iterating using "cpu" node names in preference to the deprecated (for FDT) device_type == "cpu". Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rob Herring <robh@kernel.org>
2018-09-28x86/fpu: Remove VLA usage of skcipherKees Cook
In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: x86@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-27x86/hyperv: Remove unused includeYueHaibing
Remove including <linux/version.h>. It's not needed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <devel@linuxdriverproject.org> Cc: <kernel-janitors@vger.kernel.org> Link: https://lkml.kernel.org/r/1537690822-97455-1-git-send-email-yuehaibing@huawei.com
2018-09-27x86/hyperv: Suppress "PCI: Fatal: No config space access function found"Dexuan Cui
A Generation-2 Linux VM on Hyper-V doesn't have the legacy PCI bus, and users always see the scary warning, which is actually harmless. Suppress it. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org> Cc: Olaf Aepfle <olaf@aepfle.de> Cc: Andy Whitcroft <apw@canonical.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Marcelo Cerri <marcelo.cerri@canonical.com> Cc: Josh Poulson <jopoulso@microsoft.com> Link: https://lkml.kernel.org/r/ <KU1P153MB0166D977DC930996C4BF538ABF1D0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM
2018-09-27x86/mm/cpa: Optimize __cpa_flush_range()Peter Zijlstra
If we IPI for WBINDV, then we might as well kill the entire TLB too. But if we don't have to invalidate cache, there is no reason not to use a range TLB flush. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.195633798@infradead.org
2018-09-27x86/mm/cpa: Factor common code between cpa_flush_*()Peter Zijlstra
The start of cpa_flush_range() and cpa_flush_array() is the same, use a common function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.138859183@infradead.org
2018-09-27x86/mm/cpa: Move CLFLUSH test into cpa_flush_array()Peter Zijlstra
Rather than guarding cpa_flush_array() users with a CLFLUSH test, put it inside. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.087848187@infradead.org
2018-09-27x86/mm/cpa: Move CLFLUSH test into cpa_flush_range()Peter Zijlstra
Rather than guarding all cpa_flush_range() uses with a CLFLUSH test, put it inside. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.036195503@infradead.org
2018-09-27x86/mm/cpa: Use flush_tlb_kernel_range()Peter Zijlstra
Both cpa_flush_range() and cpa_flush_array() have a well specified range, use that to do a range based TLB invalidate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.985193217@infradead.org
2018-09-27x86/mm/cpa: Unconditionally avoid WBINDV when we canPeter Zijlstra
CAT has happened, WBINDV is bad (even before CAT blowing away the entire cache on a multi-core platform wasn't nice), try not to use it ever. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.933674526@infradead.org
2018-09-27x86/mm/cpa: Move flush_tlb_all()Peter Zijlstra
There is an atom errata, where we do a local TLB invalidate right before we return and then do a global TLB invalidate. Move the global invalidate up a little bit and avoid the local invalidate entirely. This does put the global invalidate under pgd_lock, but that shouldn't matter. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.882287392@infradead.org
2018-09-27x86/mm/cpa: Use flush_tlb_all()Peter Zijlstra
Instead of open-coding it.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.831102058@infradead.org
2018-09-27x86/mm/cpa: Avoid the 4k pages check completelyThomas Gleixner
The extra loop which tries hard to preserve large pages in case of conflicts with static protection regions turns out to be not preserving anything, at least not in the experiments which have been conducted. There might be corner cases in which the code would be able to preserve a large page oaccsionally, but it's really not worth the extra code and the cycles wasted in the common case. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 541 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 514 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.589642503@linutronix.de
2018-09-27x86/mm/cpa: Do the range check earlyThomas Gleixner
To avoid excessive 4k wise checks in the common case do a quick check first whether the requested new page protections conflict with a static protection area in the large page. If there is no conflict then the decision whether to preserve or to split the page can be made immediately. If the requested range covers the full large page, preserve it. Otherwise split it up. No point in doing a slow crawl in 4k steps. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 560642 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 541 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 514 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.507259989@linutronix.de
2018-09-27x86/mm/cpa: Optimize same protection checkThomas Gleixner
When the existing mapping is correct and the new requested page protections are the same as the existing ones, then further checks can be omitted and the large page can be preserved. The slow path 4k wise check will not come up with a different result. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800709 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 560642 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.424477581@linutronix.de
2018-09-27x86/mm/cpa: Add sanity check for existing mappingsThomas Gleixner
With the range check it is possible to do a quick verification that the current mapping is correct vs. the static protection areas. In case a incorrect mapping is detected a warning is emitted and the large page is split up. If the large page is a 2M page, then the split code is forced to check the static protections for the PTE entries to fix up the incorrectness. For 1G pages this can't be done easily because that would require to either find the offending 2M areas before the split or afterwards. For now just warn about that case and revisit it when reported. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.331408643@linutronix.de
2018-09-27x86/mm/cpa: Avoid static protection checks on unmapThomas Gleixner
If the new pgprot has the PRESENT bit cleared, then conflicts vs. RW/NX are completely irrelevant. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800770 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800709 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.245849757@linutronix.de
2018-09-27x86/mm/cpa: Add large page preservation statisticsThomas Gleixner
The large page preservation mechanism is just magic and provides no information at all. Add optional statistic output in debugfs so the magic can be evaluated. Defaults is off. Output: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800770 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.160867778@linutronix.de
2018-09-27x86/mm/cpa: Add debug mechanismThomas Gleixner
The whole static protection magic is silently fixing up anything which is handed in. That's just wrong. The offending call sites need to be fixed. Add a debug mechanism which emits a warning if a requested mapping needs to be fixed up. The DETECT debug mechanism is really not meant to be enabled except for developers, so limit the output hard to the protection fixups. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.078998733@linutronix.de
2018-09-27x86/mm/cpa: Allow range check for static protectionsThomas Gleixner
Checking static protections only page by page is slow especially for huge pages. To allow quick checks over a complete range, add the ability to do that. Make the checks inclusive so the ranges can be directly used for debug output later. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.995734490@linutronix.de
2018-09-27x86/mm/cpa: Rework static_protections()Thomas Gleixner
static_protections() is pretty unreadable. Split it up into separate checks for each protection area. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.913005317@linutronix.de
2018-09-27x86/mm/cpa: Split, rename and clean up try_preserve_large_page()Thomas Gleixner
Avoid the extra variable and gotos by splitting the function into the actual algorithm and a callable function which contains the lock protection. Rename it to should_split_large_page() while at it so the return values make actually sense. Clean up the code flow, comments and general whitespace damage while at it. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.830507216@linutronix.de
2018-09-27x86/mm/init32: Mark text and rodata RO in one goThomas Gleixner
The sequence of marking text and rodata read-only in 32bit init is: set_ro(text); kernel_set_to_readonly = 1; set_ro(rodata); When kernel_set_to_readonly is 1 it enables the protection mechanism in CPA for the read only regions. With the upcoming checks for existing mappings this consequently triggers the warning about an existing mapping being incorrect vs. static protections because rodata has not been converted yet. There is no technical reason to split the two, so just combine the RO protection to convert text and rodata in one go. Convert the printks to pr_info while at it. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.731701535@linutronix.de
2018-09-27x86/boot: Fix kexec booting failure in the SEV bit detection codeKairui Song
Commit 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active") can occasionally cause system resets when kexec-ing a second kernel even if SEV is not active. That's because get_sev_encryption_bit() uses 32-bit rIP-relative addressing to read the value of enc_bit - a variable which caches a previously detected encryption bit position - but kexec may allocate the early boot code to a higher location, beyond the 32-bit addressing limit. In this case, garbage will be read and get_sev_encryption_bit() will return the wrong value, leading to accessing memory with the wrong encryption setting. Therefore, remove enc_bit, and thus get rid of the need to do 32-bit rIP-relative addressing in the first place. [ bp: massage commit message heavily. ] Fixes: 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active") Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-kernel@vger.kernel.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: brijesh.singh@amd.com Cc: kexec@lists.infradead.org Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: ghook@redhat.com Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
2018-09-27x86/xen: Add Hygon Dhyana support to XenPu Wen
To make Xen work on the Hygon platform, reuse AMD's Xen support code path for Hygon Dhyana CPU. There are six core performance events counters per thread, so there are six MSRs for these counters. Also there are four legacy PMC MSRs, they are aliases of the counters. In this version, use the legacy and safe version of MSR access. Tested successfully with VPMU enabled in Xen on Hygon platform by testing with perf. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: jgross@suse.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/311bf41f08f24550aa6c5da3f1e03a68d3b89dac.1537533369.git.puwen@hygon.cn
2018-09-27x86/kvm: Add Hygon Dhyana support to KVMPu Wen
The Hygon Dhyana CPU has the SVM feature as AMD family 17h does. So enable the KVM infrastructure support to it. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/654dd12876149fba9561698eaf9fc15d030301f8.1537533369.git.puwen@hygon.cn
2018-09-27x86/mce: Add Hygon Dhyana support to the MCA infrastructurePu Wen
The machine check architecture for Hygon Dhyana CPU is similar to the AMD family 17h one. Add vendor checking for Hygon Dhyana to share the code path of AMD family 17h. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: tony.luck@intel.com Cc: thomas.lendacky@amd.com Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/87d8a4f16bdea0bfe0c0cf2e4a8d2c2a99b1055c.1537533369.git.puwen@hygon.cn
2018-09-27x86/bugs: Add Hygon Dhyana to the respective mitigation machineryPu Wen
The Hygon Dhyana CPU has the same speculative execution as AMD family 17h, so share AMD spectre mitigation code with Hygon Dhyana. Also Hygon Dhyana is not affected by meltdown, so add exception for it. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/0861d39c8a103fc0deca15bafbc85d403666d9ef.1537533369.git.puwen@hygon.cn
2018-09-27x86/apic: Add Hygon Dhyana supportPu Wen
Add Hygon Dhyana support to the APIC subsystem. When running in 32 bit mode, bigsmp should be enabled if there are more than 8 cores online. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/7a557265a8c7c9e842fe60f9d8e064458801aef3.1537533369.git.puwen@hygon.cn
2018-09-27x86/pci, x86/amd_nb: Add Hygon Dhyana support to PCI and northbridgePu Wen
Hygon's PCI vendor ID is 0x1d94, and there are PCI devices 0x1450/0x1463/0x1464 for the host bridge on the Hygon Dhyana platform. Add Hygon Dhyana support to the PCI and northbridge subsystems by using the code path of AMD family 17h. [ bp: Massage commit message, sort local vars into reverse xmas tree order and move the amd_northbridges.num check up. ] Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: helgaas@kernel.org Cc: linux-pci@vger.kernel.org Link: https://lkml.kernel.org/r/5f8877bd413f2ea0833378dd5454df0720e1c0df.1537885177.git.puwen@hygon.cn
2018-09-27x86/amd_nb: Check vendor in AMD-only functionsPu Wen
Exit early in functions which are meant to run on AMD only but which get run on different vendor (VMs, etc). [ bp: rewrite commit message. ] Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhelgaas@google.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: helgaas@kernel.org Link: https://lkml.kernel.org/r/487d8078708baedaf63eb00a82251e228b58f1c2.1537885177.git.puwen@hygon.cn
2018-09-27x86/alternative: Init ideal_nops for Hygon DhyanaPu Wen
The ideal_nops for Hygon Dhyana CPU should be p6_nops. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/79e76c3173716984fe5fdd4a8e2c798bf4193205.1537533369.git.puwen@hygon.cn
2018-09-27x86/events: Add Hygon Dhyana support to PMU infrastructurePu Wen
The PMU architecture for the Hygon Dhyana CPU is similar to the AMD Family 17h one. To support it, call amd_pmu_init() to share the AMD PMU initialization flow, and change the PMU name to "HYGON". The Hygon Dhyana CPU supports both legacy and extension PMC MSRs (perf counter registers and event selection registers), so add Hygon Dhyana support in the similar way as AMD does. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/9d93ed54a975f33ef7247e0967960f4ce5d3d990.1537533369.git.puwen@hygon.cn