summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2019-06-22x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASEAndy Lutomirski
This is temporary. It will allow the next few patches to be tested incrementally. Setting unsafe_fsgsbase is a root hole. Don't do it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com
2019-06-22x86/ptrace: Prevent ptrace from clearing the FS/GS selectorChang S. Bae
When a ptracer writes a ptracee's FS/GSBASE with a different value, the selector is also cleared. This behavior is not correct as the selector should be preserved. Update only the base value and leave the selector intact. To simplify the code further remove the conditional checking for the same value as this code is not performance critical. The only recognizable downside of this change is when the selector is already nonzero on write. The base will be reloaded according to the selector. But the case is highly unexpected in real usages. [ tglx: Massage changelog ] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/9040CFCD-74BD-4C17-9A01-B9B713CF6B10@intel.com
2019-06-20x86/resctrl: Prevent possible overrun during bitmap operationsReinette Chatre
While the DOC at the beginning of lib/bitmap.c explicitly states that "The number of valid bits in a given bitmap does _not_ need to be an exact multiple of BITS_PER_LONG.", some of the bitmap operations do indeed access BITS_PER_LONG portions of the provided bitmap no matter the size of the provided bitmap. For example, if find_first_bit() is provided with an 8 bit bitmap the operation will access BITS_PER_LONG bits from the provided bitmap. While the operation ensures that these extra bits do not affect the result, the memory is still accessed. The capacity bitmasks (CBMs) are typically stored in u32 since they can never exceed 32 bits. A few instances exist where a bitmap_* operation is performed on a CBM by simply pointing the bitmap operation to the stored u32 value. The consequence of this pattern is that some bitmap_* operations will access out-of-bounds memory when interacting with the provided CBM. This same issue has previously been addressed with commit 49e00eee0061 ("x86/intel_rdt: Fix out-of-bounds memory access in CBM tests") but at that time not all instances of the issue were fixed. Fix this by using an unsigned long to store the capacity bitmask data that is passed to bitmap functions. Fixes: e651901187ab ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details") Fixes: f4e80d67a527 ("x86/intel_rdt: Resctrl files reflect pseudo-locked information") Fixes: 95f0b77efa57 ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/58c9b6081fd9bf599af0dfc01a6fdd335768efef.1560975645.git.reinette.chatre@intel.com
2019-06-20x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructionsFenghua Yu
AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point format (BF16) for deep learning optimization. BF16 is a short version of 32-bit single-precision floating-point format (FP32) and has several advantages over 16-bit half-precision floating-point format (FP16). BF16 keeps FP32 accumulation after multiplication without loss of precision, offers more than enough range for deep learning training tasks, and doesn't need to handle hardware exception. AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5] AVX512_BF16. CPUID.7.1:EAX contains only feature bits. Reuse the currently empty word 12 as a pure features word to hold the feature bits including AVX512_BF16. Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions can be found in the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference. [ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <namit@vmware.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Feiner <pfeiner@google.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: Robert Hoo <robert.hu@linux.intel.com> Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: x86 <x86@kernel.org> Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua.yu@intel.com
2019-06-20x86/cpufeatures: Combine word 11 and 12 into a new scattered features wordFenghua Yu
It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two whole feature bits words. To better utilize feature words, re-define word 11 to host scattered features and move the four X86_FEATURE_CQM_* features into Linux defined word 11. More scattered features can be added in word 11 in the future. Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a Linux-defined leaf. Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful name in the next patch when CPUID.7.1:EAX occupies world 12. Maximum number of RMID and cache occupancy scale are retrieved from CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the code into a separate function. KVM doesn't support resctrl now. So it's safe to move the X86_FEATURE_CQM_* features to scattered features word 11 for KVM. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lewis <aaronlewis@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Babu Moger <babu.moger@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: kvm ML <kvm@vger.kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: x86 <x86@kernel.org> Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
2019-06-20x86/cpufeatures: Carve out CQM features retrievalBorislav Petkov
... into a separate function for better readability. Split out from a patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical, sole code movement separate for easy review. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: x86@kernel.org
2019-06-20x86/kexec: Set the C-bit in the identity map page table when SEV is activeLianbo Jiang
When SEV is active, the second kernel image is loaded into encrypted memory. For that, make sure that when kexec builds the identity mapping page table, the memory is encrypted (i.e., _PAGE_ENC is set). [ bp: Sort local args and OR in _PAGE_ENC for more clarity. ] Co-developed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: bhe@redhat.com Cc: dyoung@redhat.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kexec@lists.infradead.org Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190430074421.7852-3-lijiang@redhat.com
2019-06-20x86/kexec: Do not map kexec area as decrypted when SEV is activeLianbo Jiang
When a virtual machine panics, its memory needs to be dumped for analysis. With memory encryption in the picture, special care must be taken when loading a kexec/kdump kernel in a SEV guest. A SEV guest starts and runs fully encrypted. In order to load a kexec kernel and initrd, arch_kexec_post_{alloc,free}_pages() need to not map areas as decrypted unconditionally but differentiate whether the kernel is running as a SEV guest and if so, leave kexec area encrypted. [ bp: Reduce commit message to the relevant information pertaining to this commit only. ] Co-developed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: bhe@redhat.com Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: dyoung@redhat.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kexec@lists.infradead.org Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190430074421.7852-2-lijiang@redhat.com
2019-06-20x86/crash: Add e820 reserved ranges to kdump kernel's e820 tableLianbo Jiang
At present, when using the kexec_file_load() syscall to load the kernel image and initramfs, for example: kexec -s -p xxx the kernel does not pass the e820 reserved ranges to the second kernel, which might cause two problems: 1. MMCONFIG: A device in PCI segment 1 cannot be discovered by the kernel PCI probing without all the e820 I/O reservations being present in the e820 table. Which is the case currently, because the kdump kernel does not have those reservations because the kexec command does not pass the I/O reservation via the "memmap=xxx" command line option. Further details courtesy of Bjorn Helgaas¹: I think you should regard correct MCFG/ECAM usage in the kdump kernel as a requirement. MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you don't have ECAM: (a) PCI devices won't work at all on non-x86 systems that use only ECAM for config access, (b) you won't be able to access devices on non-0 segments (granted, there aren't very many of these yet, but there will be more in the future), and (c) you won't be able to access extended config space (addresses 0x100-0xfff), which means none of the Extended Capabilities will be available (AER, ACS, ATS, etc). 2. The second issue is that the SME kdump kernel doesn't work without the e820 reserved ranges. When SME is active in the kdump kernel, those reserved regions are still decrypted, but because those reserved ranges are not present at all in kdump kernel's e820 table, they are accessed as encrypted. Which is obviously wrong. [1]: https://lkml.kernel.org/r/CABhMZUUscS3jUZUSM5Y6EYJK6weo7Mjj5-EAKGvbw0qEe%2B38zw@mail.gmail.com [ bp: Heavily massage commit message. ] Suggested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Baoquan He <bhe@redhat.com> Cc: Bjorn Helgaas <bjorn.helgaas@gmail.com> Cc: dave.hansen@linux.intel.com Cc: Dave Young <dyoung@redhat.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kexec@lists.infradead.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Cc: Yi Wang <wang.yi59@zte.com.cn> Link: https://lkml.kernel.org/r/20190423013007.17838-4-lijiang@redhat.com
2019-06-20x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVEDLianbo Jiang
When executing the kexec_file_load() syscall, the first kernel needs to pass the e820 reserved ranges to the second kernel because some devices (PCI, for example) need them present in the kdump kernel for proper initialization. But the kernel can not exactly match the e820 reserved ranges when walking through the iomem resources using the default IORES_DESC_NONE descriptor, because there are several types of e820 ranges which are marked IORES_DESC_NONE, see e820_type_to_iores_desc(). Therefore, add a new I/O resource descriptor called IORES_DESC_RESERVED to mark exactly those ranges. It will be used to match the reserved resource ranges when walking through iomem resources. [ bp: Massage commit message. ] Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: bhe@redhat.com Cc: dave.hansen@linux.intel.com Cc: dyoung@redhat.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huang Zijiang <huang.zijiang@zte.com.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Juergen Gross <jgross@suse.com> Cc: kexec@lists.infradead.org Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190423013007.17838-2-lijiang@redhat.com
2019-06-20x86/mm: Create a workarea in the kernel for SME early encryptionThomas Lendacky
In order for the kernel to be encrypted "in place" during boot, a workarea outside of the kernel must be used. This SME workarea used during early encryption of the kernel is situated on a 2MB boundary after the end of the kernel text, data, etc. sections (_end). This works well during initial boot of a compressed kernel because of the relocation used for decompression of the kernel. But when performing a kexec boot, there's a chance that the SME workarea may not be mapped by the kexec pagetables or that some of the other data used by kexec could exist in this range. Create a section for SME in vmlinux.lds.S. Position it after "_end", which is after "__end_of_kernel_reserve", so that the memory will be reclaimed during boot and since this area is all zeroes, it compresses well. This new section will be part of the kernel image, so kexec will account for it in pagetable mappings and placement of data after the kernel. Here's an example of a kernel size without and with the SME section: without: vmlinux: 36,501,616 bzImage: 6,497,344 100000000-47f37ffff : System RAM 1e4000000-1e47677d4 : Kernel code (0x7677d4) 1e47677d5-1e4e2e0bf : Kernel data (0x6c68ea) 1e5074000-1e5372fff : Kernel bss (0x2fefff) with: vmlinux: 44,419,408 bzImage: 6,503,136 880000000-c7ff7ffff : System RAM 8cf000000-8cf7677d4 : Kernel code (0x7677d4) 8cf7677d5-8cfe2e0bf : Kernel data (0x6c68ea) 8d0074000-8d0372fff : Kernel bss (0x2fefff) Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Lianbo Jiang <lijiang@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael Ávila de Espíndola" <rafael@espindo.la> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/3c483262eb4077b1654b2052bd14a8d011bffde3.1560969363.git.thomas.lendacky@amd.com
2019-06-20x86/mm: Identify the end of the kernel area to be reservedThomas Lendacky
The memory occupied by the kernel is reserved using memblock_reserve() in setup_arch(). Currently, the area is from symbols _text to __bss_stop. Everything after __bss_stop must be specifically reserved otherwise it is discarded. This is not clearly documented. Add a new symbol, __end_of_kernel_reserve, that more readily identifies what is reserved, along with comments that indicate what is reserved, what is discarded and what needs to be done to prevent a section from being discarded. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Lianbo Jiang <lijiang@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sinan Kaya <okaya@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/7db7da45b435f8477f25e66f292631ff766a844c.1560969363.git.thomas.lendacky@amd.com
2019-06-19x86/cacheinfo: Fix a -Wtype-limits warningQian Cai
cpuinfo_x86.x86_model is an unsigned type, so comparing against zero will generate a compilation warning: arch/x86/kernel/cpu/cacheinfo.c: In function 'cacheinfo_amd_init_llc_id': arch/x86/kernel/cpu/cacheinfo.c:662:19: warning: comparison is always true \ due to limited range of data type [-Wtype-limits] Remove the unnecessary lower bound check. [ bp: Massage. ] Fixes: 68091ee7ac3c ("x86/CPU/AMD: Calculate last level cache ID from number of sharing threads") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1560954773-11967-1-git-send-email-cai@lca.pw
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 477Thomas Gleixner
Based on 1 normalized pattern(s): subject to gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081204.018005938@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 243Thomas Gleixner
Based on 1 normalized pattern(s): this file is licensed under the gpl v2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 3 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204654.634736654@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230Thomas Gleixner
Based on 2 normalized pattern(s): this source code is licensed under the gnu general public license version 2 see the file copying for more details this source code is licensed under general public license version 2 see extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 52 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.449021192@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19x86/microcode: Fix the microcode load on CPU hotplug for realThomas Gleixner
A recent change moved the microcode loader hotplug callback into the early startup phase which is running with interrupts disabled. It missed that the callbacks invoke sysfs functions which might sleep causing nice 'might sleep' splats with proper debugging enabled. Split the callbacks and only load the microcode in the early startup phase and move the sysfs handling back into the later threaded and preemptible bringup phase where it was before. Fixes: 78f4e932f776 ("x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906182228350.1766@nanos.tec.linutronix.de
2019-06-17x86/fpu: Remove the fpu__save() exportChristoph Hellwig
This function is only use by the core FPU code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190604071524.12835-4-hch@lst.de
2019-06-17x86/fpu: Simplify kernel_fpu_begin()Christoph Hellwig
Merge two helpers into the main function, remove a pointless local variable and flatten a conditional. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190604071524.12835-3-hch@lst.de
2019-06-17Merge tag 'v5.2-rc5' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/jump_label: Batch jump label updatesDaniel Bristot de Oliveira
Currently, the jump label of a static key is transformed via the arch specific function: void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type) The new approach (batch mode) uses two arch functions, the first has the same arguments of the arch_jump_label_transform(), and is the function: bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_type type) Rather than transforming the code, it adds the jump_entry in a queue of entries to be updated. This functions returns true in the case of a successful enqueue of an entry. If it returns false, the caller must to apply the queue and then try to queue again, for instance, because the queue is full. This function expects the caller to sort the entries by the address before enqueueuing then. This is already done by the arch independent code, though. After queuing all jump_entries, the function: void arch_jump_label_transform_apply(void) Applies the changes in the queue. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/57b4caa654bad7e3b066301c9a9ae233dea065b5.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/alternative: Batch of patch operationsDaniel Bristot de Oliveira
Currently, the patch of an address is done in three steps: -- Pseudo-code #1 - Current implementation --- 1) add an int3 trap to the address that will be patched sync cores (send IPI to all other CPUs) 2) update all but the first byte of the patched range sync cores (send IPI to all other CPUs) 3) replace the first byte (int3) by the first byte of replacing opcode sync cores (send IPI to all other CPUs) -- Pseudo-code #1 --- When a static key has more than one entry, these steps are called once for each entry. The number of IPIs then is linear with regard to the number 'n' of entries of a key: O(n*3), which is O(n). This algorithm works fine for the update of a single key. But we think it is possible to optimize the case in which a static key has more than one entry. For instance, the sched_schedstats jump label has 56 entries in my (updated) fedora kernel, resulting in 168 IPIs for each CPU in which the thread that is enabling the key is _not_ running. With this patch, rather than receiving a single patch to be processed, a vector of patches is passed, enabling the rewrite of the pseudo-code #1 in this way: -- Pseudo-code #2 - This patch --- 1) for each patch in the vector: add an int3 trap to the address that will be patched sync cores (send IPI to all other CPUs) 2) for each patch in the vector: update all but the first byte of the patched range sync cores (send IPI to all other CPUs) 3) for each patch in the vector: replace the first byte (int3) by the first byte of replacing opcode sync cores (send IPI to all other CPUs) -- Pseudo-code #2 - This patch --- Doing the update in this way, the number of IPI becomes O(3) with regard to the number of keys, which is O(1). The batch mode is done with the function text_poke_bp_batch(), that receives two arguments: a vector of "struct text_to_poke", and the number of entries in the vector. The vector must be sorted by the addr field of the text_to_poke structure, enabling the binary search of a handler in the poke_int3_handler function (a fast path). Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/ca506ed52584c80f64de23f6f55ca288e5d079de.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/jump_label: Add a __jump_label_set_jump_code() helperDaniel Bristot de Oliveira
Move the definition of the code to be written from __jump_label_transform() to a specialized function. No functional change. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/d2f52a0010ecd399cf9b02a65bcf5836571b9e52.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17Merge tag 'v5.2-rc5' into x86/asm, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/fpu: Simplify kernel_fpu_end()Christoph Hellwig
Remove two little helpers and merge them into kernel_fpu_end() to streamline the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190604071524.12835-2-hch@lst.de
2019-06-16x86/apic: Make apic_bsp_setup() staticThomas Gleixner
No user outside of apic.c. Remove the stale and bogus function comment while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-06-16Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "The accumulated fixes from this and last week: - Fix vmalloc TLB flush and map range calculations which lead to stale TLBs, spurious faults and other hard to diagnose issues. - Use fault_in_pages_writable() for prefaulting the user stack in the FPU code as it's less fragile than the current solution - Use the PF_KTHREAD flag when checking for a kernel thread instead of current->mm as the latter can give the wrong answer due to use_mm() - Compute the vmemmap size correctly for KASLR and 5-Level paging. Otherwise this can end up with a way too small vmemmap area. - Make KASAN and 5-level paging work again by making sure that all invalid bits are masked out when computing the P4D offset. This worked before but got broken recently when the LDT remap area was moved. - Prevent a NULL pointer dereference in the resource control code which can be triggered with certain mount options when the requested resource is not available. - Enforce ordering of microcode loading vs. perf initialization on secondary CPUs. Otherwise perf tries to access a non-existing MSR as the boot CPU marked it as available. - Don't stop the resource control group walk early otherwise the control bitmaps are not updated correctly and become inconsistent. - Unbreak kgdb by returning 0 on success from kgdb_arch_set_breakpoint() instead of an error code. - Add more Icelake CPU model defines so depending changes can be queued in other trees" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback x86/kasan: Fix boot with 5-level paging and KASAN x86/fpu: Don't use current->mm to check for a kthread x86/kgdb: Return 0 from kgdb_arch_set_breakpoint() x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled x86/resctrl: Don't stop walking closids when a locksetup group is found x86/fpu: Update kernel's FPU state before using for the fsave header x86/mm/KASLR: Compute the size of the vmemmap section properly x86/fpu: Use fault_in_pages_writeable() for pre-faulting x86/CPU: Add more Icelake model numbers mm/vmalloc: Avoid rare case of flushing TLB with weird arguments mm/vmalloc: Fix calculation of direct map addr range
2019-06-15x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callbackBorislav Petkov
Adric Blake reported the following warning during suspend-resume: Enabling non-boot CPUs ... x86: Booting SMP configuration: smpboot: Booting Node 0 Processor 1 APIC 0x2 unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \ at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20) Call Trace: intel_set_tfa intel_pmu_cpu_starting ? x86_pmu_dead_cpu x86_pmu_starting_cpu cpuhp_invoke_callback ? _raw_spin_lock_irqsave notify_cpu_starting start_secondary secondary_startup_64 microcode: sig=0x806ea, pf=0x80, revision=0x96 microcode: updated to revision 0xb4, date = 2019-04-01 CPU1 is up The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated by microcode. The log above shows that the microcode loader callback happens after the PMU restoration, leading to the conjecture that because the microcode hasn't been updated yet, that MSR is not present yet, leading to the #GP. Add a microcode loader-specific hotplug vector which comes before the PERF vectors and thus executes earlier and makes sure the MSR is present. Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort") Reported-by: Adric Blake <promarbler14@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: x86@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
2019-06-14Merge tag 'v5.2-rc4' into mauroJonathan Corbet
We need to pick up post-rc1 changes to various document files so they don't get lost in Mauro's massive RST conversion push.
2019-06-14x86/amd_nb: Make hygon_nb_misc_ids staticYueHaibing
Fix the following sparse warning: arch/x86/kernel/amd_nb.c:74:28: warning: symbol 'hygon_nb_misc_ids' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Brian Woods <Brian.Woods@amd.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Pu Wen <puwen@hygon.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190614155441.22076-1-yuehaibing@huawei.com
2019-06-14x86/tsc: Move inline keyword to the beginning of function declarationsMathieu Malaterre
The inline keyword was not at the beginning of the function declarations. Fix the following warnings triggered when using W=1: arch/x86/kernel/tsc.c:62:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] arch/x86/kernel/tsc.c:79:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Cc: kernel-janitors@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20190524103252.28575-1-malat@debian.org
2019-06-14x86/mce: Do not check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. The only way this can fail is if: * debugfs superblock can not be pinned - something really went wrong with the vfs layer. * file is created with same name - the caller's fault. * new_inode() fails - happens if memory is exhausted. so failing to clean up debugfs properly is the least of the system's sproblems in uch a situation. [ bp: Extend commit message, remove unused err var in inject_init(). ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190612151531.GA16278@kroah.com
2019-06-13x86/fpu: Don't use current->mm to check for a kthreadChristoph Hellwig
current->mm can be non-NULL if a kthread calls use_mm(). Check for PF_KTHREAD instead to decide when to store user mode FP state. Fixes: 2722146eb784 ("x86/fpu: Remove fpu->initialized") Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Aubrey Li <aubrey.li@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190604175411.GA27477@lst.de
2019-06-12x86/kgdb: Return 0 from kgdb_arch_set_breakpoint()Matt Mullins
err must be nonzero in order to reach text_poke(), which caused kgdb to fail to set breakpoints: (gdb) break __x64_sys_sync Breakpoint 1 at 0xffffffff81288910: file ../fs/sync.c, line 124. (gdb) c Continuing. Warning: Cannot insert breakpoint 1. Cannot access memory at address 0xffffffff81288910 Command aborted. Fixes: 86a22057127d ("x86/kgdb: Avoid redundant comparison of patched code") Signed-off-by: Matt Mullins <mmullins@fb.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nadav Amit <namit@vmware.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190531194755.6320-1-mmullins@fb.com
2019-06-12x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_statusAubrey Li
AVX-512 components usage can result in turbo frequency drop. So it's useful to expose AVX-512 usage elapsed time as a heuristic hint for user space job schedulers to cluster the AVX-512 using tasks together. Examples: $ while [ 1 ]; do cat /proc/tid/arch_status | grep AVX512; sleep 1; done AVX512_elapsed_ms: 4 AVX512_elapsed_ms: 8 AVX512_elapsed_ms: 4 This means that 4 milliseconds have elapsed since the tsks AVX512 usage was detected when the task was scheduled out. $ cat /proc/tid/arch_status | grep AVX512 AVX512_elapsed_ms: -1 '-1' indicates that no AVX512 usage was recorded for this task. The time exposed is not necessarily accurate when the arch_status file is read as the AVX512 usage is only evaluated when a task is scheduled out. Accurate usage information can be obtained with performance counters. [ tglx: Massaged changelog ] Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: peterz@infradead.org Cc: hpa@zytor.com Cc: ak@linux.intel.com Cc: tim.c.chen@linux.intel.com Cc: dave.hansen@intel.com Cc: arjan@linux.intel.com Cc: adobriyan@gmail.com Cc: aubrey.li@intel.com Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linux API <linux-api@vger.kernel.org> Link: https://lkml.kernel.org/r/20190606012236.9391-2-aubrey.li@linux.intel.com
2019-06-12x86/resctrl: Prevent NULL pointer dereference when local MBM is disabledPrarit Bhargava
Booting with kernel parameter "rdt=cmt,mbmtotal,memlocal,l3cat,mba" and executing "mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl" results in a NULL pointer dereference on systems which do not have local MBM support enabled.. BUG: kernel NULL pointer dereference, address: 0000000000000020 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 722 Comm: kworker/0:3 Not tainted 5.2.0-0.rc3.git0.1.el7_UNSUPPORTED.x86_64 #2 Workqueue: events mbm_handle_overflow RIP: 0010:mbm_handle_overflow+0x150/0x2b0 Only enter the bandwith update loop if the system has local MBM enabled. Fixes: de73f38f7680 ("x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth") Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190610171544.13474-1-prarit@redhat.com
2019-06-12x86/resctrl: Don't stop walking closids when a locksetup group is foundJames Morse
When a new control group is created __init_one_rdt_domain() walks all the other closids to calculate the sets of used and unused bits. If it discovers a pseudo_locksetup group, it breaks out of the loop. This means any later closid doesn't get its used bits added to used_b. These bits will then get set in unused_b, and added to the new control group's configuration, even if they were marked as exclusive for a later closid. When encountering a pseudo_locksetup group, we should continue. This is because "a resource group enters 'pseudo-locked' mode after the schemata is written while the resource group is in 'pseudo-locksetup' mode." When we find a pseudo_locksetup group, its configuration is expected to be overwritten, we can skip it. Fixes: dfe9674b04ff6 ("x86/intel_rdt: Enable entering of pseudo-locksetup mode") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H Peter Avin <hpa@zytor.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20190603172531.178830-1-james.morse@arm.com
2019-06-11x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vectorZhao Yakui
Use the HYPERVISOR_CALLBACK_VECTOR to notify an ACRN guest. Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com> Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1559108037-18813-4-git-send-email-yakui.zhao@intel.com
2019-06-11x86: Add support for Linux guests on an ACRN hypervisorZhao Yakui
ACRN is an open-source hypervisor maintained by The Linux Foundation. It is built for embedded IOT with small footprint and real-time features. Add ACRN guest support so that it allows Linux to be booted under the ACRN hypervisor. This adds only the barebones implementation. [ bp: Massage commit message and help text. ] Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com> Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1559108037-18813-3-git-send-email-yakui.zhao@intel.com
2019-06-11x86/Kconfig: Add new X86_HV_CALLBACK_VECTOR config symbolZhao Yakui
Add a special Kconfig symbol X86_HV_CALLBACK_VECTOR so that the guests using the hypervisor interrupt callback counter can select and thus enable that counter. Select it when xen or hyperv support is enabled. No functional changes. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: linux-hyperv@vger.kernel.org Cc: Nicolai Stange <nstange@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sashal@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/1559108037-18813-2-git-send-email-yakui.zhao@intel.com
2019-06-11x86/MCE: Determine MCA banks' init state properlyYazen Ghannam
The OS is expected to write all bits to MCA_CTL for each bank, thus enabling error reporting in all banks. However, some banks may be unused in which case the registers for such banks are Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control bits because of quirks, etc. A bank can be considered uninitialized if the MCA_CTL register returns zero. This is because either the OS did not write anything or because the hardware is enforcing RAZ/WI for the bank. Set a bank's init value based on if the control bits are set or not in hardware. Return an error code in the sysfs interface for uninitialized banks. Do a final bank init check in a separate function which is not part of any user-controlled code flows. This is so a user may enable/disable a bank during runtime without having to restart their system. [ bp: Massage a bit. Discover bank init state at boot. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-6-Yazen.Ghannam@amd.com
2019-06-11x86/MCE: Make the number of MCA banks a per-CPU variableYazen Ghannam
The number of MCA banks is provided per logical CPU. Historically, this number has been the same across all CPUs, but this is not an architectural guarantee. Future AMD systems may have MCA bank counts that vary between logical CPUs in a system. This issue was partially addressed in 006c077041dc ("x86/mce: Handle varying MCA bank counts") by allocating structures using the maximum number of MCA banks and by saving the maximum MCA bank count in a system as the global count. This means that some extra structures are allocated. Also, this means that CPUs will spend more time in the #MC and other handlers checking extra MCA banks. Thus, define the number of MCA banks as a per-CPU variable. [ bp: Make mce_num_banks an unsigned int. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-5-Yazen.Ghannam@amd.com
2019-06-11x86/MCE/AMD: Don't cache block addresses on SMCA systemsYazen Ghannam
On legacy systems, the addresses of the MCA_MISC* registers need to be recursively discovered based on a Block Pointer field in the registers. On Scalable MCA systems, the register space is fixed, and particular addresses can be derived by regular offsets for bank and register type. This fixed address space includes the MCA_MISC* registers. MCA_MISC0 is always available for each MCA bank. MCA_MISC1 through MCA_MISC4 are considered available if MCA_MISC0[BlkPtr]=1. Cache the value of MCA_MISC0[BlkPtr] for each bank and per CPU. This needs to be done only during init. The values should be saved per CPU to accommodate heterogeneous SMCA systems. Redo smca_get_block_address() to directly return the block addresses. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-4-Yazen.Ghannam@amd.com
2019-06-11x86/MCE: Make mce_banks a per-CPU arrayYazen Ghannam
Current AMD systems have unique MCA banks per logical CPU even though the type of the banks may all align to the same bank number. Each CPU will have control of a set of MCA banks in the hardware and these are not shared with other CPUs. For example, bank 0 may be the Load-Store Unit on every logical CPU, but each bank 0 is a unique structure in the hardware. In other words, there isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs share. This idea extends even to non-core MCA banks. For example, CPU0 and CPU4 may see a Unified Memory Controller at bank 15, but each CPU is actually seeing a unique hardware structure that is not shared with other CPUs. Because the MCA banks are all unique hardware structures, it would be good to control them in a more granular way. For example, if there is a known issue with the Floating Point Unit on CPU5 and a user wishes to disable an error type on the Floating Point Unit, then it would be good to do this only for CPU5 rather than all CPUs. Also, future AMD systems may have heterogeneous MCA banks. Meaning the bank numbers may not necessarily represent the same types between CPUs. For example, bank 20 visible to CPU0 may be a Unified Memory Controller and bank 20 visible to CPU4 may be a Coherent Slave. So granular control will be even more necessary should the user wish to control specific MCA banks. Split the device attributes from struct mce_bank leaving only the MCA bank control fields. Make struct mce_banks[] per_cpu in order to have more granular control over individual MCA banks in the hardware. Allocate the device attributes statically based on the maximum number of MCA banks supported. The sysfs interface will use as many as needed per CPU. Currently, this is set to mca_cfg.banks, but will be changed to a per_cpu bank count in a future patch. Allocate the MCA control bits statically. This is in order to avoid locking warnings when memory is allocated during secondary CPUs' init sequences. Also, remove the now unnecessary return values from __mcheck_cpu_mce_banks_init() and __mcheck_cpu_cap_init(). Redo the sysfs store/show functions to handle the per_cpu mce_banks[]. [ bp: s/mce_banks_percpu/mce_banks_array/g ] [ Locking issue reported by ] Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-3-Yazen.Ghannam@amd.com
2019-06-11x86/MCE: Make struct mce_banks[] staticYazen Ghannam
The struct mce_banks[] array is only used in mce/core.c so move its definition there and make it static. Also, change the "init" field to bool type. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-2-Yazen.Ghannam@amd.com
2019-06-10x86/resctrl: Use _ASM_BX to avoid ifdefferyUros Bizjak
Use the _ASM_BX macro which expands to either %rbx or %ebx, depending on the 32-bit or 64-bit config selected. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190606200044.5730-1-ubizjak@gmail.com
2019-06-10x86/kexec: Add the ACPI NVS region to the ident mapKairui Song
With the recent addition of RSDP parsing in the decompression stage, a kexec-ed kernel now needs ACPI tables to be covered by the identity mapping. And in commit 6bbeb276b71f ("x86/kexec: Add the EFI system tables and ACPI tables to the ident map") the ACPI tables memory region was added to the ident map. But some machines have only an ACPI NVS memory region and the ACPI tables are located in that region. In such case, the kexec-ed kernel will still fail when trying to access ACPI tables if they're not mapped. So add the NVS memory region to the ident map as well. [ bp: Massage. ] Fixes: 6bbeb276b71f ("x86/kexec: Add the EFI system tables and ACPI tables to the ident map") Suggested-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chao Fan <fanc.fnst@cn.fujitsu.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: kexec@lists.infradead.org Cc: Lianbo Jiang <lijiang@redhat.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190610073617.19767-1-kasong@redhat.com
2019-06-08Merge tag 'spdx-5.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull yet more SPDX updates from Greg KH: "Another round of SPDX header file fixes for 5.2-rc4 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being added, based on the text in the files. We are slowly chipping away at the 700+ different ways people tried to write the license text. All of these were reviewed on the spdx mailing list by a number of different people. We now have over 60% of the kernel files covered with SPDX tags: $ ./scripts/spdxcheck.py -v 2>&1 | grep Files Files checked: 64533 Files with SPDX: 40392 Files with errors: 0 I think the majority of the "easy" fixups are now done, it's now the start of the longer-tail of crazy variants to wade through" * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429 ...
2019-06-08docs: fix broken documentation linksMauro Carvalho Chehab
Mostly due to x86 and acpi conversion, several documentation links are still pointing to the old file. Fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>