path: root/arch/x86
Age         Commit message                                          Author
2020-11-24  x86/crashdump/32: Simplify copy_oldmem_page()  (Thomas Gleixner)

Replace kmap_atomic_pfn() with kmap_local_pfn(), which is preemptible and can take page faults. Remove the indirection of the dump page and the related cruft which is no longer required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20201118204007.670851839@linutronix.de
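A minimal sketch of what the simplified 32-bit copy_oldmem_page() looks like with the preemptible mapping API — an illustration of the description above, not the verbatim patch:

    ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
                             unsigned long offset, int userbuf)
    {
            void *vaddr;

            if (!csize)
                    return 0;

            /* Preemptible and valid in faulting context. */
            vaddr = kmap_local_pfn(pfn);

            if (!userbuf) {
                    memcpy(buf, vaddr + offset, csize);
            } else if (copy_to_user((void __user *)buf, vaddr + offset,
                                    csize)) {
                    kunmap_local(vaddr);
                    return -EFAULT;
            }

            kunmap_local(vaddr);
            return csize;
    }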
2020-11-24  x86: Support kmap_local() forced debugging  (Thomas Gleixner)

kmap_local() and related interfaces are NOOPs on 64bit and only create temporary fixmaps for highmem pages on 32bit. That means the test coverage for this code is pretty small.

CONFIG_KMAP_LOCAL can be enabled independently of CONFIG_HIGHMEM, which makes it possible to provide enforced kmap_local() debugging even on 64bit. For 32bit the support is unconditional; for 64bit it is only supported when CONFIG_NR_CPUS <= 4096, as supporting it for 8192 CPUs would require setting up yet another fixmap PGT.

If CONFIG_KMAP_LOCAL_FORCE_DEBUG is enabled, kmap_local()/kmap_atomic() will use the temporary fixmap mapping path.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20201118204007.169209557@linutronix.de
2020-11-24  x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak  (Xiaochen Shen)

On resource group creation via a mkdir an extra kernfs_node reference is obtained by kernfs_get() to ensure that the rdtgroup structure remains accessible for the rdtgroup_kn_unlock() calls where it is removed on deletion. Currently the extra kernfs_node reference count is only dropped by kernfs_put() in rdtgroup_kn_unlock(), while the rdtgroup structure is removed in a few other locations that lack the matching reference drop.

In the rmdir and umount call paths, when a control group is removed, kernfs_remove() is called to remove the whole kernfs node tree of the control group (including the kernfs node trees of all child monitoring groups), and then the rdtgroup structure is freed by kfree(). The rdtgroup structures of all child monitoring groups under the control group are freed by kfree() in free_all_child_rdtgrp().

Before kfree() frees the rdtgroup structures, the kernfs node of the control group itself, as well as the kernfs nodes of all child monitoring groups, still hold the extra references, which will never be dropped to 0, so the kernfs nodes will never be freed. This leads to a reference count leak and a kernfs_node_cache memory leak.

For example, the reference count leak is observed in these two cases:

(1)
    mount -t resctrl resctrl /sys/fs/resctrl
    mkdir /sys/fs/resctrl/c1
    mkdir /sys/fs/resctrl/c1/mon_groups/m1
    umount /sys/fs/resctrl

(2)
    mkdir /sys/fs/resctrl/c1
    mkdir /sys/fs/resctrl/c1/mon_groups/m1
    rmdir /sys/fs/resctrl/c1

The same reference count leak also exists in the error exit paths of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().

Fix this issue with the following changes, which make sure the extra kernfs_node reference on an rdtgroup is dropped before the rdtgroup structure is freed (a sketch of the removal helper follows below):

(1) Introduce the rdtgroup removal helper rdtgroup_remove() to wrap up kernfs_put() and kfree().

(2) Call rdtgroup_remove() in the rdtgroup removal path where the rdtgroup structure is about to be freed by kfree().

(3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error exit paths of mkdir where an extra reference is taken by kernfs_get().

Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@intel.com
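Based on the description above, the helper pairs the reference drop with the free; a minimal sketch (the kn member naming the group's kernfs node is assumed from context):

    static void rdtgroup_remove(struct rdtgroup *rdtgrp)
    {
            /* Drop the extra reference taken at mkdir time. */
            kernfs_put(rdtgrp->kn);
            kfree(rdtgrp);
    }

Every place that used to call bare kfree() on a group that still holds its mkdir-time reference then calls rdtgroup_remove() instead.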
2020-11-24  x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak  (Xiaochen Shen)

Willem reported growing kernfs_node_cache entries in slabtop when repeatedly creating and removing resctrl subdirectories, as well as when repeatedly mounting and unmounting the resctrl filesystem.

On resource group (control as well as monitoring) creation via a mkdir an extra kernfs_node reference is obtained to ensure that the rdtgroup structure remains accessible for the rdtgroup_kn_unlock() calls where it is removed on deletion. The kernfs_node reference count is dropped by kernfs_put() in rdtgroup_kn_unlock().

While the above explains the need for one kernfs_get()/kernfs_put() pair in resctrl, there are more places where a kernfs_node reference is obtained without a corresponding release. Those excess references will never be dropped to 0, so the kernfs nodes will never be freed in the rmdir and umount call paths. This leads to a reference count leak and a kernfs_node_cache memory leak.

Remove the superfluous kernfs_get() calls and expand the existing comments surrounding the one kernfs_get()/kernfs_put() pair that remains in use.

Superfluous kernfs_get() calls are removed from two areas:

(1) The mount and mkdir call paths: when the kernfs nodes for the "info", "mon_groups" and "mon_data" directories and their sub-directories are created, the reference count of each newly created kernfs node is already set to 1, but after kernfs_create_dir() returns, superfluous kernfs_get() calls are made to take an additional reference.

(2) kernfs_get() calls in the rmdir call paths.

Fixes: 17eafd076291 ("x86/intel_rdt: Split resource group removal in two")
Fixes: 4af4a88e0c92 ("x86/intel_rdt/cqm: Add mount,umount support")
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data")
Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Fixes: 5dc1d5c6bac2 ("x86/intel_rdt: Simplify info and base file lists")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system")
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1604085053-31639-1-git-send-email-xiaochen.shen@intel.com
2020-11-24  x86/sgx: Fix sgx_ioc_enclave_provision() kernel-doc comment  (Borislav Petkov)

Fix

  ./arch/x86/kernel/cpu/sgx/ioctl.c:666: warning: Function parameter or member \
    'encl' not described in 'sgx_ioc_enclave_provision'
  ./arch/x86/kernel/cpu/sgx/ioctl.c:666: warning: Excess function parameter \
    'enclave' description in 'sgx_ioc_enclave_provision'

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201123181922.0c009406@canb.auug.org.au
2020-11-23  signal: clear non-uapi flag bits when passing/returning sa_flags  (Peter Collingbourne)

Previously we were not clearing non-uapi flag bits in sigaction.sa_flags when storing the userspace-provided sa_flags or when returning them via oldact. Start doing so.

This allows userspace to detect missing support for flag bits and allows the kernel to use non-uapi bits internally, as we are already doing in arch/x86 for two flag bits. Now that this change is in place, we no longer need the code in arch/x86 that was hiding these bits from userspace, so remove it.

This is technically a userspace-visible behavior change for sigaction, as the unknown bits returned via oldact.sa_flags are no longer set. However, we are free to define the behavior for unknown bits exactly because their behavior is currently undefined, so for now we can define the meaning of each of them to be "clear the bit in oldact.sa_flags unless the bit becomes known in the future".

Furthermore, this behavior is consistent with OpenBSD [1], illumos [2] and XNU [3] (FreeBSD [4] and NetBSD [5] fail the syscall if unknown bits are set). So there is some precedent for this behavior in other kernels, and in particular in XNU, which is probably the most popular kernel among those that I looked at, which means that this change is less likely to be a compatibility issue.

Link: [1] https://github.com/openbsd/src/blob/f634a6a4b5bf832e9c1de77f7894ae2625e74484/sys/kern/kern_sig.c#L278
Link: [2] https://github.com/illumos/illumos-gate/blob/76f19f5fdc974fe5be5c82a556e43a4df93f1de1/usr/src/uts/common/syscall/sigaction.c#L86
Link: [3] https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/bsd/kern/kern_sig.c#L480
Link: [4] https://github.com/freebsd/freebsd/blob/eded70c37057857c6e23fae51f86b8f8f43cd2d0/sys/kern/kern_sig.c#L699
Link: [5] https://github.com/NetBSD/src/blob/3365779becdcedfca206091a645a0e8e22b2946e/sys/kern/sys_sig.c#L473
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://linux-review.googlesource.com/id/I35aab6f5be932505d90f3b3450c083b4db1eca86
Link: https://lkml.kernel.org/r/878dbcb5f47bc9b11881c81f745c0bef5c23f97f.1605235762.git.pcc@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
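The mechanism boils down to masking sa_flags against the set of known uapi bits in both directions; a sketch, with UAPI_SA_FLAGS standing in for the OR of all uapi SA_* bits:

    /* On sigaction(): keep only flag bits the uapi defines. */
    act->sa.sa_flags &= UAPI_SA_FLAGS;

    /* On readback via oldact: never expose kernel-internal bits. */
    oact->sa_flags = k->sa.sa_flags & UAPI_SA_FLAGS;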
2020-11-23  arch: move SA_* definitions to generic headers  (Peter Collingbourne)

Most architectures, with the exception of alpha, mips, parisc and sparc, use the same values for these flags. Move their definitions into asm-generic/signal-defs.h and allow the architectures with non-standard values to override them. Also, document the non-standard flag values in order to make it easier to add new generic flags in the future.

A consequence of this change is that on powerpc and x86, the constants' values aside from SA_RESETHAND change signedness from unsigned to signed. This is not expected to impact realistic use of these constants. In particular the typical use of the constants, where they are or'ed together and assigned to sa_flags (or another int variable), would not be affected.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://linux-review.googlesource.com/id/Ia3849f18b8009bf41faca374e701cdca36974528
Link: https://lkml.kernel.org/r/b6d0d1ec34f9ee93e1105f14f288fba5f89d1f24.1605235762.git.pcc@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
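The override pattern is the usual asm-generic one: the generic header provides a default only when the architecture has not already defined the macro. A sketch with the common values (alpha, mips, parisc and sparc would define their own before this point):

    /* include/asm-generic/signal-defs.h */
    #ifndef SA_NOCLDSTOP
    #define SA_NOCLDSTOP    0x00000001
    #endif
    #ifndef SA_SIGINFO
    #define SA_SIGINFO      0x00000004
    #endif
    #ifndef SA_RESTART
    #define SA_RESTART      0x10000000
    #endif
    #ifndef SA_RESETHAND
    #define SA_RESETHAND    0x80000000
    #endif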
2020-11-22  Merge tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull perf fix from Thomas Gleixner:
 "A single fix for the x86 perf sysfs interfaces which used kobject attributes instead of device attributes and therefore upset clang's control flow integrity checker"

* tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: fix sysfs type mismatches
2020-11-22  Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull EFI fixes from Borislav Petkov:
 "Forwarded EFI fixes from Ard Biesheuvel:

  - fix memory leak in efivarfs driver

  - fix HYP mode issue in the 32-bit ARM version of the EFI stub when built in Thumb2 mode

  - avoid leaking EFI pgd pages on allocation failure"

* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/x86: Free efi_pgd with free_pages()
  efivarfs: fix memory leak in efivarfs_create()
  efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
2020-11-22  Merge tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 fixes from Borislav Petkov:

 - An IOMMU VT-d build fix when CONFIG_PCI_ATS=n, along with a revert of same because the proper one is going through the IOMMU tree (Thomas Gleixner)

 - An Intel microcode loader fix to save the correct microcode patch to apply during resume (Chen Yu)

 - A fix to not access user memory of other processes when dumping opcode bytes (Thomas Gleixner)

* tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
  x86/dumpstack: Do not try to access user space code of other tasks
  x86/microcode/intel: Check patch signature before saving microcode for early loading
  iommu/vt-d: Take CONFIG_PCI_ATS into account
2020-11-22  mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports  (Dan Williams)

The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in these configurations:

  CONFIG_NUMA_KEEP_MEMINFO=y
  CONFIG_MEMORY_HOTPLUG=y

...and:

  CONFIG_NUMA_KEEP_MEMINFO=n
  CONFIG_MEMORY_HOTPLUG=y

...it failed to export the symbol in the case of:

  CONFIG_NUMA_KEEP_MEMINFO=y
  CONFIG_MEMORY_HOTPLUG=n

Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that the memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too.

Rework the definitions of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation.

The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header.

The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.

[dan.j.williams@intel.com: v4]
  Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
  Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com

Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
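The asm header override pattern referenced above, sketched with illustrative contents (the exact defaults and stubs live in the patch, not reproduced here):

    /* arch/<arch>/include/asm/sparsemem.h: the arch provides a real
     * implementation and announces it by defining the macro. */
    #define phys_to_target_node phys_to_target_node
    int phys_to_target_node(u64 start);

    /* generic header: fall back to a stub only when no arch
     * override was defined. */
    #ifndef phys_to_target_node
    static inline int phys_to_target_node(u64 start)
    {
            return 0;       /* illustrative default node */
    }
    #endif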
2020-11-21  x86/mce, cper: Pass x86 CPER through the MCA handling chain  (Smita Koralahalli)

The kernel uses the ACPI Boot Error Record Table (BERT) to report fatal errors that occurred in a previous boot. The MCA errors in the BERT are reported using the x86 Processor Error Common Platform Error Record (CPER) format. Currently, the record prints out the raw MSR values and AMD relies on the raw record to provide MCA information.

Extract the raw MSR values of MCA registers from the BERT and feed them into mce_log() to decode them properly.

The implementation is SMCA-specific as the raw MCA register values are given in the register offset order of the SMCA address space.

[ bp: Massage. ]
[ Fix a build breakage in patch v1. ]

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20201119182938.151155-1-Smita.KoralahalliChannabasappa@amd.com
2020-11-21  x86/boot/compressed/64: Use TEST %reg,%reg instead of CMP $0,%reg  (Uros Bizjak)

Use TEST %reg,%reg, which sets the zero flag in the same way as CMP $0,%reg but whose encoding is one byte shorter.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20201029160258.139216-1-ubizjak@gmail.com
2020-11-20  x86: Enable seccomp architecture tracking  (Kees Cook)

Provide seccomp internals with the details to calculate which syscall table the running kernel is expecting to deal with. This allows for efficient architecture pinning and paves the way for constant-action bitmaps.

Co-developed-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/da58c3733d95c4f2115dd94225dfbe2573ba4d87.1602431034.git.yifeifz2@illinois.edu
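In practice this means the x86 asm/seccomp.h tells the core seccomp code which audit-arch / syscall-table pairs this kernel can see; a sketch of the shape (macro names as used by this series, taken as an assumption here):

    #ifdef CONFIG_X86_64
    # define SECCOMP_ARCH_NATIVE            AUDIT_ARCH_X86_64
    # define SECCOMP_ARCH_NATIVE_NR         NR_syscalls
    # define SECCOMP_ARCH_NATIVE_NAME       "x86_64"
    # ifdef CONFIG_COMPAT
    #  define SECCOMP_ARCH_COMPAT           AUDIT_ARCH_I386
    #  define SECCOMP_ARCH_COMPAT_NR        IA32_NR_syscalls
    #  define SECCOMP_ARCH_COMPAT_NAME      "ia32"
    # endif
    #endif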
2020-11-20  Merge tag 'for-linus-5.10b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip  (Linus Torvalds)

Pull xen fix from Juergen Gross:
 "A single fix for avoiding WARN splats when booting a Xen guest with nosmt"

* tag 'for-linus-5.10b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: don't unbind uninitialized lock_kicker_irq
2020-11-20  Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds)

Pull iommu fixes from Will Deacon:
 "Two straightforward vt-d fixes:

  - Fix boot when intel iommu initialisation fails under TXT (tboot)

  - Fix intel iommu compilation error when DMAR is enabled without ATS

  and temporarily update the IOMMU MAINTAINERS entry"

* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  MAINTAINERS: Temporarily add myself to the IOMMU entry
  iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set
  iommu/vt-d: Avoid panic if iommu init fails in tboot system
2020-11-20  x86/head64: Remove duplicate include  (Wang Qing)

Remove a duplicate header include.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1604893542-20961-1-git-send-email-wangqing@vivo.com
2020-11-20  x86/mm: Declare 'start' variable where it is used  (Lukas Bulwahn)

It is not required to initialize the local variable start in memory_map_top_down(), as the variable will be initialized in any path before it is used.

make clang-analyzer on an x86_64 tinyconfig reports:

  arch/x86/mm/init.c:612:15: warning: Although the value stored to 'start' \
    is used in the enclosing expression, the value is never actually read \
    from 'start' [clang-analyzer-deadcode.DeadStores]

Move the variable declaration into the loop, where it is used.

No code changed:

  # arch/x86/mm/init.o:
     text    data     bss     dec     hex filename
     7105    1424   26768   35297    89e1 init.o.before
     7105    1424   26768   35297    89e1 init.o.after

  md5:
     a8d76c1bb5fce9cae251780a7ee7730f  init.o.before.asm
     a8d76c1bb5fce9cae251780a7ee7730f  init.o.after.asm

[ bp: Massage. ]

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200928100004.25674-1-lukas.bulwahn@gmail.com
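The pattern, as a simplified sketch of the memory_map_top_down() loop rather than the exact patch:

    while (last_start > map_start) {
            unsigned long start;    /* declared where it is used */

            if (last_start > step_size) {
                    start = round_down(last_start - 1, step_size);
                    if (start < map_start)
                            start = map_start;
            } else {
                    start = map_start;
            }
            init_range_memory_mapping(start, last_start);
            last_start = start;
    }

With the declaration inside the loop there is no dead store at function entry, so the clang-analyzer warning disappears while the generated object code stays identical.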
2020-11-20  crypto: sha - split sha.h into sha1.h and sha2.h  (Eric Biggers)

Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2, and <crypto/sha3.h> contains declarations for SHA-3. This organization is inconsistent, but more importantly SHA-1 is no longer considered to be cryptographically secure. So to the extent possible, SHA-1 shouldn't be grouped together with any of the other SHA versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and <crypto/sha2.h>, and make everyone explicitly specify whether they want the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't want anything to do with SHA-1. It also prepares for potentially moving sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-19  Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.19b', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD  (Paul E. McKenney)

cpuinfo.2020.11.06a:  Speedups for /proc/cpuinfo.
doc.2020.11.06a:      Documentation updates.
fixes.2020.11.19b:    Miscellaneous fixes.
lockdep.2020.11.02a:  Lockdep-RCU updates to avoid "unused variable" warnings.
tasks.2020.11.06a:    Tasks-RCU updates.
torture.2020.11.06a:  Torture-test updates.
2020-11-19  x86/smpboot: Move rcu_cpu_starting() earlier  (Paul E. McKenney)

The call to rcu_cpu_starting() in mtrr_ap_init() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows:

  =============================
  WARNING: suspicious RCU usage
  5.9.0+ #268 Not tainted
  -----------------------------
  kernel/kprobes.c:300 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  RCU used illegally from offline CPU!
  rcu_scheduler_active = 1, debug_locks = 1
  no locks held by swapper/1/0.

  stack backtrace:
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0+ #268
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
  Call Trace:
   dump_stack+0x77/0x97
   __is_insn_slot_addr+0x15d/0x170
   kernel_text_address+0xba/0xe0
   ? get_stack_info+0x22/0xa0
   __kernel_text_address+0x9/0x30
   show_trace_log_lvl+0x17d/0x380
   ? dump_stack+0x77/0x97
   dump_stack+0x77/0x97
   __lock_acquire+0xdf7/0x1bf0
   lock_acquire+0x258/0x3d0
   ? vprintk_emit+0x6d/0x2c0
   _raw_spin_lock+0x27/0x40
   ? vprintk_emit+0x6d/0x2c0
   vprintk_emit+0x6d/0x2c0
   printk+0x4d/0x69
   start_secondary+0x1c/0x100
   secondary_startup_64_no_verify+0xb8/0xbb

This is avoided by moving the call to rcu_cpu_starting() up near the beginning of the start_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers.

Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Reported-by: Qian Cai <cai@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
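A sketch of where the call lands per the description above (illustrative, not the full function):

    static void notrace start_secondary(void *unused)
    {
            /*
             * Let RCU watch this CPU as early as possible so that the
             * printk()/lockdep machinery used during bring-up does not
             * trip over a CPU that RCU still considers offline.
             * raw_smp_processor_id() avoids a lockdep-instrumented
             * accessor before RCU watches the CPU.
             */
            rcu_cpu_starting(raw_smp_processor_id());

            /* ... the rest of secondary CPU bring-up follows ... */
    }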
2020-11-19  x86/resctrl: Constify kernfs_ops  (Rikard Falkeborn)

The only usage of the kf_ops field in the rftype struct is to pass it as argument to __kernfs_create_file(), which accepts a pointer to const. Make it a pointer to const. This makes it possible to make rdtgroup_kf_single_ops and kf_mondata_ops const, which allows the compiler to put them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20201110230228.801785-1-rikard.falkeborn@gmail.com
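The change is mechanical; a sketch of the relevant parts (struct layout abridged, callback names assumed from the resctrl code):

    struct rftype {
            char                            *name;
            umode_t                         mode;
            const struct kernfs_ops         *kf_ops;  /* was non-const */
            unsigned long                   fflags;
    };

    /* can now live in read-only memory */
    static const struct kernfs_ops rdtgroup_kf_single_ops = {
            .atomic_write_len       = PAGE_SIZE,
            .write                  = rdtgroup_file_write,
            .seq_show               = rdtgroup_seqfile_show,
    };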
2020-11-19  kvm: x86/mmu: Add TDP MMU SPTE changed trace point  (Ben Gardon)

Add an extremely verbose trace point to the TDP MMU to log all SPTE changes, regardless of callstack / motivation. This is useful when a complete picture of the paging structure is needed or a change cannot be explained with the other, existing trace points.

Tested: ran the demand paging selftest on an Intel Skylake machine with all the trace points used by the TDP MMU enabled and observed them firing with expected values.

This patch can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/3813

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027175944.1183301-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-19  kvm: x86/mmu: Add existing trace points to TDP MMU  (Ben Gardon)

The TDP MMU was initially implemented without some of the usual tracepoints found in mmu.c. Correct this discrepancy by adding the missing trace points to the TDP MMU.

Tested: ran the demand paging selftest on an Intel Skylake machine with all the trace points used by the TDP MMU enabled and observed them firing with expected values.

This patch can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/3812

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027175944.1183301-1-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-19  x86/topology: Set cpu_die_id only if DIE_TYPE found  (Yazen Ghannam)

CPUID Leaf 0x1F defines a DIE_TYPE level (nb: ECX[8:15] level type == 0x5), but CPUID Leaf 0xB does not. However, detect_extended_topology() will set struct cpuinfo_x86.cpu_die_id regardless of whether a valid Die ID was found.

Only set cpu_die_id if a DIE_TYPE level is found. CPU topology code may use another value for cpu_die_id, e.g. the AMD NodeId on AMD-based systems. Code ordering should be maintained so that the CPUID Leaf 0x1F Die ID value will take precedence on systems that may use another value.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-5-Yazen.Ghannam@amd.com
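A sketch of the guard described above (structure simplified; MAX_SUBLEAFS and the loop bound are hypothetical stand-ins, the real detection loop in detect_extended_topology() tracks more levels):

    bool die_level_present = false;
    int sub_index;

    for (sub_index = 1; sub_index < MAX_SUBLEAFS; sub_index++) {
            cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
            /* level type 0x5 == DIE_TYPE, only enumerated by leaf 0x1F */
            if (LEAFB_SUBTYPE(ecx) == DIE_TYPE)
                    die_level_present = true;
    }

    /* Only override cpu_die_id when the leaf actually enumerated it. */
    if (die_level_present)
            c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
                                              core_plus_mask_width);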
2020-11-19  x86/CPU/AMD: Remove amd_get_nb_id()  (Yazen Ghannam)

The Last Level Cache ID is returned by amd_get_nb_id(). In practice, this value is the same as the AMD NodeId for callers of this function. The NodeId is saved in struct cpuinfo_x86.cpu_die_id.

Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and remove the function.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-3-Yazen.Ghannam@amd.com
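The replacement is a one-liner at each call site; illustrative only:

    /* before */
    u16 node = amd_get_nb_id(cpu);

    /* after: the AMD NodeId was saved in cpu_die_id at boot */
    u16 node = cpu_data(cpu).cpu_die_id;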
2020-11-19  x86/CPU/AMD: Save AMD NodeId as cpu_die_id  (Yazen Ghannam)

AMD systems provide a "NodeId" value that represents a global ID indicating to which "Node" a logical CPU belongs. The "Node" is a physical structure equivalent to a Die, and it should not be confused with logical structures like NUMA nodes. Logical nodes can be adjusted based on firmware or other settings whereas the physical nodes/dies are fixed based on hardware topology.

The NodeId value can be used when a physical ID is needed by software.

Save the AMD NodeId to struct cpuinfo_x86.cpu_die_id. Use the value from CPUID or MSR as appropriate. Default to phys_proc_id otherwise. Do so for both AMD and Hygon systems.

Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no longer needed.

Update the x86 topology documentation.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-2-Yazen.Ghannam@amd.com
2020-11-19  x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK  (Frederic Weisbecker)

A lot of ground work has been performed on the x86 entry code. Fragile paths between user_enter() and user_exit() have IRQs disabled, and uses of RCU and instrumentation in these fragile areas have been explicitly annotated and protected.

This architecture doesn't need exception_enter()/exception_exit() anymore and has therefore earned CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201117151637.259084-6-frederic@kernel.org
2020-11-19  x86/sgx: Return -ERESTARTSYS in sgx_ioc_enclave_add_pages()  (Jarkko Sakkinen)

Return -ERESTARTSYS instead of -EINTR in sgx_ioc_enclave_add_pages() when interrupted before any pages have been processed. At that point the ioctl() can obviously be restarted safely.

Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201118213932.63341-1-jarkko@kernel.org
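A sketch of the distinction inside the page-add loop (loop body abridged, variable names illustrative):

    for (c = 0; c < add_arg.length; c += PAGE_SIZE) {
            if (signal_pending(current)) {
                    /*
                     * No page processed yet: the syscall can be
                     * transparently restarted by the kernel.
                     */
                    ret = c ? -EINTR : -ERESTARTSYS;
                    break;
            }
            /* EADD (and optionally EEXTEND) one page here. */
    }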
2020-11-19  Merge tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-next/iommu/fixes  (Will Deacon)

Pull in x86 fixes from Thomas, as they include a change to the Intel DMAR code on which we depend:

* tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  iommu/vt-d: Cure VF irqdomain hickup
  x86/platform/uv: Fix copied UV5 output archtype
  x86/platform/uv: Drop last traces of uv_flush_tlb_others
2020-11-19  x86/msr: Downgrade unrecognized MSR message  (Borislav Petkov)

It is a warning and not an error, so use pr_warn().

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201118123806.19672-1-bp@alien8.de
2020-11-18  iommu/amd: Fix IOMMU interrupt generation in X2APIC mode  (David Woodhouse)

The AMD IOMMU has two modes for generating its own interrupts.

The first is very much based on PCI MSI, and can be configured by Linux precisely that way. But like legacy unmapped PCI MSI it's limited to 8 bits of APIC ID.

The second method does not use PCI MSI at all in hardware, and instead configures the INTCAPXT registers in the IOMMU directly with the APIC ID and vector.

In the latter case, the IOMMU driver would still use pci_enable_msi(), read back (through MMIO) the MSI message that Linux wrote to the PCI MSI table, then swizzle those bits into the appropriate register.

Historically, this worked because __irq_compose_msi_msg() would silently generate an invalid MSI message with the high bits of the APIC ID in the high bits of the MSI address. That hack was intended only for the Intel IOMMU, and I recently enforced that, introducing a warning in __irq_msi_compose_msg() if it was invoked with an APIC ID above 255.

Fix the AMD IOMMU not to depend on that hack any more, by having its own irqdomain and directly putting the bits from the irq_cfg into the right place in its ->activate() method.

Fixes: 47bea873cf80 ("x86/msi: Only use high bits of MSI address for DMAR unit")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/05e3a5ba317f5ff48d2f8356f19e617f8b9d23a4.camel@infradead.org
2020-11-18  x86/sgx: Clarify 'laundry_list' locking  (Dave Hansen)

Short Version:

The SGX section->laundry_list structure is effectively thread-local, but declared next to some shared structures. Its semantics are clear as mud. Fix that. No functional changes. Compile tested only.

Long Version:

The SGX hardware keeps per-page metadata. This can provide things like permissions, integrity and replay protection. It also prevents things like having an enclave page mapped multiple times or shared between enclaves.

But, that presents a problem for kexec()'d kernels (or any other kernel that does not run immediately after a hardware reset). This is because the last kernel may have been rude and forgotten to reset pages, which would trigger the "shared page" sanity check.

To fix this, the SGX code "launders" the pages by running the EREMOVE instruction on all pages at boot. This is slow and can take a long time, so it is performed off in the SGX-specific ksgxd instead of being synchronous at boot. The init code hands the list of pages to launder in a per-SGX-section list: ->laundry_list. The only code to touch this list is the init code and ksgxd. This means that no locking is necessary for ->laundry_list.

However, a lock is required for section->page_list, which is accessed while creating enclaves and by ksgxd. This lock (section->lock) is acquired by ksgxd while also processing ->laundry_list. It is easy to confuse the purpose of the locking as being for ->laundry_list and ->page_list.

Rename ->laundry_list to ->init_laundry_list to make it clear that this is not normally used at runtime. Also add some comments clarifying the locking, and reorganize 'sgx_epc_section' to put 'lock' near the things it protects.

Note: init_laundry_list is 128 bytes of wasted space at runtime. It could theoretically be dynamically allocated and then freed after the laundering process. But it would take nearly 128 bytes of extra instructions to do that.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201116222531.4834-1-dave.hansen@intel.com
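The reorganized structure, sketched from the description above (surrounding fields abridged and illustrative):

    struct sgx_epc_section {
            unsigned long phys_addr;
            void *virt_addr;

            /* 'lock' protects 'page_list', which is used at runtime. */
            spinlock_t lock;
            struct list_head page_list;

            /*
             * Pages to be EREMOVE'd at boot. Touched only by the init
             * code and ksgxd, so no locking is needed.
             */
            struct list_head init_laundry_list;
    };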
2020-11-18  x86/head/64: Remove unused GET_CR2_INTO() macro  (Arvind Sankar)

Commit 4b47cdbda6f1 ("x86/head/64: Move early exception dispatch to C code") removed the usage of GET_CR2_INTO(). Drop the definition, as well as the related definitions in paravirt.h and asm-offsets.h.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201005151208.2212886-3-nivedita@alum.mit.edu
2020-11-18  x86/sgx: Add ptrace() support for the SGX driver  (Jarkko Sakkinen)

Enclave memory is normally inaccessible from outside the enclave. This makes enclaves hard to debug. However, enclaves can be put in a debug mode when they are being built. In that mode, enclave data *can* be read and/or written by using the ENCLS[EDBGRD] and ENCLS[EDBGWR] functions.

This is obviously only for debugging and destroys all the protections present with normal enclaves. But, enclaves know their own debug status and can adjust their behavior appropriately.

Add a vm_ops->access() implementation which can be used to read and write memory inside debug enclaves. This is typically used via ptrace() APIs.

[ bp: Massage. ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-23-jarkko@kernel.org
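A sketch of the hook-up; sgx_vma_access_debug() is a hypothetical helper standing in for the EDBGRD/EDBGWR plumbing:

    static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
                              void *buf, int len, int write)
    {
            struct sgx_encl *encl = vma->vm_private_data;

            /* Only enclaves built in debug mode may be accessed. */
            if (!(encl->attributes & SGX_ATTR_DEBUG))
                    return -EFAULT;

            return sgx_vma_access_debug(encl, addr, buf, len, write);
    }

    const struct vm_operations_struct sgx_vm_ops = {
            .access = sgx_vma_access,
            /* fault, open, close, etc. elided */
    };

ptrace() PEEK/POKE on a debug enclave then flows through vm_ops->access() like any other special mapping.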
2020-11-18  x86/sgx: Add a page reclaimer  (Jarkko Sakkinen)

Just like normal RAM, there is a limited amount of enclave memory available and overcommitting it is a very valuable tool to reduce resource use. Introduce a simple reclaim mechanism for enclave pages.

In contrast to normal page reclaim, the kernel cannot directly access enclave memory. To get around this, the SGX architecture provides a set of functions to help. Among other things, these functions copy enclave memory to and from normal memory, encrypting it and protecting its integrity in the process.

Implement a page reclaimer by using these functions. Victim pages are picked in LRU fashion from all the enclaves running in the system. A new kernel thread (ksgxswapd) reclaims pages in the background based on watermarks, similar to normal kswapd.

All enclave pages can be reclaimed, architecturally. But, there are some limits to this, such as the special SECS metadata page which must be reclaimed last. The page version array (used to mitigate replaying old reclaimed pages) is also architecturally reclaimable, but not yet implemented. The end result is that the vast majority of enclave pages are currently reclaimable.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-22-jarkko@kernel.org
2020-11-18  x86/vdso: Implement a vDSO for Intel SGX enclave call  (Sean Christopherson)

Enclaves encounter exceptions for lots of reasons: everything from enclave page faults to NULL pointer dereferences, to system calls that must be “proxied” to the kernel from outside the enclave.

In addition to the code contained inside an enclave, there is also supporting code outside the enclave called an “SGX runtime”, which is virtually always implemented inside a shared library. The runtime helps build the enclave and handles things like *re*building the enclave if it got destroyed by something like a suspend/resume cycle.

The rebuilding has traditionally been handled in SIGSEGV handlers, registered by the library. But, being process-wide, shared state, signal handling and shared libraries do not mix well.

Introduce a vDSO function call that wraps the enclave entry functions (the EENTER/ERESUME leaf functions of the ENCLU instruction) and returns information about any exceptions to the caller in the SGX runtime.

Instead of generating a signal, the kernel places exception information in RDI, RSI and RDX. The kernel-provided userspace portion of the vDSO handler will place this information in a user-provided buffer or trigger a user-provided callback at the time of the exception.

The vDSO function calling convention uses the standard RDI, RSI, RDX, RCX, R8 and R9 registers. This makes it possible to declare the vDSO as a C prototype, but other than that there is no specific support for the SystemV ABI. Things like storing XSAVE are the responsibility of the enclave and the runtime.

[ bp: Change vsgx.o build dependency to CONFIG_X86_SGX. ]

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Cedric Xing <cedric.xing@intel.com>
Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-20-jarkko@kernel.org
2020-11-18  x86/traps: Attempt to fixup exceptions in vDSO before signaling  (Sean Christopherson)

vDSO functions can now leverage an exception fixup mechanism similar to kernel exception fixup. For vDSO exception fixup, the initial user is Intel's Software Guard Extensions (SGX), which will wrap the low-level transitions to/from the enclave, i.e. the EENTER and ERESUME instructions, in a vDSO function and leverage fixup to intercept exceptions that would otherwise generate a signal. This allows the vDSO wrapper to return the fault information directly to its caller, obviating the need for SGX applications and libraries to juggle signal handlers.

Attempt to fixup vDSO exceptions immediately prior to populating and sending signal information. Except for the delivery mechanism, an exception in a vDSO function should be treated like any other exception in userspace, e.g. any fault that is successfully handled by the kernel should not be directly visible to userspace.

Although it's debatable whether or not all exceptions are of interest to enclaves, defer to the vDSO fixup to decide whether to do fixup or generate a signal. Future users of vDSO fixup, if there ever are any, will undoubtedly have different requirements than SGX enclaves, e.g. the fixup vs. signal logic can be made function specific if/when necessary.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-19-jarkko@kernel.org
2020-11-18  x86/fault: Add a helper function to sanitize error code  (Sean Christopherson)

vDSO exception fixup is a replacement for signals in limited situations. Signals and vDSO exception fixup need to provide similar information to userspace, including the hardware error code.

That hardware error code needs to be sanitized. For instance, if userspace accesses a kernel address, the error code could indicate to userspace whether the address had a Present=1 PTE. That can leak information about the kernel layout to userspace, which is bad.

The existing signal code does this sanitization, but fairly late in the signal process. The vDSO exception code runs before the sanitization happens.

Move error code sanitization out of the signal code and into a helper. Call the helper in the signal code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-18-jarkko@kernel.org
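The helper amounts to hiding PTE details for kernel addresses; a sketch of the idea:

    static void sanitize_error_code(unsigned long address,
                                    unsigned long *error_code)
    {
            /*
             * To avoid leaking kernel page table layout, pretend that
             * a user-mode access to a kernel address always hits a
             * present, protected page.
             */
            if (address >= TASK_SIZE_MAX)
                    *error_code |= X86_PF_PROT;
    }

Both the signal path and the vDSO fixup path can then call this before handing the error code to userspace.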
2020-11-18  x86/vdso: Add support for exception fixup in vDSO functions  (Sean Christopherson)

Signals are a horrid little mechanism. They are especially nasty in multi-threaded environments because signal state like handlers is global across the entire process. But, signals are basically the only way that userspace can “gracefully” handle and recover from exceptions.

The kernel generally does not like exceptions to occur during execution. But, exceptions are a fact of life and must be handled in some circumstances. The kernel handles them by keeping a list of individual instructions which may cause exceptions. Instead of truly handling the exception and returning to the instruction that caused it, the kernel instead restarts execution at a *different* instruction. This makes it obvious to that thread of execution that the exception occurred and lets *that* code handle the exception instead of the handler. This is not dissimilar to the try/catch exception mechanisms that some programming languages have, but applied *very* surgically to single instructions. It effectively changes the visible architecture of the instruction.

Problem
=======

SGX generates a lot of signals, and the code to enter and exit enclaves and muck with signal handling is truly horrid. At the same time, an approach like kernel exception fixup can not be easily applied to userspace instructions because it changes the visible instruction architecture.

Solution
========

The vDSO is a special page of kernel-provided instructions that run in userspace. Any userspace calling into the vDSO knows that it is special. This allows the kernel a place to legitimately rewrite the user/kernel contract and change instruction behavior.

Add support for fixing up exceptions that occur while executing in the vDSO. This replaces what could traditionally only be done with signal handling. This new mechanism will be used to replace previously direct use of SGX instructions by userspace.

Just introduce the vDSO infrastructure. Later patches will actually replace signal generation with vDSO exception fixup.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-17-jarkko@kernel.org
2020-11-18  x86/sgx: Add SGX_IOC_ENCLAVE_PROVISION  (Jarkko Sakkinen)

The whole point of SGX is to create a hardware protected place to do “stuff”. But, before someone is willing to hand over the keys to the castle, an enclave must often prove that it is running on an SGX-protected processor. Provisioning enclaves play a key role in providing proof.

There are actually three different enclaves in play in order to make this happen:

1. The application enclave. The familiar one we know and love that runs the actual code that’s doing real work. There can be many of these on a single system, or even in a single application.

2. The quoting enclave (QE). The QE is mentioned in lots of silly whitepapers, but, for the purposes of kernel enabling, just pretend they do not exist.

3. The provisioning enclave. There is typically only one of these enclaves per system. Provisioning enclaves have access to a special hardware key. They can use this key to help to generate certificates which serve as proof that enclaves are running on trusted SGX hardware. These certificates can be passed around without revealing the special key.

Any user who can create a provisioning enclave can access the processor-unique Provisioning Certificate Key, which has privacy and fingerprinting implications. Even if a user is permitted to create normal application enclaves (via /dev/sgx_enclave), they should not be able to create provisioning enclaves. That means a separate permissions scheme is needed to control provisioning enclave privileges.

Implement a separate device file (/dev/sgx_provision) which allows creating provisioning enclaves. This device will typically have more strict permissions than the plain enclave device. The actual device “driver” is an empty stub. Open file descriptors for this device will represent a token which allows provisioning enclave duty. This file descriptor can be passed around and ultimately given as an argument to the /dev/sgx_enclave driver ioctl().

[ bp: Touchups. ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-security-module@vger.kernel.org
Link: https://lkml.kernel.org/r/20201112220135.165028-16-jarkko@kernel.org
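From userspace the token-passing looks roughly like this sketch (struct and ioctl names assumed from the SGX uapi of this series; error handling elided):

    int enclave_fd = open("/dev/sgx_enclave", O_RDWR);
    int prov_fd    = open("/dev/sgx_provision", O_RDONLY);

    /* Hand the provisioning token to the enclave driver. */
    struct sgx_enclave_provision params = { .fd = prov_fd };
    ioctl(enclave_fd, SGX_IOC_ENCLAVE_PROVISION, &params);

A process that cannot open /dev/sgx_provision simply has no token to pass, so it cannot give its enclave provisioning privileges.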
2020-11-18  x86/sgx: Add SGX_IOC_ENCLAVE_INIT  (Jarkko Sakkinen)

Enclaves have two basic states. Either they are being built, in which case they are malleable and can be modified by doing things like adding pages, or they are locked down and not accepting changes. They can only be run after they have been locked down. The ENCLS[EINIT] function induces the transition from being malleable to locked-down.

Add an ioctl() that performs ENCLS[EINIT]. After this, new pages can no longer be added with ENCLS[EADD]. This is also the time when the enclave can be measured to verify its integrity.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-15-jarkko@kernel.org
2020-11-18  x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES  (Jarkko Sakkinen)

SGX enclave pages are inaccessible to normal software. They must be populated with data by copying from normal memory with the help of the EADD and EEXTEND functions of the ENCLS instruction.

Add an ioctl() which performs EADD to add new data to an enclave, and optionally EEXTEND, which hashes the page contents and uses the hash as part of the enclave “measurement” to ensure enclave integrity.

The enclave author gets to decide which pages will be included in the enclave measurement with EEXTEND. Measurement is very slow and sometimes has very little value. For instance, an enclave _could_ measure every page of data and code, but would be slow to initialize. Or, it might just measure its code and then trust that code to initialize the bulk of its data after it starts running.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-14-jarkko@kernel.org
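Illustrative userspace usage, assuming the uapi struct and flag names from this series (setup of secinfo and the source buffer elided):

    struct sgx_enclave_add_pages params = {
            .src     = (__u64)(uintptr_t)src_page,   /* normal memory */
            .offset  = 0,               /* page offset inside the enclave */
            .length  = 4096,
            .secinfo = (__u64)(uintptr_t)&secinfo,   /* page permissions */
            .flags   = SGX_PAGE_MEASURE,             /* also EEXTEND    */
    };

    ioctl(enclave_fd, SGX_IOC_ENCLAVE_ADD_PAGES, &params);

Leaving SGX_PAGE_MEASURE out of .flags skips the slow EEXTEND step for pages the author chose not to measure.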
2020-11-18  x86/sgx: Add SGX_IOC_ENCLAVE_CREATE  (Jarkko Sakkinen)

Add an ioctl() that performs the ECREATE function of the ENCLS instruction, which creates an SGX Enclave Control Structure (SECS). Although the SECS is an in-memory data structure, it is present in enclave memory and is not directly accessible by software.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-13-jarkko@kernel.org
2020-11-18  x86/sgx: Add an SGX misc driver interface  (Jarkko Sakkinen)

Intel(R) SGX is a new hardware functionality that can be used by applications to set aside private regions of code and data called enclaves. New hardware protects enclave code and data from outside access and modification.

Add a driver that presents a device file and ioctl API to build and manage enclaves.

[ bp: Small touchups, remove unused encl variable in sgx_encl_find() as Reported-by: kernel test robot <lkp@intel.com> ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-12-jarkko@kernel.org
2020-11-18  x86/boot: Remove unused finalize_identity_maps()  (Arvind Sankar)

Commit 8570978ea030 ("x86/boot/compressed/64: Don't pre-map memory in KASLR code") removed all the references to finalize_identity_maps(), but neglected to delete the actual function. Remove it.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201005151208.2212886-2-nivedita@alum.mit.edu
2020-11-18  iommu/vt-d: Avoid panic if iommu init fails in tboot system  (Zhenzhong Duan)

The "intel_iommu=off" command line option is used to disable the iommu, but the iommu is force-enabled in a tboot system for security reasons. However, for better performance on high speed network devices, a new option "intel_iommu=tboot_noforce" was introduced to disable that forcing.

By default the kernel should panic if iommu init fails in tboot for security reasons, but that is unnecessary if we use "intel_iommu=tboot_noforce,off".

Fix the code setting force_on and move intel_iommu_tboot_noforce from the tboot code to the intel iommu code.

Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Tested-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-18  x86/uaccess: Document copy_from_user_nmi()  (Thomas Gleixner)

Document the functionality of copy_from_user_nmi() to avoid further confusion. Fix the typo in the existing comment while at it.

Requested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201117202753.806376613@linutronix.de
2020-11-18  x86/dumpstack: Do not try to access user space code of other tasks  (Thomas Gleixner)

sysrq-t ends up invoking show_opcodes() for each task, which tries to access the user space code of other processes; that is obviously bogus. It either manages to dump whatever the foreign task's regs->ip points to in a valid mapping of the current task, or it triggers a pagefault and prints "Code: Bad RIP value.". Both are just wrong.

Add a safeguard in copy_code() and check whether the @regs pointer matches current's pt_regs. If not, do not even try to access it.

While at it, add commentary on why using copy_from_user_nmi() is safe in copy_code() even if the function name suggests otherwise.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de
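A sketch of the safeguard in copy_code(), simplified from the description above (additional range checking in the real code is omitted):

    static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long ip,
                         unsigned int nbytes)
    {
            if (!user_mode(regs))
                    return copy_from_kernel_nofault(buf, (u8 *)ip, nbytes);

            /* The user space code of other tasks cannot be accessed. */
            if (regs != task_pt_regs(current))
                    return -EPERM;

            /* NMI-safe, does not try to resolve page faults: fine here. */
            return copy_from_user_nmi(buf, (void __user *)ip, nbytes);
    }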
2020-11-18  dma-mapping: remove the dma_direct_set_offset export  (Christoph Hellwig)

Drop the dma_direct_set_offset export and move the declaration to dma-map-ops.h now that the Allwinner drivers have stopped calling it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>