summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2021-05-21x86/irq: Add and use NR_EXTERNAL_VECTORS and NR_SYSTEM_VECTORSH. Peter Anvin (Intel)
Add defines for the number of external vectors and number of system vectors instead of requiring the use of (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR) and (NR_VECTORS - FIRST_SYSTEM_VECTOR) respectively. Clean up the usage sites. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20210519212154.511983-3-hpa@zytor.com
2021-05-21x86/irq: Remove unused vectors definesH. Peter Anvin (Intel)
UV_BAU_MESSAGE is defined but not used anywhere in the kernel. Presumably this is a stale vector number that can be reclaimed. MCE_VECTOR is not an actual vector: #MC is an exception, not an interrupt vector, and as such is correctly described as X86_TRAP_MC. MCE_VECTOR is not used anywhere is the kernel. Note that NMI_VECTOR *is* used; specifically it is the vector number programmed into the APIC LVT when an NMI interrupt is configured. At the moment it is always numerically identical to X86_TRAP_NMI, that is not necessarily going to be the case indefinitely. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20210519212154.511983-4-hpa@zytor.com
2021-05-21x86/amd_nb: Add AMD family 19h model 50h PCI idsDavid Bartley
This is required to support Zen3 APUs in k10temp. Signed-off-by: David Bartley <andareed@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Wei Huang <wei.huang2@amd.com> Link: https://lkml.kernel.org/r/20210520174130.94954-1-andareed@gmail.com
2021-05-21x86/elf: Use _BITUL() macro in UAPI headersJoe Richey
Replace BIT() in x86's UAPI header with _BITUL(). BIT() is not defined in the UAPI headers and its usage may cause userspace build errors. Fixes: 742c45c3ecc9 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2") Signed-off-by: Joe Richey <joerichey@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210521085849.37676-2-joerichey94@gmail.com
2021-05-21x86/Xen: swap NX determination and GDT setup on BSPJan Beulich
xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables. For this to work when NX is not available, x86_configure_nx() needs to be called first. [jgross] Note that this is a revert of 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established"), which is possible now that we no longer support running as PV guest in 32-bit mode. Cc: <stable.vger.kernel.org> # 5.9 Fixes: 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established") Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/12a866b0-9e89-59f7-ebeb-a2a6cec0987a@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-21Merge tag 'drm-intel-next-2021-05-19-1' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Core Changes: - drm: Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec (Jose). Driver Changes: - Display plane clock rates fixes and improvements (Ville). - Uninint DMC FW loader state during shutdown (Imre). - Convert snprintf to sysfs_emit (Xuezhi). - Fix invalid access to ACPI _DSM objects (Takashi). - A big refactor around how i915 addresses the graphics and display IP versions. (Matt, Lucas). - Backlight fix (Lyude). - Display watermark and DBUF fixes (Ville). - HDCP fix (Anshuman). - Improve cases where display is not available (Jose). - Defeature PSR2 for RKL and ALD-S (Jose). - VLV DSI panel power fixes and improvements (Hans). - display-12 workaround (Jose). - Fix modesetting (Imre). - Drop redundant address-of op before lttpr_common_caps array (Imre). - Fix compiler checks (Jose, Jason). - GLK display fixes (Ville). - Fix error code returns (Dan). - eDP novel: back again to slow and wide link training everywhere (Kai-Heng). - Abstract DMC FW path (Rodrigo). - Preparation and changes for upcoming XeLPD display IP (Jose, Matt, Ville, Juha-Pekka, Animesh). - Fix comment typo in DSI code (zuoqilin). - Simplify CCS and UV plane alignment handling (Imre). - PSR Fixes on TGL (Gwan-gyeong, Jose). - Add intel_dp_hdcp.h and rename init (Jani). - Move crtc and dpll declarations around (Jani). - Fix pre-skl DP AUX precharge length (Ville). - Remove stray newlines from random files (Ville). - crtc->index and intel_crtc+drm_crtc pointer clean-up (Ville). - Add frontbuffer tracking tracepoints (Ville). - ADL-S PCI ID updates (Anand). - Use unique backlight device names (Jani). - A few clean-ups on i915/audio (Jani). - Use intel_framebuffer instead of drm one on intel_fb functions (Imre). - Add the missing MC CCS/XYUV8888 format support on display >= 12 (Imre). - Nuke display error state (Ville). - ADL-P initial enablement patches starting to land (Clint, Imre, Jose, Umesh, Vandita, Mika). - Display clean-up around VBT and the strap bits (Lucas). - Try YCbCr420 color when RGB fails (Werner). - More PSR fixes and improvements (Jose). - Other generic display code clean-up (Jose, Ville). - Use correct downstream caps for check Src-Ctl mode for PCON (Ankit). - Disable HiZ Raw Stall Optimization on broken gen7 (Simon). Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YKVioeu0JkUAlR7y@intel.com
2021-05-20Merge tag 'quota_for_v5.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fixes from Jan Kara: "The most important part in the pull is disablement of the new syscall quotactl_path() which was added in rc1. The reason is some people at LWN discussion pointed out dirfd would be useful for this path based syscall and Christian Brauner agreed. Without dirfd it may be indeed problematic for containers. So let's just disable the syscall for now when it doesn't have users yet so that we have more time to mull over how to best specify the filesystem we want to work on" * tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Disable quotactl_path syscall quota: Use 'hlist_for_each_entry' to simplify code
2021-05-20x86/entry: Treat out of range and gap system calls the sameH. Peter Anvin (Intel)
The current 64-bit system call entry code treats out-of-range system calls differently than system calls that map to a hole in the system call table. This is visible to the user if system calls are intercepted via ptrace or seccomp and the return value (regs->ax) is modified: in the former case, the return value is preserved, and in the latter case, sys_ni_syscall() is called and the return value is forced to -ENOSYS. The API spec in <asm-generic/syscalls.h> is very clear that only (int)-1 is the non-system-call sentinel value, so make the system call behavior consistent by calling sys_ni_syscall() for all invalid system call numbers except for -1. Although currently sys_ni_syscall() simply returns -ENOSYS, calling it explicitly is friendly for tracing and future possible extensions, and as this is an error path there is no reason to optimize it. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210518191303.4135296-6-hpa@zytor.com
2021-05-20x86/entry/64: Sign-extend system calls on entry to intH. Peter Anvin (Intel)
Right now, *some* code will treat e.g. 0x0000000100000001 as a system call and some will not. Some of the code, notably in ptrace, will treat 0x000000018000000 as a system call and some will not. Finally, right now, e.g. 335 for x86-64 will force the exit code to be set to -ENOSYS even if poked by ptrace, but 548 will not, because there is an observable difference between an out of range system call and a system call number that falls outside the range of the table. This is visible to the user: for example, the syscall_numbering_64 test fails if run under strace, because as strace uses ptrace, it ends up clobbering the upper half of the 64-bit system call number. The architecture independent code all assumes that a system call is "int" that the value -1 specifically and not just any negative value is used for a non-system call. This is the case on x86 as well when arch-independent code is involved. The arch-independent API is defined/documented (but not *implemented*!) in <asm-generic/syscall.h>. This is an ABI change, but is in fact a revert to the original x86-64 ABI. The original assembly entry code would zero-extend the system call number; Use sign extend to be explicit that this is treated as a signed number (although in practice it makes no difference, of course) and to avoid people getting the idea of "optimizing" it, as has happened on at least two(!) separate occasions. Do not store the extended value into regs->orig_ax, however: on x86-64, the ABI is that the callee is responsible for extending parameters, so only examining the lower 32 bits is fully consistent with any "int" argument to any system call, e.g. regs->di for write(2). The full value of %rax on entry to the kernel is thus still available. [ tglx: Add a comment to the ASM code ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210518191303.4135296-5-hpa@zytor.com
2021-05-20x86/syscalls: Switch to generic syscallhdr.shMasahiro Yamada
Many architectures duplicate similar shell scripts. Converts x86 to use scripts/syscallhdr.sh. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210517073815.97426-7-masahiroy@kernel.org
2021-05-20x86/syscalls: Use __NR_syscalls instead of __NR_syscall_maxMasahiro Yamada
__NR_syscall_max is only used by x86 and UML. In contrast, __NR_syscalls is widely used by all the architectures. Convert __NR_syscall_max to __NR_syscalls and adjust the usage sites. This prepares x86 to switch to the generic syscallhdr.sh script. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210517073815.97426-6-masahiroy@kernel.org
2021-05-20x86/unistd: Define X32_NR_syscalls only for 64-bit kernelMasahiro Yamada
X32_NR_syscalls is needed only when building a 64bit kernel. Move it to proper #ifdef guard. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210517073815.97426-5-masahiroy@kernel.org
2021-05-20x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscallMasahiro Yamada
This is a follow-up cleanup after switching to the generic syscalltbl.sh. The old x86 specific script skipped non-existing syscalls. So, the generated syscalls_64.h, for example, had a big hole in the syscall numbers 335-423 range. That is why there exists [0 ... __NR_*_syscall_max] = &__*_sys_ni_cyscall. The new script, scripts/syscalltbl.sh automatically fills holes with __SYSCALL(<nr>, sys_ni_syscall), hence such ugly code can go away. The designated initializers, '[nr] =' are also unneeded. Also, there is no need to give __NR_*_syscall_max+1 because the array size is implied by the number of syscalls in the generated headers. Hence, there is no need to include <asm/unistd.h>, either. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210517073815.97426-4-masahiroy@kernel.org
2021-05-20x86/syscalls: Switch to generic syscalltbl.shMasahiro Yamada
Many architectures duplicate similar shell scripts. Convert x86 and UML to use scripts/syscalltbl.sh. The generic script generates seperate headers for x86/64 and x86/x32 syscalls, while the x86 specific script coalesced them into one. Adjust the code accordingly. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210517073815.97426-3-masahiroy@kernel.org
2021-05-20x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*Masahiro Yamada
The SYSCALL macros are mapped to symbols as follows: __SYSCALL_COMMON(nr, sym) --> __x64_<sym> __SYSCALL_X32(nr, sym) --> __x32_<sym> Originally, the syscalls in the x32 special range (512-547) were all compat. This assumption is now broken after the following commits: 55db9c0e8534 ("net: remove compat_sys_{get,set}sockopt") 5f764d624a89 ("fs: remove the compat readv/writev syscalls") 598b3cec831f ("fs: remove compat_sys_vmsplice") c3973b401ef2 ("mm: remove compat_process_vm_{readv,writev}") Those commits redefined __x32_sys_* to __x64_sys_* because there is no stub like __x32_sys_*. Defining them as follows is more sensible and cleaner. __SYSCALL_COMMON(nr, sym) --> __x64_<sym> __SYSCALL_X32(nr, sym) --> __x64_<sym> This works because both x86_64 and x32 use the same ABI (RDI, RSI, RDX, R10, R8, R9) The ugly #define __x32_sys_* will go away. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210517073815.97426-2-masahiroy@kernel.org
2021-05-19x86/sev-es: Use __put_user()/__get_user() for data accessesJoerg Roedel
The put_user() and get_user() functions do checks on the address which is passed to them. They check whether the address is actually a user-space address and whether its fine to access it. They also call might_fault() to indicate that they could fault and possibly sleep. All of these checks are neither wanted nor needed in the #VC exception handler, which can be invoked from almost any context and also for MMIO instructions from kernel space on kernel memory. All the #VC handler wants to know is whether a fault happened when the access was tried. This is provided by __put_user()/__get_user(), which just do the access no matter what. Also add comments explaining why __get_user() and __put_user() are the best choice here and why it is safe to use them in this context. Also explain why copy_to/from_user can't be used. In addition, also revert commit 7024f60d6552 ("x86/sev-es: Handle string port IO to kernel memory properly") because using __get_user()/__put_user() fixes the same problem while the above commit introduced several problems: 1) It uses access_ok() which is only allowed in task context. 2) It uses memcpy() which has no fault handling at all and is thus unsafe to use here. [ bp: Fix up commit ID of the reverted commit above. ] Fixes: f980f9c31a92 ("x86/sev-es: Compile early handler code into kernel image") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-4-joro@8bytes.org
2021-05-19x86/sev-es: Forward page-faults which happen during emulationJoerg Roedel
When emulating guest instructions for MMIO or IOIO accesses, the #VC handler might get a page-fault and will not be able to complete. Forward the page-fault in this case to the correct handler instead of killing the machine. Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-3-joro@8bytes.org
2021-05-19x86/sev-es: Don't return NULL from sev_es_get_ghcb()Joerg Roedel
sev_es_get_ghcb() is called from several places but only one of them checks the return value. The reaction to returning NULL is always the same: calling panic() and kill the machine. Instead of adding checks to all call sites, move the panic() into the function itself so that it will no longer return NULL. Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-2-joro@8bytes.org
2021-05-19x86/build: Fix location of '-plugin-opt=' flagsNathan Chancellor
Commit b33fff07e3e3 ("x86, build: allow LTO to be selected") added a couple of '-plugin-opt=' flags to KBUILD_LDFLAGS because the code model and stack alignment are not stored in LLVM bitcode. However, these flags were added to KBUILD_LDFLAGS prior to the emulation flag assignment, which uses ':=', so they were overwritten and never added to $(LD) invocations. The absence of these flags caused misalignment issues in the AMDGPU driver when compiling with CONFIG_LTO_CLANG, resulting in general protection faults. Shuffle the assignment below the initial one so that the flags are properly passed along and all of the linker flags stay together. At the same time, avoid any future issues with clobbering flags by changing the emulation flag assignment to '+=' since KBUILD_LDFLAGS is already defined with ':=' in the main Makefile before being exported for modification here as a result of commit: ce99d0bf312d ("kbuild: clear LDFLAGS in the top Makefile") Fixes: b33fff07e3e3 ("x86, build: allow LTO to be selected") Reported-by: Anthony Ruhier <aruhier@mailbox.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Anthony Ruhier <aruhier@mailbox.org> Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1374 Link: https://lore.kernel.org/r/20210518190106.60935-1-nathan@kernel.org
2021-05-19x86/signal: Detect and prevent an alternate signal stack overflowChang S. Bae
The kernel pushes context on to the userspace stack to prepare for the user's signal handler. When the user has supplied an alternate signal stack, via sigaltstack(2), it is easy for the kernel to verify that the stack size is sufficient for the current hardware context. Check if writing the hardware context to the alternate stack will exceed it's size. If yes, then instead of corrupting user-data and proceeding with the original signal handler, an immediate SIGSEGV signal is delivered. Refactor the stack pointer check code from on_sig_stack() and use the new helper. While the kernel allows new source code to discover and use a sufficient alternate signal stack size, this check is still necessary to protect binaries with insufficient alternate signal stack size from data corruption. Fixes: c2bc11f10a39 ("x86, AVX-512: Enable AVX-512 States Context Switch") Reported-by: Florian Weimer <fweimer@redhat.com> Suggested-by: Jann Horn <jannh@google.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20210518200320.17239-6-chang.seok.bae@intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=153531
2021-05-19x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZChang S. Bae
Historically, signal.h defines MINSIGSTKSZ (2KB) and SIGSTKSZ (8KB), for use by all architectures with sigaltstack(2). Over time, the hardware state size grew, but these constants did not evolve. Today, literal use of these constants on several architectures may result in signal stack overflow, and thus user data corruption. A few years ago, the ARM team addressed this issue by establishing getauxval(AT_MINSIGSTKSZ). This enables the kernel to supply a value at runtime that is an appropriate replacement on current and future hardware. Add getauxval(AT_MINSIGSTKSZ) support to x86, analogous to the support added for ARM in 94b07c1f8c39 ("arm64: signal: Report signal frame size to userspace via auxv"). Also, include a documentation to describe x86-specific auxiliary vectors. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20210518200320.17239-4-chang.seok.bae@intel.com
2021-05-19x86/signal: Introduce helpers to get the maximum signal frame sizeChang S. Bae
Signal frames do not have a fixed format and can vary in size when a number of things change: supported XSAVE features, 32 vs. 64-bit apps, etc. Add support for a runtime method for userspace to dynamically discover how large a signal stack needs to be. Introduce a new variable, max_frame_size, and helper functions for the calculation to be used in a new user interface. Set max_frame_size to a system-wide worst-case value, instead of storing multiple app-specific values. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H.J. Lu <hjl.tools@gmail.com> Link: https://lkml.kernel.org/r/20210518200320.17239-3-chang.seok.bae@intel.com
2021-05-18signal: Deliver all of the siginfo perf data in _perfEric W. Biederman
Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t. Note: The data field in the perf data structures in a u64 to allow a pointer to be encoded without needed to implement a 32bit and 64bit version of the same structure. There already exists a 32bit and 64bit versions siginfo_t, and the 32bit version can not include a 64bit member as it only has 32bit alignment. So unsigned long is used in siginfo_t instead of a u64 as unsigned long can encode a pointer on all architectures linux supports. v1: https://lkml.kernel.org/r/m11rarqqx2.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210503203814.25487-10-ebiederm@xmission.com v3: https://lkml.kernel.org/r/20210505141101.11519-11-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-4-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18siginfo: Move si_trapno inside the union inside _si_faultEric W. Biederman
It turns out that linux uses si_trapno very sparingly, and as such it can be considered extra information for a very narrow selection of signals, rather than information that is present with every fault reported in siginfo. As such move si_trapno inside the union inside of _si_fault. This results in no change in placement, and makes it eaiser to extend _si_fault in the future as this reduces the number of special cases. In particular with si_trapno included in the union it is no longer a concern that the union must be pointer aligned on most architectures because the union follows immediately after si_addr which is a pointer. This change results in a difference in siginfo field placement on sparc and alpha for the fields si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. These architectures do not implement the signals that would use si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. Further these architecture have not yet implemented the userspace that would use si_perf. The point of this change is in fact to correct these placement issues before sparc or alpha grow userspace that cares. This change was discussed[1] and the agreement is that this change is currently safe. [1]: https://lkml.kernel.org/r/CAK8P3a0+uKYwL1NhY6Hvtieghba2hKYGD6hcKx5n8=4Gtt+pHA@mail.gmail.com Acked-by: Marco Elver <elver@google.com> v1: https://lkml.kernel.org/r/m1tunns7yf.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-5-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18x86/bus_lock: Set rate limit for bus lockFenghua Yu
A bus lock can be thousands of cycles slower than atomic operation within one cache line. It also disrupts performance on other cores. Malicious users can generate multiple bus locks to degrade the whole system performance. The current mitigation is to kill the offending process, but for certain scenarios it's desired to identify and throttle the offending application. Add a system wide rate limit for bus locks. When the system detects bus locks at a rate higher than N/sec (where N can be set by the kernel boot argument in the range [1..1000]) any task triggering a bus lock will be forced to sleep for at least 20ms until the overall system rate of bus locks drops below the threshold. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210419214958.4035512-3-fenghua.yu@intel.com
2021-05-18x86/idt: Rework IDT setup for boot CPUThomas Gleixner
A basic IDT setup for the boot CPU has to be done before invoking cpu_init() because that might trigger #GP when accessing certain MSRs. This setup cannot install the IST variants on 64-bit because the TSS setup which is required for ISTs to work happens in cpu_init(). That leaves a theoretical window where a NMI would invoke the ASM entry point which relies on IST being enabled on the kernel stack which is undefined behaviour. This setup logic has never worked correctly, but on the other hand a NMI hitting the boot CPU before it has fully set up the IDT would be fatal anyway. So the small window between the wrong NMI gate and the IST based NMI gate is not really adding a substantial amount of risk. But the setup logic is nevertheless more convoluted than necessary. The recent separation of the TSS setup into a separate function to ensure that setup so it can setup TSS first, then initialize IDT with the IST variants before invoking cpu_init() and get rid of the post cpu_init() IST setup. Move the invocation of cpu_init_exception_handling() ahead of idt_setup_traps() and merge the IST setup into the default setup table. Reported-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lai Jiangshan <laijs@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210507114000.569244755@linutronix.de
2021-05-18x86/cpu: Init AP exception handling from cpu_init_secondary()Borislav Petkov
SEV-ES guests require properly setup task register with which the TSS descriptor in the GDT can be located so that the IST-type #VC exception handler which they need to function properly, can be executed. This setup needs to happen before attempting to load microcode in ucode_cpu_init() on secondary CPUs which can cause such #VC exceptions. Simplify the machinery by running that exception setup from a new function cpu_init_secondary() and explicitly call cpu_init_exception_handling() for the boot CPU before cpu_init(). The latter prepares for fixing and simplifying the exception/IST setup on the boot CPU. There should be no functional changes resulting from this patch. [ tglx: Reworked it so cpu_init_exception_handling() stays seperate ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lai Jiangshan <laijs@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/87k0o6gtvu.ffs@nanos.tec.linutronix.de
2021-05-18perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICXAlexander Antonov
This patch enables I/O stacks to IIO PMON mapping on Icelake server. Mapping of IDs in SAD_CONTROL_CFG notation to IDs in PMON notation for Icelake server: Stack Name | CBDMA/DMI | PCIe_1 | PCIe_2 | PCIe_3 | PCIe_4 | PCIe_5 SAD_CONTROL_CFG ID | 0 | 1 | 2 | 3 | 4 | 5 PMON ID | 5 | 0 | 1 | 2 | 3 | 4 I/O stacks to IIO PMON mapping is exposed through attributes /sys/devices/uncore_iio_<pmu_idx>/dieX, where dieX is file which holds "Segment:Root Bus" for PCIe root port which can be monitored by that IIO PMON block. Example for 2-S Icelake server: ==> /sys/devices/uncore_iio_0/die0 <== 0000:16 ==> /sys/devices/uncore_iio_0/die1 <== 0000:97 ==> /sys/devices/uncore_iio_1/die0 <== 0000:30 ==> /sys/devices/uncore_iio_1/die1 <== 0000:b0 ==> /sys/devices/uncore_iio_3/die0 <== 0000:4a ==> /sys/devices/uncore_iio_3/die1 <== 0000:c9 ==> /sys/devices/uncore_iio_4/die0 <== 0000:64 ==> /sys/devices/uncore_iio_4/die1 <== 0000:e2 ==> /sys/devices/uncore_iio_5/die0 <== 0000:00 ==> /sys/devices/uncore_iio_5/die1 <== 0000:80 Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210426131614.16205-4-alexander.antonov@linux.intel.com
2021-05-18perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNRAlexander Antonov
I/O stacks to PMON mapping on Skylake server relies on topology information from CPU_BUS_NO MSR but this approach is not applicable for SNR and ICX. Mapping on these platforms can be gotten by reading SAD_CONTROL_CFG CSR from Mesh2IIO device with 0x09a2 DID. SAD_CONTROL_CFG CSR contains stack IDs in its own notation which are statically mapped on IDs in PMON notation. The map for Snowridge: Stack Name | CBDMA/DMI | PCIe Gen 3 | DLB | NIS | QAT SAD_CONTROL_CFG ID | 0 | 1 | 2 | 3 | 4 PMON ID | 1 | 4 | 3 | 2 | 0 This patch enables I/O stacks to IIO PMON mapping on Snowridge. Mapping is exposed through attributes /sys/devices/uncore_iio_<pmu_idx>/dieX, where dieX is file which holds "Segment:Root Bus" for PCIe root port which can be monitored by that IIO PMON block. Example for Snowridge: ==> /sys/devices/uncore_iio_0/die0 <== 0000:f3 ==> /sys/devices/uncore_iio_1/die0 <== 0000:00 ==> /sys/devices/uncore_iio_2/die0 <== 0000:eb ==> /sys/devices/uncore_iio_3/die0 <== 0000:e3 ==> /sys/devices/uncore_iio_4/die0 <== 0000:14 Mapping for Icelake server will be enabled in the follow-up patch. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210426131614.16205-3-alexander.antonov@linux.intel.com
2021-05-18perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedureAlexander Antonov
Currently I/O stacks to IIO PMON mapping is available on Skylake servers only and need to make code more general to easily enable further platforms. So, introduce get_topology() callback in struct intel_uncore_type which allows to move common code to separate function and make mapping procedure more general. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210426131614.16205-2-alexander.antonov@linux.intel.com
2021-05-18perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic contextLike Xu
If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional might_sleep_if() deep in kmem_cache_alloc() will generate the following trace, and potentially cause a deadlock when another LBR event is added: [] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196 [] Call Trace: [] kmem_cache_alloc+0x36/0x250 [] intel_pmu_lbr_add+0x152/0x170 [] x86_pmu_add+0x83/0xd0 Make it symmetric with the release_lbr_buffers() call and mirror the existing DS buffers. Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read") Signed-off-by: Like Xu <like.xu@linux.intel.com> [peterz: simplified] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210430052247.3079672-2-like.xu@linux.intel.com
2021-05-18perf/x86: Avoid touching LBR_TOS MSR for Arch LBRLike Xu
The Architecture LBR does not have MSR_LBR_TOS (0x000001c9). In a guest that should support Architecture LBR, check_msr() will be a non-related check for the architecture MSR 0x0 (IA32_P5_MC_ADDR) that is also not supported by KVM. The failure will cause x86_pmu.lbr_nr = 0, thereby preventing the initialization of the guest Arch LBR. Fix it by avoiding this extraneous check in intel_pmu_init() for Arch LBR. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Signed-off-by: Like Xu <like.xu@linux.intel.com> [peterz: simpler still] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210430052247.3079672-1-like.xu@linux.intel.com
2021-05-18x86/sev-es: Invalidate the GHCB after completing VMGEXITTom Lendacky
Since the VMGEXIT instruction can be issued from userspace, invalidate the GHCB after performing VMGEXIT processing in the kernel. Invalidation is only required after userspace is available, so call vc_ghcb_invalidate() from sev_es_put_ghcb(). Update vc_ghcb_invalidate() to additionally clear the GHCB exit code so that it is always presented as 0 when VMGEXIT has been issued by anything else besides the kernel. Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/5a8130462e4f0057ee1184509cd056eedd78742b.1621273353.git.thomas.lendacky@amd.com
2021-05-18x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patchTom Lendacky
Move the location of sev_es_put_ghcb() in preparation for an update to it in a follow-on patch. This will better highlight the changes being made to the function. No functional change. Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/8c07662ec17d3d82e5c53841a1d9e766d3bdbab6.1621273353.git.thomas.lendacky@amd.com
2021-05-17Merge drm/drm-next into drm-intel-nextRodrigo Vivi
Time to get back in sync... Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2021-05-17x86/acpi: Switch to pr_xxx log functionsHeiner Kallweit
Switching to pr_debug et al has two benefits: - We don't have to add PREFIX to each log statement - Debug output is suppressed except DEBUG is defined or dynamic debugging is enabled for the respective code piece. In addition ensure that longer messages aren't split to multiple lines in source code, checkpatch complains otherwise. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-17quota: Disable quotactl_path syscallJan Kara
In commit fa8b90070a80 ("quota: wire up quotactl_path") we have wired up new quotactl_path syscall. However some people in LWN discussion have objected that the path based syscall is missing dirfd and flags argument which is mostly standard for contemporary path based syscalls. Indeed they have a point and after a discussion with Christian Brauner and Sascha Hauer I've decided to disable the syscall for now and update its API. Since there is no userspace currently using that syscall and it hasn't been released in any major release, we should be fine. CC: Christian Brauner <christian.brauner@ubuntu.com> CC: Sascha Hauer <s.hauer@pengutronix.de> Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein Signed-off-by: Jan Kara <jack@suse.cz>
2021-05-17Merge tag 'kvmarm-fixes-5.13-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.13, take #1 - Fix regression with irqbypass not restarting the guest on failed connect - Fix regression with debug register decoding resulting in overlapping access - Commit exception state on exit to usrspace - Fix the MMU notifier return values - Add missing 'static' qualifiers in the new host stage-2 code
2021-05-16Merge tag 'timers-urgent-2021-05-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes for timers: - Use the ALARM feature check in the alarmtimer core code insted of the old method of checking for the set_alarm() callback. Drivers can have that callback set but the feature bit cleared. If such a RTC device is selected then alarms wont work. - Use a proper define to let the preprocessor check whether Hyper-V VDSO clocksource should be active. The code used a constant in an enum with #ifdef, which evaluates to always false and disabled the clocksource for VDSO" * tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86 alarmtimer: Check RTC features instead of ops
2021-05-16Merge tag 'x86_urgent_for_v5.13_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "The three SEV commits are not really urgent material. But we figured since getting them in now will avoid a huge amount of conflicts between future SEV changes touching tip, the kvm and probably other trees, sending them to you now would be best. The idea is that the tip, kvm etc branches for 5.14 will all base ontop of -rc2 and thus everything will be peachy. What is more, those changes are purely mechanical and defines movement so they should be fine to go now (famous last words). Summary: - Enable -Wundef for the compressed kernel build stage - Reorganize SEV code to streamline and simplify future development" * tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/compressed: Enable -Wundef x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG x86/sev: Move GHCB MSR protocol and NAE definitions in a common header x86/sev-es: Rename sev-es.{ch} to sev.{ch}
2021-05-14clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86Vitaly Kuznetsov
Mohammed reports (https://bugzilla.kernel.org/show_bug.cgi?id=213029) the commit e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline") broke vDSO on x86. The problem appears to be that VDSO_CLOCKMODE_HVCLOCK is an enum value in 'enum vdso_clock_mode' and '#ifdef VDSO_CLOCKMODE_HVCLOCK' branch evaluates to false (it is not a define). Use a dedicated HAVE_VDSO_CLOCKMODE_HVCLOCK define instead. Fixes: e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline") Reported-by: Mohammed Gamal <mgamal@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210513073246.1715070-1-vkuznets@redhat.com
2021-05-14x86/cpu: Fix core name for Sapphire RapidsAndi Kleen
Sapphire Rapids uses Golden Cove, not Willow Cove. Fixes: 53375a5a218e ("x86/cpu: Resort and comment Intel models") Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210513163904.3083274-1-ak@linux.intel.com
2021-05-14jump_label/x86: Remove unused JUMP_LABEL_NOP_SIZEPeter Zijlstra
JUMP_LABEL_NOP_SIZE is now unused, remove it. Fixes: 001951bea748 ("jump_label, x86: Add variable length patching support") Reported-by: Miroslav Benes <mbenes@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/YJ00zxsvocDV5vLU@hirez.programming.kicks-ass.net
2021-05-14x86/asm: Make <asm/asm.h> valid on cross-builds as wellIngo Molnar
Stephen Rothwell reported that the objtool cross-build breaks on non-x86 hosts: > tools/arch/x86/include/asm/asm.h:185:24: error: invalid register name for 'current_stack_pointer' > 185 | register unsigned long current_stack_pointer asm(_ASM_SP); > | ^~~~~~~~~~~~~~~~~~~~~ The PowerPC host obviously doesn't know much about x86 register names. Protect the kernel-specific bits of <asm/asm.h>, so that it can be included by tooling and cross-built. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-13x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen ↵Huang Rui
generations Some AMD Ryzen generations has different calculation method on maximum performance. 255 is not for all ASICs, some specific generations should use 166 as the maximum performance. Otherwise, it will report incorrect frequency value like below: ~ → lscpu | grep MHz CPU MHz: 3400.000 CPU max MHz: 7228.3198 CPU min MHz: 2200.0000 [ mingo: Tidied up whitespace use. ] [ Alexander Monakov <amonakov@ispras.ru>: fix 225 -> 255 typo. ] Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") Reported-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com> Fixed-by: Alexander Monakov <amonakov@ispras.ru> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210425073451.2557394-1-ray.huang@amd.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211791 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-12x86/boot/compressed: Enable -WundefNick Desaulniers
A discussion around -Wundef showed that there were still a few boolean Kconfigs where #if was used rather than #ifdef to guard different code. Kconfig doesn't define boolean configs, which can result in -Wundef warnings. arch/x86/boot/compressed/Makefile resets the CFLAGS used for this directory, and doesn't re-enable -Wundef as the top level Makefile does. If re-added, with RANDOMIZE_BASE and X86_NEED_RELOCS disabled, the following warnings are visible. arch/x86/boot/compressed/misc.h:82:5: warning: 'CONFIG_RANDOMIZE_BASE' is not defined, evaluates to 0 [-Wundef] ^ arch/x86/boot/compressed/misc.c:175:5: warning: 'CONFIG_X86_NEED_RELOCS' is not defined, evaluates to 0 [-Wundef] ^ Simply fix these and re-enable this warning for this directory. Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20210422190450.3903999-1-ndesaulniers@google.com
2021-05-12x86: Fix leftover comment typosIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-12Merge branch 'linus' into x86/cleanups, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-12jump_label, x86: Allow short NOPsPeter Zijlstra
Now that objtool is able to rewrite jump_label instructions, have the compiler emit a JMP, such that it can decide on the optimal encoding, and set jump_entry::key bit1 to indicate that objtool should rewrite the instruction to a matching NOP. For x86_64-allyesconfig this gives: jl\ NOP JMP short: 22997 124 long: 30874 90 IOW, we save (22997+124) * 3 bytes of kernel text in hotpaths. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210506194158.216763632@infradead.org
2021-05-12jump_label, x86: Emit short JMPPeter Zijlstra
Now that we can patch short JMP/NOP, allow the compiler/assembler to emit short JMP instructions. There is no way to have the assembler emit short NOPs based on the potential displacement, so leave those long for now. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210506194157.967034497@infradead.org