summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2019-06-28x86/hpet: Remove not required includesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132435.348089155@linutronix.de
2019-06-28x86/hpet: Decapitalize and rename EVT_TO_HPET_DEVThomas Gleixner
It's a function not a macro and the upcoming changes use channel for the individual hpet timer units to allow a step by step refactoring approach. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132435.241032433@linutronix.de
2019-06-28x86/hpet: Simplify counter validationThomas Gleixner
There is no point to loop for 200k TSC cycles to check afterwards whether the HPET counter is working. Read the counter inside of the loop and break out when the counter value changed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132435.149535103@linutronix.de
2019-06-28x86/hpet: Separate counter check out of clocksource register codeThomas Gleixner
The init code checks whether the HPET counter works late in the init function when the clocksource is registered. That should happen right with the other sanity checks. Split it into a separate validation function and move it to the other sanity checks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132435.058540608@linutronix.de
2019-06-28x86/hpet: Shuffle code around for readability sakeThomas Gleixner
It doesn't make sense to have init functions in the middle of other code. Aside of that, further changes in that area create horrible diffs if the code stays where it is. No functional change Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.951733064@linutronix.de
2019-06-28x86/hpet: Move static and global variables to one placeThomas Gleixner
Having static and global variables sprinkled all over the code is just annoying to read. Move them all to the top of the file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.860549134@linutronix.de
2019-06-28x86/hpet: Sanitize stub functionsThomas Gleixner
Mark them inline and remove the pointless 'return;' statement. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.754768274@linutronix.de
2019-06-28x86/hpet: Mark init functions __initThomas Gleixner
They are only called from init code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.645357869@linutronix.de
2019-06-28x86/hpet: Remove the unused hpet_msi_read() functionThomas Gleixner
No users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.553729327@linutronix.de
2019-06-28x86/hpet: Remove unused parameter from hpet_next_event()Thomas Gleixner
The clockevent device pointer is not used in this function. While at it, rename the misnamed 'timer' parameter to 'channel', which makes it clear what this parameter means. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.447880978@linutronix.de
2019-06-28x86/hpet: Remove pointless x86-64 specific #includeThomas Gleixner
Nothing requires asm/pgtable.h here anymore. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.339011567@linutronix.de
2019-06-28x86/hpet: Restructure init codeThomas Gleixner
As a preparatory change for further consolidation, restructure the HPET init code so it becomes more readable. Fix up misleading and stale comments and rename variables so they actually make sense. No intended functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.247842972@linutronix.de
2019-06-28x86/hpet: Replace printk(KERN...) with pr_...()Thomas Gleixner
And sanitize the format strings while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.140411339@linutronix.de
2019-06-28x86/hpet: Simplify CPU online codeThomas Gleixner
The indirection via work scheduled on the upcoming CPU was necessary with the old hotplug code because the online callback was invoked on the control CPU not on the upcoming CPU. The rework of the CPU hotplug core guarantees that the online callbacks are invoked on the upcoming CPU. Remove the now pointless work redirection. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/20190623132434.047254075@linutronix.de
2019-06-28x86/unwind/orc: Fall back to using frame pointers for generated codeJosh Poimboeuf
The ORC unwinder can't unwind through BPF JIT generated code because there are no ORC entries associated with the code. If an ORC entry isn't available, try to fall back to frame pointers. If BPF and other generated code always do frame pointer setup (even with CONFIG_FRAME_POINTERS=n) then this will allow ORC to unwind through most generated code despite there being no corresponding ORC entries. Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") Reported-by: Song Liu <songliubraving@fb.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kairui Song <kasong@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/b6f69208ddff4343d56b7bfac1fc7cfcd62689e8.1561595111.git.jpoimboe@redhat.com
2019-06-27x86/tls: Fix possible spectre-v1 in do_get_thread_area()Dianzhang Chen
The index to access the threads tls array is controlled by userspace via syscall: sys_ptrace(), hence leading to a potential exploitation of the Spectre variant 1 vulnerability. The index can be controlled from: ptrace -> arch_ptrace -> do_get_thread_area. Fix this by sanitizing the user supplied index before using it to access the p->thread.tls_array. Signed-off-by: Dianzhang Chen <dianzhangchen0@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1561524630-3642-1-git-send-email-dianzhangchen0@gmail.com
2019-06-27x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()Dianzhang Chen
The index to access the threads ptrace_bps is controlled by userspace via syscall: sys_ptrace(), hence leading to a potential exploitation of the Spectre variant 1 vulnerability. The index can be controlled from: ptrace -> arch_ptrace -> ptrace_get_debugreg. Fix this by sanitizing the user supplied index before using it access thread->ptrace_bps. Signed-off-by: Dianzhang Chen <dianzhangchen0@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1561476617-3759-1-git-send-email-dianzhangchen0@gmail.com
2019-06-27x86/jailhouse: Mark jailhouse_x2apic_available() as __initZhenzhong Duan
.. as it is only called at early bootup stage. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Borislav Petkov <bp@alien8.de> Cc: jailhouse-dev@googlegroups.com Link: https://lkml.kernel.org/r/1561539289-29180-1-git-send-email-zhenzhong.duan@oracle.com
2019-06-26x86/speculation: Allow guests to use SSBD even if host does notAlejandro Jimenez
The bits set in x86_spec_ctrl_mask are used to calculate the guest's value of SPEC_CTRL that is written to the MSR before VMENTRY, and control which mitigations the guest can enable. In the case of SSBD, unless the host has enabled SSBD always on mode (by passing "spec_store_bypass_disable=on" in the kernel parameters), the SSBD bit is not set in the mask and the guest can not properly enable the SSBD always on mitigation mode. This has been confirmed by running the SSBD PoC on a guest using the SSBD always on mitigation mode (booted with kernel parameter "spec_store_bypass_disable=on"), and verifying that the guest is vulnerable unless the host is also using SSBD always on mode. In addition, the guest OS incorrectly reports the SSB vulnerability as mitigated. Always set the SSBD bit in x86_spec_ctrl_mask when the host CPU supports it, allowing the guest to use SSBD whether or not the host has chosen to enable the mitigation in any of its modes. Fixes: be6fcb5478e9 ("x86/bugs: Rework spec_ctrl base and mask logic") Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: bp@alien8.de Cc: rkrcmar@redhat.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1560187210-11054-1-git-send-email-alejandro.j.jimenez@oracle.com
2019-06-26x86/kexec: Make variable static and config dependentTiezhu Yang
The following sparse warning is emitted: arch/x86/kernel/crash.c:59:15: warning: symbol 'crash_zero_bytes' was not declared. Should it be static? The variable is only used in this compilation unit, but it is also only used when CONFIG_KEXEC_FILE is enabled. Just making it static would result in a 'defined but not used' warning for CONFIG_KEXEC_FILE=n. Make it static and move it into the existing CONFIG_KEXEC_FILE section. [ tglx: Massaged changelog and moved it into the existing ifdef ] Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call") Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dave Young <dyoung@redhat.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: kexec@lists.infradead.org Cc: vgoyal@redhat.com Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/117ef0c6.3d30.16b87c9cfbf.Coremail.kernelpatch@126.com
2019-06-26x86/boot/64: Add missing fixup_pointer() for next_early_pgt accessKirill A. Shutemov
__startup_64() uses fixup_pointer() to access global variables in a position-independent fashion. Access to next_early_pgt was wrapped into the helper, but one instance in the 5-level paging branch was missed. GCC generates a R_X86_64_PC32 PC-relative relocation for the access which doesn't trigger the issue, but Clang emmits a R_X86_64_32S which leads to an invalid memory access and system reboot. Fixes: 187e91fe5e91 ("x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Potapenko <glider@google.com> Link: https://lkml.kernel.org/r/20190620112422.29264-1-kirill.shutemov@linux.intel.com
2019-06-26x86/boot/64: Fix crash if kernel image crosses page table boundaryKirill A. Shutemov
A kernel which boots in 5-level paging mode crashes in a small percentage of cases if KASLR is enabled. This issue was tracked down to the case when the kernel image unpacks in a way that it crosses an 1G boundary. The crash is caused by an overrun of the PMD page table in __startup_64() and corruption of P4D page table allocated next to it. This particular issue is not visible with 4-level paging as P4D page tables are not used. But the P4D and the PUD calculation have similar problems. The PMD index calculation is wrong due to operator precedence, which fails to confine the PMDs in the PMD array on wrap around. The P4D calculation for 5-level paging and the PUD calculation calculate the first index correctly, but then blindly increment it which causes the same issue when a kernel image is located across a 512G and for 5-level paging across a 46T boundary. This wrap around mishandling was introduced when these parts moved from assembly to C. Restore it to the correct behaviour. Fixes: c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190620112345.28833-1-kirill.shutemov@linux.intel.com
2019-06-25x86/alternatives: Add int3_emulate_call() selftestPeter Zijlstra
Given that the entry_*.S changes for this functionality are somewhat tricky, make sure the paths are tested every boot, instead of on the rare occasion when we trip an INT3 while rewriting text. Requested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-25x86/stackframe/32: Allow int3_emulate_push()Peter Zijlstra
Now that x86_32 has an unconditional gap on the kernel stack frame, the int3_emulate_push() thing will work without further changes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-25x86/stackframe/32: Provide consistent pt_regsPeter Zijlstra
Currently pt_regs on x86_32 has an oddity in that kernel regs (!user_mode(regs)) are short two entries (esp/ss). This means that any code trying to use them (typically: regs->sp) needs to jump through some unfortunate hoops. Change the entry code to fix this up and create a full pt_regs frame. This then simplifies various trampolines in ftrace and kprobes, the stack unwinder, ptrace, kdump and kgdb. Much thanks to Josh for help with the cleanups! Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-25x86/stackframe, x86/ftrace: Add pt_regs frame annotationsPeter Zijlstra
When CONFIG_FRAME_POINTER, we should mark pt_regs frames. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-25x86/stackframe, x86/kprobes: Fix frame pointer annotationsPeter Zijlstra
The kprobe trampolines have a FRAME_POINTER annotation that makes no sense. It marks the frame in the middle of pt_regs, at the place of saving BP. Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER from the respective entry_*.S. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-25Merge tag 'v5.2-rc6' into x86/asm, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-25x86/build: Remove redundant 'clean-files += capflags.c'Masahiro Yamada
All the files added to 'targets' are cleaned. Adding the same file to both 'targets' and 'clean-files' is redundant. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190625073311.18303-1-yamada.masahiro@socionext.com
2019-06-25x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.cMasahiro Yamada
Without 'set -e', shell scripts continue running even after any error occurs. The missed 'set -e' is a typical bug in shell scripting. For example, when a disk space shortage occurs while this script is running, it actually ends up with generating a truncated capflags.c. Yet, mkcapflags.sh continues running and exits with 0. So, the build system assumes it has succeeded. It will not be re-generated in the next invocation of Make since its timestamp is newer than that of any of the source files. Add 'set -e' so that any error in this script is caught and propagated to the build system. Since 9c2af1c7377a ("kbuild: add .DELETE_ON_ERROR special target"), make automatically deletes the target on any failure. So, the broken capflags.c will be deleted automatically. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190625072622.17679-1-yamada.masahiro@socionext.com
2019-06-25x86/resctrl: Cleanup cbm_ensure_valid()Reinette Chatre
A recent fix to the cbm_ensure_valid() function left some coding style issues that are now addressed: - Return a value instead of using a function parameter as input and output - Use if (!val) instead of if (val == 0) - Follow reverse fir tree ordering of variable declarations Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/15ba03856f1d944468ee6f44e3fd7aa548293ede.1561408280.git.reinette.chatre@intel.com
2019-06-25Merge branch 'x86/urgent' into x86/cacheThomas Gleixner
Pick up pending upstream fixes to meet dependencies
2019-06-25x86/jump_label: Make tp_vec_nr staticYueHaibing
Fix sparse warning: arch/x86/kernel/jump_label.c:106:5: warning: symbol 'tp_vec_nr' was not declared. Should it be static? It's only used in jump_label.c, so make it static. Fixes: ba54f0c3f7c4 ("x86/jump_label: Batch jump label updates") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <bp@alien8.de> Cc: <hpa@zytor.com> Cc: <peterz@infradead.org> Cc: <bristot@redhat.com> Cc: <namit@vmware.com> Link: https://lkml.kernel.org/r/20190625034548.26392-1-yuehaibing@huawei.com
2019-06-24Merge tag 'v5.2-rc6' into sched/core, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/x86/regs: Check reserved bitsKan Liang
The perf fuzzer triggers a warning which map to: if (WARN_ON_ONCE(idx >= ARRAY_SIZE(pt_regs_offset))) return 0; The bits between XMM registers and generic registers are reserved. But perf_reg_validate() doesn't check these bits. Add PERF_REG_X86_RESERVED for reserved bits on X86. Check the reserved bits in perf_reg_validate(). Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/1559081314-9714-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24x86/umwait: Add sysfs interface to control umwait maximum timeFenghua Yu
IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta that processor can stay in C0.1 or C0.2. A zero value means no maximum time. Each instruction sets its own deadline in the instruction's implicit input EDX:EAX value. The instruction wakes up if the time-stamp counter reaches or exceeds the specified deadline, or the umwait maximum time expires, or a store happens in the monitored address range in umwait. The administrator can write an unsigned 32-bit number to /sys/devices/system/cpu/umwait_control/max_time to change the default value. Note that a value of zero means there is no limit. The lower two bits of the value must be zero. [ tglx: Simplify the write function. Massage changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Andy Lutomirski" <luto@kernel.org> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua.yu@intel.com
2019-06-24x86/umwait: Add sysfs interface to control umwait C0.2 stateFenghua Yu
C0.2 state in umwait and tpause instructions can be enabled or disabled on a processor through IA32_UMWAIT_CONTROL MSR register. By default, C0.2 is enabled and the user wait instructions results in lower power consumption with slower wakeup time. But in real time systems which require faster wakeup time although power savings could be smaller, the administrator needs to disable C0.2 and all umwait invocations from user applications use C0.1. Create a sysfs interface which allows the administrator to control C0.2 state during run time. Andy Lutomirski suggested to turn off local irqs before writing the MSR to ensure the cached control value is not changed by a concurrent sysfs write from a different CPU via IPI. [ tglx: Simplified the update logic in the write function and got rid of all the convoluted type casts. Added a shared update function and made the namespace consistent. Moved the sysfs create invocation. Massaged changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Andy Lutomirski" <luto@kernel.org> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua.yu@intel.com
2019-06-24x86/umwait: Initialize umwait control valuesFenghua Yu
umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-23x86/apic: Use non-atomic operations when possibleNadav Amit
Using __clear_bit() and __cpumask_clear_cpu() is more efficient than using their atomic counterparts. Use them when atomicity is not needed, such as when manipulating bitmasks that are on the stack. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20190613064813.8102-10-namit@vmware.com
2019-06-22x86/vdso: Switch to generic vDSO implementationVincenzo Frascino
The x86 vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library. Introduce the following changes: - Modification of vdso.c to be compliant with the common vdso datapage - Use of lib/vdso for gettimeofday [ tglx: Massaged changelog and cleaned up the function signature formatting ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Cc: Shijith Thotton <sthotton@marvell.com> Cc: Andre Przywara <andre.przywara@arm.com> Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frascino@arm.com
2019-06-22x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUsKonstantin Khlebnikov
Since commit 7d5905dc14a8 ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo") open and read of /proc/cpuinfo sends IPI to all CPUs. Many applications read /proc/cpuinfo at the start for trivial reasons like counting cores or detecting cpu features. While sensitive workloads like DPDK network polling don't like any interrupts. Integrates this feature with cpu isolation and do not send IPIs to CPUs without housekeeping flag HK_FLAG_MISC (set by nohz_full). Code that requests cpu frequency like show_cpuinfo() falls back to the last frequency set by the cpufreq driver if this method returns 0. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/155790354043.1104.15333317408370209.stgit@buzz
2019-06-22x86/apic: Fix integer overflow on 10 bit left shift of cpu_khzColin Ian King
The left shift of unsigned int cpu_khz will overflow for large values of cpu_khz, so cast it to a long long before shifting it to avoid overvlow. For example, this can happen when cpu_khz is 4194305, i.e. ~4.2 GHz. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on kexec") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20190619181446.13635-1-colin.king@canonical.com
2019-06-22x86/asm: Pin sensitive CR4 bitsKees Cook
Several recent exploits have used direct calls to the native_write_cr4() function to disable SMEP and SMAP before then continuing their exploits using userspace memory access. Direct calls of this form can be mitigate by pinning bits of CR4 so that they cannot be changed through a common function. This is not intended to be a general ROP protection (which would require CFI to defend against properly), but rather a way to avoid trivial direct function calling (or CFI bypasses via a matching function prototype) as seen in: https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html (https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308) The goals of this change: - Pin specific bits (SMEP, SMAP, and UMIP) when writing CR4. - Avoid setting the bits too early (they must become pinned only after CPU feature detection and selection has finished). - Pinning mask needs to be read-only during normal runtime. - Pinning needs to be checked after write to validate the cr4 state Using __ro_after_init on the mask is done so it can't be first disabled with a malicious write. Since these bits are global state (once established by the boot CPU and kernel boot parameters), they are safe to write to secondary CPUs before those CPUs have finished feature detection. As such, the bits are set at the first cr4 write, so that cr4 write bugs can be detected (instead of silently papered over). This uses a few bytes less storage of a location we don't have: read-only per-CPU data. A check is performed after the register write because an attack could just skip directly to the register write. Such a direct jump is possible because of how this function may be built by the compiler (especially due to the removal of frame pointers) where it doesn't add a stack frame (function exit may only be a retq without pops) which is sufficient for trivial exploitation like in the timer overwrites mentioned above). The asm argument constraints gain the "+" modifier to convince the compiler that it shouldn't make ordering assumptions about the arguments or memory, and treat them as changed. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: kernel-hardening@lists.openwall.com Link: https://lkml.kernel.org/r/20190618045503.39105-3-keescook@chromium.org
2019-06-22x86/acpi/cstate: Add Zhaoxin processors support for cache flush policy in C3Tony W Wang-oc
Same as Intel, Zhaoxin MP CPUs support C3 share cache and on all recent Zhaoxin platforms ARB_DISABLE is a nop. So set related flags correctly in the same way as Intel does. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "hpa@zytor.com" <hpa@zytor.com> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net> Cc: "lenb@kernel.org" <lenb@kernel.org> Cc: David Wang <DavidWang@zhaoxin.com> Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com> Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com> Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com> Link: https://lkml.kernel.org/r/a370503660994669991a7f7cda7c5e98@zhaoxin.com
2019-06-22x86/cpu: Create Zhaoxin processors architecture support fileTony W Wang-oc
Add x86 architecture support for new Zhaoxin processors. Carve out initialization code needed by Zhaoxin processors into a separate compilation unit. To identify Zhaoxin CPU, add a new vendor type X86_VENDOR_ZHAOXIN for system recognition. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "hpa@zytor.com" <hpa@zytor.com> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net> Cc: "lenb@kernel.org" <lenb@kernel.org> Cc: David Wang <DavidWang@zhaoxin.com> Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com> Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com> Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com> Link: https://lkml.kernel.org/r/01042674b2f741b2aed1f797359bdffb@zhaoxin.com
2019-06-22x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2Andi Kleen
The kernel needs to explicitly enable FSGSBASE. So, the application needs to know if it can safely use these instructions. Just looking at the CPUID bit is not enough because it may be running in a kernel that does not enable the instructions. One way for the application would be to just try and catch the SIGILL. But that is difficult to do in libraries which may not want to overwrite the signal handlers of the main application. Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF aux vector. AT_HWCAP2 is already used by PPC for similar purposes. The application can access it open coded or by using the getauxval() function in newer versions of glibc. [ tglx: Massaged changelog ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com
2019-06-22x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bitAndy Lutomirski
Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable FSGSBASE by default, and add nofsgsbase to disable it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com
2019-06-22x86/process/64: Use FSGSBASE instructions on thread copy and ptraceChang S. Bae
When FSGSBASE is enabled, copying threads and reading fsbase and gsbase using ptrace must read the actual values. When copying a thread, use save_fsgs() and copy the saved values. For ptrace, the bases must be read from memory regardless of the selector if FSGSBASE is enabled. [ tglx: Invoke __rdgsbase_inactive() with interrupts disabled ] [ luto: Massage changelog ] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-9-git-send-email-chang.seok.bae@intel.com
2019-06-22x86/process/64: Use FSBSBASE in switch_to() if availableAndy Lutomirski
With the new FSGSBASE instructions, FS and GSABSE can be efficiently read and writen in __switch_to(). Use that capability to preserve the full state. This will enable user code to do whatever it wants with the new instructions without any kernel-induced gotchas. (There can still be architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE if WRGSBASE was used, but users are expected to read the CPU manual before doing things like that.) This is a considerable speedup. It seems to save about 100 cycles per context switch compared to the baseline 4.6-rc1 behavior on a Skylake laptop. [ chang: 5~10% performance improvements were seen with a context switch benchmark that ran threads with different FS/GSBASE values (to the baseline 4.16). Minor edit on the changelog. ] [ tglx: Masaage changelog ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-8-git-send-email-chang.seok.bae@intel.com
2019-06-22x86/fsgsbase/64: Enable FSGSBASE instructions in helper functionsChang S. Bae
Add cpu feature conditional FSGSBASE access to the relevant helper functions. That allows to accelerate certain FS/GS base operations in subsequent changes. Note, that while possible, the user space entry/exit GSBASE operations are not going to use the new FSGSBASE instructions. The reason is that it would require additional storage for the user space value which adds more complexity to the low level code and experiments have shown marginal benefit. This may be revisited later but for now the SWAPGS based handling in the entry code is preserved except for the paranoid entry/exit code. To preserve the SWAPGS entry mechanism introduce __[rd|wr]gsbase_inactive() helpers. Note, for Xen PV, paravirt hooks can be added later as they might allow a very efficient but different implementation. [ tglx: Massaged changelog ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com