summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2019-07-24x86/mm: Avoid redundant interrupt disable in load_mm_cr4()Jan Kiszka
load_mm_cr4() is always called with interrupts disabled from: - switch_mm_irqs_off() - refresh_pce(), which is a on_each_cpu() callback Thus, disabling interrupts in cr4_set/clear_bits() is redundant. Implement cr4_set/clear_bits_irqsoff() helpers, rename load_mm_cr4() to load_mm_cr4_irqsoff() and use the new helpers. The new helpers do not need a lockdep assert as __cr4_set() has one already. The renaming in combination with the checks in __cr4_set() ensure that any changes in the boundary conditions at the call sites will be detected. [ tglx: Massaged change log ] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/0fbbcb64-5f26-4ffb-1bb9-4f5f48426893@siemens.com
2019-07-23x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()Masahiro Yamada
__builtin_constant_p(nr) is used everywhere now. It does not make much sense to define IS_IMMEDIATE() as its alias. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190723074415.26811-1-yamada.masahiro@socionext.com
2019-07-23x86/build: Remove unneeded uapi asm-generic wrappersMasahiro Yamada
These are listed in include/uapi/asm-generic/Kbuild, so Kbuild will automatically generate them. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190723112646.14046-1-yamada.masahiro@socionext.com
2019-07-22KVM: X86: Dynamically allocate user_fpuWanpeng Li
After reverting commit 240c35a3783a (kvm: x86: Use task structs fpu field for user), struct kvm_vcpu is 19456 bytes on my server, PAGE_ALLOC_COSTLY_ORDER(3) is the order at which allocations are deemed costly to service. In serveless scenario, one host can service hundreds/thoudands firecracker/kata-container instances, howerver, new instance will fail to launch after memory is too fragmented to allocate kvm_vcpu struct on host, this was observed in some cloud provider product environments. This patch dynamically allocates user_fpu, kvm_vcpu is 15168 bytes now on my Skylake server. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22Revert "kvm: x86: Use task structs fpu field for user"Paolo Bonzini
This reverts commit 240c35a3783ab9b3a0afaba0dde7291295680a6b ("kvm: x86: Use task structs fpu field for user", 2018-11-06). The commit is broken and causes QEMU's FPU state to be destroyed when KVM_RUN is preempted. Fixes: 240c35a3783a ("kvm: x86: Use task structs fpu field for user") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22x86: Remove X86_FEATURE_MFENCE_RDTSCJosh Poimboeuf
AMD and Intel both have serializing lfence (X86_FEATURE_LFENCE_RDTSC). They've both had it for a long time, and AMD has had it enabled in Linux since Spectre v1 was announced. Back then, there was a proposal to remove the serializing mfence feature bit (X86_FEATURE_MFENCE_RDTSC), since both AMD and Intel have serializing lfence. At the time, it was (ahem) speculated that some hypervisors might not yet support its removal, so it remained for the time being. Now a year-and-a-half later, it should be safe to remove. I asked Andrew Cooper about whether it's still needed: So if you're virtualised, you've got no choice in the matter.  lfence is either dispatch-serialising or not on AMD, and you won't be able to change it. Furthermore, you can't accurately tell what state the bit is in, because the MSR might not be virtualised at all, or may not reflect the true state in hardware.  Worse still, attempting to set the bit may not be successful even if there isn't a fault for doing so. Xen sets the DE_CFG bit unconditionally, as does Linux by the looks of things (see MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT).  ISTR other hypervisor vendors saying the same, but I don't have any information to hand. If you are running under a hypervisor which has been updated, then lfence will almost certainly be dispatch-serialising in practice, and you'll almost certainly see the bit already set in DE_CFG.  If you're running under a hypervisor which hasn't been patched since Spectre, you've already lost in many more ways. I'd argue that X86_FEATURE_MFENCE_RDTSC is not worth keeping. So remove it. This will reduce some code rot, and also make it easier to hook barrier_nospec() up to a cmdline disable for performance raisins, without having to need an alternative_3() macro. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/d990aa51e40063acb9888e8c1b688e41355a9588.1562255067.git.jpoimboe@redhat.com
2019-07-22x86/realmode: Remove trampoline_statusPingfan Liu
There is no reader of trampoline_status, it's only written. It turns out that after commit ce4b1b16502b ("x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"), trampoline_status is not needed any more. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1563266424-3472-1-git-send-email-kernelfans@gmail.com
2019-07-22x86/hyperv: Add functions to allocate/deallocate page for Hyper-VMaya Nakamura
Introduce two new functions, hv_alloc_hyperv_page() and hv_free_hyperv_page(), to allocate/deallocate memory with the size and alignment that Hyper-V expects as a page. Although currently they are not used, they are ready to be used to allocate/deallocate memory on x86 when their ARM64 counterparts are implemented, keeping symmetry between architectures with potentially different guest page sizes. Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.1906272334560.32342@nanos.tec.linutronix.de/ Link: https://lore.kernel.org/lkml/87muindr9c.fsf@vitty.brq.redhat.com/ Link: https://lkml.kernel.org/r/706b2e71eb3e587b5f8801e50f090fae2a00e35d.1562916939.git.m.maya.nakamura@gmail.com
2019-07-22x86/hyperv: Create and use Hyper-V page definitionsMaya Nakamura
Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because the Linux guest page size and hypervisor page size concepts are different, even though they happen to be the same value on x86. Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE. Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lkml.kernel.org/r/e95111629abf65d016e983f72494cbf110ce605f.1562916939.git.m.maya.nakamura@gmail.com
2019-07-22x86/cpufeatures: Enable a new AVX512 CPU featureGayatri Kammela
Add a new AVX512 instruction group/feature for enumeration in /proc/cpuinfo: AVX512_VP2INTERSECT. CPUID.(EAX=7,ECX=0):EDX[bit 8] AVX512_VP2INTERSECT Detailed information of CPUID bits for this feature can be found in the Intel Architecture Intsruction Set Extensions Programming Reference document (refer to Table 1-2). A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=204215. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190717234632.32673-3-gayatri.kammela@intel.com
2019-07-22x86/syscalls: Split the x32 syscalls into their own tableAndy Lutomirski
For unfortunate historical reasons, the x32 syscalls and the x86_64 syscalls are not all numbered the same. As an example, ioctl() is nr 16 on x86_64 but 514 on x32. This has potentially nasty consequences, since it means that there are two valid RAX values to do ioctl(2) and two invalid RAX values. The valid values are 16 (i.e. ioctl(2) using the x86_64 ABI) and (514 | 0x40000000) (i.e. ioctl(2) using the x32 ABI). The invalid values are 514 and (16 | 0x40000000). 514 will enter the "COMPAT_SYSCALL_DEFINE3(ioctl, ...)" entry point with in_compat_syscall() and in_x32_syscall() returning false, whereas (16 | 0x40000000) will enter the native entry point with in_compat_syscall() and in_x32_syscall() returning true. Both are bogus, and both will exercise code paths in the kernel and in any running seccomp filters that really ought to be unreachable. Splitting out the x32 syscalls into their own tables, allows both bogus invocations to return -ENOSYS. I've checked glibc, musl, and Bionic, and all of them appear to call syscalls with their correct numbers, so this change should have no effect on them. There is an added benefit going forward: new syscalls that need special handling on x32 can share the same number on x32 and x86_64. This means that the special syscall range 512-547 can be treated as a legacy wart instead of something that may need to be extended in the future. Also add a selftest to verify the new behavior. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/208024256b764312598f014ebfb0a42472c19354.1562185330.git.luto@kernel.org
2019-07-22x86/syscalls: Make __X32_SYSCALL_BIT be unsigned longAndy Lutomirski
Currently, it's an int. This is bizarre. Fortunately, the code using it still works: ~__X32_SYSCALL_BIT is also int, so, if nr is unsigned long, then C kindly sign-extends the ~__X32_SYSCALL_BIT part, and it actually results in the desired value. This is far more subtle than it deserves to be. Syscall numbers are, for all practical purposes, unsigned long, so make __X32_SYSCALL_BIT be unsigned long. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/99b0d83ad891c67105470a1a6b63243fd63a5061.1562185330.git.luto@kernel.org
2019-07-22x86/paravirt: Drop {read,write}_cr8() hooksAndrew Cooper
There is a lot of infrastructure for functionality which is used exclusively in __{save,restore}_processor_state() on the suspend/resume path. cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by lapic_{suspend,resume}(). Saving and restoring cr8 independently of the rest of the Local APIC state isn't a clever thing to be doing. Delete the suspend/resume cr8 handling, which shrinks the size of struct saved_context, and allows for the removal of both PVOPS. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20190715151641.29210-1-andrew.cooper3@citrix.com
2019-07-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 specific fixes and updates: - The CR2 corruption fixes which store CR2 early in the entry code and hand the stored address to the fault handlers. - Revert a forgotten leftover of the dropped FSGSBASE series. - Plug a memory leak in the boot code. - Make the Hyper-V assist functionality robust by zeroing the shadow page. - Remove a useless check for dead processes with LDT - Update paravirt and VMware maintainers entries. - A few cleanup patches addressing various compiler warnings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/64: Prevent clobbering of saved CR2 value x86/hyper-v: Zero out the VP ASSIST PAGE on allocation x86, boot: Remove multiple copy of static function sanitize_boot_params() x86/boot/compressed/64: Remove unused variable x86/boot/efi: Remove unused variables x86/mm, tracing: Fix CR2 corruption x86/entry/64: Update comments and sanity tests for create_gap x86/entry/64: Simplify idtentry a little x86/entry/32: Simplify common_exception x86/paravirt: Make read_cr2() CALLEE_SAVE MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE x86/process: Delete useless check for dead process with LDT x86: math-emu: Hide clang warnings for 16-bit overflow x86/e820: Use proper booleans instead of 0/1 x86/apic: Silence -Wtype-limits compiler warnings x86/mm: Free sme_early_buffer after init x86/boot: Fix memory leak in default_get_smp_config() Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
2019-07-20Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: - A collection of objtool fixes which address recent fallout partially exposed by newer toolchains, clang, BPF and general code changes. - Force USER_DS for user stack traces [ Note: the "objtool fixes" are not all to objtool itself, but for kernel code that triggers objtool warnings. Things like missing function size annotations, or code that confuses the unwinder etc. - Linus] * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) objtool: Support conditional retpolines objtool: Convert insn type to enum objtool: Fix seg fault on bad switch table entry objtool: Support repeated uses of the same C jump table objtool: Refactor jump table code objtool: Refactor sibling call detection logic objtool: Do frame pointer check before dead end check objtool: Change dead_end_function() to return boolean objtool: Warn on zero-length functions objtool: Refactor function alias logic objtool: Track original function across branches objtool: Add mcsafe_handle_tail() to the uaccess safe list bpf: Disable GCC -fgcse optimization for ___bpf_prog_run() x86/uaccess: Remove redundant CLACs in getuser/putuser error paths x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail() x86/uaccess: Remove ELF function annotation from copy_user_handle_tail() x86/head/64: Annotate start_cpu0() as non-callable x86/entry: Fix thunk function ELF sizes x86/kvm: Don't call kvm_spurious_fault() from .fixup x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2 ...
2019-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM updates from Paolo Bonzini: "Mostly bugfixes, but also: - s390 support for KVM selftests - LAPIC timer offloading to housekeeping CPUs - Extend an s390 optimization for overcommitted hosts to all architectures - Debugging cleanups and improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: x86: Add fixed counters to PMU filter KVM: nVMX: do not use dangling shadow VMCS after guest reset KVM: VMX: dump VMCS on failed entry KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup KVM: Boost vCPUs that are delivering interrupts KVM: selftests: Remove superfluous define from vmx.c KVM: SVM: Fix detection of AMD Errata 1096 KVM: LAPIC: Inject timer interrupt via posted interrupt KVM: LAPIC: Make lapic timer unpinned KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS kvm: x86: ioapic and apic debug macros cleanup kvm: x86: some tsc debug cleanup kvm: vmx: fix coccinelle warnings x86: kvm: avoid constant-conversion warning x86: kvm: avoid -Wsometimes-uninitized warning KVM: x86: expose AVX512_BF16 feature to guest KVM: selftests: enable pgste option for the linker on s390 KVM: selftests: Move kvm_create_max_vcpus test to generic code ...
2019-07-20KVM: x86: Add fixed counters to PMU filterEric Hankland
Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist fixed counters. Signed-off-by: Eric Hankland <ehankland@google.com> [No need to check padding fields for zero. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-19Merge tag 'for-linus-5.3a-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Fixes and features: - A series to introduce a common command line parameter for disabling paravirtual extensions when running as a guest in virtualized environment - A fix for int3 handling in Xen pv guests - Removal of the Xen-specific tmem driver as support of tmem in Xen has been dropped (and it was experimental only) - A security fix for running as Xen dom0 (XSA-300) - A fix for IRQ handling when offlining cpus in Xen guests - Some small cleanups" * tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: let alloc_xenballooned_pages() fail if not enough memory free xen/pv: Fix a boot up hang revealed by int3 self test x86/xen: Add "nopv" support for HVM guest x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete x86: Add "nopv" parameter to disable PV extensions x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init xen: remove tmem driver Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized" xen/events: fix binding user event channels to cpus
2019-07-18x86/kvm: Don't call kvm_spurious_fault() from .fixupJosh Poimboeuf
After making a change to improve objtool's sibling call detection, it started showing the following warning: arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame The problem is the ____kvm_handle_fault_on_reboot() macro. It does a fake call by pushing a fake RIP and doing a jump. That tricks the unwinder into printing the function which triggered the exception, rather than the .fixup code. Instead of the hack to make it look like the original function made the call, just change the macro so that the original function actually does make the call. This allows removal of the hack, and also makes objtool happy. I triggered a vmx instruction exception and verified that the stack trace is still sane: kernel BUG at arch/x86/kvm/x86.c:358! invalid opcode: 0000 [#1] SMP PTI CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16 Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017 RIP: 0010:kvm_spurious_fault+0x5/0x10 Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246 RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000 RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000 R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0 R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000 FS: 00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: loaded_vmcs_init+0x4f/0xe0 alloc_loaded_vmcs+0x38/0xd0 vmx_create_vcpu+0xf7/0x600 kvm_vm_ioctl+0x5e9/0x980 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? free_one_page+0x13f/0x4e0 do_vfs_ioctl+0xa4/0x630 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x55/0x1c0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fa349b1ee5b Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/paravirt: Fix callee-saved function ELF sizesJosh Poimboeuf
The __raw_callee_save_*() functions have an ELF symbol size of zero, which confuses objtool and other tools. Fixes a bunch of warnings like the following: arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.com
2019-07-18Merge tag 'trace-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The main changes in this release include: - Add user space specific memory reading for kprobes - Allow kprobes to be executed earlier in boot The rest are mostly just various clean ups and small fixes" * tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) tracing: Make trace_get_fields() global tracing: Let filter_assign_type() detect FILTER_PTR_STRING tracing: Pass type into tracing_generic_entry_update() ftrace/selftest: Test if set_event/ftrace_pid exists before writing ftrace/selftests: Return the skip code when tracing directory not configured in kernel tracing/kprobe: Check registered state using kprobe tracing/probe: Add trace_event_call accesses APIs tracing/probe: Add probe event name and group name accesses APIs tracing/probe: Add trace flag access APIs for trace_probe tracing/probe: Add trace_event_file access APIs for trace_probe tracing/probe: Add trace_event_call register API for trace_probe tracing/probe: Add trace_probe init and free functions tracing/uprobe: Set print format when parsing command tracing/kprobe: Set print format right after parsed command kprobes: Fix to init kprobes in subsys_initcall tracepoint: Use struct_size() in kmalloc() ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS ftrace: Enable trampoline when rec count returns back to one tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline tracing: Make a separate config for trace event self tests ...
2019-07-17x86/mm, tracing: Fix CR2 corruptionPeter Zijlstra
Despite the current efforts to read CR2 before tracing happens there still exist a number of possible holes: idtentry page_fault do_page_fault has_error_code=1 call error_entry TRACE_IRQS_OFF call trace_hardirqs_off* #PF // modifies CR2 CALL_enter_from_user_mode __context_tracking_exit() trace_user_exit(0) #PF // modifies CR2 call do_page_fault address = read_cr2(); /* whoopsie */ And similar for i386. Fix it by pulling the CR2 read into the entry code, before any of that stuff gets a chance to run and ruin things. Reported-by: He Zhe <zhe.he@windriver.com> Reported-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: joel@joelfernandes.org Link: https://lkml.kernel.org/r/20190711114336.116812491@infradead.org Debugged-by: Steven Rostedt <rostedt@goodmis.org>
2019-07-17x86/paravirt: Make read_cr2() CALLEE_SAVEPeter Zijlstra
The one paravirt read_cr2() implementation (Xen) is actually quite trivial and doesn't need to clobber anything other than the return register. Making read_cr2() CALLEE_SAVE avoids all the PUSH/POP nonsense and allows more convenient use from assembly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: luto@kernel.org Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114335.887392493@infradead.org
2019-07-17xen/pv: Fix a boot up hang revealed by int3 self testZhenzhong Duan
Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call() selftest") is used to ensure there is a gap setup in int3 exception stack which could be used for inserting call return address. This gap is missed in XEN PV int3 exception entry path, then below panic triggered: [ 0.772876] general protection fault: 0000 [#1] SMP NOPTI [ 0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11 [ 0.772893] RIP: e030:int3_magic+0x0/0x7 [ 0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246 [ 0.773334] Call Trace: [ 0.773334] alternative_instructions+0x3d/0x12e [ 0.773334] check_bugs+0x7c9/0x887 [ 0.773334] ? __get_locked_pte+0x178/0x1f0 [ 0.773334] start_kernel+0x4ff/0x535 [ 0.773334] ? set_init_arg+0x55/0x55 [ 0.773334] xen_start_kernel+0x571/0x57a For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with %rcx/%r11 on the stack. To convert back to "normal" looking exceptions, the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'. E.g. Extracting 'xen_pv_trap xenint3' we have: xen_xenint3: pop %rcx; pop %r11; jmp xenint3 As xenint3 and int3 entry code are same except xenint3 doesn't generate a gap, we can fix it by using int3 and drop useless xenint3. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17x86/xen: Add "nopv" support for HVM guestZhenzhong Duan
PVH guest needs PV extentions to work, so "nopv" parameter should be ignored for PVH but not for HVM guest. If PVH guest boots up via the Xen-PVH boot entry, xen_pvh is set early, we know it's PVH guest and ignore "nopv" parameter directly. If PVH guest boots up via the normal boot entry same as HVM guest, it's hard to distinguish PVH and HVM guest at that time. In this case, we have to panic early if PVH is detected and nopv is enabled to avoid a worse situation later. Remove static from bool_x86_init_noop/x86_op_int_noop so they could be used globally. Move xen_platform_hvm() after xen_hvm_guest_late_init() to avoid compile error. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17x86/paravirt: Remove const mark from x86_hyper_xen_hvm variableZhenzhong Duan
.. as "nopv" support needs it to be changeable at boot up stage. Checkpatch reports warning, so move variable declarations from hypervisor.c to hypervisor.h Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17x86: Add "nopv" parameter to disable PV extensionsZhenzhong Duan
In virtualization environment, PV extensions (drivers, interrupts, timers, etc) are enabled in the majority of use cases which is the best option. However, in some cases (kexec not fully working, benchmarking) we want to disable PV extensions. We have "xen_nopv" for that purpose but only for XEN. For a consistent admin experience a common command line parameter "nopv" set across all PV guest implementations is a better choice. There are guest types which just won't work without PV extensions, like Xen PV, Xen PVH and jailhouse. add a "ignore_nopv" member to struct hypervisor_x86 set to true for those guest types and call the detect functions only if nopv is false or ignore_nopv is true. Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __initZhenzhong Duan
.. as they are only called at early bootup stage. In fact, other functions in x86_hyper_xen_hvm.init.* are all marked as __init. Unexport xen_hvm_need_lapic as it's never used outside. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-16mm: introduce ARCH_HAS_PTE_DEVMAPRobin Murphy
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined with the long-out-of-date comment can lead to the impression than an architecture may just enable it (since __add_pages() now "comprehends device memory" for itself) and expect things to work. In practice, however, ZONE_DEVICE users have little chance of functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper dependency so the real situation is clearer. Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16arch/*: remove unused isa_page_to_bus()Stephen Kitt
isa_page_to_bus() is deprecated and is no longer used anywhere. Remove it entirely. Link: http://lkml.kernel.org/r/20190613161155.16946-1-steve@sk2.org Signed-off-by: Stephen Kitt <steve@sk2.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16x86/apic: Silence -Wtype-limits compiler warningsQian Cai
There are many compiler warnings like this, In file included from ./arch/x86/include/asm/smp.h:13, from ./arch/x86/include/asm/mmzone_64.h:11, from ./arch/x86/include/asm/mmzone.h:5, from ./include/linux/mmzone.h:969, from ./include/linux/gfp.h:6, from ./include/linux/mm.h:10, from arch/x86/kernel/apic/io_apic.c:34: arch/x86/kernel/apic/io_apic.c: In function 'check_timer': ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] if ((v) <= apic_verbosity) \ ^~ arch/x86/kernel/apic/io_apic.c:2160:2: note: in expansion of macro 'apic_printk' apic_printk(APIC_QUIET, KERN_INFO "..TIMER: vector=0x%02X " ^~~~~~~~~~~ ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] if ((v) <= apic_verbosity) \ ^~ arch/x86/kernel/apic/io_apic.c:2207:4: note: in expansion of macro 'apic_printk' apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: " ^~~~~~~~~~~ APIC_QUIET is 0, so silence them by making apic_verbosity type int. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1562621805-24789-1-git-send-email-cai@lca.pw
2019-07-14Merge tag 'platform-drivers-x86-v5.3-1' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver updates from Andy Shevchenko: "Gathered a bunch of x86 platform driver changes. It's rather big, since includes two big refactors and completely new driver: - ASUS WMI driver got a big refactoring in order to support the TUF Gaming laptops. Besides that, the regression with backlight being permanently off on various EeePC laptops has been fixed. - Accelerometer on HP ProBook 450 G0 shows wrong measurements due to X axis being inverted. This has been fixed. - Intel PMC core driver has been extended to be ACPI enumerated if the DSDT provides device with _HID "INT33A1". This allows to convert the driver to be pure platform and support new hardware purely based on ACPI DSDT. - From now on the Intel Speed Select Technology is supported thru a corresponding driver. This driver provides an access to the features of the ISST, such as Performance Profile, Core Power, Base frequency and Turbo Frequency. - Mellanox platform drivers has been refactored and now extended to support more systems, including new coming ones. - The OLPC XO-1.75 platform is now supported. - CB4063 Beckhoff Automation board is using PMC clocks, provided via pmc_atom driver, for ethernet controllers in a way that they can't be managed by the clock driver. The quirk has been extended to cover this case. - Touchscreen on Chuwi Hi10 Plus tablet has been enabled. Meanwhile the information of Chuwi Hi10 Air has been fixed to cover more models based on the same platform. - Xiaomi notebooks have WMI interface enabled. Thus, the driver to support it has been provided. It required some extension of the generic WMI library, which allows to propagate opaque context to the ->probe() of the individual drivers. This release includes debugfs clean up from Greg KH for several drivers that drop return code check and make debugfs absence or failure non-fatal. Also miscellaneous fixes here and there, mostly for Acer WMI and various Intel drivers" * tag 'platform-drivers-x86-v5.3-1' of git://git.infradead.org/linux-platform-drivers-x86: (74 commits) platform/x86: Fix PCENGINES_APU2 Kconfig warning tools/power/x86/intel-speed-select: Add .gitignore file platform/x86: mlx-platform: Fix error handling in mlxplat_init() platform/x86: intel_pmc_core: Attach using APCI HID "INT33A1" platform/x86: intel_pmc_core: transform Pkg C-state residency from TSC ticks into microseconds platform/x86: asus-wmi: Use dev_get_drvdata() Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces platform/x86: mlx-platform: Add more reset cause attributes platform/x86: mlx-platform: Modify DMI matching order platform/x86: mlx-platform: Add regmap structure for the next generation systems platform/x86: mlx-platform: Change API for i2c-mlxcpld driver activation platform/x86: mlx-platform: Move regmap initialization before all drivers activation MAINTAINERS: Update for Intel Speed Select Technology tools/power/x86: A tool to validate Intel Speed Select commands platform/x86: ISST: Restore state on resume platform/x86: ISST: Add Intel Speed Select PUNIT MSR interface platform/x86: ISST: Add Intel Speed Select mailbox interface via MSRs platform/x86: ISST: Add Intel Speed Select mailbox interface via PCI platform/x86: ISST: Add Intel Speed Select mmio interface platform/x86: ISST: Add IOCTL to Translate Linux logical CPU to PUNIT CPU number ...
2019-07-12Merge tag 'asm-generic-5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "The asm-generic changes for 5.3 consist of a cleanup series to remove ptrace.h from Christoph Hellwig, who explains: 'asm-generic/ptrace.h is a little weird in that it doesn't actually implement any functionality, but it provided multiple layers of macros that just implement trivial inline functions. We implement those directly in the few architectures and be off with a much simpler design.' at https://lore.kernel.org/lkml/20190624054728.30966-1-hch@lst.de/" * tag 'asm-generic-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: remove ptrace.h x86: don't use asm-generic/ptrace.h sh: don't use asm-generic/ptrace.h powerpc: don't use asm-generic/ptrace.h arm64: don't use asm-generic/ptrace.h
2019-07-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - support for chained PMU counters in guests - improved SError handling - handle Neoverse N1 erratum #1349291 - allow side-channel mitigation status to be migrated - standardise most AArch64 system register accesses to msr_s/mrs_s - fix host MPIDR corruption on 32bit - selftests ckleanups x86: - PMU event {white,black}listing - ability for the guest to disable host-side interrupt polling - fixes for enlightened VMCS (Hyper-V pv nested virtualization), - new hypercall to yield to IPI target - support for passing cstate MSRs through to the guest - lots of cleanups and optimizations Generic: - Some txt->rST conversions for the documentation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (128 commits) Documentation: virtual: Add toctree hooks Documentation: kvm: Convert cpuid.txt to .rst Documentation: virtual: Convert paravirt_ops.txt to .rst KVM: x86: Unconditionally enable irqs in guest context KVM: x86: PMU Event Filter kvm: x86: Fix -Wmissing-prototypes warnings KVM: Properly check if "page" is valid in kvm_vcpu_unmap KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register KVM: LAPIC: Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane kvm: LAPIC: write down valid APIC registers KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s KVM: doc: Add API documentation on the KVM_REG_ARM_WORKAROUNDS register KVM: arm/arm64: Add save/restore support for firmware workaround state arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests KVM: arm/arm64: Support chained PMU counters KVM: arm/arm64: Remove pmc->bitmask KVM: arm/arm64: Re-create event when setting counter value KVM: arm/arm64: Extract duplicated code to own function KVM: arm/arm64: Rename kvm_pmu_{enable/disable}_counter functions KVM: LAPIC: ARBPRI is a reserved register for x2APIC ...
2019-07-12Merge tag 'hyperv-next-signed' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyper-v updates from Sasha Levin: - Add a module description to the Hyper-V vmbus module. - Rework some vmbus code to separate architecture specifics out to arch/x86/. This is part of the work of adding arm64 support to Hyper-V. * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Break out ISA independent parts of mshyperv.h drivers: hv: Add a module description line to the hv_vmbus driver
2019-07-12asm-generic, x86: introduce generic pte_{alloc,free}_one[_kernel]Mike Rapoport
Most architectures have identical or very similar implementation of pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and pte_free(). Add a generic implementation that can be reused across architectures and enable its use on x86. The generic implementation uses GFP_KERNEL | __GFP_ZERO for the kernel page tables and GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT for the user page tables. The "base" functions for PTE allocation, namely __pte_alloc_one_kernel() and __pte_alloc_one() are intended for the architectures that require additional actions after actual memory allocation or must use non-default GFP flags. x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and pte_free(). x86 still implements pte_alloc_one() to allow run-time control of GFP flags required for "userpte" command line option. Link: http://lkml.kernel.org/r/1557296232-15361-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> Cc: Helge Deller <deller@gmx.de> Cc: Ley Foon Tan <lftan@altera.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12mm: lift the x86_32 PAE version of gup_get_pte to common codeChristoph Hellwig
The split low/high access is the only non-READ_ONCE version of gup_get_pte that did show up in the various arch implemenations. Lift it to common code and drop the ifdef based arch override. Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: James Hogan <jhogan@kernel.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12mm: simplify gup_fast_permittedChristoph Hellwig
Pass in the already calculated end value instead of recomputing it, and leave the end > start check in the callers instead of duplicating them in the arch code. Link: http://lkml.kernel.org/r/20190625143715.1689-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: James Hogan <jhogan@kernel.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12asm-generic, x86: add bitops instrumentation for KASANMarco Elver
This adds a new header to asm-generic to allow optionally instrumenting architecture-specific asm implementations of bitops. This change includes the required change for x86 as reference and changes the kernel API doc to point to bitops-instrumented.h instead. Rationale: the functions in x86's bitops.h are no longer the kernel API functions, but instead the arch_ prefixed functions, which are then instrumented via bitops-instrumented.h. Other architectures can similarly add support for asm implementations of bitops. The documentation text was derived from x86 and existing bitops asm-generic versions: 1) references to x86 have been removed; 2) as a result, some of the text had to be reworded for clarity and consistency. Tested using lib/test_kasan with bitops tests (pre-requisite patch). Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=198439 Link: http://lkml.kernel.org/r/20190613125950.197667-4-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A collection of assorted fixes: - Fix for the pinned cr0/4 fallout which escaped all testing efforts because the kvm-intel module was never loaded when the kernel was compiled with CONFIG_PARAVIRT=n. The cr0/4 accessors are moved out of line and static key is now solely used in the core code and therefore can stay in the RO after init section. So the kvm-intel and other modules do not longer reference the (read only) static key which the module loader tried to update. - Prevent an infinite loop in arch_stack_walk_user() by breaking out of the loop once the return address is detected to be 0. - Prevent the int3_emulate_call() selftest from corrupting the stack when KASAN is enabled. KASASN clobbers more registers than covered by the emulated call implementation. Convert the int3_magic() selftest to a ASM function so the compiler cannot KASANify it. - Unbreak the build with old GCC versions and with the Gold linker by reverting the 'Move of _etext to the actual end of .text'. In both cases the build fails with 'Invalid absolute R_X86_64_32S relocation: _etext' - Initialize the context lock for init_mm, which was never an issue until the alternatives code started to use a temporary mm for patching. - Fix a build warning vs. the LOWMEM_PAGES constant where clang complains rightfully about a signed integer overflow in the shift operation by converting the operand to an ULL. - Adjust the misnamed ENDPROC() of common_spurious in the 32bit entry code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/stacktrace: Prevent infinite loop in arch_stack_walk_user() x86/asm: Move native_write_cr0/4() out of line x86/pgtable/32: Fix LOWMEM_PAGES constant x86/alternatives: Fix int3_emulate_call() selftest stack corruption x86/entry/32: Fix ENDPROC of common_spurious Revert "x86/build: Move _etext to actual end of .text" x86/ldt: Initialize the context lock for init_mm
2019-07-11Merge tag 'clone3-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull clone3 system call from Christian Brauner: "This adds the clone3 syscall which is an extensible successor to clone after we snagged the last flag with CLONE_PIDFD during the 5.2 merge window for clone(). It cleanly supports all of the flags from clone() and thus all legacy workloads. There are few user visible differences between clone3 and clone. First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse this flag. Second, the CSIGNAL flag is deprecated and will cause EINVAL to be reported. It is superseeded by a dedicated "exit_signal" argument in struct clone_args thus freeing up even more flags. And third, clone3 gives CLONE_PIDFD a dedicated return argument in struct clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr argument. The clone3 uapi is designed to be easy to handle on 32- and 64 bit: /* uapi */ struct clone_args { __aligned_u64 flags; __aligned_u64 pidfd; __aligned_u64 child_tid; __aligned_u64 parent_tid; __aligned_u64 exit_signal; __aligned_u64 stack; __aligned_u64 stack_size; __aligned_u64 tls; }; and a separate kernel struct is used that uses proper kernel typing: /* kernel internal */ struct kernel_clone_args { u64 flags; int __user *pidfd; int __user *child_tid; int __user *parent_tid; int exit_signal; unsigned long stack; unsigned long stack_size; unsigned long tls; }; The system call comes with a size argument which enables the kernel to detect what version of clone_args userspace is passing in. clone3 validates that any additional bytes a given kernel does not know about are set to zero and that the size never exceeds a page. A nice feature is that this patchset allowed us to cleanup and simplify various core kernel codepaths in kernel/fork.c by making the internal _do_fork() function take struct kernel_clone_args even for legacy clone(). This patch also unblocks the time namespace patchset which wants to introduce a new CLONE_TIMENS flag. Note, that clone3 has only been wired up for x86{_32,64}, arm{64}, and xtensa. These were the architectures that did not require special massaging. Other architectures treat fork-like system calls individually and after some back and forth neither Arnd nor I felt confident that we dared to add clone3 unconditionally to all architectures. We agreed to leave this up to individual architecture maintainers. This is why there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3 which any architecture can set once it has implemented support for clone3. The patch also adds a cond_syscall(clone3) for architectures such as nios2 or h8300 that generate their syscall table by simply including asm-generic/unistd.h. The hope is to get rid of __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon" * tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: arch: handle arches who do not yet define clone3 arch: wire-up clone3() syscall fork: add clone3
2019-07-11Merge tag 'kvm-arm-for-5.3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 5.3 - Add support for chained PMU counters in guests - Improve SError handling - Handle Neoverse N1 erratum #1349291 - Allow side-channel mitigation status to be migrated - Standardise most AArch64 system register accesses to msr_s/mrs_s - Fix host MPIDR corruption on 32bit
2019-07-11KVM: x86: PMU Event FilterEric Hankland
Some events can provide a guest with information about other guests or the host (e.g. L3 cache stats); providing the capability to restrict access to a "safe" set of events would limit the potential for the PMU to be used in any side channel attacks. This change introduces a new VM ioctl that sets an event filter. If the guest attempts to program a counter for any blacklisted or non-whitelisted event, the kernel counter won't be created, so any RDPMC/RDMSR will show 0 instances of that event. Signed-off-by: Eric Hankland <ehankland@google.com> [Lots of changes. All remaining bugs are probably mine. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-10x86/asm: Move native_write_cr0/4() out of lineThomas Gleixner
The pinning of sensitive CR0 and CR4 bits caused a boot crash when loading the kvm_intel module on a kernel compiled with CONFIG_PARAVIRT=n. The reason is that the static key which controls the pinning is marked RO after init. The kvm_intel module contains a CR4 write which requires to update the static key entry list. That obviously does not work when the key is in a RO section. With CONFIG_PARAVIRT enabled this does not happen because the CR4 write uses the paravirt indirection and the actual write function is built in. As the key is intended to be immutable after init, move native_write_cr0/4() out of line. While at it consolidate the update of the cr4 shadow variable and store the value right away when the pinning is initialized on a booting CPU. No point in reading it back 20 instructions later. This allows to confine the static key and the pinning variable to cpu/common and allows to mark them static. Fixes: 8dbec27a242c ("x86/asm: Pin sensitive CR0 bits") Fixes: 873d50d58f67 ("x86/asm: Pin sensitive CR4 bits") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Xi Ruoyao <xry111@mengyan1223.wang> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907102140340.1758@nanos.tec.linutronix.de
2019-07-10x86/pgtable/32: Fix LOWMEM_PAGES constantArnd Bergmann
clang points out that the computation of LOWMEM_PAGES causes a signed integer overflow on 32-bit x86: arch/x86/kernel/head32.c:83:20: error: signed shift result (0x100000000) requires 34 bits to represent, but 'int' only has 32 bits [-Werror,-Wshift-overflow] (PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT); ^~~~~~~~~~~~ arch/x86/include/asm/pgtable_32.h:109:27: note: expanded from macro 'LOWMEM_PAGES' #define LOWMEM_PAGES ((((2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT)) ~^ ~~ arch/x86/include/asm/pgtable_32.h:98:34: note: expanded from macro 'PAGE_TABLE_SIZE' #define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD) Use the _ULL() macro to make it a 64-bit constant. Fixes: 1e620f9b23e5 ("x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190710130522.1802800-1-arnd@arndb.de
2019-07-09Merge tag 'docs-5.3' of git://git.lwn.net/linuxLinus Torvalds
Pull Documentation updates from Jonathan Corbet: "It's been a relatively busy cycle for docs: - A fair pile of RST conversions, many from Mauro. These create more than the usual number of simple but annoying merge conflicts with other trees, unfortunately. He has a lot more of these waiting on the wings that, I think, will go to you directly later on. - A new document on how to use merges and rebases in kernel repos, and one on Spectre vulnerabilities. - Various improvements to the build system, including automatic markup of function() references because some people, for reasons I will never understand, were of the opinion that :c:func:``function()`` is unattractive and not fun to type. - We now recommend using sphinx 1.7, but still support back to 1.4. - Lots of smaller improvements, warning fixes, typo fixes, etc" * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits) docs: automarkup.py: ignore exceptions when seeking for xrefs docs: Move binderfs to admin-guide Disable Sphinx SmartyPants in HTML output doc: RCU callback locks need only _bh, not necessarily _irq docs: format kernel-parameters -- as code Doc : doc-guide : Fix a typo platform: x86: get rid of a non-existent document Add the RCU docs to the core-api manual Documentation: RCU: Add TOC tree hooks Documentation: RCU: Rename txt files to rst Documentation: RCU: Convert RCU UP systems to reST Documentation: RCU: Convert RCU linked list to reST Documentation: RCU: Convert RCU basic concepts to reST docs: filesystems: Remove uneeded .rst extension on toctables scripts/sphinx-pre-install: fix out-of-tree build docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/ Documentation: PGP: update for newer HW devices Documentation: Add section about CPU vulnerabilities for Spectre Documentation: platform: Delete x86-laptop-drivers.txt docs: Note that :c:func: should no longer be used ...
2019-07-09Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x865 kdump updates from Thomas Gleixner: "Yet more kexec/kdump updates: - Properly support kexec when AMD's memory encryption (SME) is enabled - Pass reserved e820 ranges to the kexec kernel so both PCI and SME can work" * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active x86/kexec: Set the C-bit in the identity map page table when SEV is active x86/kexec: Do not map kexec area as decrypted when SEV is active x86/crash: Add e820 reserved ranges to kdump kernel's e820 table x86/mm: Rework ioremap resource mapping determination x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED x86/mm: Create a workarea in the kernel for SME early encryption x86/mm: Identify the end of the kernel area to be reserved
2019-07-09Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Thomas Gleixner: "Assorted updates to kexec/kdump: - Proper kexec support for 4/5-level paging and jumping from a 5-level to a 4-level paging kernel. - Make the EFI support for kexec/kdump more robust - Enforce that the GDT is properly aligned instead of getting the alignment by chance" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kdump/64: Restrict kdump kernel reservation to <64TB x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel x86/boot: Add xloadflags bits to check for 5-level paging support x86/boot: Make the GDT 8-byte aligned x86/kexec: Add the ACPI NVS region to the ident map x86/boot: Call get_rsdp_addr() after console_init() Revert "x86/boot: Disable RSDP parsing temporarily" x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels x86/kexec: Add the EFI system tables and ACPI tables to the ident map
2019-07-09x86/speculation: Prepare entry code for Spectre v1 swapgs mitigationsJosh Poimboeuf
Spectre v1 isn't only about array bounds checks. It can affect any conditional checks. The kernel entry code interrupt, exception, and NMI handlers all have conditional swapgs checks. Those may be problematic in the context of Spectre v1, as kernel code can speculatively run with a user GS. For example: if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg mov (%reg), %reg1 When coming from user space, the CPU can speculatively skip the swapgs, and then do a speculative percpu load using the user GS value. So the user can speculatively force a read of any kernel value. If a gadget exists which uses the percpu value as an address in another load/store, then the contents of the kernel value may become visible via an L1 side channel attack. A similar attack exists when coming from kernel space. The CPU can speculatively do the swapgs, causing the user GS to get used for the rest of the speculative window. The mitigation is similar to a traditional Spectre v1 mitigation, except: a) index masking isn't possible; because the index (percpu offset) isn't user-controlled; and b) an lfence is needed in both the "from user" swapgs path and the "from kernel" non-swapgs path (because of the two attacks described above). The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a CR3 write when PTI is enabled. Since CR3 writes are serializing, the lfences can be skipped in those cases. On the other hand, the kernel entry swapgs paths don't depend on PTI. To avoid unnecessary lfences for the user entry case, create two separate features for alternative patching: X86_FEATURE_FENCE_SWAPGS_USER X86_FEATURE_FENCE_SWAPGS_KERNEL Use these features in entry code to patch in lfences where needed. The features aren't enabled yet, so there's no functional change. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
2019-07-09x86/ldt: Initialize the context lock for init_mmSebastian Andrzej Siewior
The mutex mm->context->lock for init_mm is not initialized for init_mm. This wasn't a problem because it remained unused. This changed however since commit 4fc19708b165c ("x86/alternatives: Initialize temporary mm for patching") Initialize the mutex for init_mm. Fixes: 4fc19708b165c ("x86/alternatives: Initialize temporary mm for patching") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20190701173354.2pe62hhliok2afea@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>