summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2018-05-16Merge tag 'trace-v4.17-rc4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Some of the ftrace internal events use a zero for a data size of a field event. This is increasingly important for the histogram trigger work that is being extended. While auditing trace events, I found that a couple of the xen events were used as just marking that a function was called, by creating a static array of size zero. This can play havoc with the tracing features if these events are used, because a zero size of a static array is denoted as a special nul terminated dynamic array (this is what the trace_marker code uses). But since the xen events have no size, they are not nul terminated, and unexpected results may occur. As trace events were never intended on being a marker to denote that a function was hit or not, especially since function tracing and kprobes can trivially do the same, the best course of action is to simply remove these events" * tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
2018-05-14tracing/x86/xen: Remove zero data size trace events ↵Steven Rostedt (VMware)
trace_xen_mmu_flush_tlb{_all} Doing an audit of trace events, I discovered two trace events in the xen subsystem that use a hack to create zero data size trace events. This is not what trace events are for. Trace events add memory footprint overhead, and if all you need to do is see if a function is hit or not, simply make that function noinline and use function tracer filtering. Worse yet, the hack used was: __array(char, x, 0) Which creates a static string of zero in length. There's assumptions about such constructs in ftrace that this is a dynamic string that is nul terminated. This is not the case with these tracepoints and can cause problems in various parts of ftrace. Nuke the trace events! Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 95a7d76897c1e ("xen/mmu: Use Xen specific TLB flush instead of the generic one.") Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-13Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/pti updates from Thomas Gleixner: "A mixed bag of fixes and updates for the ghosts which are hunting us. The scheduler fixes have been pulled into that branch to avoid conflicts. - A set of fixes to address a khread_parkme() race which caused lost wakeups and loss of state. - A deadlock fix for stop_machine() solved by moving the wakeups outside of the stopper_lock held region. - A set of Spectre V1 array access restrictions. The possible problematic spots were discuvered by Dan Carpenters new checks in smatch. - Removal of an unused file which was forgotten when the rest of that functionality was removed" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Remove unused file perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map() perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_* perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[] sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] sched/core: Introduce set_special_state() kthread, sched/wait: Fix kthread_parkme() completion issue kthread, sched/wait: Fix kthread_parkme() wait-loop sched/fair: Fix the update of blocked load when newly idle stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
2018-05-11Merge tag 'for-linus-4.17-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "One fix for the kernel running as a fully virtualized guest using PV drivers on old Xen hypervisor versions" * tag 'for-linus-4.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Reset VCPU0 info pointer after shared_info remap
2018-05-07x86/xen: Reset VCPU0 info pointer after shared_info remapvan der Linden, Frank
This patch fixes crashes during boot for HVM guests on older (pre HVM vector callback) Xen versions. Without this, current kernels will always fail to boot on those Xen versions. Sample stack trace: BUG: unable to handle kernel paging request at ffffffffff200000 IP: __xen_evtchn_do_upcall+0x1e/0x80 PGD 1e0e067 P4D 1e0e067 PUD 1e10067 PMD 235c067 PTE 0 Oops: 0002 [#1] SMP PTI Modules linked in: CPU: 0 PID: 512 Comm: kworker/u2:0 Not tainted 4.14.33-52.13.amzn1.x86_64 #1 Hardware name: Xen HVM domU, BIOS 3.4.3.amazon 11/11/2016 task: ffff88002531d700 task.stack: ffffc90000480000 RIP: 0010:__xen_evtchn_do_upcall+0x1e/0x80 RSP: 0000:ffff880025403ef0 EFLAGS: 00010046 RAX: ffffffff813cc760 RBX: ffffffffff200000 RCX: ffffc90000483ef0 RDX: ffff880020540a00 RSI: ffff880023c78000 RDI: 000000000000001c RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880025403f5c R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880025400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff200000 CR3: 0000000001e0a000 CR4: 00000000000006f0 Call Trace: <IRQ> do_hvm_evtchn_intr+0xa/0x10 __handle_irq_event_percpu+0x43/0x1a0 handle_irq_event_percpu+0x20/0x50 handle_irq_event+0x39/0x60 handle_fasteoi_irq+0x80/0x140 handle_irq+0xaf/0x120 do_IRQ+0x41/0xd0 common_interrupt+0x7d/0x7d </IRQ> During boot, the HYPERVISOR_shared_info page gets remapped to make it work with KASLR. This means that any pointer derived from it needs to be adjusted. The only value that this applies to is the vcpu_info pointer for VCPU 0. For PV and HVM with the callback vector feature, this gets done via the smp_ops prepare_boot_cpu callback. Older Xen versions do not support the HVM callback vector, so there is no Xen-specific smp_ops set up in that scenario. So, the vcpu_info pointer for VCPU 0 never gets set to the proper value, and the first reference of it will be bad. Fix this by resetting it immediately after the remap. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Reviewed-by: Eduardo Valentin <eduval@amazon.com> Reviewed-by: Alakesh Haloi <alakeshh@amazon.com> Reviewed-by: Vallish Vaidyeshwara <vallish@amazon.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-05-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pll KVM fixes from Radim Krčmář: "ARM: - Fix proxying of GICv2 CPU interface accesses - Fix crash when switching to BE - Track source vcpu git GICv2 SGIs - Fix an outdated bit of documentation x86: - Speed up injection of expired timers (for stable)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: remove APIC Timer periodic/oneshot spikes arm64: vgic-v2: Fix proxying of cpuif access KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance KVM: arm64: Fix order of vcpu_write_sys_reg() arguments KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI
2018-05-06Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "Unbreak the CPUID CPUID_8000_0008_EBX reload which got dropped when the evaluation of physical and virtual bits which uses the same CPUID leaf was moved out of get_cpu_cap()" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Restore CPUID_8000_0008_EBX reload
2018-05-06Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull clocksource fixes from Thomas Gleixner: "The recent addition of the early TSC clocksource breaks on machines which have an unstable TSC because in case that TSC is disabled, then the clocksource selection logic falls back to the early TSC which is obviously bogus. That also unearthed a few robustness issues in the clocksource derating code which are addressed as well" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Rework stale comment clocksource: Consistent de-rate when marking unstable x86/tsc: Fix mark_tsc_unstable() clocksource: Initialize cs->wd_list clocksource: Allow clocksource_mark_unstable() on unregistered clocksources x86/tsc: Always unregister clocksource_tsc_early
2018-05-05KVM: x86: remove APIC Timer periodic/oneshot spikesAnthoine Bourgeois
Since the commit "8003c9ae204e: add APIC Timer periodic/oneshot mode VMX preemption timer support", a Windows 10 guest has some erratic timer spikes. Here the results on a 150000 times 1ms timer without any load: Before 8003c9ae204e | After 8003c9ae204e Max 1834us | 86000us Mean 1100us | 1021us Deviation 59us | 149us Here the results on a 150000 times 1ms timer with a cpu-z stress test: Before 8003c9ae204e | After 8003c9ae204e Max 32000us | 140000us Mean 1006us | 1997us Deviation 140us | 11095us The root cause of the problem is starting hrtimer with an expiry time already in the past can take more than 20 milliseconds to trigger the timer function. It can be solved by forward such past timers immediately, rather than submitting them to hrtimer_start(). In case the timer is periodic, update the target expiration and call hrtimer_start with it. v2: Check if the tsc deadline is already expired. Thank you Mika. v3: Execute the past timers immediately rather than submitting them to hrtimer_start(). v4: Rearm the periodic timer with advance_periodic_target_expiration() a simpler version of set_target_expiration(). Thank you Paolo. Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com> 8003c9ae204e ("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-05x86/vdso: Remove unused fileJann Horn
commit da861e18eccc ("x86, vdso: Get rid of the fake section mechanism") left this file behind; nothing is using it anymore. Signed-off-by: Jann Horn <jannh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Link: http://lkml.kernel.org/r/20180504175935.104085-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-05perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msrPeter Zijlstra
> arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap) Userspace controls @attr, sanitize cfg (attr->config) before using it to index an array. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-05perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driverPeter Zijlstra
> arch/x86/events/msr.c:178 msr_event_init() warn: potential spectre issue 'msr' (local cap) Userspace controls @attr, sanitize cfg (attr->config) before using it to index an array. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-05perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()Peter Zijlstra
> arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap) > arch/x86/events/intel/core.c:337 intel_pmu_event_map() warn: potential spectre issue 'intel_perfmon_event_map' > arch/x86/events/intel/knc.c:122 knc_pmu_event_map() warn: potential spectre issue 'knc_perfmon_event_map' > arch/x86/events/intel/p4.c:722 p4_pmu_event_map() warn: potential spectre issue 'p4_general_events' > arch/x86/events/intel/p6.c:116 p6_pmu_event_map() warn: potential spectre issue 'p6_perfmon_event_map' > arch/x86/events/amd/core.c:132 amd_pmu_event_map() warn: potential spectre issue 'amd_perfmon_event_map' Userspace controls @attr, sanitize @attr->config before passing it on to x86_pmu::event_map(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-05perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*Peter Zijlstra
> arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids[cache_type]' (local cap) > arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids' (local cap) > arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs[cache_type]' (local cap) > arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs' (local cap) Userspace controls @config which contains 3 (byte) fields used for a 3 dimensional array deref. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-04Merge tag 'for-linus-4.17-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen cleanup from Juergen Gross: "One cleanup to remove VLAs from the kernel" * tag 'for-linus-4.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Remove use of VLAs
2018-05-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Various sockmap fixes from John Fastabend (pinned map handling, blocking in recvmsg, double page put, error handling during redirect failures, etc.) 2) Fix dead code handling in x86-64 JIT, from Gianluca Borello. 3) Missing device put in RDS IB code, from Dag Moxnes. 4) Don't process fast open during repair mode in TCP< from Yuchung Cheng. 5) Move address/port comparison fixes in SCTP, from Xin Long. 6) Handle add a bond slave's master into a bridge properly, from Hangbin Liu. 7) IPv6 multipath code can operate on unitialized memory due to an assumption that the icmp header is in the linear SKB area. Fix from Eric Dumazet. 8) Don't invoke do_tcp_sendpages() recursively via TLS, from Dave Watson. 9) Fix memory leaks in x86-64 JIT, from Daniel Borkmann. 10) RDS leaks kernel memory to userspace, from Eric Dumazet. 11) DCCP can invoke a tasklet on a freed socket, take a refcount. Also from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits) dccp: fix tasklet usage smc: fix sendpage() call net/smc: handle unregistered buffers net/smc: call consolidation qed: fix spelling mistake: "offloded" -> "offloaded" net/mlx5e: fix spelling mistake: "loobpack" -> "loopback" tcp: restore autocorking rds: do not leak kernel memory to user land qmi_wwan: do not steal interfaces from class drivers ipv4: fix fnhe usage by non-cached routes bpf: sockmap, fix error handling in redirect failures bpf: sockmap, zero sg_size on error when buffer is released bpf: sockmap, fix scatterlist update on error path in send with apply net_sched: fq: take care of throttled flows before reuse ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6" bpf, x64: fix memleak when not converging on calls bpf, x64: fix memleak when not converging after image net/smc: restrict non-blocking connect finish 8139too: Use disable_irq_nosync() in rtl8139_poll_controller() sctp: fix the issue that the cookie-ack with auth can't get processed ...
2018-05-02bpf, x64: fix memleak when not converging on callsDaniel Borkmann
The JIT logic in jit_subprogs() is as follows: for all subprogs we allocate a bpf_prog_alloc(), populate it (prog->is_func = 1 here), and pass it to bpf_int_jit_compile(). If a failure occurred during JIT and prog->jited is not set, then we bail out from attempting to JIT the whole program, and punt to the interpreter instead. In case JITing went successful, we fixup BPF call offsets and do another pass to bpf_int_jit_compile() (extra_pass is true at that point) to complete JITing calls. Given that requires to pass JIT context around addrs and jit_data from x86 JIT are freed in the extra_pass in bpf_int_jit_compile() when calls are involved (if not, they can be freed immediately). However, if in the original pass, the JIT image didn't converge then we leak addrs and jit_data since image itself is NULL, the prog->is_func is set and extra_pass is false in that case, meaning both will become unreachable and are never cleaned up, therefore we need to free as well on !image. Only x64 JIT is affected. Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02bpf, x64: fix memleak when not converging after imageDaniel Borkmann
While reviewing x64 JIT code, I noticed that we leak the prior allocated JIT image in the case where proglen != oldproglen during the JIT passes. Prior to the commit e0ee9c12157d ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") we would just break out of the loop, and using the image as the JITed prog since it could only shrink in size anyway. After e0ee9c12157d, we would bail out to out_addrs label where we free addrs and jit_data but not the image coming from bpf_jit_binary_alloc(). Fixes: e0ee9c12157d ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02x86/cpu: Restore CPUID_8000_0008_EBX reloadThomas Gleixner
The recent commt which addresses the x86_phys_bits corruption with encrypted memory on CPUID reload after a microcode update lost the reload of CPUID_8000_0008_EBX as well. As a consequence IBRS and IBRS_FW are not longer detected Restore the behaviour by bringing the reload of CPUID_8000_0008_EBX back. This restore has a twist due to the convoluted way the cpuid analysis works: CPUID_8000_0008_EBX is used by AMD to enumerate IBRB, IBRS, STIBP. On Intel EBX is not used. But the speculation control code sets the AMD bits when running on Intel depending on the Intel specific speculation control bits. This was done to use the same bits for alternatives. The change which moved the 8000_0008 evaluation out of get_cpu_cap() broke this nasty scheme due to ordering. So that on Intel the store to CPUID_8000_0008_EBX clears the IBRB, IBRS, STIBP bits which had been set before by software. So the actual CPUID_8000_0008_EBX needs to go back to the place where it was and the phys/virt address space calculation cannot touch it. In hindsight this should have used completely synthetic bits for IBRB, IBRS, STIBP instead of reusing the AMD bits, but that's for 4.18. /me needs to find time to cleanup that steaming pile of ... Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") Reported-by: Jörg Otte <jrg.otte@gmail.com> Reported-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jörg Otte <jrg.otte@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: kirill.shutemov@linux.intel.com Cc: Borislav Petkov <bp@alien8.de Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805021043510.1668@nanos.tec.linutronix.de
2018-05-02x86/tsc: Fix mark_tsc_unstable()Peter Zijlstra
mark_tsc_unstable() also needs to affect tsc_early, Now that clocksource_mark_unstable() can be used on a clocksource irrespective of its registration state, use it on both tsc_early and tsc. This does however require cs->list to be initialized empty, otherwise it cannot tell the registation state before registation. Fixes: aa83c45762a2 ("x86/tsc: Introduce early tsc clocksource") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Diego Viola <diego.viola@gmail.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180430100344.533326547@infradead.org
2018-05-02x86/tsc: Always unregister clocksource_tsc_earlyPeter Zijlstra
Don't leave the tsc-early clocksource registered if it errors out early. This was reported by Diego, who on his Core2 era machine got TSC invalidated while it was running with tsc-early (due to C-states). This results in keeping tsc-early with very bad effects. Reported-and-Tested-by: Diego Viola <diego.viola@gmail.com> Fixes: aa83c45762a2 ("x86/tsc: Introduce early tsc clocksource") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180430100344.350507853@infradead.org
2018-04-29Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Another set of x86 related updates: - Fix the long broken x32 version of the IPC user space headers which was noticed by Arnd Bergman in course of his ongoing y2038 work. GLIBC seems to have non broken private copies of these headers so this went unnoticed. - Two microcode fixlets which address some more fallout from the recent modifications in that area: - Unconditionally save the microcode patch, which was only saved when CPU_HOTPLUG was enabled causing failures in the late loading mechanism - Make the later loader synchronization finally work under all circumstances. It was exiting early and causing timeout failures due to a missing synchronization point. - Do not use mwait_play_dead() on AMD systems to prevent excessive power consumption as the CPU cannot go into deep power states from there. - Address an annoying sparse warning due to lost type qualifiers of the vmemmap and vmalloc base address constants. - Prevent reserving crash kernel region on Xen PV as this leads to the wrong perception that crash kernels actually work there which is not the case. Xen PV has its own crash mechanism handled by the hypervisor. - Add missing TLB cpuid values to the table to make the printout on certain machines correct. - Enumerate the new CLDEMOTE instruction - Fix an incorrect SPDX identifier - Remove stale macros" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ipc: Fix x32 version of shmid64_ds and msqid64_ds x86/setup: Do not reserve a crash kernel region if booted on Xen PV x86/cpu/intel: Add missing TLB cpuid values x86/smpboot: Don't use mwait_play_dead() on AMD systems x86/mm: Make vmemmap and vmalloc base address constants unsigned long x86/vector: Remove the unused macro FPU_IRQ x86/vector: Remove the macro VECTOR_OFFSET_START x86/cpufeatures: Enumerate cldemote instruction x86/microcode: Do not exit early from __reload_late() x86/microcode/intel: Save microcode patch unconditionally x86/jailhouse: Fix incorrect SPDX identifier
2018-04-29Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti fixes from Thomas Gleixner: "A set of updates for the x86/pti related code: - Preserve r8-r11 in int $0x80. r8-r11 need to be preserved, but the int$80 entry code removed that quite some time ago. Make it correct again. - A set of fixes for the Global Bit work which went into 4.17 and caused a bunch of interesting regressions: - Triggering a BUG in the page attribute code due to a missing check for early boot stage - Warnings in the page attribute code about holes in the kernel text mapping which are caused by the freeing of the init code. Handle such holes gracefully. - Reduce the amount of kernel memory which is set global to the actual text and do not incidentally overlap with data. - Disable the global bit when RANDSTRUCT is enabled as it partially defeats the hardening. - Make the page protection setup correct for vma->page_prot population again. The adjustment of the protections fell through the crack during the Global bit rework and triggers warnings on machines which do not support certain features, e.g. NX" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/64/compat: Preserve r8-r11 in int $0x80 x86/pti: Filter at vma->vm_page_prot population x86/pti: Disallow global kernel text with RANDSTRUCT x86/pti: Reduce amount of kernel text allowed to be Global x86/pti: Fix boot warning from Global-bit setting x86/pti: Fix boot problems from Global-bit setting
2018-04-29Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "The perf update contains the following bits: x86: - Prevent setting freeze_on_smi on PerfMon V1 CPUs to avoid #GP perf stat: - Keep the '/' event modifier separator in fallback, for example when fallbacking from 'cpu/cpu-cycles/' to user level only, where it should become 'cpu/cpu-cycles/u' and not 'cpu/cpu-cycles/:u' (Jiri Olsa) - Fix PMU events parsing rule, improving error reporting for invalid events (Jiri Olsa) - Disable write_backward and other event attributes for !group events in a group, fixing, for instance this group: '{cycles,msr/aperf/}:S' that has leader sampling (:S) and where just the 'cycles', the leader event, should have the write_backward attribute set, in this case it all fails because the PMU where 'msr/aperf/' lives doesn't accepts write_backward style sampling (Jiri Olsa) - Only fall back group read for leader (Kan Liang) - Fix core PMU alias list for x86 platform (Kan Liang) - Print out hint for mixed PMU group error (Kan Liang) - Fix duplicate PMU name for interval print (Kan Liang) Core: - Set main kernel end address properly when reading kernel and module maps (Namhyung Kim) perf mem: - Fix incorrect entries and add missing man options (Sangwon Hong) s/390: - Remove s390 specific strcmp_cpuid_cmp function (Thomas Richter) - Adapt 'perf test' case record+probe_libc_inet_pton.sh for s390 - Fix s390 undefined record__auxtrace_init() return value in 'perf record' (Thomas Richter)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1 perf stat: Fix duplicate PMU name for interval print perf evsel: Only fall back group read for leader perf stat: Print out hint for mixed PMU group error perf pmu: Fix core PMU alias list for X86 platform perf record: Fix s390 undefined record__auxtrace_init() return value perf mem: Document incorrect and missing options perf evsel: Disable write_backward for leader sampling group events perf pmu: Fix pmu events parsing rule perf stat: Keep the / modifier separator in fallback perf test: Adapt test case record+probe_libc_inet_pton.sh for s390 perf list: Remove s390 specific strcmp_cpuid_cmp function perf machine: Set main kernel end address properly
2018-04-27rMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - PSCI selection API, a leftover from 4.16 (for stable) - Kick vcpu on active interrupt affinity change - Plug a VMID allocation race on oversubscribed systems - Silence debug messages - Update Christoffer's email address (linaro -> arm) x86: - Expose userspace-relevant bits of a newly added feature - Fix TLB flushing on VMX with VPID, but without EPT" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use arm/arm64: KVM: Add PSCI version selection API KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration arm64: KVM: Demote SVE and LORegion warnings to debug only MAINTAINERS: Update e-mail address for Christoffer Dall KVM: arm/arm64: Close VMID generation race
2018-04-27x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPIKarimAllah Ahmed
Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of capabilities. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-04-27kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in useJunaid Shahid
Currently, KVM flushes the TLB after a change to the APIC access page address or the APIC mode when EPT mode is enabled. However, even in shadow paging mode, a TLB flush is needed if VPIDs are being used, as specified in the Intel SDM Section 29.4.5. So replace vmx_flush_tlb_ept_only() with vmx_flush_tlb(), which will flush if either EPT or VPIDs are in use. Signed-off-by: Junaid Shahid <junaids@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-04-27x86/entry/64/compat: Preserve r8-r11 in int $0x80Andy Lutomirski
32-bit user code that uses int $80 doesn't care about r8-r11. There is, however, some 64-bit user code that intentionally uses int $0x80 to invoke 32-bit system calls. From what I've seen, basically all such code assumes that r8-r15 are all preserved, but the kernel clobbers r8-r11. Since I doubt that there's any code that depends on int $0x80 zeroing r8-r11, change the kernel to preserve them. I suspect that very little user code is broken by the old clobber, since r8-r11 are only rarely allocated by gcc, and they're clobbered by function calls, so they only way we'd see a problem is if the same function that invokes int $0x80 also spills something important to one of these registers. The current behavior seems to date back to the historical commit "[PATCH] x86-64 merge for 2.6.4". Before that, all regs were preserved. I can't find any explanation of why this change was made. Update the test_syscall_vdso_32 testcase as well to verify the new behavior, and it strengthens the test to make sure that the kernel doesn't accidentally permute r8..r15. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Link: https://lkml.kernel.org/r/d4c4d9985fbe64f8c9e19291886453914b48caee.1523975710.git.luto@kernel.org
2018-04-27x86/ipc: Fix x32 version of shmid64_ds and msqid64_dsArnd Bergmann
A bugfix broke the x32 shmid64_ds and msqid64_ds data structure layout (as seen from user space) a few years ago: Originally, __BITS_PER_LONG was defined as 64 on x32, so we did not have padding after the 64-bit __kernel_time_t fields, After __BITS_PER_LONG got changed to 32, applications would observe extra padding. In other parts of the uapi headers we seem to have a mix of those expecting either 32 or 64 on x32 applications, so we can't easily revert the path that broke these two structures. Instead, this patch decouples x32 from the other architectures and moves it back into arch specific headers, partially reverting the even older commit 73a2d096fdf2 ("x86: remove all now-duplicate header files"). It's not clear whether this ever made any difference, since at least glibc carries its own (correct) copy of both of these header files, so possibly no application has ever observed the definitions here. Based on a suggestion from H.J. Lu, I tried out the tool from https://github.com/hjl-tools/linux-header to find other such bugs, which pointed out the same bug in statfs(), which also has a separate (correct) copy in glibc. Fixes: f4b4aae18288 ("x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . J . Lu" <hjl.tools@gmail.com> Cc: Jeffrey Walton <noloader@gmail.com> Cc: stable@vger.kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180424212013.3967461-1-arnd@arndb.de
2018-04-27x86/setup: Do not reserve a crash kernel region if booted on Xen PVPetr Tesarik
Xen PV domains cannot shut down and start a crash kernel. Instead, the crashing kernel makes a SCHEDOP_shutdown hypercall with the reason code SHUTDOWN_crash, cf. xen_crash_shutdown() machine op in arch/x86/xen/enlighten_pv.c. A crash kernel reservation is merely a waste of RAM in this case. It may also confuse users of kexec_load(2) and/or kexec_file_load(2). When flags include KEXEC_ON_CRASH or KEXEC_FILE_ON_CRASH, respectively, these syscalls return success, which is technically correct, but the crash kexec image will never be actually used. Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: xen-devel@lists.xenproject.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jean Delvare <jdelvare@suse.de> Link: https://lkml.kernel.org/r/20180425120835.23cef60c@ezekiel.suse.cz
2018-04-26Merge tag 'trace-v4.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Add workqueue forward declaration (for new work, but a nice clean up) - seftest fixes for the new histogram code - Print output fix for hwlat tracer - Fix missing system call events - due to change in x86 syscall naming - Fix kprobe address being used by perf being hashed * tag 'trace-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix missing tab for hwlat_detector print format selftests: ftrace: Add a testcase for multiple actions on trigger selftests: ftrace: Fix trigger extended error testcase kprobes: Fix random address output of blacklist file tracing: Fix kernel crash while using empty filter with perf tracing/x86: Update syscall trace events to handle new prefixed syscall func names tracing: Add missing forward declaration
2018-04-26x86/cpu/intel: Add missing TLB cpuid valuesjacek.tomaka@poczta.fm
Make kernel print the correct number of TLB entries on Intel Xeon Phi 7210 (and others) Before: [ 0.320005] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 After: [ 0.320005] Last level dTLB entries: 4KB 256, 2MB 128, 4MB 128, 1GB 16 The entries do exist in the official Intel SMD but the type column there is incorrect (states "Cache" where it should read "TLB"), but the entries for the values 0x6B, 0x6C and 0x6D are correctly described as 'Data TLB'. Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180423161425.24366-1-jacekt@dugeo.com
2018-04-26x86/smpboot: Don't use mwait_play_dead() on AMD systemsYazen Ghannam
Recent AMD systems support using MWAIT for C1 state. However, MWAIT will not allow deeper cstates than C1 on current systems. play_dead() expects to use the deepest state available. The deepest state available on AMD systems is reached through SystemIO or HALT. If MWAIT is available, it is preferred over the other methods, so the CPU never reaches the deepest possible state. Don't try to use MWAIT to play_dead() on AMD systems. Instead, use CPUIDLE to enter the deepest state advertised by firmware. If CPUIDLE is not available then fallback to HALT. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: https://lkml.kernel.org/r/20180403140228.58540-1-Yazen.Ghannam@amd.com
2018-04-26x86/mm: Make vmemmap and vmalloc base address constants unsigned longJiri Kosina
Commits 9b46a051e4 ("x86/mm: Initialize vmemmap_base at boot-time") and a7412546d8 ("x86/mm: Adjust vmalloc base and size at boot-time") lost the type information for __VMALLOC_BASE_L4, __VMALLOC_BASE_L5, __VMEMMAP_BASE_L4 and __VMEMMAP_BASE_L5 constants. Declare them explicitly unsigned long again. Fixes: 9b46a051e4 ("x86/mm: Initialize vmemmap_base at boot-time") Fixes: a7412546d8 ("x86/mm: Adjust vmalloc base and size at boot-time") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1804121437350.28129@cbobk.fhfr.pm
2018-04-26x86/vector: Remove the unused macro FPU_IRQDou Liyang
The macro FPU_IRQ has never been used since v3.10, So remove it. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180426060832.27312-1-douly.fnst@cn.fujitsu.com
2018-04-26x86/vector: Remove the macro VECTOR_OFFSET_STARTDou Liyang
Now, Linux uses matrix allocator for vector assignment, the original assignment code which used VECTOR_OFFSET_START has been removed. So remove the stale macro as well. Fixes: commit 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180425020553.17210-1-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-26x86/cpufeatures: Enumerate cldemote instructionFenghua Yu
cldemote is a new instruction in future x86 processors. It hints to hardware that a specified cache line should be moved ("demoted") from the cache(s) closest to the processor core to a level more distant from the processor core. This instruction is faster than snooping to make the cache line available for other cores. cldemote instruction is indicated by the presence of the CPUID feature flag CLDEMOTE (CPUID.(EAX=0x7, ECX=0):ECX[bit25]). More details on cldemote instruction can be found in the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: "Ashok Raj" <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1524508162-192587-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-04-25 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix to clear the percpu metadata_dst that could otherwise carry stale ip_tunnel_info, from William. 2) Fix that reduces the number of passes in x64 JIT with regards to dead code sanitation to avoid risk of prog rejection, from Gianluca. 3) Several fixes of sockmap programs, besides others, fixing a double page_put() in error path, missing refcount hold for pinned sockmap, adding required -target bpf for clang in sample Makefile, from John. 4) Fix to disable preemption in __BPF_PROG_RUN_ARRAY() paths, from Roman. 5) Fix tools/bpf/ Makefile with regards to a lex/yacc build error seen on older gcc-5, from John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1Kan Liang
The SMM freeze feature was introduced since PerfMon V2. But the current code unconditionally enables the feature for all platforms. It can generate #GP exception, if the related FREEZE_WHILE_SMM bit is set for the machine with PerfMon V1. To disable the feature for PerfMon V1, perf needs to - Remove the freeze_on_smi sysfs entry by moving intel_pmu_attrs to intel_pmu, which is only applied to PerfMon V2 and later. - Check the PerfMon version before flipping the SMM bit when starting CPU Fixes: 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ak@linux.intel.com Cc: eranian@google.com Cc: acme@redhat.com Link: https://lkml.kernel.org/r/1524682637-63219-1-git-send-email-kan.liang@linux.intel.com
2018-04-25bpf, x64: fix JIT emission for dead codeGianluca Borello
Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead code with a series of ja-1 instructions, for safety. That made JIT compilation much more complex for some BPF programs. One instance of such programs is, for example: bool flag = false ... /* A bunch of other code */ ... if (flag) do_something() In some cases llvm is not able to remove at compile time the code for do_something(), so the generated BPF program ends up with a large amount of dead instructions. In one specific real life example, there are two series of ~500 and ~1000 dead instructions in the program. When the verifier replaces them with a series of ja-1 instructions, it causes an interesting behavior at JIT time. During the first pass, since all the instructions are estimated at 64 bytes, the ja-1 instructions end up being translated as 5 bytes JMP instructions (0xE9), since the jump offsets become increasingly large (> 127) as each instruction gets discovered to be 5 bytes instead of the estimated 64. Starting from the second pass, the first N instructions of the ja-1 sequence get translated into 2 bytes JMPs (0xEB) because the jump offsets become <= 127 this time. In particular, N is defined as roughly 127 / (5 - 2) ~= 42. So, each further pass will make the subsequent N JMP instructions shrink from 5 to 2 bytes, making the image shrink every time. This means that in order to have the entire program converge, there need to be, in the real example above, at least ~1000 / 42 ~= 24 passes just for translating the dead code. If we add this number to the passes needed to translate the other non dead code, it brings such program to 40+ passes, and JIT doesn't complete. Ultimately the userspace loader fails because such BPF program was supposed to be part of a prog array owner being JITed. While it is certainly possible to try to refactor such programs to help the compiler remove dead code, the behavior is not really intuitive and it puts further burden on the BPF developer who is not expecting such behavior. To make things worse, such programs are working just fine in all the kernel releases prior to the ja-1 fix. A possible approach to mitigate this behavior consists into noticing that for ja-1 instructions we don't really need to rely on the estimated size of the previous and current instructions, we know that a -1 BPF jump offset can be safely translated into a 0xEB instruction with a jump offset of -2. Such fix brings the BPF program in the previous example to complete again in ~9 passes. Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing") Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-25tracing/x86: Update syscall trace events to handle new prefixed syscall func ↵Steven Rostedt (VMware)
names Arnaldo noticed that the latest kernel is missing the syscall event system directory in x86. I bisected it down to d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()"). The system call trace events are special, as there is only one trace event for all system calls (the raw_syscalls). But a macro that wraps the system calls creates meta data for them that copies the name to find the system call that maps to the system call table (the number). At boot up, it does a kallsyms lookup of the system call table to find the function that maps to the meta data of the system call. If it does not find a function, then that system call is ignored. Because the x86 system calls had "__x64_", or "__ia32_" prefixed to the "sys" for the names, they do not match the default compare algorithm. As this was a problem for power pc, the algorithm can be overwritten by the architecture. The solution is to have x86 have its own algorithm to do the compare and this brings back the system call trace events. Link: http://lkml.kernel.org/r/20180417174128.0f3457f0@gandalf.local.home Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Fixes: d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-25x86/pti: Filter at vma->vm_page_prot populationDave Hansen
commit ce9962bf7e22bb3891655c349faff618922d4a73 0day reported warnings at boot on 32-bit systems without NX support: attempted to set unsupported pgprot: 8000000000000025 bits: 8000000000000000 supported: 7fffffffffffffff WARNING: CPU: 0 PID: 1 at arch/x86/include/asm/pgtable.h:540 handle_mm_fault+0xfc1/0xfe0: check_pgprot at arch/x86/include/asm/pgtable.h:535 (inlined by) pfn_pte at arch/x86/include/asm/pgtable.h:549 (inlined by) do_anonymous_page at mm/memory.c:3169 (inlined by) handle_pte_fault at mm/memory.c:3961 (inlined by) __handle_mm_fault at mm/memory.c:4087 (inlined by) handle_mm_fault at mm/memory.c:4124 The problem is that due to the recent commit which removed auto-massaging of page protections, filtering page permissions at PTE creation time is not longer done, so vma->vm_page_prot is passed unfiltered to PTE creation. Filter the page protections before they are installed in vma->vm_page_prot. Fixes: fb43d6cb91 ("x86/mm: Do not auto-massage page protections") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: https://lkml.kernel.org/r/20180420222028.99D72858@viggo.jf.intel.com
2018-04-25x86/pti: Disallow global kernel text with RANDSTRUCTDave Hansen
commit 26d35ca6c3776784f8156e1d6f80cc60d9a2a915 RANDSTRUCT derives its hardening benefits from the attacker's lack of knowledge about the layout of kernel data structures. Keep the kernel image non-global in cases where RANDSTRUCT is in use to help keep the layout a secret. Fixes: 8c06c7740 (x86/pti: Leave kernel text global for !PCID) Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lkml.kernel.org/r/20180420222026.D0B4AAC9@viggo.jf.intel.com
2018-04-25x86/pti: Reduce amount of kernel text allowed to be GlobalDave Hansen
commit abb67605203687c8b7943d760638d0301787f8d9 Kees reported to me that I made too much of the kernel image global. It was far more than just text: I think this is too much set global: _end is after data, bss, and brk, and all kinds of other stuff that could hold secrets. I think this should match what mark_rodata_ro() is doing. This does exactly that. We use __end_rodata_hpage_align as our marker both because it is huge-page-aligned and it does not contain any sections we expect to hold secrets. Kees's logic was that r/o data is in the kernel image anyway and, in the case of traditional distributions, can be freely downloaded from the web, so there's no reason to hide it. Fixes: 8c06c7740 (x86/pti: Leave kernel text global for !PCID) Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: https://lkml.kernel.org/r/20180420222023.1C8B2B20@viggo.jf.intel.com
2018-04-25x86/pti: Fix boot warning from Global-bit settingDave Hansen
commit 231df823c4f04176f607afc4576c989895cff40e The pageattr.c code attempts to process "faults" when it goes looking for PTEs to change and finds non-present entries. It allows these faults in the linear map which is "expected to have holes", but WARN()s about them elsewhere, like when called on the kernel image. However, change_page_attr_clear() is now called on the kernel image in the process of trying to clear the Global bit. This trips the warning in __cpa_process_fault() if a non-present PTE is encountered in the kernel image. The "holes" in the kernel image result from free_init_pages()'s use of set_memory_np(). These holes are totally fine, and result from normal operation, just as they would be in the kernel linear map. Just silence the warning when holes in the kernel image are encountered. Fixes: 39114b7a7 (x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image) Reported-by: Mariusz Ceier <mceier@gmail.com> Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Kees Cook <keescook@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: https://lkml.kernel.org/r/20180420222021.1C7D2B3F@viggo.jf.intel.com
2018-04-25x86/pti: Fix boot problems from Global-bit settingDave Hansen
commit 16dce603adc9de4237b7bf2ff5c5290f34373e7b Part of the global bit _setting_ patches also includes clearing the Global bit when it should not be enabled. That is done with set_memory_nonglobal(), which uses change_page_attr_clear() in pageattr.c under the covers. The TLB flushing code inside pageattr.c has has checks like BUG_ON(irqs_disabled()), looking for interrupt disabling that might cause deadlocks. But, these also trip in early boot on certain preempt configurations. Just copy the existing BUG_ON() sequence from cpa_flush_range() to the other two sites and check for early boot. Fixes: 39114b7a7 (x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image) Reported-by: Mariusz Ceier <mceier@gmail.com> Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Kees Cook <keescook@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: https://lkml.kernel.org/r/20180420222019.20C4A410@viggo.jf.intel.com
2018-04-24x86/microcode: Do not exit early from __reload_late()Borislav Petkov
Vitezslav reported a case where the "Timeout during microcode update!" panic would hit. After a deeper look, it turned out that his .config had CONFIG_HOTPLUG_CPU disabled which practically made save_mc_for_early() a no-op. When that happened, the discovered microcode patch wasn't saved into the cache and the late loading path wouldn't find any. This, then, lead to early exit from __reload_late() and thus CPUs waiting until the timeout is reached, leading to the panic. In hindsight, that function should have been written so it does not return before the post-synchronization. Oh well, I know better now... Fixes: bb8c13d61a62 ("x86/microcode: Fix CPU synchronization routine") Reported-by: Vitezslav Samel <vitezslav@samel.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vitezslav Samel <vitezslav@samel.cz> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz Link: https://lkml.kernel.org/r/20180421081930.15741-2-bp@alien8.de
2018-04-24x86/microcode/intel: Save microcode patch unconditionallyBorislav Petkov
save_mc_for_early() was a no-op on !CONFIG_HOTPLUG_CPU but the generic_load_microcode() path saves the microcode patches it has found into the cache of patches which is used for late loading too. Regardless of whether CPU hotplug is used or not. Make the saving unconditional so that late loading can find the proper patch. Reported-by: Vitezslav Samel <vitezslav@samel.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vitezslav Samel <vitezslav@samel.cz> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz Link: https://lkml.kernel.org/r/20180421081930.15741-1-bp@alien8.de
2018-04-23x86/jailhouse: Fix incorrect SPDX identifierThomas Gleixner
GPL2.0 is not a valid SPDX identiier. Replace it with GPL-2.0. Fixes: 4a362601baa6 ("x86/jailhouse: Add infrastructure for running in non-root cell") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Link: https://lkml.kernel.org/r/20180422220832.815346488@linutronix.de
2018-04-22Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of fixes for x86: - Prevent X2APIC ID 0xFFFFFFFF from being treated as valid, which causes the possible CPU count to be wrong. - Prevent 32bit truncation in calc_hpet_ref() which causes the TSC calibration to fail - Fix the page table setup for temporary text mappings in the resume code which causes resume failures - Make the page table dump code handle HIGHPTE correctly instead of oopsing - Support for topologies where NUMA nodes share an LLC to prevent a invalid topology warning and further malfunction on such systems. - Remove the now unused pci-nommu code - Remove stale function declarations" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/power/64: Fix page-table setup for temporary text mapping x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y x86,sched: Allow topologies where NUMA nodes share an LLC x86/processor: Remove two unused function declarations x86/acpi: Prevent X2APIC id 0xffffffff from being accounted x86/tsc: Prevent 32bit truncation in calc_hpet_ref() x86: Remove pci-nommu.c