summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2018-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-02-26 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Various improvements for BPF kselftests: i) skip unprivileged tests when kernel.unprivileged_bpf_disabled sysctl knob is set, ii) count the number of skipped tests from unprivileged, iii) when a test case had an unexpected error then print the actual but also the unexpected one for better comparison, from Joe. 2) Add a sample program for collecting CPU state statistics with regards to how long the CPU resides in cstate and pstate levels. Based on cpu_idle and cpu_frequency trace points, from Leo. 3) Various x64 BPF JIT optimizations to further shrink the generated image size in order to make it more icache friendly. When tested on the Cilium generated programs, image size reduced by approx 4-5% in best case mainly due to how LLVM emits unsigned 32 bit constants, from Daniel. 4) Improvements and fixes on the BPF sockmap sample programs: i) fix the sockmap's Makefile to include nlattr.o for libbpf, ii) detach the sock ops programs from the cgroup before exit, from Prashant. 5) Avoid including xdp.h in filter.h by just forward declaring the struct xdp_rxq_info in filter.h, from Jesper. 6) Fix the BPF kselftests Makefile for cgroup_helpers.c by only declaring it a dependency for test_dev_cgroup.c but not every other test case where it is not needed, from Jesper. 7) Adjust rlimit RLIMIT_MEMLOCK for test_tcpbpf_user selftest since the default is insufficient for creating the 'global_map' used in the corresponding BPF program, from Yonghong. 8) Likewise, for the xdp_redirect sample, Tushar ran into the same when invoking xdp_redirect and xdp_monitor at the same time, therefore in order to have the sample generically work bump the limit here, too. Fix from Tushar. 9) Avoid an unnecessary NULL check in BPF_CGROUP_RUN_PROG_INET_SOCK() since sk is always guaranteed to be non-NULL, from Yafang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26Merge branch 'x86/boot' into x86/mm, to unify branchesIngo Molnar
Both x86/mm and x86/boot contain 5-level paging related patches, unify them to have a single tree to work against. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-26x86/boot/e820: Implement a range manipulation operatorJan H. Schönherr
Add a more versatile memmap= operator, which -- in addition to all the things that were possible before -- allows you to: - redeclare existing ranges -- before, you were limited to adding ranges; - drop any range -- like a mem= for any location; - use any e820 memory type -- not just some predefined ones. The syntax is: memmap=<size>%<offset>-<oldtype>+<newtype> Size and offset work as usual. The "-<oldtype>" and "+<newtype>" are optional and their existence determine the behavior: The command works on the specified range of memory limited to type <oldtype> (if specified). This memory is then configured to show up as <newtype>. If <newtype> is not specified, the memory is removed from the e820 map. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180202231020.15608-1-jschoenh@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-26x86/mm: Consider effective protection attributes in W+X checkJan Beulich
Using just the leaf page table entry flags would cause a false warning in case _PAGE_RW is clear or _PAGE_NX is set in a higher level entry. Hand through both the current entry's flags as well as the accumulated effective value (the latter as pgprotval_t instead of pgprot_t, as it's not an actual entry's value). This in particular eliminates the false W+X warning when running under Xen, as commit: 2cc42bac1c ("x86-64/Xen: eliminate W+X mappings") had to make the necessary adjustment in L2 rather than L1 (the reason is explained there). I.e. _PAGE_RW is clear there in L1, but _PAGE_NX is set in L2. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5A8FDE8902000078001AABBB@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-26x86/boot: Make the x86_init noop functions staticJuergen Gross
Make the noop functions in x86_init.c static in case they are used locally only. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180221094232.23462-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-26x86/xen: Add pvh specific rsdp address retrieval functionJuergen Gross
Add pvh_get_root_pointer() for Xen PVH guests to communicate the address of the RSDP table given to the kernel via Xen start info. This makes the kernel boot again in PVH mode after on recent Xen the RSDP was moved to higher addresses. So up to that change it was pure luck that the legacy method to locate the RSDP was working when running as PVH mode. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: boris.ostrovsky@oracle.com Cc: lenb@kernel.org Cc: linux-acpi@vger.kernel.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20180219100906.14265-4-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-26x86/acpi: Add a new x86_init_acpi structure to x86_init_opsJuergen Gross
Add a new struct x86_init_acpi to x86_init_ops. For now it contains only one init function to get the RSDP table address. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: boris.ostrovsky@oracle.com Cc: lenb@kernel.org Cc: linux-acpi@vger.kernel.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20180219100906.14265-3-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-26Merge tag 'v4.16-rc3' into x86/mm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-25Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of fixes: - UAPI data type correction for hyperv - correct the cpu cores field in /proc/cpuinfo on CPU hotplug - return proper error code in the resctrl file system failure path to avoid silent subsequent failures - correct a subtle accounting issue in the new vector allocation code which went unnoticed for a while and caused suspend/resume failures" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations x86/topology: Fix function name in documentation x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system x86/apic/vector: Handle vector release on CPU unplug correctly genirq/matrix: Handle CPU offlining proper x86/headers/UAPI: Use __u64 instead of u64 in <uapi/asm/hyperv.h>
2018-02-25Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single commit which shuts up a bogus GCC-8 warning" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/oprofile: Fix bogus GCC-8 warning in nmi_setup()
2018-02-25Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cleanup patchlet from Thomas Gleixner: "A single commit removing a bunch of bogus double semicolons all over the tree" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide/trivial: Remove ';;$' typo noise
2018-02-23bpf, x64: save 5 bytes in prologue when ebpf insns came from cbpfDaniel Borkmann
While it's rather cumbersome to reduce prologue for cBPF->eBPF migrations wrt spill/fill for r15 which is callee saved register due to bpf_error path in bpf_jit.S that is both used by migrations as well as native eBPF, we can still trivially save 5 bytes in prologue for the former since tail calls can never be used there. cBPF->eBPF migrations also have their own custom prologue in BPF asm that xors A and X reg anyway, so it's fine we skip this here. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23bpf, x64: save few bytes when mul is in alu32Daniel Borkmann
Add a generic emit_mov_reg() helper in order to reuse it in BPF multiplication to load the src into rax, we can save a few bytes in alu32 while doing so. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23bpf, x64: save several bytes when mul dest is r0/r3 anywayDaniel Borkmann
Instead of unconditionally performing push/pop on rax/rdx in case of multiplication, we can save a few bytes in case of dest register being either BPF r0 (rax) or r3 (rdx) since the result is written in there anyway. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23bpf, x64: save several bytes by using mov over movabsq when possibleDaniel Borkmann
While analyzing some of the more complex BPF programs from Cilium, I found that LLVM generally prefers to emit LD_IMM64 instead of MOV32 BPF instructions for loading unsigned 32-bit immediates into a register. Given we cannot change the current/stable LLVM versions that are already out there, lets optimize this case such that the JIT prefers to emit 'mov %eax, imm32' over 'movabsq %rax, imm64' whenever suitable in order to reduce the image size by 4-5 bytes per such load in the typical case, reducing image size on some of the bigger programs by up to 4%. emit_mov_imm32() and emit_mov_imm64() have been added as helpers. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23bpf, x64: save one byte per shl/shr/sar when imm is 1Daniel Borkmann
When we shift by one, we can use a different encoding where imm is not explicitly needed, which saves 1 byte per such op. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-24KVM: SVM: Fix SEV LAUNCH_SECRET commandBrijesh Singh
The SEV LAUNCH_SECRET command fails with error code 'invalid param' because we missed filling the guest and header system physical address while issuing the command. Fixes: 9f5b5b950aa9 (KVM: SVM: Add support for SEV LAUNCH_SECRET command) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-kernel@vger.kernel.org Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24KVM: SVM: install RSM interceptBrijesh Singh
RSM instruction is used by the SMM handler to return from SMM mode. Currently, rsm causes a #UD - which results in instruction fetch, decode, and emulate. By installing the RSM intercept we can avoid the instruction fetch since we know that #VMEXIT was due to rsm. The patch is required for the SEV guest, because in case of SEV guest memory is encrypted with guest-specific key and hypervisor will not able to fetch the instruction bytes from the guest memory. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE commandBrijesh Singh
Using the access_ok() to validate the input before issuing the SEV command does not buy us anything in this case. If userland is giving us a garbage pointer then copy_to_user() will catch it when we try to return the measurement. Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: 0d0736f76347 (KVM: SVM: Add support for KVM_SEV_LAUNCH_MEASURE ...) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-kernel@vger.kernel.org Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is ↵Wanpeng Li
disabled Avoid traversing all the cpus for pv tlb flush when steal time is disabled since pv tlb flush depends on the field in steal time for shared data. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24x86/kvm: Make parse_no_xxx __init for kvmDou Liyang
The early_param() is only called during kernel initialization, So Linux marks the functions of it with __init macro to save memory. But it forgot to mark the parse_no_kvmapf/stealacc/kvmclock_vsyscall, So, Make them __init as well. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: rkrcmar@redhat.com Cc: kvm@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: x86@kernel.org Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24KVM: x86: fix backward migration with async_PFRadim Krčmář
Guests on new hypersiors might set KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT bit when enabling async_PF, but this bit is reserved on old hypervisors, which results in a failure upon migration. To avoid breaking different cases, we are checking for CPUID feature bit before enabling the feature and nothing else. Fixes: 52a5c155cf79 ("KVM: async_pf: Let guest support delivery of async_pf from guest mode") Cc: <stable@vger.kernel.org> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24kvm: fix warning for non-x86 buildsSebastian Ott
Fix the following sparse warning by moving the prototype of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h . CHECK arch/s390/kvm/../../../virt/kvm/kvm_main.c arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static? Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24KVM: X86: Fix SMRAM accessing even if VM is shutdownWanpeng Li
Reported by syzkaller: WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel] CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4 RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel] Call Trace: vmx_handle_exit+0xbd/0xe20 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm] kvm_vcpu_ioctl+0x3e9/0x720 [kvm] do_vfs_ioctl+0xa4/0x6a0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x25/0x9c The testcase creates a first thread to issue KVM_SMI ioctl, and then creates a second thread to mmap and operate on the same vCPU. This triggers a race condition when running the testcase with multiple threads. Sometimes one thread exits with a triple fault while another thread mmaps and operates on the same vCPU. Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE in kvm_handle_bad_page(), which will go on to cause an emulation failure and an exit with KVM_EXIT_INTERNAL_ERROR. Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24KVM: nVMX: Don't halt vcpu when L1 is injecting events to L2Chao Gao
Although L2 is in halt state, it will be in the active state after VM entry if the VM entry is vectoring according to SDM 26.6.2 Activity State. Halting the vcpu here means the event won't be injected to L2 and this decision isn't reported to L1. Thus L0 drops an event that should be injected to L2. Cc: Liran Alon <liran.alon@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24KVM/x86: remove WARN_ON() for when vm_munmap() failsEric Biggers
On x86, special KVM memslots such as the TSS region have anonymous memory mappings created on behalf of userspace, and these mappings are removed when the VM is destroyed. It is however possible for removing these mappings via vm_munmap() to fail. This can most easily happen if the thread receives SIGKILL while it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)' in __x86_set_memory_region(). syzkaller was able to hit this, using 'exit()' to send the SIGKILL. Note that while the vm_munmap() failure results in the mapping not being removed immediately, it is not leaked forever but rather will be freed when the process exits. It's not really possible to handle this failure properly, so almost every other caller of vm_munmap() doesn't check the return value. It's a limitation of having the kernel manage these mappings rather than userspace. So just remove the WARN_ON() so that users can't spam the kernel log with this warning. Fixes: f0d648bdf0a5 ("KVM: x86: map/unmap private slots in __x86_set_memory_region") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-02-24KVM: nVMX: preserve SECONDARY_EXEC_DESC without UMIPRadim Krčmář
L1 might want to use SECONDARY_EXEC_DESC, so we must not clear the VMCS bit if UMIP is not being emulated. We must still set the bit when emulating UMIP as the feature can be passed to L2 where L0 will do the emulation and because L2 can change CR4 without a VM exit, we should clear the bit if UMIP is disabled. Fixes: 0367f205a3b7 ("KVM: vmx: add support for emulating UMIP") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-02-24KVM: x86: move LAPIC initialization after VMCS creationPaolo Bonzini
The initial reset of the local APIC is performed before the VMCS has been created, but it tries to do a vmwrite: vmwrite error: reg 810 value 4a00 (err 18944) CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G W I 4.16.0-0.rc2.git0.1.fc28.x86_64 #1 Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014 Call Trace: vmx_set_rvi [kvm_intel] vmx_hwapic_irr_update [kvm_intel] kvm_lapic_reset [kvm] kvm_create_lapic [kvm] kvm_arch_vcpu_init [kvm] kvm_vcpu_init [kvm] vmx_create_vcpu [kvm_intel] kvm_vm_ioctl [kvm] Move it later, after the VMCS has been created. Fixes: 4191db26b714 ("KVM: x86: Update APICv on APIC reset") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix TTL offset calculation in mac80211 mesh code, from Peter Oh. 2) Fix races with procfs in ipt_CLUSTERIP, from Cong Wang. 3) Memory leak fix in lpm_trie BPF map code, from Yonghong Song. 4) Need to use GFP_ATOMIC in BPF cpumap allocations, from Jason Wang. 5) Fix potential deadlocks in netfilter getsockopt() code paths, from Paolo Abeni. 6) Netfilter stackpointer size checks really are needed to validate user input, from Florian Westphal. 7) Missing timer init in x_tables, from Paolo Abeni. 8) Don't use WQ_MEM_RECLAIM in mac80211 hwsim, from Johannes Berg. 9) When an ibmvnic device is brought down then back up again, it can be sent queue entries from a previous session, handle this properly instead of crashing. From Thomas Falcon. 10) Fix TCP checksum on LRO buffers in mlx5e, from Gal Pressman. 11) When we are dumping filters in cls_api, the output SKB is empty, and the filter we are dumping is too large for the space in the SKB, we should return -EMSGSIZE like other netlink dump operations do. Otherwise userland has no signal that is needs to increase the size of its read buffer. From Roman Kapl. 12) Several XDP fixes for virtio_net, from Jesper Dangaard Brouer. 13) Module refcount leak in netlink when a dump start fails, from Jason Donenfeld. 14) Handle sub-optimal GSO sizes better in TCP BBR congestion control, from Eric Dumazet. 15) Releasing bpf per-cpu arraymaps can take a long time, add a condtional scheduling point. From Eric Dumazet. 16) Implement retpolines for tail calls in x64 and arm64 bpf JITs. From Daniel Borkmann. 17) Fix page leak in gianfar driver, from Andy Spencer. 18) Missed clearing of estimator scratch buffer, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) net_sched: gen_estimator: fix broken estimators based on percpu stats gianfar: simplify FCS handling and fix memory leak ipv6 sit: work around bogus gcc-8 -Wrestrict warning macvlan: fix use-after-free in macvlan_common_newlink() bpf, arm64: fix out of bounds access in tail call bpf, x64: implement retpoline for tail call rxrpc: Fix send in rxrpc_send_data_packet() net: aquantia: Fix error handling in aq_pci_probe() bpf: fix rcu lockdep warning for lpm_trie map_free callback bpf: add schedule points in percpu arrays management regulatory: add NUL to request alpha2 ibmvnic: Fix early release of login buffer net/smc9194: Remove bogus CONFIG_MAC reference net: ipv4: Set addr_type in hash_keys for forwarded case tcp_bbr: better deal with suboptimal GSO smsc75xx: fix smsc75xx_set_features() netlink: put module reference if dump start fails selftests/bpf/test_maps: exit child process without error in ENOMEM case selftests/bpf: update gitignore with test_libbpf_open selftests/bpf: tcpbpf_kern: use in6_* macros from glibc ..
2018-02-23x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across ↵Samuel Neves
CPU hotplug operations Without this fix, /proc/cpuinfo will display an incorrect amount of CPU cores, after bringing them offline and online again, as exemplified below: $ cat /proc/cpuinfo | grep cores cpu cores : 4 cpu cores : 8 cpu cores : 8 cpu cores : 20 cpu cores : 4 cpu cores : 3 cpu cores : 2 cpu cores : 2 This patch fixes this by always zeroing the booted_cores variable upon turning off a logical CPU. Tested-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jgross@suse.com Cc: luto@kernel.org Cc: prarit@redhat.com Cc: vkuznets@redhat.com Link: http://lkml.kernel.org/r/20180221205036.5244-1-sneves@dei.uc.pt Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-23KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR ↵Paolo Bonzini
path as unlikely() vmx_vcpu_run() and svm_vcpu_run() are large functions, and giving branch hints to the compiler can actually make a substantial cycle difference by keeping the fast path contiguous in memory. With this optimization, the retpoline-guest/retpoline-host case is about 50 cycles faster. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180222154318.20361-3-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-23KVM/x86: Remove indirect MSR op calls from SPEC_CTRLPaolo Bonzini
Having a paravirt indirect call in the IBRS restore path is not a good idea, since we are trying to protect from speculative execution of bogus indirect branch targets. It is also slower, so use native_wrmsrl() on the vmentry path too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Fixes: d28b387fb74da95d69d2615732f50cceb38e9a4d Link: http://lkml.kernel.org/r/20180222154318.20361-2-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-23x86/pci: Simplify code by using the new dmi_get_bios_year() helperAndy Shevchenko
...instead of open coding its functionality. No changes in functionality. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/20180222125923.57385-2-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-23Merge tag 'v4.16-rc2' into x86/platform, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-23x86/intel_rdt: Fix incorrect returned value when creating rdgroup ↵Wang Hui
sub-directory in resctrl file system If no monitoring feature is detected because all monitoring features are disabled during boot time or there is no monitoring feature in hardware, creating rdtgroup sub-directory by "mkdir" command reports error: mkdir: cannot create directory ‘/sys/fs/resctrl/p1’: No such file or directory But the sub-directory actually is generated and content is correct: cpus cpus_list schemata tasks The error is because rdtgroup_mkdir_ctrl_mon() returns non zero value after the sub-directory is created and the returned value is reported as an error to user. Clear the returned value to report to user that the sub-directory is actually created successfully. Signed-off-by: Wang Hui <john.wanghui@huawei.com> Signed-off-by: Zhang Yanfei <yanfei.zhang@huawei.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vikas <vikas.shivappa@intel.com> Cc: Xiaochen Shen <xiaochen.shen@intel.com> Link: http://lkml.kernel.org/r/1519356363-133085-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-23x86/apic/vector: Handle vector release on CPU unplug correctlyThomas Gleixner
When a irq vector is replaced, then the previous vector is normally released when the first interrupt happens on the new vector. If the target CPU of the previous vector is already offline when the new vector is installed, then the previous vector is silently discarded, which leads to accounting issues causing suspend failures and other problems. Adjust the logic so that the previous vector is freed in the underlying matrix allocator to ensure that the accounting stays correct. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Yuriy Vostrikov <delamonpansie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Yuriy Vostrikov <delamonpansie@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180222112316.930791749@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-22bpf, x64: implement retpoline for tail callDaniel Borkmann
Implement a retpoline [0] for the BPF tail call JIT'ing that converts the indirect jump via jmp %rax that is used to make the long jump into another JITed BPF image. Since this is subject to speculative execution, we need to control the transient instruction sequence here as well when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop. The latter aligns also with what gcc / clang emits (e.g. [1]). JIT dump after patch: # bpftool p d x i 1 0: (18) r2 = map[id:1] 2: (b7) r3 = 0 3: (85) call bpf_tail_call#12 4: (b7) r0 = 2 5: (95) exit With CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000072 |* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000072 | 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000072 | 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: callq 0x000000000000006d |+ 66: pause | 68: lfence | 6b: jmp 0x0000000000000066 | 6d: mov %rax,(%rsp) | 71: retq | 72: mov $0x2,%eax [...] * relative fall-through jumps in error case + retpoline for indirect jump Without CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000063 |* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000063 | 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000063 | 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: jmpq *%rax |- 63: mov $0x2,%eax [...] * relative fall-through jumps in error case - plain indirect jump as before [0] https://support.google.com/faqs/answer/7625886 [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-22x86: Treat R_X86_64_PLT32 as R_X86_64_PC32H.J. Lu
On i386, there are 2 types of PLTs, PIC and non-PIC. PIE and shared objects must use PIC PLT. To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_ into EBX first. There is no need for that on x86-64 since x86-64 uses PC-relative PLT. On x86-64, for 32-bit PC-relative branches, we can generate PLT32 relocation, instead of PC32 relocation, which can also be used as a marker for 32-bit PC-relative branches. Linker can always reduce PLT32 relocation to PC32 if function is defined locally. Local functions should use PC32 relocation. As far as Linux kernel is concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32 since Linux kernel doesn't use PLT. R_X86_64_PLT32 for 32-bit PC-relative branches has been enabled in binutils master branch which will become binutils 2.31. [ hjl is working on having better documentation on this all, but a few more notes from him: "PLT32 relocation is used as marker for PC-relative branches. Because of EBX, it looks odd to generate PLT32 relocation on i386 when EBX doesn't have GOT. As for symbol resolution, PLT32 and PC32 relocations are almost interchangeable. But when linker sees PLT32 relocation against a protected symbol, it can resolved locally at link-time since it is used on a branch instruction. Linker can't do that for PC32 relocation" but for the kernel use, the two are basically the same, and this commit gets things building and working with the current binutils master - Linus ] Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-22isa: Remove ISA_BUS_API selection for ISA_BUSWilliam Breathitt Gray
ISA_BUS_API is selected by drivers themselves when necessary. The ISA_BUS Kconfig option is now simply a mask for true ISA device drivers and relevant configuration. For now, the ISA_BUS Kconfig option is only available for X86, but may be added for other arch builds in the future if the need arises. Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-02-22crypto: aesni - Update aesni-intel_glue to use scatter/gatherDave Watson
Add gcmaes_crypt_by_sg routine, that will do scatter/gather by sg. Either src or dst may contain multiple buffers, so iterate over both at the same time if they are different. If the input is the same as the output, iterate only over one. Currently both the AAD and TAG must be linear, so copy them out with scatterlist_map_and_copy. If first buffer contains the entire AAD, we can optimize and not copy. Since the AAD can be any size, if copied it must be on the heap. TAG can be on the stack since it is always < 16 bytes. Only the SSE routines are updated so far, so leave the previous gcmaes_en/decrypt routines, and branch to the sg ones if the keysize is inappropriate for avx, or we are SSE only. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Introduce scatter/gather asm function stubsDave Watson
The asm macros are all set up now, introduce entry points. GCM_INIT and GCM_COMPLETE have arguments supplied, so that the new scatter/gather entry points don't have to take all the arguments, and only the ones they need. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Add fast path for > 16 byte updateDave Watson
We can fast-path any < 16 byte read if the full message is > 16 bytes, and shift over by the appropriate amount. Usually we are reading > 16 bytes, so this should be faster than the READ_PARTIAL macro introduced in b20209c91e2 for the average case. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Introduce partial block macroDave Watson
Before this diff, multiple calls to GCM_ENC_DEC will succeed, but only if all calls are a multiple of 16 bytes. Handle partial blocks at the start of GCM_ENC_DEC, and update aadhash as appropriate. The data offset %r11 is also updated after the partial block. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Move HashKey computation from stack to gcm_contextDave Watson
HashKey computation only needs to happen once per scatter/gather operation, save it between calls in gcm_context struct instead of on the stack. Since the asm no longer stores anything on the stack, we can use %rsp directly, and clean up the frame save/restore macros a bit. Hashkeys actually only need to be calculated once per key and could be moved to when set_key is called, however, the current glue code falls back to generic aes code if fpu is disabled. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Move ghash_mul to GCM_COMPLETEDave Watson
Prepare to handle partial blocks between scatter/gather calls. For the last partial block, we only want to calculate the aadhash in GCM_COMPLETE, and a new partial block macro will handle both aadhash update and encrypting partial blocks between calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Fill in new context data structuresDave Watson
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values. pblocklen, aadhash, and pblockenckey are also updated at the end of each scatter/gather operation, to be carried over to the next operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Split AAD hash calculation to separate macroDave Watson
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Introduce gcm_context_dataDave Watson
Introduce a gcm_context_data struct that will be used to pass context data between scatter/gather update calls. It is passed as the second argument (after crypto keys), other args are renumbered. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Merge encode and decode to GCM_ENC_DEC macroDave Watson
Make a macro for the main encode/decode routine. Only a small handful of lines differ for enc and dec. This will also become the main scatter/gather update routine. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Add GCM_COMPLETE macroDave Watson
Merge encode and decode tag calculations in GCM_COMPLETE macro. Scatter/gather routines will call this once at the end of encryption or decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>