summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2019-07-22x86: Remove X86_FEATURE_MFENCE_RDTSCJosh Poimboeuf
AMD and Intel both have serializing lfence (X86_FEATURE_LFENCE_RDTSC). They've both had it for a long time, and AMD has had it enabled in Linux since Spectre v1 was announced. Back then, there was a proposal to remove the serializing mfence feature bit (X86_FEATURE_MFENCE_RDTSC), since both AMD and Intel have serializing lfence. At the time, it was (ahem) speculated that some hypervisors might not yet support its removal, so it remained for the time being. Now a year-and-a-half later, it should be safe to remove. I asked Andrew Cooper about whether it's still needed: So if you're virtualised, you've got no choice in the matter.  lfence is either dispatch-serialising or not on AMD, and you won't be able to change it. Furthermore, you can't accurately tell what state the bit is in, because the MSR might not be virtualised at all, or may not reflect the true state in hardware.  Worse still, attempting to set the bit may not be successful even if there isn't a fault for doing so. Xen sets the DE_CFG bit unconditionally, as does Linux by the looks of things (see MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT).  ISTR other hypervisor vendors saying the same, but I don't have any information to hand. If you are running under a hypervisor which has been updated, then lfence will almost certainly be dispatch-serialising in practice, and you'll almost certainly see the bit already set in DE_CFG.  If you're running under a hypervisor which hasn't been patched since Spectre, you've already lost in many more ways. I'd argue that X86_FEATURE_MFENCE_RDTSC is not worth keeping. So remove it. This will reduce some code rot, and also make it easier to hook barrier_nospec() up to a cmdline disable for performance raisins, without having to need an alternative_3() macro. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/d990aa51e40063acb9888e8c1b688e41355a9588.1562255067.git.jpoimboe@redhat.com
2019-07-22x86/realmode: Remove trampoline_statusPingfan Liu
There is no reader of trampoline_status, it's only written. It turns out that after commit ce4b1b16502b ("x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"), trampoline_status is not needed any more. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1563266424-3472-1-git-send-email-kernelfans@gmail.com
2019-07-22x86/hyperv: Add functions to allocate/deallocate page for Hyper-VMaya Nakamura
Introduce two new functions, hv_alloc_hyperv_page() and hv_free_hyperv_page(), to allocate/deallocate memory with the size and alignment that Hyper-V expects as a page. Although currently they are not used, they are ready to be used to allocate/deallocate memory on x86 when their ARM64 counterparts are implemented, keeping symmetry between architectures with potentially different guest page sizes. Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.1906272334560.32342@nanos.tec.linutronix.de/ Link: https://lore.kernel.org/lkml/87muindr9c.fsf@vitty.brq.redhat.com/ Link: https://lkml.kernel.org/r/706b2e71eb3e587b5f8801e50f090fae2a00e35d.1562916939.git.m.maya.nakamura@gmail.com
2019-07-22x86/hyperv: Create and use Hyper-V page definitionsMaya Nakamura
Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because the Linux guest page size and hypervisor page size concepts are different, even though they happen to be the same value on x86. Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE. Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lkml.kernel.org/r/e95111629abf65d016e983f72494cbf110ce605f.1562916939.git.m.maya.nakamura@gmail.com
2019-07-22x86/irq/64: Update stale commentCao jin
Commit e6401c130931 ("x86/irq/64: Split the IRQ stack into its own pages") missed to update one piece of comment as it did to its peer in Xen, which will confuse people who still need to read comment. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190719081635.26528-1-caoj.fnst@cn.fujitsu.com
2019-07-22x86/sysfb_efi: Add quirks for some devices with swapped width and heightHans de Goede
Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen but advertise a landscape resolution and pitch, resulting in a messed up display if the kernel tries to show anything on the efifb (because of the wrong pitch). Fix this by adding a new DMI match table for devices which need to have their width and height swapped. At first it was tried to use the existing table for overriding some of the efifb parameters, but some of the affected devices have variants with different LCD resolutions which will not work with hardcoded override values. Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1730783 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190721152418.11644-1-hdegoede@redhat.com
2019-07-22x86/stacktrace: Prevent access_ok() warnings in arch_stack_walk_user()Eiichi Tsukata
When arch_stack_walk_user() is called from atomic contexts, access_ok() can trigger the following warning if compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y. Reproducer: // CONFIG_DEBUG_ATOMIC_SLEEP=y # cd /sys/kernel/debug/tracing # echo 1 > options/userstacktrace # echo 1 > events/irq/irq_handler_entry/enable WARNING: CPU: 0 PID: 2649 at arch/x86/kernel/stacktrace.c:103 arch_stack_walk_user+0x6e/0xf6 CPU: 0 PID: 2649 Comm: bash Not tainted 5.3.0-rc1+ #99 RIP: 0010:arch_stack_walk_user+0x6e/0xf6 Call Trace: <IRQ> stack_trace_save_user+0x10a/0x16d trace_buffer_unlock_commit_regs+0x185/0x240 trace_event_buffer_commit+0xec/0x330 trace_event_raw_event_irq_handler_entry+0x159/0x1e0 __handle_irq_event_percpu+0x22d/0x440 handle_irq_event_percpu+0x70/0x100 handle_irq_event+0x5a/0x8b handle_edge_irq+0x12f/0x3f0 handle_irq+0x34/0x40 do_IRQ+0xa6/0x1f0 common_interrupt+0xf/0xf </IRQ> Fix it by calling __range_not_ok() directly instead of access_ok() as copy_from_user_nmi() does. This is fine here because the actual copy is inside a pagefault disabled region. Reported-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190722083216.16192-2-devel@etsukata.com
2019-07-22x86/cpufeatures: Enable a new AVX512 CPU featureGayatri Kammela
Add a new AVX512 instruction group/feature for enumeration in /proc/cpuinfo: AVX512_VP2INTERSECT. CPUID.(EAX=7,ECX=0):EDX[bit 8] AVX512_VP2INTERSECT Detailed information of CPUID bits for this feature can be found in the Intel Architecture Intsruction Set Extensions Programming Reference document (refer to Table 1-2). A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=204215. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190717234632.32673-3-gayatri.kammela@intel.com
2019-07-22cpu/cpuid-deps: Add a tab to cpuid dependent featuresGayatri Kammela
Improve code readability by adding a tab between the elements of each structure in an array of cpuid-dep struct so longer feature names will fit. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190717234632.32673-2-gayatri.kammela@intel.com
2019-07-22x86/syscalls: Split the x32 syscalls into their own tableAndy Lutomirski
For unfortunate historical reasons, the x32 syscalls and the x86_64 syscalls are not all numbered the same. As an example, ioctl() is nr 16 on x86_64 but 514 on x32. This has potentially nasty consequences, since it means that there are two valid RAX values to do ioctl(2) and two invalid RAX values. The valid values are 16 (i.e. ioctl(2) using the x86_64 ABI) and (514 | 0x40000000) (i.e. ioctl(2) using the x32 ABI). The invalid values are 514 and (16 | 0x40000000). 514 will enter the "COMPAT_SYSCALL_DEFINE3(ioctl, ...)" entry point with in_compat_syscall() and in_x32_syscall() returning false, whereas (16 | 0x40000000) will enter the native entry point with in_compat_syscall() and in_x32_syscall() returning true. Both are bogus, and both will exercise code paths in the kernel and in any running seccomp filters that really ought to be unreachable. Splitting out the x32 syscalls into their own tables, allows both bogus invocations to return -ENOSYS. I've checked glibc, musl, and Bionic, and all of them appear to call syscalls with their correct numbers, so this change should have no effect on them. There is an added benefit going forward: new syscalls that need special handling on x32 can share the same number on x32 and x86_64. This means that the special syscall range 512-547 can be treated as a legacy wart instead of something that may need to be extended in the future. Also add a selftest to verify the new behavior. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/208024256b764312598f014ebfb0a42472c19354.1562185330.git.luto@kernel.org
2019-07-22x86/syscalls: Disallow compat entries for all types of 64-bit syscallsAndy Lutomirski
A "compat" entry in the syscall tables means to use a different entry on 32-bit and 64-bit builds. This only makes sense for syscalls that exist in the first place in 32-bit builds, so disallow it for anything other than i386. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/4b7565954c5a06530ac01d98cb1592538fd8ae51.1562185330.git.luto@kernel.org
2019-07-22x86/syscalls: Use the compat versions of rt_sigsuspend() and rt_sigprocmask()Andy Lutomirski
I'm working on some code that detects at build time if there's a COMPAT_SYSCALL_DEFINE() that is not referenced in the x86 syscall tables. It catches three offenders: rt_sigsuspend(), rt_sigprocmask(), and sendfile64(). For rt_sigsuspend() and rt_sigprocmask(), the only potential difference between the native and compat versions is that the compat version converts the sigset_t, but, on little endian architectures, the conversion is a no-op. This is why they both currently work on x86. To make the code more consistent, and to make the upcoming patches work, rewire x86 to use the compat vesions. sendfile64() is more complicated, and will be addressed separately. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/51643ac3157b5921eae0e172a8a0b1d953e68ebb.1562185330.git.luto@kernel.org
2019-07-22x86/syscalls: Make __X32_SYSCALL_BIT be unsigned longAndy Lutomirski
Currently, it's an int. This is bizarre. Fortunately, the code using it still works: ~__X32_SYSCALL_BIT is also int, so, if nr is unsigned long, then C kindly sign-extends the ~__X32_SYSCALL_BIT part, and it actually results in the desired value. This is far more subtle than it deserves to be. Syscall numbers are, for all practical purposes, unsigned long, so make __X32_SYSCALL_BIT be unsigned long. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/99b0d83ad891c67105470a1a6b63243fd63a5061.1562185330.git.luto@kernel.org
2019-07-22x86/mm: Sync also unmappings in vmalloc_sync_all()Joerg Roedel
With huge-page ioremap areas the unmappings also need to be synced between all page-tables. Otherwise it can cause data corruption when a region is unmapped and later re-used. Make the vmalloc_sync_one() function ready to sync unmappings and make sure vmalloc_sync_all() iterates over all page-tables even when an unmapped PMD is found. Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20190719184652.11391-3-joro@8bytes.org
2019-07-22x86/mm: Check for pfn instead of page in vmalloc_sync_one()Joerg Roedel
Do not require a struct page for the mapped memory location because it might not exist. This can happen when an ioremapped region is mapped with 2MB pages. Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20190719184652.11391-2-joro@8bytes.org
2019-07-22x86/paravirt: Drop {read,write}_cr8() hooksAndrew Cooper
There is a lot of infrastructure for functionality which is used exclusively in __{save,restore}_processor_state() on the suspend/resume path. cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by lapic_{suspend,resume}(). Saving and restoring cr8 independently of the rest of the Local APIC state isn't a clever thing to be doing. Delete the suspend/resume cr8 handling, which shrinks the size of struct saved_context, and allows for the removal of both PVOPS. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20190715151641.29210-1-andrew.cooper3@citrix.com
2019-07-22x86/apic: Initialize TPR to block interrupts 16-31Andy Lutomirski
The APIC, per spec, is fundamentally confused and thinks that interrupt vectors 16-31 are valid. This makes no sense -- the CPU reserves vectors 0-31 for exceptions (faults, traps, etc). Obviously, no device should actually produce an interrupt with vector 16-31, but robustness can be improved by setting the APIC TPR class to 1, which will prevent delivery of an interrupt with a vector below 32. Note: This is *not* intended as a security measure against attackers who control malicious hardware. Any PCI or similar hardware that can be controlled by an attacker MUST be behind a functional IOMMU that remaps interrupts. The purpose of this change is to reduce the chance that a certain class of device malfunctions crashes the kernel in hard-to-debug ways. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/dc04a9f8b234d7b0956a8d2560b8945bcd9c4bf7.1563117760.git.luto@kernel.org
2019-07-20Merge tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: "Fix various regressions: - force unencrypted dma-coherent buffers if encryption bit can't fit into the dma coherent mask (Tom Lendacky) - avoid limiting request size if swiotlb is not used (me) - fix swiotlb handling in dma_direct_sync_sg_for_cpu/device (Fugang Duan)" * tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping: dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device dma-direct: only limit the mapping size if swiotlb could be used dma-mapping: add a dma_addressing_limited helper dma-direct: Force unencrypted DMA under SME for certain DMA masks
2019-07-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 specific fixes and updates: - The CR2 corruption fixes which store CR2 early in the entry code and hand the stored address to the fault handlers. - Revert a forgotten leftover of the dropped FSGSBASE series. - Plug a memory leak in the boot code. - Make the Hyper-V assist functionality robust by zeroing the shadow page. - Remove a useless check for dead processes with LDT - Update paravirt and VMware maintainers entries. - A few cleanup patches addressing various compiler warnings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/64: Prevent clobbering of saved CR2 value x86/hyper-v: Zero out the VP ASSIST PAGE on allocation x86, boot: Remove multiple copy of static function sanitize_boot_params() x86/boot/compressed/64: Remove unused variable x86/boot/efi: Remove unused variables x86/mm, tracing: Fix CR2 corruption x86/entry/64: Update comments and sanity tests for create_gap x86/entry/64: Simplify idtentry a little x86/entry/32: Simplify common_exception x86/paravirt: Make read_cr2() CALLEE_SAVE MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE x86/process: Delete useless check for dead process with LDT x86: math-emu: Hide clang warnings for 16-bit overflow x86/e820: Use proper booleans instead of 0/1 x86/apic: Silence -Wtype-limits compiler warnings x86/mm: Free sme_early_buffer after init x86/boot: Fix memory leak in default_get_smp_config() Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
2019-07-20Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: - A collection of objtool fixes which address recent fallout partially exposed by newer toolchains, clang, BPF and general code changes. - Force USER_DS for user stack traces [ Note: the "objtool fixes" are not all to objtool itself, but for kernel code that triggers objtool warnings. Things like missing function size annotations, or code that confuses the unwinder etc. - Linus] * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) objtool: Support conditional retpolines objtool: Convert insn type to enum objtool: Fix seg fault on bad switch table entry objtool: Support repeated uses of the same C jump table objtool: Refactor jump table code objtool: Refactor sibling call detection logic objtool: Do frame pointer check before dead end check objtool: Change dead_end_function() to return boolean objtool: Warn on zero-length functions objtool: Refactor function alias logic objtool: Track original function across branches objtool: Add mcsafe_handle_tail() to the uaccess safe list bpf: Disable GCC -fgcse optimization for ___bpf_prog_run() x86/uaccess: Remove redundant CLACs in getuser/putuser error paths x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail() x86/uaccess: Remove ELF function annotation from copy_user_handle_tail() x86/head/64: Annotate start_cpu0() as non-callable x86/entry: Fix thunk function ELF sizes x86/kvm: Don't call kvm_spurious_fault() from .fixup x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2 ...
2019-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM updates from Paolo Bonzini: "Mostly bugfixes, but also: - s390 support for KVM selftests - LAPIC timer offloading to housekeeping CPUs - Extend an s390 optimization for overcommitted hosts to all architectures - Debugging cleanups and improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: x86: Add fixed counters to PMU filter KVM: nVMX: do not use dangling shadow VMCS after guest reset KVM: VMX: dump VMCS on failed entry KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup KVM: Boost vCPUs that are delivering interrupts KVM: selftests: Remove superfluous define from vmx.c KVM: SVM: Fix detection of AMD Errata 1096 KVM: LAPIC: Inject timer interrupt via posted interrupt KVM: LAPIC: Make lapic timer unpinned KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS kvm: x86: ioapic and apic debug macros cleanup kvm: x86: some tsc debug cleanup kvm: vmx: fix coccinelle warnings x86: kvm: avoid constant-conversion warning x86: kvm: avoid -Wsometimes-uninitized warning KVM: x86: expose AVX512_BF16 feature to guest KVM: selftests: enable pgste option for the linker on s390 KVM: selftests: Move kvm_create_max_vcpus test to generic code ...
2019-07-20Merge tag 'kbuild-v5.3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - match the directory structure of the linux-libc-dev package to that of Debian-based distributions - fix incorrect include/config/auto.conf generation when Kconfig creates it along with the .config file - remove misleading $(AS) from documents - clean up precious tag files by distclean instead of mrproper - add a new coccinelle patch for devm_platform_ioremap_resource migration - refactor module-related scripts to read modules.order instead of $(MODVERDIR)/*.mod files to get the list of created modules - remove MODVERDIR - update list of header compile-test - add -fcf-protection=none flag to avoid conflict with the retpoline flags when CONFIG_RETPOLINE=y - misc cleanups * tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits) kbuild: add -fcf-protection=none when using retpoline flags kbuild: update compile-test header list for v5.3-rc1 kbuild: split out *.mod out of {single,multi}-used-m rules kbuild: remove 'prepare1' target kbuild: remove the first line of *.mod files kbuild: create *.mod with full directory path and remove MODVERDIR kbuild: export_report: read modules.order instead of .tmp_versions/*.mod kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver kbuild: remove duplication from modules.order in sub-directories kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin} kbuild: do not create empty modules.order in the prepare stage coccinelle: api: add devm_platform_ioremap_resource script kbuild: compile-test headers listed in header-test-m as well kbuild: remove unused hostcc-option kbuild: remove tag files by distclean instead of mrproper kbuild: add --hash-style= and --build-id unconditionally kbuild: get rid of misleading $(AS) from documents ...
2019-07-20x86/entry/64: Prevent clobbering of saved CR2 valueThomas Gleixner
The recent fix for CR2 corruption introduced a new way to reliably corrupt the saved CR2 value. CR2 is saved early in the entry code in RDX, which is the third argument to the fault handling functions. But it missed that between saving and invoking the fault handler enter_from_user_mode() can be called. RDX is a caller saved register so the invoked function can freely clobber it with the obvious consequences. The TRACE_IRQS_OFF call is safe as it calls through the thunk which preserves RDX, but TRACE_IRQS_OFF_DEBUG is not because it also calls into C-code outside of the thunk. Store CR2 in R12 instead which is a callee saved register and move R12 to RDX just before calling the fault handler. Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907201020540.1782@nanos.tec.linutronix.de
2019-07-20KVM: x86: Add fixed counters to PMU filterEric Hankland
Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist fixed counters. Signed-off-by: Eric Hankland <ehankland@google.com> [No need to check padding fields for zero. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: nVMX: do not use dangling shadow VMCS after guest resetPaolo Bonzini
If a KVM guest is reset while running a nested guest, free_nested will disable the shadow VMCS execution control in the vmcs01. However, on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync the VMCS12 to the shadow VMCS which has since been freed. This causes a vmptrld of a NULL pointer on my machime, but Jan reports the host to hang altogether. Let's see how much this trivial patch fixes. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: VMX: dump VMCS on failed entryPaolo Bonzini
This is useful for debugging, and is ratelimited nowadays. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: x86/vPMU: refine kvm_pmu err msg when event creation failedLike Xu
If a perf_event creation fails due to any reason of the host perf subsystem, it has no chance to log the corresponding event for guest which may cause abnormal sampling data in guest result. In debug mode, this message helps to understand the state of vPMC and we may not limit the number of occurrences but not in a spamming style. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: SVM: Fix detection of AMD Errata 1096Liran Alon
When CPU raise #NPF on guest data access and guest CR4.SMAP=1, it is possible that CPU microcode implementing DecodeAssist will fail to read bytes of instruction which caused #NPF. This is AMD errata 1096 and it happens because CPU microcode reading instruction bytes incorrectly attempts to read code as implicit supervisor-mode data accesses (that is, just like it would read e.g. a TSS), which are susceptible to SMAP faults. The microcode reads CS:RIP and if it is a user-mode address according to the page tables, the processor gives up and returns no instruction bytes. In this case, GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly return 0 instead of the correct guest instruction bytes. Current KVM code attemps to detect and workaround this errata, but it has multiple issues: 1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1, which is required for encountering a SMAP fault. 2) It assumes SMAP faults can only occur when guest CPL==3. However, in case guest CR4.SMEP=0, the guest can execute an instruction which reside in a user-accessible page with CPL<3 priviledge. If this instruction raise a #NPF on it's data access, then CPU DecodeAssist microcode will still encounter a SMAP violation. Even though no sane OS will do so (as it's an obvious priviledge escalation vulnerability), we still need to handle this semanticly correct in KVM side. Note that (2) *is* a useful optimization, because CR4.SMAP=1 is an easy triggerable condition and guests usually enable SMAP together with SMEP. If the vCPU has CR4.SMEP=1, the errata could indeed be encountered onlt at guest CPL==3; otherwise, the CPU would raise a SMEP fault to guest instead of #NPF. We keep this condition to avoid false positives in the detection of the errata. In addition, to avoid future confusion and improve code readbility, include details of the errata in code and not just in commit message. Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)") Cc: Singh Brijesh <brijesh.singh@amd.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: LAPIC: Inject timer interrupt via posted interruptWanpeng Li
Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs reside. There is no hardware virtual timer on Intel for guest like ARM, so both programming timer in guest and the emulated timer fires incur vmexits. This patch tries to avoid vmexit when the emulated timer fires, at least in dedicated instance scenario when nohz_full is enabled. In that case, the emulated timers can be offload to the nearest busy housekeeping cpus since APICv has been found for several years in server processors. The guest timer interrupt can then be injected via posted interrupts, which are delivered by the housekeeping cpu once the emulated timer fires. The host should tuned so that vCPUs are placed on isolated physical processors, and with several pCPUs surplus for busy housekeeping. If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode, ~3% redis performance benefit can be observed on Skylake server, and the number of external interrupt vmexits drops substantially. Without patch VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% ) While with patch: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% ) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-19Merge tag 'for-linus-5.3a-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Fixes and features: - A series to introduce a common command line parameter for disabling paravirtual extensions when running as a guest in virtualized environment - A fix for int3 handling in Xen pv guests - Removal of the Xen-specific tmem driver as support of tmem in Xen has been dropped (and it was experimental only) - A security fix for running as Xen dom0 (XSA-300) - A fix for IRQ handling when offlining cpus in Xen guests - Some small cleanups" * tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: let alloc_xenballooned_pages() fail if not enough memory free xen/pv: Fix a boot up hang revealed by int3 self test x86/xen: Add "nopv" support for HVM guest x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete x86: Add "nopv" parameter to disable PV extensions x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init xen: remove tmem driver Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized" xen/events: fix binding user event channels to cpus
2019-07-19Merge branch 'work.mount0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: "The rest of MM and a kernel-wide procfs cleanup. Summary of the more significant patches: - Patch series "mm/memory_hotplug: Factor out memory block devicehandling", v3. David Hildenbrand. Some spring-cleaning of the memory hotplug code, notably in drivers/base/memory.c - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang Shi. Fix /proc/pid/smaps output for THP pages used in shmem. - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit. Bugfix and speedup for kernel/resource.c - Patch series "mm: Further memory block device cleanups", David Hildenbrand. More spring-cleaning of the memory hotplug code. - Patch series "mm: Sub-section memory hotplug support". Dan Williams. Generalise the memory hotplug code so that pmem can use it more completely. Then remove the hacks from the libnvdimm code which were there to work around the memory-hotplug code's constraints. - "proc/sysctl: add shared variables for range check", Matteo Croce. We have about 250 instances of int zero; ... .extra1 = &zero, in the tree. This is a tree-wide sweep to make all those private "zero"s and "one"s use global variables. Alas, it isn't practical to make those two global integers const" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits) proc/sysctl: add shared variables for range check mm: migrate: remove unused mode argument mm/sparsemem: cleanup 'section number' data types libnvdimm/pfn: stop padding pmem namespaces to section alignment libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields mm/devm_memremap_pages: enable sub-section remap mm: document ZONE_DEVICE memory-model implications mm/sparsemem: support sub-section hotplug mm/sparsemem: prepare for sub-section ranges mm: kill is_dev_zone() helper mm/hotplug: kill is_dev_zone() usage in __remove_pages() mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal mm/sparsemem: add helpers track active portions of a section at boot mm/sparsemem: introduce a SECTION_IS_EARLY flag mm/sparsemem: introduce struct mem_section_usage drivers/base/memory.c: get rid of find_memory_block_hinted() mm/memory_hotplug: move and simplify walk_memory_blocks() mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns mm: make register_mem_sect_under_node() static ...
2019-07-19x86/hyper-v: Zero out the VP ASSIST PAGE on allocationDexuan Cui
The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section 5.2.1 "GPA Overlay Pages" for the details) and here is an excerpt: "The hypervisor defines several special pages that "overlay" the guest's Guest Physical Addresses (GPA) space. Overlays are addressed GPA but are not included in the normal GPA map maintained internally by the hypervisor. Conceptually, they exist in a separate map that overlays the GPA map. If a page within the GPA space is overlaid, any SPA page mapped to the GPA page is effectively "obscured" and generally unreachable by the virtual processor through processor memory accesses. If an overlay page is disabled, the underlying GPA page is "uncovered", and an existing mapping becomes accessible to the guest." SPA = System Physical Address = the final real physical address. When a CPU (e.g. CPU1) is onlined, hv_cpu_init() allocates the VP ASSIST PAGE and enables the EOI optimization for this CPU by writing the MSR HV_X64_MSR_VP_ASSIST_PAGE. From now on, hvp->apic_assist belongs to the special SPA page, and this CPU *always* uses hvp->apic_assist (which is shared with the hypervisor) to decide if it needs to write the EOI MSR. When a CPU is offlined then on the outgoing CPU: 1. hv_cpu_die() disables the EOI optimizaton for this CPU, and from now on hvp->apic_assist belongs to the original "normal" SPA page; 2. the remaining work of stopping this CPU is done 3. this CPU is completely stopped. Between 1 and 3, this CPU can still receive interrupts (e.g. reschedule IPIs from CPU0, and Local APIC timer interrupts), and this CPU *must* write the EOI MSR for every interrupt received, otherwise the hypervisor may not deliver further interrupts, which may be needed to completely stop the CPU. So, after the EOI optimization is disabled in hv_cpu_die(), it's required that the hvp->apic_assist's bit0 is zero, which is not guaranteed by the current allocation mode because it lacks __GFP_ZERO. As a consequence the bit might be set and interrupt handling would not write the EOI MSR causing interrupt delivery to become stuck. Add the missing __GFP_ZERO to the allocation. Note 1: after the "normal" SPA page is allocted and zeroed out, neither the hypervisor nor the guest writes into the page, so the page remains with zeros. Note 2: see Section 10.3.5 "EOI Assist" for the details of the EOI optimization. When the optimization is enabled, the guest can still write the EOI MSR register irrespective of the "No EOI required" value, but that's slower than the optimized assist based variant. Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/ <PU1P153MB0169B716A637FABF07433C04BFCB0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()Dan Williams
Allow sub-section sized ranges to be added to the memmap. populate_section_memmap() takes an explict pfn range rather than assuming a full section, and those parameters are plumbed all the way through to vmmemap_populate(). There should be no sub-section usage in current deployments. New warnings are added to clarify which memmap allocation paths are sub-section capable. Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVEDavid Hildenbrand
We want to improve error handling while adding memory by allowing to use arch_remove_memory() and __remove_pages() even if CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like: arch_add_memory() rc = do_something(); if (rc) { arch_remove_memory(); } We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require quite some dependencies for memory offlining. Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Rob Herring <robh@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Qian Cai <cai@lca.pw> Cc: Mathieu Malaterre <malat@debian.org> Cc: Baoquan He <bhe@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18x86, boot: Remove multiple copy of static function sanitize_boot_params()Zhenzhong Duan
Kernel build warns: 'sanitize_boot_params' defined but not used [-Wunused-function] at below files: arch/x86/boot/compressed/cmdline.c arch/x86/boot/compressed/error.c arch/x86/boot/compressed/early_serial_console.c arch/x86/boot/compressed/acpi.c That's becausethey each include misc.h which includes a definition of sanitize_boot_params() via bootparam_utils.h. Remove the inclusion from misc.h and have the c file including bootparam_utils.h directly. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1563283092-1189-1-git-send-email-zhenzhong.duan@oracle.com
2019-07-18x86/boot/compressed/64: Remove unused variableZhenzhong Duan
Fix gcc warning: arch/x86/boot/compressed/pgtable_64.c: In function 'find_trampoline_placement': arch/x86/boot/compressed/pgtable_64.c:43:16: warning: unused variable 'trampoline_start' [-Wunused-variable] unsigned long trampoline_start; ^ Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/1563283040-31101-1-git-send-email-zhenzhong.duan@oracle.com
2019-07-18x86/boot/efi: Remove unused variablesZhenzhong Duan
Fix gcc warnings: arch/x86/boot/compressed/eboot.c: In function 'make_boot_params': arch/x86/boot/compressed/eboot.c:394:6: warning: unused variable 'i' [-Wunused-variable] int i; ^ arch/x86/boot/compressed/eboot.c:393:6: warning: unused variable 's1' [-Wunused-variable] u8 *s1; ^ arch/x86/boot/compressed/eboot.c:392:7: warning: unused variable 's2' [-Wunused-variable] u16 *s2; ^ arch/x86/boot/compressed/eboot.c:387:8: warning: unused variable 'options' [-Wunused-variable] void *options, *handle; ^ arch/x86/boot/compressed/eboot.c: In function 'add_e820ext': arch/x86/boot/compressed/eboot.c:498:16: warning: unused variable 'size' [-Wunused-variable] unsigned long size; ^ arch/x86/boot/compressed/eboot.c:497:15: warning: unused variable 'status' [-Wunused-variable] efi_status_t status; ^ arch/x86/boot/compressed/eboot.c: In function 'exit_boot_func': arch/x86/boot/compressed/eboot.c:681:15: warning: unused variable 'status' [-Wunused-variable] efi_status_t status; ^ arch/x86/boot/compressed/eboot.c:680:8: warning: unused variable 'nr_desc' [-Wunused-variable] __u32 nr_desc; ^ arch/x86/boot/compressed/eboot.c: In function 'efi_main': arch/x86/boot/compressed/eboot.c:750:22: warning: unused variable 'image' [-Wunused-variable] efi_loaded_image_t *image; ^ Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1563282957-26898-1-git-send-email-zhenzhong.duan@oracle.com
2019-07-18Merge tag 'riscv/for-v5.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: - Hugepage support - "Image" header support for RISC-V kernel binaries, compatible with the current ARM64 "Image" header - Initial page table setup now split into two stages - CONFIG_SOC support (starting with SiFive SoCs) - Avoid reserving memory between RAM start and the kernel in setup_bootmem() - Enable high-res timers and dynamic tick in the RV64 defconfig - Remove long-deprecated gate area stubs - MAINTAINERS updates to switch to the newly-created shared RISC-V git tree, and to fix a get_maintainers.pl issue for patches involving SiFive E-mail addresses Also, one integration fix to resolve a build problem introduced during in the v5.3-rc1 merge window: - Fix build break after macro-to-function conversion in asm-generic/cacheflush.h * tag 'riscv/for-v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: fix build break after macro-to-function conversion in generic cacheflush.h RISC-V: Add an Image header that boot loader can parse. RISC-V: Setup initial page tables in two stages riscv: remove free_initrd_mem riscv: ccache: Remove unused variable riscv: Introduce huge page support for 32/64bit kernel x86, arm64: Move ARCH_WANT_HUGE_PMD_SHARE config in arch/Kconfig RISC-V: Fix memory reservation in setup_bootmem() riscv: defconfig: enable SOC_SIFIVE riscv: select SiFive platform drivers with SOC_SIFIVE arch: riscv: add config option for building SiFive's SoC resource riscv: Remove gate area stubs MAINTAINERS: change the arch/riscv git tree to the new shared tree MAINTAINERS: don't automatically patches involving SiFive to the linux-riscv list RISC-V: defconfig: Enable NO_HZ_IDLE and HIGH_RES_TIMERS
2019-07-18x86/uaccess: Remove redundant CLACs in getuser/putuser error pathsJosh Poimboeuf
The same getuser/putuser error paths are used regardless of whether AC is set. In non-exception failure cases, this results in an unnecessary CLAC. Fixes the following warnings: arch/x86/lib/getuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable arch/x86/lib/putuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/bc14ded2755ae75bd9010c446079e113dbddb74b.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()Josh Poimboeuf
After adding mcsafe_handle_tail() to the objtool uaccess safe list, objtool reports: arch/x86/lib/usercopy_64.o: warning: objtool: mcsafe_handle_tail()+0x0: call to __fentry__() with UACCESS enabled With SMAP, this function is called with AC=1, so it needs to be careful about which functions it calls. Disable the ftrace entry hook, which can potentially pull in a lot of extra code. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/8e13d6f0da1c8a3f7603903da6cbf6d582bbfe10.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()Josh Poimboeuf
After an objtool improvement, it's complaining about the CLAC in copy_user_handle_tail(): arch/x86/lib/copy_user_64.o: warning: objtool: .altinstr_replacement+0x12: redundant UACCESS disable arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x6: (alt) arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x2: (alt) arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x0: <=== (func) copy_user_handle_tail() is incorrectly marked as a callable function, so objtool is rightfully concerned about the CLAC with no corresponding STAC. Remove the ELF function annotation. The copy_user_handle_tail() code path is already verified by objtool because it's jumped to by other callable asm code (which does the corresponding STAC). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/6b6e436774678b4b9873811ff023bd29935bee5b.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/head/64: Annotate start_cpu0() as non-callableJosh Poimboeuf
After an objtool improvement, it complains about the fact that start_cpu0() jumps to code which has an LRET instruction. arch/x86/kernel/head_64.o: warning: objtool: .head.text+0xe4: unsupported instruction in callable function Technically, start_cpu0() is callable, but it acts nothing like a callable function. Prevent objtool from treating it like one by removing its ELF function annotation. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/6b1b4505fcb90571a55fa1b52d71fb458ca24454.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/entry: Fix thunk function ELF sizesJosh Poimboeuf
Fix the following warnings: arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_on_thunk() is missing an ELF size annotation arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_off_thunk() is missing an ELF size annotation arch/x86/entry/thunk_64.o: warning: objtool: lockdep_sys_exit_thunk() is missing an ELF size annotation Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/89c97adc9f6cc44a0f5d03cde6d0357662938909.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/kvm: Don't call kvm_spurious_fault() from .fixupJosh Poimboeuf
After making a change to improve objtool's sibling call detection, it started showing the following warning: arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame The problem is the ____kvm_handle_fault_on_reboot() macro. It does a fake call by pushing a fake RIP and doing a jump. That tricks the unwinder into printing the function which triggered the exception, rather than the .fixup code. Instead of the hack to make it look like the original function made the call, just change the macro so that the original function actually does make the call. This allows removal of the hack, and also makes objtool happy. I triggered a vmx instruction exception and verified that the stack trace is still sane: kernel BUG at arch/x86/kvm/x86.c:358! invalid opcode: 0000 [#1] SMP PTI CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16 Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017 RIP: 0010:kvm_spurious_fault+0x5/0x10 Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246 RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000 RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000 R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0 R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000 FS: 00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: loaded_vmcs_init+0x4f/0xe0 alloc_loaded_vmcs+0x38/0xd0 vmx_create_vcpu+0xf7/0x600 kvm_vm_ioctl+0x5e9/0x980 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? free_one_page+0x13f/0x4e0 do_vfs_ioctl+0xa4/0x630 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x55/0x1c0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fa349b1ee5b Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2Josh Poimboeuf
Objtool reports the following: arch/x86/kvm/vmx/vmenter.o: warning: objtool: vmx_vmenter()+0x14: call without frame pointer save/setup But frame pointers are necessarily broken anyway, because __vmx_vcpu_run() clobbers RBP with the guest's value before calling vmx_vmenter(). So calling without a frame pointer doesn't make things any worse. Make objtool happy by changing the call to a UD2. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/9fc2216c9dc972f95bb65ce2966a682c6bda1cb0.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/kvm: Fix fastop function ELF metadataJosh Poimboeuf
Some of the fastop functions, e.g. em_setcc(), are actually just used as global labels which point to blocks of functions. The global labels are incorrectly annotated as functions. Also the functions themselves don't have size annotations. Fixes a bunch of warnings like the following: arch/x86/kvm/emulate.o: warning: objtool: seto() is missing an ELF size annotation arch/x86/kvm/emulate.o: warning: objtool: em_setcc() is missing an ELF size annotation arch/x86/kvm/emulate.o: warning: objtool: setno() is missing an ELF size annotation arch/x86/kvm/emulate.o: warning: objtool: setc() is missing an ELF size annotation Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/c8cc9be60ebbceb3092aa5dd91916039a1f88275.1563413318.git.jpoimboe@redhat.com
2019-07-18x86/paravirt: Fix callee-saved function ELF sizesJosh Poimboeuf
The __raw_callee_save_*() functions have an ELF symbol size of zero, which confuses objtool and other tools. Fixes a bunch of warnings like the following: arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.com
2019-07-18Merge tag 'trace-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The main changes in this release include: - Add user space specific memory reading for kprobes - Allow kprobes to be executed earlier in boot The rest are mostly just various clean ups and small fixes" * tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) tracing: Make trace_get_fields() global tracing: Let filter_assign_type() detect FILTER_PTR_STRING tracing: Pass type into tracing_generic_entry_update() ftrace/selftest: Test if set_event/ftrace_pid exists before writing ftrace/selftests: Return the skip code when tracing directory not configured in kernel tracing/kprobe: Check registered state using kprobe tracing/probe: Add trace_event_call accesses APIs tracing/probe: Add probe event name and group name accesses APIs tracing/probe: Add trace flag access APIs for trace_probe tracing/probe: Add trace_event_file access APIs for trace_probe tracing/probe: Add trace_event_call register API for trace_probe tracing/probe: Add trace_probe init and free functions tracing/uprobe: Set print format when parsing command tracing/kprobe: Set print format right after parsed command kprobes: Fix to init kprobes in subsys_initcall tracepoint: Use struct_size() in kmalloc() ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS ftrace: Enable trampoline when rec count returns back to one tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline tracing: Make a separate config for trace event self tests ...