summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2019-07-08Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timer and timekeeping departement delivers: Core: - The consolidation of the VDSO code into a generic library including the conversion of x86 and ARM64. Conversion of ARM and MIPS are en route through the relevant maintainer trees and should end up in 5.4. This gets rid of the unnecessary different copies of the same code and brings all architectures on the same level of VDSO functionality. - Make the NTP user space interface more robust by restricting the TAI offset to prevent undefined behaviour. Includes a selftest. - Validate user input in the compat settimeofday() syscall to catch invalid values which would be turned into valid values by a multiplication overflow - Consolidate the time accessors - Small fixes, improvements and cleanups all over the place Drivers: - Support for the NXP system counter, TI davinci timer - Move the Microsoft HyperV clocksource/events code into the drivers/clocksource directory so it can be shared between x86 and ARM64. - Overhaul of the Tegra driver - Delay timer support for IXP4xx - Small fixes, improvements and cleanups as usual" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) time: Validate user input in compat_settimeofday() timer: Document TIMER_PINNED clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic clocksource/drivers: Make Hyper-V clocksource ISA agnostic MAINTAINERS: Fix Andy's surname and the directory entries of VDSO hrtimer: Use a bullet for the returns bullet list arm64: vdso: Fix compilation with clang older than 8 arm64: compat: Fix __arch_get_hw_counter() implementation arm64: Fix __arch_get_hw_counter() implementation lib/vdso: Make delta calculation work correctly MAINTAINERS: Add entry for the generic VDSO library arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system arm64: vdso: Remove unnecessary asm-offsets.c definitions vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h clocksource/drivers/davinci: Add support for clocksource clocksource/drivers/davinci: Add support for clockevents clocksource/drivers/tegra: Set up maximum-ticks limit properly clocksource/drivers/tegra: Cycles can't be 0 clocksource/drivers/tegra: Restore base address before cleanup clocksource/drivers/tegra: Add verbose definition for 1MHz constant ...
2019-07-08Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP/hotplug updates from Thomas Gleixner: "A small set of updates for SMP and CPU hotplug: - Abort disabling secondary CPUs in the freezer when a wakeup is pending instead of evaluating it only after all CPUs have been offlined. - Remove the shared annotation for the strict per CPU cfd_data in the smp function call core code. - Remove the return values of smp_call_function() and on_each_cpu() as they are unconditionally 0. Fixup the few callers which actually bothered to check the return value" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove smp_call_function() and on_each_cpu() return values smp: Do not mark call_function_data as shared cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap()
2019-07-08Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP} - Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to manage the permissions of executable vmalloc regions more strictly - Slight performance improvement by keeping softirqs enabled while touching the FPSIMD/SVE state (kernel_neon_begin/end) - Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG and AXFLAG instructions for floating point comparison flags manipulation) and FRINT (rounding floating point numbers to integers) - Re-instate ARM64_PSEUDO_NMI support which was previously marked as BROKEN due to some bugs (now fixed) - Improve parking of stopped CPUs and implement an arm64-specific panic_smp_self_stop() to avoid warning on not being able to stop secondary CPUs during panic - perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI platforms - perf: DDR performance monitor support for iMX8QXP - cache_line_size() can now be set from DT or ACPI/PPTT if provided to cope with a system cache info not exposed via the CPUID registers - Avoid warning on hardware cache line size greater than ARCH_DMA_MINALIGN if the system is fully coherent - arm64 do_page_fault() and hugetlb cleanups - Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep) - Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags' introduced in 5.1) - CONFIG_RANDOMIZE_BASE now enabled in defconfig - Allow the selection of ARM64_MODULE_PLTS, currently only done via RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill over into the vmalloc area - Make ZONE_DMA32 configurable * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits) perf: arm_spe: Enable ACPI/Platform automatic module loading arm_pmu: acpi: spe: Add initial MADT/SPE probing ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens ACPI/PPTT: Modify node flag detection to find last IDENTICAL x86/entry: Simplify _TIF_SYSCALL_EMU handling arm64: rename dump_instr as dump_kernel_instr arm64/mm: Drop [PTE|PMD]_TYPE_FAULT arm64: Implement panic_smp_self_stop() arm64: Improve parking of stopped CPUs arm64: Expose FRINT capabilities to userspace arm64: Expose ARMv8.5 CondM capability to userspace arm64: defconfig: enable CONFIG_RANDOMIZE_BASE arm64: ARM64_MODULES_PLTS must depend on MODULES arm64: bpf: do not allocate executable memory arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP arm64: module: create module allocations without exec permissions arm64: Allow user selection of ARM64_MODULE_PLTS acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 arm64: Allow selecting Pseudo-NMI again ...
2019-07-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "x86 bugfix patches and one compilation fix for ARM" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm64/sve: Fix vq_present() macro to yield a bool KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC KVM: nVMX: Change KVM_STATE_NESTED_EVMCS to signal vmcs12 is copied from eVMCS KVM: nVMX: Allow restore nested-state to enable eVMCS when vCPU in SMM KVM: x86: degrade WARN to pr_warn_ratelimited
2019-07-04Merge tag 'trace-v5.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This includes three fixes: - Fix a deadlock from a previous fix to keep module loading and function tracing text modifications from stepping on each other (this has a few patches to help document the issue in comments) - Fix a crash when the snapshot buffer gets out of sync with the main ring buffer - Fix a memory leak when reading the memory logs" * tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare() tracing/snapshot: Resize spare buffer if size changed tracing: Fix memory leak in tracing_err_log_open() ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare() ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()
2019-07-03clocksource/drivers: Continue making Hyper-V clocksource ISA agnosticMichael Kelley
Continue consolidating Hyper-V clock and timer code into an ISA independent Hyper-V clocksource driver. Move the existing clocksource code under drivers/hv and arch/x86 to the new clocksource driver while separating out the ISA dependencies. Update Hyper-V initialization to call initialization and cleanup routines since the Hyper-V synthetic clock is not independently enumerated in ACPI. Update Hyper-V clocksource users in KVM and VDSO to get definitions from the new include file. No behavior is changed and no new functionality is added. Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "bp@alien8.de" <bp@alien8.de> Cc: "will.deacon@arm.com" <will.deacon@arm.com> Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com> Cc: "mark.rutland@arm.com" <mark.rutland@arm.com> Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: Sunil Muthuswamy <sunilmut@microsoft.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: "sashal@kernel.org" <sashal@kernel.org> Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com> Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org> Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org> Cc: "arnd@arndb.de" <arnd@arndb.de> Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk> Cc: "ralf@linux-mips.org" <ralf@linux-mips.org> Cc: "paul.burton@mips.com" <paul.burton@mips.com> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "salyzyn@android.com" <salyzyn@android.com> Cc: "pcc@google.com" <pcc@google.com> Cc: "shuah@kernel.org" <shuah@kernel.org> Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com> Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk> Cc: "huw@codeweavers.com" <huw@codeweavers.com> Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au> Cc: "pbonzini@redhat.com" <pbonzini@redhat.com> Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com> Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org> Link: https://lkml.kernel.org/r/1561955054-1838-3-git-send-email-mikelley@microsoft.com
2019-07-03clocksource/drivers: Make Hyper-V clocksource ISA agnosticMichael Kelley
Hyper-V clock/timer code and data structures are currently mixed in with other code in the ISA independent drivers/hv directory as well as the ISA dependent Hyper-V code under arch/x86. Consolidate this code and data structures into a Hyper-V clocksource driver to better follow the Linux model. In doing so, separate out the ISA dependent portions so the new clocksource driver works for x86 and for the in-process Hyper-V on ARM64 code. To start, move the existing clockevents code to create the new clocksource driver. Update the VMbus driver to call initialization and cleanup routines since the Hyper-V synthetic timers are not independently enumerated in ACPI. No behavior is changed and no new functionality is added. Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "bp@alien8.de" <bp@alien8.de> Cc: "will.deacon@arm.com" <will.deacon@arm.com> Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com> Cc: "mark.rutland@arm.com" <mark.rutland@arm.com> Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: Sunil Muthuswamy <sunilmut@microsoft.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: "sashal@kernel.org" <sashal@kernel.org> Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com> Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org> Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org> Cc: "arnd@arndb.de" <arnd@arndb.de> Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk> Cc: "ralf@linux-mips.org" <ralf@linux-mips.org> Cc: "paul.burton@mips.com" <paul.burton@mips.com> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "salyzyn@android.com" <salyzyn@android.com> Cc: "pcc@google.com" <pcc@google.com> Cc: "shuah@kernel.org" <shuah@kernel.org> Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com> Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk> Cc: "huw@codeweavers.com" <huw@codeweavers.com> Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au> Cc: "pbonzini@redhat.com" <pbonzini@redhat.com> Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com> Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org> Link: https://lkml.kernel.org/r/1561955054-1838-2-git-send-email-mikelley@microsoft.com
2019-07-03Merge branch 'timers/vdso' into timers/coreThomas Gleixner
so the hyper-v clocksource update can be applied.
2019-07-02ftrace/x86: Anotate text_mutex split between ↵Jiri Kosina
ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare() ftrace_arch_code_modify_prepare() is acquiring text_mutex, while the corresponding release is happening in ftrace_arch_code_modify_post_process(). This has already been documented in the code, but let's also make the fact that this is intentional clear to the semantic analysis tools such as sparse. Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1906292321170.27227@cbobk.fhfr.pm Fixes: 39611265edc1a ("ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()") Fixes: d5b844a2cf507 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-02KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPICWanpeng Li
Thomas reported that: | Background: | | In preparation of supporting IPI shorthands I changed the CPU offline | code to software disable the local APIC instead of just masking it. | That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV | register. | | Failure: | | When the CPU comes back online the startup code triggers occasionally | the warning in apic_pending_intr_clear(). That complains that the IRRs | are not empty. | | The offending vector is the local APIC timer vector who's IRR bit is set | and stays set. | | It took me quite some time to reproduce the issue locally, but now I can | see what happens. | | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1 | (and hardware support) it behaves correctly. | | Here is the series of events: | | Guest CPU | | goes down | | native_cpu_disable() | | apic_soft_disable(); | | play_dead() | | .... | | startup() | | if (apic_enabled()) | apic_pending_intr_clear() <- Not taken | | enable APIC | | apic_pending_intr_clear() <- Triggers warning because IRR is stale | | When this happens then the deadline timer or the regular APIC timer - | happens with both, has fired shortly before the APIC is disabled, but the | interrupt was not serviced because the guest CPU was in an interrupt | disabled region at that point. | | The state of the timer vector ISR/IRR bits: | | ISR IRR | before apic_soft_disable() 0 1 | after apic_soft_disable() 0 1 | | On startup 0 1 | | Now one would assume that the IRR is cleared after the INIT reset, but this | happens only on CPU0. | | Why? | | Because our CPU0 hotplug is just for testing to make sure nothing breaks | and goes through an NMI wakeup vehicle because INIT would send it through | the boots-trap code which is not really working if that CPU was not | physically unplugged. | | Now looking at a real world APIC the situation in that case is: | | ISR IRR | before apic_soft_disable() 0 1 | after apic_soft_disable() 0 1 | | On startup 0 0 | | Why? | | Once the dying CPU reenables interrupts the pending interrupt gets | delivered as a spurious interupt and then the state is clear. | | While that CPU0 hotplug test case is surely an esoteric issue, the APIC | emulation is still wrong, Even if the play_dead() code would not enable | interrupts then the pending IRR bit would turn into an ISR .. interrupt | when the APIC is reenabled on startup. From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled * Pending interrupts in the IRR and ISR registers are held and require masking or handling by the CPU. In Thomas's testing, hardware cpu will not respect soft disable LAPIC when IRR has already been set or APICv posted-interrupt is in flight, so we can skip soft disable APIC checking when clearing IRR and set ISR, continue to respect soft disable APIC when attempting to set IRR. Reported-by: Rong Chen <rong.a.chen@intel.com> Reported-by: Feng Tang <feng.tang@intel.com> Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rong Chen <rong.a.chen@intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-02KVM: nVMX: Change KVM_STATE_NESTED_EVMCS to signal vmcs12 is copied from eVMCSLiran Alon
Currently KVM_STATE_NESTED_EVMCS is used to signal that eVMCS capability is enabled on vCPU. As indicated by vmx->nested.enlightened_vmcs_enabled. This is quite bizarre as userspace VMM should make sure to expose same vCPU with same CPUID values in both source and destination. In case vCPU is exposed with eVMCS support on CPUID, it is also expected to enable KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability. Therefore, KVM_STATE_NESTED_EVMCS is redundant. KVM_STATE_NESTED_EVMCS is currently used on restore path (vmx_set_nested_state()) only to enable eVMCS capability in KVM and to signal need_vmcs12_sync such that on next VMEntry to guest nested_sync_from_vmcs12() will be called to sync vmcs12 content into eVMCS in guest memory. However, because restore nested-state is rare enough, we could have just modified vmx_set_nested_state() to always signal need_vmcs12_sync. From all the above, it seems that we could have just removed the usage of KVM_STATE_NESTED_EVMCS. However, in order to preserve backwards migration compatibility, we cannot do that. (vmx_get_nested_state() needs to signal flag when migrating from new kernel to old kernel). Returning KVM_STATE_NESTED_EVMCS when just vCPU have eVMCS enabled have a bad side-effect of userspace VMM having to send nested-state from source to destination as part of migration stream. Even if guest have never used eVMCS as it doesn't even run a nested hypervisor workload. This requires destination userspace VMM and KVM to support setting nested-state. Which make it more difficult to migrate from new host to older host. To avoid this, change KVM_STATE_NESTED_EVMCS to signal eVMCS is not only enabled but also active. i.e. Guest have made some eVMCS active via an enlightened VMEntry. i.e. vmcs12 is copied from eVMCS and therefore should be restored into eVMCS resident in memory (by copy_vmcs12_to_enlightened()). Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-02KVM: nVMX: Allow restore nested-state to enable eVMCS when vCPU in SMMLiran Alon
As comment in code specifies, SMM temporarily disables VMX so we cannot be in guest mode, nor can VMLAUNCH/VMRESUME be pending. However, code currently assumes that these are the only flags that can be set on kvm_state->flags. This is not true as KVM_STATE_NESTED_EVMCS can also be set on this field to signal that eVMCS should be enabled. Therefore, fix code to check for guest-mode and pending VMLAUNCH/VMRESUME explicitly. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-02KVM: x86: degrade WARN to pr_warn_ratelimitedPaolo Bonzini
This warning can be triggered easily by userspace, so it should certainly not cause a panic if panic_on_warn is set. Reported-by: syzbot+c03f30b4f4c46bdf8575@syzkaller.appspotmail.com Suggested-by: Alexander Potapenko <glider@google.com> Acked-by: Alexander Potapenko <glider@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-29Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes all over the place: - might_sleep() atomicity fix in the microcode loader - resctrl boundary condition fix - APIC arithmethics bug fix for frequencies >= 4.2 GHz - three 5-level paging crash fixes - two speculation fixes - a perf/stacktrace fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Fall back to using frame pointers for generated code perf/x86: Always store regs->ip in perf_callchain_kernel() x86/speculation: Allow guests to use SSBD even if host does not x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init() x86/boot/64: Add missing fixup_pointer() for next_early_pgt access x86/boot/64: Fix crash if kernel image crosses page table boundary x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz x86/resctrl: Prevent possible overrun during bitmap operations x86/microcode: Fix the microcode load on CPU hotplug for real
2019-06-29Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixes, most of them related to bugs perf fuzzing found in the x86 code" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/regs: Use PERF_REG_EXTENDED_MASK perf/x86: Remove pmu->pebs_no_xmm_regs perf/x86: Clean up PEBS_XMM_REGS perf/x86/regs: Check reserved bits perf/x86: Disable extended registers for non-supported PMUs perf/ioctl: Add check for the sample_period value perf/core: Fix perf_sample_regs_user() mm check
2019-06-29Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Four fixes: - fix a kexec crash on arm64 - fix a reboot crash on some Android platforms - future-proof the code for upcoming ACPI 6.2 changes - fix a build warning on x86" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efibc: Replace variable set function in notifier call x86/efi: fix a -Wtype-limits compilation warning efi/bgrt: Drop BGRT status field reserved bits check efi/memreserve: deal with memreserve entries in unmapped memory
2019-06-28ftrace/x86: Add a comment to why we take text_mutex in ↵Steven Rostedt (VMware)
ftrace_arch_code_modify_prepare() Taking the text_mutex in ftrace_arch_code_modify_prepare() is to fix a race against module loading and live kernel patching that might try to change the text permissions while ftrace has it as read/write. This really needs to be documented in the code. Add a comment that does such. Link: http://lkml.kernel.org/r/20190627211819.5a591f52@gandalf.local.home Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28ftrace/x86: Remove possible deadlock between register_kprobe() and ↵Petr Mladek
ftrace_run_update_code() The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") causes a possible deadlock between register_kprobe() and ftrace_run_update_code() when ftrace is using stop_machine(). The existing dependency chain (in reverse order) is: -> #1 (text_mutex){+.+.}: validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 __mutex_lock+0x88/0x908 mutex_lock_nested+0x32/0x40 register_kprobe+0x254/0x658 init_kprobes+0x11a/0x168 do_one_initcall+0x70/0x318 kernel_init_freeable+0x456/0x508 kernel_init+0x22/0x150 ret_from_fork+0x30/0x34 kernel_thread_starter+0x0/0xc -> #0 (cpu_hotplug_lock.rw_sem){++++}: check_prev_add+0x90c/0xde0 validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 cpus_read_lock+0x62/0xd0 stop_machine+0x2e/0x60 arch_ftrace_update_code+0x2e/0x40 ftrace_run_update_code+0x40/0xa0 ftrace_startup+0xb2/0x168 register_ftrace_function+0x64/0x88 klp_patch_object+0x1a2/0x290 klp_enable_patch+0x554/0x980 do_one_initcall+0x70/0x318 do_init_module+0x6e/0x250 load_module+0x1782/0x1990 __s390x_sys_finit_module+0xaa/0xf0 system_call+0xd8/0x2d0 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); It is similar problem that has been solved by the commit 2d1e38f56622b9b ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved. To be on the safe side, text_mutex must become a low level lock taken after cpu_hotplug_lock.rw_sem. This can't be achieved easily with the current ftrace design. For example, arm calls set_all_modules_text_rw() already in ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c. This functions is called: + outside stop_machine() from ftrace_run_update_code() + without stop_machine() from ftrace_module_enable() Fortunately, the problematic fix is needed only on x86_64. It is the only architecture that calls set_all_modules_text_rw() in ftrace path and supports livepatching at the same time. Therefore it is enough to move text_mutex handling from the generic kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c: ftrace_arch_code_modify_prepare() ftrace_arch_code_modify_post_process() This patch basically reverts the ftrace part of the problematic commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race"). And provides x86_64 specific-fix. Some refactoring of the ftrace code will be needed when livepatching is implemented for arm or nds32. These architectures call set_all_modules_text_rw() and use stop_machine() at the same time. Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") Acked-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> [ As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of ftrace_run_update_code() as it is a void function. ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28x86/unwind/orc: Fall back to using frame pointers for generated codeJosh Poimboeuf
The ORC unwinder can't unwind through BPF JIT generated code because there are no ORC entries associated with the code. If an ORC entry isn't available, try to fall back to frame pointers. If BPF and other generated code always do frame pointer setup (even with CONFIG_FRAME_POINTERS=n) then this will allow ORC to unwind through most generated code despite there being no corresponding ORC entries. Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") Reported-by: Song Liu <songliubraving@fb.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kairui Song <kasong@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/b6f69208ddff4343d56b7bfac1fc7cfcd62689e8.1561595111.git.jpoimboe@redhat.com
2019-06-28perf/x86: Always store regs->ip in perf_callchain_kernel()Song Liu
The stacktrace_map_raw_tp BPF selftest is failing because the RIP saved by perf_arch_fetch_caller_regs() isn't getting saved by perf_callchain_kernel(). This was broken by the following commit: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") With that change, when starting with non-HW regs, the unwinder starts with the current stack frame and unwinds until it passes up the frame which called perf_arch_fetch_caller_regs(). So regs->ip needs to be saved deliberately. Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kairui Song <kasong@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/3975a298fa52b506fea32666d8ff6a13467eee6d.1561595111.git.jpoimboe@redhat.com
2019-06-27x86/entry: Simplify _TIF_SYSCALL_EMU handlingSudeep Holla
The usage of emulated and _TIF_SYSCALL_EMU flags in syscall_trace_enter is more complicated than required. Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-06-26x86/speculation: Allow guests to use SSBD even if host does notAlejandro Jimenez
The bits set in x86_spec_ctrl_mask are used to calculate the guest's value of SPEC_CTRL that is written to the MSR before VMENTRY, and control which mitigations the guest can enable. In the case of SSBD, unless the host has enabled SSBD always on mode (by passing "spec_store_bypass_disable=on" in the kernel parameters), the SSBD bit is not set in the mask and the guest can not properly enable the SSBD always on mitigation mode. This has been confirmed by running the SSBD PoC on a guest using the SSBD always on mitigation mode (booted with kernel parameter "spec_store_bypass_disable=on"), and verifying that the guest is vulnerable unless the host is also using SSBD always on mode. In addition, the guest OS incorrectly reports the SSB vulnerability as mitigated. Always set the SSBD bit in x86_spec_ctrl_mask when the host CPU supports it, allowing the guest to use SSBD whether or not the host has chosen to enable the mitigation in any of its modes. Fixes: be6fcb5478e9 ("x86/bugs: Rework spec_ctrl base and mask logic") Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: bp@alien8.de Cc: rkrcmar@redhat.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1560187210-11054-1-git-send-email-alejandro.j.jimenez@oracle.com
2019-06-26lib/vdso: Make delta calculation work correctlyThomas Gleixner
The x86 vdso implementation on which the generic vdso library is based on has subtle (unfortunately undocumented) twists: 1) The code assumes that the clocksource mask is U64_MAX which means that no bits are masked. Which is true for any valid x86 VDSO clocksource. Stupidly it still did the mask operation for no reason and at the wrong place right after reading the clocksource. 2) It contains a sanity check to catch the case where slightly unsynchronized TSC values can be observed which would cause the delta calculation to make a huge jump. It therefore checks whether the current TSC value is larger than the value on which the current conversion is based on. If it's not larger the base value is used to prevent time jumps. #1 Is not only stupid for the X86 case because it does the masking for no reason it is also completely wrong for clocksources with a smaller mask which can legitimately wrap around during a conversion period. The core timekeeping code does it correct by applying the mask after the delta calculation: (now - base) & mask #2 is equally broken for clocksources which have smaller masks and can wrap around during a conversion period because there the now > base check is just wrong and causes stale time stamps and time going backwards issues. Unbreak it by: 1) Removing the mask operation from the clocksource read which makes the fallback detection work for all clocksources 2) Replacing the conditional delta calculation with a overrideable inline function. #2 could reuse clocksource_delta() from the timekeeping code but that results in a significant performance hit for the x86 VSDO. The timekeeping core code must have the non optimized version as it has to operate correctly with clocksources which have smaller masks as well to handle the case where TSC is discarded as timekeeper clocksource and replaced by HPET or pmtimer. For the VDSO there is no replacement clocksource. If TSC is unusable the syscall is enforced which does the right thing. To accommodate to the needs of various architectures provide an override-able inline function which defaults to the regular delta calculation with masking: (now - base) & mask Override it for x86 with the non-masking and checking version. This unbreaks the ARM64 syscall fallback operation, allows to use clocksources with arbitrary width and preserves the performance optimization for x86. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: linux-arch@vger.kernel.org Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: catalin.marinas@arm.com Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux@armlinux.org.uk Cc: Ralf Baechle <ralf@linux-mips.org> Cc: paul.burton@mips.com Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: salyzyn@android.com Cc: pcc@google.com Cc: shuah@kernel.org Cc: 0x7f454c46@gmail.com Cc: linux@rasmusvillemoes.dk Cc: huw@codeweavers.com Cc: sthotton@marvell.com Cc: andre.przywara@arm.com Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906261159230.32342@nanos.tec.linutronix.de
2019-06-26x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()Kirill A. Shutemov
Kyle has reported occasional crashes when booting a kernel in 5-level paging mode with KASLR enabled: WARNING: CPU: 0 PID: 0 at arch/x86/mm/init_64.c:87 phys_p4d_init+0x1d4/0x1ea RIP: 0010:phys_p4d_init+0x1d4/0x1ea Call Trace: __kernel_physical_mapping_init+0x10a/0x35c kernel_physical_mapping_init+0xe/0x10 init_memory_mapping+0x1aa/0x3b0 init_range_memory_mapping+0xc8/0x116 init_mem_mapping+0x225/0x2eb setup_arch+0x6ff/0xcf5 start_kernel+0x64/0x53b ? copy_bootdata+0x1f/0xce x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x8a/0x8d secondary_startup_64+0xb6/0xc0 which causes later: BUG: unable to handle page fault for address: ff484d019580eff8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page BAD Oops: 0000 [#1] SMP NOPTI RIP: 0010:fill_pud+0x13/0x130 Call Trace: set_pte_vaddr_p4d+0x2e/0x50 set_pte_vaddr+0x6f/0xb0 __native_set_fixmap+0x28/0x40 native_set_fixmap+0x39/0x70 register_lapic_address+0x49/0xb6 early_acpi_boot_init+0xa5/0xde setup_arch+0x944/0xcf5 start_kernel+0x64/0x53b Kyle bisected the issue to commit b569c1843498 ("x86/mm/KASLR: Reduce randomization granularity for 5-level paging to 1GB") Before this commit PAGE_OFFSET was always aligned to P4D_SIZE when booting 5-level paging mode. But now only PUD_SIZE alignment is guaranteed. In the case I was able to reproduce the following vaddr/paddr values were observed in phys_p4d_init(): Iteration vaddr paddr 1 0xff4228027fe00000 0x033fe00000 2 0xff42287f40000000 0x8000000000 'vaddr' in both cases belongs to the same p4d entry. But due to the original assumption that PAGE_OFFSET is aligned to P4D_SIZE this overlap cannot be handled correctly. The code assumes strictly aligned entries and unconditionally increments the index into the P4D table, which creates false duplicate entries. Once the index reaches the end, the last entry in the page table is missing. Aside of that the 'paddr >= paddr_end' condition can evaluate wrong which causes an P4D entry to be cleared incorrectly. Change the loop in phys_p4d_init() to walk purely based on virtual addresses like __kernel_physical_mapping_init() does. This makes it work correctly with unaligned virtual addresses. Fixes: b569c1843498 ("x86/mm/KASLR: Reduce randomization granularity for 5-level paging to 1GB") Reported-by: Kyle Pelton <kyle.d.pelton@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kyle Pelton <kyle.d.pelton@intel.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190624123150.920-1-kirill.shutemov@linux.intel.com
2019-06-26x86/boot/64: Add missing fixup_pointer() for next_early_pgt accessKirill A. Shutemov
__startup_64() uses fixup_pointer() to access global variables in a position-independent fashion. Access to next_early_pgt was wrapped into the helper, but one instance in the 5-level paging branch was missed. GCC generates a R_X86_64_PC32 PC-relative relocation for the access which doesn't trigger the issue, but Clang emmits a R_X86_64_32S which leads to an invalid memory access and system reboot. Fixes: 187e91fe5e91 ("x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Potapenko <glider@google.com> Link: https://lkml.kernel.org/r/20190620112422.29264-1-kirill.shutemov@linux.intel.com
2019-06-26x86/boot/64: Fix crash if kernel image crosses page table boundaryKirill A. Shutemov
A kernel which boots in 5-level paging mode crashes in a small percentage of cases if KASLR is enabled. This issue was tracked down to the case when the kernel image unpacks in a way that it crosses an 1G boundary. The crash is caused by an overrun of the PMD page table in __startup_64() and corruption of P4D page table allocated next to it. This particular issue is not visible with 4-level paging as P4D page tables are not used. But the P4D and the PUD calculation have similar problems. The PMD index calculation is wrong due to operator precedence, which fails to confine the PMDs in the PMD array on wrap around. The P4D calculation for 5-level paging and the PUD calculation calculate the first index correctly, but then blindly increment it which causes the same issue when a kernel image is located across a 512G and for 5-level paging across a 46T boundary. This wrap around mishandling was introduced when these parts moved from assembly to C. Restore it to the correct behaviour. Fixes: c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190620112345.28833-1-kirill.shutemov@linux.intel.com
2019-06-24perf/x86: Remove pmu->pebs_no_xmm_regsKan Liang
We don't need pmu->pebs_no_xmm_regs anymore, the capabilities PERF_PMU_CAP_EXTENDED_REGS can be used to check if XMM registers collection is supported. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/1559081314-9714-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/x86: Clean up PEBS_XMM_REGSKan Liang
Use generic macro PERF_REG_EXTENDED_MASK to replace PEBS_XMM_REGS to avoid duplication. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/1559081314-9714-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/x86/regs: Check reserved bitsKan Liang
The perf fuzzer triggers a warning which map to: if (WARN_ON_ONCE(idx >= ARRAY_SIZE(pt_regs_offset))) return 0; The bits between XMM registers and generic registers are reserved. But perf_reg_validate() doesn't check these bits. Add PERF_REG_X86_RESERVED for reserved bits on X86. Check the reserved bits in perf_reg_validate(). Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/1559081314-9714-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/x86: Disable extended registers for non-supported PMUsKan Liang
The perf fuzzer caused Skylake machine to crash: [ 9680.085831] Call Trace: [ 9680.088301] <IRQ> [ 9680.090363] perf_output_sample_regs+0x43/0xa0 [ 9680.094928] perf_output_sample+0x3aa/0x7a0 [ 9680.099181] perf_event_output_forward+0x53/0x80 [ 9680.103917] __perf_event_overflow+0x52/0xf0 [ 9680.108266] ? perf_trace_run_bpf_submit+0xc0/0xc0 [ 9680.113108] perf_swevent_hrtimer+0xe2/0x150 [ 9680.117475] ? check_preempt_wakeup+0x181/0x230 [ 9680.122091] ? check_preempt_curr+0x62/0x90 [ 9680.126361] ? ttwu_do_wakeup+0x19/0x140 [ 9680.130355] ? try_to_wake_up+0x54/0x460 [ 9680.134366] ? reweight_entity+0x15b/0x1a0 [ 9680.138559] ? __queue_work+0x103/0x3f0 [ 9680.142472] ? update_dl_rq_load_avg+0x1cd/0x270 [ 9680.147194] ? timerqueue_del+0x1e/0x40 [ 9680.151092] ? __remove_hrtimer+0x35/0x70 [ 9680.155191] __hrtimer_run_queues+0x100/0x280 [ 9680.159658] hrtimer_interrupt+0x100/0x220 [ 9680.163835] smp_apic_timer_interrupt+0x6a/0x140 [ 9680.168555] apic_timer_interrupt+0xf/0x20 [ 9680.172756] </IRQ> The XMM registers can only be collected by PEBS hardware events on the platforms with PEBS baseline support, e.g. Icelake, not software/probe events. Add capabilities flag PERF_PMU_CAP_EXTENDED_REGS to indicate the PMU which support extended registers. For X86, the extended registers are XMM registers. Add has_extended_regs() to check if extended registers are applied. The generic code define the mask of extended registers as 0 if arch headers haven't overridden it. Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/1559081314-9714-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24x86/vdso: Give the [ph]vclock_page declarations real typesAndy Lutomirski
Clean up the vDSO code a bit by giving pvclock_page and hvclock_page their actual types instead of u8[PAGE_SIZE]. This shouldn't materially affect the generated code. Heavily based on a patch from Linus. [ tglx: Adapted to the unified VDSO code ] Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/6920c5188f8658001af1fc56fd35b815706d300c.1561241273.git.luto@kernel.org
2019-06-23smp: Remove smp_call_function() and on_each_cpu() return valuesNadav Amit
The return value is fixed. Remove it and amend the callers. [ tglx: Fixup arm/bL_switcher and powerpc/rtas ] Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20190613064813.8102-2-namit@vmware.com
2019-06-22x86/vdso: Add clock_gettime64() entry pointVincenzo Frascino
Linux 5.1 gained the new clock_gettime64() syscall to address the Y2038 problem on 32bit systems. The x86 VDSO is missing support for this variant of clock_gettime(). Update the x86 specific vDSO library accordingly so it exposes the new time getter. [ tglx: Massaged changelog ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Cc: Shijith Thotton <sthotton@marvell.com> Cc: Andre Przywara <andre.przywara@arm.com> Link: https://lkml.kernel.org/r/20190621095252.32307-25-vincenzo.frascino@arm.com
2019-06-22x86/vdso: Add clock_getres() entry pointVincenzo Frascino
The generic vDSO library provides an implementation of clock_getres() that can be leveraged by each architecture. Add the clock_getres() VDSO entry point on x86. [ tglx: Massaged changelog and cleaned up the function signature formatting ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Cc: Shijith Thotton <sthotton@marvell.com> Cc: Andre Przywara <andre.przywara@arm.com> Link: https://lkml.kernel.org/r/20190621095252.32307-24-vincenzo.frascino@arm.com
2019-06-22x86/vdso: Switch to generic vDSO implementationVincenzo Frascino
The x86 vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library. Introduce the following changes: - Modification of vdso.c to be compliant with the common vdso datapage - Use of lib/vdso for gettimeofday [ tglx: Massaged changelog and cleaned up the function signature formatting ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Cc: Shijith Thotton <sthotton@marvell.com> Cc: Andre Przywara <andre.przywara@arm.com> Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frascino@arm.com
2019-06-22timekeeping: Use proper clock specifier names in functionsJason A. Donenfeld
This makes boot uniformly boottime and tai uniformly clocktai, to address the remaining oversights. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20190621203249.3909-2-Jason@zx2c4.com
2019-06-22x86/apic: Fix integer overflow on 10 bit left shift of cpu_khzColin Ian King
The left shift of unsigned int cpu_khz will overflow for large values of cpu_khz, so cast it to a long long before shifting it to avoid overvlow. For example, this can happen when cpu_khz is 4194305, i.e. ~4.2 GHz. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on kexec") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20190619181446.13635-1-colin.king@canonical.com
2019-06-21x86/vdso: Prevent segfaults due to hoisted vclock readsAndy Lutomirski
GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock pages before the vclock mode checks. This creates a path through vclock_gettime() in which no vclock is enabled at all (due to disabled TSC on old CPUs, for example) but the pvclock or hvclock page nevertheless read. This will segfault on bare metal. This fixes commit 459e3a21535a ("gcc-9: properly declare the {pv,hv}clock_page storage") in the sense that, before that commit, GCC didn't seem to generate the offending code. There was nothing wrong with that commit per se, and -stable maintainers should backport this to all supported kernels regardless of whether the offending commit was present, since the same crash could just as easily be triggered by the phase of the moon. On GCC 9.1.1, this doesn't seem to affect the generated code at all, so I'm not too concerned about performance regressions from this fix. Cc: stable@vger.kernel.org Cc: x86@kernel.org Cc: Borislav Petkov <bp@alien8.de> Reported-by: Duncan Roe <duncan_roe@optusnet.com.au> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-21Merge tag 'spdx-5.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull still more SPDX updates from Greg KH: "Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat" * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485 ...
2019-06-21x86/efi: fix a -Wtype-limits compilation warningQian Cai
Compiling a kernel with W=1 generates this warning, arch/x86/platform/efi/quirks.c:731:16: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] Fixes: 3425d934fc03 ("efi/x86: Handle page faults occurring while running ...") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: "Prakhya, Sai Praneeth" <sai.praneeth.prakhya@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-06-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Fixes for ARM and x86, plus selftest patches and nicer structs for nested state save/restore" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: reorganize initial steps of vmx_set_nested_state KVM: arm/arm64: Fix emulated ptimer irq injection tests: kvm: Check for a kernel warning kvm: tests: Sort tests in the Makefile alphabetically KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT KVM: x86: Modify struct kvm_nested_state to have explicit fields for data KVM: fix typo in documentation KVM: nVMX: use correct clean fields when copying from eVMCS KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST KVM: arm64: Implement vq_present() as a macro
2019-06-20KVM: nVMX: reorganize initial steps of vmx_set_nested_statePaolo Bonzini
Commit 332d079735f5 ("KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state", 2019-05-02) broke evmcs_test because the eVMCS setup must be performed even if there is no VMXON region defined, as long as the eVMCS bit is set in the assist page. While the simplest possible fix would be to add a check on kvm_state->flags & KVM_STATE_NESTED_EVMCS in the initial "if" that covers kvm_state->hdr.vmx.vmxon_pa == -1ull, that is quite ugly. Instead, this patch moves checks earlier in the function and conditionalizes them on kvm_state->hdr.vmx.vmxon_pa, so that vmx_set_nested_state always goes through vmx_leave_nested and nested_enable_evmcs. Fixes: 332d079735f5 ("KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state") Cc: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-20x86/resctrl: Prevent possible overrun during bitmap operationsReinette Chatre
While the DOC at the beginning of lib/bitmap.c explicitly states that "The number of valid bits in a given bitmap does _not_ need to be an exact multiple of BITS_PER_LONG.", some of the bitmap operations do indeed access BITS_PER_LONG portions of the provided bitmap no matter the size of the provided bitmap. For example, if find_first_bit() is provided with an 8 bit bitmap the operation will access BITS_PER_LONG bits from the provided bitmap. While the operation ensures that these extra bits do not affect the result, the memory is still accessed. The capacity bitmasks (CBMs) are typically stored in u32 since they can never exceed 32 bits. A few instances exist where a bitmap_* operation is performed on a CBM by simply pointing the bitmap operation to the stored u32 value. The consequence of this pattern is that some bitmap_* operations will access out-of-bounds memory when interacting with the provided CBM. This same issue has previously been addressed with commit 49e00eee0061 ("x86/intel_rdt: Fix out-of-bounds memory access in CBM tests") but at that time not all instances of the issue were fixed. Fix this by using an unsigned long to store the capacity bitmask data that is passed to bitmap functions. Fixes: e651901187ab ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details") Fixes: f4e80d67a527 ("x86/intel_rdt: Resctrl files reflect pseudo-locked information") Fixes: 95f0b77efa57 ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/58c9b6081fd9bf599af0dfc01a6fdd335768efef.1560975645.git.reinette.chatre@intel.com
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504Thomas Gleixner
Based on 1 normalized pattern(s): this file is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin st fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 8 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081207.443595178@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499Thomas Gleixner
Based on 1 normalized pattern(s): this work is licensed under the terms of the gnu gpl version 2 see the copying file in the top level directory extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 35 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497Thomas Gleixner
Based on 1 normalized pattern(s): this file is part of the linux kernel and is made available under the terms of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 28 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.534229504@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 477Thomas Gleixner
Based on 1 normalized pattern(s): subject to gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081204.018005938@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 474Thomas Gleixner
Based on 1 normalized pattern(s): subject to the gnu public license v 2 no warranty of any kind extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 2 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081203.641025917@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 243Thomas Gleixner
Based on 1 normalized pattern(s): this file is licensed under the gpl v2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 3 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204654.634736654@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>