summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2020-06-24ARM: imx6: add missing put_device() call in imx6q_suspend_init()yu kuai
if of_find_device_by_node() succeed, imx6q_suspend_init() doesn't have a corresponding put_device(). Thus add a jump target to fix the exception handling for this function implementation. Signed-off-by: yu kuai <yukuai3@huawei.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2020-06-24ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram()yu kuai
if of_find_device_by_node() succeed, imx_suspend_alloc_ocram() doesn't have a corresponding put_device(). Thus add a jump target to fix the exception handling for this function implementation. Fixes: 1579c7b9fe01 ("ARM: imx53: Set DDR pins to high impedance when in suspend to RAM.") Signed-off-by: yu kuai <yukuai3@huawei.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2020-06-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "All bugfixes except for a couple cleanup patches" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkru KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling KVM: X86: Fix MSR range of APIC registers in X2APIC mode KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL KVM: nVMX: Plumb L2 GPA through to PML emulation KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic() KVM: LAPIC: ensure APIC map is up to date on concurrent update requests kvm: lapic: fix broken vcpu hotplug Revert "KVM: VMX: Micro-optimize vmexit time when not exposing PMU" KVM: VMX: Add helpers to identify interrupt type from intr_info kvm/svm: disable KCSAN for svm_vcpu_run() KVM: MIPS: Fix a build error for !CPU_LOONGSON64
2020-06-23arm64: Depend on newer binutils when building PACMark Brown
Versions of binutils prior to 2.33.1 don't understand the ELF notes that are added by modern compilers to indicate the PAC and BTI options used to build the code. This causes them to emit large numbers of warnings in the form: aarch64-linux-gnu-nm: warning: .tmp_vmlinux.kallsyms2: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000 during the kernel build which is currently causing quite a bit of disruption for automated build testing using clang. In commit 15cd0e675f3f76b (arm64: Kconfig: ptrauth: Add binutils version check to fix mismatch) we added a dependency on binutils to avoid this issue when building with versions of GCC that emit the notes but did not do so for clang as it was believed that the existing check for .cfi_negate_ra_state was already requiring a new enough binutils. This does not appear to be the case for some versions of binutils (eg, the binutils in Debian 10) so instead refactor so we require a new enough GNU binutils in all cases other than when we are using an old GCC version that does not emit notes. Other, more exotic, combinations of tools are possible such as using clang, lld and gas together are possible and may have further problems but rather than adding further version checks it looks like the most robust thing will be to just test that we can build cleanly with the configured tools but that will require more review and discussion so do this for now to address the immediate problem disrupting build testing. Reported-by: KernelCI <bot@kernelci.org> Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1054 Link: https://lore.kernel.org/r/20200619123550.48098-1-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-06-23arm64: compat: Remove 32-bit sigreturn code from the vDSOWill Deacon
The sigreturn code in the compat vDSO is unused. Remove it. Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-06-23arm64: compat: Always use sigpage for sigreturn trampolineWill Deacon
The 32-bit sigreturn trampoline in the compat sigpage matches the binary representation of the arch/arm/ sigpage exactly. This is important for debuggers (e.g. GDB) and unwinders (e.g. libunwind) since they rely on matching the instruction sequence in order to identify that they are unwinding through a signal. The same cannot be said for the sigreturn trampoline in the compat vDSO, which defeats the unwinder heuristics and instead attempts to use unwind directives for the unwinding. This is in contrast to arch/arm/, which never uses the vDSO for sigreturn. Ensure compatibility with arch/arm/ and existing unwinders by always using the sigpage for the sigreturn trampoline, regardless of the presence of the compat vDSO. Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-06-23arm64: compat: Allow 32-bit vdso and sigpage to co-existWill Deacon
In preparation for removing the signal trampoline from the compat vDSO, allow the sigpage and the compat vDSO to co-exist. For the moment the vDSO signal trampoline will still be used when built. Subsequent patches will move to the sigpage consistently. Acked-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-06-23arm64: vdso: Disable dwarf unwinding through the sigreturn trampolineWill Deacon
Commit 7e9f5e6629f6 ("arm64: vdso: Add --eh-frame-hdr to ldflags") results in a .eh_frame_hdr section for the vDSO, which in turn causes the libgcc unwinder to unwind out of signal handlers using the .eh_frame information populated by our .cfi directives. In conjunction with a4eb355a3fda ("arm64: vdso: Fix CFI directives in sigreturn trampoline"), this has been shown to cause segmentation faults originating from within the unwinder during thread cancellation: | Thread 14 "virtio-net-rx" received signal SIGSEGV, Segmentation fault. | 0x0000000000435e24 in uw_frame_state_for () | (gdb) bt | #0 0x0000000000435e24 in uw_frame_state_for () | #1 0x0000000000436e88 in _Unwind_ForcedUnwind_Phase2 () | #2 0x00000000004374d8 in _Unwind_ForcedUnwind () | #3 0x0000000000428400 in __pthread_unwind (buf=<optimized out>) at unwind.c:121 | #4 0x0000000000429808 in __do_cancel () at ./pthreadP.h:304 | #5 sigcancel_handler (sig=32, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:200 | #6 sigcancel_handler (sig=<optimized out>, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:165 | #7 <signal handler called> | #8 futex_wait_cancelable (private=0, expected=0, futex_word=0x3890b708) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 After considerable bashing of heads, it appears that our CFI directives for unwinding out of the sigreturn trampoline are only processed by libgcc when both a .eh_frame_hdr section is present *and* the mysterious NOP is covered by an entry in .eh_frame. With both of these now in place, it has highlighted that our CFI directives are not comprehensive enough to restore the stack pointer of the interrupted context. This results in libgcc falling back to an arm64-specific unwinder after computing a bogus PC value from the unwind tables. The unwinder promptly dereferences this bogus address in an attempt to see if the pointed-to instruction sequence looks like the sigreturn trampoline. Restore the old unwind behaviour, which relied solely on heuristics in the unwinder, by removing the .eh_frame_hdr section from the vDSO and commenting out the insufficient CFI directives for now. Add comments to explain the current, miserable state of affairs. Cc: Tamas Zsoldos <tamas.zsoldos@arm.com> Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Acked-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2020-06-23s390/debug: avoid kernel warning on too large number of pagesChristian Borntraeger
When specifying insanely large debug buffers a kernel warning is printed. The debug code does handle the error gracefully, though. Instead of duplicating the check let us silence the warning to avoid crashes when panic_on_warn is used. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2020-06-23s390/kasan: fix early pgm check handler executionVasily Gorbik
Currently if early_pgm_check_handler is called it ends up in pgm check loop. The problem is that early_pgm_check_handler is instrumented by KASAN but executed without DAT flag enabled which leads to addressing exception when KASAN checks try to access shadow memory. Fix that by executing early handlers with DAT flag on under KASAN as expected. Reported-and-tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2020-06-23s390: fix system call single steppingSven Schnelle
When single stepping an svc instruction on s390, the kernel is entered with a PER program check interruption. The program check handler than jumps to the system call handler by reloading the PSW. The code didn't set GPR13 to the thread pointer in struct task_struct. This made the kernel access invalid memory while trying to fetch the syscall function address. Fix this by always assigned GPR13 after .Lsysc_per. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2020-06-23KVM: arm64: vgic-v4: Plug race between non-residency and v4.1 doorbellMarc Zyngier
When making a vPE non-resident because it has hit a blocking WFI, the doorbell can fire at any time after the write to the RD. Crucially, it can fire right between the write to GICR_VPENDBASER and the write to the pending_last field in the its_vpe structure. This means that we would overwrite pending_last with stale data, and potentially not wakeup until some unrelated event (such as a timer interrupt) puts the vPE back on the CPU. GICv4 isn't affected by this as we actively mask the doorbell on entering the guest, while GICv4.1 automatically manages doorbell delivery without any hypervisor-driven masking. Use the vpe_lock to synchronize such update, which solves the problem altogether. Fixes: ae699ad348cdc ("irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-06-23KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkruSean Christopherson
Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was moved to common x86 code. No functional change intended. Fixes: 37486135d3a7b ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-23KVM: x86: allow TSC to differ by NTP correction bounds without TSC scalingMarcelo Tosatti
The Linux TSC calibration procedure is subject to small variations (its common to see +-1 kHz difference between reboots on a given CPU, for example). So migrating a guest between two hosts with identical processor can fail, in case of a small variation in calibrated TSC between them. Without TSC scaling, the current kernel interface will either return an error (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode. This change enables the following TSC tolerance check to accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default). /* * Compute the variation in TSC rate which is acceptable * within the range of tolerance and decide if the * rate being applied is within that bounds of the hardware * rate. If so, no scaling or compensation need be done. */ thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm); thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm); if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) { pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi); use_scaling = 1; } NTP daemon in the guest can correct this difference (NTP can correct upto 500ppm). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20200616114741.GA298183@fuller.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-23KVM: X86: Fix MSR range of APIC registers in X2APIC modeXiaoyao Li
Only MSR address range 0x800 through 0x8ff is architecturally reserved and dedicated for accessing APIC registers in x2APIC mode. Fixes: 0105d1a52640 ("KVM: x2apic interface to lapic") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-23ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drainFrieder Schrempf
The WDOG_ANY signal is connected to the RESET_IN signal of the SoM and baseboard. It is currently configured as push-pull, which means that if some external device like a programmer wants to assert the RESET_IN signal by pulling it to ground, it drives against the high level WDOG_ANY output of the SoC. To fix this we set the WDOG_ANY signal to open-drain configuration. That way we make sure that the RESET_IN can be asserted by the watchdog as well as by external devices. Fixes: 1ea4b76cdfde ("ARM: dts: imx6ul-kontron-n6310: Add Kontron i.MX6UL N6310 SoM and boards") Cc: stable@vger.kernel.org Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2020-06-23ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoMFrieder Schrempf
The watchdog's WDOG_ANY signal is used to trigger a POR of the SoC, if a soft reset is issued. As the SoM hardware connects the WDOG_ANY and the POR signals, the watchdog node itself and the pin configuration should be part of the common SoM devicetree. Let's move it from the baseboard's devicetree to its proper place. Fixes: 1ea4b76cdfde ("ARM: dts: imx6ul-kontron-n6310: Add Kontron i.MX6UL N6310 SoM and boards") Cc: stable@vger.kernel.org Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2020-06-22KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROLSean Christopherson
Remove support for context switching between the guest's and host's desired UMWAIT_CONTROL. Propagating the guest's value to hardware isn't required for correct functionality, e.g. KVM intercepts reads and writes to the MSR, and the latency effects of the settings controlled by the MSR are not architecturally visible. As a general rule, KVM should not allow the guest to control power management settings unless explicitly enabled by userspace, e.g. see KVM_CAP_X86_DISABLE_EXITS. E.g. Intel's SDM explicitly states that C0.2 can improve the performance of SMT siblings. A devious guest could disable C0.2 so as to improve the performance of their workloads at the detriment to workloads running in the host or on other VMs. Wholesale removal of UMWAIT_CONTROL context switching also fixes a race condition where updates from the host may cause KVM to enter the guest with the incorrect value. Because updates are are propagated to all CPUs via IPI (SMP function callback), the value in hardware may be stale with respect to the cached value and KVM could enter the guest with the wrong value in hardware. As above, the guest can't observe the bad value, but it's a weird and confusing wart in the implementation. Removal also fixes the unnecessary usage of VMX's atomic load/store MSR lists. Using the lists is only necessary for MSRs that are required for correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on old hardware, or for MSRs that need to-the-uop precision, e.g. perf related MSRs. For UMWAIT_CONTROL, the effects are only visible in the kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in vcpu_vmx_run(). Using the atomic lists is undesirable as they are more expensive than direct RDMSR/WRMSR. Furthermore, even if giving the guest control of the MSR is legitimate, e.g. in pass-through scenarios, it's not clear that the benefits would outweigh the overhead. E.g. saving and restoring an MSR across a VMX roundtrip costs ~250 cycles, and if the guest diverged from the host that cost would be paid on every run of the guest. In other words, if there is a legitimate use case then it should be enabled by a new per-VM capability. Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can correctly expose other WAITPKG features to the guest, e.g. TPAUSE, UMWAIT and UMONITOR. Fixes: 6e3ba4abcea56 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Cc: Jingqi Liu <jingqi.liu@intel.com> Cc: Tao Xu <tao3.xu@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22KVM: nVMX: Plumb L2 GPA through to PML emulationSean Christopherson
Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all intents and purposes is vmx_write_pml_buffer(), instead of having the latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS. If the dirty bit update is the result of KVM emulation (rare for L2), then the GPA in the VMCS may be stale and/or hold a completely unrelated GPA. Fixes: c5f983f6e8455 ("nVMX: Implement emulated Page Modification Logging") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic()Vitaly Kuznetsov
translate_gpa() returns a GPA, assigning it to 'real_gfn' seems obviously wrong. There is no real issue because both 'gpa_t' and 'gfn_t' are u64 and we don't use the value in 'real_gfn' as a GFN, we do real_gfn = gpa_to_gfn(real_gfn); instead. 'If you see a "buffalo" sign on an elephant's cage, do not trust your eyes', but let's fix it for good. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200622151435.752560-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22KVM: LAPIC: ensure APIC map is up to date on concurrent update requestsPaolo Bonzini
The following race can cause lost map update events: cpu1 cpu2 apic_map_dirty = true ------------------------------------------------------------ kvm_recalculate_apic_map: pass check mutex_lock(&kvm->arch.apic_map_lock); if (!kvm->arch.apic_map_dirty) and in process of updating map ------------------------------------------------------------- other calls to apic_map_dirty = true might be too late for affected cpu ------------------------------------------------------------- apic_map_dirty = false ------------------------------------------------------------- kvm_recalculate_apic_map: bail out on if (!kvm->arch.apic_map_dirty) To fix it, record the beginning of an update of the APIC map in apic_map_dirty. If another APIC map change switches apic_map_dirty back to DIRTY during the update, kvm_recalculate_apic_map should not make it CLEAN, and the other caller will go through the slow path. Reported-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22kvm: lapic: fix broken vcpu hotplugIgor Mammedov
Guest fails to online hotplugged CPU with error smpboot: do_boot_cpu failed(-1) to wakeup CPU#4 It's caused by the fact that kvm_apic_set_state(), which used to call recalculate_apic_map() unconditionally and pulled hotplugged CPU into apic map, is updating map conditionally on state changes. In this case the APIC map is not considered dirty and the is not updated. Fix the issue by forcing unconditional update from kvm_apic_set_state(), like it used to be. Fixes: 4abaffce4d25a ("KVM: LAPIC: Recalculate apic map in batch") Cc: stable@vger.kernel.org Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200622160830.426022-1-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22KVM: arm64: pvtime: Ensure task delay accounting is enabledAndrew Jones
Ensure we're actually accounting run_delay before we claim that we'll expose it to the guest. If we're not, then we just pretend like steal time isn't supported in order to avoid any confusion. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200622142710.18677-1-drjones@redhat.com
2020-06-22KVM: arm64: Fix kvm_reset_vcpu() return code being incorrect with SVESteven Price
If SVE is enabled then 'ret' can be assigned the return value of kvm_vcpu_enable_sve() which may be 0 causing future "goto out" sites to erroneously return 0 on failure rather than -EINVAL as expected. Remove the initialisation of 'ret' and make setting the return value explicit to avoid this situation in the future. Fixes: 9a3cdf26e336 ("KVM: arm64/sve: Allow userspace to enable SVE for vcpus") Cc: stable@vger.kernel.org Reported-by: James Morse <james.morse@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200617105456.28245-1-steven.price@arm.com
2020-06-22KVM: arm64: Annotate hyp NMI-related functions as __always_inlineAlexandru Elisei
The "inline" keyword is a hint for the compiler to inline a function. The functions system_uses_irq_prio_masking() and gic_write_pmr() are used by the code running at EL2 on a non-VHE system, so mark them as __always_inline to make sure they'll always be part of the .hyp.text section. This fixes the following splat when trying to run a VM: [ 47.625273] Kernel panic - not syncing: HYP panic: [ 47.625273] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006 [ 47.625273] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000 [ 47.625273] VCPU:0000000000000000 [ 47.647261] CPU: 1 PID: 217 Comm: kvm-vcpu-0 Not tainted 5.8.0-rc1-ARCH+ #61 [ 47.654508] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT) [ 47.661139] Call trace: [ 47.663659] dump_backtrace+0x0/0x1cc [ 47.667413] show_stack+0x18/0x24 [ 47.670822] dump_stack+0xb8/0x108 [ 47.674312] panic+0x124/0x2f4 [ 47.677446] panic+0x0/0x2f4 [ 47.680407] SMP: stopping secondary CPUs [ 47.684439] Kernel Offset: disabled [ 47.688018] CPU features: 0x240402,20002008 [ 47.692318] Memory Limit: none [ 47.695465] ---[ end Kernel panic - not syncing: HYP panic: [ 47.695465] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006 [ 47.695465] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000 [ 47.695465] VCPU:0000000000000000 ]--- The instruction abort was caused by the code running at EL2 trying to fetch an instruction which wasn't mapped in the EL2 translation tables. Using objdump showed the two functions as separate symbols in the .text section. Fixes: 85738e05dc38 ("arm64: kvm: Unmask PMR before entering guest") Cc: stable@vger.kernel.org Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200618171254.1596055-1-alexandru.elisei@arm.com
2020-06-22powerpc/kvm/book3s64: Fix kernel crash with nested kvm & DEBUG_VIRTUALAneesh Kumar K.V
With CONFIG_DEBUG_VIRTUAL=y, __pa() checks for addr value and if it's less than PAGE_OFFSET it leads to a BUG(). #define __pa(x) ({ VIRTUAL_BUG_ON((unsigned long)(x) < PAGE_OFFSET); (unsigned long)(x) & 0x0fffffffffffffffUL; }) kernel BUG at arch/powerpc/kvm/book3s_64_mmu_radix.c:43! cpu 0x70: Vector: 700 (Program Check) at [c0000018a2187360] pc: c000000000161b30: __kvmhv_copy_tofrom_guest_radix+0x130/0x1f0 lr: c000000000161d5c: kvmhv_copy_from_guest_radix+0x3c/0x80 ... kvmhv_copy_from_guest_radix+0x3c/0x80 kvmhv_load_from_eaddr+0x48/0xc0 kvmppc_ld+0x98/0x1e0 kvmppc_load_last_inst+0x50/0x90 kvmppc_hv_emulate_mmio+0x288/0x2b0 kvmppc_book3s_radix_page_fault+0xd8/0x2b0 kvmppc_book3s_hv_page_fault+0x37c/0x1050 kvmppc_vcpu_run_hv+0xbb8/0x1080 kvmppc_vcpu_run+0x34/0x50 kvm_arch_vcpu_ioctl_run+0x2fc/0x410 kvm_vcpu_ioctl+0x2b4/0x8f0 ksys_ioctl+0xf4/0x150 sys_ioctl+0x28/0x80 system_call_exception+0x104/0x1d0 system_call_common+0xe8/0x214 kvmhv_copy_tofrom_guest_radix() uses a NULL value for to/from to indicate direction of copy. Avoid calling __pa() if the value is NULL to avoid the BUG(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Massage change log a bit to mention CONFIG_DEBUG_VIRTUAL] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611120159.680284-1-aneesh.kumar@linux.ibm.com
2020-06-22powerpc/fsl_booke/32: Fix build with CONFIG_RANDOMIZE_BASEArseny Solokha
Building the current 5.8 kernel for an e500 machine with CONFIG_RANDOMIZE_BASE=y and CONFIG_BLOCK=n yields the following failure: arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init': arch/powerpc/mm/nohash/kaslr_booke.c:387:2: error: implicit declaration of function 'flush_icache_range'; did you mean 'flush_tlb_range'? Indeed, including asm/cacheflush.h into kaslr_booke.c fixes the build. Fixes: 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR infrastructure") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru> Reviewed-by: Jason Yan <yanaijie@huawei.com> Acked-by: Scott Wood <oss@buserror.net> [mpe: Tweak change log to mention CONFIG_BLOCK=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200613162801.1946619-1-asolokha@kb.kras.ru
2020-06-21Merge tag 'kbuild-fixes-v5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - fix -gz=zlib compiler option test for CONFIG_DEBUG_INFO_COMPRESSED - improve cc-option in scripts/Kbuild.include to clean up temp files - improve cc-option in scripts/Kconfig.include for more reliable compile option test - do not copy modules.builtin by 'make install' because it would break existing systems - use 'userprogs' syntax for watch_queue sample * tag 'kbuild-fixes-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: samples: watch_queue: build sample program for target architecture Revert "Makefile: install modules.builtin even if CONFIG_MODULES=n" scripts: Fix typo in headers_install.sh kconfig: unify cc-option and as-option kbuild: improve cc-option to clean up all temporary files Makefile: Improve compressed debug info support detection
2020-06-21Merge tag 'powerpc-5.8-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - One fix for the interrupt rework we did last release which broke KVM-PR - Three commits fixing some fallout from the READ_ONCE() changes interacting badly with our 8xx 16K pages support, which uses a pte_t that is a structure of 4 actual PTEs - A cleanup of the 8xx pte_update() to use the newly added pmd_off() - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is enabled - A minor fix for the SPU syscall generation Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike Rapoport, Nicholas Piggin. * tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/8xx: Provide ptep_get() with 16k pages mm: Allow arches to provide ptep_get() mm/gup: Use huge_ptep_get() in gup_hugepte() powerpc/syscalls: Use the number when building SPU syscall table powerpc/8xx: use pmd_off() to access a PMD entry in pte_update() powerpc/64s: Fix KVM interrupt using wrong save area powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
2020-06-20Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - a small collection of remaining API conversion patches (all acked) which allow to finally remove the deprecated API - some documentation fixes and a MAINTAINERS addition * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: Add robert and myself as qcom i2c cci maintainers i2c: smbus: Fix spelling mistake in the comments Documentation/i2c: SMBus start signal is S not A i2c: remove deprecated i2c_new_device API Documentation: media: convert to use i2c_new_client_device() video: backlight: tosa_lcd: convert to use i2c_new_client_device() x86/platform/intel-mid: convert to use i2c_new_client_device() drm: encoder_slave: use new I2C API drm: encoder_slave: fix refcouting error for modules
2020-06-20Merge tag 'trace-v5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Have recordmcount work with > 64K sections (to support LTO) - kprobe RCU fixes - Correct a kprobe critical section with missing mutex - Remove redundant arch_disarm_kprobe() call - Fix lockup when kretprobe triggers within kprobe_flush_task() - Fix memory leak in fetch_op_data operations - Fix sleep in atomic in ftrace trace array sample code - Free up memory on failure in sample trace array code - Fix incorrect reporting of function_graph fields in format file - Fix quote within quote parsing in bootconfig - Fix return value of bootconfig tool - Add testcases for bootconfig tool - Fix maybe uninitialized warning in ftrace pid file code - Remove unused variable in tracing_iter_reset() - Fix some typos * tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix maybe-uninitialized compiler warning tools/bootconfig: Add testcase for show-command and quotes test tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig tools/bootconfig: Fix to use correct quotes for value proc/bootconfig: Fix to use correct quotes for value tracing: Remove unused event variable in tracing_iter_reset tracing/probe: Fix memleak in fetch_op_data operations trace: Fix typo in allocate_ftrace_ops()'s comment tracing: Make ftrace packed events have align of 1 sample-trace-array: Remove trace_array 'sample-instance' sample-trace-array: Fix sleeping function called from invalid context kretprobe: Prevent triggering kretprobe from within kprobe_flush_task kprobes: Remove redundant arch_disarm_kprobe() call kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex kprobes: Use non RCU traversal APIs on kprobe_tables if possible kprobes: Suppress the suspicious RCU warning on kprobes recordmcount: support >64k sections
2020-06-20Merge tag 'libnvdimm-for-5.8-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "A feature (papr_scm health retrieval) and a fix (sysfs attribute visibility) for v5.8. Vaibhav explains in the merge commit below why missing v5.8 would be painful and I agreed to try a -rc2 pull because only cosmetics kept this out of -rc1 and his initial versions were posted in more than enough time for v5.8 consideration: 'These patches are tied to specific features that were committed to customers in upcoming distros releases (RHEL and SLES) whose time-lines are tied to 5.8 kernel release. Being able to track the health of an nvdimm is critical for our customers that are running workloads leveraging papr-scm nvdimms. Missing the 5.8 kernel would mean missing the distro timelines and shifting forward the availability of this feature in distro kernels by at least 6 months' Summary: - Fix the visibility of the region 'align' attribute. The new unit tests for region alignment handling caught a corner case where the alignment cannot be specified if the region is converted from static to dynamic provisioning at runtime. - Add support for device health retrieval for the persistent memory supported by the papr_scm driver. This includes both the standard sysfs "health flags" that the nfit persistent memory driver publishes and a mechanism for the ndctl tool to retrieve a health-command payload" * tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/region: always show the 'align' attribute powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl() powerpc/papr_scm: Fetch nvdimm health information from PHYP seq_buf: Export seq_buf_printf powerpc: Document details on H_SCM_HEALTH hcall
2020-06-20Merge tag 's390-5.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - a few ptrace fixes mostly for strace and seccomp_bpf kernel tests findings - cleanup unused pm callbacks in virtio ccw - replace kmalloc + memset with kzalloc in crypto - use $(LD) for vDSO linkage to make clang happy - fix vDSO clock_getres() to preserve the same behaviour as posix_get_hrtimer_res() - fix workqueue cpumask warning when NUMA=n and nr_node_ids=2 - reduce SLSB writes during input processing, improve warnings and cleanup qdio_data usage in qdio - a few fixes to use scnprintf() instead of snprintf() * tag 's390-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: fix syscall_get_error for compat processes s390/qdio: warn about unexpected SLSB states s390/qdio: clean up usage of qdio_data s390/numa: let NODES_SHIFT depend on NEED_MULTIPLE_NODES s390/vdso: fix vDSO clock_getres() s390/vdso: Use $(LD) instead of $(CC) to link vDSO s390/protvirt: use scnprintf() instead of snprintf() s390: use scnprintf() in sys_##_prefix##_##_name##_show s390/crypto: use scnprintf() instead of snprintf() s390/zcrypt: use kzalloc s390/virtio: remove unused pm callbacks s390/qdio: reduce SLSB writes during Input Queue processing selftests/seccomp: s390 shares the syscall and return value register s390/ptrace: fix setting syscall number s390/ptrace: pass invalid syscall numbers to tracing s390/ptrace: return -ENOSYS when invalid syscall is supplied s390/seccomp: pass syscall arguments via seccomp_data s390/qdio: fine-tune SLSB update
2020-06-20Merge tag 'riscv-for-linus-5.8-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - a workaround for a compiler surprise related to the "r" inline assembly that allows LLVM to boot. - a fix to avoid WX-only mappings, which the ISA does not allow. While this probably manifests in many ways, the bug was found in stress-ng. - a missing lock in set_direct_map_*(), which due to a recent lockdep change started asserting. * tag 'riscv-for-linus-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Acquire mmap lock before invoking walk_page_range RISC-V: Don't allow write+exec only page mapping request in mmap riscv/atomic: Fix sign extension for RV64I
2020-06-20powerpc/8xx: Provide ptep_get() with 16k pagesChristophe Leroy
READ_ONCE() now enforces atomic read, which leads to: CC mm/gup.o In file included from ./include/linux/kernel.h:11:0, from mm/gup.c:2: In function 'gup_hugepte.constprop', inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8: ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE(). _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^ ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ ^ ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type' compiletime_assert_rwonce_type(x); \ ^ mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE' pte = READ_ONCE(*ptep); ^ In function 'gup_get_pte', inlined from 'gup_pte_range' at mm/gup.c:2228:9, inlined from 'gup_pmd_range' at mm/gup.c:2613:15, inlined from 'gup_pud_range' at mm/gup.c:2641:15, inlined from 'gup_p4d_range' at mm/gup.c:2666:15, inlined from 'gup_pgd_range' at mm/gup.c:2694:15, inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3: ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE(). _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^ ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ ^ ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type' compiletime_assert_rwonce_type(x); \ ^ mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE' return READ_ONCE(*ptep); ^ make[2]: *** [mm/gup.o] Error 1 Define ptep_get() on 8xx when using 16k pages. Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/341688399c1b102756046d19ea6ce39db1ae4742.1592225558.git.christophe.leroy@csgroup.eu
2020-06-19Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Unfortunately, we still have a number of outstanding issues so there will be more fixes to come, but this lot are a good start. - Fix handling of watchpoints triggered by uaccess routines - Fix initialisation of gigantic pages for CMA buffers - Raise minimum clang version for BTI to avoid miscompilation - Fix data race in SVE vector length configuration code - Ensure address tags are ignored in kern_addr_valid() - Dump register state on fatal BTI exception - kexec_file() cleanup to use struct_size() macro" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints arm64: kexec_file: Use struct_size() in kmalloc() arm64: mm: reserve hugetlb CMA after numa_init arm64: bti: Require clang >= 10.0.1 for in-kernel BTI support arm64: sve: Fix build failure when ARM64_SVE=y and SYSCTL=n arm64: pgtable: Clear the GP bit for non-executable kernel pages arm64: mm: reset address tag set by kasan sw tagging arm64: traps: Dump registers prior to panic() in bad_mode() arm64/sve: Eliminate data races on sve_default_vl docs/arm64: Fix typo'd #define in sve.rst arm64: remove TEXT_OFFSET randomization
2020-06-19x86/asm/64: Align start of __clear_user() loop to 16-bytesMatt Fleming
x86 CPUs can suffer severe performance drops if a tight loop, such as the ones in __clear_user(), straddles a 16-byte instruction fetch window, or worse, a 64-byte cacheline. This issues was discovered in the SUSE kernel with the following commit, 1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants") which increased the code object size from 10 bytes to 15 bytes and caused the 8-byte copy loop in __clear_user() to be split across a 64-byte cacheline. Aligning the start of the loop to 16-bytes makes this fit neatly inside a single instruction fetch window again and restores the performance of __clear_user() which is used heavily when reading from /dev/zero. Here are some numbers from running libmicro's read_z* and pread_z* microbenchmarks which read from /dev/zero: Zen 1 (Naples) libmicro-file 5.7.0-rc6 5.7.0-rc6 5.7.0-rc6 revert-1153933703d9+ align16+ Time mean95-pread_z100k 9.9195 ( 0.00%) 5.9856 ( 39.66%) 5.9938 ( 39.58%) Time mean95-pread_z10k 1.1378 ( 0.00%) 0.7450 ( 34.52%) 0.7467 ( 34.38%) Time mean95-pread_z1k 0.2623 ( 0.00%) 0.2251 ( 14.18%) 0.2252 ( 14.15%) Time mean95-pread_zw100k 9.9974 ( 0.00%) 6.0648 ( 39.34%) 6.0756 ( 39.23%) Time mean95-read_z100k 9.8940 ( 0.00%) 5.9885 ( 39.47%) 5.9994 ( 39.36%) Time mean95-read_z10k 1.1394 ( 0.00%) 0.7483 ( 34.33%) 0.7482 ( 34.33%) Note that this doesn't affect Haswell or Broadwell microarchitectures which seem to avoid the alignment issue by executing the loop straight out of the Loop Stream Detector (verified using perf events). Fixes: 1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants") Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # v4.19+ Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
2020-06-19Revert "KVM: VMX: Micro-optimize vmexit time when not exposing PMU"Vitaly Kuznetsov
Guest crashes are observed on a Cascade Lake system when 'perf top' is launched on the host, e.g. BUG: unable to handle kernel paging request at fffffe0000073038 PGD 7ffa7067 P4D 7ffa7067 PUD 7ffa6067 PMD 7ffa5067 PTE ffffffffff120 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1 Comm: systemd Not tainted 4.18.0+ #380 ... Call Trace: serial8250_console_write+0xfe/0x1f0 call_console_drivers.constprop.0+0x9d/0x120 console_unlock+0x1ea/0x460 Call traces are different but the crash is imminent. The problem was blindly bisected to the commit 041bc42ce2d0 ("KVM: VMX: Micro-optimize vmexit time when not exposing PMU"). It was also confirmed that the issue goes away if PMU is exposed to the guest. With some instrumentation of the guest we can see what is being switched (when we do atomic_switch_perf_msrs()): vmx_vcpu_run: switching 2 msrs vmx_vcpu_run: switching MSR38f guest: 70000000d host: 70000000f vmx_vcpu_run: switching MSR3f1 guest: 0 host: 2 The current guess is that PEBS (MSR_IA32_PEBS_ENABLE, 0x3f1) is to blame. Regardless of whether PMU is exposed to the guest or not, PEBS needs to be disabled upon switch. This reverts commit 041bc42ce2d0efac3b85bbb81dea8c74b81f4ef9. Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200619094046.654019-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-19x86/platform/intel-mid: convert to use i2c_new_client_device()Wolfram Sang
Move away from the deprecated API and return the shiny new ERRPTR where useful. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-06-18RISC-V: Acquire mmap lock before invoking walk_page_rangeAtish Patra
As per walk_page_range documentation, mmap lock should be acquired by the caller before invoking walk_page_range. mmap_assert_locked gets triggered without that. The details can be found here. http://lists.infradead.org/pipermail/linux-riscv/2020-June/010335.html Fixes: 395a21ff859c(riscv: add ARCH_HAS_SET_DIRECT_MAP support) Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-06-18RISC-V: Don't allow write+exec only page mapping request in mmapYash Shah
As per the table 4.4 of version "20190608-Priv-MSU-Ratified" of the RISC-V instruction set manual[0], the PTE permission bit combination of "write+exec only" is reserved for future use. Hence, don't allow such mapping request in mmap call. An issue is been reported by David Abdurachmanov, that while running stress-ng with "sysbadaddr" argument, RCU stalls are observed on RISC-V specific kernel. This issue arises when the stress-sysbadaddr request for pages with "write+exec only" permission bits and then passes the address obtain from this mmap call to various system call. For the riscv kernel, the mmap call should fail for this particular combination of permission bits since it's not valid. [0]: http://dabbelt.com/~palmer/keep/riscv-isa-manual/riscv-privileged-20190608-1.pdf Signed-off-by: Yash Shah <yash.shah@sifive.com> Reported-by: David Abdurachmanov <david.abdurachmanov@gmail.com> [Palmer: Refer to the latest ISA specification at the only link I could find, and update the terminology.] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-06-18ARCv2: support loop buffer (LPB) disablingEugeniy Paltsev
On HS cores, loop buffer (LPB) is programmable in runtime and can be optionally disabled. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2020-06-18Merge branch 'hch' (maccess patches from Christoph Hellwig)Linus Torvalds
Merge non-faulting memory access cleanups from Christoph Hellwig: "Andrew and I decided to drop the patches implementing your suggested rename of the probe_kernel_* and probe_user_* helpers from -mm as there were way to many conflicts. After -rc1 might be a good time for this as all the conflicts are resolved now" This also adds a type safety checking patch on top of the renaming series to make the subtle behavioral difference between 'get_user()' and 'get_kernel_nofault()' less potentially dangerous and surprising. * emailed patches from Christoph Hellwig <hch@lst.de>: maccess: make get_kernel_nofault() check for minimal type compatibility maccess: rename probe_kernel_address to get_kernel_nofault maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
2020-06-18maccess: make get_kernel_nofault() check for minimal type compatibilityLinus Torvalds
Now that we've renamed probe_kernel_address() to get_kernel_nofault() and made it look and behave more in line with get_user(), some of the subtle type behavior differences end up being more obvious and possibly dangerous. When you do get_user(val, user_ptr); the type of the access comes from the "user_ptr" part, and the above basically acts as val = *user_ptr; by design (except, of course, for the fact that the actual dereference is done with a user access). Note how in the above case, the type of the end result comes from the pointer argument, and then the value is cast to the type of 'val' as part of the assignment. So the type of the pointer is ultimately the more important type both for the access itself. But 'get_kernel_nofault()' may now _look_ similar, but it behaves very differently. When you do get_kernel_nofault(val, kernel_ptr); it behaves like val = *(typeof(val) *)kernel_ptr; except, of course, for the fact that the actual dereference is done with exception handling so that a faulting access is suppressed and returned as the error code. But note how different the casting behavior of the two superficially similar accesses are: one does the actual access in the size of the type the pointer points to, while the other does the access in the size of the target, and ignores the pointer type entirely. Actually changing get_kernel_nofault() to act like get_user() is almost certainly the right thing to do eventually, but in the meantime this patch adds logit to at least verify that the pointer type is compatible with the type of the result. In many cases, this involves just casting the pointer to 'void *' to make it obvious that the type of the pointer is not the important part. It's not how 'get_user()' acts, but at least the behavioral difference is now obvious and explicit. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-18maccess: rename probe_kernel_address to get_kernel_nofaultChristoph Hellwig
Better describe what this helper does, and match the naming of copy_from_kernel_nofault. Also switch the argument order around, so that it acts and looks like get_user(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-18objtool: Fix noinstr vs KCOVPeter Zijlstra
Since many compilers cannot disable KCOV with a function attribute, help it to NOP out any __sanitizer_cov_*() calls injected in noinstr code. This turns: 12: e8 00 00 00 00 callq 17 <lockdep_hardirqs_on+0x17> 13: R_X86_64_PLT32 __sanitizer_cov_trace_pc-0x4 into: 12: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 13: R_X86_64_NONE __sanitizer_cov_trace_pc-0x4 Just like recordmcount does. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dmitry Vyukov <dvyukov@google.com>
2020-06-18arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2Adam Ford
LDO1 and LDO2 settings are wrong and case the voltage to go above the maximum level of 2.15V permitted by the SoC to 3.0V. This patch is based on work done on the i.MX8M Mini-EVK which utilizes the same fix. Fixes: 593816fa2f35 ("arm64: dts: imx: Add Beacon i.MX8m-Mini development kit") Signed-off-by: Adam Ford <aford173@gmail.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2020-06-18arm64: hw_breakpoint: Don't invoke overflow handler on uaccess watchpointsWill Deacon
Unprivileged memory accesses generated by the so-called "translated" instructions (e.g. STTR) at EL1 can cause EL0 watchpoints to fire unexpectedly if kernel debugging is enabled. In such cases, the hw_breakpoint logic will invoke the user overflow handler which will typically raise a SIGTRAP back to the current task. This is futile when returning back to the kernel because (a) the signal won't have been delivered and (b) userspace can't handle the thing anyway. Avoid invoking the user overflow handler for watchpoints triggered by kernel uaccess routines, and instead single-step over the faulting instruction as we would if no overflow handler had been installed. (Fixes tag identifies the introduction of unprivileged memory accesses, which exposed this latent bug in the hw_breakpoint code) Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Fixes: 57f4959bad0a ("arm64: kernel: Add support for User Access Override") Reported-by: Luis Machado <luis.machado@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2020-06-18arm64: kexec_file: Use struct_size() in kmalloc()Gustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20200617213407.GA1385@embeddedor Signed-off-by: Will Deacon <will@kernel.org>
2020-06-18x86/cpu: Use pinning mask for CR4 bits needing to be 0Kees Cook
The X86_CR4_FSGSBASE bit of CR4 should not change after boot[1]. Older kernels should enforce this bit to zero, and newer kernels need to enforce it depending on boot-time configuration (e.g. "nofsgsbase"). To support a pinned bit being either 1 or 0, use an explicit mask in combination with the expected pinned bit values. [1] https://lore.kernel.org/lkml/20200527103147.GI325280@hirez.programming.kicks-ass.net Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/202006082013.71E29A42@keescook