Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Thomas Gleixner:
"A small set of updates for the FPU code:
- Make the no387/nofxsr command line options useful by restricting
them to 32bit and actually clearing all dependencies to prevent
random crashes and malfunction.
- Simplify and cleanup the kernel_fpu_*() helpers"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Inline fpu__xstate_clear_all_cpu_caps()
x86/fpu: Make 'no387' and 'nofxsr' command line options useful
x86/fpu: Remove the fpu__save() export
x86/fpu: Simplify kernel_fpu_begin()
x86/fpu: Simplify kernel_fpu_end()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vsyscall updates from Thomas Gleixner:
"Further hardening of the legacy vsyscall by providing support for
execute only mode and switching the default to it.
This prevents a certain class of attacks which rely on the vsyscall
page being accessible at a fixed address in the canonical kernel
address space"
* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86: Add a test for process_vm_readv() on the vsyscall page
x86/vsyscall: Add __ro_after_init to global variables
x86/vsyscall: Change the default vsyscall mode to xonly
selftests/x86/vsyscall: Verify that vsyscall=none blocks execution
x86/vsyscall: Document odd SIGSEGV error code for vsyscalls
x86/vsyscall: Show something useful on a read fault
x86/vsyscall: Add a new vsyscall=xonly mode
Documentation/admin: Remove the vsyscall=native documentation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x96 apic updates from Thomas Gleixner:
"Updates for the x86 APIC interrupt handling and APIC timer:
- Fix a long standing issue with spurious interrupts which was caused
by the big vector management rework a few years ago. Robert Hodaszi
provided finally enough debug data and an excellent initial failure
analysis which allowed to understand the underlying issues.
This contains a change to the core interrupt management code which
is required to handle this correctly for the APIC/IO_APIC. The core
changes are NOOPs for most architectures except ARM64. ARM64 is not
impacted by the change as confirmed by Marc Zyngier.
- Newer systems allow to disable the PIT clock for power saving
causing panic in the timer interrupt delivery check of the IO/APIC
when the HPET timer is not enabled either. While the clock could be
turned on this would cause an endless whack a mole game to chase
the proper register in each affected chipset.
These systems provide the relevant frequencies for TSC, CPU and the
local APIC timer via CPUID and/or MSRs, which allows to avoid the
PIT/HPET based calibration. As the calibration code is the only
usage of the legacy timers on modern systems and is skipped anyway
when the frequencies are known already, there is no point in
setting up the PIT and actually checking for the interrupt delivery
via IO/APIC.
To achieve this on a wide variety of platforms, the CPUID/MSR based
frequency readout has been made more robust, which also allowed to
remove quite some workarounds which turned out to be not longer
required. Thanks to Daniel Drake for analysis, patches and
verification"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Seperate unused system vectors from spurious entry again
x86/irq: Handle spurious interrupt after shutdown gracefully
x86/ioapic: Implement irq_get_irqchip_state() callback
genirq: Add optional hardware synchronization for shutdown
genirq: Fix misleading synchronize_irq() documentation
genirq: Delay deactivation in free_irq()
x86/timer: Skip PIT initialization on modern chipsets
x86/apic: Use non-atomic operations when possible
x86/apic: Make apic_bsp_setup() static
x86/tsc: Set LAPIC timer period to crystal clock frequency
x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period'
x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timer and timekeeping departement delivers:
Core:
- The consolidation of the VDSO code into a generic library including
the conversion of x86 and ARM64. Conversion of ARM and MIPS are en
route through the relevant maintainer trees and should end up in
5.4.
This gets rid of the unnecessary different copies of the same code
and brings all architectures on the same level of VDSO
functionality.
- Make the NTP user space interface more robust by restricting the
TAI offset to prevent undefined behaviour. Includes a selftest.
- Validate user input in the compat settimeofday() syscall to catch
invalid values which would be turned into valid values by a
multiplication overflow
- Consolidate the time accessors
- Small fixes, improvements and cleanups all over the place
Drivers:
- Support for the NXP system counter, TI davinci timer
- Move the Microsoft HyperV clocksource/events code into the
drivers/clocksource directory so it can be shared between x86 and
ARM64.
- Overhaul of the Tegra driver
- Delay timer support for IXP4xx
- Small fixes, improvements and cleanups as usual"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
time: Validate user input in compat_settimeofday()
timer: Document TIMER_PINNED
clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic
clocksource/drivers: Make Hyper-V clocksource ISA agnostic
MAINTAINERS: Fix Andy's surname and the directory entries of VDSO
hrtimer: Use a bullet for the returns bullet list
arm64: vdso: Fix compilation with clang older than 8
arm64: compat: Fix __arch_get_hw_counter() implementation
arm64: Fix __arch_get_hw_counter() implementation
lib/vdso: Make delta calculation work correctly
MAINTAINERS: Add entry for the generic VDSO library
arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system
arm64: vdso: Remove unnecessary asm-offsets.c definitions
vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h
clocksource/drivers/davinci: Add support for clocksource
clocksource/drivers/davinci: Add support for clockevents
clocksource/drivers/tegra: Set up maximum-ticks limit properly
clocksource/drivers/tegra: Cycles can't be 0
clocksource/drivers/tegra: Restore base address before cleanup
clocksource/drivers/tegra: Add verbose definition for 1MHz constant
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP/hotplug updates from Thomas Gleixner:
"A small set of updates for SMP and CPU hotplug:
- Abort disabling secondary CPUs in the freezer when a wakeup is
pending instead of evaluating it only after all CPUs have been
offlined.
- Remove the shared annotation for the strict per CPU cfd_data in the
smp function call core code.
- Remove the return values of smp_call_function() and on_each_cpu()
as they are unconditionally 0. Fixup the few callers which actually
bothered to check the return value"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Remove smp_call_function() and on_each_cpu() return values
smp: Do not mark call_function_data as shared
cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending
cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Improve stop_machine wait logic: replace cpu_relax_yield call in
generic stop_machine function with a weak stop_machine_yield
function. This is overridden on s390, which yields the current cpu to
the neighbouring cpu after a couple of retries, instead of blindly
giving up the cpu to the hipervisor. This significantly improves
stop_machine performance on s390 in overcommitted scenarios.
This includes common code changes which have been Acked by Peter
Zijlstra and Thomas Gleixner.
- Improve jump label transformation speed: transform jump labels
without using stop_machine.
- Refactoring of the vfio-ccw cp handling, simplifying the code and
avoiding unneeded allocating/copying.
- Various vfio-ccw fixes (ccw translation, state machine).
- Add support for vfio-ap queue interrupt control in the guest. This
includes s390 kvm changes which have been Acked by Christian
Borntraeger.
- Add protected virtualization support for virtio-ccw.
- Enforce both CONFIG_SMP and CONFIG_HOTPLUG_CPU, which allows to
remove some code which most likely isn't working at all, besides that
s390 didn't even compile for !CONFIG_SMP.
- Support for special flagged EP11 CPRBs for zcrypt.
- Handle PCI devices with no support for new MIO instructions.
- Avoid KASAN false positives in reworked stack unwinder.
- Couple of fixes for the QDIO layer.
- Convert s390 specific documentation to ReST format.
- Let s390 crypto modules return -ENODEV instead of -EOPNOTSUPP if
hardware is missing. This way our modules behave like most other
modules and which is also what systemd's systemd-modules-load.service
expects.
- Replace defconfig with performance_defconfig, so there is one config
file less to maintain.
- Remove the SCLP call home device driver, which was never useful.
- Cleanups all over the place.
* tag 's390-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (83 commits)
docs: s390: s390dbf: typos and formatting, update crash command
docs: s390: unify and update s390dbf kdocs at debug.c
docs: s390: restore important non-kdoc parts of s390dbf.rst
vfio-ccw: Fix the conversion of Format-0 CCWs to Format-1
s390/pci: correctly handle MIO opt-out
s390/pci: deal with devices that have no support for MIO instructions
s390: ap: kvm: Enable PQAP/AQIC facility for the guest
s390: ap: implement PAPQ AQIC interception in kernel
vfio: ap: register IOMMU VFIO notifier
s390: ap: kvm: add PQAP interception for AQIC
s390/unwind: cleanup unused READ_ONCE_TASK_STACK
s390/kasan: avoid false positives during stack unwind
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
s390/qdio: (re-)initialize tiqdio list entries
s390/dasd: Fix a precision vs width bug in dasd_feature_list()
s390/cio: introduce driver_override on the css bus
vfio-ccw: make convert_ccw0_to_ccw1 static
vfio-ccw: Remove copy_ccw_from_iova()
vfio-ccw: Factor out the ccw0-to-ccw1 transition
vfio-ccw: Copy CCW data outside length calculation
...
|
|
Provide abi_entry, abi_entry_default, abi_ret and abi_ret_default macros
that allocate aligned stack frame in windowed and call0 ABIs.
Provide XTENSA_SPILL_STACK_RESERVE macro that specifies required stack
frame size when register spilling is involved.
Replace all uses of 'entry' and 'retw' with the above macros.
This makes most of the xtensa assembly code ready for XEA3 and call0 ABI.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- switch to using the generic remapping DMA allocator
- defconfig updates
* tag 'm68k-for-v5.3-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Implement arch_dma_prep_coherent()
m68k: Use the generic dma coherent remap allocator
m68k: defconfig: Update defconfigs for v5.2-rc1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
'arm_boot_flags' introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
As part of setting up the host context, we populate its
MPIDR by using cpu_logical_map(). It turns out that contrary
to arm64, cpu_logical_map() on 32bit ARM doesn't return the
*full* MPIDR, but a truncated version.
This leaves the host MPIDR slightly corrupted after the first
run of a VM, since we won't correctly restore the MPIDR on
exit. Oops.
Since we cannot trust cpu_logical_map(), let's adopt a different
strategy. We move the initialization of the host CPU context as
part of the per-CPU initialization (which, in retrospect, makes
a lot of sense), and directly read the MPIDR from the HW. This
is guaranteed to work on both arm and arm64.
Reported-by: Andre Przywara <Andre.Przywara@arm.com>
Tested-by: Andre Przywara <Andre.Przywara@arm.com>
Fixes: 32f139551954 ("arm/arm64: KVM: Statically configure the host's view of MPIDR")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Fix up the conflict between "VDSO: Drop implicit common-page-size
linker flag" and "vdso: pass --be8 to linker if necessary"
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
* pm-sleep:
PM: sleep: Drop dev_pm_skip_next_resume_phases()
ACPI: PM: Drop unused function and function header
ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
ACPI: PM: Simplify and fix PM domain hibernation callbacks
PCI: PM: Simplify bus-level hibernation callbacks
PM: ACPI/PCI: Resume all devices during hibernation
kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()
PM: sleep: Update struct wakeup_source documentation
drivers: base: power: remove wakeup_sources_stats_dentry variable
PM: suspend: Rename pm_suspend_via_s2idle()
PM: sleep: Show how long dpm_suspend_start() and dpm_suspend_end() take
PM: hibernate: powerpc: Expose pfn_is_nosave() prototype
|
|
For some reasons my previous patch "Enable AXI DW DMAC support"
was applied only partially (only device tree part).
So enable AXI DW DMAC in HSDK defconfig to be able to use it in
verification flow.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
HSDK SoC has DW SPI controller. Enable it in preparation of
enabling on-board SPI peripherals.
Acked-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
As kernelci.org reports, this function is not used in
vdk_hs38_defconfig:
arch/arc/kernel/unwind.c:188:14: warning: 'unw_hdr_alloc' defined but not used [-Wunused-function]
Fixes: bc79c9a72165 ("ARC: dw2 unwind: Reinstante unwinding out of modules")
Link: https://kernelci.org/build/id/5d1cae3f59b514300340c132/logs/
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
As a preparation for QEMU usage for ARC let's add basic Virtio-MMIO
peripherals support for the platform we're going to use.
For now we add 5 Virtio slots in .dts and enable block and network devices
via Virtio-MMIO.
Note even though typically Virtio register set fits in 0x200 bytes
we "allocate" here 0x2000 so that it matches ARC's default 8KiB page size
and so remapping of that area is done clearly.
We also enable DEVTMPFS automount for more convenient use
of external root file-stystem. Before that we used to use built-in
Initramfs which didn't automount DEVTMPFS anyways so we didn't need
that option, while now it starts making sense.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
Commit 4255b07f2c9c43540 ("ARCv2: STAR 9000793984: Handle return
from intr to Delay Slot") involved a complex 2 staged trampoline.
Apparently this can be greatly simplified by returning from pure
kernel mode (iso interrupt) so drop to pure kernel mdoe and execute
the normal exception return path.
Testing this was a bit of challenge as return from interrupt is rarely
executed now after commit 4de0e52867d83105767 ("ARCv2: STAR 9000814690:
Really Re-enable interrupts to avoid deadlocks"). That fix is necessary
evil and pct interrupts etc do exercise intr return path.
Anyhow after a revert of above in my local test setup I was able to hit
this case and verify the patch works.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
avoids 1 MOV instruction in light of double load/store code
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/dt
Samsung DTS ARM changes for v5.3, third round
1. Fix imprecise abort on Exynos4210 caused by newly added Mali nodes,
2. Reorganize Mali nodes under /soc,
3. Adjust buck regulators voltages on Arndale Octa and Odroid XU3/XU4
family to sane values.
* tag 'samsung-dt-5.3-3' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: dts: exynos: Adjust buck[78] regulators to supported values on Arndale Octa
ARM: dts: exynos: Adjust buck[78] regulators to supported values on Odroid XU3 family
ARM: dts: exynos: Move Mali400 GPU node to "/soc"
ARM: dts: exynos: Fix imprecise abort on Mali GPU probe on Exynos4210
Link: https://lore.kernel.org/r/20190707180115.5562-1-krzk@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
All fpu__xstate_clear_all_cpu_caps() does is to invoke one simple
function since commit
73e3a7d2a7c3b ("x86/fpu: Remove the explicit clearing of XSAVE dependent features")
so invoke that function directly and remove the wrapper.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190704060743.rvew4yrjd6n33uzx@linutronix.de
|
|
The command line option `no387' is designed to disable the FPU
entirely. This only 'works' with CONFIG_MATH_EMULATION enabled.
But on 64bit this cannot work because user space expects SSE to work which
required basic FPU support. MATH_EMULATION does not help because SSE is not
emulated.
The command line option `nofxsr' should also be limited to 32bit because
FXSR is part of the required flags on 64bit so turning it off is not
possible.
Clearing X86_FEATURE_FPU without emulation enabled will not work anyway and
hang in fpu__init_system_early_generic() before the console is enabled.
Setting additioal dependencies, ensures that the CPU still boots on a
modern CPU. Otherwise, dropping FPU will leave FXSR enabled causing the
kernel to crash early in fpu__init_system_mxcsr().
With XSAVE support it will crash in fpu__init_cpu_xstate(). The problem is
that xsetbv() with XMM set and SSE cleared is not allowed. That means
XSAVE has to be disabled. The XSAVE support is disabled in
fpu__init_system_xstate_size_legacy() but it is too late. It can be
removed, it has been added in commit
1f999ab5a1360 ("x86, xsave: Disable xsave in i387 emulation mode")
to use `no387' on a CPU with XSAVE support.
All this happens before console output.
After hat, the next possible crash is in RAID6 detect code because MMX
remained enabled. With a 3DNOW enabled config it will explode in memcpy()
for instance due to kernel_fpu_begin() but this is unconditionally enabled.
This is enough to boot a Debian Wheezy on a 32bit qemu "host" CPU which
supports everything up to XSAVES, AVX2 without 3DNOW. Later, Debian
increased the minimum requirements to i686 which means it does not boot
userland atleast due to CMOV.
After masking the additional features it still keeps SSE4A and 3DNOW*
enabled (if present on the host) but those are unused in the kernel.
Restrict `no387' and `nofxsr' otions to 32bit only. Add dependencies for
FPU, FXSR to additionaly mask CMOV, MMX, XSAVE if FXSR or FPU is cleared.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190703083247.57kjrmlxkai3vpw3@linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A few more MIPS fixes:
- Fix a silly typo in virt_addr_valid which led to completely bogus
behavior (that happened to stop tripping up hardened usercopy
despite being broken).
- Fix UART parity setup on AR933x systems.
- A build fix for non-Linux build machines.
- Have the 'all' make target build DTBs, primarily to fit in with the
behavior of scripts/package/builddeb.
- Handle an execution hazard in TLB exceptions that use KScratch
registers, which could inadvertently clobber the $1 register on
some generally higher-end out-of-order CPUs.
- A MAINTAINERS update to fix the path to the NAND driver for Ingenic
systems"
* tag 'mips_fixes_5.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MAINTAINERS: Correct path to moved files
MIPS: Add missing EHB in mtc0 -> mfc0 sequence.
MIPS: have "plain" make calls build dtbs for selected platforms
MIPS: fix build on non-linux hosts
MIPS: ath79: fix ar933x uart parity mode
MIPS: Fix bounds check virt_addr_valid
|
|
Pull kvm fixes from Paolo Bonzini:
"x86 bugfix patches and one compilation fix for ARM"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64/sve: Fix vq_present() macro to yield a bool
KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC
KVM: nVMX: Change KVM_STATE_NESTED_EVMCS to signal vmcs12 is copied from eVMCS
KVM: nVMX: Allow restore nested-state to enable eVMCS when vCPU in SMM
KVM: x86: degrade WARN to pr_warn_ratelimited
|
|
Commit 66d0d5a854a6 ("riscv: bpf: eliminate zero extension code-gen")
added the new zero-extension optimization for some BPF ALU operations.
Since then, bugs in the JIT that have been fixed in the bpf tree require
this optimization to be added to other operations: commit 1e692f09e091
("bpf, riscv: clear high 32 bits for ALU32 add/sub/neg/lsh/rsh/arsh"),
and commit fe121ee531d1 ("bpf, riscv: clear target register high 32-bits
for and/or/xor on ALU32").
Now that these have been merged to bpf-next, the zext optimization can
be enabled for the fixed operations.
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Jiong Wang <jiong.wang@netronome.com>
Cc: Xi Wang <xi.wang@gmail.com>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane which
can happen sporadically in product environment.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Avoid an extra function call by using a ternary operator instead of
a conditional statement for a setting selection.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Message-Id: <495c9f2e-7880-ee9a-5c61-eee598bb24c2@web.de>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
To increase readability/maintainability, replace hard coded
instructions values by symbolic names.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Fix R_PPC64_ENTRY case, the addi reads from r2 not r12]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
To increase readability/maintainability, replace hard coded
instructions values by symbolic names.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
PPC_HA() PPC_HI() and PPC_LO() macros are nice macros. Move them
from module64.c to ppc-opcode.h in order to use them in other places.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Clean up formatting in new code, drop duplicates in ftrace.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The comment here is wrong, the addi reads from r2 not r12. The code is
correct, 0x38420000 = addi r2,r2,0.
Fixes: a61674bdfc7c ("powerpc/module: Handle R_PPC64_ENTRY relocations")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Replace a magic 64-bit mask with a list of valid registers, computing
the same mask in the end.
Suggested-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, the {read,write}_sysreg_el*() accessors for accessing
particular ELs' sysregs in the presence of VHE rely on some local
hacks and define their system register encodings in a way that is
inconsistent with the core definitions in <asm/sysreg.h>.
As a result, it is necessary to add duplicate definitions for any
system register that already needs a definition in sysreg.h for
other reasons.
This is a bit of a maintenance headache, and the reasons for the
_el*() accessors working the way they do is a bit historical.
This patch gets rid of the shadow sysreg definitions in
<asm/kvm_hyp.h>, converts the _el*() accessors to use the core
__msr_s/__mrs_s interface, and converts all call sites to use the
standard sysreg #define names (i.e., upper case, with SYS_ prefix).
This patch will conflict heavily anyway, so the opportunity
to clean up some bad whitespace in the context of the changes is
taken.
The change exposes a few system registers that have no sysreg.h
definition, due to msr_s/mrs_s being used in place of msr/mrs:
additions are made in order to fill in the gaps.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://www.spinics.net/lists/kvm-arm/msg31717.html
[Rebased to v4.21-rc1]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[Rebased to v5.2-rc5, changelog updates]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
KVM implements the firmware interface for mitigating cache speculation
vulnerabilities. Guests may use this interface to ensure mitigation is
active.
If we want to migrate such a guest to a host with a different support
level for those workarounds, migration might need to fail, to ensure that
critical guests don't loose their protection.
Introduce a way for userland to save and restore the workarounds state.
On restoring we do checks that make sure we don't downgrade our
mitigation level.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Recent commits added the explicit notion of "workaround not required" to
the state of the Spectre v2 (aka. BP_HARDENING) workaround, where we
just had "needed" and "unknown" before.
Export this knowledge to the rest of the kernel and enhance the existing
kvm_arm_harden_branch_predictor() to report this new state as well.
Export this new state to guests when they use KVM's firmware interface
emulation.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
The kvm_pmu_{enable/disable}_counter functions can enable/disable
multiple counters at once as they operate on a bitmask. Let's
make this clearer by renaming the function.
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
kvm-unit-tests were adjusted to match bare metal behavior, but KVM
itself was not doing what bare metal does; fix that.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
During __guest_exit() we need to consume any SError left pending by the
guest so it doesn't contaminate the host. With v8.2 we use the
ESB-instruction. For systems without v8.2, we use dsb+isb and unmask
SError. We do this on every guest exit.
Use the same dsb+isr_el1 trick, this lets us know if an SError is pending
after the dsb, allowing us to skip the isb and self-synchronising PSTATE
write if its not.
This means SError remains masked during KVM's world-switch, so any SError
that occurs during this time is reported by the host, instead of causing
a hyp-panic.
As we're benchmarking this code lets polish the layout. If you give gcc
likely()/unlikely() hints in an if() condition, it shuffles the generated
assembly so that the likely case is immediately after the branch. Lets
do the same here.
Signed-off-by: James Morse <james.morse@arm.com>
Changes since v2:
* Added isb after the dsb to prevent an early read
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
KVM consumes any SError that were pending during guest exit with a
dsb/isb and unmasking SError. It currently leaves SError unmasked for
the rest of world-switch.
This means any SError that occurs during this part of world-switch
will cause a hyp-panic. We'd much prefer it to remain pending until
we return to the host.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Neoverse-N1 affected by #1349291 may report an Uncontained RAS Error
as Unrecoverable. The kernel's architecture code already considers
Unrecoverable errors as fatal as without kernel-first support no
further error-handling is possible.
Now that KVM attributes SError to the host/guest more precisely
the host's architecture code will always handle host errors that
become pending during world-switch.
Errors misclassified by this errata that affected the guest will be
re-injected to the guest as an implementation-defined SError, which can
be uncontained.
Until kernel-first support is implemented, no workaround is needed
for this issue.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
SError that occur during world-switch's entry to the guest will be
accounted to the guest, as the exception is masked until we enter the
guest... but we want to attribute the SError as precisely as possible.
Reading DISR_EL1 before guest entry requires free registers, and using
ESB+DISR_EL1 to consume and read back the ESR would leave KVM holding
a host SError... We would rather leave the SError pending and let the
host take it once we exit world-switch. To do this, we need to defer
guest-entry if an SError is pending.
Read the ISR to see if SError (or an IRQ) is pending. If so fake an
exit. Place this check between __guest_enter()'s save of the host
registers, and restore of the guest's. SError that occur between
here and the eret into the guest must have affected the guest's
registers, which we can naturally attribute to the guest.
The dsb is needed to ensure any previous writes have been done before
we read ISR_EL1. On systems without the v8.2 RAS extensions this
doesn't give us anything as we can't contain errors, and the ESR bits
to describe the severity are all implementation-defined. Replace
this with a nop for these systems.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
On systems with v8.2 we switch the 'vaxorcism' of guest SError with an
alternative sequence that uses the ESB-instruction, then reads DISR_EL1.
This saves the unmasking and remasking of asynchronous exceptions.
We do this after we've saved the guest registers and restored the
host's. Any SError that becomes pending due to this will be accounted
to the guest, when it actually occurred during host-execution.
Move the ESB-instruction as early as possible. Any guest SError
will become pending due to this ESB-instruction and then consumed to
DISR_EL1 before the host touches anything.
This lets us account for host/guest SError precisely on the guest
exit exception boundary.
Because the ESB-instruction now lands in the preamble section of
the vectors, we need to add it to the unpatched indirect vectors
too, and to any sequence that may be patched in over the top.
The ESB-instruction always lives in the head of the vectors,
to be before any memory write. Whereas the register-store always
lives in the tail.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
The KVM indirect vectors support is a little complicated. Different CPUs
may use different exception vectors for KVM that are generated at boot.
Adding new instructions involves checking all the possible combinations
do the right thing.
To make changes here easier to review lets state what we expect of the
preamble:
1. The first vector run, must always run the preamble.
2. Patching the head or tail of the vector shouldn't remove
preamble instructions.
Today, this is easy as we only have one instruction in the preamble.
Change the unpatched tail of the indirect vector so that it always
runs this, regardless of patching.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
The EL2 vector hardening feature causes KVM to generate vectors for
each type of CPU present in the system. The generated sequences already
do some of the early guest-exit work (i.e. saving registers). To avoid
duplication the generated vectors branch to the original vector just
after the preamble. This size is hard coded.
Adding new instructions to the HYP vector causes strange side effects,
which are difficult to debug as the affected code is patched in at
runtime.
Add KVM_VECTOR_PREAMBLE to tell kvm_patch_vector_branch() how big
the preamble is. The valid_vect macro can then validate this at
build time.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
The ESB-instruction is a nop on CPUs that don't implement the RAS
extensions. This lets us use it in places like the vectors without
having to use alternatives.
If someone disables CONFIG_ARM64_RAS_EXTN, this instruction still has
its RAS extensions behaviour, but we no longer read DISR_EL1 as this
register does depend on alternatives.
This could go wrong if we want to synchronize an SError from a KVM
guest. On a CPU that has the RAS extensions, but the KConfig option
was disabled, we consume the pending SError with no chance of ever
reading it.
Hide the ESB-instruction behind the CONFIG_ARM64_RAS_EXTN option,
outputting a regular nop if the feature has been disabled.
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
nested guests
According to section "Checks on Host Segment and Descriptor-Table
Registers" in Intel SDM vol 3C, the following checks are performed on
vmentry of nested guests:
- In the selector field for each of CS, SS, DS, ES, FS, GS and TR, the
RPL (bits 1:0) and the TI flag (bit 2) must be 0.
- The selector fields for CS and TR cannot be 0000H.
- The selector field for SS cannot be 0000H if the "host address-space
size" VM-exit control is 0.
- On processors that support Intel 64 architecture, the base-address
fields for FS, GS and TR must contain canonical addresses.
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM does not have 100% coverage of VMX consistency checks, i.e. some
checks that cause VM-Fail may only be detected by hardware during a
nested VM-Entry. In such a case, KVM must restore L1's state to the
pre-VM-Enter state as L2's state has already been loaded into KVM's
software model.
L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*. But
when EPT is disabled, the associated fields hold KVM's shadow values,
not L1's "real" values. Fortunately, when EPT is disabled the PDPTRs
come from memory, i.e. are not cached in the VMCS. Which leaves CR3
as the sole anomaly.
A previously applied workaround to handle CR3 was to force nested early
checks if EPT is disabled:
commit 2b27924bb1d48 ("KVM: nVMX: always use early vmcs check when EPT
is disabled")
Forcing nested early checks is undesirable as doing so adds hundreds of
cycles to every nested VM-Entry. Rather than take this performance hit,
handle CR3 by overwriting vmcs01.GUEST_CR3 with L1's CR3 during nested
VM-Entry when EPT is disabled *and* nested early checks are disabled.
By stuffing vmcs01.GUEST_CR3, nested_vmx_restore_host_state() will
naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.
These shenanigans work because nested_vmx_restore_host_state() does a
full kvm_mmu_reset_context(), i.e. unloads the current MMU, which
guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow CR3
prior to re-entering L1.
vcpu->arch.root_mmu.root_hpa is set to INVALID_PAGE via:
nested_vmx_restore_host_state() ->
kvm_mmu_reset_context() ->
kvm_mmu_unload() ->
kvm_mmu_free_roots()
kvm_mmu_unload() has WARN_ON(root_hpa != INVALID_PAGE), i.e. we can bank
on 'root_hpa == INVALID_PAGE' unless the implementation of
kvm_mmu_reset_context() is changed.
On the way into L1, VMCS.GUEST_CR3 is guaranteed to be written (on a
successful entry) via:
vcpu_enter_guest() ->
kvm_mmu_reload() ->
kvm_mmu_load() ->
kvm_mmu_load_cr3() ->
vmx_set_cr3()
Stuff vmcs01.GUEST_CR3 if and only if nested early checks are disabled
as a "late" VM-Fail should never happen win that case (KVM WARNs), and
the conditional write avoids the need to restore the correct GUEST_CR3
when nested_vmx_check_vmentry_hw() fails.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20190607185534.24368-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
These are useful in debugging shadow paging.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Note that in such a case it is quite likely that KVM will BUG_ON
in __pte_list_remove when the VM is closed. However, there is no
immediate risk of memory corruption in the host so a WARN_ON is
enough and it lets you gather traces for debugging.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
After the previous patch, the low bits of the gfn are masked in
both FNAME(fetch) and __direct_map, so we do not need to clear them
in transparent_hugepage_adjust.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|