path: root/arch
2020-08-27  x86/irq: Unbreak interrupt affinity setting  (Thomas Gleixner)

Several people reported that 5.8 broke the interrupt affinity setting mechanism. The consolidation of the entry code reused the regular exception entry code for device interrupts and changed how the vector number is conveyed from ptregs->orig_ax to a function argument.

The low level entry uses the hardware error code slot to push the vector number onto the stack, which is retrieved from there into a function argument, and the slot on stack is set to -1. The reason for setting it to -1 is that the error code slot is at the position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax indicates that the entry came via a syscall. If it's not set to a negative value then a signal delivery on return to userspace would try to restart a syscall. But there are other places which rely on pt_regs::orig_ax being a valid indicator for syscall entry.

But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the interrupt affinity setting mechanism, which was overlooked when this change was made.

Moving interrupts on x86 happens in several steps. A new vector on a different CPU is allocated and the relevant interrupt source is reprogrammed to that. But that's racy and there might be an interrupt already in flight to the old vector. So the old vector is preserved until the first interrupt arrives on the new vector and the new target CPU. Once that happens the old vector is cleaned up, but this cleanup still depends on the vector number being stored in pt_regs::orig_ax, which is now -1. That -1 makes the check for cleanup:

    pt_regs::orig_ax == new_vector

always false. As a consequence the interrupt is moved once, but then it cannot be moved anymore because the cleanup of the old vector never happens.

There would be several ways to convey the vector information to that place in the guts of the interrupt handling, but on deeper inspection it turned out that this check is pointless and a leftover from the old affinity model of x86 which supported multi-CPU affinities. Under that model it was possible that an interrupt had an old and a new vector on the same CPU, so the vector match was required.

Under the new model the effective affinity of an interrupt is always a single CPU from the requested affinity mask. If the affinity mask changes then either the interrupt stays on the CPU and on the same vector when that CPU is still in the new affinity mask, or it is moved to a different CPU, but it is never moved to a different vector on the same CPU.

Ergo the cleanup check for the matching vector number is not required and can be removed, which makes the dependency on pt_regs::orig_ax go away. The remaining check for new_cpu == smp_processor_id() is completely sufficient. If it matches then the interrupt was successfully migrated and the cleanup can proceed.

For paranoia's sake, add a warning into the vector assignment code to validate that the assumption of never moving to a different vector on the same CPU holds.

Fixes: 633260fa143 ("x86/irq: Convey vector as argument and not in ptregs")
Reported-by: Alex bykov <alex.bykov@scylladb.com>
Reported-by: Avi Kivity <avi@scylladb.com>
Reported-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Graf <graf@amazon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
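For illustration, the remaining cleanup condition reduces to a CPU check alone; a simplified sketch based on the description above (not the literal upstream diff, names may differ from arch/x86/kernel/apic/vector.c):

    static void irq_complete_move(struct irq_cfg *cfg)
    {
            struct apic_chip_data *apicd;

            apicd = container_of(cfg, struct apic_chip_data, hw_irq_cfg);
            if (likely(!apicd->move_in_progress))
                    return;

            /*
             * No vector check needed: an interrupt is never moved to a
             * different vector on the same CPU, so arriving here on the
             * new target CPU is proof enough that the migration succeeded.
             */
            if (apicd->cpu == smp_processor_id())
                    __send_cleanup_vector(apicd);
    }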
2020-08-27  x86/hotplug: Silence APIC only after all interrupts are migrated  (Ashok Raj)

There is a race when taking a CPU offline. Current code looks like this:

    native_cpu_disable()
    {
            ...
            apic_soft_disable();
            /*
             * Any existing set bits for pending interrupt to
             * this CPU are preserved and will be sent via IPI
             * to another CPU by fixup_irqs().
             */
            cpu_disable_common();
            {
                    ....
                    /*
                     * Race window happens here. Once local APIC has been
                     * disabled any new interrupts from the device to
                     * the old CPU are lost
                     */
                    fixup_irqs(); // Too late to capture anything in IRR.
                    ...
            }
    }

The fix is to disable the APIC *after* cpu_disable_common().

Testing was done with a USB NIC that provided a source of frequent interrupts. A script migrated interrupts to a specific CPU and then took that CPU offline.

Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
Reported-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Tested-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
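A sketch of the reordered function, using only the helpers named in the message (the upstream code has more detail around it):

    int native_cpu_disable(void)
    {
            ...
            /*
             * Migrate interrupts away first, while the local APIC can
             * still latch pending device interrupts in its IRR.
             */
            cpu_disable_common();           /* runs fixup_irqs() */

            /* Only now is it safe to silence the APIC. */
            apic_soft_disable();
            ...
    }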
2020-08-26  s390/vmem: fix vmem_add_range for 4-level paging  (Vasily Gorbik)

The kernel currently crashes if 4-level paging is used. Add the missing p4d_populate() for the just-allocated pud entry.

Fixes: 3e0d3e408e63 ("s390/vmem: consolidate vmem_add_range() and vmem_remove_range()")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
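A sketch of the missing population step on the 4-level walk (assuming s390's vmem_crst_alloc() helper; the exact fix may differ):

    p4d = p4d_offset(pgd, addr);
    if (p4d_none(*p4d)) {
            pud = vmem_crst_alloc(_REGION3_ENTRY_EMPTY);
            if (!pud)
                    goto out;
            p4d_populate(&init_mm, p4d, pud);       /* previously missing */
    }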
2020-08-26  s390: don't trace preemption in percpu macros  (Sven Schnelle)
Since commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") the lockdep code itself uses percpu variables. This leads to recursions because the percpu macros are calling preempt_enable() which might call trace_preempt_on(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
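The fix is to switch the percpu access macros to the non-tracing preempt helpers; a condensed sketch of the pattern (simplified from arch/s390/include/asm/percpu.h, details may differ):

    #define arch_this_cpu_to_op_simple(pcp, val, op)                \
    ({                                                              \
            typeof(pcp) old__, new__, prev__;                       \
            typeof(pcp) *ptr__;                                     \
            preempt_disable_notrace();  /* was: preempt_disable() */\
            ptr__ = raw_cpu_ptr(&(pcp));                            \
            prev__ = *ptr__;                                        \
            do {                                                    \
                    old__ = prev__;                                 \
                    new__ = old__ op (val);                         \
                    prev__ = cmpxchg(ptr__, old__, new__);          \
            } while (prev__ != old__);                              \
            preempt_enable_notrace();   /* was: preempt_enable() */ \
            new__;                                                  \
    })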
2020-08-26  lockdep: Only trace IRQ edges  (Nicholas Piggin)

Problem:

    raw_local_irq_save();       // software state on
    local_irq_save();           // software state off
    ...
    local_irq_restore();        // software state still off, because we don't enable IRQs
    raw_local_irq_restore();    // software state still off, *whoopsie*

existing instances:

 - lock_acquire()
     raw_local_irq_save()
     __lock_acquire()
       arch_spin_lock(&graph_lock)
         pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
           local_irq_save()

 - trace_clock_global()
     raw_local_irq_save()
     arch_spin_lock()
       pv_wait() := kvm_wait()
         local_irq_save()

 - apic_retrigger_irq()
     raw_local_irq_save()
     apic->send_IPI() := default_send_IPI_single_phys()
       local_irq_save()

Possible solutions:

 A) make it work by enabling the tracing inside raw_*()
 B) make it work by keeping tracing disabled inside raw_*()
 C) call it broken and clean it up now

Given that the only reason to use the raw_* variant is because you don't want tracing, A) seems like a weird option (although it can be done). C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just because there is one raw user; this strips the validation/tracing off for all the other users.

So we pick B) and declare any code that ends up doing:

    raw_local_irq_save()
    local_irq_save()
    lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever; the only reason it came up is because commit 859d069ee1dd ("lockdep: Prepare for NMI IRQ state tracking") changed IRQ tracing vs. lockdep recursion, and the first instance is fairly common, the other cases hardly ever happen.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[rewrote changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200723105615.1268126-1-npiggin@gmail.com
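"Only trace IRQ edges" then means the non-raw helpers emit tracing only when the software state actually changes; a minimal sketch of the resulting wrappers (simplified from include/linux/irqflags.h):

    #define local_irq_save(flags)                           \
            do {                                            \
                    raw_local_irq_save(flags);              \
                    if (!raw_irqs_disabled_flags(flags))    \
                            trace_hardirqs_off();           \
            } while (0)

    #define local_irq_restore(flags)                        \
            do {                                            \
                    if (!raw_irqs_disabled_flags(flags))    \
                            trace_hardirqs_on();            \
                    raw_local_irq_restore(flags);           \
            } while (0)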
2020-08-26  mips: Implement arch_irqs_disabled()  (Peter Zijlstra)
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200826101653.GE1362448@hirez.programming.kicks-ass.net
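This and the arm64/nds32 entries below add the same missing hook; the expected shape is a fallback derived from the saved flags (a sketch; the per-arch patches may differ in detail):

    #define arch_irqs_disabled arch_irqs_disabled
    static inline bool arch_irqs_disabled(void)
    {
            return arch_irqs_disabled_flags(arch_local_save_flags());
    }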
2020-08-26  arm64: Implement arch_irqs_disabled()  (Peter Zijlstra)
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200821085348.664425120@infradead.org
2020-08-26  nds32: Implement arch_irqs_disabled()  (Peter Zijlstra)
Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.604899379@infradead.org
2020-08-26  x86/entry: Remove unused THUNKs  (Peter Zijlstra)
Unused remnants Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.487040689@infradead.org
2020-08-26  cpuidle: Move trace_cpu_idle() into generic code  (Peter Zijlstra)
Remove trace_cpu_idle() from the arch_cpu_idle() implementations and put it in the generic code, right before disabling RCU. Gets rid of more trace_*_rcuidle() users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.428433395@infradead.org
2020-08-26  cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic  (Peter Zijlstra)
This allows moving the leave_mm() call into generic code before rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
2020-08-24  powerpc/64s: Disallow PROT_SAO in LPARs by default  (Shawn Anastasio)
Since migration of guests using SAO to ISA 3.1 hosts may cause issues, disable PROT_SAO in LPARs by default and introduce a new Kconfig option PPC_PROT_SAO_LPAR to allow users to enable it if desired. Signed-off-by: Shawn Anastasio <shawn@anastas.io> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200821185558.35561-3-shawn@anastas.io
2020-08-24  Revert "powerpc/64s: Remove PROT_SAO support"  (Shawn Anastasio)
This reverts commit 5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05. Since PROT_SAO can still be useful for certain classes of software, reintroduce it. Concerns about guest migration for LPARs using SAO will be addressed next. Signed-off-by: Shawn Anastasio <shawn@anastas.io> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200821185558.35561-2-shawn@anastas.io
2020-08-23  treewide: Use fallthrough pseudo-keyword  (Gustavo A. R. Silva)

Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
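The substitution is mechanical; an illustrative sketch (hypothetical switch, not taken from the patch):

    switch (event) {
    case EV_SETUP:
            prepare();
            /* fall through */      /* old: comment-style annotation */
    case EV_RUN:
            run();
            break;
    }

becomes:

    case EV_SETUP:
            prepare();
            fallthrough;            /* new: pseudo-keyword the compiler can check */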
2020-08-23  Merge tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)

Pull powerpc fixes from Michael Ellerman:

 - Add perf support for emitting extended registers for power10.
 - A fix for CPU hotplug on pseries, where on large/loaded systems we may not wait long enough for the CPU to be offlined, leading to crashes.
 - Addition of a raw cputable entry for Power10, which is not required to boot, but is required to make our PMU setup work correctly in guests.
 - Three fixes for the recent changes on 32-bit Book3S to move modules into their own segment for strict RWX.
 - A fix for a recent change in our powernv PCI code that could lead to crashes.
 - A change to our perf interrupt accounting to avoid soft lockups when using some events, found by syzkaller.
 - A change in the way we handle power loss events from the hypervisor on pseries. We no longer immediately shut down if we're told we're running on a UPS.
 - A few other minor fixes.

Thanks to Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz, Kajol Jain, Madhavan Srinivasan, Michael Neuling, Michael Roth, Nageswara R Sastry, Oliver O'Halloran, Thiago Jung Bauermann, Vaidyanathan Srinivasan, Vasant Hegde.

* tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
  powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
  powerpc/pseries: Do not initiate shutdown when system is running on UPS
  powerpc/perf: Fix soft lockups due to missed interrupt accounting
  powerpc/powernv/pci: Fix possible crash when releasing DMA resources
  powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
  powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
  powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
  powerpc/fixmap: Fix the size of the early debug area
  powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled
  powerpc/kernel: Cleanup machine check function declarations
  powerpc: Add POWER10 raw mode cputable entry
  powerpc/perf: Add extended regs support for power10 platform
  powerpc/perf: Add support for outputting extended regs in perf intr_regs
  powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
2020-08-23  Merge tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 fix from Thomas Gleixner:
 "A single fix for x86 which removes the RDPID usage from the paranoid entry path and unconditionally uses LSL to retrieve the CPU number.

  RDPID depends on MSR_TSC_AUX. KVM has an optimization to avoid expensive MSR reads/writes on VMENTER/EXIT. It caches the MSR values and restores them either when leaving the run loop, on preemption or when going out to user space. MSR_TSC_AUX is part of that lazy MSR set, so after writing the guest value and before the lazy restore, any exception using the paranoid entry will read the guest value and use it as CPU number to retrieve the GSBASE value for the current CPU when FSGSBASE is enabled.

  As RDPID is only used in that particular entry path, there is no reason to burden VMENTER/EXIT with two extra MSR writes. Remove the RDPID optimization, which is not even backed by numbers, from the paranoid entry path instead"

* tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Do not use RDPID in paranoid entry to accommodate KVM
2020-08-23  Merge tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 perf fix from Thomas Gleixner:
 "A single update for perf on x86 which has support for the broken down bandwidth counters"

* tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown
2020-08-23  Merge tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull EFI fixes from Thomas Gleixner:

 - Enforce NX on RO data in mixed EFI mode
 - Destroy workqueue in an error handling path to prevent UAF
 - Stop argument parser at '--' which is the delimiter for init
 - Treat a NULL command line pointer as empty instead of dereferencing it unconditionally
 - Handle an unterminated command line correctly
 - Clean up the 32-bit code leftovers and remove obsolete documentation

* tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: efi: remove description of efi=old_map
  efi/x86: Move 32-bit code into efi_32.c
  efi/libstub: Handle unterminated cmdline
  efi/libstub: Handle NULL cmdline
  efi/libstub: Stop parsing arguments at "--"
  efi: add missed destroy_workqueue when efisubsys_init fails
  efi/x86: Mark kernel rodata non-executable for mixed mode
2020-08-22  Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds)

Pull arm64 fixes from Catalin Marinas:

 - Allow booting of late secondary CPUs affected by erratum 1418040 (currently they are parked if none of the early CPUs are affected by this erratum).
 - Add the 32-bit vdso Makefile to the vdso_install rule so that 'make vdso_install' installs the 32-bit compat vdso when it is compiled.
 - Print a warning that untrusted guests without a CPU erratum workaround (Cortex-A57 832075) may deadlock the affected system.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ARM64: vdso32: Install vdso32 from vdso_install
  KVM: arm64: Print warning when cpu erratum can cause guests to deadlock
  arm64: Allow booting of late CPUs affected by erratum 1418040
  arm64: Move handling of erratum 1418040 into C code
2020-08-22  Merge tag 's390-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds)

Pull s390 fixes from Vasily Gorbik:

 - a couple of fixes for storage key handling relevant for debugging
 - add cond_resched into potentially slow subchannels scanning loop
 - fixes for PF/VF linking and to ignore stale PCI configuration request events

* tag 's390-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: fix PF/VF linking on hot plug
  s390/pci: re-introduce zpci_remove_device()
  s390/pci: fix zpci_bus_link_virtfn()
  s390/ptrace: fix storage key handling
  s390/runtime_instrumentation: fix storage key handling
  s390/pci: ignore stale configuration request event
  s390/cio: add cond_resched() in the slow_eval_known_fn() loop
2020-08-22  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)

Pull kvm fixes from Paolo Bonzini:

 - PAE and PKU bugfixes for x86
 - selftests fix for new binutils
 - MMU notifier fix for arm64

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
  KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
  kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
  kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
  KVM: x86: fix access code passed to gva_to_gpa
  selftests: kvm: Use a shorter encoding to clear RAX
2020-08-22  mips/oprofile: Fix fallthrough placement  (He Zhe)

We want neither

    include/linux/compiler_attributes.h:201:41: warning: statement will never be executed [-Wswitch-unreachable]
      201 | # define fallthrough __attribute__((__fallthrough__))
          |                                         ^~~~~~~~~~~~~

nor

    include/linux/compiler_attributes.h:201:41: warning: attribute 'fallthrough' not preceding a case label or default label
      201 | # define fallthrough __attribute__((__fallthrough__))
          |                                         ^~~~~~~~~~~~~

It's not worth adding one more macro. Let's simply place the fallthrough in between the expansions.

Fixes: c9b029903466 ("MIPS: Use fallthrough for arch/mips")
Cc: stable@vger.kernel.org
Signed-off-by: He Zhe <zhe.he@windriver.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-08-22  MIPS: Loongson64: Remove unnecessary inclusion of boot_param.h  (WANG Xuerui)
The couple of #includes are unused by now; remove to prevent namespace pollution. This fixes e.g. build of dm_thin, which has a VIRTUAL symbol that conflicted with the newly-introduced one in mach-loongson64/boot_param.h. Fixes: 39c1485c8baa ("MIPS: KVM: Add kvm guest support for Loongson-3") Signed-off-by: WANG Xuerui <git@xen0n.name> Reviewed-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-08-21  KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set  (Will Deacon)

When an MMU notifier call results in unmapping a range that spans multiple PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary, since this avoids running into RCU stalls during VM teardown. Unfortunately, if the VM is destroyed as a result of OOM, then blocking is not permitted and the call to the scheduler triggers the following BUG():

 | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
 | INFO: lockdep is turned off.
 | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 | Call trace:
 |  dump_backtrace+0x0/0x284
 |  show_stack+0x1c/0x28
 |  dump_stack+0xf0/0x1a4
 |  ___might_sleep+0x2bc/0x2cc
 |  unmap_stage2_range+0x160/0x1ac
 |  kvm_unmap_hva_range+0x1a0/0x1c8
 |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
 |  __mmu_notifier_invalidate_range_start+0x218/0x31c
 |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
 |  __oom_reap_task_mm+0x128/0x268
 |  oom_reap_task+0xac/0x298
 |  oom_reaper+0x178/0x17c
 |  kthread+0x1e4/0x1fc
 |  ret_from_fork+0x10/0x30

Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier flags.

Cc: <stable@vger.kernel.org>
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-3-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
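A sketch of the guarded reschedule in unmap_stage2_range() (simplified; mask name approximate):

    bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;

    /*
     * Only yield the lock across PGD boundaries when the MMU
     * notifier said that blocking is allowed.
     */
    if (may_block && !(addr & ~PGDIR_MASK))
            cond_resched_lock(&kvm->mmu_lock);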
2020-08-21  KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()  (Will Deacon)
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
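That is, the arch hook gains a flags argument mirroring mmu_notifier_range->flags; the new prototype should look roughly like:

    int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
                            unsigned long end, unsigned flags);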
2020-08-21  Merge tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux  (Linus Torvalds)

Pull RISC-V fixes from Palmer Dabbelt:

 - The CLINT driver has been split in two: one to handle the M-mode CLINT (memory mapped and used on NOMMU systems) and one to handle the S-mode CLINT (via SBI).
 - The addition of SiFive's drivers to rv32_defconfig

* tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Add SiFive drivers to rv32_defconfig
  dt-bindings: timer: Add CLINT bindings
  RISC-V: Remove CLINT related code from timer and arch
  clocksource/drivers: Add CLINT timer driver
  RISC-V: Add mechanism to provide custom IPI operations
2020-08-21  Merge tag 'for-linus-5.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip  (Linus Torvalds)

Pull xen fixes from Juergen Gross:
 "One build fix and a minor fix for suppressing a useless warning when booting a Xen dom0 via UEFI"

* tag 'for-linus-5.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Fix build error when CONFIG_ACPI is not set/enabled:
  efi: avoid error message when booting under Xen
2020-08-21  ARM64: vdso32: Install vdso32 from vdso_install  (Stephen Boyd)
Add the 32-bit vdso Makefile to the vdso_install rule so that 'make vdso_install' installs the 32-bit compat vdso when it is compiled. Fixes: a7f71a2c8903 ("arm64: compat: Add vDSO") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20200818014950.42492-1-swboyd@chromium.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-21  x86/entry/64: Do not use RDPID in paranoid entry to accommodate KVM  (Sean Christopherson)

KVM has an optimization to avoid expensive MSR reads/writes on VMENTER/EXIT. It caches the MSR values and restores them either when leaving the run loop, on preemption or when going out to user space. The affected MSRs are not required for kernel context operations.

This changed with the recently introduced mechanism to handle FSGSBASE in the paranoid entry code which has to retrieve the kernel GSBASE value by accessing per CPU memory. The mechanism needs to retrieve the CPU number and uses either LSL or RDPID if the processor supports it.

Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and lazily restored MSRs, which means between the point where the guest value is written and the point of restore, MSR_TSC_AUX contains a random number.

If an NMI or any other exception which uses the paranoid entry path happens in such a context, then RDPID returns the random guest MSR_TSC_AUX value. As a consequence this reads from the wrong memory location to retrieve the kernel GSBASE value. Kernel GS is used for all regular this_cpu_*() operations. If the GSBASE in the exception handler points to the per CPU memory of a different CPU then this has the obvious consequences of data corruption and crashes.

As the paranoid entry path is the only place which accesses MSR_TSC_AUX (via RDPID) and the fallback via LSL is not significantly slower, remove the RDPID alternative from the entry path and always use LSL.

The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT, which would be inflicting massive overhead on that code path.

[ tglx: Rewrote changelog ]

Fixes: eaad981291ee3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Debugged-by: Tom Lendacky <thomas.lendacky@amd.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com
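For background, the LSL fallback reads the CPU number from the limit field of a per-CPU GDT descriptor rather than from MSR_TSC_AUX; a rough sketch of that access pattern, modeled on the vDSO helper (constraints and names approximate):

    static inline unsigned int cpu_number_via_lsl(void)
    {
            unsigned int p;

            /* LSL yields the segment limit of the CPUNODE descriptor,
             * which encodes cpu | (node << VDSO_CPUNODE_BITS). */
            asm("lsl %[seg], %[p]" : [p] "=a" (p) : [seg] "r" (__CPUNODE_SEG));

            return p & VDSO_CPUNODE_MASK;
    }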
2020-08-21  powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver  (Kajol Jain)

Commit 792f73f747b8 ("powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask") added the cpumask file as part of the hv-24x7 driver inside the interface folder. The cpumask file is supposed to be in the top folder of the PMU driver in order to make hotplug work. This patch fixes that issue and creates a new group 'cpumask_attr_group' to add the cpumask file and make sure it is added in the top folder.

    # cat /sys/devices/hv_24x7/cpumask
    0

Fixes: 792f73f747b8 ("powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200821080610.123997-1-kjain@linux.ibm.com
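For reference, a sysfs attribute group with no .name is created directly under the device directory, which is what puts the file in the top folder; a plausible sketch of cpumask_attr_group (assumed shape, not the literal patch):

    static struct attribute *cpumask_attrs[] = {
            &dev_attr_cpumask.attr,
            NULL,
    };

    static const struct attribute_group cpumask_attr_group = {
            /* no .name, so 'cpumask' lands in /sys/devices/hv_24x7/ itself */
            .attrs = cpumask_attrs,
    };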
2020-08-21  powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000  (Christophe Leroy)

In is_module_segment(), when VMALLOC_END is over 0xf0000000, ALIGN(VMALLOC_END, SZ_256M) has value 0. In that case, addr >= ALIGN(VMALLOC_END, SZ_256M) is always true, so is_module_segment() always returns false. Use (ALIGN(VMALLOC_END, SZ_256M) - 1), which will have value 0xffffffff and will be suitable for the comparison.

Fixes: c49643319715 ("powerpc/32s: Only leave NX unset on segments used for modules")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/09fc73fe9c7423c6b4cf93f93df9bb0ed8eefab5.1597994047.git.christophe.leroy@csgroup.eu
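A sketch of the corrected upper-bound check (the surrounding logic is assumed):

    static bool is_module_segment(unsigned long addr)
    {
            if (addr < ALIGN_DOWN(VMALLOC_START, SZ_256M))
                    return false;
            /*
             * ALIGN(VMALLOC_END, SZ_256M) wraps to 0 on 32-bit when
             * VMALLOC_END is above 0xf0000000, so compare against the
             * last valid address (0xffffffff) instead.
             */
            if (addr > ALIGN(VMALLOC_END, SZ_256M) - 1)
                    return false;
            return true;
    }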
2020-08-21  KVM: arm64: Print warning when cpu erratum can cause guests to deadlock  (Rob Herring)
If guests don't have certain CPU erratum workarounds implemented, then there is a possibility a guest can deadlock the system. IOW, only trusted guests should be used on systems with the erratum. This is the case for Cortex-A57 erratum 832075. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20200803193127.3012242-2-robh@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-21  arm64: Allow booting of late CPUs affected by erratum 1418040  (Marc Zyngier)
As we can now switch from a system that isn't affected by 1418040 to a system that globally is affected, let's allow affected CPUs to come in at a later time. Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200731173824.107480-3-maz@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-21  arm64: Move handling of erratum 1418040 into C code  (Marc Zyngier)

Instead of dealing with erratum 1418040 on each entry and exit, let's move the handling to __switch_to() instead, which has several advantages:

 - It can be applied when it matters (switching between 32 and 64 bit tasks).
 - It is written in C (yay!)
 - It can rely on static keys rather than alternatives

Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200731173824.107480-2-maz@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-21  MIPS: BMIPS: Also call bmips_cpu_setup() for secondary cores  (Florian Fainelli)

The initialization done by bmips_cpu_setup() typically affects both threads of a given core; on 7435, which supports 2 cores and 2 threads, logical CPUs 2 and 3 would not run this initialization.

Fixes: 738a3f79027b ("MIPS: BMIPS: Add early CPU initialization code")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-08-21  MIPS: mm: BMIPS5000 has inclusive physical caches  (Florian Fainelli)
When the BMIPS generic cpu-feature-overrides.h file was introduced, cpu_has_inclusive_caches/MIPS_CPU_INCLUSIVE_CACHES was not set for BMIPS5000 CPUs. Correct this when we have initialized the MIPS secondary cache successfully. Fixes: f337967d6d87 ("MIPS: BMIPS: Add cpu-feature-overrides.h") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-08-20  riscv: Add SiFive drivers to rv32_defconfig  (Bin Meng)
This adds SiFive drivers to rv32_defconfig, to keep in sync with the 64-bit config. This is useful when testing 32-bit kernel with QEMU 'sifive_u' 32-bit machine. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-08-20  RISC-V: Remove CLINT related code from timer and arch  (Anup Patel)

Right now the RISC-V timer driver is convoluted to support:

 1. Linux RISC-V S-mode (with MMU), where it will use the TIME CSR for the clocksource and SBI timer calls for the clockevent device.
 2. Linux RISC-V M-mode (without MMU), where it will use the CLINT MMIO counter register for the clocksource and the CLINT MMIO compare register for the clockevent device.

We now have a separate CLINT timer driver which also provides CLINT-based IPI operations, so let's remove CLINT MMIO related code from the arch/riscv directory and the RISC-V timer driver.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Tested-by: Emil Renner Berhing <kernel@esmil.dk>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-08-20  RISC-V: Add mechanism to provide custom IPI operations  (Anup Patel)

Add a mechanism to set custom IPI operations so that the CLINT driver in the drivers directory can provide custom IPI operations.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Tested-by: Emil Renner Berhing <kernel@esmil.dk>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
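The mechanism amounts to a small ops struct plus a registration hook; a sketch of its likely shape (names inferred from the series description, not verified against the patch):

    struct riscv_ipi_ops {
            void (*ipi_inject)(const struct cpumask *target);
            void (*ipi_clear)(void);
    };

    /* called by e.g. the CLINT driver to override the SBI-based default */
    void riscv_set_ipi_ops(struct riscv_ipi_ops *ops);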
2020-08-20  powerpc/pseries: Do not initiate shutdown when system is running on UPS  (Vasant Hegde)

As per PAPR we have to look at both the EPOW sensor value and the event modifier to identify the type of event and take appropriate action.

LoPAPR v1.1 section 10.2.2 includes table 136 "EPOW Action Codes":

    SYSTEM_SHUTDOWN 3
    The system must be shut down. An EPOW-aware OS logs the EPOW error
    log information, then schedules the system to be shut down to begin
    after an OS defined delay interval (default is 10 minutes.)

Then in section 10.3.2.2.8 there is table 146 "Platform Event Log Format, Version 6, EPOW Section", which includes the "EPOW Event Modifier":

    For EPOW sensor value = 3
    0x01 = Normal system shutdown with no additional delay
    0x02 = Loss of utility power, system is running on UPS/Battery
    0x03 = Loss of system critical functions, system should be shutdown
    0x04 = Ambient temperature too high
    All other values = reserved

We have a user space tool (rtas_errd) on the LPAR to monitor for EPOW_SHUTDOWN_ON_UPS. Once it gets an event it initiates shutdown after a predefined time. It also starts monitoring for any new EPOW events. If it receives a "Power restored" event before the predefined time it will cancel the shutdown. Otherwise, after the predefined time it will shut down the system.

Commit 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown") changed our handling of the "on UPS/Battery" case, to immediately shut down the system. This breaks existing setups that rely on the userspace tool to delay shutdown and let the system run on the UPS.

Fixes: 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Massage change log and add PAPR references]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200820061844.306460-1-hegdevasant@linux.vnet.ibm.com
2020-08-20  powerpc/perf: Fix soft lockups due to missed interrupt accounting  (Athira Rajeev)

The performance monitor interrupt handler checks if any counter has overflown and calls record_and_restart() in core-book3s, which invokes perf_event_overflow() to record the sample information. Apart from creating the sample, perf_event_overflow() also does the interrupt and period checks via perf_event_account_interrupt().

Currently we record information only if the SIAR (Sampled Instruction Address Register) valid bit is set (using the siar_valid() check), and hence run the interrupt check only then. But it is possible that we do sampling for some events that are not generating a valid SIAR, and hence there is no chance to disable the event if interrupts are more than max_samples_per_tick. This leads to a soft lockup.

Fix this by adding perf_event_account_interrupt() in the invalid SIAR code path for a sampling event, i.e. if SIAR is invalid, just do the interrupt check and don't record the sample information.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com
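A sketch of the resulting flow in record_and_restart() (simplified from the description above):

    if (record) {
            if (siar_valid(regs)) {
                    /* build the sample and hand it to the core */
                    perf_event_overflow(event, &data, regs);
            } else {
                    /*
                     * No valid SIAR, so no sample -- but still do the
                     * interrupt/period accounting so a misbehaving event
                     * gets throttled instead of soft-locking the CPU.
                     */
                    perf_event_account_interrupt(event);
            }
    }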
2020-08-20  efi/x86: Move 32-bit code into efi_32.c  (Ard Biesheuvel)
Now that the old memmap code has been removed, some code that was left behind in arch/x86/platform/efi/efi.c is only used for 32-bit builds, which means it can live in efi_32.c as well. So move it over. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20  efi/x86: Mark kernel rodata non-executable for mixed mode  (Arvind Sankar)
When remapping the kernel rodata section RO in the EFI pagetables, the protection flags that were used for the text section are being reused, but the rodata section should not be marked executable. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200717194526.3452089-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20  Fix build error when CONFIG_ACPI is not set/enabled:  (Randy Dunlap)

    ../arch/x86/pci/xen.c: In function ‘pci_xen_init’:
    ../arch/x86/pci/xen.c:410:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration]
      acpi_noirq_set();

Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
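The usual fix for this class of breakage is a no-op stub in the !CONFIG_ACPI branch of the header; a sketch (assuming the stub lives in include/linux/acpi.h):

    #else   /* !CONFIG_ACPI */

    static inline void acpi_noirq_set(void) { }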
2020-08-20  powerpc/powernv/pci: Fix possible crash when releasing DMA resources  (Frederic Barrat)
Fix a typo introduced during recent code cleanup, which could lead to silently not freeing resources or an oops message (on PCI hotplug or CAPI reset). Only impacts ioda2, the code path for ioda1 is correct. Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA setup state") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200819130741.16769-1-fbarrat@linux.ibm.com
2020-08-19  x86/boot/compressed: Use builtin mem functions for decompressor  (Arvind Sankar)

Since commits

  c041b5ad8640 ("x86, boot: Create a separate string.h file to provide standard string functions")
  fb4cac573ef6 ("x86, boot: Move memcmp() into string.h and string.c")

the decompressor stub has been using the compiler's builtin memcpy, memset and memcmp functions, _except_ where it would likely have the largest impact, in the decompression code itself.

Remove the #undef's of memcpy and memset in misc.c so that the decompressor code also uses the compiler builtins. The rationale given in the comment doesn't really apply: just because some functions use the out-of-line version is no reason to not use the builtin version in the rest. Replace the comment with an explanation of why memzero and memmove are being #define'd.

Drop the suggestion to #undef in boot/string.h as well: the out-of-line versions are not really optimized versions; they're generic code that's good enough for the preboot environment. The compiler will likely generate better code for constant-size memcpy/memset/memcmp if it is allowed to.

Most decompressors' performance is unchanged, with the exception of LZ4 and 64-bit ZSTD.

            Before   After   ARCH
    LZ4      73ms    10ms    32
    LZ4     120ms    10ms    64
    ZSTD     90ms    74ms    64

Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels.

Decompressor code size has small differences, with the largest being that 64-bit ZSTD decreases just over 2k. The largest code size increase was on 64-bit XZ, of about 400 bytes.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Suggested-by: Nick Terrell <nickrterrell@gmail.com>
Tested-by: Nick Terrell <nickrterrell@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-18  powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death  (Michael Roth)

For a power9 KVM guest with XIVE enabled, running a test loop where we hotplug 384 vcpus and then unplug them, the following traces can be seen (generally within a few loops) either from the unplugged vcpu:

    cpu 65 (hwid 65) Ready to die...
    Querying DEAD? cpu 66 (66) shows 2
    list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
    ------------[ cut here ]------------
    kernel BUG at lib/list_debug.c:56!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
    CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
    NIP: c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
    REGS: c0000009e5a17840 TRAP: 0700 Not tainted (4.18.0-221.el8.ppc64le)
    MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000842 XER: 20040000
    ...
    NIP __list_del_entry_valid+0xac/0x100
    LR  __list_del_entry_valid+0xa8/0x100
    Call Trace:
      __list_del_entry_valid+0xa8/0x100 (unreliable)
      free_pcppages_bulk+0x1f8/0x940
      free_unref_page+0xd0/0x100
      xive_spapr_cleanup_queue+0x148/0x1b0
      xive_teardown_cpu+0x1bc/0x240
      pseries_mach_cpu_die+0x78/0x2f0
      cpu_die+0x48/0x70
      arch_cpu_idle_dead+0x20/0x40
      do_idle+0x2f4/0x4c0
      cpu_startup_entry+0x38/0x40
      start_secondary+0x7bc/0x8f0
      start_secondary_prolog+0x10/0x14

or on the worker thread handling the unplug:

    pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
    Querying DEAD? cpu 314 (314) shows 2
    BUG: Bad page state in process kworker/u768:3  pfn:95de1
    cpu 314 (hwid 314) Ready to die...
    page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
    flags: 0x5ffffc00000000()
    raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
    raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
    page dumped because: nonzero mapcount
    Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
    CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
    Workqueue: pseries hotplug workque pseries_hp_work_fn
    Call Trace:
      dump_stack+0xb0/0xf4 (unreliable)
      bad_page+0x12c/0x1b0
      free_pcppages_bulk+0x5bc/0x940
      page_alloc_cpu_dead+0x118/0x120
      cpuhp_invoke_callback.constprop.5+0xb8/0x760
      _cpu_down+0x188/0x340
      cpu_down+0x5c/0xa0
      cpu_subsys_offline+0x24/0x40
      device_offline+0xf0/0x130
      dlpar_offline_cpu+0x1c4/0x2a0
      dlpar_cpu_remove+0xb8/0x190
      dlpar_cpu_remove_by_index+0x12c/0x150
      dlpar_cpu+0x94/0x800
      pseries_hp_work_fn+0x128/0x1e0
      process_one_work+0x304/0x5d0
      worker_thread+0xcc/0x7a0
      kthread+0x1ac/0x1c0
      ret_from_kernel_thread+0x5c/0x80

The latter trace is due to the following sequence:

    page_alloc_cpu_dead
      drain_pages
        drain_pages_zone
          free_pcppages_bulk

where drain_pages() in this case is called under the assumption that the unplugged cpu is no longer executing. To ensure that is the case, an early call is made to __cpu_die()->pseries_cpu_die(), which runs a loop that waits for the cpu to reach a halted state by polling its status via query-cpu-stopped-state RTAS calls. It only polls for 25 iterations before giving up, however, and in the trace above this results in the following being printed only 0.1 seconds after the hotplug worker thread begins processing the unplug request:

    pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
    Querying DEAD? cpu 314 (314) shows 2

At that point the worker thread assumes the unplugged CPU is in some unknown/dead state and proceeds with the cleanup, causing the race with the XIVE cleanup code executed by the unplugged CPU.

Fix this by waiting indefinitely, but also making an effort to avoid spurious lockup messages by allowing for rescheduling after polling the CPU status and printing a warning if we wait for longer than 120s.

Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[mpe: Trim oopses in change log slightly for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com
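A sketch of the resulting wait loop in pseries_cpu_die() (simplified; the 120s warning interval is taken from the description above):

    unsigned long timeout = jiffies + 120 * HZ;

    for (;;) {
            cpu_status = smp_query_cpu_stopped(pcpu);
            if (cpu_status == QCSS_STOPPED ||
                cpu_status == QCSS_HARDWARE_ERROR)
                    break;

            /* Avoid spurious soft-lockup/RCU stall reports while waiting. */
            cond_resched();

            if (time_after(jiffies, timeout)) {
                    pr_warn("CPU %i didn't die, waiting longer...\n", cpu);
                    timeout = jiffies + 120 * HZ;
            }
    }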
2020-08-18  powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined  (Christophe Leroy)

When MODULES_VADDR is defined, is_module_segment() shall check the address against it instead of checking against VMALLOC_START.

Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/07884ed033c31e074747b7eb8eaa329d15db07ec.1596641219.git.christophe.leroy@csgroup.eu
2020-08-18  powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32  (Christophe Leroy)
On BOOK3S_32, when we have modules and strict kernel RWX, modules are not in vmalloc space but in a dedicated segment that is below PAGE_OFFSET. So KASAN_SHADOW_START must take it into account. MODULES_VADDR can't be used because it is not defined yet in kasan.h Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6eddca2d5611fd57312a88eae31278c87a8fc99d.1596641224.git.christophe.leroy@csgroup.eu
2020-08-17  kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode  (Jim Mattson)

See the SDM, volume 3, section 4.4.1:

    If PAE paging would be in use following an execution of MOV to CR0 or
    MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
    CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
    the PDPTEs are loaded from the address in CR3.

Fixes: b9baba8614890 ("KVM, pkeys: expose CPUID/CR4 to guest")
Cc: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20200817181655.3716509-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
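In KVM terms, the fix drops PKE from the set of CR4 bits whose toggling forces a PDPTE reload in kvm_set_cr4(); a simplified sketch (variable names approximate):

    unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE |
                               X86_CR4_PAE | X86_CR4_SMEP;
    /* X86_CR4_PKE deliberately absent: per the SDM, toggling it
     * does not reload the PDPTEs. */

    if ((cr4 ^ old_cr4) & pdptr_bits)
            kvm_mmu_reset_context(vcpu);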