git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-01-25	KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX	Paolo Bonzini
	VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed. Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES Cc: <stable@vger.kernel.org> # 5.10.x Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"	Sean Christopherson
	Revert the dirty/available tracking of GPRs now that KVM copies the GPRs to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty. Per commit de3cd117ed2f ("KVM: x86: Omit caching logic for always-available GPRs"), tracking for GPRs noticeably impacts KVM's code footprint. This reverts commit 1c04d8c986567c27c56c05205dceadc92efb14ff. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210122235049.3107620-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest	Sean Christopherson
	Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the GRPs' dirty bits are set from time zero and never cleared, i.e. will always be seen as dirty. The obvious alternative would be to clear the dirty bits when appropriate, but removing the dirty checks is desirable as it allows reverting GPR dirty+available tracking, which adds overhead to all flavors of x86 VMs. Note, unconditionally writing the GPRs in the GHCB is tacitly allowed by the GHCB spec, which allows the hypervisor (or guest) to provide unnecessary info; it's the guest's responsibility to consume only what it needs (the hypervisor is untrusted after all). The guest and hypervisor can supply additional state if desired but must not rely on that additional state being provided. Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210122235049.3107620-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration	Maxim Levitsky
	Even when we are outside the nested guest, some vmcs02 fields may not be in sync vs vmcs12. This is intentional, even across nested VM-exit, because the sync can be delayed until the nested hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those rarely accessed fields. However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to be able to restore it. To fix that, call copy_vmcs02_to_vmcs12_rare() before the vmcs12 contents are copied to userspace. Fixes: 7952d769c29ca ("KVM: nVMX: Sync rarely accessed guest fields only when needed") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	kvm: tracing: Fix unmatched kvm_entry and kvm_exit events	Lorenzo Brescia
	On VMX, if we exit and then re-enter immediately without leaving the vmx_vcpu_run() function, the kvm_entry event is not logged. That means we will see one (or more) kvm_exit, without its (their) corresponding kvm_entry, as shown here: CPU-1979 [002] 89.871187: kvm_entry: vcpu 1 CPU-1979 [002] 89.871218: kvm_exit: reason MSR_WRITE CPU-1979 [002] 89.871259: kvm_exit: reason MSR_WRITE It also seems possible for a kvm_entry event to be logged, but then we leave vmx_vcpu_run() right away (if vmx->emulation_required is true). In this case, we will have a spurious kvm_entry event in the trace. Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run() (where trace_kvm_exit() already is). A trace obtained with this patch applied looks like this: CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0 CPU-14295 [000] 8388.395392: kvm_exit: reason MSR_WRITE CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0 CPU-14295 [000] 8388.395503: kvm_exit: reason EXTERNAL_INTERRUPT Of course, not calling trace_kvm_entry() in common x86 code any longer means that we need to adjust the SVM side of things too. Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it> Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	KVM: x86: get smi pending status correctly	Jay Zhou
	The injection process of smi has two steps: Qemu KVM Step1: cpu->interrupt_request &= \ ~CPU_INTERRUPT_SMI; kvm_vcpu_ioctl(cpu, KVM_SMI) call kvm_vcpu_ioctl_smi() and kvm_make_request(KVM_REQ_SMI, vcpu); Step2: kvm_vcpu_ioctl(cpu, KVM_RUN, 0) call process_smi() if kvm_check_request(KVM_REQ_SMI, vcpu) is true, mark vcpu->arch.smi_pending = true; The vcpu->arch.smi_pending will be set true in step2, unfortunately if vcpu paused between step1 and step2, the kvm_run->immediate_exit will be set and vcpu has to exit to Qemu immediately during step2 before mark vcpu->arch.smi_pending true. During VM migration, Qemu will get the smi pending status from KVM using KVM_GET_VCPU_EVENTS ioctl at the downtime, then the smi pending status will be lost. Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com> Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com> Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]	Like Xu
	The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as 0x0300 in the intel_perfmon_event_map[]. Correct its usage. Fixes: 62079d8a4312 ("KVM: PMU: add proper support for fixed counter 2") Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()	Like Xu
	Since we know vPMU will not work properly when (1) the guest bit_width(s) of the [gp\|fixed] counters are greater than the host ones, or (2) guest requested architectural events exceeds the range supported by the host, so we can setup a smaller left shift value and refresh the guest cpuid entry, thus fixing the following UBSAN shift-out-of-bounds warning: shift exponent 197 is too large for 64-bit type 'long long unsigned int' Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348 kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177 kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308 kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709 kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	KVM: x86: Add more protection against undefined behavior in rsvd_bits()	Sean Christopherson
	Add compile-time asserts in rsvd_bits() to guard against KVM passing in garbage hardcoded values, and cap the upper bound at '63' for dynamic values to prevent generating a mask that would overflow a u64. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210113204515.3473079-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25	Merge tag 'kvmarm-fixes-5.11-2' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.11, take #2 - Don't allow tagged pointers to point to memslots - Filter out ARMv8.1+ PMU events on v8.0 hardware - Hide PMU registers from userspace when no PMU is configured - More PMU cleanups - Don't try to handle broken PSCI firmware - More sys_reg() to reg_to_encoding() conversions
2021-01-26	arm64: dts: rockchip: Disable display for NanoPi R2S	Robin Murphy
	NanoPi R2S is headless, so rightly does not enable any of the display interface hardware, which currently provokes an obnoxious error in the boot log from the fake DRM device failing to find anything to bind to. It probably isn't too hard to obviate the fake device shenanigans entirely with a bit of driver reshuffling, but for now let's just disable it here to shut up the spurious error. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/c4553dfad1ad6792c4f22454c135ff55de77e2d6.1611186099.git.robin.murphy@arm.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2021-01-25	KVM: arm64: Don't clobber x4 in __do_hyp_init	Andrew Scull
	arm_smccc_1_1_hvc() only adds write contraints for x0-3 in the inline assembly for the HVC instruction so make sure those are the only registers that change when __do_hyp_init is called. Tested-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210125145415.122439-3-ascull@google.com
2021-01-25	ARM: dts: omap4-droid4: Fix lost keypad slide interrupts for droid4	Tony Lindgren
	We may lose edge interrupts for gpio banks other than the first gpio bank as they are not always powered. Instead, we must use the padconf interrupt as that is always powered. Note that we still also use the gpio for reading the pin state as that can't be done with the padconf device. Signed-off-by: Tony Lindgren <tony@atomide.com>
2021-01-24	Merge tag 'sh-for-5.11' of git://git.libc.org/linux-sh	Linus Torvalds
	Pull arch/sh updates from Rich Felker: "Cleanup and warning fixes" * tag 'sh-for-5.11' of git://git.libc.org/linux-sh: sh/intc: Restore devm_ioremap() alignment sh: mach-sh03: remove duplicate include arch: sh: remove duplicate include sh: Drop ARCH_NR_GPIOS definition sh: Remove unused HAVE_COPY_THREAD_TLS macro sh: remove CONFIG_IDE from most defconfig sh: mm: Convert to DEFINE_SHOW_ATTRIBUTE sh: intc: Convert to DEFINE_SHOW_ATTRIBUTE arch/sh: hyphenate Non-Uniform in Kconfig prompt sh: dma: fix kconfig dependency for G2_DMA
2021-01-24	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge misc fixes from Andrew Morton: "18 patches. Subsystems affected by this patch series: mm (pagealloc, memcg, kasan, memory-failure, and highmem), ubsan, proc, and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: add a couple more files to the Clang/LLVM section proc_sysctl: fix oops caused by incorrect command parameters powerpc/mm/highmem: use __set_pte_at() for kmap_local() mips/mm/highmem: use set_pte() for kmap_local() mm/highmem: prepare for overriding set_pte_at() sparc/mm/highmem: flush cache and TLB mm: fix page reference leak in soft_offline_page() ubsan: disable unsigned-overflow check for i386 kasan, mm: fix resetting page_alloc tags for HW_TAGS kasan, mm: fix conflicts with init_on_alloc/free kasan: fix HW_TAGS boot parameters kasan: fix incorrect arguments passing in kasan_add_zero_shadow kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow mm: fix numa stats for thp migration mm: memcg: fix memcg file_dirty numa stat mm: memcg/slab: optimize objcg stock draining mm: fix initialization of struct page for holes in memory layout x86/setup: don't remove E820_TYPE_RAM for pfn 0
2021-01-24	powerpc/mm/highmem: use __set_pte_at() for kmap_local()	Thomas Gleixner
	The original PowerPC highmem mapping function used __set_pte_at() to denote that the mapping is per CPU. This got lost with the conversion to the generic implementation. Override the default map function. Link: https://lkml.kernel.org/r/20210112170411.281464308@linutronix.de Fixes: 47da42b27a56 ("powerpc/mm/highmem: Switch to generic kmap atomic") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paul Cercueil <paul@crapouillou.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24	mips/mm/highmem: use set_pte() for kmap_local()	Thomas Gleixner
	set_pte_at() on MIPS invokes update_cache() which might recurse into kmap_local(). Use set_pte() like the original MIPS highmem implementation did. Link: https://lkml.kernel.org/r/20210112170411.187513575@linutronix.de Fixes: a4c33e83bca1 ("mips/mm/highmem: Switch to generic kmap atomic") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Paul Cercueil <paul@crapouillou.net> Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24	sparc/mm/highmem: flush cache and TLB	Thomas Gleixner
	Patch series "mm/highmem: Fix fallout from generic kmap_local conversions". The kmap_local conversion wreckaged sparc, mips and powerpc as it missed some of the details in the original implementation. This patch (of 4): The recent conversion to the generic kmap_local infrastructure failed to assign the proper pre/post map/unmap flush operations for sparc. Sparc requires cache flush before map/unmap and tlb flush afterwards. Link: https://lkml.kernel.org/r/20210112170136.078559026@linutronix.de Link: https://lkml.kernel.org/r/20210112170410.905976187@linutronix.de Fixes: 3293efa97807 ("sparc/mm/highmem: Switch to generic kmap atomic") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Andreas Larsson <andreas@gaisler.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Cercueil <paul@crapouillou.net> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24	Merge tag 'sched_urgent_for_v5.11_rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Correct the marking of kthreads which are supposed to run on a specific, single CPU vs such which are affine to only one CPU, mark per-cpu workqueue threads as such and make sure that marking "survives" CPU hotplug. Fix CPU hotplug issues with such kthreads. - A fix to not push away tasks on CPUs coming online. - Have workqueue CPU hotplug code use cpu_possible_mask when breaking affinity on CPU offlining so that pending workers can finish on newly arrived onlined CPUs too. - Dump tasks which haven't vacated a CPU which is currently being unplugged. - Register a special scale invariance callback which gets called on resume from RAM to read out APERF/MPERF after resume and thus make the schedutil scaling governor more precise. * tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Relax the set_cpus_allowed_ptr() semantics sched: Fix CPU hotplug / tighten is_per_cpu_kthread() sched: Prepare to use balance_push in ttwu() workqueue: Restrict affinity change to rescuer workqueue: Tag bound workers with KTHREAD_IS_PER_CPU kthread: Extract KTHREAD_IS_PER_CPU sched: Don't run cpu-online with balance_push() enabled workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity sched/core: Print out straggler tasks in sched_cpu_dying() x86: PM: Register syscore_ops for scale invariance
2021-01-24	Merge tag 'x86_urgent_for_v5.11_rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add a new Intel model number for Alder Lake - Differentiate which aspects of the FPU state get saved/restored when the FPU is used in-kernel and fix a boot crash on K7 due to early MXCSR access before CR4.OSFXSR is even set. - A couple of noinstr annotation fixes - Correct die ID setting on AMD for users of topology information which need the correct die ID - A SEV-ES fix to handle string port IO to/from kernel memory properly * tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add another Alder Lake CPU to the Intel family x86/mmx: Use KFPU_387 for MMX string operations x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state x86/topology: Make __max_die_per_package available unconditionally x86: __always_inline __{rd,wr}msr() x86/mce: Remove explicit/superfluous tracing locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP locking/lockdep: Cure noinstr fail x86/sev: Fix nonistr violation x86/entry: Fix noinstr fail x86/cpu/amd: Set __max_die_per_package on AMD x86/sev-es: Handle string port IO to kernel memory properly
2021-01-24	Merge tag 'powerpc-5.11-5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a bad interaction between the scv handling and the fallback L1D flush, which could lead to user register corruption. Only affects people using scv (~no one) on machines with old firmware that are missing the L1D flush. - Two small selftest fixes. Thanks to Eirik Fuller, Libor Pechacek, Nicholas Piggin, Sandipan Das, and Tulio Magno Quites Machado Filho. * tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: fix scv entry fallback flush vs interrupt selftests/powerpc: Only test lwm/stmw on big endian selftests/powerpc: Fix exit status of pkey tests
2021-01-24	x86/setup: don't remove E820_TYPE_RAM for pfn 0	Mike Rapoport
	Patch series "mm: fix initialization of struct page for holes in memory layout", v3. Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") exposed several issues with the memory map initialization and these patches fix those issues. Initially there were crashes during compaction that Qian Cai reported back in April [1]. It seemed back then that the problem was fixed, but a few weeks ago Andrea Arcangeli hit the same bug [2] and there was an additional discussion at [3]. [1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw [2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com [3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foundation.org This patch (of 2): The first 4Kb of memory is a BIOS owned area and to avoid its allocation for the kernel it was not listed in e820 tables as memory. As the result, pfn 0 was never recognised by the generic memory management and it is not a part of neither node 0 nor ZONE_DMA. If set_pfnblock_flags_mask() would be ever called for the pageblock corresponding to the first 2Mbytes of memory, having pfn 0 outside of ZONE_DMA would trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); Along with reserving the first 4Kb in e820 tables, several first pages are reserved with memblock in several places during setup_arch(). These reservations are enough to ensure the kernel does not touch the BIOS area and it is not necessary to remove E820_TYPE_RAM for pfn 0. Remove the update of e820 table that changes the type of pfn 0 and move the comment describing why it was done to trim_low_memory_range() that reserves the beginning of the memory. Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qian Cai <cai@lca.pw> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24	powerpc/64s: prevent recursive replay_soft_interrupts causing superfluous ↵	Nicholas Piggin
	interrupt When an asynchronous interrupt calls irq_exit, it checks for softirqs that may have been created, and runs them. Running softirqs enables local irqs, which can replay pending interrupts causing recursion in replay_soft_interrupts. This abridged trace shows how this can occur: ! NIP replay_soft_interrupts LR interrupt_exit_kernel_prepare Call Trace: interrupt_exit_kernel_prepare (unreliable) interrupt_return --- interrupt: ea0 at __rb_reserve_next NIP __rb_reserve_next LR __rb_reserve_next Call Trace: ring_buffer_lock_reserve trace_function function_trace_call ftrace_call __do_softirq irq_exit timer_interrupt ! replay_soft_interrupts interrupt_exit_kernel_prepare interrupt_return --- interrupt: ea0 at arch_local_irq_restore This can not be prevented easily, because softirqs must not block hard irqs, so it has to be dealt with. The recursion is bounded by design in the softirq code because softirq replay disables softirqs and loops around again to check for new softirqs created while it ran, so that's not a problem. However it does mess up interrupt replay state, causing superfluous interrupts when the second replay_soft_interrupts clears a pending interrupt, leaving it still set in the first call in the 'happened' local variable. Fix this by not caching a copy of irqs_happened across interrupt handler calls. Fixes: 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210123061244.2076145-1-npiggin@gmail.com
2021-01-22	riscv: Fixup pfn_valid error with wrong max_mapnr	Guo Ren
	The max_mapnr is the number of PFNs, not absolute PFN offset. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Fixes: d0d8aae64566 ("RISC-V: Set maximum number of mapped pages correctly") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-22	Merge tag 'imx-fixes-5.11-2' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.11, round 2: - Fix pcf2127 reset for imx7d-flex-concentrator board. - Fix i.MX6 suspend with Thumb-2 kernel. - Fix ethernet-phy address issue on imx6qdl-sr-som board. - Fix GPIO3 `gpio-ranges` on i.MX8MP. - Select SOC_BUS for IMX_SCU driver to fix build issue. * tag 'imx-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: firmware: imx: select SOC_BUS to fix firmware build arm64: dts: imx8mp: Correct the gpio ranges of gpio3 ARM: dts: imx6qdl-sr-som: fix some cubox-i platforms ARM: imx: build suspend-imx6.S with arm instruction set ARM: dts: imx7d-flex-concentrator: fix pcf2127 reset Link: https://lore.kernel.org/r/20210119091949.GD4356@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-01-22	arm64: dts: broadcom: Fix USB DMA address translation for Stingray	Bharat Gooty
	Add a non-empty dma-ranges so that DMA address translation happens. Fixes: 2013a4b684b6 ("arm64: dts: broadcom: clear the warnings caused by empty dma-ranges") Signed-off-by: Bharat Gooty <bharat.gooty@broadcom.com> Signed-off-by: Rayagonda Kokatanur <rayagonda.kokatanur@broadcom.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-01-22	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Correctly mask out bits 63:60 in a kernel tag check fault address (specified as unknown by the architecture). Previously they were just zeroed but for kernel pointers they need to be all ones. - Fix a panic (unexpected kernel BRK exception) caused by kprobes being reentered due to an interrupt. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kprobes: Fix Uexpected kernel BRK exception at EL1 kasan, arm64: fix pointer tags in KASAN reports
2021-01-22	arm64: kprobes: Fix Uexpected kernel BRK exception at EL1	Qais Yousef
	I was hitting the below panic continuously when attaching kprobes to scheduler functions [ 159.045212] Unexpected kernel BRK exception at EL1 [ 159.053753] Internal error: BRK handler: f2000006 [#1] PREEMPT SMP [ 159.059954] Modules linked in: [ 159.063025] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.11.0-rc4-00008-g1e2a199f6ccd #56 [rt-app] <notice> [1] Exiting.[ 159.071166] Hardware name: ARM Juno development board (r2) (DT) [ 159.079689] pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO BTYPE=--) [ 159.085723] pc : 0xffff80001624501c [ 159.089377] lr : attach_entity_load_avg+0x2ac/0x350 [ 159.094271] sp : ffff80001622b640 [rt-app] <notice> [0] Exiting.[ 159.097591] x29: ffff80001622b640 x28: 0000000000000001 [ 159.105515] x27: 0000000000000049 x26: ffff000800b79980 [ 159.110847] x25: ffff00097ef37840 x24: 0000000000000000 [ 159.116331] x23: 00000024eacec1ec x22: ffff00097ef12b90 [ 159.121663] x21: ffff00097ef37700 x20: ffff800010119170 [rt-app] <notice> [11] Exiting.[ 159.126995] x19: ffff00097ef37840 x18: 000000000000000e [ 159.135003] x17: 0000000000000001 x16: 0000000000000019 [ 159.140335] x15: 0000000000000000 x14: 0000000000000000 [ 159.145666] x13: 0000000000000002 x12: 0000000000000002 [ 159.150996] x11: ffff80001592f9f0 x10: 0000000000000060 [ 159.156327] x9 : ffff8000100f6f9c x8 : be618290de0999a1 [ 159.161659] x7 : ffff80096a4b1000 x6 : 0000000000000000 [ 159.166990] x5 : ffff00097ef37840 x4 : 0000000000000000 [ 159.172321] x3 : ffff000800328948 x2 : 0000000000000000 [ 159.177652] x1 : 0000002507d52fec x0 : ffff00097ef12b90 [ 159.182983] Call trace: [ 159.185433] 0xffff80001624501c [ 159.188581] update_load_avg+0x2d0/0x778 [ 159.192516] enqueue_task_fair+0x134/0xe20 [ 159.196625] enqueue_task+0x4c/0x2c8 [ 159.200211] ttwu_do_activate+0x70/0x138 [ 159.204147] sched_ttwu_pending+0xbc/0x160 [ 159.208253] flush_smp_call_function_queue+0x16c/0x320 [ 159.213408] generic_smp_call_function_single_interrupt+0x1c/0x28 [ 159.219521] ipi_handler+0x1e8/0x3c8 [ 159.223106] handle_percpu_devid_irq+0xd8/0x460 [ 159.227650] generic_handle_irq+0x38/0x50 [ 159.231672] __handle_domain_irq+0x6c/0xc8 [ 159.235781] gic_handle_irq+0xcc/0xf0 [ 159.239452] el1_irq+0xb4/0x180 [ 159.242600] rcu_is_watching+0x28/0x70 [ 159.246359] rcu_read_lock_held_common+0x44/0x88 [ 159.250991] rcu_read_lock_any_held+0x30/0xc0 [ 159.255360] kretprobe_dispatcher+0xc4/0xf0 [ 159.259555] __kretprobe_trampoline_handler+0xc0/0x150 [ 159.264710] trampoline_probe_handler+0x38/0x58 [ 159.269255] kretprobe_trampoline+0x70/0xc4 [ 159.273450] run_rebalance_domains+0x54/0x80 [ 159.277734] __do_softirq+0x164/0x684 [ 159.281406] irq_exit+0x198/0x1b8 [ 159.284731] __handle_domain_irq+0x70/0xc8 [ 159.288840] gic_handle_irq+0xb0/0xf0 [ 159.292510] el1_irq+0xb4/0x180 [ 159.295658] arch_cpu_idle+0x18/0x28 [ 159.299245] default_idle_call+0x9c/0x3e8 [ 159.303265] do_idle+0x25c/0x2a8 [ 159.306502] cpu_startup_entry+0x2c/0x78 [ 159.310436] secondary_start_kernel+0x160/0x198 [ 159.314984] Code: d42000c0 aa1e03e9 d42000c0 aa1e03e9 (d42000c0) After a bit of head scratching and debugging it turned out that it is due to kprobe handler being interrupted by a tick that causes us to go into (I think another) kprobe handler. The culprit was kprobe_breakpoint_ss_handler() returning DBG_HOOK_ERROR which leads to the Unexpected kernel BRK exception. Reverting commit ba090f9cafd5 ("arm64: kprobes: Remove redundant kprobe_step_ctx") seemed to fix the problem for me. Further analysis showed that kcb->kprobe_status is set to KPROBE_REENTER when the error occurs. By teaching kprobe_breakpoint_ss_handler() to handle this status I can no longer reproduce the problem. Fixes: ba090f9cafd5 ("arm64: kprobes: Remove redundant kprobe_step_ctx") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210122110909.3324607-1-qais.yousef@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-01-21	Merge tag 'for-linus' of git://github.com/openrisc/linux	Linus Torvalds
	Pull OpenRISC fixes from Stafford Horne: - Compiler warning fixup for new Litex SoC driver - Sparse warning fixup for iounmap * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: io: Add missing __iomem annotation to iounmap() soc: litex: Fix compile warning when device tree is not configured
2021-01-21	x86/cpu: Add another Alder Lake CPU to the Intel family	Gayatri Kammela
	Add Alder Lake mobile CPU model number to Intel family. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210121215004.11618-1-tony.luck@intel.com
2021-01-21	x86/mmx: Use KFPU_387 for MMX string operations	Andy Lutomirski
	The default kernel_fpu_begin() doesn't work on systems that support XMM but haven't yet enabled CR4.OSFXSR. This causes crashes when _mmx_memcpy() is called too early because LDMXCSR generates #UD when the aforementioned bit is clear. Fix it by using kernel_fpu_begin_mask(KFPU_387) explicitly. Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()") Reported-by: Krzysztof Mazur <krzysiek@podlesie.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Krzysztof Piotr Olędzki <ole@ans.pl> Tested-by: Krzysztof Mazur <krzysiek@podlesie.net> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/e7bf21855fe99e5f3baa27446e32623358f69e8d.1611205691.git.luto@kernel.org
2021-01-21	x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state	Andy Lutomirski
	Currently, requesting kernel FPU access doesn't distinguish which parts of the extended ("FPU") state are needed. This is nice for simplicity, but there are a few cases in which it's suboptimal: - The vast majority of in-kernel FPU users want XMM/YMM/ZMM state but do not use legacy 387 state. These users want MXCSR initialized but don't care about the FPU control word. Skipping FNINIT would save time. (Empirically, FNINIT is several times slower than LDMXCSR.) - Code that wants MMX doesn't want or need MXCSR initialized. _mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and initializing MXCSR will fail because LDMXCSR generates an #UD when the aforementioned CR4 bit is not set. - Any future in-kernel users of XFD (eXtended Feature Disable)-capable dynamic states will need special handling. Add a more specific API that allows callers to specify exactly what they want. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Krzysztof Piotr Olędzki <ole@ans.pl> Link: https://lkml.kernel.org/r/aff1cac8b8fc7ee900cf73e8f2369966621b053f.1611205691.git.luto@kernel.org
2021-01-21	KVM: arm64: Filter out v8.1+ events on v8.0 HW	Marc Zyngier
	When running on v8.0 HW, make sure we don't try to advertise events in the 0x4000-0x403f range. Cc: stable@vger.kernel.org Fixes: 88865beca9062 ("KVM: arm64: Mask out filtered events in PCMEID{0,1}_EL1") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210121105636.1478491-1-maz@kernel.org
2021-01-21	KVM: arm64: Compute TPIDR_EL2 ignoring MTE tag	Steven Price
	KASAN in HW_TAGS mode will store MTE tags in the top byte of the pointer. When computing the offset for TPIDR_EL2 we don't want anything in the top byte, so remove the tag to ensure the computation is correct no matter what the tag. Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS") Signed-off-by: Steven Price <steven.price@arm.com> [maz: added comment] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210108161254.53674-1-steven.price@arm.com
2021-01-20	Merge tag 'for-linus-5.11-rc5-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "A fix for build failure showing up in some configurations" * tag 'for-linus-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: fix 'nopvspin' build error
2021-01-20	powerpc/64s: fix scv entry fallback flush vs interrupt	Nicholas Piggin
	The L1D flush fallback functions are not recoverable vs interrupts, yet the scv entry flush runs with MSR[EE]=1. This can result in a timer (soft-NMI) or MCE or SRESET interrupt hitting here and overwriting the EXRFI save area, which ends up corrupting userspace registers for scv return. Fix this by disabling RI and EE for the scv entry fallback flush. Fixes: f79643787e0a0 ("powerpc/64s: flush L1D on kernel entry") Cc: stable@vger.kernel.org # 5.9+ which also have flush L1D patch backport Reported-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210111062408.287092-1-npiggin@gmail.com
2021-01-20	openrisc: io: Add missing __iomem annotation to iounmap()	Geert Uytterhoeven
	With C=1: drivers/soc/renesas/rmobile-sysc.c:330:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected void addr @@ got void [noderef] __iomem [assigned] base @@ drivers/soc/renesas/rmobile-sysc.c:330:33: sparse: expected void addr drivers/soc/renesas/rmobile-sysc.c:330:33: sparse: got void [noderef] __iomem [assigned] base Fix this by adding the missing __iomem annotation to iounmap(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Stafford Horne <shorne@gmail.com>
2021-01-19	Merge tag 'hyperv-fixes-signed-20210119' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fix from Wei Liu: "One patch from Dexuan to fix clockevent initialization" * tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Initialize clockevents after LAPIC is initialized
2021-01-19	x86: PM: Register syscore_ops for scale invariance	Rafael J. Wysocki
	On x86 scale invariace tends to be disabled during resume from suspend-to-RAM, because the MPERF or APERF MSR values are not as expected then due to updates taking place after the platform firmware has been invoked to complete the suspend transition. That, of course, is not desirable, especially if the schedutil scaling governor is in use, because the lack of scale invariance causes it to be less reliable. To counter that effect, modify init_freq_invariance() to register a syscore_ops object for scale invariance with the ->resume callback pointing to init_counter_refs() which will run on the CPU starting the resume transition (the other CPUs will be taken care of the "online" operations taking place later). Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Giovanni Gherdovich <ggherdovich@suse.cz> Link: https://lkml.kernel.org/r/1803209.Mvru99baaF@kreacher
2021-01-19	x86/entry: Remove put_ret_addr_in_rdi THUNK macro argument	Borislav Petkov
	That logic is unused since 320100a5ffe5 ("x86/entry: Remove the TRACE_IRQS cruft") Remove it. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/YAAszZJ2GcIYZmB5@hirez.programming.kicks-ass.net
2021-01-18	Merge tag 'fixes-2021-01-18' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull ia64 build fix from Mike Rapoport: "Fix an ia64 build failure caused by memory model changes" * tag 'fixes-2021-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: ia64: fix build failure caused by memory model changes
2021-01-18	kasan, arm64: fix pointer tags in KASAN reports	Andrey Konovalov
	As of the "arm64: expose FAR_EL1 tag bits in siginfo" patch, the address that is passed to report_tag_fault has pointer tags in the format of 0x0X, while KASAN uses 0xFX format (note the difference in the top 4 bits). Fix up the pointer tag for kernel pointers in do_tag_check_fault by setting them to the same value as bit 55. Explicitly use __untagged_addr() instead of untagged_addr(), as the latter doesn't affect TTBR1 addresses. Fixes: dceec3ff7807 ("arm64: expose FAR_EL1 tag bits in siginfo") Fixes: 4291e9ee6189 ("kasan, arm64: print report from tag fault handler") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://linux-review.googlesource.com/id/I9ced973866036d8679e8f4ae325de547eb969649 Link: https://lore.kernel.org/r/ff30b0afe6005fd046f9ac72bfb71822aedccd89.1610731872.git.andreyknvl@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-01-18	arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node	Johan Jonker
	A test with the command below gives this error: /arch/arm64/boot/dts/rockchip/rk3399-evb.dt.yaml: video-codec@ff660000: 'interrupt-names' does not match any of the regexes: 'pinctrl-[0-9]+' The rkvdec driver gets it irq with help of the platform_get_irq() function, so remove the interrupt-names property from the rk3399 vdec node. make ARCH=arm64 dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/ media/rockchip,vdec.yaml Signed-off-by: Johan Jonker <jbx6244@gmail.com> Link: https://lore.kernel.org/r/20210117181653.24886-1-jbx6244@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2021-01-18	arm64: dts: imx8mp: Correct the gpio ranges of gpio3	Jacky Bai
	On i.MX8MP, The GPIO3's secondary gpio-ranges's 'gpio controller offset' cell value should be 26, so correct it. Signed-off-by: Jacky Bai <ping.bai@nxp.com> Fixes: 6d9b8d20431f ("arm64: dts: freescale: Add i.MX8MP dtsi support") Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-01-18	ARM: dts: imx6qdl-sr-som: fix some cubox-i platforms	Russell King
	The PHY address bit 2 is configured by the LED pin. Attaching a LED to this pin is not sufficient to guarantee this configuration pin is correctly read. This leads to some platforms having their PHY at address 0 and others at address 4. If there is no phy-handle specified, the FEC driver will scan the PHY bus for a PHY and use that. Consequently, adding the DT configuration of the PHY and the phy properties to the FEC driver broke some boards. Fix this by removing the phy-handle property, and listing two PHY entries for both possible PHY addresses, so that the DT configuration for the PHY can be found by the PHY driver. Fixes: 86b08bd5b994 ("ARM: dts: imx6-sr-som: add ethernet PHY configuration") Reported-by: Christoph Mattheis <christoph.mattheis@arcor.de> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-01-18	ARM: imx: build suspend-imx6.S with arm instruction set	Max Krummenacher
	When the kernel is configured to use the Thumb-2 instruction set "suspend-to-memory" fails to resume. Observed on a Colibri iMX6ULL (i.MX 6ULL) and Apalis iMX6 (i.MX 6Q). It looks like the CPU resumes unconditionally in ARM instruction mode and then chokes on the presented Thumb-2 code it should execute. Fix this by using the arm instruction set for all code in suspend-imx6.S. Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com> Fixes: df595746fa69 ("ARM: imx: add suspend in ocram support for i.mx6q") Acked-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-01-18	ARM: dts: imx7d-flex-concentrator: fix pcf2127 reset	Bruno Thomsen
	RTC pcf2127 device driver has changed default behaviour of the watchdog feature in v5.11-rc1. Now you need to explicitly enable it with a device tree property, "reset-source", when used in the board design. Fixes: 71ac13457d9d ("rtc: pcf2127: only use watchdog when explicitly available") Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com> Cc: Bruno Thomsen <bth@kamstrup.com> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-01-18	x86/xen: fix 'nopvspin' build error	Randy Dunlap
	Fix build error in x86/xen/ when PARAVIRT_SPINLOCKS is not enabled. Fixes this build error: ../arch/x86/xen/smp_hvm.c: In function ‘xen_hvm_smp_init’: ../arch/x86/xen/smp_hvm.c:77:3: error: ‘nopvspin’ undeclared (first use in this function) nopvspin = true; Fixes: 3d7746bea925 ("x86/xen: Fix xen_hvm_smp_init() when vector callback not available") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20210115191123.27572-1-rdunlap@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-01-17	Merge tag 'powerpc-5.11-4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for a lack of alignment in our linker script, that can lead to crashes depending on configuration etc. One fix for the 32-bit VDSO after the C VDSO conversion. Thanks to Andreas Schwab, Ariel Marcovitch, and Christophe Leroy" * tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/vdso: Fix clock_gettime_fallback for vdso32 powerpc: Fix alignment bug within the init sections
2021-01-17	x86/hyperv: Initialize clockevents after LAPIC is initialized	Dexuan Cui
	With commit 4df4cb9e99f8, the Hyper-V direct-mode STIMER is actually initialized before LAPIC is initialized: see apic_intr_mode_init() x86_platform.apic_post_init() hyperv_init() hv_stimer_alloc() apic_bsp_setup() setup_local_APIC() setup_local_APIC() temporarily disables LAPIC, initializes it and re-eanble it. The direct-mode STIMER depends on LAPIC, and when it's registered, it can be programmed immediately and the timer can fire very soon: hv_stimer_init clockevents_config_and_register clockevents_register_device tick_check_new_device tick_setup_device tick_setup_periodic(), tick_setup_oneshot() clockevents_program_event When the timer fires in the hypervisor, if the LAPIC is in the disabled state, new versions of Hyper-V ignore the event and don't inject the timer interrupt into the VM, and hence the VM hangs when it boots. Note: when the VM starts/reboots, the LAPIC is pre-enabled by the firmware, so the window of LAPIC being temporarily disabled is pretty small, and the issue can only happen once out of 100~200 reboots for a 40-vCPU VM on one dev host, and on another host the issue doesn't reproduce after 2000 reboots. The issue is more noticeable for kdump/kexec, because the LAPIC is disabled by the first kernel, and stays disabled until the kdump/kexec kernel enables it. This is especially an issue to a Generation-2 VM (for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000 (rather than CONFIG_HZ=250) is used. Fix the issue by moving hv_stimer_alloc() to a later place where the LAPIC timer is initialized. Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210116223136.13892-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>