summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2024-09-25x86/pvh: Set phys_base when calling xen_prepare_pvh()Jason Andryuk
phys_base needs to be set for __pa() to work in xen_pvh_init() when finding the hypercall page. Set it before calling into xen_prepare_pvh(), which calls xen_pvh_init(). Clear it afterward to avoid __startup_64() adding to it and creating an incorrect value. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240823193630.2583107-4-jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25x86/pvh: Make PVH entrypoint PIC for x86-64Jason Andryuk
The PVH entrypoint is 32bit non-PIC code running the uncompressed vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000 (16MB). The kernel is loaded at that physical address inside the VM by the VMM software (Xen/QEMU). When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1 into the PVH container. There exist system firmwares (Coreboot/EDK2) with reserved memory at 16MB. This creates a conflict where the PVH kernel cannot be loaded at that address. Modify the PVH entrypoint to be position-indepedent to allow flexibility in load address. Only the 64bit entry path is converted. A 32bit kernel is not PIC, so calling into other parts of the kernel, like xen_prepare_pvh() and mk_pgtable_32(), don't work properly when relocated. This makes the code PIC, but the page tables need to be updated as well to handle running from the kernel high map. The UNWIND_HINT_END_OF_STACK is to silence: vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction after the lret into 64bit code. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240823193630.2583107-3-jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25xen/pvh: Setup gsi for passthrough deviceJiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured for it to be able to be mapped into a domU. When assigning a device to passthrough, proactively setup the gsi of the device during that process. Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-ID: <20240924061437.2636766-3-Jiqian.Chen@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12xen, pvh: fix unbootable VMs by inlining memset() in xen_prepare_pvh()Alexey Dobriyan
If this memset() is not inlined than PVH early boot code can call into KASAN-instrumented memset() which results in unbootable VMs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-3-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12x86/cpu: fix unbootable VMs by inlining memcmp() in hypervisor_cpuid_base()Alexey Dobriyan
If this memcmp() is not inlined then PVH early boot code can call into KASAN-instrumented memcmp() which results in unbootable VMs: pvh_start_xen xen_prepare_pvh xen_cpuid_base hypervisor_cpuid_base memcmp Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-2-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12xen, pvh: fix unbootable VMs (PVH + KASAN - AMD_MEM_ENCRYPT)Alexey Dobriyan
Uninstrument arch/x86/platform/pvh/enlighten.c: KASAN has not been setup _this_ early in the boot process. Steps to reproduce: make allnoconfig make sure CONFIG_AMD_MEM_ENCRYPT is disabled AMD_MEM_ENCRYPT independently uninstruments lib/string.o so PVH boot code calls into uninstrumented memset() and memcmp() which can make the bug disappear depending on the compiler. enable CONFIG_PVH enable CONFIG_KASAN enable serial console this is fun exercise if you never done it from nothing :^) make qemu-system-x86_64 \ -enable-kvm \ -cpu host \ -smp cpus=1 \ -m 4096 \ -serial stdio \ -kernel vmlinux \ -append 'console=ttyS0 ignore_loglevel' Messages on serial console will easily tell OK kernel from unbootable kernel. In bad case qemu hangs in an infinite loop stroboscoping "SeaBIOS" message. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-1-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12xen: tolerate ACPI NVS memory overlapping with Xen allocated memoryJuergen Gross
In order to minimize required special handling for running as Xen PV dom0, the memory layout is modified to match that of the host. This requires to have only RAM at the locations where Xen allocated memory is living. Unfortunately there seem to be some machines, where ACPI NVS is located at 64 MB, resulting in a conflict with the loaded kernel or the initial page tables built by Xen. Avoid this conflict by swapping the ACPI NVS area in the memory map with unused RAM. This is possible via modification of the dom0 P2M map. Accesses to the ACPI NVS area are done either for saving and restoring it across suspend operations (this will work the same way as before), or by ACPI code when NVS memory is referenced from other ACPI tables. The latter case is handled by a Xen specific indirection of acpi_os_ioremap(). While the E820 map can (and should) be modified right away, the P2M map can be updated only after memory allocation is working, as the P2M map might need to be extended. Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout") Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12xen: allow mapping ACPI data using a different physical addressJuergen Gross
When running as a Xen PV dom0 the system needs to map ACPI data of the host using host physical addresses, while those addresses can conflict with the guest physical addresses of the loaded linux kernel. The same problem might apply in case a PV guest is configured to use the host memory map. This conflict can be solved by mapping the ACPI data to a different guest physical address, but mapping the data via acpi_os_ioremap() must still be possible using the host physical address, as this address might be generated by AML when referencing some of the ACPI data. When configured to support running as a Xen PV domain, have an implementation of acpi_os_ioremap() being aware of the possibility to need above mentioned translation of a host physical address to the guest physical address. This modification requires to #include linux/acpi.h in some sources which need to include asm/acpi.h directly. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-10xen: add capability to remap non-RAM pages to different PFNsJuergen Gross
When running as a Xen PV dom0 it can happen that the kernel is being loaded to a guest physical address conflicting with the host memory map. In order to be able to resolve this conflict, add the capability to remap non-RAM areas to different guest PFNs. A function to use this remapping information for other purposes than doing the remap will be added when needed. As the number of conflicts should be rather low (currently only machines with max. 1 conflict are known), save the remap data in a small statically allocated array. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-10xen: move max_pfn in xen_memory_setup() out of function scopeJuergen Gross
Instead of having max_pfn as a local variable of xen_memory_setup(), make it a static variable in setup.c instead. This avoids having to pass it to subfunctions, which will be needed in more cases in future. Rename it to ini_nr_pages, as the value denotes the currently usable number of memory pages as passed from the hypervisor at boot time. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-10xen: move checks for e820 conflicts further upJuergen Gross
Move the checks for e820 memory map conflicts using the xen_chk_is_e820_usable() helper further up in order to prepare resolving some of the possible conflicts by doing some e820 map modifications, which must happen before evaluating the RAM layout. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-10xen: introduce generic helper checking for memory map conflictsJuergen Gross
When booting as a Xen PV dom0 the memory layout of the dom0 is modified to match that of the host, as this requires less changes in the kernel for supporting Xen. There are some cases, though, which are problematic, as it is the Xen hypervisor selecting the kernel's load address plus some other data, which might conflict with the host's memory map. These conflicts are detected at boot time and result in a boot error. In order to support handling at least some of these conflicts in future, introduce a generic helper function which will later gain the ability to adapt the memory layout when possible. Add the missing check for the xen_start_info area. Note that possible p2m map and initrd memory conflicts are handled already by copying the data to memory areas not conflicting with the memory map. The initial stack allocated by Xen doesn't need to be checked, as early boot code is switching to the statically allocated initial kernel stack. Initial page tables and the kernel itself will be handled later. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-10xen: use correct end address of kernel for conflict checkingJuergen Gross
When running as a Xen PV dom0 the kernel is loaded by the hypervisor using a different memory map than that of the host. In order to minimize the required changes in the kernel, the kernel adapts its memory map to that of the host. In order to do that it is checking for conflicts of its load address with the host memory map. Unfortunately the tested memory range does not include the .brk area, which might result in crashes or memory corruption when this area does conflict with the memory map of the host. Fix the test by using the _end label instead of __bss_stop. Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout") Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-06KVM: x86: don't fall through case statements without annotationsLinus Torvalds
clang warns on this because it has an unannotated fall-through between cases: arch/x86/kvm/x86.c:4819:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] and while we could annotate it as a fallthrough, the proper fix is to just add the break for this case, instead of falling through to the default case and the break there. gcc also has that warning, but it looks like gcc only warns for the cases where they fall through to "real code", rather than to just a break. Odd. Fixes: d30d9ee94cc0 ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-06Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix the arm64 usage of ftrace_graph_ret_addr() to pass the &state->graph_idx pointer instead of NULL, otherwise this function just returns early" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: stacktrace: fix the usage of ftrace_graph_ret_addr()
2024-09-06Merge tag 'riscv-for-linus-6.11-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A revert for the mmap() change that ties the allocation range to the hint adress, as what we tried to do ended up regressing on other userspace workloads. - A fix to avoid a kernel memory leak when emulating misaligned accesses from userspace. - A Kconfig fix for toolchain vector detection, which now correctly detects vector support on toolchains where the V extension depends on the M extension. - A fix to avoid failing the linear mapping bootmem bounds check on NOMMU systems. - A fix for early alternatives on relocatable kernels. * tag 'riscv-for-linus-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix RISCV_ALTERNATIVE_EARLY riscv: Do not restrict memory size because of linear mapping on nommu riscv: Fix toolchain vector detection riscv: misaligned: Restrict user access to kernel memory riscv: mm: Do not restrict mmap address based on hint riscv: selftests: Remove mmap hint address checks Revert "RISC-V: mm: Document mmap changes"
2024-09-06Merge tag 'powerpc-6.11-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a deadlock in the powerpc qspinlock MCS queue logic - Fix the return type of pgd_val() to not truncate 64-bit PTEs on 85xx - Allow the check for dynamic relocations in the VDSO to work correctly - Make mmu_pte_psize static to fix a build error Thanks to Christophe Leroy, Nysal Jan K.A., Nicholas Piggin, Geetika Moolchandani, Jijo Varghese, and Vaishnavi Bhat. * tag 'powerpc-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/qspinlock: Fix deadlock in MCS queue powerpc/mm: Fix return type of pgd_val() powerpc/vdso: Don't discard rela sections powerpc/64e: Define mmu_pte_psize static
2024-09-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull x86 kvm fixes from Paolo Bonzini: "Many small fixes that accumulated while I was on vacation... - Fixup missed comments from the REMOVED_SPTE => FROZEN_SPTE rename - Ensure a root is successfully loaded when pre-faulting SPTEs - Grab kvm->srcu when handling KVM_SET_VCPU_EVENTS to guard against accessing memslots if toggling SMM happens to force a VM-Exit - Emulate MSR_{FS,GS}_BASE on SVM even though interception is always disabled, so that KVM does the right thing if KVM's emulator encounters {RD,WR}MSR - Explicitly clear BUS_LOCK_DETECT from KVM's caps on AMD, as KVM doesn't yet virtualize BUS_LOCK_DETECT on AMD - Cleanup the help message for CONFIG_KVM_AMD_SEV, and call out that KVM now supports SEV-SNP too - Specialize return value of KVM_CHECK_EXTENSION(KVM_CAP_READONLY_MEM), based on VM type - Remove unnecessary dependency on CONFIG_HIGH_RES_TIMERS - Note an RCU quiescent state on guest exit. This avoids a call to rcu_core() if there was a grace period request while guest was running" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Remove HIGH_RES_TIMERS dependency kvm: Note an RCU quiescent state on guest exit KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM KVM: SEV: Update KVM_AMD_SEV Kconfig entry and mention SEV-SNP KVM: SVM: Don't advertise Bus Lock Detect to guest if SVM support is missing KVM: SVM: fix emulation of msr reads/writes of MSR_FS_BASE and MSR_GS_BASE KVM: x86: Acquire kvm->srcu when handling KVM_SET_VCPU_EVENTS KVM: x86/mmu: Check that root is valid/loaded when pre-faulting SPTEs KVM: x86/mmu: Fixup comments missed by the REMOVED_SPTE=>FROZEN_SPTE rename
2024-09-05KVM: Remove HIGH_RES_TIMERS dependencySteven Rostedt
Commit 92b5265d38f6a ("KVM: Depend on HIGH_RES_TIMERS") added a dependency to high resolution timers with the comment: KVM lapic timer and tsc deadline timer based on hrtimer, setting a leftmost node to rb tree and then do hrtimer reprogram. If hrtimer not configured as high resolution, hrtimer_enqueue_reprogram do nothing and then make kvm lapic timer and tsc deadline timer fail. That was back in 2012, where hrtimer_start_range_ns() would do the reprogramming with hrtimer_enqueue_reprogram(). But as that was a nop with high resolution timers disabled, this did not work. But a lot has changed in the last 12 years. For example, commit 49a2a07514a3a ("hrtimer: Kick lowres dynticks targets on timer enqueue") modifies __hrtimer_start_range_ns() to work with low res timers. There's been lots of other changes that make low res work. ChromeOS has tested this before as well, and it hasn't seen any issues with running KVM with high res timers disabled. There could be problems, especially at low HZ, for guests that do not support kvmclock and rely on precise delivery of periodic timers to keep their clock running. This can be the APIC timer (provided by the kernel), the RTC (provided by userspace), or the i8254 (choice of kernel/userspace). These guests are few and far between these days, and in the case of the APIC timer + Intel hosts we can use the preemption timer (which is TSC-based and has better latency _and_ accuracy). In KVM, only x86 is requiring CONFIG_HIGH_RES_TIMERS; perhaps a "depends on HIGH_RES_TIMERS || EXPERT" could be added to virt/kvm, or a pr_warn could be added to kvm_init if HIGH_RES_TIMERS are not enabled. But in general, it seems that there must be other code in the kernel (maybe sound/?) that is relying on having high-enough HZ or hrtimers but that's not documented anywhere. Whenever you disable it you probably need to know what you're doing and what your workload is; so the dependency is not particularly interesting, and we can just remove it. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Message-ID: <20240821095127.45d17b19@gandalf.local.home> [Added the last two paragraphs to the commit message. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-05arm64: stacktrace: fix the usage of ftrace_graph_ret_addr()Puranjay Mohan
ftrace_graph_ret_addr() takes an 'idx' integer pointer that is used to optimize the stack unwinding process. arm64 currently passes `NULL` for this parameter which stops it from utilizing these optimizations. Further, the current code for ftrace_graph_ret_addr() will just return the passed in return address if it is NULL which will break this usage. Pass a valid integer pointer to ftrace_graph_ret_addr() similar to x86_64's stack unwinder. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Fixes: 29c1c24a2707 ("function_graph: Fix up ftrace_graph_ret_addr()") Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240618162342.28275-1-puranjay@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-09-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linuxLinus Torvalds
Pull ARM fix from Russell King: - Fix a build issue with older binutils with LD dead code elimination disabled * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION
2024-09-04Merge tag 'parisc-for-6.11-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fix from Helge Deller: - Fix boot issue where boot memory is marked read-only too early * tag 'parisc-for-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Delay write-protection until mark_rodata_ro() call
2024-09-04ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATIONYuntao Liu
There is a build issue with LD segmentation fault, while CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled, as bellow. scripts/link-vmlinux.sh: line 49: 3796 Segmentation fault (core dumped) ${ld} ${ldflags} -o ${output} ${wl}--whole-archive ${objs} ${wl}--no-whole-archive ${wl}--start-group ${libs} ${wl}--end-group ${kallsymso} ${btf_vmlinux_bin_o} ${ldlibs} The error occurs in older versions of the GNU ld with version earlier than 2.36. It makes most sense to have a minimum LD version as a dependency for HAVE_LD_DEAD_CODE_DATA_ELIMINATION and eliminate the impact of ".reloc .text, R_ARM_NONE, ." when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled. Fixes: ed0f94102251 ("ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION") Reported-by: Harith George <mail2hgg@gmail.com> Tested-by: Harith George <mail2hgg@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com> Link: https://lore.kernel.org/all/14e9aefb-88d1-4eee-8288-ef15d4a9b059@gmail.com/ Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-09-03riscv: Fix RISCV_ALTERNATIVE_EARLYAlexandre Ghiti
RISCV_ALTERNATIVE_EARLY will issue sbi_ecall() very early in the boot process, before the first memory mapping is setup so we can't have any instrumentation happening here. In addition, when the kernel is relocatable, we must also not issue any relocation this early since they would have been patched virtually only. So, instead of disabling instrumentation for the whole kernel/sbi.c file and compiling it with -fno-pie, simply move __sbi_ecall() and __sbi_base_ecall() into their own file where this is fixed. Reported-by: Conor Dooley <conor.dooley@microchip.com> Closes: https://lore.kernel.org/linux-riscv/20240813-pony-truck-3e7a83e9759e@spud/ Reported-by: syzbot+cfbcb82adf6d7279fd35@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-riscv/00000000000065062c061fcec37b@google.com/ Fixes: 1745cfafebdf ("riscv: don't use global static vars to store alternative data") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240829165048.49756-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-03riscv: Do not restrict memory size because of linear mapping on nommuAlexandre Ghiti
It makes no sense to restrict physical memory size because of linear mapping size constraints when there is no linear mapping, so only do that when mmu is enabled. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/linux-riscv/CAMuHMdW0bnJt5GMRtOZGkTiM7GK4UaLJCDMF_Ouq++fnDKi3_A@mail.gmail.com/ Fixes: 3b6564427aea ("riscv: Fix linear mapping checks for non-contiguous memory regions") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240827065230.145021-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-03riscv: Fix toolchain vector detectionAnton Blanchard
A recent change to gcc flags rv64iv as no longer valid: cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension and as a result vector support is disabled. Fix this by adding m to our toolchain vector detection code. Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Fixes: fa8e7cce55da ("riscv: Enable Vector code to be built") Link: https://lore.kernel.org/r/20240819001131.1738806-1-antonb@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-03parisc: Delay write-protection until mark_rodata_ro() callHelge Deller
Do not write-protect the kernel read-only and __ro_after_init sections earlier than before mark_rodata_ro() is called. This fixes a boot issue on parisc which is triggered by commit 91a1d97ef482 ("jump_label,module: Don't alloc static_key_mod for __ro_after_init keys"). That commit may modify static key contents in the __ro_after_init section at bootup, so this section needs to be writable at least until mark_rodata_ro() is called. Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk> Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Link: https://lore.kernel.org/linux-parisc/096cad5aada514255cd7b0b9dbafc768@matoro.tk/#r Fixes: 91a1d97ef482 ("jump_label,module: Don't alloc static_key_mod for __ro_after_init keys") Cc: stable@vger.kernel.org # v6.10+
2024-09-02KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VMTom Dohrmann
Until recently, KVM_CAP_READONLY_MEM was unconditionally supported on x86, but this is no longer the case for SEV-ES and SEV-SNP VMs. When KVM_CHECK_EXTENSION is invoked on a VM, only advertise KVM_CAP_READONLY_MEM when it's actually supported. Fixes: 66155de93bcf ("KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)") Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Michael Roth <michael.roth@amd.com> Signed-off-by: Tom Dohrmann <erbse.13@gmx.de> Message-ID: <20240902144219.3716974-1-erbse.13@gmx.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-01Merge tag 'x86-urgent-2024-09-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even when the state is X2APIC_ON_LOCKED, which prevents the kernel to disable it thereby creating inconsistent state. Reorder the logic so it actually works correctly - The XSTATE logic for handling LBR is incorrect as it assumes that XSAVES supports LBR when the CPU supports LBR. In fact both conditions need to be true. Otherwise the enablement of LBR in the IA32_XSS MSR fails and subsequently the machine crashes on the next XRSTORS operation because IA32_XSS is not initialized. Cache the XSTATE support bit during init and make the related functions use this cached information and the LBR CPU feature bit to cure this. - Cure a long standing bug in KASLR KASLR uses the full address space between PAGE_OFFSET and vaddr_end to randomize the starting points of the direct map, vmalloc and vmemmap regions. It thereby limits the size of the direct map by using the installed memory size plus an extra configurable margin for hot-plug memory. This limitation is done to gain more randomization space because otherwise only the holes between the direct map, vmalloc, vmemmap and vaddr_end would be usable for randomizing. The limited direct map size is not exposed to the rest of the kernel, so the memory hot-plug and resource management related code paths still operate under the assumption that the available address space can be determined with MAX_PHYSMEM_BITS. request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1 downwards. That means the first allocation happens past the end of the direct map and if unlucky this address is in the vmalloc space, which causes high_memory to become greater than VMALLOC_START and consequently causes iounmap() to fail for valid ioremap addresses. Cure this by exposing the end of the direct map via PHYSMEM_END and use that for the memory hot-plug and resource management related places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END maps to a variable which is initialized by the KASLR initialization and otherwise it is based on MAX_PHYSMEM_BITS as before. - Prevent a data leak in mmio_read(). The TDVMCALL exposes the value of an initialized variabled on the stack to the VMM. The variable is only required as output value, so it does not have to exposed to the VMM in the first place. - Prevent an array overrun in the resource control code on systems with Sub-NUMA Clustering enabled because the code failed to adjust the index by the number of SNC nodes per L3 cache. * tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix arch_mbm_* array overrun on SNC x86/tdx: Fix data leak in mmio_read() x86/kaslr: Expose and use the end of the physical memory address space x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported x86/apic: Make x2apic_disable() work correctly
2024-09-01Merge tag 'perf-urgent-2024-09-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix for x86 performance monitoring. Haswell PMUs suffer from several errata and require a limit the minimal period for counter events, otherwise they suffer from endless loops in the PMU interrupt" * tag 'perf-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Limit the period on Haswell
2024-08-31riscv: misaligned: Restrict user access to kernel memorySamuel Holland
raw_copy_{to,from}_user() do not call access_ok(), so this code allowed userspace to access any virtual memory address. Cc: stable@vger.kernel.org Fixes: 7c83232161f6 ("riscv: add support for misaligned trap handling in S-mode") Fixes: 441381506ba7 ("riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240815005714.1163136-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-01Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull misc fixes from Guenter Roeck. These are fixes for regressions that Guenther has been reporting, and the maintainers haven't picked up and sent in. With rc6 fairly imminent, I'm taking them directly from Guenter. * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: apparmor: fix policy_unpack_test on big endian systems Revert "MIPS: csrc-r4k: Apply verification clocksource flags" microblaze: don't treat zero reserved memory regions as error
2024-09-01Merge tag 'arm-fixes-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There is a fairly large number of bug fixes for Qualcomm platforms, most of them addressing issues with the devicetree files for the newly added Snapdragon X1 based laptops to make them more reliable. The Qualcomm driver changes address a few build-time issues as well as runtime problems in the tzmem and scm firmware, the USB Type-C driver, and the cmd-db and pmic_glink soc drivers. The NXP i.MX usually gets a bunch of devicetree fixes that is proportional to the number of supported machines. This includes both warning fixes and correctness for the 64-bit i.MX9, i.MX8 and layerscape platforms, as well as a single fix for a 32-bit i.MX6 based board. The other changes are the usual minor changes, including an update to the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs (risc-v)" * tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits) firmware: microchip: fix incorrect error report of programming:timeout on success soc: qcom: pd-mapper: Fix singleton refcount firmware: qcom: tzmem: disable sdm670 platform soc: qcom: pmic_glink: Actually communicate when remote goes down usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Fix race during initialization firmware: qcom: qseecom: remove unused functions firmware: qcom: tzmem: fix virtual-to-physical address conversion firmware: qcom: scm: Mark get_wq_ctx() as atomic call arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt arm64: dts: qcom: disable GPU on x1e80100 by default arm64: dts: imx8mm-phygate: fix typo pinctrcl-0 arm64: dts: imx95: correct L3Cache cache-sets arm64: dts: imx95: correct a55 power-domains arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design arm64: dts: imx93: update default value for snps,clk-csr arm64: dts: freescale: tqma9352: Fix watchdog reset arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962 ...
2024-08-29Merge patch series "riscv: mm: Do not restrict mmap address based on hint"Palmer Dabbelt
Charlie Jenkins <charlie@rivosinc.com> says: There have been a couple of reports that using the hint address to restrict the address returned by mmap hint address has caused issues in applications. A different solution for restricting addresses returned by mmap is necessary to avoid breakages. [Palmer: This also just wasn't doing the right thing in the first place, as it didn't handle the sv39 cases we were trying to deal with.] * b4-shazam-merge: riscv: mm: Do not restrict mmap address based on hint riscv: selftests: Remove mmap hint address checks Revert "RISC-V: mm: Document mmap changes" Link: https://lore.kernel.org/r/20240826-riscv_mmap-v1-0-cd8962afe47f@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-29riscv: mm: Do not restrict mmap address based on hintCharlie Jenkins
The hint address should not forcefully restrict the addresses returned by mmap as this causes mmap to report ENOMEM when there is memory still available. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Fixes: b5b4287accd7 ("riscv: mm: Use hint address in mmap if available") Fixes: add2cc6b6515 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57") Closes: https://lore.kernel.org/linux-kernel/ZbxTNjQPFKBatMq+@ghost/T/#mccb1890466bf5a488c9ce7441e57e42271895765 Link: https://lore.kernel.org/r/20240826-riscv_mmap-v1-3-cd8962afe47f@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-29powerpc/qspinlock: Fix deadlock in MCS queueNysal Jan K.A.
If an interrupt occurs in queued_spin_lock_slowpath() after we increment qnodesp->count and before node->lock is initialized, another CPU might see stale lock values in get_tail_qnode(). If the stale lock value happens to match the lock on that CPU, then we write to the "next" pointer of the wrong qnode. This causes a deadlock as the former CPU, once it becomes the head of the MCS queue, will spin indefinitely until it's "next" pointer is set by its successor in the queue. Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results in occasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive \ --maximize --oomable --verify --syslog \ --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ec The following code flow illustrates how the deadlock occurs. For the sake of brevity, assume that both locks (A and B) are contended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before "nodes[0].lock = B") | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until "nodes[1].next != NULL") prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) | ▼ WRITE_ONCE(prev->next, node) | ▼ Spin indefinitely (until nodes[0].locked == 1) Thanks to Saket Kumar Bhaskar for help with recreating the issue Fixes: 84990b169557 ("powerpc/qspinlock: add mcs queueing for contended waiters") Cc: stable@vger.kernel.org # v6.2+ Reported-by: Geetika Moolchandani <geetika@linux.ibm.com> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Reported-by: Jijo Varghese <vargjijo@in.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240829022830.1164355-1-nysal@linux.ibm.com
2024-08-28Merge tag 'qcom-arm64-fixes-for-6.11' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm Arm64 DeviceTree fixes for v6.11 On X1E the GPU node is disabled by default, to be enabled in the individual devices once the developers install the required firmware. The generic EDP panel driver used on the X1E CRD is replaced with the Samsung ATNA45AF01 driver, in order to ensure backlight is brought back up after being turned off. The pin configuration for PCIe-related pins are corrected across all the X1E targets. The PCIe controllers gain a minimum OPP vote, and PCIe domain numbers are corrected. WiFi calibration variant information is added to the Lenovo Yoga Slim 7x, to pick the right data from the firmware packages. The incorrect Adreno SMMU global interrupt is corrected. For IPQ5332, the IRQ triggers for the USB controller are corrected. * tag 'qcom-arm64-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (23 commits) arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt arm64: dts: qcom: disable GPU on x1e80100 by default arm64: dts: qcom: x1e80100-crd: Fix backlight arm64: dts: qcom: x1e80100-yoga-slim7x: fix missing PCIe4 gpios arm64: dts: qcom: x1e80100-yoga-slim7x: disable PCIe6a perst pull down arm64: dts: qcom: x1e80100-yoga-slim7x: fix up PCIe6a pinctrl node arm64: dts: qcom: x1e80100-yoga-slim7x: fix PCIe4 PHY supply arm64: dts: qcom: x1e80100-vivobook-s15: fix missing PCIe4 gpios arm64: dts: qcom: x1e80100-vivobook-s15: disable PCIe6a perst pull down arm64: dts: qcom: x1e80100-vivobook-s15: fix up PCIe6a pinctrl node arm64: dts: qcom: x1e80100-vivobook-s15: fix PCIe4 PHY supply arm64: dts: qcom: x1e80100-qcp: fix missing PCIe4 gpios arm64: dts: qcom: x1e80100-qcp: disable PCIe6a perst pull down arm64: dts: qcom: x1e80100-qcp: fix up PCIe6a pinctrl node arm64: dts: qcom: x1e80100-qcp: fix PCIe4 PHY supply arm64: dts: qcom: x1e80100-crd: fix missing PCIe4 gpios arm64: dts: qcom: x1e80100-crd: disable PCIe6a perst pull down arm64: dts: qcom: x1e80100-crd: fix up PCIe6a pinctrl node arm64: dts: qcom: x1e80100: add missing PCIe minimum OPP arm64: dts: qcom: x1e80100: fix PCIe domain numbers ... Link: https://lore.kernel.org/r/20240826152426.1648383-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-28Merge tag 'qcom-arm64-defconfig-fixes-for-6.11' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm Arm64 defconfig fix for 6.11 Enable the Samsung ATNA33XC20 display panel driver, as we switched from the generic EDP panel for some of the X1E devices in v6.11. * tag 'qcom-arm64-defconfig-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: defconfig: Add CONFIG_DRM_PANEL_SAMSUNG_ATNA33XC20 Link: https://lore.kernel.org/r/20240826145736.1646729-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-28Merge tag 'imx-fixes-6.11' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 6.11: - One imx8mp-beacon-kit change from Adam Ford to fix the broken WM8962 audio support - One pinctrl property typo fix for imx8mm-phygate - One layerscape fix from Krzysztof Kozlowski to get thermal nodes correct name length - A couple of imx93-tqma9352 fixes from Markus Niebel, one on CMA alloc-ranges and the other on SD-Card cd-gpios typo - One change from Michal Vokáč to fix imx6dl-yapp43 LED current to match the HW design - A couple of imx95 fixes from Peng Fan, one to correct a55 power domains and the other to correct L3Cache cache-sets - One tqma9352 watchdog reset fix from Sascha Hauer - One imx93 change from Shenwei Wang to fix the default value for STMMAC EQOS snps,clk-csr * tag 'imx-fixes-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: arm64: dts: imx8mm-phygate: fix typo pinctrcl-0 arm64: dts: imx95: correct L3Cache cache-sets arm64: dts: imx95: correct a55 power-domains arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design arm64: dts: imx93: update default value for snps,clk-csr arm64: dts: freescale: tqma9352: Fix watchdog reset arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962 arm64: dts: layerscape: fix thermal node names length Link: https://lore.kernel.org/r/ZrtsTO1+jXhJ6GSM@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-28Merge tag 'omap-for-v6.11/fixes-signed' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into arm/fixes OMAP fixes for v6.11-rc - omap3-n900: fix accelerometer orientation * tag 'omap-for-v6.11/fixes-signed' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap: ARM: dts: omap3-n900: correct the accelerometer orientation Link: https://lore.kernel.org/r/7h4j7eyhyh.fsf@baylibre.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-28KVM: SEV: Update KVM_AMD_SEV Kconfig entry and mention SEV-SNPVitaly Kuznetsov
SEV-SNP support is present since commit 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support") but Kconfig entry wasn't updated and still mentions SEV and SEV-ES only. Add SEV-SNP there and, while on it, expand 'SEV' in the description as 'Encrypted VMs' is not what 'SEV' stands for. No functional change. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20240828122111.160273-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-28x86/resctrl: Fix arch_mbm_* array overrun on SNCPeter Newman
When using resctrl on systems with Sub-NUMA Clustering enabled, monitoring groups may be allocated RMID values which would overrun the arch_mbm_{local,total} arrays. This is due to inconsistencies in whether the SNC-adjusted num_rmid value or the unadjusted value in resctrl_arch_system_num_rmid_idx() is used. The num_rmid value for the L3 resource is currently: resctrl_arch_system_num_rmid_idx() / snc_nodes_per_l3_cache As a simple fix, make resctrl_arch_system_num_rmid_idx() return the SNC-adjusted, L3 num_rmid value on x86. Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache") Signed-off-by: Peter Newman <peternewman@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20240822190212.1848788-1-peternewman@google.com
2024-08-26x86/tdx: Fix data leak in mmio_read()Kirill A. Shutemov
The mmio_read() function makes a TDVMCALL to retrieve MMIO data for an address from the VMM. Sean noticed that mmio_read() unintentionally exposes the value of an initialized variable (val) on the stack to the VMM. This variable is only needed as an output value. It did not need to be passed to the VMM in the first place. Do not send the original value of *val to the VMM. [ dhansen: clarify what 'val' is used for. ] Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240826125304.1566719-1-kirill.shutemov%40linux.intel.com
2024-08-26LoongArch: KVM: Invalidate guest steal time address on vCPU resetBibo Mao
If ParaVirt steal time feature is enabled, there is a percpu gpa address passed from guest vCPU and host modifies guest memory space with this gpa address. When vCPU is reset normally, it will notify host and invalidate gpa address. However if VM is crashed and VMM reboots VM forcely, the vCPU reboot notification callback will not be called in VM. Host needs invalidate the gpa address, else host will modify guest memory during VM reboots. Here it is invalidated from the vCPU KVM_REG_LOONGARCH_VCPU_RESET ioctl interface. Also funciton kvm_reset_timer() is removed at vCPU reset stage, since SW emulated timer is only used in vCPU block state. When a vCPU is removed from the block waiting queue, kvm_restore_timer() is called and SW timer is cancelled. And the timer register is also cleared at VMM when a vCPU is reset. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26LoongArch: Add ifdefs to fix LSX and LASX related warningsTiezhu Yang
There exist some warnings when building kernel if CONFIG_CPU_HAS_LBT is set but CONFIG_CPU_HAS_LSX and CONFIG_CPU_HAS_LASX are not set. In this case, there are no definitions of _restore_lsx & _restore_lasx and there are also no definitions of kvm_restore_lsx & kvm_restore_lasx in fpu.S and switch.S respectively, just add some ifdefs to fix these warnings. AS arch/loongarch/kernel/fpu.o arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 AS [M] arch/loongarch/kvm/switch.o arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 MODPOST Module.symvers ERROR: modpost: "kvm_restore_lsx" [arch/loongarch/kvm/kvm.ko] undefined! ERROR: modpost: "kvm_restore_lasx" [arch/loongarch/kvm/kvm.ko] undefined! Cc: stable@vger.kernel.org # 6.9+ Fixes: cb8a2ef0848c ("LoongArch: Add ORC stack unwinder support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408120955.qls5oNQY-lkp@intel.com/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBEHuacai Chen
Currently we call irq_set_noprobe() in a loop for all IRQs, but indeed it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only legacy interrupts have been allocated. Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hwirq.h and the core will automatically set the flag for all interrupts. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
2024-08-25Revert "MIPS: csrc-r4k: Apply verification clocksource flags"Guenter Roeck
This reverts commit 7190401fc56fb5f02ee3d04476778ab000bbaf32. Verifying the clock source sometimes deems the MIPS clock to be unstable, at least in qemu. clocksource: timekeeping watchdog on CPU0: Marking clocksource 'MIPS' as unstable because the skew is too large: clocksource: 'jiffies' wd_nsec: 500000000 wd_now: ffff8bde wd_last: ffff8bac mask: ffffffff clocksource: 'MIPS' cs_nsec: 940634468 cs_now: 310181c4 cs_last: 28090a09 mask: ffffffff clocksource: Clocksource 'MIPS' skewed 440634468 ns (440 ms) over watchdog 'jiffies' interval of 500000000 ns (500 ms) clocksource: 'MIPS' is current clocksource. If this happens, network interfaces fail to come online. Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-08-25microblaze: don't treat zero reserved memory regions as errorMike Rapoport
Before commit 721f4a6526da ("mm/memblock: remove empty dummy entry") the check for non-zero of memblock.reserved.cnt in mmu_init() would always be true either because memblock.reserved.cnt is initialized to 1 or because there were memory reservations earlier. The removal of dummy empty entry in memblock caused this check to fail because now memblock.reserved.cnt is initialized to 0. Remove the check for non-zero of memblock.reserved.cnt because it's perfectly fine to have an empty memblock.reserved array that early in boot. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mike Rapoport <rppt@kernel.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240729053327.4091459-1-rppt@kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-08-25perf/x86/intel: Limit the period on HaswellKan Liang
Running the ltp test cve-2015-3290 concurrently reports the following warnings. perfevents: irq loop stuck! WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174 intel_pmu_handle_irq+0x285/0x370 Call Trace: <NMI> ? __warn+0xa4/0x220 ? intel_pmu_handle_irq+0x285/0x370 ? __report_bug+0x123/0x130 ? intel_pmu_handle_irq+0x285/0x370 ? __report_bug+0x123/0x130 ? intel_pmu_handle_irq+0x285/0x370 ? report_bug+0x3e/0xa0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? irq_work_claim+0x1e/0x40 ? intel_pmu_handle_irq+0x285/0x370 perf_event_nmi_handler+0x3d/0x60 nmi_handle+0x104/0x330 Thanks to Thomas Gleixner's analysis, the issue is caused by the low initial period (1) of the frequency estimation algorithm, which triggers the defects of the HW, specifically erratum HSW11 and HSW143. (For the details, please refer https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/) The HSW11 requires a period larger than 100 for the INST_RETIRED.ALL event, but the initial period in the freq mode is 1. The erratum is the same as the BDM11, which has been supported in the kernel. A minimum period of 128 is enforced as well on HSW. HSW143 is regarding that the fixed counter 1 may overcount 32 with the Hyper-Threading is enabled. However, based on the test, the hardware has more issues than it tells. Besides the fixed counter 1, the message 'interrupt took too long' can be observed on any counter which was armed with a period < 32 and two events expired in the same NMI. A minimum period of 32 is enforced for the rest of the events. The recommended workaround code of the HSW143 is not implemented. Because it only addresses the issue for the fixed counter. It brings extra overhead through extra MSR writing. No related overcounting issue has been reported so far. Fixes: 3a632cb229bf ("perf/x86/intel: Add simple Haswell PMU support") Reported-by: Li Huafei <lihuafei1@huawei.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/
2024-08-25LoongArch: Remove the unused dma-direct.hMiao Wang
dma-direct.h is introduced in commit d4b6f1562a3c3284 ("LoongArch: Add Non-Uniform Memory Access (NUMA) support"). In commit c78c43fe7d42524c ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA"), ARCH_HAS_PHYS_TO_DMA was deselected and the coresponding phys_to_dma()/ dma_to_phys() functions were removed. However, the unused dma-direct.h was left behind, which is removed by this patch. Cc: <stable@vger.kernel.org> Fixes: c78c43fe7d42 ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA") Signed-off-by: Miao Wang <shankerwangmiao@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>