summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2021-03-21Merge tag 'perf-urgent-2021-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Boundary condition fixes for bugs unearthed by the perf fuzzer" * tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT perf/x86/intel: Fix a crash caused by zero PEBS status
2021-03-21Merge tag 'x86_urgent_for_v5.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "The freshest pile of shiny x86 fixes for 5.12: - Add the arch-specific mapping between physical and logical CPUs to fix devicetree-node lookups - Restore the IRQ2 ignore logic - Fix get_nr_restart_syscall() to return the correct restart syscall number. Split in a 4-patches set to avoid kABI breakage when backporting to dead kernels" * tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/of: Fix CPU devicetree-node lookups x86/ioapic: Ignore IRQ2 again x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall() x86: Move TS_COMPAT back to asm/thread_info.h kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
2021-03-21Merge tag 'powerpc-5.12-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a possible stack corruption and subsequent DLPAR failure in the rpadlpar_io PCI hotplug driver - Two build fixes for uncommon configurations Thanks to Christophe Leroy and Tyrel Datwyler. * tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: PCI: rpadlpar: Fix potential drc_name corruption in store functions powerpc: Force inlining of cpu_has_feature() to avoid build failure powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
2021-03-20Merge tag 'riscv-for-linus-5.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "A handful of fixes for 5.12: - fix the SBI remote fence numbers for hypervisor fences, which had been transcribed in the wrong order in Linux. These fences are only used with the KVM patches applied. - fix a whole host of build warnings, these should have no functional change. - fix init_resources() to prevent an off-by-one error from causing an out-of-bounds array reference. This was manifesting during boot on vexriscv. - ensure the KASAN mappings are visible before proceeding to use them" * tag 'riscv-for-linus-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Correct SPARSEMEM configuration RISC-V: kasan: Declare kasan_shallow_populate() static riscv: Ensure page table writes are flushed when initializing KASAN vmalloc RISC-V: Fix out-of-bounds accesses in init_resources() riscv: Fix compilation error with Canaan SoC ftrace: Fix spelling mistake "disabed" -> "disabled" riscv: fix bugon.cocci warnings riscv: process: Fix no prototype for arch_dup_task_struct riscv: ftrace: Use ftrace_get_regs helper riscv: process: Fix no prototype for show_regs riscv: syscall_table: Reduce W=1 compilation warnings noise riscv: time: Fix no prototype for time_init riscv: ptrace: Fix no prototype warnings riscv: sbi: Fix comment of __sbi_set_timer_v01 riscv: irq: Fix no prototype warning riscv: traps: Fix no prototype warnings RISC-V: correct enum sbi_ext_rfence_fid
2021-03-19bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIGStanislav Fomichev
__bpf_arch_text_poke does rewrite only for atomic nop5, emit_nops(xxx, 5) emits non-atomic one which breaks fentry/fexit with k8 atomics: P6_NOP5 == P6_NOP5_ATOMIC (0f1f440000 == 0f1f440000) K8_NOP5 != K8_NOP5_ATOMIC (6666906690 != 6666666690) Can be reproduced by doing "ideal_nops = k8_nops" in "arch_init_ideal_nops() and running fexit_bpf2bpf selftest. Fixes: e21aa341785c ("bpf: Fix fexit trampoline.") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210320000001.915366-1-sdf@google.com
2021-03-19x86/apic/of: Fix CPU devicetree-node lookupsJohan Hovold
Architectures that describe the CPU topology in devicetree and do not have an identity mapping between physical and logical CPU ids must override the default implementation of arch_match_cpu_phys_id(). Failing to do so breaks CPU devicetree-node lookups using of_get_cpu_node() and of_cpu_device_node_get() which several drivers rely on. It also causes the CPU struct devices exported through sysfs to point to the wrong devicetree nodes. On x86, CPUs are described in devicetree using their APIC ids and those do not generally coincide with the logical ids, even if CPU0 typically uses APIC id 0. Add the missing implementation of arch_match_cpu_phys_id() so that CPU-node lookups work also with SMP. Apart from fixing the broken sysfs devicetree-node links this likely does not affect current users of mainline kernels on x86. Fixes: 4e07db9c8db8 ("x86/devicetree: Use CPU description from Device Tree") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210312092033.26317-1-johan@kernel.org
2021-03-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Fixes for kvm on x86: - new selftests - fixes for migration with HyperV re-enlightenment enabled - fix RCU/SRCU usage - fixes for local_irq_restore misuse false positive" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: documentation/kvm: additional explanations on KVM_SET_BOOT_CPU_ID x86/kvm: Fix broken irq restoration in kvm_wait KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish selftests: kvm: add set_boot_cpu_id test selftests: kvm: add _vm_ioctl selftests: kvm: add get_msr_index_features selftests: kvm: Add basic Hyper-V clocksources tests KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment KVM: x86: hyper-v: Track Hyper-V TSC page status KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS KVM: x86/mmu: Store the address space ID in the TDP iterator KVM: x86/mmu: Factor out tdp_iter_return_to_root KVM: x86/mmu: Fix RCU usage when atomically zapping SPTEs KVM: x86/mmu: Fix RCU usage in handle_removed_tdp_mmu_page
2021-03-19arm64: mm: use XN table mapping attributes for user/kernel mappingsArd Biesheuvel
As the kernel and user space page tables are strictly mutually exclusive when it comes to executable permissions, we can set the UXN table attribute on all table entries that are created while creating kernel mappings in the swapper page tables, and the PXN table attribute on all table entries that are created while creating user space mappings in user space page tables. While at it, get rid of a redundant comment. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210310104942.174584-4-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-03-19arm64: mm: use XN table mapping attributes for the linear regionArd Biesheuvel
The way the arm64 kernel virtual address space is constructed guarantees that swapper PGD entries are never shared between the linear region on the one hand, and the vmalloc region on the other, which is where all kernel text, module text and BPF text mappings reside. This means that mappings in the linear region (which never require executable permissions) never share any table entries at any level with mappings that do require executable permissions, and so we can set the table-level PXN attributes for all table entries that are created while setting up mappings in the linear region. Since swapper's PGD level page table is mapped r/o itself, this adds another layer of robustness to the way the kernel manages its own page tables. While at it, set the UXN attribute as well for all kernel mappings created at boot. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210310104942.174584-3-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-03-19arm64: mm: add missing P4D definitions and use them consistentlyArd Biesheuvel
Even though level 0, 1 and 2 descriptors share the same attribute encodings, let's be a bit more consistent about using the right one at the right level. So add new macros for level 0/P4D definitions, and clean up some inconsistencies involving these macros. Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210310104942.174584-2-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-03-19Merge tag 's390-5.12-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - disable preemption when accessing local per-cpu variables in the new counter set driver - fix by a factor of four increased steal time due to missing cputime_to_nsecs() conversion - fix PCI device structure leak * tag 's390-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: fix leak of PCI device structure s390/vtime: fix increased steal time accounting s390/cpumf: disable preemption when accessing per-cpu variable
2021-03-19x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()Jarkko Sakkinen
Background ========== SGX enclave memory is enumerated by the processor in contiguous physical ranges called Enclave Page Cache (EPC) sections. Currently, there is a free list per section, but allocations simply target the lowest-numbered sections. This is functional, but has no NUMA awareness. Fortunately, EPC sections are covered by entries in the ACPI SRAT table. These entries allow each EPC section to be associated with a NUMA node, just like normal RAM. Solution ======== Implement a NUMA-aware enclave page allocator. Mirror the buddy allocator and maintain a list of enclave pages for each NUMA node. Attempt to allocate enclave memory first from local nodes, then fall back to other nodes. Note that the fallback is not as sophisticated as the buddy allocator and is itself not aware of NUMA distances. When a node's free list is empty, it searches for the next-highest node with enclave pages (and will wrap if necessary). This could be improved in the future. Other ===== NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node(). [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because callers do not expect that and that leads to a NULL ptr deref. ] [ dhansen: Fix an uninitialized 'nid' variable in __sgx_alloc_epc_page() as Reported-by: kernel test robot <lkp@intel.com> to avoid any potential allocations from the wrong NUMA node or even premature allocation failures. ] Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/ Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
2021-03-19x86/sev-es: Optimize __sev_es_ist_enter() for better readabilityJoerg Roedel
Reorganize the code and improve the comments to make the function more readable and easier to understand. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210303141716.29223-4-joro@8bytes.org
2021-03-19x86/kaslr: Return boolean values from a function returning boolJiapeng Chong
Fix the following coccicheck warnings: ./arch/x86/boot/compressed/kaslr.c:642:10-11: WARNING: return of 0/1 in function 'process_mem_region' with return type bool. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1615283963-67277-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-03-19x86/ioapic: Ignore IRQ2 againThomas Gleixner
Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where the matrix allocator claimed to be out of vectors. He analyzed it down to the point that IRQ2, the PIC cascade interrupt, which is supposed to be not ever routed to the IO/APIC ended up having an interrupt vector assigned which got moved during unplug of CPU0. The underlying issue is that IRQ2 for various reasons (see commit af174783b925 ("x86: I/O APIC: Never configure IRQ2" for details) is treated as a reserved system vector by the vector core code and is not accounted as a regular vector. The Amazon BIOS has an routing entry of pin2 to IRQ2 which causes the IO/APIC setup to claim that interrupt which is granted by the vector domain because there is no sanity check. As a consequence the allocation counter of CPU0 underflows which causes a subsequent unplug to fail with: [ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU There is another sanity check missing in the matrix allocator, but the underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic during the conversion to irqdomains. For almost 6 years nobody complained about this wreckage, which might indicate that this requirement could be lifted, but for any system which actually has a PIC IRQ2 is unusable by design so any routing entry has no effect and the interrupt cannot be connected to a device anyway. Due to that and due to history biased paranoia reasons restore the IRQ2 ignore logic and treat it as non existent despite a routing entry claiming otherwise. Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de
2021-03-19crypto: arm/chacha-scalar - switch to common rev_l macroArd Biesheuvel
Drop the local definition of a byte swapping macro and use the common one instead. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-19crypto: arm/aes-scalar - switch to common rev_l/mov_l macrosArd Biesheuvel
The scalar AES implementation has some locally defined macros which reimplement things that are now available in macros defined in assembler.h. So let's switch to those. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-19crypto: arm/blake2s - fix for big endianEric Biggers
The new ARM BLAKE2s code doesn't work correctly (fails the self-tests) in big endian kernel builds because it doesn't swap the endianness of the message words when loading them. Fix this. Fixes: 5172d322d34c ("crypto: arm/blake2s - add ARM scalar optimized BLAKE2s") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-18Merge tag 'imx-fixes-5.12' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.12: - Fix an Ethernet issue on imx6ul-14x14-evk board that is caused by independent PHY reset. - Add missing `dma-coherent` property for LayerScape device trees to fix a kernel BUG report. - Use IRQCHIP_DECLARE for AVIC driver to fix a boot issue on i.MX25 with fw_devlink=on. - Add missing I2C pinctrl entry for imx8mp-phyboard-pollux-rdk board to fix the broken I2C GPIO recovery support. - Add `fsl,use-minimum-ecc` property for imx6ull-myir-mys-6ulx-eval device tree to fix UBI filesystem mount failure. * tag 'imx-fixes-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6ull: fix ubi filesystem mount failed ARM: imx6ul-14x14-evk: Do not reset the Ethernet PHYs independently arm64: dts: imx8mp-phyboard-pollux-rdk: Add missing pinctrl entry arm64: dts: ls1012a: mark crypto engine dma coherent arm64: dts: ls1043a: mark crypto engine dma coherent arm64: dts: ls1046a: mark crypto engine dma coherent ARM: imx: avic: Convert to using IRQCHIP_DECLARE Link: https://lore.kernel.org/r/20210318090145.GA22955@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-03-18Merge tag 'at91-fixes-5.12' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes AT91 fixes for 5.12: - only DT changes -- wrong phy address that blocks Ethernet use on boards with sama5d27 SoM1 -- restrictive PIN possibilities for sam9x60 * tag 'at91-fixes-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: dts: at91: sam9x60: fix mux-mask to match product's datasheet ARM: dts: at91: sam9x60: fix mux-mask for PA7 so it can be set to A, B and C ARM: dts: at91-sama5d27_som1: fix phy address to 7 Link: https://lore.kernel.org/r/20210310160547.55382-1-nicolas.ferre@microchip.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-03-18Merge tag 'omap-for-v5.12/fixes-rc1-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes Fixes for omaps for v5.12-rc cycle Regression fixes for multiple issues found mostly caused by recent changes to drop legacy platform data and and starting to use the new prm driver reset controller: - Fix ocp interconnect bus access error reporting for omap_l3_noc by setting IRQF_NO_THREAD - Fix changed mmc slot order regression by adding mmc aliases for am335x - Fix dra7 reboot regression caused by invalid pcie reset map - Fix smartreflex init regression caused by dropped legacy data - Fix ti-sysc driver warning on unbind if reset is not deasserted - Fix flakey reset deassert for dra7 iva * tag 'omap-for-v5.12/fixes-rc1-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: soc: ti: omap-prm: Fix occasional abort on reset deassert for dra7 iva bus: ti-sysc: Fix warning on unbind if reset is not deasserted ARM: OMAP2+: Fix smartreflex init regression after dropping legacy data soc: ti: omap-prm: Fix reboot issue with invalid pcie reset map for dra7 ARM: dts: am33xx: add aliases for mmc interfaces bus: omap_l3_noc: mark l3 irqs as IRQF_NO_THREAD Link: https://lore.kernel.org/r/pull-1614868603-800959@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-03-18x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()Joerg Roedel
There are a few places left in the SEV-ES C code where hlt loops and/or terminate requests are implemented. Replace them all with calls to sev_es_terminate(). Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210312123824.306-9-joro@8bytes.org
2021-03-18x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-pathJoerg Roedel
Check whether the hypervisor reported the correct C-bit when running as an SEV guest. Using a wrong C-bit position could be used to leak sensitive data from the guest to the hypervisor. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210312123824.306-8-joro@8bytes.org
2021-03-18x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-pathJoerg Roedel
The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes. It is needed by the early boot code to handle #VC exceptions raised in verify_cpu() and to get the position of the C-bit. But the CPUID information comes from the hypervisor which is untrusted and might return results which trick the guest into the no-SEV boot path with no C-bit set in the page-tables. All data written to memory would then be unencrypted and could leak sensitive data to the hypervisor. Add sanity checks to the 32-bit boot #VC handler to make sure the hypervisor does not pretend that SEV is not enabled. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210312123824.306-7-joro@8bytes.org
2021-03-18x86/boot/compressed/64: Add 32-bit boot #VC handlerJoerg Roedel
Add a #VC exception handler which is used when the kernel still executes in protected mode. This boot-path already uses CPUID, which will cause #VC exceptions in an SEV-ES guest. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210312123824.306-6-joro@8bytes.org
2021-03-18x86/kvm: Fix broken irq restoration in kvm_waitWanpeng Li
After commit 997acaf6b4b59c (lockdep: report broken irq restoration), the guest splatting below during boot: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 1 PID: 169 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x26/0x30 Modules linked in: hid_generic usbhid hid CPU: 1 PID: 169 Comm: systemd-udevd Not tainted 5.11.0+ #25 RIP: 0010:warn_bogus_irq_restore+0x26/0x30 Call Trace: kvm_wait+0x76/0x90 __pv_queued_spin_lock_slowpath+0x285/0x2e0 do_raw_spin_lock+0xc9/0xd0 _raw_spin_lock+0x59/0x70 lockref_get_not_dead+0xf/0x50 __legitimize_path+0x31/0x60 legitimize_root+0x37/0x50 try_to_unlazy_next+0x7f/0x1d0 lookup_fast+0xb0/0x170 path_openat+0x165/0x9b0 do_filp_open+0x99/0x110 do_sys_openat2+0x1f1/0x2e0 do_sys_open+0x5c/0x80 __x64_sys_open+0x21/0x30 do_syscall_64+0x32/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xae The new consistency checking, expects local_irq_save() and local_irq_restore() to be paired and sanely nested, and therefore expects local_irq_restore() to be called with irqs disabled. The irqflags handling in kvm_wait() which ends up doing: local_irq_save(flags); safe_halt(); local_irq_restore(flags); instead triggers it. This patch fixes it by using local_irq_disable()/enable() directly. Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1615791328-2735-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUsWanpeng Li
In order to deal with noncoherent DMA, we should execute wbinvd on all dirty pCPUs when guest wbinvd exits to maintain data consistency. smp_call_function_many() does not execute the provided function on the local core, therefore replace it by on_each_cpu_mask(). Reported-by: Nadav Amit <namit@vmware.com> Cc: Nadav Amit <namit@vmware.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1615517151-7465-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ishSean Christopherson
Fix a plethora of issues with MSR filtering by installing the resulting filter as an atomic bundle instead of updating the live filter one range at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as the hardware MSR bitmaps won't be updated until the next VM-Enter, but the relevant software struct is atomically updated, which is what KVM really needs. Similar to the approach used for modifying memslots, make arch.msr_filter a SRCU-protected pointer, do all the work configuring the new filter outside of kvm->lock, and then acquire kvm->lock only when the new filter has been vetted and created. That way vCPU readers either see the old filter or the new filter in their entirety, not some half-baked state. Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a TOCTOU bug, but that's just the tip of the iceberg... - Nothing is __rcu annotated, making it nigh impossible to audit the code for correctness. - kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel coding style aside, the lack of a smb_rmb() anywhere casts all code into doubt. - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs count before taking the lock. - kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug. The entire approach of updating the live filter is also flawed. While installing a new filter is inherently racy if vCPUs are running, fixing the above issues also makes it trivial to ensure certain behavior is deterministic, e.g. KVM can provide deterministic behavior for MSRs with identical settings in the old and new filters. An atomic update of the filter also prevents KVM from getting into a half-baked state, e.g. if installing a filter fails, the existing approach would leave the filter in a half-baked state, having already committed whatever bits of the filter were already processed. [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering") Cc: stable@vger.kernel.org Cc: Alexander Graf <graf@amazon.com> Reported-by: Yuan Yao <yaoyuan0329os@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210316184436.2544875-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18x86/boot/compressed/64: Setup IDT in startup_32 boot pathJoerg Roedel
This boot path needs exception handling when it is used with SEV-ES. Setup an IDT and provide a helper function to write IDT entries for use in 32-bit protected mode. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210312123824.306-5-joro@8bytes.org
2021-03-18x86/boot/compressed/64: Reload CS in startup_32Joerg Roedel
Exception handling in the startup_32 boot path requires the CS selector to be correctly set up. Reload it from the current GDT. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210312123824.306-4-joro@8bytes.org
2021-03-18x86/sev: Do not require Hypervisor CPUID bit for SEV guestsJoerg Roedel
A malicious hypervisor could disable the CPUID intercept for an SEV or SEV-ES guest and trick it into the no-SEV boot path, where it could potentially reveal secrets. This is not an issue for SEV-SNP guests, as the CPUID intercept can't be disabled for those. Remove the Hypervisor CPUID bit check from the SEV detection code to protect against this kind of attack and add a Hypervisor bit equals zero check to the SME detection path to prevent non-encrypted guests from trying to enable SME. This handles the following cases: 1) SEV(-ES) guest where CPUID intercept is disabled. The guest will still see leaf 0x8000001f and the SEV bit. It can retrieve the C-bit and boot normally. 2) Non-encrypted guests with intercepted CPUID will check the SEV_STATUS MSR and find it 0 and will try to enable SME. This will fail when the guest finds MSR_K8_SYSCFG to be zero, as it is emulated by KVM. But we can't rely on that, as there might be other hypervisors which return this MSR with bit 23 set. The Hypervisor bit check will prevent that the guest tries to enable SME in this case. 3) Non-encrypted guests on SEV capable hosts with CPUID intercept disabled (by a malicious hypervisor) will try to boot into the SME path. This will fail, but it is also not considered a problem because non-encrypted guests have no protection against the hypervisor anyway. [ bp: s/non-SEV/non-encrypted/g ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20210312123824.306-3-joro@8bytes.org
2021-03-18x86/boot/compressed/64: Cleanup exception handling before booting kernelJoerg Roedel
Disable the exception handling before booting the kernel to make sure any exceptions that happen during early kernel boot are not directed to the pre-decompression code. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210312123824.306-2-joro@8bytes.org
2021-03-18Merge tag 'v5.12-rc3' into x86/sevesBorislav Petkov
Pick up dependent SEV-ES urgent changes which went into -rc3 to base new work ontop. Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-18x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_listJarkko Sakkinen
During normal runtime, the "ksgxd" daemon behaves like a version of kswapd just for SGX. But, before it starts acting like kswapd, its first job is to initialize enclave memory. Currently, the SGX boot code places each enclave page on a epc_section->init_laundry_list. Once it starts up, the ksgxd code walks over that list and populates the actual SGX page allocator. However, the per-section structures are going away to make way for the SGX NUMA allocator. There's also little need to have a per-section structure; the enclave pages are all treated identically, and they can be placed on the correct allocator list from metadata stored in the enclave page (struct sgx_epc_page) itself. Modify sgx_sanitize_section() to take a single page list instead of taking a section and deriving the list from there. Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20210317235332.362001-1-jarkko.sakkinen@intel.com
2021-03-18x86: Fix various typos in commentsIngo Molnar
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-18Merge tag 'v5.12-rc3' into x86/cleanups, to refresh the treeIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-18KVM: x86: hyper-v: Don't touch TSC page values when guest opted for ↵Vitaly Kuznetsov
re-enlightenment When guest opts for re-enlightenment notifications upon migration, it is in its right to assume that TSC page values never change (as they're only supposed to change upon migration and the host has to keep things as they are before it receives confirmation from the guest). This is mostly true until the guest is migrated somewhere. KVM userspace (e.g. QEMU) will trigger masterclock update by writing to HV_X64_MSR_REFERENCE_TSC, by calling KVM_SET_CLOCK,... and as TSC value and kvmclock reading drift apart (even slightly), the update causes TSC page values to change. The issue at hand is that when Hyper-V is migrated, it uses stale (cached) TSC page values to compute the difference between its own clocksource (provided by KVM) and its guests' TSC pages to program synthetic timers and in some cases, when TSC page is updated, this puts all stimer expirations in the past. This, in its turn, causes an interrupt storm and L2 guests not making much forward progress. Note, KVM doesn't fully implement re-enlightenment notification. Basically, the support for reenlightenment MSRs is just a stub and userspace is only expected to expose the feature when TSC scaling on the expected destination hosts is available. With TSC scaling, no real re-enlightenment is needed as TSC frequency doesn't change. With TSC scaling becoming ubiquitous, it likely makes little sense to fully implement re-enlightenment in KVM. Prevent TSC page from being updated after migration. In case it's not the guest who's initiating the change and when TSC page is already enabled, just keep it as it is: TSC value is supposed to be preserved across migration and TSC frequency can't change with re-enlightenment enabled. The guest is doomed anyway if any of this is not true. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210316143736.964151-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18KVM: x86: hyper-v: Track Hyper-V TSC page statusVitaly Kuznetsov
Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it was updated from guest/host side or if we've failed to set it up (because e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no need to retry. Also, in a hypothetical situation when we are in 'always catchup' mode for TSC we can now avoid contending 'hv->hv_lock' on every guest enter by setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters() returns false. Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in get_time_ref_counter() to properly handle the situation when we failed to write the updated TSC page values to the guest. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210316143736.964151-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18ARM: dts: imx6ull: fix ubi filesystem mount faileddillon min
For NAND Ecc layout, there is a dependency from old kernel's nand driver setting and current. if old kernel use 4 bit ecc , we should use 4 bit in new kernel either. else will run into following error at filesystem mounting. So, enable fsl,use-minimum-ecc from device tree, to fix this mismatch [ 9.449265] ubi0: scanning is finished [ 9.463968] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 22528 bytes from PEB 513:4096, read only 22528 bytes, retry [ 9.486940] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 22528 bytes from PEB 513:4096, read only 22528 bytes, retry [ 9.509906] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 22528 bytes from PEB 513:4096, read only 22528 bytes, retry [ 9.532845] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 22528 bytes from PEB 513:4096, read 22528 bytes Fixes: f9ecf10cb88c ("ARM: dts: imx6ull: add MYiR MYS-6ULX SBC") Signed-off-by: dillon min <dillon.minfei@gmail.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-03-18bpf: Fix fexit trampoline.Alexei Starovoitov
The fexit/fmod_ret programs can be attached to kernel functions that can sleep. The synchronize_rcu_tasks() will not wait for such tasks to complete. In such case the trampoline image will be freed and when the task wakes up the return IP will point to freed memory causing the crash. Solve this by adding percpu_ref_get/put for the duration of trampoline and separate trampoline vs its image life times. The "half page" optimization has to be removed, since first_half->second_half->first_half transition cannot be guaranteed to complete in deterministic time. Every trampoline update becomes a new image. The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and call_rcu_tasks. Together they will wait for the original function and trampoline asm to complete. The trampoline is patched from nop to jmp to skip fexit progs. They are freed independently from the trampoline. The image with fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which will wait for both sleepable and non-sleepable progs to complete. Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
2021-03-17module: remove never implemented MODULE_SUPPORTED_DEVICELeon Romanovsky
MODULE_SUPPORTED_DEVICE was added in pre-git era and never was implemented. We can safely remove it, because the kernel has grown to have many more reliable mechanisms to determine if device is supported or not. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-17Merge tag 'mips-fixes_5.12_2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Fix for fdt alignment when image is compressed" * tag 'mips-fixes_5.12_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: vmlinux.lds.S: Fix appended dtb not properly aligned
2021-03-17ARM: OMAP4: PM: update ROM return address for OSWR and OFFCarlos Leija
We need to add a dummy smc call to the cpuidle wakeup path to force the ROM code to save the return address after MMU is enabled again. This is needed to prevent random hangs on secure devices like droid4. Otherwise the system will eventually hang when entering deeper SoC idle states with the core and mpu domains in open-switch retention (OSWR). The hang happens as the ROM code tries to use the earlier physical return address set by omap-headsmp.S with MMU off while waking up CPU1 again. The hangs started happening in theory already with commit caf8c87d7ff2 ("ARM: OMAP2+: Allow core oswr for omap4"), but in practise the issue went unnoticed as various drivers were often blocking any deeper idle states with hardware autoidle features. This patch is based on an earlier TI Linux kernel tree commit 92f0b3028d9e ("OMAP4: PM: update ROM return address for OSWR and OFF") written by Carlos Leija <cileija@ti.com>, Praneeth Bajjuri <praneeth@ti.com>, and Bryan Buckley <bryan.buckley@ti.com>. A later version of the patch was updated to use CPU_PM notifiers by Tero Kristo <t-kristo@ti.com>. Signed-off-by: Carlos Leija <cileija@ti.com> Signed-off-by: Praneeth Bajjuri <praneeth@ti.com> Signed-off-by: Bryan Buckley <bryan.buckley@ti.com> Signed-off-by: Tero Kristo <t-kristo@ti.com> Fixes: caf8c87d7ff2 ("ARM: OMAP2+: Allow core oswr for omap4") Reported-by: Carl Philipp Klemm <philipp@uvos.xyz> Reported-by: Merlijn Wajer <merlijn@wizzup.org> Cc: Ivan Jelincic <parazyd@dyne.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Sebastian Reichel <sre@kernel.org> Cc: Tero Kristo <kristo@kernel.org> [tony@atomide.com: updated to apply, updated description] Signed-off-by: Tony Lindgren <tony@atomide.com>
2021-03-17ARM: OMAP4: Fix PMIC voltage domains for bionicTony Lindgren
We are now registering the mpu domain three times instead of registering mpu, core and iva domains like we should. Fixes: d44fa156dcb2 ("ARM: OMAP2+: Configure voltage controller for cpcap") Signed-off-by: Tony Lindgren <tony@atomide.com>
2021-03-17KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUsVitaly Kuznetsov
When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration) we need to make sure no vCPU sees stale values in PV clock structures and thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page clocksource is global and kvm_guest_time_update() only updates in on vCPU0 but this is not entirely correct: nothing blocks some other vCPU from entering the guest before we finish the update on CPU0 and it can read stale values from the page. Invalidate TSC page in kvm_gen_update_masterclock() to switch all vCPUs to using MSR based clocksource (HV_X64_MSR_TIME_REF_COUNT). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210316143736.964151-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-17KVM: x86: hyper-v: Limit guest to writing zero to ↵Vitaly Kuznetsov
HV_X64_MSR_TSC_EMULATION_STATUS HV_X64_MSR_TSC_EMULATION_STATUS indicates whether TSC accesses are emulated after migration (to accommodate for a different host TSC frequency when TSC scaling is not supported; we don't implement this in KVM). Guest can use the same MSR to stop TSC access emulation by writing zero. Writing anything else is forbidden. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210316143736.964151-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-16riscv: Correct SPARSEMEM configurationKefeng Wang
There are two issues for RV32, 1) if use FLATMEM, it is useless to enable SPARSEMEM_STATIC. 2) if use SPARSMEM, both SPARSEMEM_VMEMMAP and SPARSEMEM_STATIC is enabled. Fixes: d95f1a542c3d ("RISC-V: Implement sparsemem") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-03-16RISC-V: kasan: Declare kasan_shallow_populate() staticPalmer Dabbelt
Without this I get a missing prototype warning. Reported-by: kernel test robot <lkp@intel.com> Fixes: e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-03-16riscv: Ensure page table writes are flushed when initializing KASAN vmallocAlexandre Ghiti
Make sure that writes to kernel page table during KASAN vmalloc initialization are made visible by adding a sfence.vma. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Fixes: e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-03-16RISC-V: Fix out-of-bounds accesses in init_resources()Geert Uytterhoeven
init_resources() allocates an array of resources, based on the current total number of memory regions and reserved memory regions. However, allocating this array using memblock_alloc() might increase the number of reserved memory regions. If that happens, populating the array later based on the new number of regions will cause out-of-bounds writes beyond the end of the allocated array. Fix this by allocating one more entry, which may or may not be used. Fixes: 797f0375dd2ef5cd ("RISC-V: Do not allocate memblock while iterating reserved memblocks") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Atish Patra <atish.patra@wdc.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>