path: root/arch
Age | Commit message | Author
2022-11-29 | arm64: dts: altera: align LED node names with dtschema | Krzysztof Kozlowski
The node names should be generic, and the DT schema expects a certain pattern:

  altera/socfpga_stratix10_socdk.dtb: leds: 'hps0', 'hps1', 'hps2' do not match any of the regexes: '(^led-[0-9a-f]$|led)', 'pinctrl-[0-9]+'

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
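To illustrate the kind of rename this implies (the label value below is a placeholder, not the actual board data), a node that previously failed the schema check gets a generic name:

    leds {
        hps0 {                  /* before: fails the dtschema regex */
            label = "hps_led0";
        };
    };

    leds {
        led-0 {                 /* after: matches '(^led-[0-9a-f]$|led)' */
            label = "hps_led0";
        };
    };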
2022-11-29 | x86/boot: Remove x86_32 PIC using %ebx workaround | Uros Bizjak
The currently supported minimum gcc version is 5.1. Before that, the PIC register, when generating Position Independent Code, was considered "fixed" in the sense that it wasn't in the set of registers available to the compiler's register allocator. Which, on x86-32, is already a very small set. What is more, the register allocator was unable to satisfy extended asm "=b" constraints. (Yes, PIC code uses %ebx on 32-bit as the base reg.)

With gcc 5.1:

  "Reuse of the PIC hard register, instead of using a fixed register, was implemented on x86/x86-64 targets. This improves generated PIC code performance as more hard registers can be used. Shared libraries can significantly benefit from this optimization. Currently it is switched on only for x86/x86-64 targets. As RA infrastructure is already implemented for PIC register reuse, other targets might follow this in the future."

  (from: https://gcc.gnu.org/gcc-5/changes.html)

which basically means that the register allocator has a higher degree of freedom when handling %ebx, including reloading it with the correct value before a PIC access. Furthermore, arch/x86/Makefile has:

  # Never want PIC in a 32-bit kernel, prevent breakage with GCC built
  # with nonstandard options
  KBUILD_CFLAGS += -fno-pic

  $ gcc -Wp,-MMD,arch/x86/boot/.cpuflags.o.d ... -fno-pic ... -D__KBUILD_MODNAME=kmod_cpuflags -c -o arch/x86/boot/cpuflags.o arch/x86/boot/cpuflags.c

so the 32-bit workaround in cpuid_count() is fixing exactly nothing, because 32-bit configs don't even allow PIC builds. As to 64-bit builds: they're done using -mcmodel=kernel, which produces RIP-relative addressing for PIC builds and thus does not apply here either.

So get rid of the thing and make cpuid_count() nice and simple. There should be no functional changes resulting from this.

[ bp: Expand commit message. ]

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221104124546.196077-1-ubizjak@gmail.com
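With the workaround gone, the helper reduces to a plain CPUID wrapper along these lines (a sketch of the simplified shape, not the literal patch):

    static inline void cpuid_count(u32 id, u32 count,
                                   u32 *a, u32 *b, u32 *c, u32 *d)
    {
        asm volatile("cpuid"
                     : "=a" (*a), "=b" (*b), "=c" (*c), "=d" (*d)
                     : "0" (id), "2" (count));
    }

The "=b" output constraint is exactly what pre-5.1 gcc could not satisfy while %ebx was reserved as the PIC base register.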
2022-11-29 | arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu() | Mark Brown
For reasons that are unclear to this reader, fpsimd_bind_state_to_cpu() populates the struct fpsimd_last_state_struct that it uses to store the active floating point state for KVM guests by passing an argument for each member of the structure. As the richness of the architecture increases, this results in a function with a rather large number of arguments, which isn't ideal.

Simplify the interface by using the struct directly as the single argument for the function, renaming it as we lift the definition into the header. This could be built on further to reduce the work we do adding storage for new FP state in various places, but for now it just simplifies this one interface.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-9-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
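The shape of the change is roughly the following (signatures abbreviated; the new struct name and the members shown are illustrative, based only on the description above):

    /* Before: one parameter per member of fpsimd_last_state_struct. */
    void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st,
                                  void *sve_state, unsigned int sve_vl, ...);

    /* After: the caller fills in the (renamed, header-visible) struct
     * and passes it as the single argument. */
    void fpsimd_bind_state_to_cpu(struct cpu_fp_state *fp_state);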
2022-11-29 | arm64/sve: Leave SVE enabled on syscall if we don't context switch | Mark Brown
The syscall ABI says that the SVE register state not shared with FPSIMD may not be preserved on syscall, and this is the only mechanism we have in the ABI to stop tracking the extra SVE state for a process. Currently we do this unconditionally by means of disabling SVE for the process on syscall, causing userspace to take a trap to EL1 if it uses SVE again. These extra traps result in a noticeable overhead for using SVE instead of FPSIMD in some workloads, especially for simple syscalls where we can return directly to userspace and would not otherwise need to update the floating point registers. Tests with fp-pidbench show an approximately 70% overhead on a range of implementations when SVE is in use - while this is an extreme and entirely artificial benchmark, it is clear that there is some useful room for improvement here.

Now that we have the ability to track the decision about what to save separately from TIF_SVE, we can improve things by leaving TIF_SVE enabled on syscall but only saving the FPSIMD registers if we are in a syscall. This means that if we need to restore the register state from memory (e.g. after a context switch or kernel mode NEON) we will drop TIF_SVE and reenable traps for userspace, but if we can just return to userspace then traps will remain disabled.

Since our current implementation, and hence ABI, has the effect of zeroing all the SVE register state not shared with FPSIMD on syscall, we replace the disabling of TIF_SVE with a flush of the non-shared register state. This means that there is still some overhead for syscalls when SVE is in use, but it is very much reduced.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-8-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
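Conceptually, the syscall exit path changes along these lines (the flush helper named here is assumed for illustration, not the literal patch):

    /* Old: drop SVE on every syscall; the next SVE insn traps to EL1. */
    if (test_and_clear_thread_flag(TIF_SVE))
        sve_user_disable();

    /* New: keep TIF_SVE (traps stay disabled) but honour the ABI by
     * zeroing the SVE state not shared with FPSIMD. */
    if (test_thread_flag(TIF_SVE))
        sve_flush_non_shared_state(current);    /* assumed helper */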
2022-11-29 | arm64/fpsimd: SME no longer requires SVE register state | Mark Brown
Now that we track the type of the stored register state separately to what is active in the task, it is valid to have the FPSIMD register state stored while in streaming mode. Remove the special case handling for SME when setting FPSIMD register state. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-7-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 | arm64/fpsimd: Load FP state based on recorded data type | Mark Brown
Now that we record the type of floating point register state we are saving when we write the register state out to memory, we can use that information when we load from memory to decide which format to load, bringing the loads into line with what we saved rather than relying on TIF_SVE to determine what to load. The SME state details are already recorded directly in the saved SVCR and handled based on the information there.

Since we are not changing any of the save paths, there should be no functional change from this patch; further patches will make use of this to optimise and clarify the code.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-6-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 | arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM | Mark Brown
Now that we are explicitly telling the host FP code which register state it needs to save we can remove the manipulation of TIF_SVE from the KVM code, simplifying it and allowing us to optimise our handling of normal tasks. Remove the manipulation of TIF_SVE from KVM and instead rely on to_save to ensure we save the correct data for it. There should be no functional or performance impact from this change. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-5-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 | arm64/fpsimd: Have KVM explicitly say which FP registers to save | Mark Brown
In order to avoid needlessly saving and restoring the guest registers, KVM relies on the host FPSIMD code to save the guest registers when we context switch away from the guest. This is done by binding the KVM guest state to the CPU on top of the task state that was originally there, then carefully managing the TIF_SVE flag for the task to cause the host to save the full SVE state when needed regardless of the needs of the host task. This works well enough, but it isn't terribly direct about what is going on and makes it much more complicated to try to optimise what we're doing with the SVE register state.

Let's instead have KVM pass in the register state it wants saving when it binds to the CPU. We introduce a new FP_STATE_CURRENT for use during normal task binding to indicate that we should base our decisions on the current task. This should not be used when actually saving. Ideally we might want to use a separate enum for the type to save, but this enum and the enum values would then need to be named, which has problems with clarity and ambiguity.

In order to ease any future debugging that might be required, this patch does not actually update any of the decision making about what to save; it merely starts tracking the new information and warns if the requested state is not what we would otherwise have decided to save.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
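The tracked type ends up as an enum of roughly this shape (a sketch consistent with the description above and with the previous patch in the series):

    enum fp_type {
        FP_STATE_CURRENT,   /* decide based on the current task state;
                             * only valid when binding, never when saving */
        FP_STATE_FPSIMD,    /* stored state is the FPSIMD V registers */
        FP_STATE_SVE,       /* stored state is the SVE Z/P registers */
    };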
2022-11-29 | arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE | Mark Brown
When we save the state for the floating point registers this can be done in the form visible through either the FPSIMD V registers or the SVE Z and P registers. At present we track which format is currently used based on TIF_SVE and the SME streaming mode state but particularly in the SVE case this limits our options for optimising things, especially around syscalls. Introduce a new enum which we place together with saved floating point state in both thread_struct and the KVM guest state which explicitly states which format is active and keep it up to date when we change it. At present we do not use this state except to verify that it has the expected value when loading the state, future patches will introduce functional changes. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-3-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 | KVM: arm64: Discard any SVE state when entering KVM guests | Mark Brown
Since 8383741ab2e773a99 (KVM: arm64: Get rid of host SVE tracking/saving) KVM has not tracked the host SVE state, relying on the fact that we currently disable SVE whenever we perform a syscall. This may not be true in future since performance optimisation may result in us keeping SVE enabled in order to avoid needing to take access traps to reenable it. Handle this by clearing TIF_SVE and converting the stored task state to FPSIMD format when preparing to run the guest. This is done with a new call fpsimd_kvm_prepare() to keep the direct state manipulation functions internal to fpsimd.c. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221115094640.112848-2-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
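A minimal sketch of the new hook (SVE-support checks and the per-CPU FP context locking elided):

    void fpsimd_kvm_prepare(void)
    {
        if (test_and_clear_thread_flag(TIF_SVE))
            sve_to_fpsimd(current);   /* convert the stored state to FPSIMD */
    }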
2022-11-29 | Merge tag 'at91-fixes-6.1-3' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes | Arnd Bergmann
AT91 fixes for 6.1 #3. It contains:

- a build fix for SAMA5D3 devices, which don't have an L2 cache: accessing outer_cache.write_sec in sama5_secure_cache_init() could throw an undefined reference to `outer_cache' if CONFIG_OUTER_CACHE is disabled from the common sama5_defconfig.

* tag 'at91-fixes-6.1-3' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: at91: fix build for SAMA5D3 w/o L2 cache

Link: https://lore.kernel.org/r/20221125093521.382105-1-claudiu.beznea@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-11-29 | arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI | Anshuman Khandual
__armv8pmu_probe_pmu() returns early if the detected PMU is either not implemented or implementation defined. The extracted ID_AA64DFR0_EL1_PMUVer value when the PMU is not implemented is '0', which can be replaced with ID_AA64DFR0_EL1_PMUVer_NI, defined as '0b0000'.

Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20221128025449.39085-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
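The resulting check reads roughly as follows (a sketch; the surrounding probe logic is omitted):

    pmuver = cpuid_feature_extract_unsigned_field(dfr0,
                        ID_AA64DFR0_EL1_PMUVer_SHIFT);
    if (pmuver == ID_AA64DFR0_EL1_PMUVer_IMP_DEF ||
        pmuver == ID_AA64DFR0_EL1_PMUVer_NI)
        return;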
2022-11-28 | ARM: 9277/1: Make the dumped instructions consistent with the disassembled ones | Zhen Lei
In ARM, the mapping of instruction memory is always little-endian, except on some BE-32 capable ARM architectures such as ARMv7-R, whose instruction endianness may be BE-32 (and whose data endianness will then also be BE-32). Because the two byte swaps cancel out, the instruction stored in the register after reading is in little-endian format. But in the BE-8 case the instruction endianness is LE while the data endianness is BE, so the instruction stored in the register after reading is in big-endian format, which is inconsistent with the disassembled one. For example:

The content of the disassembly:

  c0429ee8:  e3500000  cmp    r0, #0
  c0429eec:  159f2044  ldrne  r2, [pc, #68]
  c0429ef0:  108f2002  addne  r2, pc, r2
  c0429ef4:  1882000a  stmne  r2, {r1, r3}
  c0429ef8:  e7f000f0  udf    #0

The output of the undefined instruction exception:

  Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
  ... ...
  Code: 000050e3 44209f15 02208f10 0a008218 (f000f0e7)

This makes checking the instructions inconvenient. Worse, somebody who doesn't know about this might think the instructions are all broken. So, when CONFIG_CPU_ENDIAN_BE8=y, let's convert the instructions to little-endian format before they are printed. The conversion result is as follows:

  Code: e3500000 159f2044 108f2002 1882000a (e7f000f0)

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
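The conversion itself is a byte swap guarded by the config option, roughly (variable names illustrative):

    u32 insn = ...;  /* instruction word fetched via a (big-endian) data load */

    if (IS_ENABLED(CONFIG_CPU_ENDIAN_BE8))
        insn = swab32(insn);    /* 16-bit Thumb opcodes would use swab16() */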
2022-11-29 | KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled | Peter Collingbourne
Certain VMMs such as crosvm have features (e.g. sandboxing) that depend on being able to map guest memory as MAP_SHARED. The current restriction on sharing MAP_SHARED pages with the guest is preventing the use of those features with MTE. Now that the races between tasks concurrently clearing tags on the same page have been fixed, remove this restriction. Note that this is a relaxation of the ABI. Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-8-pcc@google.com
2022-11-29 | KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled | Peter Collingbourne
Previously we allowed creating a memslot containing a private mapping that was not VM_MTE_ALLOWED, but would later reject KVM_RUN with -EFAULT. Now we reject the memory region at memslot creation time. Since this is a minor tweak to the ABI (a VMM that created one of these memslots would fail later anyway), no VMM to my knowledge has MTE support yet, and the hardware with the necessary features is not generally available, we can probably make this ABI change at this point. Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-7-pcc@google.com
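A sketch of where the test now lives (helper and call site abbreviated; details illustrative):

    static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
    {
        return vma->vm_flags & VM_MTE_ALLOWED;
    }

    /* In kvm_arch_prepare_memory_region(): reject the region up front
     * instead of failing KVM_RUN with -EFAULT later. */
    if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma))
        return -EINVAL;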
2022-11-29 | arm64: mte: Lock a page for MTE tag initialisation | Catalin Marinas
Initialising the tags and setting PG_mte_tagged flag for a page can race between multiple set_pte_at() on shared pages or setting the stage 2 pte via user_mem_abort(). Introduce a new PG_mte_lock flag as PG_arch_3 and set it before attempting page initialisation. Given that PG_mte_tagged is never cleared for a page, consider setting this flag to mean page unlocked and wait on this bit with acquire semantics if the page is locked:

- try_page_mte_tagging() - lock the page for tagging, return true if it can be tagged, false if already tagged. No acquire semantics if it returns true (PG_mte_tagged not set) as there is no serialisation with a previous set_page_mte_tagged().

- set_page_mte_tagged() - set PG_mte_tagged with release semantics.

The two-bit locking is based on Peter Collingbourne's idea.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-6-pcc@google.com
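The typical caller pattern with the new lock looks roughly like this (a sketch):

    if (try_page_mte_tagging(page)) {
        mte_clear_page_tags(page_address(page));
        set_page_mte_tagged(page);
    }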
2022-11-29 | KVM: arm64: Simplify the sanitise_mte_tags() logic | Catalin Marinas
Currently sanitise_mte_tags() checks if it's an online page before attempting to sanitise the tags. Such detection should be done in the caller via the VM_MTE_ALLOWED vma flag. Since kvm_set_spte_gfn() does not have the vma, leave the page unmapped if not already tagged. Tag initialisation will be done on a subsequent access fault in user_mem_abort(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [pcc@google.com: fix the page initializer] Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-4-pcc@google.com
2022-11-29 | arm64: mte: Fix/clarify the PG_mte_tagged semantics | Catalin Marinas
Currently the PG_mte_tagged page flag mostly means the page contains valid tags and it should be set after the tags have been cleared or restored. However, in mte_sync_tags() it is set before setting the tags to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on write in another thread can cause the new page to have stale tags. Similarly, tag reading via ptrace() can read stale tags if the PG_mte_tagged flag is set before actually clearing/restoring the tags. Fix the PG_mte_tagged semantics so that it is only set after the tags have been cleared or restored. This is safe for swap restoring into a MAP_SHARED or CoW page since the core code takes the page lock. Add two functions to test and set the PG_mte_tagged flag with acquire and release semantics. The downside is that concurrent mprotect(PROT_MTE) on a MAP_SHARED page may cause tag loss. This is already the case for KVM guests if a VMM changes the page protection while the guest triggers a user_mem_abort(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [pcc@google.com: fix build with CONFIG_ARM64_MTE disabled] Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-3-pcc@google.com
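A sketch of the two helpers with the described ordering (the actual patch may differ in detail):

    static inline void set_page_mte_tagged(struct page *page)
    {
        smp_wmb();      /* make the tag writes visible first */
        set_bit(PG_mte_tagged, &page->flags);
    }

    static inline bool page_mte_tagged(struct page *page)
    {
        bool ret = test_bit(PG_mte_tagged, &page->flags);

        if (ret)
            smp_rmb();  /* pairs with the barrier in the setter */
        return ret;
    }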
2022-11-29 | mm: Do not enable PG_arch_2 for all 64-bit architectures | Catalin Marinas
Commit 4beba9486abd ("mm: Add PG_arch_2 page flag") introduced a new page flag for all 64-bit architectures. However, even if an architecture is 64-bit, it may still have limited spare bits in the 'flags' member of 'struct page'. This may happen if an architecture enables SPARSEMEM without SPARSEMEM_VMEMMAP as is the case with the newly added loongarch. This architecture port needs 19 more bits for the sparsemem section information and, while it is currently fine with PG_arch_2, adding any more PG_arch_* flags will trigger build-time warnings. Add a new CONFIG_ARCH_USES_PG_ARCH_X option which can be selected by architectures that need more PG_arch_* flags beyond PG_arch_1. Select it on arm64. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [pcc@google.com: fix build with CONFIG_ARM64_MTE disabled] Signed-off-by: Peter Collingbourne <pcc@google.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-2-pcc@google.com
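In Kconfig terms the change amounts to something like the following (help text paraphrased):

    config ARCH_USES_PG_ARCH_X
            bool
            help
              Enable PG_arch_2 and PG_arch_3 only on architectures that
              opt in, rather than on all 64-bit architectures.

with arm64 adding a 'select ARCH_USES_PG_ARCH_X' from its MTE configuration.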
2022-11-29 | Merge tag 'kvm-s390-master-6.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD | Paolo Bonzini
VSIE epdx shadowing fix
2022-11-28 | riscv: Sync efi page table's kernel mappings before switching | Alexandre Ghiti
The EFI page table is initially created as a copy of the kernel page table. With VMAP_STACK enabled, kernel stacks are allocated in the vmalloc area: if the stack is allocated in a new PGD (one that was not present at the moment of the efi page table creation or not synced in a previous vmalloc fault), the kernel will take a trap when switching to the efi page table when the vmalloc kernel stack is accessed, resulting in a kernel panic. Fix that by updating the efi kernel mappings before switching to the efi page table. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Fixes: b91540d52a08 ("RISC-V: Add EFI runtime services") Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20221121133303.1782246-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
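The shape of the fix, heavily simplified (the sync helper is assumed here purely for illustration):

    static void efi_virtmap_load(void)
    {
        /* Pull in kernel PGD entries created after the efi page table
         * was copied, e.g. for a vmalloc'ed VMAP_STACK stack. */
        sync_kernel_mappings(efi_mm.pgd);   /* assumed helper */
        switch_mm(current->active_mm, &efi_mm, NULL);
    }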
2022-11-28 | arm64: dts: Update cache properties for broadcom | Pierre Gondois
The DeviceTree Specification v0.3 specifies that the cache node 'compatible' and 'cache-level' properties are 'required'. Cf. s3.8 Multi-level and Shared Cache Nodes The 'cache-unified' property should be present if one of the properties for unified cache is present ('cache-size', ...). Update the Device Trees accordingly. Acked-by: William Zhang <william.zhang@broadcom.com> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Link: https://lore.kernel.org/r/20221122163208.3810985-3-pierre.gondois@arm.com Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
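A compliant cache node then looks something like this (the level and size values are illustrative):

    l2: l2-cache {
        compatible = "cache";
        cache-level = <2>;      /* required by the DT spec */
        cache-unified;          /* required once e.g. cache-size is given */
        cache-size = <0x80000>;
    };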
2022-11-28 | arm: dts: Update cache properties for broadcom | Pierre Gondois
The DeviceTree Specification v0.3 specifies that the cache node 'compatible' and 'cache-level' properties are 'required'. Cf. s3.8 Multi-level and Shared Cache Nodes The 'cache-unified' property should be present if one of the properties for unified cache is present ('cache-size', ...). Update the Device Trees accordingly. Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Link: https://lore.kernel.org/r/20221122163208.3810985-2-pierre.gondois@arm.com Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2022-11-28 | ARM: dts: broadcom: align LED node names with dtschema | Krzysztof Kozlowski
The node names should be generic, and the DT schema expects a certain pattern:

  bcm4708-asus-rt-ac68u.dtb: leds: 'logo', 'power', 'usb2', 'usb3' do not match any of the regexes: '(^led-[0-9a-f]$|led)', 'pinctrl-[0-9]+'

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221125144128.477059-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2022-11-28 | riscv: Fix NR_CPUS range conditions | Samuel Holland
The conditions reference the symbol SBI_V01, which does not exist. The correct symbol is RISCV_SBI_V01. Fixes: e623715f3d67 ("RISC-V: Increase range and default value of NR_CPUS") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221126061557.3541-1-samuel@sholland.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
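The corrected Kconfig conditions then read roughly as follows (the ranges shown are illustrative, not quoted from the patch):

    config NR_CPUS
            int "Maximum number of CPUs (2-512)"
            depends on SMP
            range 2 32 if RISCV_SBI_V01 && 32BIT
            range 2 64 if RISCV_SBI_V01 && 64BIT
            range 2 512 if !RISCV_SBI_V01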
2022-11-28 | Merge tag 'kvm-s390-next-6.2-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD | Paolo Bonzini
- Second batch of the lazy destroy patches
- First batch of KVM changes for kernel virtual != physical address support
- Removal of an unused function
2022-11-28 | KVM: x86: Advertise PREFETCHIT0/1 CPUID to user space | Jiaxi Chen
The latest Intel platform Granite Rapids has introduced a new instruction - PREFETCHIT0/1, which moves code into a cache level closer to the processor depending on specific hints. The bit definition: CPUID.(EAX=7,ECX=1):EDX[bit 14]

PREFETCHIT0/1 is on a KVM-only subleaf. Plus an x86_FEATURE definition for this feature bit to direct it to the KVM entry.

Advertise PREFETCHIT0/1 to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature.

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Message-Id: <20221125125845.1182922-9-jiaxi.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28 | KVM: x86: Advertise AVX-NE-CONVERT CPUID to user space | Jiaxi Chen
AVX-NE-CONVERT is a new set of instructions which can convert low precision floating point formats like BF16/FP16 to the high precision floating point format FP32, and can also convert FP32 elements to BF16. These instructions give the platform improved AI capabilities and better compatibility. The bit definition: CPUID.(EAX=7,ECX=1):EDX[bit 5]

AVX-NE-CONVERT is on a KVM-only subleaf. Plus an x86_FEATURE definition for this feature bit to direct it to the KVM entry.

Advertise AVX-NE-CONVERT to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature.

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Message-Id: <20221125125845.1182922-8-jiaxi.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28 | KVM: x86: Advertise AVX-VNNI-INT8 CPUID to user space | Jiaxi Chen
AVX-VNNI-INT8 is a new set of instructions in the latest Intel platform Sierra Forest, aimed at giving the platform superior AI capabilities. These instructions multiply the individual bytes of two signed or unsigned source operands, then add and accumulate the results into the destination dword element size operand. The bit definition: CPUID.(EAX=7,ECX=1):EDX[bit 4]

AVX-VNNI-INT8 is on a new and sparse CPUID leaf, and all bits on this leaf have no truly kernel use case for now. Given that, and to save space for kernel feature bits, move this new leaf to a KVM-only subleaf, plus an x86_FEATURE definition for AVX-VNNI-INT8 to direct it to the KVM entry.

Advertise AVX-VNNI-INT8 to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature.

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Message-Id: <20221125125845.1182922-7-jiaxi.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
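For such 100% KVM-defined bits, the definition lands in the KVM-only leaf table rather than in cpufeatures.h, roughly (a sketch; bit position per the CPUID definition above):

    /* In kvm_only_cpuid_leafs: */
    #define X86_FEATURE_AVX_VNNI_INT8   KVM_X86_FEATURE(CPUID_7_1_EDX, 4)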
2022-11-28 | x86: KVM: Advertise AVX-IFMA CPUID to user space | Jiaxi Chen
AVX-IFMA is a new instruction in the latest Intel platform Sierra Forest. This instruction performs packed multiplication of unsigned 52-bit integers and adds the low/high 52-bit products to qword accumulators. The bit definition: CPUID.(EAX=7,ECX=1):EAX[bit 23]

AVX-IFMA is on an expected-dense CPUID leaf, and some other bits on this leaf have kernel usages. Given that, define this feature bit like X86_FEATURE_<name> in the kernel. Considering that AVX-IFMA itself has no truly kernel usages and /proc/cpuinfo has too many unreadable flags, hide this one in /proc/cpuinfo.

Advertise AVX-IFMA to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature.

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Message-Id: <20221125125845.1182922-6-jiaxi.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
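The corresponding cpufeatures.h entry looks roughly like this (a sketch; the word index is assumed to be the CPUID.(EAX=7,ECX=1):EAX leaf, and the leading "" in the comment hides the flag from /proc/cpuinfo):

    #define X86_FEATURE_AVX_IFMA    (12*32+23) /* "" AVX IFMA instructions */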
2022-11-28 | x86: KVM: Advertise AMX-FP16 CPUID to user space | Chang S. Bae
The latest Intel platform Granite Rapids has introduced a new instruction - AMX-FP16, which performs dot-products of two FP16 tiles and accumulates the results into a packed single precision tile. AMX-FP16 adds FP16 capability and also allows an FP16 GPU-trained model to run faster without loss of accuracy or added SW overhead. The bit definition: CPUID.(EAX=7,ECX=1):EAX[bit 21]

AMX-FP16 is on an expected-dense CPUID leaf, and some other bits on this leaf have kernel usages. Given that, define this feature bit like X86_FEATURE_<name> in the kernel. Considering that AMX-FP16 itself has no truly kernel usages and /proc/cpuinfo has too many unreadable flags, hide this one in /proc/cpuinfo.

Advertise AMX-FP16 to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Message-Id: <20221125125845.1182922-5-jiaxi.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28 | x86: KVM: Advertise CMPccXADD CPUID to user space | Jiaxi Chen
CMPccXADD is a new set of instructions in the latest Intel platform Sierra Forest. This new instruction set includes a semaphore operation that can compare and add the operands if the condition is met, which can improve database performance. The bit definition: CPUID.(EAX=7,ECX=1):EAX[bit 7]

CMPccXADD is on an expected-dense CPUID leaf, and some other bits on this leaf have kernel usages. Given that, define this feature bit like X86_FEATURE_<name> in the kernel. Considering that CMPccXADD itself has no truly kernel usages and /proc/cpuinfo has too many unreadable flags, hide this one in /proc/cpuinfo.

Advertise CMPCCXADD to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature.

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Message-Id: <20221125125845.1182922-4-jiaxi.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28 | KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs | Sean Christopherson
Rename kvm_cpu_cap_init_scattered() to kvm_cpu_cap_init_kvm_defined() in anticipation of adding KVM-only CPUID leafs that aren't recognized by the kernel and thus not scattered, i.e. for leafs that are 100% KVM-defined. Adjust/add comments to kvm_only_cpuid_leafs and KVM_X86_FEATURE to document how to create new kvm_only_cpuid_leafs entries for scattered features as well as features that are entirely unknown to the kernel. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221125125845.1182922-3-jiaxi.chen@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28 | KVM: x86: Add BUILD_BUG_ON() to detect bad usage of "scattered" flags | Sean Christopherson
Add a compile-time assert in the SF() macro to detect improper usage, i.e. to detect passing in an X86_FEATURE_* flag that isn't actually scattered by the kernel. Upcoming feature flags will be 100% KVM-only and will have X86_FEATURE_* macros that point at a kvm_only_cpuid_leafs word, not a kernel-defined word. Using SF() and thus boot_cpu_has() for such feature flags would access memory beyond x86_capability[NCAPINTS] and at best incorrectly hide a feature, and at worst leak kernel state to userspace. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221125125845.1182922-2-jiaxi.chen@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
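A sketch of the hardened macro: passing a feature flag whose word lies beyond x86_capability[NCAPINTS] now fails the build instead of silently reading out of bounds.

    #define SF(name)                                                \
    ({                                                              \
        BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);       \
        (boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0);           \
    })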
2022-11-28 | KVM: x86/xen: Add CPL to Xen hypercall tracepoint | David Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28 | iommu/hyper-v: Allow hyperv irq remapping without x2apic | Nuno Das Neves
If x2apic is not available, hyperv-iommu skips remapping irqs. This breaks the root partition, which always needs its irqs remapped. Fix this by allowing irq remapping regardless of x2apic, and change hyperv_enable_irq_remapping() to return IRQ_REMAP_XAPIC_MODE in case x2apic is missing. Tested with root and non-root hyperv partitions.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1668715899-8971-1-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
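The behavioural change, reduced to a sketch (function body illustrative):

    static int __init hyperv_enable_irq_remapping(void)
    {
        if (x2apic_supported())
            return IRQ_REMAP_X2APIC_MODE;
        return IRQ_REMAP_XAPIC_MODE;    /* previously: remapping skipped */
    }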
2022-11-28 | clocksource: hyper-v: Add TSC page support for root partition | Stanislav Kinsburskiy
The Microsoft Hypervisor root partition has to map the TSC page specified by the hypervisor, instead of providing the page to the hypervisor as is done in the guest partitions. However, it's too early to map the page when the clock is initialized, so the actual mapping happens later.

Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Wei Liu <wei.liu@kernel.org>
CC: Dexuan Cui <decui@microsoft.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: x86@kernel.org
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
CC: linux-hyperv@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Link: https://lore.kernel.org/r/166759443644.385891.15921594265843430260.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28 | clocksource: hyper-v: Use TSC PFN getter to map vvar page | Stanislav Kinsburskiy
Use the TSC PFN getter instead of converting the virtual address to the physical one directly. This is a precursor patch for the upcoming support for TSC page mapping into the Microsoft Hypervisor root partition, where the TSC PFN will be defined by the hypervisor and thus can't be obtained by linear translation of the physical address.

Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: x86@kernel.org
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Wei Liu <wei.liu@kernel.org>
CC: Dexuan Cui <decui@microsoft.com>
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
CC: linux-kernel@vger.kernel.org
CC: linux-hyperv@vger.kernel.org
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Link: https://lore.kernel.org/r/166749833939.218190.14095015146003109462.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28 | x86/hyperv: Expand definition of struct hv_vp_assist_page | Saurabh Sengar
The struct hv_vp_assist_page has 24 bytes which are defined as u64[3]; expand that to expose the vtl_entry_reason, vtl_ret_x64rax and vtl_ret_x64rcx fields. vtl_entry_reason is updated by the hypervisor with the reason the VTL was entered on the virtual processor. The guest updates the vtl_ret_* fields to provide the register values to restore on VTL return; the specific register values that are restored are taken from vtl_ret_x64rax and vtl_ret_x64rcx. Also add the missing fields for synthetic_time_unhalted_timer_expired, virtualization_fault_information and intercept_message.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: <anrayabh@linux.microsoft.com>
Link: https://lore.kernel.org/r/1667587123-31645-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28 | KVM: arm64: PMU: Sanitise PMCR_EL0.LP on first vcpu run | Marc Zyngier
Userspace can play some dirty tricks on us by selecting a given PMU version (such as PMUv3p5), restoring a PMCR_EL0 value that has PMCR_EL0.LP set, and then switching the PMU version to PMUv3p1, for example. In this situation, we end up with PMCR_EL0.LP being set and spreading havoc in the PMU emulation.

This is especially hard as the first two steps can be done on one vcpu and the third step on another, meaning that we need to sanitise *all* vcpus when the PMU version is changed. In order to avoid a pretty complicated locking situation, defer the sanitisation of PMCR_EL0 to the point where the vcpu is actually run for the first time, using the existing KVM_REQ_RELOAD_PMU request that calls into kvm_pmu_handle_pmcr().

There is still an obscure corner case where userspace could do the above trick and then save the VM without running it. They would then observe an inconsistent state (PMUv3.1 + LP set), but that state will be fixed on the first run anyway whenever the guest gets restored on a host.

Reported-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
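The deferred sanitisation in kvm_pmu_handle_pmcr() then amounts to roughly (a sketch):

    /* PMCR_EL0.LP only exists from PMUv3p5 onwards. */
    if (!kvm_pmu_is_3p5(vcpu))
        val &= ~ARMV8_PMU_PMCR_LP;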
2022-11-28 | KVM: arm64: PMU: Simplify PMCR_EL0 reset handling | Marc Zyngier
Resetting PMCR_EL0 is a pretty involved process that includes poisoning some of the writable bits, just because we can. It makes it hard to reason about what gets configured, and just resetting things to 0 seems like a much saner option. Reduce reset_pmcr() to just preserving PMCR_EL0.N from the host, and setting PMCR_EL0.LC if we don't support AArch32.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-11-28 | KVM: arm64: PMU: Replace version number '0' with ID_AA64DFR0_EL1_PMUVer_NI | Anshuman Khandual
kvm_host_pmu_init() returns early when the detected PMU is either not implemented or implementation defined; kvm_pmu_probe_armpmu() has a similar situation. The extracted ID_AA64DFR0_EL1_PMUVer value when the PMU is not implemented is '0', which can be replaced with ID_AA64DFR0_EL1_PMUVer_NI, defined as '0b0000'.

Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221128135629.118346-1-anshuman.khandual@arm.com
2022-11-28 | ARM: 9276/1: Refactor dump_instr() | Zhen Lei
1. Rename the local variable 'val16' to 'tmp', so that the processing statements for Thumb and ARM can be aligned.

2. Fix two sparse check warnings (add __user for the type conversion):

     warning: incorrect type in initializer (different address spaces)
        expected unsigned short [noderef] __user *register __p
        got unsigned short [usertype] *

3. Prepare for the next patch, to avoid repeated judgment.

   Before:
        if (!user_mode(regs)) {
                if (thumb)
                else
        } else {
                if (thumb)
                else
        }

   After:
        if (thumb) {
                if (user_mode(regs))
                else
        } else {
                if (user_mode(regs))
                else
        }

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-11-28 | ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA | Nathan Chancellor
When building with CONFIG_THUMB2_KERNEL=y + a version of clang from Debian using CROSS_COMPILE=arm-linux-gnueabihf-, the following warning occurs frequently: <built-in>:383:9: warning: '__thumb2__' macro redefined [-Wmacro-redefined] #define __thumb2__ 2 ^ <built-in>:353:9: note: previous definition is here #define __thumb2__ 1 ^ 1 warning generated. Debian carries a downstream patch that changes the default CPU of the arm-linux-gnueabihf target from 'arm1176jzf-s' (v6) to 'cortex-a7' (v7). As a result, '-mthumb' defines both '__thumb__' and '__thumb2__'. The define of '__thumb2__' via the command line was purposefully added to catch a situation like this. In a similar vein as commit 26b12e084bce ("ARM: 9264/1: only use -mtp=cp15 for the compiler"), do not add '-mthumb' to AFLAGS_ISA, as it is already passed to the assembler via '-Wa,-mthumb' and '__thumb2__' is already defined for preprocessing. Link: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/raw/622dbcbd40b316ed3905a2d25d9623544a06e6b1/debian/patches/930008-arm.diff Fixes: 1d2e9b67b001 ("ARM: 9265/1: pass -march= only to compiler") Reported-by: "kernelci.org bot" <bot@kernelci.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-11-28 | ARM: 9274/1: Add hwcap for Speculative Store Bypassing Safe | Amit Daniel Kachhap
Speculative Store Bypassing Safe (FEAT_SSBS) is a feature present in AArch32 state for Armv8 and is represented by the ID_PFR2_EL1.SSBS identification register field. This feature denotes the presence of the PSTATE.ssbs bit, and hence adding a hwcap will enable userspace to check for it before trying to set/unset this PSTATE. This commit adds the ID feature bit detection, and uses elf_hwcap2 accordingly.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
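The detection pattern shared by this and the following hwcap patches is roughly as follows (the register accessor and field position shown are illustrative assumptions):

    /* Derive an AArch32 hwcap from an ID register field. */
    if (cpuid_feature_extract_field(read_cpuid_ext(CPUID_EXT_PFR2), 4) > 0)
        elf_hwcap2 |= HWCAP2_SSBS;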
2022-11-28 | ARM: 9273/1: Add hwcap for Speculation Barrier(SB) | Amit Daniel Kachhap
Speculation Barrier (FEAT_SB) is a feature present in AArch32 state for Armv8 and is represented by the ISAR6.SB identification register field. This feature denotes the presence of the SB instruction, and hence adding a hwcap will enable userspace to check for it before trying to use this instruction. This commit adds the ID feature bit detection, and uses elf_hwcap2 accordingly.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-11-28 | ARM: 9272/1: vfp: Add hwcap for FEAT_AA32I8MM | Amit Daniel Kachhap
Int8 matrix multiplication (FEAT_AA32I8MM) is a feature present in AArch32 state for Armv8 and is represented by the ISAR6.I8MM identification register field. This feature denotes the presence of the VSMMLA, VSUDOT, VUMMLA, VUSMMLA and VUSDOT instructions, and hence adding a hwcap will enable userspace to check for it before trying to use those instructions.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-11-28 | ARM: 9271/1: vfp: Add hwcap for FEAT_AA32BF16 | Amit Daniel Kachhap
Advanced SIMD BFloat16 (FEAT_AA32BF16) is a feature present in AArch32 state for Armv8 and is represented by the ISAR6.BF16 identification register field. This feature denotes the presence of the VCVT, VCVTB, VCVTT, VDOT, VFMAB, VFMAT and VMMLA instructions, and hence adding a hwcap will enable userspace to check for it before trying to use those instructions.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-11-28 | ARM: 9270/1: vfp: Add hwcap for FEAT_FHM | Amit Daniel Kachhap
Floating-point half-precision multiplication (FHM) is a feature present in AArch32 state for Armv8 and is represented by the ISAR6.FHM identification register field. This feature denotes the presence of the VFMAL and VFMSL instructions, and hence adding a hwcap will enable userspace to check for it before trying to use those instructions.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-11-28 | ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd | Amit Daniel Kachhap
Advanced Dot product is a feature present in AArch32 state for Armv8 and is represented by the ISAR6 identification register. This feature denotes the presence of the UDOT and SDOT instructions, and hence adding a hwcap will enable userspace to check for it before trying to use those instructions.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>