path: root/arch
2023-10-16 arm64: Use a positive cpucap for FP/SIMD (Mark Rutland)
Currently we have a negative cpucap which describes the *absence* of FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is somewhat awkward relative to other cpucaps that describe the presence of a feature, and it would be nicer to have a cpucap which describes the presence of FP/SIMD:

* This will allow the cpucap to be treated as a standard ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard has_cpuid_feature() function and ARM64_CPUID_FIELDS() description.

* This ensures that the cpucap will only transition from not-present to present, reducing the risk of unintentional and/or unsafe usage of FP/SIMD before cpucaps are finalized.

* This will allow using arm64_cpu_capabilities::cpu_enable() to enable the use of FP/SIMD later, with FP/SIMD being disabled at boot time otherwise. This will ensure that any unintentional and/or unsafe usage of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is never unintentionally enabled for userspace in mismatched big.LITTLE systems.

This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a positive ARM64_HAS_FPSIMD cpucap, making changes as described above. Note that as FP/SIMD will now be trapped when not supported system-wide, do_fpsimd_acc() must handle these traps in the same way as for SVE and SME. The commentary in fpsimd_restore_current_state() is updated to describe the new scheme.

No users of system_supports_fpsimd() need to know that FP/SIMD is available prior to alternatives being patched, so this is updated to use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD cpucap, without generating code to test the system_cpucaps bitmap.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
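To make the end state concrete: per the description above, the helper can reduce to a single alternative-patched branch. A sketch consistent with the commit message (not the verbatim patch):

| static __always_inline bool system_supports_fpsimd(void)
| {
|         /* Patched to true/false once cpucaps are finalized; no bitmap test. */
|         return alternative_has_cap_likely(ARM64_HAS_FPSIMD);
| }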
2023-10-16 arm64: Rename SVE/SME cpu_enable functions (Mark Rutland)
The arm64_cpu_capabilities::cpu_enable() callbacks for SVE, SME, SME2, and FA64 are named with an unusual "${feature}_kernel_enable" pattern rather than the much more common "cpu_enable_${feature}". Now that we only use these as cpu_enable() callbacks, it would be nice to have them match the usual scheme. This patch renames the cpu_enable() callbacks to match this scheme. At the same time, the comment above cpu_enable_sve() is removed for consistency with the other cpu_enable() callbacks. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 arm64: Use build-time assertions for cpucap ordering (Mark Rutland)
Both sme2_kernel_enable() and fa64_kernel_enable() need to run after sme_kernel_enable(). This happens to be true today as ARM64_SME has a lower index than either ARM64_SME2 or ARM64_SME_FA64, and both functions have a comment to this effect.

It would be nicer to have a build-time assertion like we have for can_use_gic_priorities() and has_gic_prio_relaxed_sync(), as that way it will be harder to miss any potential breakage.

This patch replaces the comments with build-time assertions.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
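A sketch of the pattern; the assertion direction assumes cpu_enable() callbacks are invoked in ascending cpucap index order, as described above:

| static void cpu_enable_sme2(const struct arm64_cpu_capabilities *__always_unused p)
| {
|         /* This must be enabled after SME; catch any reordering at build time. */
|         BUILD_BUG_ON(ARM64_SME2 <= ARM64_SME);
|
|         /* Allow use of ZT0 */
|         write_sysreg_s(read_sysreg_s(SYS_SMCR_EL1) | SMCR_ELx_EZT0_MASK, SYS_SMCR_EL1);
| }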
2023-10-16 arm64: Explicitly save/restore CPACR when probing SVE and SME (Mark Rutland)
When a CPU is onlined we first probe for supported features and properties, and then we subsequently enable features that have been detected. This is a little problematic for SVE and SME, as some properties (e.g. vector lengths) cannot be probed while they are disabled. Due to this, the code probing for SVE properties has to enable SVE for EL1 prior to probing, and the code probing for SME properties has to enable SME for EL1 prior to probing. We never disable SVE or SME for EL1 after probing.

It would be a little nicer to transiently enable SVE and SME during probing, leaving them both disabled unless explicitly enabled, as this would make it much easier to catch unintentional usage (e.g. when they are not present system-wide).

This patch reworks the SVE and SME feature probing code to only transiently enable support at EL1, disabling after probing is complete.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
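A minimal sketch of the save/enable/probe/restore shape for SVE (the wrapper function name here is illustrative, not necessarily the patch's):

| static void probe_sve_properties(void)
| {
|         u64 cpacr = read_sysreg(cpacr_el1);             /* save */
|
|         write_sysreg(cpacr | CPACR_EL1_ZEN_EL1EN, cpacr_el1);
|         isb();                                          /* SVE now usable at EL1 */
|
|         vec_init_vq_map(ARM64_VEC_SVE);                 /* probe vector lengths */
|
|         write_sysreg(cpacr, cpacr_el1);                 /* restore: disabled again */
|         isb();
| }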
2023-10-16 arm64: kvm: Use cpus_have_final_cap() explicitly (Mark Rutland)
Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap().

For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching.

KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit:

  d86de40decaa14e6 ("arm64: cpufeature: upgrade hyp caps to final")

... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap():

| static __always_inline bool cpus_have_const_cap(int num)
| {
|         if (is_hyp_code())
|                 return cpus_have_final_cap(num);
|         else if (system_capabilities_finalized())
|                 return __cpus_have_const_cap(num);
|         else
|                 return cpus_have_cap(num);
| }

Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change.

Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extend the same treatment to this code and use cpus_have_final_cap() directly.

This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
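For comparison, the finalized-only variant is roughly as follows (a sketch based on the behaviour described above):

| static __always_inline bool cpus_have_final_cap(int num)
| {
|         if (system_capabilities_finalized())
|                 return alternative_has_cap_unlikely(num);
|         else
|                 BUG();  /* caught: used before cpucaps were finalized */
| }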
2023-10-16 arm64: Split kpti_install_ng_mappings() (Mark Rutland)
The arm64_cpu_capabilities::cpu_enable callbacks are intended for cpu-local feature enablement (e.g. poking system registers). These get called for each online CPU when boot/system cpucaps get finalized and enabled, and get called whenever a CPU is subsequently onlined.

For KPTI with the ARM64_UNMAP_KERNEL_AT_EL0 cpucap, we use the kpti_install_ng_mappings() function as the cpu_enable callback. This does a mixture of cpu-local configuration (setting VBAR_EL1 to the appropriate trampoline vectors) and some global configuration (rewriting the swapper page tables to use non-global mappings) that must happen at most once.

This patch splits kpti_install_ng_mappings() into a cpu-local cpu_enable_kpti() initialization function and a system-wide kpti_install_ng_mappings() function. The cpu_enable_kpti() function is responsible for selecting the necessary cpu-local vectors each time a CPU is onlined, and the kpti_install_ng_mappings() function performs the one-time rewrite of the translation tables to use non-global mappings. Splitting the two makes the code a bit easier to follow and also allows the page table rewriting code to be marked as __init such that it can be freed after use.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
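A sketch of the intended split (bodies abridged; only the structure is the point):

| /* cpu-local: runs every time a CPU is onlined */
| static void cpu_enable_kpti(const struct arm64_cpu_capabilities *__always_unused p)
| {
|         /* ... point VBAR_EL1 at the appropriate trampoline vectors ... */
| }
|
| /* system-wide: runs at most once, so it can be __init and freed afterwards */
| static void __init kpti_install_ng_mappings(void)
| {
|         static bool kpti_applied;
|
|         if (kpti_applied)
|                 return;
|         kpti_applied = true;
|         /* ... rewrite the swapper page tables with PTE_NG set ... */
| }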
2023-10-16 arm64: Fixup user features at boot time (Mark Rutland)
For ARM64_WORKAROUND_2658417, we use a cpu_enable() callback to hide the ID_AA64ISAR1_EL1.BF16 ID register field. This is a little awkward as CPUs may attempt to apply the workaround concurrently, requiring that we protect the bulk of the callback with a raw_spinlock, and requiring some pointless work every time a CPU is subsequently hotplugged in.

This patch makes this a little simpler by handling the masking once at boot time. A new user_feature_fixup() function is called at the start of setup_user_features() to mask the feature, matching the style of elf_hwcap_fixup(). The ARM64_WORKAROUND_2658417 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible.

Note that the ARM64_WORKAROUND_2658417 capability is matched with ERRATA_MIDR_RANGE(), which implicitly gives the capability an ARM64_CPUCAP_LOCAL_CPU_ERRATUM type, which forbids the late onlining of a CPU with the erratum if the erratum was not present at boot time. Therefore this patch doesn't change the behaviour for late onlining.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
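The boot-time masking can look roughly like this (a sketch following the description; the register/field helper names are assumed from the existing cpufeature code):

| static void __init user_feature_fixup(void)
| {
|         if (cpus_have_cap(ARM64_WORKAROUND_2658417)) {
|                 struct arm64_ftr_reg *regp;
|
|                 regp = get_arm64_ftr_reg(SYS_ID_AA64ISAR1_EL1);
|                 if (regp)
|                         regp->user_mask &= ~ID_AA64ISAR1_EL1_BF16_MASK;
|         }
| }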
2023-10-16 arm64: Rework setup_cpu_features() (Mark Rutland)
Currently setup_cpu_features() handles a mixture of one-time kernel feature setup (e.g. cpucaps) and one-time user feature setup (e.g. ELF hwcaps). Subsequent patches will rework other one-time setup and expand the logic currently in setup_cpu_features(), and in preparation for this it would be helpful to split the kernel and user setup into separate functions.

This patch splits setup_user_features() out of setup_cpu_features(), with a few additional cleanups of note:

* setup_cpu_features() is renamed to setup_system_features() to make it clear that it handles system-wide feature setup rather than cpu-local feature setup.

* setup_system_capabilities() is folded into setup_system_features().

* Presence of TTBR0 PAN is logged immediately after update_cpu_capabilities(), so that this is guaranteed to appear alongside all the other detected system cpucaps.

* The 'cwg' variable is removed as its value is only consumed once and it's simpler to use cache_type_cwg() directly without assigning its return value to a variable.

* The call to setup_user_features() is moved after alternatives are patched, which will allow user feature setup code to depend on alternative branches and allow for simplifications in subsequent patches.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
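In outline, the split looks like this (an abridged sketch, not the full patch):

| void __init setup_system_features(void)
| {
|         /* one-time kernel setup: cpucap detection/enablement, cache checks, ... */
| }
|
| static void __init setup_user_features(void)
| {
|         /* one-time user setup; called after alternatives are patched */
|         setup_elf_hwcaps(arm64_elf_hwcaps);
|         elf_hwcap_fixup();
| }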
2023-10-16 arm64: Add cpus_have_final_boot_cap() (Mark Rutland)
The cpus_have_final_cap() function can be used to test a cpucap while also verifying that we do not consume the cpucap until system capabilities have been finalized. It would be helpful if we could do likewise for boot cpucaps. This patch adds a new cpus_have_final_boot_cap() helper which can be used to test a cpucap while also verifying that boot capabilities have been finalized. Users will be added in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
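Per the description, the new helper can mirror cpus_have_final_cap(); a sketch:

| static __always_inline bool cpus_have_final_boot_cap(int num)
| {
|         if (boot_capabilities_finalized())
|                 return alternative_has_cap_unlikely(num);
|         else
|                 BUG();  /* used before boot cpucaps were finalized */
| }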
2023-10-16 arm64: Add cpucap_is_possible() (Mark Rutland)
Many cpucaps can only be set when certain CONFIG_* options are selected, and we need to check the CONFIG_* option before the cap in order to avoid generating redundant code. Due to this, we have a growing number of helpers in <asm/cpufeature.h> of the form:

| static __always_inline bool system_supports_foo(void)
| {
|         return IS_ENABLED(CONFIG_ARM64_FOO) &&
|                 cpus_have_const_cap(ARM64_HAS_FOO);
| }

This is unfortunate as it forces us to use cpus_have_const_cap() unnecessarily, resulting in redundant code being generated by the compiler. In the vast majority of cases, we only require that feature checks indicate the presence of a feature after cpucaps have been finalized, and so it would be sufficient to use alternative_has_cap_*(). However some code needs to handle a feature before alternatives have been patched, and must test the system_cpucaps bitmap via cpus_have_const_cap(). In other cases we'd like to check for unintentional usage of a cpucap before alternatives are patched, and so it would be preferable to use cpus_have_final_cap().

Placing the IS_ENABLED() checks in each callsite is tedious and error-prone, and the same applies for writing wrappers for each combination of cpucap and alternative_has_cap_*() / cpus_have_cap() / cpus_have_final_cap(). It would be nicer if we could centralize the knowledge of which cpucaps are possible, and have alternative_has_cap_*(), cpus_have_cap(), and cpus_have_final_cap() handle this automatically.

This patch adds a new cpucap_is_possible() function which will be responsible for checking the CONFIG_* option, and updates the low-level cpucap checks to use this. The existing CONFIG_* checks in <asm/cpufeature.h> are moved over to cpucap_is_possible(), but the (now trivial) wrapper functions are retained for now.

There should be no functional change as a result of this patch alone.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
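The shape of the new helper, per the description above (the case list is abridged):

| static __always_inline bool cpucap_is_possible(const unsigned int cap)
| {
|         compiletime_assert(__builtin_constant_p(cap), "cap must be a constant");
|
|         switch (cap) {
|         case ARM64_HAS_PAN:
|                 return IS_ENABLED(CONFIG_ARM64_PAN);
|         case ARM64_SVE:
|                 return IS_ENABLED(CONFIG_ARM64_SVE);
|         /* ... one case per CONFIG_*-dependent cpucap ... */
|         }
|
|         return true;
| }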
2023-10-16 arm64: Factor out cpucap definitions (Mark Rutland)
For clarity it would be nice to factor cpucap manipulation out of <asm/cpufeature.h>, and the obvious place would be <asm/cpucap.h>, but this will clash somewhat with <generated/asm/cpucaps.h>. Rename <generated/asm/cpucaps.h> to <generated/asm/cpucap-defs.h>, matching what we do for <generated/asm/sysreg-defs.h>, and introduce a new <asm/cpucaps.h> which includes the generated header. Subsequent patches will fill out <asm/cpucaps.h>. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 arm64/arm: xen: enlighten: Fix KPTI checks (Mark Rutland)
When KPTI is in use, we cannot register a runstate region as XEN requires that this is always a valid VA, which we cannot guarantee. Due to this, xen_starting_cpu() must avoid registering each CPU's runstate region, and xen_guest_init() must avoid setting up features that depend upon it.

We tried to ensure that in commit:

  f88af7229f6f22ce ("xen/arm: do not setup the runstate info page if kpti is enabled")

... where we added checks for xen_kernel_unmapped_at_usr(), which wraps arm64_kernel_unmapped_at_el0() on arm64 and is always false on 32-bit arm.

Unfortunately, as xen_guest_init() is an early_initcall, this happens before secondary CPUs are booted and arm64 has finalized the ARM64_UNMAP_KERNEL_AT_EL0 cpucap which backs arm64_kernel_unmapped_at_el0(), and so this can subsequently be set as secondary CPUs are onlined. On a big.LITTLE system where the boot CPU does not require KPTI but some secondary CPUs do, this will result in xen_guest_init() initializing features that depend on the runstate region, and xen_starting_cpu() registering the runstate region on some CPUs before KPTI is subsequently enabled, resulting in the problems the aforementioned commit tried to avoid.

Handle this more robustly by deferring the initialization of the runstate region until secondary CPUs have been initialized and the ARM64_UNMAP_KERNEL_AT_EL0 cpucap has been finalized. The per-cpu work is moved into a new hotplug starting function which is registered later when we're certain that KPTI will not be used.

Fixes: f88af7229f6f ("xen/arm: do not setup the runstate info page if kpti is enabled")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Bertrand Marquis <bertrand.marquis@arm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
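A sketch of the deferred registration; the hotplug-state choice and the callback name here are illustrative assumptions, not the literal patch:

| static int __init xen_runstate_late_init(void)
| {
|         if (!xen_domain() || xen_kernel_unmapped_at_usr())
|                 return 0;
|
|         /* KPTI is now known to be off for good: runstate use is safe. */
|         return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
|                                  "arm/xen_runstate:online",
|                                  xen_starting_runstate_cpu, NULL);
| }
| late_initcall(xen_runstate_late_init);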
2023-09-24 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds)
Pull kvm fixes from Paolo Bonzini:

 "ARM:
   - Fix EL2 Stage-1 MMIO mappings where a random address was used
   - Fix SMCCC function number comparison when the SVE hint is set

  RISC-V:
   - Fix KVM_GET_REG_LIST API for ISA_EXT registers
   - Fix reading ISA_EXT register of a missing extension
   - Fix ISA_EXT register handling in get-reg-list test
   - Fix filtering of AIA registers in get-reg-list test

  x86:
   - Fixes for TSC_AUX virtualization
   - Stop zapping page tables asynchronously, since we don't zap them as often as before"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX
  KVM: SVM: Fix TSC_AUX virtualization setup
  KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
  KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously
  KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe()
  KVM: x86/mmu: Open code leaf invalidation from mmu_notifier
  KVM: riscv: selftests: Selectively filter-out AIA registers
  KVM: riscv: selftests: Fix ISA_EXT register handling in get-reg-list
  RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensions
  RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers
  KVM: selftests: Assert that vasprintf() is successful
  KVM: arm64: nvhe: Ignore SVE hint in SMCCC function ID
  KVM: arm64: Properly return allocated EL2 VA from hyp_alloc_private_va_range()
2023-09-23 Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm (Linus Torvalds)
Pull misc fixes from Andrew Morton:

 "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three are cc:stable"

* tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  proc: nommu: fix empty /proc/<pid>/maps
  filemap: add filemap_map_order0_folio() to handle order0 folio
  proc: nommu: /proc/<pid>/maps: release mmap read lock
  mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
  pidfd: prevent a kernel-doc warning
  argv_split: fix kernel-doc warnings
  scatterlist: add missing function params to kernel-doc
  selftests/proc: fixup proc-empty-vm test after KSM changes
  revert "scripts/gdb/symbols: add specific ko module load command"
  selftests: link libasan statically for tests with -fsanitize=address
  task_work: add kerneldoc annotation for 'data' argument
  mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
  sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
2023-09-23 Merge tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson (Linus Torvalds)
Pull LoongArch fixes from Huacai Chen:

 "Fix lockdep, fix a boot failure, fix some build warnings, fix document links, and some cleanups"

* tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  docs/zh_CN/LoongArch: Update the links of ABI
  docs/LoongArch: Update the links of ABI
  LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem()
  kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage
  LoongArch: Set all reserved memblocks on Node#0 at initialization
  LoongArch: Remove dead code in relocate_new_kernel
  LoongArch: Use _UL() and _ULL()
  LoongArch: Fix some build warnings with W=1
  LoongArch: Fix lockdep static memory detection
2023-09-23 Merge tag 's390-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux (Linus Torvalds)
Pull s390 fixes from Vasily Gorbik:

 - Fix potential string buffer overflow in hypervisor user-defined certificates handling

 - Update defconfigs

* tag 's390-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cert_store: fix string length handling
  s390: update defconfigs
2023-09-23 Merge tag 'kvm-riscv-fixes-6.6-1' of https://github.com/kvm-riscv/linux into HEAD (Paolo Bonzini)
KVM/riscv fixes for 6.6, take #1

- Fix KVM_GET_REG_LIST API for ISA_EXT registers
- Fix reading ISA_EXT register of a missing extension
- Fix ISA_EXT register handling in get-reg-list test
- Fix filtering of AIA registers in get-reg-list test
2023-09-23 KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX (Tom Lendacky)
When the TSC_AUX MSR is virtualized, the TSC_AUX value is swap type "B" within the VMSA. This means that the guest value is loaded on VMRUN and the host value is restored from the host save area on #VMEXIT. Since the value is restored on #VMEXIT, the KVM user return MSR support for TSC_AUX can be replaced by populating the host save area with the current host value of TSC_AUX. And, since TSC_AUX is not changed by Linux post-boot, the host save area can be set once in svm_hardware_enable(). This eliminates the two WRMSR instructions associated with the user return MSR support. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <d381de38eb0ab6c9c93dda8503b72b72546053d7.1694811272.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
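Roughly, the change amounts to the following in svm_hardware_enable() (a sketch; 'sd' is the per-CPU svm_cpu_data):

| if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
|         struct sev_es_save_area *hostsa;
|         u32 __maybe_unused msr_hi;
|
|         hostsa = (struct sev_es_save_area *)(page_address(sd->save_area) + 0x400);
|
|         /* Swap type "B": hardware restores the host value from here on #VMEXIT. */
|         rdmsr(MSR_TSC_AUX, hostsa->tsc_aux, msr_hi);
| }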
2023-09-23 KVM: SVM: Fix TSC_AUX virtualization setup (Tom Lendacky)
The checks for virtualizing TSC_AUX occur during the vCPU reset processing path. However, at the time of initial vCPU reset processing, when the vCPU is first created, not all of the guest CPUID information has been set. In this case the RDTSCP and RDPID feature support for the guest is not in place and so TSC_AUX virtualization is not established. This continues for each vCPU created for the guest. On the first boot of an AP, vCPU reset processing is executed as a result of an APIC INIT event, this time with all of the guest CPUID information set, resulting in TSC_AUX virtualization being enabled, but only for the APs. The BSP always sees a TSC_AUX value of 0 which probably went unnoticed because, at least for Linux, the BSP TSC_AUX value is 0.

Move the TSC_AUX virtualization enablement out of the init_vmcb() path and into the vcpu_after_set_cpuid() path to allow for proper initialization of the support after the guest CPUID information has been set. With the TSC_AUX virtualization support now in the vcpu_after_set_cpuid() path, the intercepts must be either cleared or set based on the guest CPUID input.

Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
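The after-set-CPUID hook then looks roughly like this (a sketch following the description above):

| static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
| {
|         struct kvm_vcpu *vcpu = &svm->vcpu;
|
|         if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
|                 bool v_tsc_aux = guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
|                                  guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
|
|                 /* Clear or set the TSC_AUX intercept based on guest CPUID. */
|                 set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
|         }
| }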
2023-09-23 KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway (Paolo Bonzini)
svm_recalc_instruction_intercepts() is always called at least once before the vCPU is started, so the setting or clearing of the RDTSCP intercept can be dropped from the TSC_AUX virtualization support. Extracted from a patch by Tom Lendacky. Cc: stable@vger.kernel.org Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23 KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously (Sean Christopherson)
Stop zapping invalidated TDP MMU roots via work queue now that KVM preserves TDP MMU roots until they are explicitly invalidated. Zapping roots asynchronously was effectively a workaround to avoid stalling a vCPU for an extended duration if a vCPU unloaded a root, which at the time happened whenever the guest toggled CR0.WP (a frequent operation for some guest kernels).

While a clever hack, zapping roots via an unbound worker had subtle, unintended consequences on host scheduling, especially when zapping multiple roots, e.g. as part of a memslot deletion. Because the work of zapping a root is no longer bound to the task that initiated the zap, things like the CPU affinity and priority of the original task get lost. Losing the affinity and priority can be especially problematic if unbound workqueues aren't affined to a small number of CPUs, as zapping multiple roots can cause KVM to heavily utilize the majority of CPUs in the system, *beyond* the CPUs KVM is already using to run vCPUs.

When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root zap can result in KVM occupying all logical CPUs for ~8ms, and result in high priority tasks not being scheduled in a timely manner. In v5.15, which doesn't preserve unloaded roots, the issues were even more noticeable as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.

Consuming all CPUs for an extended duration can lead to significant jitter throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots is a semi-frequent operation as memslots are deleted and recreated with different host virtual addresses to react to host GPU drivers allocating and freeing GPU blobs. On ChromeOS, the jitter manifests as audio blips during games due to the audio server's tasks not getting scheduled in promptly, despite the tasks having a high realtime priority.

Deleting memslots isn't exactly a fast path and should be avoided when possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the memslot shenanigans, but KVM is squarely in the wrong. Not to mention that removing the async zapping eliminates a non-trivial amount of complexity.

Note, one of the subtle behaviors hidden behind the async zapping is that KVM would zap invalidated roots only once (ignoring partial zaps from things like mmu_notifier events). Preserve this behavior by adding a flag to identify roots that are scheduled to be zapped versus roots that have already been zapped but not yet freed. Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can encounter invalid roots, as it's not at all obvious why zapping invalidated roots shouldn't simply zap all invalid roots.

Reported-by: Pattara Teerapong <pteerapong@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Yiwei Zhang <zzyiwei@google.com>
Cc: Paul Hsia <paulhsia@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230916003916.2545000-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23 KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe() (Paolo Bonzini)
All callers except the MMU notifier want to process all address spaces. Remove the address space ID argument of for_each_tdp_mmu_root_yield_safe() and switch the MMU notifier to use __for_each_tdp_mmu_root_yield_safe(). Extracted out of a patch by Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
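In effect (a sketch; treating -1 as "all address spaces" is an assumption about the internal helper's convention):

| /* All remaining callers walk every address space. */
| #define for_each_tdp_mmu_root_yield_safe(_kvm, _root)                   \
|         __for_each_tdp_mmu_root_yield_safe(_kvm, _root, -1)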
2023-09-22 Merge tag 'acpi-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (Linus Torvalds)
Pull ACPI fixes from Rafael Wysocki:

 "These fix a general ACPI processor driver regression and an ia64 build issue, both introduced recently.

  Specifics:

  - Fix recently introduced uninitialized memory access issue in the ACPI processor driver (Michal Wilczynski)

  - Fix ia64 build inadvertently broken by recent ACPI processor driver changes, which is prudent to do for 6.6 even though ia64 support is slated for removal in 6.7 (Ard Biesheuvel)"

* tag 'acpi-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Fix uninitialized access of buf in acpi_set_pdc_bits()
  acpi: Provide ia64 dummy implementation of acpi_proc_quirk_mwait_check()
2023-09-22 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux (Linus Torvalds)
Pull arm64 fixes from Will Deacon:

 "Small crop of relatively boring arm64 fixes for -rc3. That's not to say we don't have any juicy bugs, however, it's just that fixes for those are likely to come via -mm and -tip for a hugetlb and an atomics issue respectively. I get left with the documentation...

  - Fix detection of "ClearBHB" and "Hinted Conditional Branch" features

  - Fix broken wildcarding for Arm PMU MAINTAINERS entry

  - Add missing documentation for userspace-visible ID register fields"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Document missing userspace visible fields in ID_AA64ISAR2_EL1
  arm64/hbc: Document HWCAP2_HBC
  arm64/sme: Include ID_AA64PFR1_EL1.SME in cpu-feature-registers.rst
  arm64: cpufeature: Fix CLRBHB and BC detection
  MAINTAINERS: Use wildcard pattern for ARM PMU headers
2023-09-22 Merge tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds)
Pull x86 rethunk fixes from Borislav Petkov:

 "Fix the patching ordering between static calls and return thunks"

* tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86,static_call: Fix static-call vs return-thunk
  x86/alternatives: Remove faulty optimization
2023-09-22 Merge tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds)
Pull misc x86 fixes from Ingo Molnar:

 - Fix a kexec bug
 - Fix an UML build bug
 - Fix a handful of SRSO related bugs
 - Fix a shadow stacks handling bug & robustify related code

* tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/shstk: Add warning for shadow stack double unmap
  x86/shstk: Remove useless clone error handling
  x86/shstk: Handle vfork clone failure correctly
  x86/srso: Fix SBPB enablement for spec_rstack_overflow=off
  x86/srso: Don't probe microcode in a guest
  x86/srso: Set CPUID feature bits independently of bug or mitigation status
  x86/srso: Fix srso_show_state() side effect
  x86/asm: Fix build of UML with KASAN
  x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
2023-09-22 x86,static_call: Fix static-call vs return-thunk (Peter Zijlstra)
Commit 7825451fa4dc ("static_call: Add call depth tracking support") failed to realize the problem fixed there is not specific to call depth tracking but applies to all return-thunk uses. Move the fix to the appropriate place and condition. Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding") Reported-by: David Kaplan <David.Kaplan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2023-09-22 x86/alternatives: Remove faulty optimization (Josh Poimboeuf)
The following commit:

  095b8303f383 ("x86/alternative: Make custom return thunk unconditional")

... made '__x86_return_thunk' a placeholder value. All code setting X86_FEATURE_RETHUNK also changes the value of 'x86_return_thunk'. So the optimization at the beginning of apply_returns() is dead code.

Also, before the above-mentioned commit, the optimization actually had a bug: it bypassed __static_call_fixup(), causing some raw returns to remain unpatched in static call trampolines. Thus the 'Fixes' tag.

Fixes: d2408e043e72 ("x86/alternative: Optimize returns patching")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/16d19d2249d4485d8380fb215ffaae81e6b8119e.1693889988.git.jpoimboe@kernel.org
2023-09-21 Merge tag 'fix-ia64-build-for-v6.6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ardb/linux (Rafael J. Wysocki)
Merge an ia64 ACPI build fix for v6.6 from Ard Biesheuvel:

 "Build fix for Itanium/ia64:

  - provide dummy implementation of acpi_proc_quirk_mwait_check() which was moved out of generic code into arch/x86, breaking the ia64 build"

* tag 'fix-ia64-build-for-v6.6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ardb/linux:
  acpi: Provide ia64 dummy implementation of acpi_proc_quirk_mwait_check()
2023-09-21 Merge tag 'powerpc-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds)
Pull powerpc fixes from Michael Ellerman:

 - A fix for breakpoint handling which was using get_user() while atomic

 - Fix the Power10 HASHCHK handler which was using get_user() while atomic

 - A few build fixes for issues caused by recent changes

 Thanks to Benjamin Gray, Christophe Leroy, Kajol Jain, and Naveen N Rao.

* tag 'powerpc-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/dexcr: Move HASHCHK trap handler
  powerpc/82xx: Select FSL_SOC
  powerpc: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION and FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
  powerpc/watchpoints: Annotate atomic context in more places
  powerpc/watchpoint: Disable pagefaults when getting user instruction
  powerpc/watchpoints: Disable preemption in thread_change_pc()
  powerpc/perf/hv-24x7: Update domain value check
2023-09-21 KVM: x86/mmu: Open code leaf invalidation from mmu_notifier (Sean Christopherson)
The mmu_notifier path is a bit of a special snowflake, e.g. it zaps only a single address space (because it's per-slot), and can't always yield. Because of this, it calls kvm_tdp_mmu_zap_leafs() in ways that no one else does. Iterate manually over the leafs in response to an mmu_notifier invalidation, instead of invoking kvm_tdp_mmu_zap_leafs(). Drop the @can_yield param from kvm_tdp_mmu_zap_leafs() as its sole remaining caller unconditionally passes "true". Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230916003916.2545000-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-21 RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensions (Anup Patel)
The riscv_vcpu_get_isa_ext_single() should fail with -ENOENT error when corresponding ISA extension is not available on the host. Fixes: e98b1085be79 ("RISC-V: KVM: Factor-out ONE_REG related code to its own source file") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
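The fixed helper ends up roughly as below (a sketch following the description; details may differ from the applied patch):

| static int riscv_vcpu_get_isa_ext_single(struct kvm_vcpu *vcpu,
|                                          unsigned long reg_num,
|                                          unsigned long *reg_val)
| {
|         unsigned long host_isa_ext;
|
|         if (reg_num >= KVM_RISCV_ISA_EXT_MAX ||
|             reg_num >= ARRAY_SIZE(kvm_isa_ext_arr))
|                 return -ENOENT;
|
|         host_isa_ext = kvm_isa_ext_arr[reg_num];
|         if (!__riscv_isa_extension_available(NULL, host_isa_ext))
|                 return -ENOENT; /* extension not available on the host */
|
|         *reg_val = 0;
|         if (__riscv_isa_extension_available(vcpu->arch.isa, host_isa_ext))
|                 *reg_val = 1;   /* extension enabled for this VCPU */
|
|         return 0;
| }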
2023-09-21 RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers (Anup Patel)
The ISA_EXT registers to enable/disable ISA extensions for a VCPU are always available when the underlying host has the corresponding ISA extension. The copy_isa_ext_reg_indices() called by the KVM_GET_REG_LIST API does not align with this expectation so let's fix it.

Fixes: 031f9efafc08 ("KVM: riscv: Add KVM_GET_REG_LIST API support")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-09-20 LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem() (Huacai Chen)
As Linus suggested, kasan_mem_to_shadow()/kasan_shadow_to_mem() are not performance-critical and are too big to inline; inlining them is simply wrong, so just define them out-of-line. If they really need to be inlined in future, such as for the objtool/SMAP issue on x86, we should mark them __always_inline.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage (Huacai Chen)
As Linus suggested, __HAVE_ARCH_XYZ is "stupid" and "having historical uses of it doesn't make it good". So migrate __HAVE_ARCH_SHADOW_MAP to separate macros named after the respective functions. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
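The replacement pattern is the usual per-function override macro; a sketch (paths and the generic fallback body are illustrative):

| /* arch header, e.g. <asm/kasan.h>: */
| void *kasan_mem_to_shadow(const void *addr);
| #define kasan_mem_to_shadow kasan_mem_to_shadow
|
| const void *kasan_shadow_to_mem(const void *shadow_addr);
| #define kasan_shadow_to_mem kasan_shadow_to_mem
|
| /* generic <linux/kasan.h> keeps its inline only when not overridden: */
| #ifndef kasan_mem_to_shadow
| static inline void *kasan_mem_to_shadow(const void *addr)
| {
|         return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
|                 + KASAN_SHADOW_OFFSET;
| }
| #endif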
2023-09-20 LoongArch: Set all reserved memblocks on Node#0 at initialization (Huacai Chen)
After commit 61167ad5fecdea ("mm: pass nid to reserve_bootmem_region()") we get a panic if DEFERRED_STRUCT_PAGE_INIT is enabled:

[ 0.000000] CPU 0 Unable to handle kernel paging request at virtual address 0000000000002b82, era == 90000000040e3f28, ra == 90000000040e3f18
[ 0.000000] Oops[#1]:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0+ #733
[ 0.000000] pc 90000000040e3f28 ra 90000000040e3f18 tp 90000000046f4000 sp 90000000046f7c90
[ 0.000000] a0 0000000000000001 a1 0000000000200000 a2 0000000000000040 a3 90000000046f7ca0
[ 0.000000] a4 90000000046f7ca4 a5 0000000000000000 a6 90000000046f7c38 a7 0000000000000000
[ 0.000000] t0 0000000000000002 t1 9000000004b00ac8 t2 90000000040e3f18 t3 90000000040f0800
[ 0.000000] t4 00000000000f0000 t5 80000000ffffe07e t6 0000000000000003 t7 900000047fff5e20
[ 0.000000] t8 aaaaaaaaaaaaaaab u0 0000000000000018 s9 0000000000000000 s0 fffffefffe000000
[ 0.000000] s1 0000000000000000 s2 0000000000000080 s3 0000000000000040 s4 0000000000000000
[ 0.000000] s5 0000000000000000 s6 fffffefffe000000 s7 900000000470b740 s8 9000000004ad4000
[ 0.000000] ra: 90000000040e3f18 reserve_bootmem_region+0xec/0x21c
[ 0.000000] ERA: 90000000040e3f28 reserve_bootmem_region+0xfc/0x21c
[ 0.000000] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[ 0.000000] PRMD: 00000000 (PPLV0 -PIE -PWE)
[ 0.000000] EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[ 0.000000] ECFG: 00070800 (LIE=11 VS=7)
[ 0.000000] ESTAT: 00010800 [PIL] (IS=11 ECode=1 EsubCode=0)
[ 0.000000] BADV: 0000000000002b82
[ 0.000000] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000)
[ 0.000000] Modules linked in:
[ 0.000000] Process swapper (pid: 0, threadinfo=(____ptrval____), task=(____ptrval____))
[ 0.000000] Stack : 0000000000000000 9000000002eb5430 0000003a00000020 90000000045ccd00
[ 0.000000]         900000000470e000 90000000002c1918 0000000000000000 9000000004110780
[ 0.000000]         00000000fe6c0000 0000000480000000 9000000004b4e368 9000000004110748
[ 0.000000]         0000000000000000 900000000421ca84 9000000004620000 9000000004564970
[ 0.000000]         90000000046f7d78 9000000002cc9f70 90000000002c1918 900000000470e000
[ 0.000000]         9000000004564970 90000000040bc0e0 90000000046f7d78 0000000000000000
[ 0.000000]         0000000000004000 90000000045ccd00 0000000000000000 90000000002c1918
[ 0.000000]         90000000002c1900 900000000470b700 9000000004b4df78 9000000004620000
[ 0.000000]         90000000046200a8 90000000046200a8 0000000000000000 9000000004218b2c
[ 0.000000]         9000000004270008 0000000000000001 0000000000000000 90000000045ccd00
[ 0.000000]         ...
[ 0.000000] Call Trace:
[ 0.000000] [<90000000040e3f28>] reserve_bootmem_region+0xfc/0x21c
[ 0.000000] [<900000000421ca84>] memblock_free_all+0x114/0x350
[ 0.000000] [<9000000004218b2c>] mm_core_init+0x138/0x3cc
[ 0.000000] [<9000000004200e38>] start_kernel+0x488/0x7a4
[ 0.000000] [<90000000040df0d8>] kernel_entry+0xd8/0xdc
[ 0.000000]
[ 0.000000] Code: 02eb21ad 00410f4c 380c31ac <262b818d> 6800b70d 02c1c196 0015001c 57fe4bb1 260002cd

The reason is that early memblock_reserve() in memblock_init() sets the node id to MAX_NUMNODES, making NODE_DATA(nid) a NULL dereference in the call chain reserve_bootmem_region() -> init_reserved_page(). After memblock_init(), those late calls of memblock_reserve() operate on subregions of memblock.memory regions. As a result, these reserved regions will be set to the correct node at the first iteration of memmap_init_reserved_pages(). So setting all reserved memblocks on Node#0 at initialization avoids this panic.
Reported-by: WANG Xuerui <git@xen0n.name> Tested-by: WANG Xuerui <git@xen0n.name> Reviewed-by: WANG Xuerui <git@xen0n.name> # with nits addressed Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
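A sketch of the fix described above, assuming it lands at the end of LoongArch's memblock_init():

| void __init memblock_init(void)
| {
|         /* ... add memory regions, reserve kernel image and initrd, ... */
|
|         /*
|          * Early reservations would otherwise keep nid == MAX_NUMNODES; late
|          * memblock_reserve() calls land inside .memory regions and are fixed
|          * up by memmap_init_reserved_pages(), so Node#0 is safe here.
|          */
|         memblock_set_node(0, PHYS_ADDR_MAX, &memblock.reserved, 0);
| }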
2023-09-20 LoongArch: Remove dead code in relocate_new_kernel (Tiezhu Yang)
The initial aim is to silence the following objtool warning:

  arch/loongarch/kernel/relocate_kernel.o: warning: objtool: relocate_new_kernel+0x74: unreachable instruction

There are two adjacent "b" instructions; the second one is unreachable dead code, so just remove it.

Co-developed-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 LoongArch: Use _UL() and _ULL() (Andy Shevchenko)
Use _UL() and _ULL() that are provided by const.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 LoongArch: Fix some build warnings with W=1 (Bibo Mao)
There are some build warnings when building the LoongArch kernel with W=1, as follows; this patch fixes them.

arch/loongarch/kernel/acpi.c:284:13: warning: no previous prototype for ‘acpi_numa_arch_fixup’ [-Wmissing-prototypes]
  284 | void __init acpi_numa_arch_fixup(void) {}
      |             ^~~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/time.c:32:13: warning: no previous prototype for ‘constant_timer_interrupt’ [-Wmissing-prototypes]
   32 | irqreturn_t constant_timer_interrupt(int irq, void *data)
      |             ^~~~~~~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/traps.c:496:25: warning: no previous prototype for 'do_fpe' [-Wmissing-prototypes]
  496 | asmlinkage void noinstr do_fpe(struct pt_regs *regs
      |                         ^~~~~~
arch/loongarch/kernel/traps.c:813:22: warning: variable ‘opcode’ set but not used [-Wunused-but-set-variable]
  813 |         unsigned int opcode;
      |                      ^~~~~~
arch/loongarch/kernel/signal.c:895:14: warning: no previous prototype for ‘get_sigframe’ [-Wmissing-prototypes]
  895 | void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
      |              ^~~~~~~~~~~~
arch/loongarch/kernel/syscall.c:21:40: warning: initialized field overwritten [-Woverride-init]
   21 | #define __SYSCALL(nr, call)     [nr] = (call),
      |                                        ^
arch/loongarch/kernel/syscall.c:40:14: warning: no previous prototype for ‘do_syscall’ [-Wmissing-prototypes]
   40 | void noinstr do_syscall(struct pt_regs *regs)
      |              ^~~~~~~~~~
arch/loongarch/kernel/smp.c:502:17: warning: no previous prototype for ‘start_secondary’ [-Wmissing-prototypes]
  502 | asmlinkage void start_secondary(void)
      |                 ^~~~~~~~~~~~~~~
arch/loongarch/kernel/process.c:309:15: warning: no previous prototype for ‘arch_align_stack’ [-Wmissing-prototypes]
  309 | unsigned long arch_align_stack(unsigned long sp)
      |               ^~~~~~~~~~~~~~~~
arch/loongarch/kernel/topology.c:13:5: warning: no previous prototype for ‘arch_register_cpu’ [-Wmissing-prototypes]
   13 | int arch_register_cpu(int cpu)
      |     ^~~~~~~~~~~~~~~~~
arch/loongarch/kernel/topology.c:27:6: warning: no previous prototype for ‘arch_unregister_cpu’ [-Wmissing-prototypes]
   27 | void arch_unregister_cpu(int cpu)
      |      ^~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/module-sections.c:103:5: warning: no previous prototype for ‘module_frob_arch_sections’ [-Wmissing-prototypes]
  103 | int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~
arch/loongarch/mm/hugetlbpage.c:56:5: warning: no previous prototype for ‘is_aligned_hugepage_range’ [-Wmissing-prototypes]
   56 | int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 LoongArch: Fix lockdep static memory detection (Helge Deller)
Since commit 0a6b58c5cd0d ("lockdep: fix static memory detection even more") the lockdep code uses is_kernel_core_data(), is_kernel_rodata() and init_section_contains() to verify if a lock is located inside a kernel static data section.

This change triggers a failure on LoongArch, for which the vmlinux.lds.S script misses to put the locks (as part of the .data.rel symbols) into the Linux data section.

This patch fixes the lockdep problem by moving *(.data.rel*) symbols into the kernel data section (from _sdata to _edata). Additionally, move other wrongly assigned symbols too:
- altinstructions into the _initdata section,
- PLT symbols behind the read-only section, and
- *(.la_abs) into the data section.

Cc: stable <stable@kernel.org> # v6.4+
Fixes: 0a6b58c5cd0d ("lockdep: fix static memory detection even more")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
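An abridged sketch of the vmlinux.lds.S direction (not the literal diff):

|         .data : {                               /* between _sdata and _edata */
|                 *(.data .data.rel .data.rel.*)  /* locks live in .data.rel* */
|                 *(.la_abs)                      /* moved into the data section too */
|         }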
2023-09-19 sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning (Geert Uytterhoeven)
When __ioremap_caller() was replaced by ioremap_prot(), the __ref annotation added in commit af1415314a4190b8 ("sh: Flag __ioremap_caller() __init_refok.") was removed, causing a modpost warning: WARNING: modpost: vmlinux: section mismatch in reference: ioremap_prot+0x88 (section: .text) -> ioremap_fixed (section: .init.text) ioremap_prot() calls ioremap_fixed() (which is marked __init), but only before mem_init_done becomes true, so this is safe. Hence fix this by re-adding the lost __ref. Link: https://lkml.kernel.org/r/20230911093850.1517389-1-geert+renesas@glider.be Fixes: 0453c9a78015cb22 ("sh: mm: convert to GENERIC_IOREMAP") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 x86/shstk: Add warning for shadow stack double unmap (Rick Edgecombe)
There are several ways a thread's shadow stacks can get unmapped. This can happen on exit or exec, as well as error handling in exec or clone. The task struct already keeps track of the thread's shadow stack. Use the size variable to keep track of whether the shadow stack has already been freed.

When an attempt to double unmap the thread shadow stack is caught, warn about it and abort the operation.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: H.J. Lu <hjl.tools@gmail.com>
Link: https://lore.kernel.org/all/20230908203655.543765-4-rick.p.edgecombe%40intel.com
2023-09-19 x86/shstk: Remove useless clone error handling (Rick Edgecombe)
When clone fails after the shadow stack is allocated, any allocated shadow stack is cleaned up in exit_thread() in copy_process(). So the logic in copy_thread() is unneeded, and also will not handle failures that happen outside of copy_thread().

In addition, since there is a second attempt to unmap the same shadow stack, there is a race where a newly mapped region could get unmapped.

So remove the logic in copy_thread() and rely on exit_thread() to handle clone failure.

Fixes: b2926a36b97a ("x86/shstk: Handle thread shadow stack")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: H.J. Lu <hjl.tools@gmail.com>
Link: https://lore.kernel.org/all/20230908203655.543765-3-rick.p.edgecombe%40intel.com
2023-09-19 x86/shstk: Handle vfork clone failure correctly (Rick Edgecombe)
Shadow stacks are allocated automatically and freed on exit, depending on the clone flags. The two cases where new shadow stacks are not allocated are !CLONE_VM (fork()) and CLONE_VFORK (vfork()). For !CLONE_VM, although a new stack is not allocated, it can be freed normally because it will happen in the child's copy of the VM.

However, for CLONE_VFORK the parent and the child are actually using the same shadow stack. So the kernel doesn't need to allocate *or* free a shadow stack for a CLONE_VFORK child. CLONE_VFORK children already need special tracking to avoid returning to userspace until the child exits or execs. Shadow stack uses this same tracking to avoid freeing CLONE_VFORK shadow stacks.

However, the tracking is not setup until the clone has succeeded (internally). Which means, if a CLONE_VFORK fails, the existing logic will not know it is a CLONE_VFORK and proceed to unmap the parent's shadow stack. This error handling cleanup logic runs via exit_thread() in the bad_fork_cleanup_thread label in copy_process().

The issue was seen in the glibc test "posix/tst-spawn3-pidfd" while running with shadow stack using currently out-of-tree glibc patches.

Fix it by not unmapping the vfork shadow stack in the error case as well. Since clone is implemented in core code, it is not ideal to pass the clone flags along the error path in order to have shadow stack code have symmetric logic in the freeing half of the thread shadow stack handling.

Instead use the existing state for thread shadow stacks to track whether the thread is managing its own shadow stack. For CLONE_VFORK, simply set shstk->base and shstk->size to 0, and have it mean the thread is not managing a shadow stack and so should skip cleanup work. Implement this by breaking up the CLONE_VFORK and !CLONE_VM cases in shstk_alloc_thread_stack() to separate conditionals since the logic is now different between them. In the case of CLONE_VFORK && !CLONE_VM, the existing behavior is to not clean up the shadow stack in the child (which should go away quickly with either exit or exec), so maintain that behavior by handling the CLONE_VFORK case first in the allocation path.

This new logic cleanly handles the case of normal, successful CLONE_VFORKs skipping cleaning up their shadow stacks on exit as well. So remove the existing, vfork shadow stack freeing logic. This is in deactivate_mm() where vfork_done is used to tell if it is a vfork child that can skip cleaning up the thread shadow stack.

Fixes: b2926a36b97a ("x86/shstk: Handle thread shadow stack")
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: H.J. Lu <hjl.tools@gmail.com>
Link: https://lore.kernel.org/all/20230908203655.543765-2-rick.p.edgecombe%40intel.com
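A sketch of the reworked allocation path (abridged; the allocation call at the end assumes helper signatures and is illustrative):

| unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
|                                        unsigned long clone_flags,
|                                        unsigned long stack_size)
| {
|         struct thread_shstk *shstk = &tsk->thread.shstk;
|
|         /*
|          * vfork children share the parent's shadow stack: nothing to
|          * allocate, and size == 0 marks "not managing a shadow stack".
|          */
|         if (clone_flags & CLONE_VFORK) {
|                 shstk->base = 0;
|                 shstk->size = 0;
|                 return 0;
|         }
|
|         /* fork() children inherit and free theirs via the copied VM. */
|         if (!(clone_flags & CLONE_VM))
|                 return 0;
|
|         /* thread clones get a fresh shadow stack (hypothetical signature) */
|         return alloc_shstk(0, adjust_shstk_size(stack_size), 0, false);
| }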
2023-09-19 s390/cert_store: fix string length handling (Peter Oberparleiter)
Building cert_store.o with W=1 reveals this bug:

  CC      arch/s390/kernel/cert_store.o
arch/s390/kernel/cert_store.c:443:45: warning: ‘sprintf’ may write a terminating nul past the end of the destination [-Wformat-overflow=]
  443 |         sprintf(desc + name_len, ":%04u:%08u", vce->vce_hdr.vc_index, cs_token);
      |                                             ^
arch/s390/kernel/cert_store.c:443:9: note: ‘sprintf’ output between 15 and 18 bytes into a destination of size 15
  443 |         sprintf(desc + name_len, ":%04u:%08u", vce->vce_hdr.vc_index, cs_token);

Fix this by using the correct maximum width for each integer component in both buffer length calculation and format string. Also switch to using snprintf() to guard against potential future changes to the integer range of each component.

Fixes: 8cf57d7217c3 ("s390: add support for user-defined certificates")
Reported-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
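A hedged sketch of the sized-and-bounded formatting; the vc_name field and surrounding names are assumptions, while vc_index and cs_token come from the warning above:

| /* worst case: ":" + 10-digit u32 + ":" + 10-digit u32 + NUL */
| size_t len = name_len + 1 + 10 + 1 + 10 + 1;
| char *desc = kmalloc(len, GFP_KERNEL);
|
| if (!desc)
|         return NULL;
| memcpy(desc, vce->vce_hdr.vc_name, name_len);   /* vc_name: assumed field */
| snprintf(desc + name_len, len - name_len, ":%04u:%08u",
|          vce->vce_hdr.vc_index, cs_token);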
2023-09-19 s390: update defconfigs (Heiko Carstens)
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 x86/srso: Fix SBPB enablement for spec_rstack_overflow=off (Josh Poimboeuf)
If the user has requested no SRSO mitigation, other mitigations can use the lighter-weight SBPB instead of IBPB. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/b20820c3cfd1003171135ec8d762a0b957348497.1693889988.git.jpoimboe@kernel.org
2023-09-19 x86/srso: Don't probe microcode in a guest (Josh Poimboeuf)
To support live migration, the hypervisor sets the "lowest common denominator" of features. Probing the microcode isn't allowed because any detected features might go away after a migration. As Andy Cooper states: "Linux must not probe microcode when virtualised.  What it may see instantaneously on boot (owing to MSR_PRED_CMD being fully passed through) is not accurate for the lifetime of the VM." Rely on the hypervisor to set the needed IBPB_BRTYPE and SBPB bits. Fixes: 1b5277c0ea0b ("x86/srso: Add SRSO_NO support") Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/3938a7209606c045a3f50305d201d840e8c834c7.1693889988.git.jpoimboe@kernel.org
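The guard is conceptually a short-circuit early in the probe path; a sketch:

|         /*
|          * Microcode probing is only meaningful on bare metal; a guest must
|          * rely on the IBPB_BRTYPE/SBPB bits its hypervisor chooses to expose.
|          */
|         if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
|                 return;
|
|         /* ... bare metal: probe microcode for branch-type flushing ... */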
2023-09-19 x86/srso: Set CPUID feature bits independently of bug or mitigation status (Josh Poimboeuf)
Booting with mitigations=off incorrectly prevents the X86_FEATURE_{IBPB_BRTYPE,SBPB} CPUID bits from getting set. Also, future CPUs without X86_BUG_SRSO might still have IBPB with branch type prediction flushing, in which case SBPB should be used instead of IBPB. The current code doesn't allow for that. Also, cpu_has_ibpb_brtype_microcode() has some surprising side effects and the setting of these feature bits really doesn't belong in the mitigation code anyway. Move it to earlier. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/869a1709abfe13b673bdd10c2f4332ca253a40bc.1693889988.git.jpoimboe@kernel.org
2023-09-19 x86/srso: Fix srso_show_state() side effect (Josh Poimboeuf)
Reading the 'spec_rstack_overflow' sysfs file can trigger an unnecessary MSR write, and possibly even a (handled) exception if the microcode hasn't been updated. Avoid all that by just checking X86_FEATURE_IBPB_BRTYPE instead, which gets set by srso_select_mitigation() if the updated microcode exists. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/27d128899cb8aee9eb2b57ddc996742b0c1d776b.1693889988.git.jpoimboe@kernel.org