summaryrefslogtreecommitdiff
path: root/arch/arm64/kernel
AgeCommit message (Collapse)Author
2021-09-23arm64: Restore forced disabling of KPTI on ThunderXdann frazier
A noted side-effect of commit 0c6c2d3615ef ("arm64: Generate cpucaps.h") is that cpucaps are now sorted, changing the enumeration order. This assumed no dependencies between cpucaps, which turned out not to be true in one case. UNMAP_KERNEL_AT_EL0 currently needs to be processed after WORKAROUND_CAVIUM_27456. ThunderX systems are incompatible with KPTI, so unmap_kernel_at_el0() bails if WORKAROUND_CAVIUM_27456 is set. But because of the sorting, WORKAROUND_CAVIUM_27456 will not yet have been considered when unmap_kernel_at_el0() checks for it, so the kernel tries to run w/ KPTI - and quickly falls over. Because all ThunderX implementations have homogeneous CPUs, we can remove this dependency by just checking the current CPU for the erratum. Fixes: 0c6c2d3615ef ("arm64: Generate cpucaps.h") Cc: <stable@vger.kernel.org> # 5.13.x Signed-off-by: dann frazier <dann.frazier@canonical.com> Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210923145002.3394558-1-dann.frazier@canonical.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-09-21arm64: add MTE supported check to thread switching and syscall entry/exitPeter Collingbourne
This lets us avoid doing unnecessary work on hardware that does not support MTE, and will allow us to freely use MTE instructions in the code called by mte_thread_switch(). Since this would mean that we do a redundant check in mte_check_tfsr_el1(), remove it and add two checks now required in its callers. This also avoids an unnecessary DSB+ISB sequence on the syscall exit path for hardware not supporting MTE. Fixes: 65812c6921cc ("arm64: mte: Enable async tag check fault") Cc: <stable@vger.kernel.org> # 5.13.x Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/I02fd000d1ef2c86c7d2952a7f099b254ec227a5d Link: https://lore.kernel.org/r/20210915190336.398390-1-pcc@google.com [catalin.marinas@arm.com: adjust the commit log slightly] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-09-16arm64: Mark __stack_chk_guard as __ro_after_initDan Li
__stack_chk_guard is setup once while init stage and never changed after that. Although the modification of this variable at runtime will usually cause the kernel to crash (so does the attacker), it should be marked as __ro_after_init, and it should not affect performance if it is placed in the ro_after_init section. Signed-off-by: Dan Li <ashimida@linux.alibaba.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/1631612642-102881-1-git-send-email-ashimida@linux.alibaba.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-09-16arm64/kernel: remove duplicate include in process.cLv Ruyi
Remove all but the first include of linux/sched.h from process.c Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210902011126.29828-1-lv.ruyi@zte.com.cn Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-09-16arm64/sve: Use correct size when reinitialising SVE stateMark Brown
When we need a buffer for SVE register state we call sve_alloc() to make sure that one is there. In order to avoid repeated allocations and frees we keep the buffer around unless we change vector length and just memset() it to ensure a clean register state. The function that deals with this takes the task to operate on as an argument, however in the case where we do a memset() we initialise using the SVE state size for the current task rather than the task passed as an argument. This is only an issue in the case where we are setting the register state for a task via ptrace and the task being configured has a different vector length to the task tracing it. In the case where the buffer is larger in the traced process we will leak old state from the traced process to itself, in the case where the buffer is smaller in the traced process we will overflow the buffer and corrupt memory. Fixes: bc0ee4760364 ("arm64/sve: Core task context handling") Cc: <stable@vger.kernel.org> # 4.15.x Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210909165356.10675-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-09-11Merge branch 'linus' into smp/urgentThomas Gleixner
Ensure that all usage sites of get/put_online_cpus() except for the struggler in drivers/thermal are gone. So the last user and the deprecated inlines can be removed.
2021-09-07Merge tag 'pci-v5.15-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Convert controller drivers to generic_handle_domain_irq() (Marc Zyngier) - Simplify VPD (Vital Product Data) access and search (Heiner Kallweit) - Update bnx2, bnx2x, bnxt, cxgb4, cxlflash, sfc, tg3 drivers to use simplified VPD interfaces (Heiner Kallweit) - Run Max Payload Size quirks before configuring MPS; work around ASMedia ASM1062 SATA MPS issue (Marek Behún) Resource management: - Refactor pci_ioremap_bar() and pci_ioremap_wc_bar() (Krzysztof Wilczyński) - Optimize pci_resource_len() to reduce kernel size (Zhen Lei) PCI device hotplug: - Fix a double unmap in ibmphp (Vishal Aslot) PCIe port driver: - Enable Bandwidth Notification only if port supports it (Stuart Hayes) Sysfs/proc/syscalls: - Add schedule point in proc_bus_pci_read() (Krzysztof Wilczyński) - Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure (Krzysztof Wilczyński) - Return "int" from pciconfig_read() syscall (Krzysztof Wilczyński) Virtualization: - Extend "pci=noats" to also turn on Translation Blocking to protect against some DMA attacks (Alex Williamson) - Add sysfs mechanism to control the type of reset used between device assignments to VMs (Amey Narkhede) - Add support for ACPI _RST reset method (Shanker Donthineni) - Add ACS quirks for Cavium multi-function devices (George Cherian) - Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms (Wasim Khan) - Allow HiSilicon AMBA devices that appear as fake PCI devices to use PASID and SVA (Zhangfei Gao) Endpoint framework: - Add support for SR-IOV Endpoint devices (Kishon Vijay Abraham I) - Zero-initialize endpoint test tool parameters so we don't use random parameters (Shunyong Yang) APM X-Gene PCIe controller driver: - Remove redundant dev_err() call in xgene_msi_probe() (ErKun Yang) Broadcom iProc PCIe controller driver: - Don't fail devm_pci_alloc_host_bridge() on missing 'ranges' because it's optional on BCMA devices (Rob Herring) - Fix BCMA probe resource handling (Rob Herring) Cadence PCIe driver: - Work around J7200 Link training electrical issue by increasing delays in LTSSM (Nadeem Athani) Intel IXP4xx PCI controller driver: - Depend on ARCH_IXP4XX to avoid useless config questions (Geert Uytterhoeven) Intel Keembay PCIe controller driver: - Add Intel Keem Bay PCIe controller (Srikanth Thokala) Marvell Aardvark PCIe controller driver: - Work around config space completion handling issues (Evan Wang) - Increase timeout for config access completions (Pali Rohár) - Emulate CRS Software Visibility bit (Pali Rohár) - Configure resources from DT 'ranges' property to fix I/O space access (Pali Rohár) - Serialize INTx mask/unmask (Pali Rohár) MediaTek PCIe controller driver: - Add MT7629 support in DT (Chuanjia Liu) - Fix an MSI issue (Chuanjia Liu) - Get syscon regmap ("mediatek,generic-pciecfg"), IRQ number ("pci_irq"), PCI domain ("linux,pci-domain") from DT properties if present (Chuanjia Liu) Microsoft Hyper-V host bridge driver: - Add ARM64 support (Boqun Feng) - Support "Create Interrupt v3" message (Sunil Muthuswamy) NVIDIA Tegra PCIe controller driver: - Use seq_puts(), move err_msg from stack to static, fix OF node leak (Christophe JAILLET) NVIDIA Tegra194 PCIe driver: - Disable suspend when in Endpoint mode (Om Prakash Singh) - Fix MSI-X address programming error (Om Prakash Singh) - Disable interrupts during suspend to avoid spurious AER link down (Om Prakash Singh) Renesas R-Car PCIe controller driver: - Work around hardware issue that prevents Link L1->L0 transition (Marek Vasut) - Fix runtime PM refcount leak (Dinghao Liu) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK356X host controller driver (Simon Xue) TI J721E PCIe driver: - Add support for J7200 and AM64 (Kishon Vijay Abraham I) Toshiba Visconti PCIe controller driver: - Add Toshiba Visconti PCIe host controller driver (Nobuhiro Iwamatsu) Xilinx NWL PCIe controller driver: - Enable PCIe reference clock via CCF (Hyun Kwon) Miscellaneous: - Convert sta2x11 from 'pci_' to 'dma_' API (Christophe JAILLET) - Fix pci_dev_str_match_path() alloc while atomic bug (used for kernel parameters that specify devices) (Dan Carpenter) - Remove pointless Precision Time Management warning when PTM is present but not enabled (Jakub Kicinski) - Remove surplus "break" statements (Krzysztof Wilczyński)" * tag 'pci-v5.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (132 commits) PCI: ibmphp: Fix double unmap of io_mem x86/PCI: sta2x11: switch from 'pci_' to 'dma_' API PCI/VPD: Use unaligned access helpers PCI/VPD: Clean up public VPD defines and inline functions cxgb4: Use pci_vpd_find_id_string() to find VPD ID string PCI/VPD: Add pci_vpd_find_id_string() PCI/VPD: Include post-processing in pci_vpd_find_tag() PCI/VPD: Stop exporting pci_vpd_find_info_keyword() PCI/VPD: Stop exporting pci_vpd_find_tag() PCI: Set dma-can-stall for HiSilicon chips PCI: rockchip-dwc: Add Rockchip RK356X host controller driver PCI: dwc: Remove surplus break statement after return PCI: artpec6: Remove local code block from switch statement PCI: artpec6: Remove surplus break statement after return MAINTAINERS: Add entries for Toshiba Visconti PCIe controller PCI: visconti: Add Toshiba Visconti PCIe host controller driver PCI/portdrv: Enable Bandwidth Notification only if port supports it PCI: Allow PASID on fake PCIe devices without TLP prefixes PCI: mediatek: Use PCI domain to handle ports detection PCI: mediatek: Add new method to get irq number ...
2021-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups s390: - enable interpretation of specification exceptions - fix a vcpu_idx vs vcpu_id mixup x86: - fast (lockless) page fault support for the new MMU - new MMU now the default - increased maximum allowed VCPU count - allow inhibit IRQs on KVM_RUN while debugging guests - let Hyper-V-enabled guests run with virtualized LAPIC as long as they do not enable the Hyper-V "AutoEOI" feature - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC) - tuning for the case when two-dimensional paging (EPT/NPT) is disabled - bugfixes and cleanups, especially with respect to vCPU reset and choosing a paging mode based on CR0/CR4/EFER - support for 5-level page table on AMD processors Generic: - MMU notifier invalidation callbacks do not take mmu_lock unless necessary - improved caching of LRU kvm_memory_slot - support for histogram statistics - add statistics for halt polling and remote TLB flush requests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits) KVM: Drop unused kvm_dirty_gfn_invalid() KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted KVM: MMU: mark role_regs and role accessors as maybe unused KVM: MIPS: Remove a "set but not used" variable x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait KVM: stats: Add VM stat for remote tlb flush requests KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 kvm: x86: Increase MAX_VCPUS to 1024 kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host KVM: s390: index kvm->arch.idle_mask by vcpu_idx KVM: s390: Enable specification exception interpretation KVM: arm64: Trim guest debug exception handling KVM: SVM: Add 5-level page table support for SVM ...
2021-09-03Merge tag 'kbuild-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add -s option (strict mode) to merge_config.sh to make it fail when any symbol is redefined. - Show a warning if a different compiler is used for building external modules. - Infer --target from ARCH for CC=clang to let you cross-compile the kernel without CROSS_COMPILE. - Make the integrated assembler default (LLVM_IAS=1) for CC=clang. - Add <linux/stdarg.h> to the kernel source instead of borrowing <stdarg.h> from the compiler. - Add Nick Desaulniers as a Kbuild reviewer. - Drop stale cc-option tests. - Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG to handle symbols in inline assembly. - Show a warning if 'FORCE' is missing for if_changed rules. - Various cleanups * tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits) kbuild: redo fake deps at include/ksym/*.h kbuild: clean up objtool_args slightly modpost: get the *.mod file path more simply checkkconfigsymbols.py: Fix the '--ignore' option kbuild: merge vmlinux_link() between ARCH=um and other architectures kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh kbuild: merge vmlinux_link() between the ordinary link and Clang LTO kbuild: remove stale *.symversions kbuild: remove unused quiet_cmd_update_lto_symversions gen_compile_commands: extract compiler command from a series of commands x86: remove cc-option-yn test for -mtune= arc: replace cc-option-yn uses with cc-option s390: replace cc-option-yn uses with cc-option ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild sparc: move the install rule to arch/sparc/Makefile security: remove unneeded subdir-$(CONFIG_...) kbuild: sh: remove unused install script kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y kbuild: Switch to 'f' variants of integrated assembler flag kbuild: Shuffle blank line to improve comment meaning ...
2021-09-01Merge tag 'hyperv-next-signed-20210831' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - make Hyper-V code arch-agnostic (Michael Kelley) - fix sched_clock behaviour on Hyper-V (Ani Sinha) - fix a fault when Linux runs as the root partition on MSHV (Praveen Kumar) - fix VSS driver (Vitaly Kuznetsov) - cleanup (Sonia Sharma) * tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hv_utils: Set the maximum packet size for VSS driver to the length of the receive buffer Drivers: hv: Enable Hyper-V code to be built on ARM64 arm64: efi: Export screen_info arm64: hyperv: Initialize hypervisor on boot arm64: hyperv: Add panic handler arm64: hyperv: Add Hyper-V hypercall and register access utilities x86/hyperv: fix root partition faults when writing to VP assist page MSR hv: hyperv.h: Remove unused inline functions drivers: hv: Decouple Hyper-V clock/timer code from VMbus drivers x86/hyperv: add comment describing TSC_INVARIANT_CONTROL MSR setting bit 0 Drivers: hv: Move Hyper-V misc functionality to arch-neutral code Drivers: hv: Add arch independent default functions for some Hyper-V handlers Drivers: hv: Make portions of Hyper-V init code be arch neutral x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable asm-generic/hyperv: Add missing #include of nmi.h
2021-09-01Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Support for 32-bit tasks on asymmetric AArch32 systems (on top of the scheduler changes merged via the tip tree). - More entry.S clean-ups and conversion to C. - MTE updates: allow a preferred tag checking mode to be set per CPU (the overhead of synchronous mode is smaller for some CPUs than others); optimisations for kernel entry/exit path; optionally disable MTE on the kernel command line. - Kselftest improvements for SVE and signal handling, PtrAuth. - Fix unlikely race where a TLBI could use stale ASID on an ASID roll-over (found by inspection). - Miscellaneous fixes: disable trapping of PMSNEVFR_EL1 to higher exception levels; drop unnecessary sigdelsetmask() call in the signal32 handling; remove BUG_ON when failing to allocate SVE state (just signal the process); SYM_CODE annotations. - Other trivial clean-ups: use macros instead of magic numbers, remove redundant returns, typos. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (56 commits) arm64: Do not trap PMSNEVFR_EL1 arm64: mm: fix comment typo of pud_offset_phys() arm64: signal32: Drop pointless call to sigdelsetmask() arm64/sve: Better handle failure to allocate SVE register storage arm64: Document the requirement for SCR_EL3.HCE arm64: head: avoid over-mapping in map_memory arm64/sve: Add a comment documenting the binutils needed for SVE asm arm64/sve: Add some comments for sve_save/load_state() kselftest/arm64: signal: Add a TODO list for signal handling tests kselftest/arm64: signal: Add test case for SVE register state in signals kselftest/arm64: signal: Verify that signals can't change the SVE vector length kselftest/arm64: signal: Check SVE signal frame shows expected vector length kselftest/arm64: signal: Support signal frames with SVE register data kselftest/arm64: signal: Add SVE to the set of features we can check for arm64: replace in_irq() with in_hardirq() kselftest/arm64: pac: Fix skipping of tests on systems without PAC Documentation: arm64: describe asymmetric 32-bit support arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 arm64: Advertise CPUs capable of running 32-bit applications in sysfs ...
2021-09-01Merge branch 'siginfo-si_trapno-for-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo si_trapno updates from Eric Biederman: "The full set of si_trapno changes was not appropriate as a fix for the newly added SIGTRAP TRAP_PERF, and so I postponed the rest of the related cleanups. This is the rest of the cleanups for si_trapno that reduces it from being a really weird arch special case that is expect to be always present (but isn't) on the architectures that support it to being yet another field in the _sigfault union of struct siginfo. The changes have been reviewed and marinated in linux-next. With the removal of this awkward special case new code (like SIGTRAP TRAP_PERF) that works across architectures should be easier to write and maintain" * 'siginfo-si_trapno-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Rename SIL_PERF_EVENT SIL_FAULT_PERF_EVENT for consistency signal: Verify the alignment and size of siginfo_t signal: Remove the generic __ARCH_SI_TRAPNO support signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNK signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRP arm64: Add compile-time asserts for siginfo_t offsets arm: Add compile-time asserts for siginfo_t offsets sparc64: Add compile-time asserts for siginfo_t offsets
2021-09-01drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()Thomas Gleixner
DEFINE_SMP_CALL_CACHE_FUNCTION() was usefel before the CPU hotplug rework to ensure that the cache related functions are called on the upcoming CPU because the notifier itself could run on any online CPU. The hotplug state machine guarantees that the callbacks are invoked on the upcoming CPU. So there is no need to have this SMP function call obfuscation. That indirection was missed when the hotplug notifiers were converted. This also solves the problem of ARM64 init_cache_level() invoking ACPI functions which take a semaphore in that context. That's invalid as SMP function calls run with interrupts disabled. Running it just from the callback in context of the CPU hotplug thread solves this. Fixes: 8571890e1513 ("arm64: Add support for ACPI based firmware tables") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
2021-08-31Merge remote-tracking branch 'tip/sched/arm64' into for-next/coreCatalin Marinas
* tip/sched/arm64: (785 commits) Documentation: arm64: describe asymmetric 32-bit support arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 arm64: Advertise CPUs capable of running 32-bit applications in sysfs arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 arm64: Implement task_cpu_possible_mask() sched: Introduce dl_task_check_affinity() to check proposed affinity sched: Allow task CPU affinity to be restricted on asymmetric systems sched: Split the guts of sched_setaffinity() into a helper function sched: Introduce task_struct::user_cpus_ptr to track requested affinity sched: Reject CPU affinity changes based on task_cpu_possible_mask() cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq() cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 sched: Introduce task_cpu_possible_mask() to limit fallback rq selection sched: Cgroup SCHED_IDLE support sched/topology: Skip updating masks for non-online nodes Linux 5.14-rc6 lib: use PFN_PHYS() in devmem_is_allowed() ...
2021-08-26Merge branch 'for-next/entry' into for-next/coreCatalin Marinas
* for-next/entry: : More entry.S clean-ups and conversion to C. arm64: entry: call exit_to_user_mode() from C arm64: entry: move bulk of ret_to_user to C arm64: entry: clarify entry/exit helpers arm64: entry: consolidate entry/exit helpers
2021-08-26Merge branches 'for-next/mte', 'for-next/misc' and 'for-next/kselftest', ↵Catalin Marinas
remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: arm64/perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEF * for-next/mte: : Miscellaneous MTE improvements. arm64/cpufeature: Optionally disable MTE via command-line arm64: kasan: mte: remove redundant mte_report_once logic arm64: kasan: mte: use a constant kernel GCR_EL1 value arm64: avoid double ISB on kernel entry arm64: mte: optimize GCR_EL1 modification on kernel entry/exit Documentation: document the preferred tag checking mode feature arm64: mte: introduce a per-CPU tag checking mode preference arm64: move preemption disablement to prctl handlers arm64: mte: change ASYNC and SYNC TCF settings into bitfields arm64: mte: rename gcr_user_excl to mte_ctrl arm64: mte: avoid TFSRE0_EL1 related operations unless in async mode * for-next/misc: : Miscellaneous updates. arm64: Do not trap PMSNEVFR_EL1 arm64: mm: fix comment typo of pud_offset_phys() arm64: signal32: Drop pointless call to sigdelsetmask() arm64/sve: Better handle failure to allocate SVE register storage arm64: Document the requirement for SCR_EL3.HCE arm64: head: avoid over-mapping in map_memory arm64/sve: Add a comment documenting the binutils needed for SVE asm arm64/sve: Add some comments for sve_save/load_state() arm64: replace in_irq() with in_hardirq() arm64: mm: Fix TLBI vs ASID rollover arm64: entry: Add SYM_CODE annotation for __bad_stack arm64: fix typo in a comment arm64: move the (z)install rules to arch/arm64/Makefile arm64/sve: Make fpsimd_bind_task_to_cpu() static arm64: unnecessary end 'return;' in void functions arm64/sme: Document boot requirements for SME arm64: use __func__ to get function name in pr_err arm64: SSBS/DIT: print SSBS and DIT bit when printing PSTATE arm64: cpufeature: Use defined macro instead of magic numbers arm64/kexec: Test page size support with new TGRAN range values * for-next/kselftest: : Kselftest additions for arm64. kselftest/arm64: signal: Add a TODO list for signal handling tests kselftest/arm64: signal: Add test case for SVE register state in signals kselftest/arm64: signal: Verify that signals can't change the SVE vector length kselftest/arm64: signal: Check SVE signal frame shows expected vector length kselftest/arm64: signal: Support signal frames with SVE register data kselftest/arm64: signal: Add SVE to the set of features we can check for kselftest/arm64: pac: Fix skipping of tests on systems without PAC kselftest/arm64: mte: Fix misleading output when skipping tests kselftest/arm64: Add a TODO list for floating point tests kselftest/arm64: Add tests for SVE vector configuration kselftest/arm64: Validate vector lengths are set in sve-probe-vls kselftest/arm64: Provide a helper binary and "library" for SVE RDVL kselftest/arm64: Ignore check_gcr_el1_cswitch binary
2021-08-25ACPI: Add memory semantics to acpi_os_map_memory()Lorenzo Pieralisi
The memory attributes attached to memory regions depend on architecture specific mappings. For some memory regions, the attributes specified by firmware (eg uncached) are not sufficient to determine how a memory region should be mapped by an OS (for instance a region that is define as uncached in firmware can be mapped as Normal or Device memory on arm64) and therefore the OS must be given control on how to map the region to match the expected mapping behaviour (eg if a mapping is requested with memory semantics, it must allow unaligned accesses). Rework acpi_os_map_memory() and acpi_os_ioremap() back-end to split them into two separate code paths: acpi_os_memmap() -> memory semantics acpi_os_ioremap() -> MMIO semantics The split allows the architectural implementation back-ends to detect the default memory attributes required by the mapping in question (ie the mapping API defines the semantics memory vs MMIO) and map the memory accordingly. Link: https://lore.kernel.org/linux-arm-kernel/31ffe8fc-f5ee-2858-26c5-0fd8bdd68702@arm.com Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25arm64: signal32: Drop pointless call to sigdelsetmask()Will Deacon
Commit 77097ae503b1 ("most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set") extended set_current_blocked() to remove SIGKILL and SIGSTOP from the new signal set and updated all callers accordingly. Unfortunately, this collided with the merge of the arm64 architecture, which duly removes these signals when restoring the compat sigframe, as this was what was previously done by arch/arm/. Remove the redundant call to sigdelsetmask() from compat_restore_sigframe(). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210825093911.24493-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-24arm64/sve: Better handle failure to allocate SVE register storageMark Brown
Currently we "handle" failure to allocate the SVE register storage by doing a BUG_ON() and hoping for the best. This is obviously not great and the memory allocation failure will already be loud enough without the BUG_ON(). As the comment says it is a corner case but let's try to do a bit better, remove the BUG_ON() and add code to handle the failure in the callers. For the ptrace and signal code we can return -ENOMEM gracefully however we have no real error reporting path available to us for the SVE access trap so instead generate a SIGKILL if the allocation fails there. This at least means that we won't try to soldier on and end up trying to access the nonexistant state and while it's obviously not ideal for userspace SIGKILL doesn't allow any handling so minimises the ABI impact, making it easier to improve the interface later if we come up with a better idea. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210824153417.18371-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-24arm64: head: avoid over-mapping in map_memoryMark Rutland
The `compute_indices` and `populate_entries` macros operate on inclusive bounds, and thus the `map_memory` macro which uses them also operates on inclusive bounds. We pass `_end` and `_idmap_text_end` to `map_memory`, but these are exclusive bounds, and if one of these is sufficiently aligned (as a result of kernel configuration, physical placement, and KASLR), then: * In `compute_indices`, the computed `iend` will be in the page/block *after* the final byte of the intended mapping. * In `populate_entries`, an unnecessary entry will be created at the end of each level of table. At the leaf level, this entry will map up to SWAPPER_BLOCK_SIZE bytes of physical addresses that we did not intend to map. As we may map up to SWAPPER_BLOCK_SIZE bytes more than intended, we may violate the boot protocol and map physical address past the 2MiB-aligned end address we are permitted to map. As we map these with Normal memory attributes, this may result in further problems depending on what these physical addresses correspond to. The final entry at each level may require an additional table at that level. As EARLY_ENTRIES() calculates an inclusive bound, we allocate enough memory for this. Avoid the extraneous mapping by having map_memory convert the exclusive end address to an inclusive end address by subtracting one, and do likewise in EARLY_ENTRIES() when calculating the number of required tables. For clarity, comments are updated to more clearly document which boundaries the macros operate on. For consistency with the other macros, the comments in map_memory are also updated to describe `vstart` and `vend` as virtual addresses. Fixes: 0370b31e4845 ("arm64: Extend early page table code to allow for larger kernels") Cc: <stable@vger.kernel.org> # 4.16.x Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210823101253.55567-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-24arm64/sve: Add some comments for sve_save/load_state()Mark Brown
The use of macros for the actual function bodies means legibility is always going to be a bit of a challenge, especially while we can't rely on SVE support in the toolchain, but this helps a little. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210812201143.35578-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-23arm64: PCI: Support root bridge preparation for Hyper-VBoqun Feng
Currently at root bridge preparation, the corresponding ACPI device will be set as the companion, however for a Hyper-V virtual PCI root bridge, there is no corresponding ACPI device, because a Hyper-V virtual PCI root bridge is discovered via VMBus rather than ACPI table. In order to support this, we need to make pcibios_root_bridge_prepare() work with cfg->parent being NULL. Use a NULL pointer as the ACPI device if there is no corresponding ACPI device, and this is fine because: 1) ACPI_COMPANION_SET() can work with the second parameter being NULL, 2) semantically, if a NULL pointer is set via ACPI_COMPANION_SET(), ACPI_COMPANION() (the read API for this field) will return NULL, and since ACPI_COMPANION() may return NULL, so users must have handled the cases where it returns NULL, and 3) since there is no corresponding ACPI device, it would be wrong to use any other value here. Link: https://lore.kernel.org/r/20210726180657.142727-5-boqun.feng@gmail.com Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-23arm64: PCI: Restructure pcibios_root_bridge_prepare()Boqun Feng
Restructure the pcibios_root_bridge_prepare() as the preparation for supporting cases when no real ACPI device is related to the PCI host bridge. No functional change. Link: https://lore.kernel.org/r/20210726180657.142727-4-boqun.feng@gmail.com Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-20Merge branch kvm-arm64/pkvm-fixed-features-prologue into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-fixed-features-prologue: : Rework a bunch of common infrastructure as a prologue : to Fuad Tabba's protected VM fixed feature series. KVM: arm64: Upgrade trace_kvm_arm_set_dreg32() to 64bit KVM: arm64: Add config register bit definitions KVM: arm64: Add feature register flag definitions KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch KVM: arm64: Keep mdcr_el2's value as set by __init_el2_debug KVM: arm64: Restore mdcr_el2 from vcpu KVM: arm64: Refactor sys_regs.h,c for nVHE reuse KVM: arm64: Fix names of config register fields KVM: arm64: MDCR_EL2 is a 64-bit register KVM: arm64: Remove trailing whitespace in comment KVM: arm64: placeholder to check if VM is protected Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20arm64: Remove logic to kill 32-bit tasks on 64-bit-only coresWill Deacon
The scheduler now knows enough about these braindead systems to place 32-bit tasks accordingly, so throw out the safety checks and allow the ret-to-user path to avoid do_notify_resume() if there is nothing to do. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210730112443.23245-16-will@kernel.org
2021-08-20arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0Will Deacon
Allow systems with mismatched 32-bit support at EL0 to run 32-bit applications based on a new kernel parameter. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210730112443.23245-15-will@kernel.org
2021-08-20arm64: Advertise CPUs capable of running 32-bit applications in sysfsWill Deacon
Since 32-bit applications will be killed if they are caught trying to execute on a 64-bit-only CPU in a mismatched system, advertise the set of 32-bit capable CPUs to userspace in sysfs. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210730112443.23245-14-will@kernel.org
2021-08-20arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched systemWill Deacon
If we want to support 32-bit applications, then when we identify a CPU with mismatched 32-bit EL0 support we must ensure that we will always have an active 32-bit CPU available to us from then on. This is important for the scheduler, because is_cpu_allowed() will be constrained to 32-bit CPUs for compat tasks and forced migration due to a hotplug event will hang if no 32-bit CPUs are available. On detecting a mismatch, prevent offlining of either the mismatching CPU if it is 32-bit capable, or find the first active 32-bit capable CPU otherwise. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210730112443.23245-13-will@kernel.org
2021-08-20arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0Will Deacon
When exec'ing a 32-bit task on a system with mismatched support for 32-bit EL0, try to ensure that it starts life on a CPU that can actually run it. Similarly, when exec'ing a 64-bit task on such a system, try to restore the old affinity mask if it was previously restricted. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20210730112443.23245-12-will@kernel.org
2021-08-20KVM: arm64: Add feature register flag definitionsFuad Tabba
Add feature register flag definitions to clarify which features might be supported. Consolidate the various ID_AA64PFR0_ELx flags for all ELs. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210817081134.2918285-10-tabba@google.com
2021-08-19isystem: trim/fixup stdarg.h and other headersAlexey Dobriyan
Delete/fixup few includes in anticipation of global -isystem compile option removal. Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition of uintptr_t error (one definition comes from <stddef.h>, another from <linux/types.h>). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-11arm64/perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEFAnshuman Khandual
ID_AA64DFR0_PMUVER_IMP_DEF, indicating an "implementation defined" PMU, never actually gets used although there are '0xf' instances scattered all around. Use the symbolic name instead of the raw hex constant. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1628652427-24695-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-08-06arm64: entry: Add SYM_CODE annotation for __bad_stackMark Brown
When converting arm64 to modern assembler annotations __bad_stack was left as a raw local label without annotations. While this will have little if any practical impact at present it may cause issues in the future if we start using the annotations for things like reliable stack trace. Add SYM_CODE annotations to fix this. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210804181710.19059-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-05arm64: entry: call exit_to_user_mode() from CMark Rutland
When handling an exception from EL0, we perform the entry work in that exception's C handler, and once the C handler has finished, we return back to the entry assembly. Subsequently in the common `ret_to_user` assembly we perform the exit work that balances with the entry work. This can be somewhat difficult to follow, and makes it hard to rework the return paths (e.g. to pass additional context to the exit code, or to have exception return logic for specific exceptions). This patch reworks the entry code such that each EL0 C exception handler is responsible for both the entry and exit work. This clearly balances the two (and will permit additional variation in future), and avoids an unnecessary bounce between assembly and C in the common case, leaving `ret_from_fork` as the only place assembly has to call the exit code. This means that the exit work is now inlined into the C handler, which is already the case for the entry work, and allows the compiler to generate better code (e.g. by immediately returning when there is no exit work to perform). To align with other exception entry/exit helpers, enter_from_user_mode() is updated to take the EL0 pt_regs as a parameter, though this is currently unused. There should be no functional change as a result of this patch. However, this should lead to slightly better backtraces when an error is encountered within do_notify_resume(), as the C handler should appear in the backtrace, indicating the specific exception that the kernel was entered with. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20210802140733.52716-5-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-05arm64: entry: move bulk of ret_to_user to CMark Rutland
In `ret_to_user` we perform some conditional work depending on the thread flags, then perform some IRQ/context tracking which is intended to balance with the IRQ/context tracking performed in the entry C code. For simplicity and consistency, it would be preferable to move this all to C. As a step towards that, this patch moves the conditional work and IRQ/context tracking into a C helper function. To aid bisectability, this is called from the `ret_to_user` assembly, and a subsequent patch will move the call to C code. As local_daif_mask() handles all necessary tracing and PMR manipulation, we no longer need to handle this explicitly. As we call exit_to_user_mode() directly, the `user_enter_irqoff` macro is no longer used, and can be removed. As enter_from_user_mode() and exit_to_user_mode() are no longer called from assembly, these can be made static, and as these are typically very small, they are marked __always_inline to avoid the overhead of a function call. For now, enablement of single-step is left in entry.S, and for this we still need to read the flags in ret_to_user(). It is safe to read this separately as TIF_SINGLESTEP is not part of _TIF_WORK_MASK. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20210802140733.52716-4-mark.rutland@arm.com [catalin.marinas@arm.com: removed unused gic_prio_kentry_setup macro] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-05arm64: entry: clarify entry/exit helpersMark Rutland
When entering an exception, we must perform irq/context state management before we can use instrumentable C code. Similarly, when exiting an exception we cannot use instrumentable C code after we perform irq/context state management. Originally, we'd intended that the enter_from_*() and exit_to_*() helpers would enforce this by virtue of being the first and last functions called, respectively, in an exception handler. However, as they now call instrumentable code themselves, this is not as clearly true. To make this more robust, this patch splits the irq/context state management into separate helpers, with all the helpers commented to make their intended purpose more obvious. In exit_to_kernel_mode() we'll now check TFSR_EL1 before we assert that IRQs are disabled, but this ordering is not important, and other than this there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20210802140733.52716-3-mark.rutland@arm.com [catalin.marinas@arm.com: comment typos fix-up] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-05arm64: entry: consolidate entry/exit helpersMark Rutland
To make the various entry/exit helpers easier to understand and easier to compare, this patch moves all the entry/exit helpers to be adjacent at the top of entry-common.c, rather than being spread out throughout the file. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20210802140733.52716-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-04arm64: efi: Export screen_infoMichael Kelley
The Hyper-V frame buffer driver may be built as a module, and it needs access to screen_info. So export screen_info. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/1628092359-61351-5-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-08-04arm64: Move .hyp.rodata outside of the _sdata.._edata rangeMarc Zyngier
The HYP rodata section is currently lumped together with the BSS, which isn't exactly what is expected (it gets registered with kmemleak, for example). Move it away so that it is actually marked RO. As an added benefit, it isn't registered with kmemleak anymore. Fixes: 380e18ade4a5 ("KVM: arm64: Introduce a BSS section for use at Hyp") Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org #5.13 Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210802123830.2195174-2-maz@kernel.org
2021-08-03arm64/cpufeature: Optionally disable MTE via command-lineYee Lee
MTE support needs to be optionally disabled in runtime for HW issue workaround, FW development and some evaluation works on system resource and performance. This patch makes two changes: (1) moves init of tag-allocation bits(ATA/ATA0) to cpu_enable_mte() as not cached in TLB. (2) allows ID_AA64PFR1_EL1.MTE to be overridden on its shadow value by giving "arm64.nomte" on cmdline. When the feature value is off, ATA and TCF will not set and the related functionalities are accordingly suppressed. Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Marc Zyngier <maz@kernel.org> Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Yee Lee <yee.lee@mediatek.com> Link: https://lore.kernel.org/r/20210803070824.7586-2-yee.lee@mediatek.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-03arm64: stacktrace: avoid tracing arch_stack_walk()Mark Rutland
When the function_graph tracer is in use, arch_stack_walk() may unwind the stack incorrectly, erroneously reporting itself, missing the final entry which is being traced, and reporting all traced entries between these off-by-one from where they should be. When ftrace hooks a function return, the original return address is saved to the fgraph ret_stack, and the return address in the LR (or the function's frame record) is replaced with `return_to_handler`. When arm64's unwinder encounter frames returning to `return_to_handler`, it finds the associated original return address from the fgraph ret stack, assuming the most recent `ret_to_hander` entry on the stack corresponds to the most recent entry in the fgraph ret stack, and so on. When arch_stack_walk() is used to dump the current task's stack, it starts from the caller of arch_stack_walk(). However, arch_stack_walk() can be traced, and so may push an entry on to the fgraph ret stack, leaving the fgraph ret stack offset by one from the expected position. This can be seen when dumping the stack via /proc/self/stack, where enabling the graph tracer results in an unexpected `stack_trace_save_tsk` entry at the start of the trace, and `el0_svc` missing form the end of the trace. This patch fixes this by marking arch_stack_walk() as notrace, as we do for all other functions on the path to ftrace_graph_get_ret_stack(). While a few helper functions are not marked notrace, their calls/returns are balanced, and will have no observable effect when examining the fgraph ret stack. It is possible for an exeption boundary to cause a similar offset if the return address of the interrupted context was in the LR. Fixing those cases will require some more substantial rework, and is left for subsequent patches. Before: | # cat /proc/self/stack | [<0>] proc_pid_stack+0xc4/0x140 | [<0>] proc_single_show+0x6c/0x120 | [<0>] seq_read_iter+0x240/0x4e0 | [<0>] seq_read+0xe8/0x140 | [<0>] vfs_read+0xb8/0x1e4 | [<0>] ksys_read+0x74/0x100 | [<0>] __arm64_sys_read+0x28/0x3c | [<0>] invoke_syscall+0x50/0x120 | [<0>] el0_svc_common.constprop.0+0xc4/0xd4 | [<0>] do_el0_svc+0x30/0x9c | [<0>] el0_svc+0x2c/0x54 | [<0>] el0t_64_sync_handler+0x1a8/0x1b0 | [<0>] el0t_64_sync+0x198/0x19c | # echo function_graph > /sys/kernel/tracing/current_tracer | # cat /proc/self/stack | [<0>] stack_trace_save_tsk+0xa4/0x110 | [<0>] proc_pid_stack+0xc4/0x140 | [<0>] proc_single_show+0x6c/0x120 | [<0>] seq_read_iter+0x240/0x4e0 | [<0>] seq_read+0xe8/0x140 | [<0>] vfs_read+0xb8/0x1e4 | [<0>] ksys_read+0x74/0x100 | [<0>] __arm64_sys_read+0x28/0x3c | [<0>] invoke_syscall+0x50/0x120 | [<0>] el0_svc_common.constprop.0+0xc4/0xd4 | [<0>] do_el0_svc+0x30/0x9c | [<0>] el0t_64_sync_handler+0x1a8/0x1b0 | [<0>] el0t_64_sync+0x198/0x19c After: | # cat /proc/self/stack | [<0>] proc_pid_stack+0xc4/0x140 | [<0>] proc_single_show+0x6c/0x120 | [<0>] seq_read_iter+0x240/0x4e0 | [<0>] seq_read+0xe8/0x140 | [<0>] vfs_read+0xb8/0x1e4 | [<0>] ksys_read+0x74/0x100 | [<0>] __arm64_sys_read+0x28/0x3c | [<0>] invoke_syscall+0x50/0x120 | [<0>] el0_svc_common.constprop.0+0xc4/0xd4 | [<0>] do_el0_svc+0x30/0x9c | [<0>] el0_svc+0x2c/0x54 | [<0>] el0t_64_sync_handler+0x1a8/0x1b0 | [<0>] el0t_64_sync+0x198/0x19c | # echo function_graph > /sys/kernel/tracing/current_tracer | # cat /proc/self/stack | [<0>] proc_pid_stack+0xc4/0x140 | [<0>] proc_single_show+0x6c/0x120 | [<0>] seq_read_iter+0x240/0x4e0 | [<0>] seq_read+0xe8/0x140 | [<0>] vfs_read+0xb8/0x1e4 | [<0>] ksys_read+0x74/0x100 | [<0>] __arm64_sys_read+0x28/0x3c | [<0>] invoke_syscall+0x50/0x120 | [<0>] el0_svc_common.constprop.0+0xc4/0xd4 | [<0>] do_el0_svc+0x30/0x9c | [<0>] el0_svc+0x2c/0x54 | [<0>] el0t_64_sync_handler+0x1a8/0x1b0 | [<0>] el0t_64_sync+0x198/0x19c Cc: <stable@vger.kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviwed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210802164845.45506-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-08-03arm64: fix the doc of RANDOMIZE_MODULE_REGION_FULLBarry Song
Obviously kaslr is setting the module region to 2GB rather than 4GB since commit b2eed9b588112 ("arm64/kernel: kaslr: reduce module randomization range to 2 GB"). So fix the size of region in Kconfig. On the other hand, even though RANDOMIZE_MODULE_REGION_FULL is not set, module_alloc() can fall back to a 2GB window if ARM64_MODULE_PLTS is set. In this case, veneers are still needed. !RANDOMIZE_MODULE_REGION_FULL doesn't necessarily mean veneers are not needed. So fix the doc to be more precise to avoid any confusion to the readers of the code. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com> Cc: Qi Liu <liuqi115@huawei.com> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210730125131.13724-1-song.bao.hua@hisilicon.com Signed-off-by: Will Deacon <will@kernel.org>
2021-08-03arm64: fix compat syscall return truncationMark Rutland
Due to inconsistencies in the way we manipulate compat GPRs, we have a few issues today: * For audit and tracing, where error codes are handled as a (native) long, negative error codes are expected to be sign-extended to the native 64-bits, or they may fail to be matched correctly. Thus a syscall which fails with an error may erroneously be identified as failing. * For ptrace, *all* compat return values should be sign-extended for consistency with 32-bit arm, but we currently only do this for negative return codes. * As we may transiently set the upper 32 bits of some compat GPRs while in the kernel, these can be sampled by perf, which is somewhat confusing. This means that where a syscall returns a pointer above 2G, this will be sign-extended, but will not be mistaken for an error as error codes are constrained to the inclusive range [-4096, -1] where no user pointer can exist. To fix all of these, we must consistently use helpers to get/set the compat GPRs, ensuring that we never write the upper 32 bits of the return code, and always sign-extend when reading the return code. This patch does so, with the following changes: * We re-organise syscall_get_return_value() to always sign-extend for compat tasks, and reimplement syscall_get_error() atop. We update syscall_trace_exit() to use syscall_get_return_value(). * We consistently use syscall_set_return_value() to set the return value, ensureing the upper 32 bits are never set unexpectedly. * As the core audit code currently uses regs_return_value() rather than syscall_get_return_value(), we special-case this for compat_user_mode(regs) such that this will do the right thing. Going forward, we should try to move the core audit code over to syscall_get_return_value(). Cc: <stable@vger.kernel.org> Reported-by: He Zhe <zhe.he@windriver.com> Reported-by: weiyuchen <weiyuchen3@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210802104200.21390-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-08-02arm64: kasan: mte: remove redundant mte_report_once logicMark Rutland
We have special logic to suppress MTE tag check fault reporting, based on a global `mte_report_once` and `reported` variables. These can be used to suppress calling kasan_report() when taking a tag check fault, but do not prevent taking the fault in the first place, nor does they affect the way we disable tag checks upon taking a fault. The core KASAN code already defaults to reporting a single fault, and has a `multi_shot` control to permit reporting multiple faults. The only place we transiently alter `mte_report_once` is in lib/test_kasan.c, where we also the `multi_shot` state as the same time. Thus `mte_report_once` and `reported` are redundant, and can be removed. When a tag check fault is taken, tag checking will be disabled by `do_tag_recovery` and must be explicitly re-enabled if desired. The test code does this by calling kasan_enable_tagging_sync(). This patch removes the redundant mte_report_once() logic and associated variables. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Will Deacon <will@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20210714143843.56537-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-02arm64: kasan: mte: use a constant kernel GCR_EL1 valueMark Rutland
When KASAN_HW_TAGS is selected, KASAN is enabled at boot time, and the hardware supports MTE, we'll initialize `kernel_gcr_excl` with a value dependent on KASAN_TAG_MAX. While the resulting value is a constant which depends on KASAN_TAG_MAX, we have to perform some runtime work to generate the value, and have to read the value from memory during the exception entry path. It would be better if we could generate this as a constant at compile-time, and use it as such directly. Early in boot within __cpu_setup(), we initialize GCR_EL1 to a safe value, and later override this with the value required by KASAN. If CONFIG_KASAN_HW_TAGS is not selected, or if KASAN is disabeld at boot time, the kernel will not use IRG instructions, and so the initial value of GCR_EL1 is does not matter to the kernel. Thus, we can instead have __cpu_setup() initialize GCR_EL1 to a value consistent with KASAN_TAG_MAX, and avoid the need to re-initialize it during hotplug and resume form suspend. This patch makes arem64 use a compile-time constant KERNEL_GCR_EL1 value, which is compatible with KASAN_HW_TAGS when this is selected. This removes the need to re-initialize GCR_EL1 dynamically, and acts as an optimization to the entry assembly, which no longer needs to load this value from memory. The redundant initialization hooks are removed. In order to do this, KASAN_TAG_MAX needs to be visible outside of the core KASAN code. To do this, I've moved the KASAN_TAG_* values into <linux/kasan-tags.h>. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20210714143843.56537-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-02arm64/sve: Make fpsimd_bind_task_to_cpu() staticMark Brown
This function is not referenced outside fpsimd.c so can be static, making it that little bit easier to follow what is called from where. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210730165846.18558-1-broonie@kernel.org Reviewed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-02arm64: unnecessary end 'return;' in void functionsJason Wang
The end 'return;' in a void function is useless and verbose. It can be removed safely. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Link: https://lore.kernel.org/r/20210726123940.63232-1-wangborong@cdjrlc.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-07-30arm64: SSBS/DIT: print SSBS and DIT bit when printing PSTATELingyan Huang
The current code to print PSTATE when generating backtraces does not include SSBS bit and DIT bit, so add this information. Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Lingyan Huang <huanglingyan2@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/1626920436-54816-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-07-30arm64: cpufeature: Use defined macro instead of magic numbersShaokun Zhang
Use defined macro to simplify the code and make it more readable. Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/1626415089-57584-1-git-send-email-zhangshaokun@hisilicon.com Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-07-28arm64: avoid double ISB on kernel entryPeter Collingbourne
Although an ISB is required in order to make the MTE-related system register update to GCR_EL1 effective, and the same is true for PAC-related updates to SCTLR_EL1 or APIAKey{Hi,Lo}_EL1, we issue two ISBs on machines that support both features while we only need to issue one. To avoid the unnecessary additional ISB, remove the ISBs from the PAC and MTE-specific alternative blocks and add a couple of additional blocks that cause us to only execute one ISB if both features are supported. Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/Idee7e8114d5ae5a0b171d06220a0eb4bb015a51c Link: https://lore.kernel.org/r/20210727205439.2557419-1-pcc@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>