2018-12-07  arm64: kexec_file: forbid kdump via kexec_file_load()  (James Morse)
Now that kexec_walk_memblock() can do the crash-kernel placement itself, architectures that don't support kdump via kexec_file_load() need to explicitly forbid it. We don't support this on arm64 until the kernel can add the elfcorehdr and usable-memory-range fields to the DT. Without these, the crash kernel overwrites the previous kernel's memory during startup. Add a check to refuse crash image loading. Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07  arm64: preempt: Provide our own implementation of asm/preempt.h  (Will Deacon)
The asm-generic/preempt.h implementation doesn't make use of the PREEMPT_NEED_RESCHED flag, since this can interact badly with load/store architectures which rely on the preempt_count word being unchanged across an interrupt. However, since we're a 64-bit architecture and the preempt count is only 32 bits wide, we can simply pack it next to the resched flag and load the whole thing in one go, so that a dec-and-test operation doesn't need to load twice. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
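To illustrate the packing trick described above, here is a minimal C sketch (not the kernel's exact layout; the field names and the little-endian ordering are assumptions): the 32-bit count and an inverted need-resched flag share one 64-bit word, so a single 64-bit load-and-test covers both.

  #include <stdbool.h>
  #include <stdint.h>

  /* Sketch: pack the preempt count and the need-resched flag into one word. */
  union preempt_word {
          uint64_t preempt_count;         /* whole word, loaded in one go */
          struct {
                  uint32_t count;         /* preemption nesting count */
                  uint32_t need_resched;  /* stored inverted: 0 => resched pending */
          } preempt;                      /* assumes a little-endian layout */
  };

  /* After decrementing 'count', one 64-bit test decides whether to reschedule:
   * the whole word is zero exactly when count == 0 and a resched is due. */
  static inline bool should_resched_sketch(const union preempt_word *w)
  {
          return w->preempt_count == 0;
  }
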
2018-12-07  preempt: Move PREEMPT_NEED_RESCHED definition into arch code  (Will Deacon)
PREEMPT_NEED_RESCHED is never used directly, so move it into the arch code where it can potentially be implemented using either a different bit in the preempt count or as an entirely separate entity. Cc: Robert Love <rml@tech9.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: hugetlb: Register hugepages during arch init  (Allen Pais)
Add an hstate for each supported hugepage size using an arch initcall. The current behaviour falls short in two cases:

* no hugepage parameters: without hugepage parameters, only a default hugepage size is available for dynamic allocation. This differs from, for example, x86_64 and sparc64, where all supported hugepage sizes are available.

* only default_hugepagesz= is specified, and set to something other than HPAGE_SIZE: even though default_hugepagesz= is set to a valid hugepage size, it is treated as unsupported and reverted to HPAGE_SIZE. Such behaviour also differs from x86_64 and sparc64.

Acked-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Dmitry Klochkov <dmitry.klochkov@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
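As a sketch of what registering hugepages from an arch initcall typically looks like (the helper and function names here are hypothetical, and the chosen sizes assume 4K pages):

  #include <linux/hugetlb.h>
  #include <linux/init.h>
  #include <linux/log2.h>

  /* Hypothetical helper: register an hstate for 'size' unless one already exists. */
  static void __init add_huge_page_size_sketch(unsigned long size)
  {
          if (!size_to_hstate(size))
                  hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
  }

  static int __init hugetlbpage_init_sketch(void)
  {
          /* With 4K pages: 1G (PUD) and 2M (PMD) huge pages. */
          add_huge_page_size_sketch(PUD_SIZE);
          add_huge_page_size_sketch(PMD_SIZE);
          return 0;
  }
  arch_initcall(hugetlbpage_init_sketch);
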
2018-12-06  arm64: crypto: add NEON accelerated XOR implementation  (Jackie Liu)
This is a NEON acceleration method that can improve performance by approximately 20%. I got the following data from CentOS 7.5 on Huawei's HISI1616 chip:

  [ 93.837726] xor: measuring software checksum speed
  [ 93.874039]   8regs      :  7123.200 MB/sec
  [ 93.914038]   32regs     :  7180.300 MB/sec
  [ 93.954043]   arm64_neon :  9856.000 MB/sec
  [ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec)

I believe this code can bring some optimization for all arm64 platforms. Thanks to Ard Biesheuvel for his suggestions. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64/neon: add workaround for ambiguous C99 stdint.h types  (Jackie Liu)
In a way similar to ARM commit 09096f6a0ee2 ("ARM: 7822/1: add workaround for ambiguous C99 stdint.h types"), this patch redefines the macros that are used in stdint.h so its definitions of uint64_t and int64_t are compatible with those of the kernel. This patch comes from: https://patchwork.kernel.org/patch/3540001/ Written by: Ard Biesheuvel <ard.biesheuvel@linaro.org> We mark this file as private so we don't have to override asm/types.h. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: entry: Remove confusing comment  (Will Deacon)
The comment about SYS_MEMBARRIER_SYNC_CORE relying on ERET being context-synchronizing is confusing and misplaced with kpti. Given that this is already documented under Documentation/ (see arch-support.txt for membarrier), remove the comment altogether. Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: entry: Place an SB sequence following an ERET instruction  (Will Deacon)
Some CPUs can speculate past an ERET instruction and potentially perform speculative accesses to memory before processing the exception return. Since the register state is often controlled by a lower privilege level at the point of an ERET, this could potentially be used as part of a side-channel attack. This patch emits an SB sequence after each ERET so that speculation is held up on exception return. Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: Add support for SB barrier and patch in over DSB; ISB sequences  (Will Deacon)
We currently use a DSB; ISB sequence to inhibit speculation in set_fs(). Whilst this works for current CPUs, future CPUs may implement a new SB barrier instruction which acts as an architected speculation barrier. On CPUs that support it, patch in an SB; NOP sequence over the DSB; ISB sequence and advertise the presence of the new instruction to userspace. Signed-off-by: Will Deacon <will.deacon@arm.com>
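A rough sketch of how such runtime patching is typically expressed with the arm64 alternatives framework; the capability name follows the commit, but the raw SB encoding below is an assumption and the surrounding macro plumbing is elided:

  #include <asm/alternative.h>
  #include <asm/cpucaps.h>

  /* Assumed encoding of the SB instruction, spelled as .inst so that older
   * assemblers still build it. */
  #define SB_BARRIER_INSN_SKETCH  ".inst 0xd50330ff\n"

  /* CPUs without ARM64_HAS_SB run DSB; ISB; CPUs with it get SB; NOP patched
   * in at boot (same total length, so it can be patched in place). */
  #define sb_sketch()                                             \
          asm volatile(ALTERNATIVE("dsb nsh\n"                    \
                                   "isb\n",                       \
                                   SB_BARRIER_INSN_SKETCH         \
                                   "nop\n",                       \
                                   ARM64_HAS_SB))
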
2018-12-06  arm64: kexec_file: Refactor setup_dtb() to consolidate error checking  (Will Deacon)
setup_dtb() is a little difficult to read. This is largely because it duplicates the FDT -> Linux errno conversion for every intermediate return value, but also because of silly cosmetic things like naming and formatting. Given that this is all brand new, refactor the function to get us off on the right foot. Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: kexec_file: add kaslr support  (AKASHI Takahiro)
Adding "kaslr-seed" to the dtb enables triggering kaslr, or kernel virtual address randomization, at secondary kernel boot. We always do this as it does no harm on a kaslr-incapable kernel. We don't have any "switch" to turn off this feature directly, but we can still suppress it by passing "nokaslr" as a kernel boot argument. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [will: Use rng_is_initialized()] Signed-off-by: Will Deacon <will.deacon@arm.com>
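A minimal sketch of how a loader can inject the "kaslr-seed" property into the /chosen node of the dtb passed to the next kernel, assuming libfdt is available (the function name is illustrative and error handling is trimmed):

  #include <linux/errno.h>
  #include <linux/libfdt.h>
  #include <linux/random.h>

  static int setup_kaslr_seed_sketch(void *dtb)
  {
          u64 seed;
          int off;

          /* No entropy yet: leave the dtb alone and boot without KASLR. */
          if (!rng_is_initialized())
                  return 0;

          off = fdt_path_offset(dtb, "/chosen");
          if (off < 0)
                  return -EINVAL;

          seed = get_random_u64();
          return fdt_setprop_u64(dtb, off, "kaslr-seed", seed);
  }
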
2018-12-06  arm64: kexec_file: add kernel signature verification support  (AKASHI Takahiro)
With this patch, kernel verification can be done without the IMA security subsystem enabled; turn on CONFIG_KEXEC_VERIFY_SIG instead. On x86, a signature is embedded into the PE file (Microsoft's format) header of the binary. Since arm64's "Image" can also be seen as a PE file as long as CONFIG_EFI is enabled, we adopt this format for kernel signing. You can create a signed kernel image with:

  $ sbsign --key ${KEY} --cert ${CERT} Image

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: James Morse <james.morse@arm.com> [will: removed useless pr_debug()] Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: capabilities: Batch cpu_enable callbacks  (Suzuki K Poulose)
We use a stop_machine call for each available capability to enable it on all the CPUs available at boot time. Instead we could batch the cpu_enable callbacks to a single stop_machine() call to save us some time. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
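The batching idea as a sketch (the capability table and helper names below are illustrative, not the kernel's exact symbols): a single stop_machine() call whose callback runs every pending cpu_enable hook on each CPU, instead of one call per capability.

  #include <linux/stop_machine.h>
  #include <asm/cpufeature.h>

  /* Hypothetical table of all detected capability entries. */
  extern const struct arm64_cpu_capabilities *all_caps[ARM64_NCAPS];

  static int enable_caps_on_this_cpu(void *unused)
  {
          int i;

          for (i = 0; i < ARM64_NCAPS; i++) {
                  const struct arm64_cpu_capabilities *cap = all_caps[i];

                  if (cap && cap->cpu_enable && cpus_have_cap(cap->capability))
                          cap->cpu_enable(cap);
          }
          return 0;
  }

  static void enable_cpu_capabilities_batched(void)
  {
          /* One system-wide stop instead of one per capability. */
          stop_machine(enable_caps_on_this_cpu, NULL, cpu_online_mask);
  }
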
2018-12-06  arm64: capabilities: Use linear array for detection and verification  (Suzuki K Poulose)
Use the sorted list of capability entries for the detection and verification. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: capabilities: Optimize this_cpu_has_cap  (Suzuki K Poulose)
Make use of the sorted capability list to access the capability entry in this_cpu_has_cap() to avoid iterating over the two tables. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: capabilities: Speed up capability lookup  (Suzuki K Poulose)
We maintain two separate tables of capabilities, errata and features, which decide the system capabilities. We iterate over each of these tables for various operations (e.g. detection, verification, etc.). We do not have a way to map a system "capability" to its entry (i.e. cap -> struct arm64_cpu_capabilities), which is needed for this_cpu_has_cap(), so we iterate over the table one by one to find the entry and then do the operation. This also prevents us from optimizing the way we "enable" the capabilities on the CPUs, where we now issue a stop_machine() for each available capability.

One solution is to merge the two tables into a single table, sorted by the capability. But this has the following disadvantages:
 - We lose the "classification" of an erratum vs. a feature
 - It is quite easy to make a mistake when adding an entry, unless we sort the table at runtime.

So we maintain a list of pointers to the capability entries, sorted by the "cap number", in a separate array initialized at boot time. The only restriction is that we can have only one entry per capability. While at it, remove the duplicate declaration of the arm64_errata table. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
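A sketch of the cap-to-entry mapping described above, assuming one entry per capability (array and function names are illustrative): a boot-time table of pointers indexed by capability number replaces the linear scan in this_cpu_has_cap().

  #include <asm/cpufeature.h>

  static const struct arm64_cpu_capabilities *cap_ptrs[ARM64_NCAPS];

  /* Called at boot for both the errata and the feature tables. */
  static void __init init_cap_ptrs(const struct arm64_cpu_capabilities *caps)
  {
          for (; caps->matches; caps++)
                  cap_ptrs[caps->capability] = caps;
  }

  static bool this_cpu_has_cap_sketch(unsigned int n)
  {
          const struct arm64_cpu_capabilities *cap;

          if (n >= ARM64_NCAPS)
                  return false;

          cap = cap_ptrs[n];
          return cap && cap->matches(cap, SCOPE_LOCAL_CPU);
  }
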
2018-12-06  include: pe.h: remove message[] from mz header definition  (AKASHI Takahiro)
The message[] field won't be part of the definition of the mz header. This change is crucial for enabling kexec_file_load on arm64 because arm64's "Image" binary, as in PE format, doesn't have any data for it, and accordingly the following check in pefile_parse_binary() will fail:

  chkaddr(cursor, mz->peaddr, sizeof(*pe));

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: David Howells <dhowells@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: kexec_file: invoke the kernel without purgatory  (AKASHI Takahiro)
On arm64, purgatory would do almost nothing, so just invoke the secondary kernel directly by jumping into its entry code. In this case, cpu_soft_restart() must be called with the dtb address in the fifth argument, but the behavior stays compatible with the kexec_load case as long as that argument is null. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: James Morse <james.morse@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: kexec_file: allow for loading Image-format kernel  (AKASHI Takahiro)
This patch provides kexec_file_ops for the "Image"-format kernel. In this implementation, a binary is always loaded at the fixed offset identified in the text_offset field of its header. Regarding signature verification for trusted boot, this patch doesn't contain CONFIG_KEXEC_VERIFY_SIG support, which is to be added later in this series, but file-attribute-based verification is still a viable option by enabling the IMA security subsystem. You can sign (label) a to-be-kexec'ed kernel image on the target file system with:

  $ evmctl ima_sign --key /path/to/private_key.pem Image

On the live system, you must have IMA enforced with, at least, the following security policy:

  "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig"

See more details about IMA here: https://sourceforge.net/p/linux-ima/wiki/Home/ Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
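For reference, the rough shape of the arm64 "Image" header that such a loader parses, following Documentation/arm64/booting.txt (treat the struct as a sketch rather than the kernel's canonical definition):

  #include <linux/types.h>
  #include <asm/byteorder.h>

  struct arm64_image_header_sketch {
          __le32 code0;           /* executable code */
          __le32 code1;
          __le64 text_offset;     /* image load offset from a 2 MB aligned base */
          __le64 image_size;      /* effective image size */
          __le64 flags;           /* endianness, page size, placement hints */
          __le64 res2;
          __le64 res3;
          __le64 res4;
          __le32 magic;           /* "ARM\x64" */
          __le32 res5;
  };

  /* The kernel segment is placed at base + text_offset, as described above. */
  static unsigned long image_load_addr(unsigned long base,
                                       const struct arm64_image_header_sketch *h)
  {
          return base + le64_to_cpu(h->text_offset);
  }
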
2018-12-06  arm64: kexec_file: load initrd and device-tree  (AKASHI Takahiro)
load_other_segments() is expected to allocate and place all the necessary memory segments other than the kernel, including the initrd and device-tree blob (and the elf core header for crash). While most of the code was borrowed from kexec-tools' counterpart, users are not allowed to specify a dtb explicitly; instead, the dtb presented by the original boot loader is reused. arch_kimage_kernel_post_load_cleanup() is responsible for freeing the arm64-specific data allocated in load_other_segments(). Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
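As an illustration of how one such extra segment is typically placed with the generic kexec_file helpers (a sketch only: the parameter names are made up and the bookkeeping that load_other_segments() does around it is elided):

  #include <linux/kexec.h>

  static int load_initrd_sketch(struct kimage *image, void *initrd,
                                unsigned long initrd_len,
                                unsigned long kernel_end,   /* end of the kernel segment */
                                unsigned long *initrd_load_addr)
  {
          struct kexec_buf kbuf = {
                  .image     = image,
                  .buffer    = initrd,
                  .bufsz     = initrd_len,
                  .memsz     = initrd_len,
                  .buf_align = PAGE_SIZE,
                  .buf_min   = kernel_end,    /* place it above the kernel */
                  .buf_max   = ULONG_MAX,
                  .top_down  = false,
          };
          int ret;

          ret = kexec_add_buffer(&kbuf);
          if (ret)
                  return ret;

          *initrd_load_addr = kbuf.mem;
          return 0;
  }
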
2018-12-06  arm64: enable KEXEC_FILE config  (AKASHI Takahiro)
Modify arm64/Kconfig to enable kexec_file_load support. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Acked-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: cpufeature: add MMFR0 helper functions  (AKASHI Takahiro)
These helper functions for the MMFR0 register will be used later by the kexec_file loader. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: add image head flag definitions  (AKASHI Takahiro)
These image header flags will be used later by the kexec_file loader. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Acked-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  kexec_file: kexec_walk_memblock() only walks a dedicated region at kdump  (AKASHI Takahiro)
In the kdump case, there exists only one dedicated memblock region as usable memory (crashk_res). With this patch, kexec_walk_memblock() runs a given callback function only on this region. Cosmetic change: 0 to MEMBLOCK_NONE at for_each_free_mem_range*(). Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  powerpc, kexec_file: factor out memblock-based arch_kexec_walk_mem()  (AKASHI Takahiro)
The memblock list is another source for the usable system memory layout. So move powerpc's arch_kexec_walk_mem() to common code so that other memblock-based architectures, particularly arm64, can also utilise it. The moved function is renamed to kexec_walk_memblock() and integrated into kexec_locate_mem_hole(), which will now be usable for all architectures with no need to override arch_kexec_walk_mem(). With this change, arch_kexec_walk_mem() no longer needs to be a weak function, and is renamed to kexec_walk_resources(). Since powerpc doesn't support kdump in its kexec_file_load(), the current kexec_walk_memblock() won't work for kdump either in this form; this will be fixed in the next patch. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Acked-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  s390, kexec_file: drop arch_kexec_mem_walk()  (AKASHI Takahiro)
Since s390 already knows where to locate buffers, calling arch_kexec_mem_walk() makes no sense, so we can just drop it: kbuf->mem indicates this, while all other architectures set it to 0 initially. This change is preparatory work for the next patch, where all the variant memory walks, either on system resources or memblock, will be put in one common place so that it satisfies all the architectures' needs. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  kexec_file: make kexec_image_post_load_cleanup_default() global  (AKASHI Takahiro)
Change this function from static to global so that arm64 can implement its own arch_kimage_file_post_load_cleanup() later using kexec_image_post_load_cleanup_default(). Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  asm-generic: add kexec_file_load system call to unistd.h  (AKASHI Takahiro)
The initial user of this system call number is arm64. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver  (Kulkarni, Ganapatrao)
This patch adds a perf driver for the PMU UNCORE devices: the DDR4 Memory Controller (DMC) and the Level 3 Cache (L3C). Each PMU supports up to 4 counters. All counters lack an overflow interrupt and are sampled periodically. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> [will: consistent enum cpuhp_state naming] Signed-off-by: Will Deacon <will.deacon@arm.com>
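Since the counters have no overflow interrupt, the usual pattern is an hrtimer that periodically folds the hardware counters into the perf counts before they can wrap. A sketch of that pattern follows (the struct, the update helper and the period are illustrative, not the driver's actual code):

  #include <linux/hrtimer.h>
  #include <linux/kernel.h>
  #include <linux/ktime.h>
  #include <linux/perf_event.h>

  #define UNCORE_COUNTERS          4
  #define UNCORE_SAMPLE_PERIOD_NS  (10ULL * NSEC_PER_MSEC)

  struct uncore_pmu_sketch {
          struct hrtimer     hrtimer;
          struct perf_event  *events[UNCORE_COUNTERS];
  };

  /* Hypothetical: read the hardware counter and add the delta to the event. */
  void uncore_event_update_sketch(struct perf_event *event);

  static enum hrtimer_restart uncore_hrtimer_fn(struct hrtimer *t)
  {
          struct uncore_pmu_sketch *pmu =
                  container_of(t, struct uncore_pmu_sketch, hrtimer);
          int i;

          for (i = 0; i < UNCORE_COUNTERS; i++)
                  if (pmu->events[i])
                          uncore_event_update_sketch(pmu->events[i]);

          hrtimer_forward_now(t, ns_to_ktime(UNCORE_SAMPLE_PERIOD_NS));
          return HRTIMER_RESTART;
  }

  static void uncore_start_sampling(struct uncore_pmu_sketch *pmu)
  {
          hrtimer_init(&pmu->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
          pmu->hrtimer.function = uncore_hrtimer_fn;
          hrtimer_start(&pmu->hrtimer, ns_to_ktime(UNCORE_SAMPLE_PERIOD_NS),
                        HRTIMER_MODE_REL);
  }
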
2018-12-06  Documentation: perf: Add documentation for ThunderX2 PMU uncore driver  (Kulkarni, Ganapatrao)
The SoC has PMU support in its L3 cache controller (L3C) and in the DDR4 Memory Controller (DMC). Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> [will: minor spelling and format fixes, dropped events list] Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003  (Suzuki K Poulose)
Remove duplicate entries for Qualcomm erratum 1003. Since the entries are not purely based on generic MIDR checks, use the multi_cap_entry type to merge the entries. Cc: Christopher Covington <cov@codeaurora.org> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: capabilities: Merge duplicate Cavium erratum entries  (Suzuki K Poulose)
Merge duplicate entries for a single capability using the midr range list for Cavium errata 30115 and 27456. Cc: Andrew Pinski <apinski@cavium.com> Cc: David Daney <david.daney@cavium.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06  arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE  (Suzuki K Poulose)
We have two entries for the ARM64_WORKAROUND_CLEAN_CACHE capability:

 1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012]
 2) ARM Erratum 819472 on A53 r0p[01]

Both have the same workaround. Merge these entries to avoid duplicate entries for a single capability. Add a new Kconfig entry to control the "capability" entry, to make it easier to handle combinations of the CONFIGs. Cc: Will Deacon <will.deacon@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-04  arm64: relocatable: fix inconsistencies in linker script and options  (Ard Biesheuvel)
readelf complains about the section layout of vmlinux when building with CONFIG_RELOCATABLE=y (for KASLR):

  readelf: Warning: [21]: Link field (0) should index a symtab section.
  readelf: Warning: [21]: Info field (0) should index a relocatable section.

Also, it seems that our use of '-pie -shared' is contradictory, and thus ambiguous. In general, the way KASLR is wired up at the moment is highly tailored to how ld.bfd happens to implement (and conflate) PIE executables and shared libraries, so given the current effort to support other toolchains, let's fix some of these issues as well.

 - Drop the -pie linker argument and just leave -shared. In ld.bfd, the differences between them are unclear (except for the ELF type of the produced image [0]) but lld chokes on seeing both at the same time.
 - Rename the .rela output section to .rela.dyn, as is customary for shared libraries and PIE executables, so that it is not misidentified by readelf as a static relocation section (producing the warnings above).
 - Pass the -z notext and -z norelro options to explicitly instruct the linker to permit text relocations, and to omit the RELRO program header (which requires a certain section layout that we don't adhere to in the kernel). These are the defaults for current versions of ld.bfd.
 - Discard the .eh_frame and .gnu.hash sections to prevent them from being emitted between .head.text and .text, screwing up the section layout.

These changes only affect the ELF image, and produce the same binary image.

[0] b9dce7f1ba01 ("arm64: kernel: force ET_DYN ELF type for ...")

Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Smith <peter.smith@linaro.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30  arm64/lib: improve CRC32 performance for deep pipelines  (Ard Biesheuvel)
Improve the performance of the crc32() asm routines by getting rid of most of the branches and small sized loads on the common path. Instead, use a branchless code path involving overlapping 16 byte loads to process the first (length % 32) bytes, and process the remainder using a loop that processes 32 bytes at a time.

Tested using the following test program:

  #include <stdlib.h>

  extern void crc32_le(unsigned short, char const*, int);

  int main(void)
  {
          static const char buf[4096];

          srand(20181126);

          for (int i = 0; i < 100 * 1000 * 1000; i++)
                  crc32_le(0, buf, rand() % 1024);

          return 0;
  }

On Cortex-A53 and Cortex-A57, the performance regresses but only very slightly. On Cortex-A72 however, the performance improves from

  $ time ./crc32
  real    0m10.149s
  user    0m10.149s
  sys     0m0.000s

to

  $ time ./crc32
  real    0m7.915s
  user    0m7.915s
  sys     0m0.000s

Cc: Rui Sun <sunrui26@huawei.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30  arm64: ftrace: always pass instrumented pc in x0  (Mark Rutland)
The core ftrace hooks take the instrumented PC in x0, but for some reason arm64's prepare_ftrace_return() takes this in x1. For consistency, let's flip the argument order and always pass the instrumented PC in x0. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30  arm64: ftrace: remove return_regs macros  (Mark Rutland)
The save_return_regs and restore_return_regs macros are only used by return_to_handler, and having them defined out-of-line only serves to obscure the logic. Before we complicate, let's clean this up and fold the logic directly into return_to_handler, saving a few lines of macro boilerplate in the process. At the same time, a missing trailing space is added to the comments, fixing a code style violation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30  arm64: ftrace: don't adjust the LR value  (Mark Rutland)
The core ftrace code requires that when it is handed the PC of an instrumented function, this PC is the address of the instrumented instruction. This is necessary so that the core ftrace code can identify the specific instrumentation site. Since the instrumented function will be a BL, the address of the instrumented function is LR - 4 at entry to the ftrace code. This fixup is applied in the mcount_get_pc and mcount_get_pc0 helpers, which acquire the PC of the instrumented function. The mcount_get_lr helper is used to acquire the LR of the instrumented function, whose value does not require this adjustment, and cannot be adjusted to anything meaningful. No adjustment of this value is made on other architectures, including arm. However, arm64 adjusts this value by 4. This patch brings arm64 in line with other architectures and removes the adjustment of the LR value. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30  arm64: ftrace: enable graph FP test  (Mark Rutland)
The core ftrace code has an optional sanity check on the frame pointer passed by ftrace_graph_caller and return_to_handler. This is cheap, useful, and enabled unconditionally on x86, sparc, and riscv. Let's do the same on arm64, so that we can catch any problems early. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30  arm64: ftrace: use GLOBAL()  (Mark Rutland)
The global exports of ftrace_call and ftrace_graph_call are somewhat painful to read. Let's use the generic GLOBAL() macro to ameliorate matters. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30  linkage: add generic GLOBAL() macro  (Mark Rutland)
Declaring a global symbol in assembly is tedious, error-prone, and painful to read. While ENTRY() exists, this is supposed to be used for function entry points, and this affects alignment in a potentially undesirable manner. Instead, let's add a generic GLOBAL() macro for this, as x86 added locally in commit: 95695547a7db44b8 ("x86: asm linkage - introduce GLOBAL macro") ... thus allowing us to use this more freely in the kernel. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30  arm64: drop linker script hack to hide __efistub_ symbols  (Ard Biesheuvel)
Commit 1212f7a16af4 ("scripts/kallsyms: filter arm64's __efistub_ symbols") updated the kallsyms code to filter out symbols with the __efistub_ prefix explicitly, so we no longer require the hack in our linker script to emit them as absolute symbols. Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-29  arm64: io: Ensure value passed to __iormb() is held in a 64-bit register  (Will Deacon)
As of commit 6460d3201471 ("arm64: io: Ensure calls to delay routines are ordered against prior readX()"), MMIO reads smaller than 64 bits fail to compile under clang because we end up mixing 32-bit and 64-bit register operands for the same data processing instruction:

  ./include/asm-generic/io.h:695:9: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
          return readb(addr);
                 ^
  ./arch/arm64/include/asm/io.h:147:58: note: expanded from macro 'readb'
                                                            ^
  ./include/asm-generic/io.h:695:9: note: use constraint modifier "w"
  ./arch/arm64/include/asm/io.h:147:50: note: expanded from macro 'readb'
                                                    ^
  ./arch/arm64/include/asm/io.h:118:24: note: expanded from macro '__iormb'
          asm volatile("eor       %0, %1, %1\n"                           \
                       ^

Fix the build by casting the macro argument to 'unsigned long' when used as an input to the inline asm. Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
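The shape of the fix, reconstructed from the diagnostics above (a sketch close to, but not guaranteed identical to, the arm64 macro; the read barrier is shown generically as rmb()): the value returned by readX() is cast to unsigned long before being used as the "r" input, so the operand is always a 64-bit register regardless of the access size.

  #define __iormb_sketch(v)                                             \
  ({                                                                    \
          unsigned long tmp;                                            \
                                                                        \
          rmb();  /* order the MMIO read against later accesses */      \
                                                                        \
          /* dummy control dependency; the cast keeps %1 64-bit */      \
          asm volatile("eor   %0, %1, %1\n"                             \
                       "cbnz  %0, ."                                    \
                       : "=r" (tmp)                                     \
                       : "r" ((unsigned long)(v))                       \
                       : "memory");                                     \
  })
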
2018-11-29  perf: arm_spe: handle devm_kasprintf() failure  (Nicholas Mc Guire)
devm_kasprintf() may return NULL on failure of internal allocation, thus the assignment to 'name' is not safe if unchecked. If NULL is passed in for name, then perf_pmu_register() would not fail but rather silently jump to skip_type, which is not the intent here. As perf_pmu_register() may also return -ENOMEM, returning -ENOMEM in the (unlikely) failure case of devm_kasprintf() should be fine here as well. Acked-by: Mark Rutland <mark.rutland@arm.com> Fixes: d5d9696b0380 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension") Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> [will: reworded error message] Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27  arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE  (Will Deacon)
In order to reduce the possibility of soft lock-ups, we bound the maximum number of TLBI operations performed by a single call to flush_tlb_range() to an arbitrary constant of 1024. Whilst this does the job of avoiding lock-ups, we can actually be a bit smarter by defining this as PTRS_PER_PTE. Due to the structure of our page tables, using PTRS_PER_PTE means that an outer loop calling flush_tlb_range() for entire table entries will end up performing just a single TLBI operation for each entry. As an example, mremap()ing a 1GB range mapped using 4k pages now requires only 512 TLBI operations when moving the page tables as opposed to 262144 operations (512*512) when using the current threshold of 1024. Cc: Joel Fernandes <joel@joelfernandes.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27  arm64/module: switch to ADRP/ADD sequences for PLT entries  (Ard Biesheuvel)
Now that we have switched to the small code model entirely, and reduced the extended KASLR range to 4 GB, we can be sure that the targets of relative branches that are out of range are in range for an ADRP/ADD pair, which is one instruction shorter than our current MOVN/MOVK/MOVK sequence, and is more idiomatic and so more likely to be implemented efficiently by micro-architectures. So switch over the ordinary PLT code and the special handling of the Cortex-A53 ADRP errata, as well as the ftrace trampoline handling. Reviewed-by: Torsten Duwe <duwe@lst.de> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [will: Added a couple of comments in the plt equality check] Signed-off-by: Will Deacon <will.deacon@arm.com>
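For reference, the before/after shape of a module PLT entry (a sketch; the kernel's own struct may differ in naming):

  #include <linux/types.h>

  /*
   *   old:  movn  x16, #...                 new:  adrp  x16, <target page>
   *         movk  x16, #..., lsl #16              add   x16, x16, :lo12:<target>
   *         movk  x16, #..., lsl #32              br    x16
   *         br    x16
   *
   * One instruction shorter, and ADRP/ADD is the idiomatic way of forming
   * a +/-4 GB PC-relative address on arm64.
   */
  struct plt_entry_sketch {
          __le32 adrp;    /* adrp x16, <target page>       */
          __le32 add;     /* add  x16, x16, :lo12:<target> */
          __le32 br;      /* br   x16                      */
  };
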
2018-11-27  arm64/insn: add support for emitting ADR/ADRP instructions  (Ard Biesheuvel)
Add support for emitting ADR and ADRP instructions so we can switch over our PLT generation code in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27  arm64: Use a raw spinlock in __install_bp_hardening_cb()  (James Morse)
__install_bp_hardening_cb() is called via stop_machine() as part of the cpu_enable callback. To force each CPU to take its turn when allocating slots, they take a spinlock. With the RT patches applied, the spinlock becomes a mutex, and we get warnings about sleeping while in stop_machine():

  | [ 0.319176] CPU features: detected: RAS Extension Support
  | [ 0.319950] BUG: scheduling while atomic: migration/3/36/0x00000002
  | [ 0.319955] Modules linked in:
  | [ 0.319958] Preemption disabled at:
  | [ 0.319969] [<ffff000008181ae4>] cpu_stopper_thread+0x7c/0x108
  | [ 0.319973] CPU: 3 PID: 36 Comm: migration/3 Not tainted 4.19.1-rt3-00250-g330fc2c2a880 #2
  | [ 0.319975] Hardware name: linux,dummy-virt (DT)
  | [ 0.319976] Call trace:
  | [ 0.319981]  dump_backtrace+0x0/0x148
  | [ 0.319983]  show_stack+0x14/0x20
  | [ 0.319987]  dump_stack+0x80/0xa4
  | [ 0.319989]  __schedule_bug+0x94/0xb0
  | [ 0.319991]  __schedule+0x510/0x560
  | [ 0.319992]  schedule+0x38/0xe8
  | [ 0.319994]  rt_spin_lock_slowlock_locked+0xf0/0x278
  | [ 0.319996]  rt_spin_lock_slowlock+0x5c/0x90
  | [ 0.319998]  rt_spin_lock+0x54/0x58
  | [ 0.320000]  enable_smccc_arch_workaround_1+0xdc/0x260
  | [ 0.320001]  __enable_cpu_capability+0x10/0x20
  | [ 0.320003]  multi_cpu_stop+0x84/0x108
  | [ 0.320004]  cpu_stopper_thread+0x84/0x108
  | [ 0.320008]  smpboot_thread_fn+0x1e8/0x2b0
  | [ 0.320009]  kthread+0x124/0x128
  | [ 0.320010]  ret_from_fork+0x10/0x18

Switch this to a raw spinlock, as we know this is only called with IRQs masked. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
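The gist of the change, as a sketch (the lock and function names are illustrative): a raw spinlock stays a spinning lock even with the RT patches, so it is safe to take from a stop_machine() callback running with IRQs masked.

  #include <linux/spinlock.h>

  static DEFINE_RAW_SPINLOCK(bp_lock);

  static void install_bp_hardening_cb_sketch(void (*fn)(void))
  {
          raw_spin_lock(&bp_lock);    /* was spin_lock(), which RT turns into a mutex */
          /* ... find or allocate a vector slot and record fn ... */
          raw_spin_unlock(&bp_lock);
  }
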
2018-11-27  arm64: acpi: Prepare for longer MADTs  (Jeremy Linton)
The BAD_MADT_GICC_ENTRY check is a little too strict because it rejects MADT entries that don't match the currently known lengths. We should remove this restriction to avoid problems if the table length changes. Future code which might depend on additional fields should be written to validate those fields before using them, rather than trying to globally check known MADT version lengths. Link: https://lkml.kernel.org/r/20181012192937.3819951-1-jeremy.linton@arm.com Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> [lorenzo.pieralisi@arm.com: added MADT macro comments] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Al Stone <ahs3@redhat.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27  arm64: io: Ensure calls to delay routines are ordered against prior readX()  (Will Deacon)
A relatively standard idiom for ensuring that a pair of MMIO writes to a device arrive at that device with a specified minimum delay between them is as follows:

  writel_relaxed(42, dev_base + CTL1);
  readl(dev_base + CTL1);
  udelay(10);
  writel_relaxed(42, dev_base + CTL2);

the intention being that the read-back from the device will push the prior write to CTL1, and the udelay will hold up the write to CTL2 until at least 10us have elapsed.

Unfortunately, on arm64 where the underlying delay loop is implemented as a read of the architected counter, the CPU does not guarantee ordering from the readl() to the delay loop and therefore the delay loop could in theory be speculated and not provide the desired interval between the two writes.

Fix this in a similar manner to PowerPC by introducing a dummy control dependency on the output of readX() which, combined with the ISB in the read of the architected counter, guarantees that a subsequent delay loop can not be executed until the readX() has returned its result.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Will Deacon <will.deacon@arm.com>