summaryrefslogtreecommitdiff
path: root/arch/arm64/lib
AgeCommit message (Collapse)Author
7 daysMerge tag 'bpf-next-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "For this merge window we're splitting BPF pull request into three for higher visibility: main changes, res_spin_lock, try_alloc_pages. These are the main BPF changes: - Add DFA-based live registers analysis to improve verification of programs with loops (Eduard Zingerman) - Introduce load_acquire and store_release BPF instructions and add x86, arm64 JIT support (Peilin Ye) - Fix loop detection logic in the verifier (Eduard Zingerman) - Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet) - Add kfunc for populating cpumask bits (Emil Tsalapatis) - Convert various shell based tests to selftests/bpf/test_progs format (Bastien Curutchet) - Allow passing referenced kptrs into struct_ops callbacks (Amery Hung) - Add a flag to LSM bpf hook to facilitate bpf program signing (Blaise Boscaccy) - Track arena arguments in kfuncs (Ihor Solodrai) - Add copy_remote_vm_str() helper for reading strings from remote VM and bpf_copy_from_user_task_str() kfunc (Jordan Rome) - Add support for timed may_goto instruction (Kumar Kartikeya Dwivedi) - Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy) - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup local storage (Martin KaFai Lau) - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko) - Allow retrieving BTF data with BTF token (Mykyta Yatsenko) - Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix (Song Liu) - Reject attaching programs to noreturn functions (Yafang Shao) - Introduce pre-order traversal of cgroup bpf programs (Yonghong Song)" * tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits) selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid bpf: Fix out-of-bounds read in check_atomic_load/store() libbpf: Add namespace for errstr making it libbpf_errstr bpf: Add struct_ops context information to struct bpf_prog_aux selftests/bpf: Sanitize pointer prior fclose() selftests/bpf: Migrate test_xdp_vlan.sh into test_progs selftests/bpf: test_xdp_vlan: Rename BPF sections bpf: clarify a misleading verifier error message selftests/bpf: Add selftest for attaching fexit to __noreturn functions bpf: Reject attaching fexit/fmod_ret to __noreturn functions bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage bpf: Make perf_event_read_output accessible in all program types. bpftool: Using the right format specifiers bpftool: Add -Wformat-signedness flag to detect format errors selftests/bpf: Test freplace from user namespace libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID bpf: Return prog btf_id without capable check bpf: BPF token support for BPF_BTF_GET_FD_BY_ID bpf, x86: Fix objtool warning for timed may_goto bpf: Check map->record at the beginning of check_and_free_fields() ...
11 daysarm64/crc-t10dif: fix use of out-of-scope array in crc_t10dif_arch()Eric Biggers
Fix a silly bug where an array was used outside of its scope. Fixes: 2051da858534 ("arm64/crc-t10dif: expose CRC-T10DIF function through lib") Cc: stable@vger.kernel.org Reported-by: David Binderman <dcb314@hotmail.com> Closes: https://lore.kernel.org/r/AS8PR02MB102170568EAE7FFDF93C8D1ED9CA62@AS8PR02MB10217.eurprd02.prod.outlook.com Link: https://lore.kernel.org/r/20250326200918.125743-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
12 daysMerge tag 'crc-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: "Another set of improvements to the kernel's CRC (cyclic redundancy check) code: - Rework the CRC64 library functions to be directly optimized, like what I did last cycle for the CRC32 and CRC-T10DIF library functions - Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ support and acceleration for crc64_be and crc64_nvme - Rewrite the riscv Zbc-optimized CRC code, and add acceleration for crc_t10dif, crc64_be, and crc64_nvme - Remove crc_t10dif and crc64_rocksoft from the crypto API, since they are no longer needed there - Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect - Add kunit test cases for crc64_nvme and crc7 - Eliminate redundant functions for calculating the Castagnoli CRC32, settling on just crc32c() - Remove unnecessary prompts from some of the CRC kconfig options - Further optimize the x86 crc32c code" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (36 commits) x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512 lib/crc: remove unnecessary prompt for CONFIG_CRC64 lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C lib/crc: remove unnecessary prompt for CONFIG_CRC8 lib/crc: remove unnecessary prompt for CONFIG_CRC7 lib/crc: remove unnecessary prompt for CONFIG_CRC4 lib/crc7: unexport crc7_be_syndrome_table lib/crc_kunit.c: update comment in crc_benchmark() lib/crc_kunit.c: add test and benchmark for crc7_be() x86/crc32: optimize tail handling for crc32c short inputs riscv/crc64: add Zbc optimized CRC64 functions riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function riscv/crc32: reimplement the CRC32 functions using new template riscv/crc: add "template" for Zbc optimized CRC functions x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings x86/crc32: improve crc32c_arch() code generation with clang x86/crc64: implement crc64_be and crc64_nvme using new template x86/crc-t10dif: implement crc_t10dif using new template x86/crc32: implement crc32_le using new template x86/crc: add "template" for [V]PCLMULQDQ based CRC functions ...
2025-03-15arm64: insn: Add load-acquire and store-release instructionsPeilin Ye
Add load-acquire ("load_acq", LDAR{,B,H}) and store-release ("store_rel", STLR{,B,H}) instructions. Breakdown of encoding: size L (Rs) o0 (Rt2) Rn Rt mask (0x3fdffc00): 00 111111 1 1 0 11111 1 11111 00000 00000 value, load_acq (0x08dffc00): 00 001000 1 1 0 11111 1 11111 00000 00000 value, store_rel (0x089ffc00): 00 001000 1 0 0 11111 1 11111 00000 00000 As suggested by Xu [1], include all Should-Be-One (SBO) bits ("Rs" and "Rt2" fields) in the "mask" and "value" numbers. It is worth noting that we are adding the "no offset" variant of STLR instead of the "pre-index" variant, which has a different encoding. Reference: Arm Architecture Reference Manual (ARM DDI 0487K.a, ID032224), * C6.2.161 LDAR * C6.2.353 STLR [1] https://lore.kernel.org/bpf/4e6641ce-3f1e-4251-8daf-4dd4b77d08c4@huaweicloud.com/ Acked-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/ba92057b7502ce4c9c9b03b7d637abe5e178134e.1741049567.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-07arm64: lib: Use MOPS for usercopy routinesKristina Martšenko
Similarly to what was done with the memcpy() routines, make copy_to_user(), copy_from_user() and clear_user() also use the Armv8.8 FEAT_MOPS instructions. Both MOPS implementation options (A and B) are supported, including asymmetric systems. The exception fixup code fixes up the registers according to the option used. In case of a fault the routines return precisely how much was not copied (as required by the comment in include/linux/uaccess.h), as unprivileged versions of CPY/SET are guaranteed not to have written past the addresses reported in the GPRs. The MOPS instructions could possibly be inlined into callers (and patched to branch to the generic implementation if not detected; similarly to what x86 does), but as a first step this patch just uses them in the out-of-line routines. Signed-off-by: Kristina Martšenko <kristina.martsenko@arm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20250228170006.390100-4-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-02-09lib/crc-t10dif: remove crc_t10dif_is_optimized()Eric Biggers
With the "crct10dif" algorithm having been removed from the crypto API, crc_t10dif_is_optimized() is no longer used. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208175647.12333-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08lib/crc32: remove "_le" from crc32c base and arch functionsEric Biggers
Following the standardization on crc32c() as the lib entry point for the Castagnoli CRC32 instead of the previous mix of crc32c(), crc32c_le(), and __crc32c_le(), make the same change to the underlying base and arch functions that implement it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08lib/crc32: don't bother with pure and const function attributesEric Biggers
Drop the use of __pure and __attribute_const__ from the CRC32 library functions that had them. Both of these are unusual optimizations that don't help properly written code. They seem more likely to cause problems than have any real benefit. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01arm64/crc-t10dif: expose CRC-T10DIF function through libEric Biggers
Move the arm64 CRC-T10DIF assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the crypto API. It remains usable via the crypto API too via the shash algorithms that use the library interface. Thus all the arch-specific "shash" code becomes unnecessary and is removed. Note: to see the diff from arch/arm64/crypto/crct10dif-ce-glue.c to arch/arm64/lib/crc-t10dif-glue.c, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20241202012056.209768-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01lib/crc32: expose whether the lib is really optimized at runtimeEric Biggers
Make the CRC32 library export a function crc32_optimizations() which returns flags that indicate which CRC32 functions are actually executing optimized code at runtime. This will be used to determine whether the crc32[c]-$arch shash algorithms should be registered in the crypto API. btrfs could also start using these flags instead of the hack that it currently uses where it parses the crypto_shash_driver_name. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20241202010844.144356-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01lib/crc32: improve support for arch-specific overridesEric Biggers
Currently the CRC32 library functions are defined as weak symbols, and the arm64 and riscv architectures override them. This method of arch-specific overrides has the limitation that it only works when both the base and arch code is built-in. Also, it makes the arch-specific code be silently not used if it is accidentally built with lib-y instead of obj-y; unfortunately the RISC-V code does this. This commit reorganizes the code to have explicit *_arch() functions that are called when they are enabled, similar to how some of the crypto library code works (e.g. chacha_crypt() calls chacha_crypt_arch()). Make the existing kconfig choice for the CRC32 implementation also control whether the arch-optimized implementation (if one is available) is enabled or not. Make it enabled by default if CRC32 is also enabled. The result is that arch-optimized CRC32 library functions will be included automatically when appropriate, but it is now possible to disable them. They can also now be built as a loadable module if the CRC32 library functions happen to be used only by loadable modules, in which case the arch and base CRC32 modules will be automatically loaded via direct symbol dependency when appropriate. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20241202010844.144356-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01lib/crc32: drop leading underscores from __crc32c_le_baseEric Biggers
Remove the leading underscores from __crc32c_le_base(). This is in preparation for adding crc32c_le_arch() and eventually renaming __crc32c_le() to crc32c_le(). Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20241202010844.144356-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-11-14Merge branch 'for-next/mops' into for-next/coreCatalin Marinas
* for-next/mops: : More FEAT_MOPS (memcpy instructions) uses - in-kernel routines arm64: mops: Document requirements for hypervisors arm64: lib: Use MOPS for copy_page() and clear_page() arm64: lib: Use MOPS for memcpy() routines arm64: mops: Document booting requirement for HCR_EL2.MCE2 arm64: mops: Handle MOPS exceptions from EL1 arm64: probes: Disable kprobes/uprobes on MOPS instructions # Conflicts: # arch/arm64/kernel/entry-common.c
2024-10-22arm64/crc32: Implement 4-way interleave using PMULLArd Biesheuvel
Now that kernel mode NEON no longer disables preemption, using FP/SIMD in library code which is not obviously part of the crypto subsystem is no longer problematic, as it will no longer incur unexpected latencies. So accelerate the CRC-32 library code on arm64 to use a 4-way interleave, using PMULL instructions to implement the folding. On Apple M2, this results in a speedup of 2 - 2.8x when using input sizes of 1k - 8k. For smaller sizes, the overhead of preserving and restoring the FP/SIMD register file may not be worth it, so 1k is used as a threshold for choosing this code path. The coefficient tables were generated using code provided by Eric. [0] [0] https://github.com/ebiggers/libdeflate/blob/master/scripts/gen_crc32_multipliers.c Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20241018075347.2821102-8-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-22arm64/crc32: Reorganize bit/byte ordering macrosArd Biesheuvel
In preparation for a new user, reorganize the bit/byte ordering macros that are used to parameterize the crc32 template code and instantiate CRC-32, CRC-32c and 'big endian' CRC-32. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20241018075347.2821102-7-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-22arm64/lib: Handle CRC-32 alternative in C codeArd Biesheuvel
In preparation for adding another code path for performing CRC-32, move the alternative patching for ARM64_HAS_CRC32 into C code. The logic for deciding whether to use this new code path will be implemented in C too. Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20241018075347.2821102-6-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: lib: Use MOPS for copy_page() and clear_page()Kristina Martsenko
Similarly to what was done to the memcpy() routines, make copy_page() and clear_page() also use the Armv8.8 FEAT_MOPS instructions. Note: For copy_page() this uses the CPY* instructions instead of CPYF* as CPYF* doesn't allow src and dst to be equal. It's not clear if copy_page() needs to allow equal src and dst but it has worked so far with the current implementation and there is no documentation forbidding it. Note, the unoptimized version of copy_page() in assembler.h is left as it is. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20240930161051.3777828-6-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: lib: Use MOPS for memcpy() routinesKristina Martsenko
Make memcpy(), memmove() and memset() use the Armv8.8 FEAT_MOPS instructions when implemented on the CPU. The CPY*/SET* instructions copy or set a block of memory of arbitrary size and alignment. They can be interrupted by the CPU and the copying resumed later. Their performance is expected to be close to the best generic copy/set sequence of loads/stores for a given CPU. Using them in the kernel's copy/set routines therefore avoids the need to periodically rewrite the routines to optimize for new microarchitectures. It could also lead to a performance improvement for some CPUs and systems. With this change the kernel will always use the instructions if they are implemented on the CPU (and have not been disabled by the arm64.nomops command line parameter). When not implemented the usual routines will be used (patched via alternatives). Note, we need to patch B/NOP instead of the whole sequence to avoid executing a partially patched sequence in case the compiler generates a mem*() call inside the alternatives patching code. Note that MOPS instructions have relaxed behavior on Device memory, but it is expected that these routines are not generally used on MMIO. Note: For memcpy(), this uses the CPY* instructions instead of CPYF*, as CPY* allows overlaps between the source and destination buffers, and despite contradicting the C standard, compilers require that memcpy() work on exactly overlapping source and destination: https://gcc.gnu.org/onlinedocs/gcc/Standards.html#C-Language https://reviews.llvm.org/D86993 Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20240930161051.3777828-5-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-05-19arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGSSamuel Holland
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Link: https://lkml.kernel.org/r/20240329072441.591471-6-samuel.holland@sifive.com Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: WANG Xuerui <git@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-12arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrsPuranjay Mohan
Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use them directly. They will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs. Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu access using tpidr_el1"), the per-cpu offset for the CPU is stored in the tpidr_el1/2 register of that CPU. To support this BPF instruction in the ARM64 JIT, the following ARM64 instructions are emitted: mov dst, src // Move src to dst, if src != dst mrs tmp, tpidr_el1/2 // Move per-cpu offset of the current cpu in tmp. add dst, dst, tmp // Add the per cpu offset to the dst. To measure the performance improvement provided by this change, the benchmark in [1] was used: Before: glob-arr-inc : 23.597 ± 0.012M/s arr-inc : 23.173 ± 0.019M/s hash-inc : 12.186 ± 0.028M/s After: glob-arr-inc : 23.819 ± 0.034M/s arr-inc : 23.285 ± 0.017M/s hash-inc : 12.419 ± 0.011M/s [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-05arm64: Get rid of ARM64_HAS_NO_HW_PREFETCHMarc Zyngier
Back in 2016, it was argued that implementations lacking a HW prefetcher could be helped by sprinkling a number of PRFM instructions in strategic locations. In 2023, the one platform that presumably needed this hack is no longer in active use (let alone maintained), and an quick experiment shows dropping this hack only leads to a 0.4% drop on a full kernel compilation (tested on a MT30-GS0 48 CPU system). Given that this is pretty much in the noise department and that it may give odd ideas to other implementers, drop the hack for good. Suggested-by: Will Deacon <will@kernel.org> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20231122133754.1240687-1-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXTMark Rutland
In __delay() we use cpus_have_const_cap() to check for ARM64_HAS_WFXT, but this is not necessary and alternative_has_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in __delay() is an optimization to use WFIT and WFET in preference to busy-polling the counter and/or using regular WFE and relying upon the architected timer event stream. It is not necessary to apply this optimization in the window between detecting the ARM64_HAS_WFXT cpucap and patching alternatives. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-08Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The main one is a fix for a broken strscpy() conversion that landed in the merge window and broke early parsing of the kernel command line. - Fix an incorrect mask in the CXL PMU driver - Fix a regression in early parsing of the kernel command line - Fix an IP checksum OoB access reported by syzbot" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: csum: Fix OoB access in IP checksum code for negative lengths arm64/sysreg: Fix broken strncpy() -> strscpy() conversion perf: CXL: fix mismatched number of counters mask
2023-09-07arm64: csum: Fix OoB access in IP checksum code for negative lengthsWill Deacon
Although commit c2c24edb1d9c ("arm64: csum: Fix pathological zero-length calls") added an early return for zero-length input, syzkaller has popped up with an example of a _negative_ length which causes an undefined shift and an out-of-bounds read: | BUG: KASAN: slab-out-of-bounds in do_csum+0x44/0x254 arch/arm64/lib/csum.c:39 | Read of size 4294966928 at addr ffff0000d7ac0170 by task syz-executor412/5975 | | CPU: 0 PID: 5975 Comm: syz-executor412 Not tainted 6.4.0-rc4-syzkaller-g908f31f2a05b #0 | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 | Call trace: | dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233 | show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240 | __dump_stack lib/dump_stack.c:88 [inline] | dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106 | print_address_description mm/kasan/report.c:351 [inline] | print_report+0x174/0x514 mm/kasan/report.c:462 | kasan_report+0xd4/0x130 mm/kasan/report.c:572 | kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:187 | __kasan_check_read+0x20/0x30 mm/kasan/shadow.c:31 | do_csum+0x44/0x254 arch/arm64/lib/csum.c:39 | csum_partial+0x30/0x58 lib/checksum.c:128 | gso_make_checksum include/linux/skbuff.h:4928 [inline] | __udp_gso_segment+0xaf4/0x1bc4 net/ipv4/udp_offload.c:332 | udp6_ufo_fragment+0x540/0xca0 net/ipv6/udp_offload.c:47 | ipv6_gso_segment+0x5cc/0x1760 net/ipv6/ip6_offload.c:119 | skb_mac_gso_segment+0x2b4/0x5b0 net/core/gro.c:141 | __skb_gso_segment+0x250/0x3d0 net/core/dev.c:3401 | skb_gso_segment include/linux/netdevice.h:4859 [inline] | validate_xmit_skb+0x364/0xdbc net/core/dev.c:3659 | validate_xmit_skb_list+0x94/0x130 net/core/dev.c:3709 | sch_direct_xmit+0xe8/0x548 net/sched/sch_generic.c:327 | __dev_xmit_skb net/core/dev.c:3805 [inline] | __dev_queue_xmit+0x147c/0x3318 net/core/dev.c:4210 | dev_queue_xmit include/linux/netdevice.h:3085 [inline] | packet_xmit+0x6c/0x318 net/packet/af_packet.c:276 | packet_snd net/packet/af_packet.c:3081 [inline] | packet_sendmsg+0x376c/0x4c98 net/packet/af_packet.c:3113 | sock_sendmsg_nosec net/socket.c:724 [inline] | sock_sendmsg net/socket.c:747 [inline] | __sys_sendto+0x3b4/0x538 net/socket.c:2144 Extend the early return to reject negative lengths as well, aligning our implementation with the generic code in lib/checksum.c Cc: Robin Murphy <robin.murphy@arm.com> Fixes: 5777eaed566a ("arm64: Implement optimised checksum routine") Reported-by: syzbot+4a9f9820bd8d302e22f7@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000e0e94c0603f8d213@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-08-18arm64: insn: Add encoders for LDRSB/LDRSH/LDRSWXu Kuohai
To support BPF sign-extend load instructions, add encoders for LDRSB/LDRSH/LDRSW. LDRSB/LDRSH/LDRSW (immediate) is encoded as follows: 3 2 2 2 2 1 0 0 0 7 6 4 2 0 5 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | sz|1 1 1|0|0 1|opc| imm12 | Rn | Rt | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ LDRSB/LDRSH/LDRSW (register) is encoded as follows: 3 2 2 2 2 2 1 1 1 1 0 0 0 7 6 4 2 1 6 3 2 0 5 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | sz|1 1 1|0|0 0|opc|1| Rm | opt |S|1 0| Rn | Rt | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ where: - sz indicates whether 8-bit, 16-bit or 32-bit data is to be loaded - opc opc[1] (bit 23) is always 1 and opc[0] == 1 indicates regsize is 32-bit. Since BPF signed load instructions always exend the sign bit to bit 63 regardless of whether it loads an 8-bit, 16-bit or 32-bit data. So only 64-bit register size is required. That is, it's sufficient to set field opc fixed to 0x2. - opt Indicates whether to sign extend the offset register Rm and the effective bits of Rm. We set opt to 0x7 (SXTX) since we'll use Rm as a sgined 64-bit value in BPF. - S Optional only when opt field is 0x3 (LSL) In short, the above fields are encoded to the values listed below. sz opc opt S LDRSB (immediate) 0x0 0x2 na na LDRSH (immediate) 0x1 0x2 na na LDRSW (immediate) 0x2 0x2 na na LDRSB (register) 0x0 0x2 0x7 0 LDRSH (register) 0x1 0x2 0x7 0 LDRSW (register) 0x2 0x2 0x7 0 Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-2-xukuohai@huaweicloud.com
2023-05-25arm64: xor-neon: mark xor_arm64_neon_*() staticArnd Bergmann
The only references to these functions are in the same file, and there is no prototype, which causes a harmless warning: arch/arm64/lib/xor-neon.c:13:6: error: no previous prototype for 'xor_arm64_neon_2' [-Werror=missing-prototypes] arch/arm64/lib/xor-neon.c:40:6: error: no previous prototype for 'xor_arm64_neon_3' [-Werror=missing-prototypes] arch/arm64/lib/xor-neon.c:76:6: error: no previous prototype for 'xor_arm64_neon_4' [-Werror=missing-prototypes] arch/arm64/lib/xor-neon.c:121:6: error: no previous prototype for 'xor_arm64_neon_5' [-Werror=missing-prototypes] Fixes: cc9f8349cb33 ("arm64: crypto: add NEON accelerated XOR implementation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20230516160642.523862-2-arnd@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-03-27arm: uaccess: Remove memcpy_page_flushcache()Ira Weiny
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray callbacks") removed the calls to memcpy_page_flushcache(). Remove the unnecessary memcpy_page_flushcache() call. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Dan Williams" <dan.j.williams@intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20221230-kmap-x86-v1-3-15f1ecccab50@intel.com Signed-off-by: Will Deacon <will@kernel.org>
2022-12-06Merge branch 'for-next/sysregs' into for-next/coreWill Deacon
* for-next/sysregs: (39 commits) arm64/sysreg: Remove duplicate definitions from asm/sysreg.h arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation arm64/sysreg: Convert MVFR2_EL1 to automatic generation arm64/sysreg: Convert MVFR1_EL1 to automatic generation arm64/sysreg: Convert MVFR0_EL1 to automatic generation arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR6_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR5_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR4_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR3_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR2_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR1_EL1 to automatic generation arm64/sysreg: Convert ID_ISAR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR4_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR3_EL1 to automatic generation ...
2022-12-01arm64/sysreg: Remove duplicate definitions from asm/sysreg.hWill Deacon
With the new-fangled generation of asm/sysreg-defs.h, some definitions have ended up being duplicated between the two files. Remove these duplicate definitions, and consolidate the naming for GMID_EL1_BS_WIDTH. Signed-off-by: Will Deacon <will@kernel.org>
2022-11-15arm64: insn: always inline hint generationMark Rutland
All users of aarch64_insn_gen_hint() (e.g. aarch64_insn_gen_nop()) pass a constant argument and generate a constant value. Some of those users are noinstr code (e.g. for alternatives patching). For noinstr code it is necessary to either inline these functions or to ensure the out-of-line versions are noinstr. Since in all cases these are generating a constant, make them __always_inline. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20221114135928.3000571-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-15arm64: insn: simplify insn group identificationMark Rutland
The only code which needs to check for an entire instruction group is the aarch64_insn_is_steppable() helper function used by kprobes, which must not be instrumented, and only needs to check for the "Branch, exception generation and system instructions" class. Currently we have an out-of-line helper in insn.c which must be marked as __kprobes, which indexes a table with some bits extracted from the instruction. In aarch64_insn_is_steppable() we then need to compare the result with an expected enum value. It would be simpler to have a predicate for this, as with the other aarch64_insn_is_*() helpers, which would be always inlined to prevent inadvertent instrumentation, and would permit better code generation. This patch adds a predicate function for this instruction group using the existing __AARCH64_INSN_FUNCS() helpers, and removes the existing out-of-line helper. As the only class we currently care about is the branch+exception+sys class, I have only added helpers for this, and left the other classes unimplemented for now. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20221114135928.3000571-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-15arm64: insn: always inline predicatesMark Rutland
We have a number of aarch64_insn_*() predicates which are used in code which is not instrumentation safe (e.g. alternatives patching, kprobes). Some of those are marked with __kprobes, but most are not, and are implemented out-of-line in insn.c. This patch moves the predicates to insn.h and marks them with __always_inline. This is ensures that they will respect the instrumentation requirements of their caller which they will be inlined into. At the same time, I've formatted each of the functions consistently as a list, to make them easier to read and update in future. Other than preventing unwanted instrumentation, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20221114135928.3000571-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-15arm64: insn: remove aarch64_insn_gen_prefetch()Mark Rutland
There are no users of aarch64_insn_gen_prefetch(), and which encodes a PRFM (immediate) with a hard-coded offset of 0. Remove it for now; we can always restore it with tests if we need it in future. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20221114135928.3000571-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-08-03Merge tag 'net-next-6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Paolo Abeni: "Core: - Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one - Replace rwlocks with RCU for better fairness in ping, raw sockets and IP multicast router. - Network-side support for IO uring zero-copy send. - A few skb drop reason improvements, including codegen the source file with string mapping instead of using macro magic. - Rename reference tracking helpers to a more consistent netdev_* schema. - Adapt u64_stats_t type to address load/store tearing issues. - Refine debug helper usage to reduce the log noise caused by bots. BPF: - Improve socket map performance, avoiding skb cloning on read operation. - Add support for 64 bits enum, to match types exposed by kernel. - Introduce support for sleepable uprobes program. - Introduce support for enum textual representation in libbpf. - New helpers to implement synproxy with eBPF/XDP. - Improve loop performances, inlining indirect calls when possible. - Removed all the deprecated libbpf APIs. - Implement new eBPF-based LSM flavor. - Add type match support, which allow accurate queries to the eBPF used types. - A few TCP congetsion control framework usability improvements. - Add new infrastructure to manipulate CT entries via eBPF programs. - Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function. Protocols: - Introduce per network namespace lookup tables for unix sockets, increasing scalability and reducing contention. - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support. - Add support to forciby close TIME_WAIT TCP sockets via user-space tools. - Significant performance improvement for the TLS 1.3 receive path, both for zero-copy and not-zero-copy. - Support for changing the initial MTPCP subflow priority/backup status - Introduce virtually contingus buffers for sockets over RDMA, to cope better with memory pressure. - Extend CAN ethtool support with timestamping capabilities - Refactor CAN build infrastructure to allow building only the needed features. Driver API: - Remove devlink mutex to allow parallel commands on multiple links. - Add support for pause stats in distributed switch. - Implement devlink helpers to query and flash line cards. - New helper for phy mode to register conversion. New hardware / drivers: - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro. - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch. - Ethernet DSA driver for the Microchip LAN937x switch. - Ethernet PHY driver for the Aquantia AQR113C EPHY. - CAN driver for the OBD-II ELM327 interface. - CAN driver for RZ/N1 SJA1000 CAN controller. - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device. Drivers: - Intel Ethernet NICs: - i40e: add support for vlan pruning - i40e: add support for XDP framented packets - ice: improved vlan offload support - ice: add support for PPPoE offload - Mellanox Ethernet (mlx5) - refactor packet steering offload for performance and scalability - extend support for TC offload - refactor devlink code to clean-up the locking schema - support stacked vlans for bridge offloads - use TLS objects pool to improve connection rate - Netronome Ethernet NICs (nfp): - extend support for IPv6 fields mangling offload - add support for vepa mode in HW bridge - better support for virtio data path acceleration (VDPA) - enable TSO by default - Microsoft vNIC driver (mana) - add support for XDP redirect - Others Ethernet drivers: - bonding: add per-port priority support - microchip lan743x: extend phy support - Fungible funeth: support UDP segmentation offload and XDP xmit - Solarflare EF100: add support for virtual function representors - MediaTek SoC: add XDP support - Mellanox Ethernet/IB switch (mlxsw): - dropped support for unreleased H/W (XM router). - improved stats accuracy - unified bridge model coversion improving scalability (parts 1-6) - support for PTP in Spectrum-2 asics - Broadcom PHYs - add PTP support for BCM54210E - add support for the BCM53128 internal PHY - Marvell Ethernet switches (prestera): - implement support for multicast forwarding offload - Embedded Ethernet switches: - refactor OcteonTx MAC filter for better scalability - improve TC H/W offload for the Felix driver - refactor the Microchip ksz8 and ksz9477 drivers to share the probe code (parts 1, 2), add support for phylink mac configuration - Other WiFi: - Microchip wilc1000: diable WEP support and enable WPA3 - Atheros ath10k: encapsulation offload support Old code removal: - Neterion vxge ethernet driver: this is untouched since more than 10 years" * tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits) doc: sfp-phylink: Fix a broken reference wireguard: selftests: support UML wireguard: allowedips: don't corrupt stack when detecting overflow wireguard: selftests: update config fragments wireguard: ratelimiter: use hrtimer in selftest net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ net: usb: ax88179_178a: Bind only to vendor-specific interface selftests: net: fix IOAM test skip return code net: usb: make USB_RTL8153_ECM non user configurable net: marvell: prestera: remove reduntant code octeontx2-pf: Reduce minimum mtu size to 60 net: devlink: Fix missing mutex_unlock() call net/tls: Remove redundant workqueue flush before destroy net: txgbe: Fix an error handling path in txgbe_probe() net: dsa: Fix spelling mistakes and cleanup code Documentation: devlink: add add devlink-selftests to the table of contents dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock net: ionic: fix error check for vlan flags in ionic_set_nic_features() net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr() nfp: flower: add support for tunnel offload without key ID ...
2022-07-11arm64: Add LDR (literal) instructionXu Kuohai
Add LDR (literal) instruction to load data from address relative to PC. This instruction will be used to implement long jump from bpf prog to bpf trampoline in the follow-up patch. The instruction encoding: 3 2 2 2 0 0 0 7 6 4 5 0 +-----+-------+---+-----+-------------------------------------+--------+ | 0 x | 0 1 1 | 0 | 0 0 | imm19 | Rt | +-----+-------+---+-----+-------------------------------------+--------+ for 32-bit, variant x == 0; for 64-bit, x == 1. branch_imm_common() is used to check the distance between pc and target address, since it's reused by this patch and LDR (literal) is not a branch instruction, rename it to label_imm_common(). Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/bpf/20220711150823.2128542-3-xukuohai@huawei.com
2022-07-05arm64/mte: Standardise GMID field name definitionsMark Brown
Usually our defines for bitfields in system registers do not include a SYS_ prefix but those for GMID do. In preparation for automatic generation of defines remove that prefix. No functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220704170302.2609529-9-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "S390: - ultravisor communication device driver - fix TEID on terminating storage key ops RISC-V: - Added Sv57x4 support for G-stage page table - Added range based local HFENCE functions - Added remote HFENCE functions based on VCPU requests - Added ISA extension registers in ONE_REG interface - Updated KVM RISC-V maintainers entry to cover selftests support ARM: - Add support for the ARMv8.6 WFxT extension - Guard pages for the EL2 stacks - Trap and emulate AArch32 ID registers to hide unsupported features - Ability to select and save/restore the set of hypercalls exposed to the guest - Support for PSCI-initiated suspend in collaboration with userspace - GICv3 register-based LPI invalidation support - Move host PMU event merging into the vcpu data structure - GICv3 ITS save/restore fixes - The usual set of small-scale cleanups and fixes x86: - New ioctls to get/set TSC frequency for a whole VM - Allow userspace to opt out of hypercall patching - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr AMD SEV improvements: - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES - V_TSC_AUX support Nested virtualization improvements for AMD: - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE, nested vGIF) - Allow AVIC to co-exist with a nested guest running - Fixes for LBR virtualizations when a nested guest is running, and nested LBR virtualization support - PAUSE filtering for nested hypervisors Guest support: - Decoupling of vcpu_is_preempted from PV spinlocks" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits) KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest KVM: selftests: x86: Sync the new name of the test case to .gitignore Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND x86, kvm: use correct GFP flags for preemption disabled KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer x86/kvm: Alloc dummy async #PF token outside of raw spinlock KVM: x86: avoid calling x86 emulator without a decoded instruction KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave) s390/uv_uapi: depend on CONFIG_S390 KVM: selftests: x86: Fix test failure on arch lbr capable platforms KVM: LAPIC: Trace LAPIC timer expiration on every vmentry KVM: s390: selftest: Test suppression indication on key prot exception KVM: s390: Don't indicate suppression on dirtying, failing memop selftests: drivers/s390x: Add uvdevice tests drivers/s390/char: Add Ultravisor io device MAINTAINERS: Update KVM RISC-V entry to cover selftests support RISC-V: KVM: Introduce ISA extension register RISC-V: KVM: Cleanup stale TLB entries when host CPU changes RISC-V: KVM: Add remote HFENCE functions based on VCPU requests ...
2022-05-25Merge tag 'net-next-5.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core ---- - Support TCPv6 segmentation offload with super-segments larger than 64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP). - Generalize skb freeing deferral to per-cpu lists, instead of per-socket lists. - Add a netdev statistic for packets dropped due to L2 address mismatch (rx_otherhost_dropped). - Continue work annotating skb drop reasons. - Accept alternative netdev names (ALT_IFNAME) in more netlink requests. - Add VLAN support for AF_PACKET SOCK_RAW GSO. - Allow receiving skb mark from the socket as a cmsg. - Enable memcg accounting for veth queues, sysctl tables and IPv6. BPF --- - Add libbpf support for User Statically-Defined Tracing (USDTs). - Speed up symbol resolution for kprobes multi-link attachments. - Support storing typed pointers to referenced and unreferenced objects in BPF maps. - Add support for BPF link iterator. - Introduce access to remote CPU map elements in BPF per-cpu map. - Allow middle-of-the-road settings for the kernel.unprivileged_bpf_disabled sysctl. - Implement basic types of dynamic pointers e.g. to allow for dynamically sized ringbuf reservations without extra memory copies. Protocols --------- - Retire port only listening_hash table, add a second bind table hashed by port and address. Avoid linear list walk when binding to very popular ports (e.g. 443). - Add bridge FDB bulk flush filtering support allowing user space to remove all FDB entries matching a condition. - Introduce accept_unsolicited_na sysctl for IPv6 to implement router-side changes for RFC9131. - Support for MPTCP path manager in user space. - Add MPTCP support for fallback to regular TCP for connections that have never connected additional subflows or transmitted out-of-sequence data (partial support for RFC8684 fallback). - Avoid races in MPTCP-level window tracking, stabilize and improve throughput. - Support lockless operation of GRE tunnels with seq numbers enabled. - WiFi support for host based BSS color collision detection. - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets. - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2). - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile). - Allow matching on the number of VLAN tags via tc-flower. - Add tracepoint for tcp_set_ca_state(). Driver API ---------- - Improve error reporting from classifier and action offload. - Add support for listing line cards in switches (devlink). - Add helpers for reporting page pool statistics with ethtool -S. - Add support for reading clock cycles when using PTP virtual clocks, instead of having the driver convert to time before reporting. This makes it possible to report time from different vclocks. - Support configuring low-latency Tx descriptor push via ethtool. - Separate Clause 22 and Clause 45 MDIO accesses more explicitly. New hardware / drivers ---------------------- - Ethernet: - Marvell's Octeon NIC PCI Endpoint support (octeon_ep) - Sunplus SP7021 SoC (sp7021_emac) - Add support for Renesas RZ/V2M (in ravb) - Add support for MediaTek mt7986 switches (in mtk_eth_soc) - Ethernet PHYs: - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting) - TI DP83TD510 PHY - Microchip LAN8742/LAN88xx PHYs - WiFi: - Driver for pureLiFi X, XL, XC devices (plfxlc) - Driver for Silicon Labs devices (wfx) - Support for WCN6750 (in ath11k) - Support Realtek 8852ce devices (in rtw89) - Mobile: - MediaTek T700 modems (Intel 5G 5000 M.2 cards) - CAN: - ctucanfd: add support for CTU CAN FD open-source IP core from Czech Technical University in Prague Drivers ------- - Delete a number of old drivers still using virt_to_bus(). - Ethernet NICs: - intel: support TSO on tunnels MPLS - broadcom: support multi-buffer XDP - nfp: support VF rate limiting - sfc: use hardware tx timestamps for more than PTP - mlx5: multi-port eswitch support - hyper-v: add support for XDP_REDIRECT - atlantic: XDP support (including multi-buffer) - macb: improve real-time perf by deferring Tx processing to NAPI - High-speed Ethernet switches: - mlxsw: implement basic line card information querying - prestera: add support for traffic policing on ingress and egress - Embedded Ethernet switches: - lan966x: add support for packet DMA (FDMA) - lan966x: add support for PTP programmable pins - ti: cpsw_new: enable bc/mc storm prevention - Qualcomm 802.11ax WiFi (ath11k): - Wake-on-WLAN support for QCA6390 and WCN6855 - device recovery (firmware restart) support - support setting Specific Absorption Rate (SAR) for WCN6855 - read country code from SMBIOS for WCN6855/QCA6390 - enable keep-alive during WoWLAN suspend - implement remain-on-channel support - MediaTek WiFi (mt76): - support Wireless Ethernet Dispatch offloading packet movement between the Ethernet switch and WiFi interfaces - non-standard VHT MCS10-11 support - mt7921 AP mode support - mt7921 IPv6 NS offload support - Ethernet PHYs: - micrel: ksz9031/ksz9131: cabletest support - lan87xx: SQI support for T1 PHYs - lan937x: add interrupt support for link detection" * tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits) ptp: ocp: Add firmware header checks ptp: ocp: fix PPS source selector debugfs reporting ptp: ocp: add .init function for sma_op vector ptp: ocp: vectorize the sma accessor functions ptp: ocp: constify selectors ptp: ocp: parameterize input/output sma selectors ptp: ocp: revise firmware display ptp: ocp: add Celestica timecard PCI ids ptp: ocp: Remove #ifdefs around PCI IDs ptp: ocp: 32-bit fixups for pci start address Revert "net/smc: fix listen processing for SMC-Rv2" ath6kl: Use cc-disable-warning to disable -Wdangling-pointer selftests/bpf: Dynptr tests bpf: Add dynptr data slices bpf: Add bpf_dynptr_read and bpf_dynptr_write bpf: Dynptr support for ring buffers bpf: Add bpf_dynptr_from_mem for local dynptrs bpf: Add verifier support for dynptrs bpf: Suppress 'passing zero to PTR_ERR' warning bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack ...
2022-05-16arm64: mte: Clean up user tag accessorsRobin Murphy
Invoking user_ldst to explicitly add a post-increment of 0 is silly. Just use a normal USER() annotation and save the redundant instruction. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tong Tiangen <tongtiangen@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220420030418.3189040-6-tongtiangen@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-20arm64: Use WFxT for __delay() when possibleMarc Zyngier
Marginally optimise __delay() by using a WFIT/WFET sequence. It probably is a win if no interrupt fires during the delay. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220419182755.601427-11-maz@kernel.org
2022-04-01arm64, insn: Add ldr/str with immediate offsetXu Kuohai
This patch introduces ldr/str with immediate offset support to simplify the JIT implementation of BPF LDX/STX instructions on arm64. Although arm64 ldr/str immediate is available in pre-index, post-index and unsigned offset forms, the unsigned offset form is sufficient for BPF, so this patch only adds this type. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220321152852.2334294-2-xukuohai@huawei.com
2022-03-21Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - hwrng core now credits for low-quality RNG devices. Algorithms: - Optimisations for neon aes on arm/arm64. - Add accelerated crc32_be on arm64. - Add ffdheXYZ(dh) templates. - Disallow hmac keys < 112 bits in FIPS mode. - Add AVX assembly implementation for sm3 on x86. Drivers: - Add missing local_bh_disable calls for crypto_engine callback. - Ensure BH is disabled in crypto_engine callback path. - Fix zero length DMA mappings in ccree. - Add synchronization between mailbox accesses in octeontx2. - Add Xilinx SHA3 driver. - Add support for the TDES IP available on sama7g5 SoC in atmel" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (137 commits) crypto: xilinx - Turn SHA into a tristate and allow COMPILE_TEST MAINTAINERS: update HPRE/SEC2/TRNG driver maintainers list crypto: dh - Remove the unused function dh_safe_prime_dh_alg() hwrng: nomadik - Change clk_disable to clk_disable_unprepare crypto: arm64 - cleanup comments crypto: qat - fix initialization of pfvf rts_map_msg structures crypto: qat - fix initialization of pfvf cap_msg structures crypto: qat - remove unneeded assignment crypto: qat - disable registration of algorithms crypto: hisilicon/qm - fix memset during queues clearing crypto: xilinx: prevent probing on non-xilinx hardware crypto: marvell/octeontx - Use swap() instead of open coding it crypto: ccree - Fix use after free in cc_cipher_exit() crypto: ccp - ccp_dmaengine_unregister release dma channels crypto: octeontx2 - fix missing unlock hwrng: cavium - fix NULL but dereferenced coccicheck error crypto: cavium/nitrox - don't cast parameter in bit operations crypto: vmx - add missing dependencies MAINTAINERS: Add maintainer for Xilinx ZynqMP SHA3 driver crypto: xilinx - Add Xilinx SHA3 driver ...
2022-03-14Merge branch 'for-next/strings' into for-next/coreWill Deacon
* for-next/strings: Revert "arm64: Mitigate MTE issues with str{n}cmp()" arm64: lib: Import latest version of Arm Optimized Routines' strncmp arm64: lib: Import latest version of Arm Optimized Routines' strcmp
2022-03-14Merge branch 'for-next/linkage' into for-next/coreWill Deacon
* for-next/linkage: arm64: module: remove (NOLOAD) from linker script linkage: remove SYM_FUNC_{START,END}_ALIAS() x86: clean up symbol aliasing arm64: clean up symbol aliasing linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()
2022-03-14Merge branch 'for-next/insn' into for-next/coreWill Deacon
* for-next/insn: arm64: insn: add encoders for atomic operations arm64: move AARCH64_BREAK_FAULT into insn-def.h arm64: insn: Generate 64 bit mask immediates correctly
2022-03-07Revert "arm64: Mitigate MTE issues with str{n}cmp()"Joey Gouly
This reverts commit 59a68d4138086c015ab8241c3267eec5550fbd44. Now that the str{n}cmp functions have been updated to handle MTE properly, the workaround to use the generic functions is no longer needed. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220301101435.19327-4-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-03-07arm64: lib: Import latest version of Arm Optimized Routines' strncmpJoey Gouly
Import the latest version of the Arm Optimized Routines strncmp function based on the upstream code of string/aarch64/strncmp.S at commit 189dfefe37d5 from: https://github.com/ARM-software/optimized-routines This latest version includes MTE support. Note that for simplicity Arm have chosen to contribute this code to Linux under GPLv2 rather than the original MIT OR Apache-2.0 WITH LLVM-exception license. Arm is the sole copyright holder for this code. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220301101435.19327-3-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-03-07arm64: lib: Import latest version of Arm Optimized Routines' strcmpJoey Gouly
Import the latest version of the Arm Optimized Routines strcmp function based on the upstream code of string/aarch64/strcmp.S at commit 189dfefe37d5 from: https://github.com/ARM-software/optimized-routines This latest version includes MTE support. Note that for simplicity Arm have chosen to contribute this code to Linux under GPLv2 rather than the original MIT OR Apache-2.0 WITH LLVM-exception license. Arm is the sole copyright holder for this code. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220301101435.19327-2-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22arm64: insn: add encoders for atomic operationsHou Tao
It is a preparation patch for eBPF atomic supports under arm64. eBPF needs support atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and atomic[64]_{xchg|cmpxchg}. The ordering semantics of eBPF atomics are the same with the implementations in linux kernel. Add three helpers to support LDCLR/LDEOR/LDSET/SWP, CAS and DMB instructions. STADD/STCLR/STEOR/STSET are simply encoded as aliases for LDADD/LDCLR/LDEOR/LDSET with XZR as the destination register, so no extra helper is added. atomic_fetch_add() and other atomic ops needs support for STLXR instruction, so extend enum aarch64_insn_ldst_type to do that. LDADD/LDEOR/LDSET/SWP and CAS instructions are only available when LSE atomics is enabled, so just return AARCH64_BREAK_FAULT directly in these newly-added helpers if CONFIG_ARM64_LSE_ATOMICS is disabled. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20220217072232.1186625-3-houtao1@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22arm64: clean up symbol aliasingMark Rutland
Now that we have SYM_FUNC_ALIAS() and SYM_FUNC_ALIAS_WEAK(), use those to simplify and more consistently define function aliases across arch/arm64. Aliases are now defined in terms of a canonical function name. For position-independent functions I've made the __pi_<func> name the canonical name, and defined other alises in terms of this. The SYM_FUNC_{START,END}_PI(func) macros obscure the __pi_<func> name, and make this hard to seatch for. The SYM_FUNC_START_WEAK_PI() macro also obscures the fact that the __pi_<func> fymbol is global and the <func> symbol is weak. For clarity, I have removed these macros and used SYM_FUNC_{START,END}() directly with the __pi_<func> name. For example: SYM_FUNC_START_WEAK_PI(func) ... asm insns ... SYM_FUNC_END_PI(func) EXPORT_SYMBOL(func) ... becomes: SYM_FUNC_START(__pi_func) ... asm insns ... SYM_FUNC_END(__pi_func) SYM_FUNC_ALIAS_WEAK(func, __pi_func) EXPORT_SYMBOL(func) For clarity, where there are multiple annotations such as EXPORT_SYMBOL(), I've tried to keep annotations grouped by symbol. For example, where a function has a name and an alias which are both exported, this is organised as: SYM_FUNC_START(func) ... asm insns ... SYM_FUNC_END(func) EXPORT_SYMBOL(func) SYM_FUNC_ALIAS(alias, func) EXPORT_SYMBOL(alias) For consistency with the other string functions, I've defined strrchr as a position-independent function, as it can safely be used as such even though we have no users today. As we no longer use SYM_FUNC_{START,END}_ALIAS(), our local copies are removed. The common versions will be removed by a subsequent patch. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Mark Brown <broonie@kernel.org> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220216162229.1076788-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>