summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2023-08-16um: refactor deprecated strncpy to memcpyJustin Stitt
Use `memcpy` since `console_buf` is not expected to be NUL-terminated and it more accurately describes what is happening with the buffers `console_buf` and `string` as per Kees' analysis [1]. Also mark char buffer as `__nonstring` as per Kees' suggestion [2]. This change now makes it more clear what this code does and that `console_buf` is not expected to be NUL-terminated. Link: https://lore.kernel.org/all/202308081708.D5ADC80F@keescook/ [1] Link: https://github.com/KSPP/linux/issues/90 [2] Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings Cc: linux-hardening@vger.kernel.org Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20230809-arch-um-v3-1-f63e1122d77e@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-16um: vector: refactor deprecated strncpyJustin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on its destination buffer argument which is _not_ the case for `strncpy`! In this case, we are able to drop the now superfluous `... - 1` instances because `strscpy` will automatically truncate the last byte by setting it to a NUL byte if the source size exceeds the destination size or if the source string is not NUL-terminated. I've also opted to remove the seemingly useless char* casts. I'm not sure why they're present at all since (after expanding the `ifr_name` macro) `ifr.ifr_ifrn.ifrn_name` is a char* already. All in all, `strscpy` is a more robust and less ambiguous interface while also letting us remove some `... -1`'s which cleans things up a bit. [1]: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [2]: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20230807-arch-um-drivers-v1-1-10d602c5577a@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-16x86/srso: Explain the untraining sequences a bit moreBorislav Petkov (AMD)
The goal is to eventually have a proper documentation about all this. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814164447.GFZNpZ/64H4lENIe94@fat_crate.local
2023-08-16x86/cpu/kvm: Provide UNTRAIN_RET_VMPeter Zijlstra
Similar to how it doesn't make sense to have UNTRAIN_RET have two untrain calls, it also doesn't make sense for VMEXIT to have an extra IBPB call. This cures VMEXIT doing potentially unret+IBPB or double IBPB. Also, the (SEV) VMEXIT case seems to have been overlooked. Redefine the meaning of the synthetic IBPB flags to: - ENTRY_IBPB -- issue IBPB on entry (was: entry + VMEXIT) - IBPB_ON_VMEXIT -- issue IBPB on VMEXIT And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains the previous behaviour and issues IBPB on entry+VMEXIT. The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT. Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that check IBPB_ON_VMEXIT. All this avoids having the VMEXIT case having to check both ENTRY_IBPB and IBPB_ON_VMEXIT and simplifies the alternatives. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
2023-08-16x86/cpu: Cleanup the untrain messPeter Zijlstra
Since there can only be one active return_thunk, there only needs be one (matching) untrain_ret. It fundamentally doesn't make sense to allow multiple untrain_ret at the same time. Fold all the 3 different untrain methods into a single (temporary) helper stub. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121149.042774962@infradead.org
2023-08-16x86/cpu: Rename srso_(.*)_alias to srso_alias_\1Peter Zijlstra
For a more consistent namespace. [ bp: Fixup names in the doc too. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121148.976236447@infradead.org
2023-08-16x86/cpu: Rename original retbleed methodsPeter Zijlstra
Rename the original retbleed return thunk and untrain_ret to retbleed_return_thunk() and retbleed_untrain_ret(). No functional changes. Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121148.909378169@infradead.org
2023-08-16x86/cpu: Clean up SRSO return thunk messPeter Zijlstra
Use the existing configurable return thunk. There is absolute no justification for having created this __x86_return_thunk alternative. To clarify, the whole thing looks like: Zen3/4 does: srso_alias_untrain_ret: nop2 lfence jmp srso_alias_return_thunk int3 srso_alias_safe_ret: // aliasses srso_alias_untrain_ret just so add $8, %rsp ret int3 srso_alias_return_thunk: call srso_alias_safe_ret ud2 While Zen1/2 does: srso_untrain_ret: movabs $foo, %rax lfence call srso_safe_ret (jmp srso_return_thunk ?) int3 srso_safe_ret: // embedded in movabs instruction add $8,%rsp ret int3 srso_return_thunk: call srso_safe_ret ud2 While retbleed does: zen_untrain_ret: test $0xcc, %bl lfence jmp zen_return_thunk int3 zen_return_thunk: // embedded in the test instruction ret int3 Where Zen1/2 flush the BTB entry using the instruction decoder trick (test,movabs) Zen3/4 use BTB aliasing. SRSO adds a return sequence (srso_safe_ret()) which forces the function return instruction to speculate into a trap (UD2). This RET will then mispredict and execution will continue at the return site read from the top of the stack. Pick one of three options at boot (evey function can only ever return once). [ bp: Fixup commit message uarch details and add them in a comment in the code too. Add a comment about the srso_select_mitigation() dependency on retbleed_select_mitigation(). Add moar ifdeffery for 32-bit builds. Add a dummy srso_untrain_ret_alias() definition for 32-bit alternatives needing the symbol. ] Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org
2023-08-16riscv: dts: change TH1520 files to dual licenseDrew Fustini
Modify the SPDX-License-Identifier for dual license of GPL-2.0 OR MIT. Signed-off-by: Drew Fustini <dfustini@baylibre.com> Acked-by: Jisheng Zhang <jszhang@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-08-16riscv: dts: thead: add BeagleV Ahead board device treeDrew Fustini
The BeagleV Ahead single board computer uses the T-Head TH1520 SoC. Add a minimal device tree to support basic uart/gpio/dmac drivers so that a user can boot to a basic shell. Link: https://beagleboard.org/beaglev-ahead Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Drew Fustini <dfustini@baylibre.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-08-16riscv: kdump: Implement crashkernel=X,[high,low]Chen Jiahao
On riscv, the current crash kernel allocation logic is trying to allocate within 32bit addressible memory region by default, if failed, try to allocate without 4G restriction. In need of saving DMA zone memory while allocating a relatively large crash kernel region, allocating the reserved memory top down in high memory, without overlapping the DMA zone, is a mature solution. Here introduce the parameter option crashkernel=X,[high,low]. One can reserve the crash kernel from high memory above DMA zone range by explicitly passing "crashkernel=X,high"; or reserve a memory range below 4G with "crashkernel=X,low". Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20230726175000.2536220-2-chenjiahao16@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16arm64/sysreg: refactor deprecated strncpyJustin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. Which seems to be the case here due to the forceful setting of `buf`'s tail to 0. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on its destination buffer argument which is _not_ the case for `strncpy`! In this case, we can simplify the logic and also check for any silent truncation by using `strscpy`'s return value. This should have no functional change and yet uses a more robust and less ambiguous interface whilst reducing code complexity. Link: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings[1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Suggested-by: Kees Cook <keescook@chromium.org> Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20230811-strncpy-arch-arm64-v2-1-ba84eabffadb@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-08-16riscv: kprobes: simulate c.beqz and c.bnezNam Cao
kprobes currently rejects instruction c.beqz and c.bnez. Implement them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/1d879dba4e4ee9a82e27625d6483b5c9cfed684f.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: kprobes: simulate c.jr and c.jalr instructionsNam Cao
kprobes currently rejects c.jr and c.jalr instructions. Implement them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/db8b7787e9208654cca50484f68334f412be2ea9.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: kprobes: simulate c.j instructionNam Cao
kprobes currently rejects c.j instruction. Implement it. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/6ef76cd9984b8015826649d13f870f8ac45a2d0d.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: Handle zicsr/zifencei issue between gcc and binutilsMingzheng Xing
Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default ISA spec to the newer 20191213 version which moves some instructions from the I extension to the Zicsr and Zifencei extensions. So if one of the binutils and GCC exceeds that version, we should explicitly specifying Zicsr and Zifencei via -march to cope with the new changes. but this only occurs when binutils >= 2.36 and GCC >= 11.1.0. It's a different story when binutils < 2.36. binutils-2.36 supports the Zifencei extension[2] and splits Zifencei and Zicsr from I[3]. GCC-11.1.0 is particular[4] because it add support Zicsr and Zifencei extension for -march. binutils-2.35 does not support the Zifencei extension, and does not need to specify Zicsr and Zifencei when working with GCC >= 12.1.0. To make our lives easier, let's relax the check to binutils >= 2.36 in CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. For the other two cases, where clang < 17 or GCC < 11.1.0, we will deal with them in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC. For more information, please refer to: commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38") commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils") Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae8717a77d94b51ac3614e2ca6dc [0] Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51b5b52fec2cd [1] Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=5a1b31e1e1cee6e9f1c92abff59cdcfff0dddf30 [2] Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=729a53530e86972d1143553a415db34e6e01d5d2 [3] Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a78fa3fb4ef49 [4] Link: https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org Link: https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Guo Ren <guoren@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn> Link: https://lore.kernel.org/r/20230809165648.21071-1-xingmingzheng@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: uaccess: Return the number of bytes effectively not copiedAlexandre Ghiti
It was reported that the riscv kernel hangs while executing the test in [1]. Indeed, the test hangs when trying to write a buffer to a file. The problem is that the riscv implementation of raw_copy_from_user() does not return the correct number of bytes not written when an exception happens and is fixed up, instead it always returns the initial size to copy, even if some bytes were actually copied. generic_perform_write() pre-faults the user pages and bails out if nothing can be written, otherwise it will access the userspace buffer: here the riscv implementation keeps returning it was not able to copy any byte though the pre-faulting indicates otherwise. So generic_perform_write() keeps retrying to access the user memory and ends up in an infinite loop. Note that before the commit mentioned in [1] that introduced this regression, it worked because generic_perform_write() would bail out if only one byte could not be written. So fix this by returning the number of bytes effectively not written in __asm_copy_[to|from]_user() and __clear_user(), as it is expected. Link: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/ [1] Fixes: ebcbd75e3962 ("riscv: Fix the bug in memory access fixup code") Reported-by: Bo YU <tsu.yubo@gmail.com> Closes: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/#t Reported-by: Aurelien Jarno <aurelien@aurel32.net> Closes: https://lore.kernel.org/linux-riscv/ZNOnCakhwIeue3yr@aurel32.net/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Link: https://lore.kernel.org/r/20230811150604.1621784-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: stack: Fixup independent softirq stack for CONFIG_FRAME_POINTER=nGuo Ren
The independent softirq stack uses s0 to save & restore sp, but s0 would be corrupted when CONFIG_FRAME_POINTER=n. So add s0 in the clobber list to fix the problem. Fixes: dd69d07a5a6c ("riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK") Cc: stable@vger.kernel.org Reported-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Tested-by: Drew Fustini <dfustini@baylibre.com> Link: https://lore.kernel.org/r/20230716001506.3506041-3-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: stack: Fixup independent irq stack for CONFIG_FRAME_POINTER=nGuo Ren
The independent irq stack uses s0 to save & restore sp, but s0 would be corrupted when CONFIG_FRAME_POINTER=n. So add s0 in the clobber list to fix the problem. Fixes: 163e76cc6ef4 ("riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK") Cc: stable@vger.kernel.org Reported-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Tested-by: Drew Fustini <dfustini@baylibre.com> Link: https://lore.kernel.org/r/20230716001506.3506041-2-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: correct riscv_insn_is_c_jr() and riscv_insn_is_c_jalr()Nam Cao
The instructions c.jr and c.jalr must have rs1 != 0, but riscv_insn_is_c_jr() and riscv_insn_is_c_jalr() do not check for this. So, riscv_insn_is_c_jr() can match a reserved encoding, while riscv_insn_is_c_jalr() can match the c.ebreak instruction. Rewrite them with check for rs1 != 0. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Fixes: ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header") Link: https://lore.kernel.org/r/20230731183925.152145-1-namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: entry: set a0 = -ENOSYS only when syscall != -1Celeste Liu
When we test seccomp with 6.4 kernel, we found errno has wrong value. If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will get ENOSYS instead. We got same result with commit 9c2598d43510 ("riscv: entry: Save a0 prior syscall_enter_from_user_mode()"). After analysing code, we think that regs->a0 = -ENOSYS should only be executed when syscall != -1. In __seccomp_filter, when seccomp rejected this syscall with specified errno, they will set a0 to return number as syscall ABI, and then return -1. This return number is finally pass as return number of syscall_enter_from_user_mode, and then is compared with NR_syscalls after converted to ulong (so it will be ULONG_MAX). The condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS is always executed. It covered a0 set by seccomp, so we always get ENOSYS when match seccomp RET_ERRNO rule. Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry") Reported-by: Felix Yan <felixonmars@archlinux.org> Co-developed-by: Ruizhe Pan <c141028@gmail.com> Signed-off-by: Ruizhe Pan <c141028@gmail.com> Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn> Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn> Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com> Tested-by: Felix Yan <felixonmars@archlinux.org> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230801141607.435192-1-CoelacanthusHex@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16powerpc/ptrace: Split gpr32_set_commonChristophe Leroy
objtool reports the following warning: arch/powerpc/kernel/ptrace/ptrace-view.o: warning: objtool: gpr32_set_common+0x23c (.text+0x860): redundant UACCESS disable gpr32_set_common() conditionally opens and closes UACCESS based on whether kbuf pointer is NULL or not. This is wackelig. Split gpr32_set_common() in two fonctions, one for user one for kernel. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fix oops in gpr32_set_common_user() due to NULL kbuf] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/b8d6ae4483fcfd17524e79d803c969694a85cc02.1687428075.git.christophe.leroy@csgroup.eu
2023-08-16powerpc/watchpoints: Remove ptrace/perf exclusion trackingBenjamin Gray
ptrace and perf watchpoints were considered incompatible in commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf and ptrace events"), but the logic in that commit doesn't really apply. Ptrace doesn't automatically single step; the ptracer must request this explicitly. And the ptracer can do so regardless of whether a ptrace/perf watchpoint triggered or not: it could single step every instruction if it wanted to. Whatever stopped the ptracee before executing the instruction that would trigger the perf watchpoint is no longer relevant by this point. To get correct behaviour when perf and ptrace are watching the same data we must ignore the perf watchpoint. After all, ptrace has before-execute semantics, and perf is after-execute, so perf doesn't actually care about the watchpoint trigger at this point in time. Pausing before execution does not mean we will actually end up executing the instruction. Importantly though, we don't remove the perf watchpoint yet. This is key. The ptracer is free to do whatever it likes right now. E.g., it can continue the process, single step. or even set the child PC somewhere completely different. If it does try to execute the instruction though, without reinserting the watchpoint (in which case we go back to the start of this example), the perf watchpoint would immediately trigger. This time there is no ptrace watchpoint, so we can safely perform a single step and increment the perf counter. Upon receiving the single step exception, the existing code already handles propagating or consuming it based on whether another subsystem (e.g. ptrace) requested a single step. Again, this is needed with or without perf/ptrace exclusion, because ptrace could be single stepping this instruction regardless of if a watchpoint is involved. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-6-bgray@linux.ibm.com
2023-08-16powerpc/watchpoints: Simplify watchpoint reinsertionBenjamin Gray
We only remove watchpoints when they have the perf_single_step flag set, so we can reinsert them during the first iteration. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-5-bgray@linux.ibm.com
2023-08-16powerpc/watchpoints: Track perf single step directly on the breakpointBenjamin Gray
There is a bug in the current watchpoint tracking logic, where the teardown in arch_unregister_hw_breakpoint() uses bp->ctx->task, which it does not have a reference of and parallel threads may be in the process of destroying. This was partially addressed in commit fb822e6076d9 ("powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event"), but the underlying issue of accessing a struct member in an unknown state still remained. Syzkaller managed to trigger a null pointer derefernce due to the race between the task destructor and checking the pointer and dereferencing it in the loop. While this null pointer dereference could be fixed by using READ_ONCE to access the task up front, that just changes the error to manipulating possbily freed memory. Instead, the breakpoint logic needs to be reworked to remove any dependency on a context or task struct during breakpoint removal. The reason we have this currently is to clear thread.last_hit_ubp. This member is used to differentiate the perf DAWR single-step sequence from other causes of single-step, such as userspace just calling ptrace(PTRACE_SINGLESTEP, ...). We need to differentiate them because, when the single step interrupt is received, we need to know whether to re-insert the DAWR breakpoint (perf) or not (ptrace / other). arch_unregister_hw_breakpoint() needs to clear this information to prevent dangling pointers to possibly freed memory. These pointers are dereferenced in single_step_dabr_instruction() without a way to check their validity. This patch moves the tracking of this information to the breakpoint itself. This means we no longer have to do anything special to clean up. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-4-bgray@linux.ibm.com
2023-08-16powerpc/watchpoints: Don't track info persistentlyBenjamin Gray
info is cheap to retrieve, and is likely optimised by the compiler anyway. On the other hand, propagating it across the functions makes it possible to be inconsistent and adds needless complexity. Remove it, and invoke counter_arch_bp() when we need to work with it. As we don't persist it, we just use the local bp array to track whether we are ignoring a breakpoint. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-3-bgray@linux.ibm.com
2023-08-16powerpc/watchpoints: Explain thread_change_pc() moreBenjamin Gray
The behaviour of the thread_change_pc() function is a bit cryptic without being more familiar with how the watchpoint logic handles perf's after-execute semantics. Expand the comment to explain why we can re-insert the breakpoint and unset the perf_single_step flag. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801011744.153973-2-bgray@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity ↵Kajol Jain
domain via partition information The hcall H_GET_PERF_COUNTER_INFO with counter request value as AFFINITY_DOMAIN_INFORMATION_BY_PARTITION(0XB1), can be used to get the system affinity domain via partition information. To expose the system affinity domain via partition information, patch adds sysfs file called "affinity_domain_via_partition" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. Add new entry for AFFINITY_DOMAIN_VIA_PAR in sysinfo_counter_request array, which points to the counter request value "affinity_domain_via_partition" in hv-gpci.c file. Also add a new function called "affinity_domain_via_partition_result_parse" to parse the hcall result and store it in output buffer. The affinity_domain_via_partition sysfs file is only available for power10 and above platforms. Add a macro called INTERFACE_AFFINITY_DOMAIN_VIA_PAR_ATTR, which points to the index of NULL placeholder, for affinity_domain_via_partition attribute in interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR macro in hv-gpci.c file. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-10-kjain@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity ↵Kajol Jain
domain via domain information The hcall H_GET_PERF_COUNTER_INFO with counter request value as AFFINITY_DOMAIN_INFORMATION_BY_DOMAIN(0XB0), can be used to get the system affinity domain via domain information. To expose the system affinity domain via domain information, patch adds sysfs file called "affinity_domain_via_domain" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. Add new entry for AFFINITY_DOMAIN_VIA_DOM in sysinfo_counter_request array, which points to the counter request value "affinity_domain_via_domain" in hv-gpci.c file. The affinity_domain_via_domain sysfs file is only available for power10 and above platforms. Add a macro called INTERFACE_AFFINITY_DOMAIN_VIA_DOM_ATTR, which points to the index of NULL placeholder, for affinity_domain_via_domain attribute in interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR macro in hv-gpci.c file. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-8-kjain@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity ↵Kajol Jain
domain via virtual processor information The hcall H_GET_PERF_COUNTER_INFO with counter request value as AFFINITY_DOMAIN_INFORMATION_BY_VIRTUAL_PROCESSOR(0XA0), can be used to get the system affinity domain via virtual processor information. To expose the system affinity domain via virtual processor information, patch adds sysfs file called "affinity_domain_via_virtual_processor" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. The affinity_domain_via_virtual_processor sysfs file is only available for power10 and above platforms. Add a macro called INTERFACE_AFFINITY_DOMAIN_VIA_VP_ATTR, which points to the index of NULL placeholder, for affinity_domain_via_virtual_processor attribute in interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR macro in hv-gpci.c file. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-6-kjain@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor ↵Kajol Jain
config information The hcall H_GET_PERF_COUNTER_INFO with counter request value as PROCESSOR_CONFIG(0X90), can be used to get the system processor configuration information. To expose the system processor config information, patch adds sysfs file called "processor_config" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. Add enum and sysinfo_counter_request array to get required counter request value in hv-gpci.c file. Also add a new function called "sysinfo_device_attr_create", which will create and return required device attribute to the add_sysinfo_interface_files function. The processor_config sysfs file is only available for power10 and above platforms. Add a new macro called INTERFACE_PROCESSOR_CONFIG_ATTR, which points to the index of NULL placefolder, for processor_config attribute in the interface_attrs array. Also add macro INTERFACE_NULL_ATTR which points to index of NULL attribute in interface_attrs array. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-4-kjain@linux.ibm.com
2023-08-16powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor bus ↵Kajol Jain
topology information The hcall H_GET_PERF_COUNTER_INFO with counter request value as PROCESSOR_BUS_TOPOLOGY(0XD0), can be used to get the system topology information. To expose the system topology information, patch adds sysfs file called "processor_bus_topology" to the "/sys/devices/hv_gpci/interface/" of hv_gpci pmu driver. Add macro for PROCESSOR_BUS_TOPOLOGY counter request value in hv-gpci.c file. Also add a new function called "systeminfo_gpci_request", to make the H_GET_PERF_COUNTER_INFO hcall with added macro and populates the output buffer. The processor_bus_topology sysfs file is only available for power10 and above platforms. Add a new function called "add_sysinfo_interface_files", which will add processor_bus_topology attribute in the interface_attrs array, only for power10 and above platforms. Also add macro INTERFACE_PROCESSOR_BUS_TOPOLOGY_ATTR in hv-gpci.c file, which points to the index of NULL placefolder, for processor_bus_topology attribute. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230729073455.7918-2-kjain@linux.ibm.com
2023-08-16powerpc: Make virt_to_pfn() a static inlineLinus Walleij
Making virt_to_pfn() a static inline taking a strongly typed (const void *) makes the contract of a passing a pointer of that type to the function explicit and exposes any misuse of the macro virt_to_pfn() acting polymorphic and accepting many types such as (void *), (unitptr_t) or (unsigned long) as arguments without warnings. Move the virt_to_pfn() and related functions below the declaration of __pa() so it compiles. For symmetry do the same with pfn_to_kaddr(). As the file is included right into the linker file, we need to surround the functions with ifndef __ASSEMBLY__ so we don't cause compilation errors. The conversion moreover exposes the fact that pmd_page_vaddr() was returning an unsigned long rather than a const void * as could be expected, so all the sites defining pmd_page_vaddr() had to be augmented as well. Finally the KVM code in book3s_64_mmu_hv.c was passing an unsigned int to virt_to_phys() so fix that up with a cast so the result compiles. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> [mpe: Fixup kfence.h, simplify pfn_to_kaddr() & pmd_page_vaddr()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230809-virt-to-phys-powerpc-v1-1-12e912a7d439@linaro.org
2023-08-16powerpc/powernv/pci: use pci_dev_id() to simplify the codeXiongfeng Wang
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it mannually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230804080435.191196-1-wangxiongfeng2@huawei.com
2023-08-16powerpc/xics: Remove unnecessary endian conversionGautam Menghani
Remove an unnecessary piece of code that does an endianness conversion but does not use the result. The following warning was reported by Clang's static analyzer: arch/powerpc/sysdev/xics/ics-opal.c:114:2: warning: Value stored to 'server' is never read [deadcode.DeadStores] server = be16_to_cpu(oserver); 'server' was used as a parameter to opal_get_xive() in commit 5c7c1e9444d8 ("powerpc/powernv: Add OPAL ICS backend") when it was introduced. 'server' was also used in an error message for the call to opal_get_xive(). 'server' was always later set by a call to ics_opal_mangle_server() before being used. Commit bf8e0f891a32 ("powerpc/powernv: Fix endian issues in OPAL ICS backend") used a new variable 'oserver' as the parameter to opal_get_xive() instead of 'server' for endian correctness. It also removed 'server' from the error message for the call to opal_get_xive(). Fix the warning by removing the server variable assignment. Fixes: bf8e0f891a32 ("powerpc/powernv: Fix endian issues in OPAL ICS backend") Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230731115543.36991-1-gautam@linux.ibm.com
2023-08-16powerpc/pseries: fix possible memory leak in ibmebus_bus_init()ruanjinjie
If device_register() returns error in ibmebus_bus_init(), name of kobject which is allocated in dev_set_name() called in device_add() is leaked. As comment of device_add() says, it should call put_device() to drop the reference count that was set in device_initialize() when it fails, so the name can be freed in kobject_cleanup(). Signed-off-by: ruanjinjie <ruanjinjie@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20221110011929.3709774-1-ruanjinjie@huawei.com
2023-08-16powerpc: remove <asm/export.h>Masahiro Yamada
All *.S files under arch/powerpc/ have been converted to include <linux/export.h> instead of <asm/export.h>. Remove <asm/export.h>. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230806150954.394189-3-masahiroy@kernel.org
2023-08-16powerpc: replace #include <asm/export.h> with #include <linux/export.h>Masahiro Yamada
Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost") deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>. Replace #include <asm/export.h> with #include <linux/export.h>. After all the <asm/export.h> lines are converted, <asm/export.h> and <asm-generic/export.h> will be removed. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> [mpe: Fixup selftests that stub asm/export.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230806150954.394189-2-masahiroy@kernel.org
2023-08-16powerpc: remove unneeded #include <asm/export.h>Masahiro Yamada
There is no EXPORT_SYMBOL line there, hence #include <asm/export.h> is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230806150954.394189-1-masahiroy@kernel.org
2023-08-16powerpc/inst: add PPC_TLBILX_LPIDNick Desaulniers
Clang didn't recognize the instruction tlbilxlpid. This was fixed in clang-18 [0] then backported to clang-17 [1]. To support clang-16 and older, rather than using that instruction bare in inline asm, add it to ppc-opcode.h and use that macro as is done elsewhere for other instructions. Link: https://github.com/ClangBuiltLinux/linux/issues/1891 Link: https://github.com/llvm/llvm-project/issues/64080 Link: https://github.com/llvm/llvm-project/commit/53648ac1d0c953ae6d008864dd2eddb437a92468 [0] Link: https://github.com/llvm/llvm-project-release-prs/commit/0af7e5e54a8c7ac665773ac1ada328713e8338f5 [1] Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/llvm/202307211945.TSPcyOhh-lkp@intel.com/ Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230803-ppc_tlbilxlpid-v3-1-ca84739bfd73@google.com
2023-08-16powerpc/reg: Remove #ifdef around mtspr macroChristophe Leroy
That ifdef was introduced by commit 1458dd951f7c ("powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro") and left over by commit 2a45addd21de ("powerpc/8xx: Remove CPU6 ERRATA Workaround") Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/cf652e47ea9e453e89813611b6f76d0939a12063.1687344017.git.christophe.leroy@csgroup.eu
2023-08-16powerpc/step: Mark __copy_mem_out() and __emulate_dcbz() __always_inlineChristophe Leroy
objtool reports two folliwng warnings: arch/powerpc/lib/sstep.o: warning: objtool: copy_mem_out+0x3c (.text+0x30c): call to __copy_mem_out() with UACCESS enabled arch/powerpc/lib/sstep.o: warning: objtool: emulate_dcbz+0x70 (.text+0x4dc): call to __emulate_dcbz() with UACCESS enabled Mark __copy_mem_out() and __emulate_dcbz() __always_inline Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/f1d4a15da70190f8c2fcddb377bbc1e09827242c.1687343857.git.christophe.leroy@csgroup.eu
2023-08-16powerpc/cpm2: Remove cpm2_map() and cpm2_unmap()Christophe Leroy
Since commit 449012daa92a ("[POWERPC] cpm2: Infrastructure code cleanup.") cpm2_map() is just returning cpm2_immr pointer and cpm2_unmap() does nothing. We already have parts of code that use cpm2_immr directly so get rid of cpm2_map() and cpm2_unmap() by using cpm2_immr directly. And avoid going through local pointers that hide the pointed structure. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/9fe6ff7284e9f968b12abe7de7c08d7ea40e29d6.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16powerpc/8xx: Remove immr_map() and immr_unmap()Christophe Leroy
Since commit fb533d0c5a97 ("[POWERPC] 8xx: Infrastructure code cleanup.") immr_map() is just returning mpc8xxx_immr pointer and immr_unmap() do nothing. We already have parts of code that use mpc8xxx_immr directly so get rid of immr_map() and immr_unmap() by using mpc8xxx_immr directly. And avoid going through local pointers that hide the pointed structure. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/633ed46f6015ff44d5599258647ea517f75d6a1d.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16powerpc: Remove CONFIG_PCI_8260Christophe Leroy
CONFIG_PCI_8260 is not used anymore, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/19a4c07466ce8b80f287a06eadcc80c4ab1d2c9e.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16powerpc/include: Remove mpc8260.h and m82xx_pci.hChristophe Leroy
SIU_INT_IRQ1 is not used anywhere and __IO_BASE is defined in asm/io.h Remove m82xx_pci.h Then the only thing remaining in mpc8260.h is MPC82XX_BCR_PLDP Move MPC82XX_BCR_PLDP into asm/cpm2.h then remove mpc8260.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/afe23bf3624c389ff17e9789884c78c124b7b202.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16powerpc/include: Declare mpc8xx_immr in 8xx_immap.hChristophe Leroy
Do the same as for cmp2_immr : declare it at the same place as its type immap_t, that is in 8xx_immap.h instead of fs_pd.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/62d490b65899c2f2667ca7045c5f0fad9cbda458.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16powerpc/include: Remove unneeded #include <asm/fs_pd.h>Christophe Leroy
tqm8xx_setup.c and fs_enet.h don't use any items provided by fs_pd.h Remove unneeded #include <asm/fs_pd.h> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/b056c4e986a4a7707fc1994304c34f7bd15d6871.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16ocxl: Use pci_dev_id() to simplify the codeZheng Zengkai
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it mannually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230811102039.17257-1-zhengzengkai@huawei.com
2023-08-16arm64: sysreg: Generate C compiler warnings on {read,write}_sysreg_s argumentsJames Clark
Evaluate the register before the asm section so that the C compiler generates warnings when there is an issue with the register argument. This will prevent possible future issues such as the one seen here [1] where a missing bracket caused the shift and addition operators to be evaluated in the wrong order, but no warning was emitted. The GNU assembler has no warning for when expressions evaluate differently to C due to different operator precedence, but the C compiler has some warnings that may suggest something is wrong. For example in this case the following warning would have been emitted: error: operator '>>' has lower precedence than '+'; '+' will be evaluated first [-Werror,-Wshift-op-parentheses] There are currently no existing warnings that need to be fixed. [1]: https://lore.kernel.org/linux-perf-users/20230728162011.GA22050@willie-the-truck/ Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20230815140639.614769-1-james.clark@arm.com Signed-off-by: Will Deacon <will@kernel.org>