path: root/arch/powerpc/kernel
Age    Commit message    Author
2022-01-05  powerpc/cacheinfo: use default_groups in kobj_type  (Greg Kroah-Hartman)
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the powerpc cacheinfo sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220104155450.1291277-1-gregkh@linuxfoundation.org
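For readers unfamiliar with the conversion, a minimal sketch of the before/after pattern is below; the attribute and kobj_type names are illustrative, not the exact ones in arch/powerpc/kernel/cacheinfo.c.

  /* Old style: a NULL-terminated attribute list wired up directly. */
  static struct attribute *cache_index_default_attrs[] = {
          &cache_type_attr.attr,
          NULL,
  };

  /* New style: wrap the list with ATTRIBUTE_GROUPS(), which generates
   * cache_index_default_groups, and point default_groups at it. */
  ATTRIBUTE_GROUPS(cache_index_default);

  static struct kobj_type cache_index_type = {
          .release        = cache_index_release,
          .sysfs_ops      = &cache_index_ops,
          .default_groups = cache_index_default_groups,  /* was .default_attrs */
  };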
2021-12-25  powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings  (Michael Ellerman)
When CONFIG_PPC_RFI_SRR_DEBUG=y we check the SRR values before returning from interrupts. This is done in asm using EMIT_BUG_ENTRY, and passing BUGFLAG_WARNING. However that fails to create an exception table entry for the warning, and so do_program_check() fails the exception table search and proceeds to call _exception(), resulting in an oops like:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 2 PID: 1204 Comm: sigreturn_unali Tainted: P 5.16.0-rc2-00194-g91ca3d4f77c5 #12
  NIP: c00000000000c5b0 LR: 0000000000000000 CTR: 0000000000000000
  ...
  NIP [c00000000000c5b0] system_call_common+0x150/0x268
  LR [0000000000000000] 0x0
  Call Trace:
  [c00000000db73e10] [c00000000000c558] system_call_common+0xf8/0x268 (unreliable)
  ...
  Instruction dump:
  7cc803a6 888d0931 2c240000 4082001c 38800000 988d0931 e8810170 e8a10178
  7c9a03a6 7cbb03a6 7d7a02a6 e9810170 <7f0b6088> 7d7b02a6 e9810178 7f0b6088

We should instead use EMIT_WARN_ENTRY, which creates an exception table entry for the warning, allowing the warning to be correctly recognised, and the code to resume after printing the warning. Note however that because this warning is buried deep in the interrupt return path, we are not able to recover from it (due to MSR_RI being clear), so we still end up in die() with an unrecoverable exception.

Fixes: 59dc5bfca0cb ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221135101.2085547-2-mpe@ellerman.id.au
2021-12-25  powerpc/64s: Mask NIP before checking against SRR0  (Michael Ellerman)
When CONFIG_PPC_RFI_SRR_DEBUG=y we check that NIP and SRR0 match when returning from interrupts. This can trigger falsely if NIP has either of its two low bits set via sigreturn or ptrace, while SRR0 has its low two bits masked in hardware. As a quick fix make sure to mask the low bits before doing the check. Fixes: 59dc5bfca0cb ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20211221135101.2085547-1-mpe@ellerman.id.au
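The actual check lives in assembly, but the idea can be sketched in C as below (hedged; names illustrative): SRR0 ignores its two low bits in hardware, so NIP must be masked the same way before comparing.

  /* sigreturn/ptrace can leave the low two bits of NIP set, while the
   * hardware always masks them in SRR0, so compare with them cleared. */
  unsigned long nip = regs->nip & ~0x3UL;

  if (nip != mfspr(SPRN_SRR0))
          pr_warn("NIP/SRR0 mismatch\n");  /* the real code uses a WARN in asm */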
2021-12-23  powerpc/32: Fix boot failure with GCC latent entropy plugin  (Christophe Leroy)
Boot fails with GCC latent entropy plugin enabled. This is due to early boot functions trying to access 'latent_entropy' global data while the kernel is not relocated at its final destination yet. As there is no way to tell GCC to use PTRRELOC() to access it, disable latent entropy plugin in early_32.o and feature-fixups.o and code-patching.o Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215217 Link: https://lore.kernel.org/r/2bac55483b8daf5b1caa163a45fa5f9cdbe18be4.1640178426.git.christophe.leroy@csgroup.eu
2021-12-23  powerpc/mm: Switch obsolete dssall to .long  (Alexey Kardashevskiy)
The dssall ("Data Stream Stop All") instruction has been obsolete, along with other Data Cache Instructions, since ISA 2.03 (2006). LLVM IAS does not support it, but PPC970 seems to still use it. Switch dssall to a .long, as there is not much point in fixing LLVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru
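A sketch of what "switching to .long" means in practice; the encoding constant below is the dssall instruction word as defined in ppc-opcode.h, quoted from memory, so verify against the tree.

  /* Emit the raw instruction word instead of the mnemonic, so assemblers
   * that do not know 'dssall' (LLVM IAS) still accept it. */
  #define PPC_RAW_DSSALL          (0x7e00066c)
  #define PPC_DSSALL              stringify_in_c(.long PPC_RAW_DSSALL)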
2021-12-23  powerpc/64/asm: Do not reassign labels  (Daniel Axtens)
The LLVM integrated assembler really does not like us reassigning things to the same label:

  <instantiation>:7:9: error: invalid reassignment of non-absolute variable 'fs_label'

This happens across a bunch of platforms:
https://github.com/ClangBuiltLinux/linux/issues/1043
https://github.com/ClangBuiltLinux/linux/issues/1008
https://github.com/ClangBuiltLinux/linux/issues/920
https://github.com/ClangBuiltLinux/linux/issues/1050

There is no hope of getting this fixed in LLVM (see https://github.com/ClangBuiltLinux/linux/issues/1043#issuecomment-641571200 and https://bugs.llvm.org/show_bug.cgi?id=47798#c1), so if we want to build with LLVM_IAS, we need to hack around it ourselves.

For us the big problem comes from this:

  #define USE_FIXED_SECTION(sname)        \
          fs_label = start_##sname;       \
          fs_start = sname##_start;       \
          use_ftsec sname;

  #define USE_TEXT_SECTION()              \
          fs_label = start_text;          \
          fs_start = text_start;          \
          .text

and in particular fs_label. This works around it by not setting those 'variables' and requiring that users of the variables instead track for themselves what section they are in. This isn't amazing, by any stretch, but it gets us further in the compilation.

Note that even though users have to keep track of the section, using the wrong one produces an error with both binutils and LLVM, which prevents using the wrong section at compile time.

LLVM error example:

  AS arch/powerpc/kernel/head_64.o
  <unknown>:0: error: Cannot represent a difference across sections
  make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

binutils error example:

  /home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
  /home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S:1974: Error: can't resolve `system_call_common' {.text section} - `start_real_vectors' {.head.text.real_vectors section}
  make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-5-aik@ozlabs.ru
2021-12-23  powerpc/64/asm: Inline BRANCH_TO_C000  (Alexey Kardashevskiy)
It is used just once and does not really help with readability, remove it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-4-aik@ozlabs.ru
2021-12-23  powerpc/toc: Future proof kernel toc  (Alan Modra)
This patch future-proofs the kernel against linker changes that might put the toc pointer at some location other than .got+0x8000, by replacing __toc_start+0x8000 with .TOC. throughout. If the kernel's idea of the toc pointer doesn't agree with the linker, bad things happen. prom_init.c code relocating its toc is also changed so that a symbolic __prom_init_toc_start toc-pointer relative address is calculated rather than assuming that it is always at toc-pointer - 0x8000. The length calculations loading values from the toc are also avoided. It's a little incestuous to do that with unreloc_toc picking up adjusted values (which is fine in practice, they both adjust by the same amount if all goes well). I've also changed the way .got is aligned in vmlinux.lds and zImage.lds, mostly so that dumping out section info by objdump or readelf plainly shows the alignment is 256. This linker script feature was added 2005-09-27, available in FSF binutils releases from 2.17 onwards. Should be safe to use in the kernel, I think. Finally, put *(.got) before the prom_init.o entry which only needs *(.toc), so that the GOT header goes in the correct place. I don't believe this makes any difference for the kernel as it would for dynamic objects being loaded by ld.so. That change is just to stop lusers who blindly copy kernel scripts being led astray. Of course, this change needs the prom_init.c changes. Some notes on .toc and .got. .toc is a compiler generated section of addresses. .got is a linker generated section of addresses, generally built when the linker sees R_*_*GOT* relocations. In the case of powerpc64 ld.bfd, there are multiple generated .got sections, one per input object file. So you can somewhat reasonably write in a linker script an input section statement like *prom_init.o(.got .toc) to mean "the .got and .toc section for files matching *prom_init.o". On other architectures that doesn't make sense, because the linker generally has just one .got section. Even on powerpc64, note well that the GOT entries for prom_init.o may be merged with GOT entries from other objects. That means that if prom_init.o references, say, _end via some GOT relocation, and some other object also references _end via a GOT relocation, the GOT entry for _end may be in the range __prom_init_toc_start to __prom_init_toc_end and if the kernel does something special to GOT/TOC entries in that range then the value of _end as seen by objects other than prom_init.o will be affected. On the other hand the GOT entry for _end may not be in the range __prom_init_toc_start to __prom_init_toc_end. Which way it turns out is deterministic but a detail of linker operation that should not be relied on. A feature of ld.bfd is that input .toc (and .got) sections matching one linker input section statement may be sorted, to put entries used by small-model code first, near the toc base. This is why scripts for powerpc64 normally use *(.got .toc) rather than *(.got) *(.toc), since the first form allows more freedom to sort. Another feature of ld.bfd is that indirect addressing sequences using the GOT/TOC may be edited by the linker to relative addressing. In many cases relative addressing would be emitted by gcc for -mcmodel=medium if you appropriately decorate variable declarations with non-default visibility. 
The original patch is here: https://lore.kernel.org/linuxppc-dev/20210310034813.GM6042@bubble.grove.modra.org/ Signed-off-by: Alan Modra <amodra@au1.ibm.com> [aik: removed non-relocatable which is gone in 24d33ac5b8ffb] [aik: added <=2.24 check] [aik: because of llvm-as, kernel_toc_addr() uses "mr" instead of global register variable] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-2-aik@ozlabs.ru
2021-12-23  powerpc/kernel: Add __init attribute to eligible functions  (Nick Child)
Some functions defined in `arch/powerpc/kernel` (and one in `arch/powerpc/kexec`) are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-2-nick.child@ibm.com
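The pattern is simple: a function that is only reachable from __init code is itself marked __init (in both the definition and any header declaration) so its text is discarded after boot. A hypothetical example:

  /* header */
  void __init setup_foo(void);            /* hypothetical function */

  /* C file: only called from setup_arch(), which is itself __init */
  void __init setup_foo(void)
  {
          pr_info("foo initialised\n");
  }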
2021-12-16  powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interrupts  (Nicholas Piggin)
Reading the CFAR register is quite costly (~20 cycles on POWER9). It is a good idea to have for most synchronous interrupts, but for async ones it is much less important. Doorbell, external, and decrementer interrupts are the important asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible, because it might be a guest exit that requires CFAR preserved. But the important pseries interrupts can avoid loading CFAR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-7-npiggin@gmail.com
2021-12-16  powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use  (Nicholas Piggin)
Enabling MSR[EE] in interrupt handlers while interrupts are still soft masked allows PMIs to profile interrupt handlers to some degree, beyond what SIAR latching allows. When perf is not being used, this is almost useless work. It requires an extra mtmsrd in the irq handler, and it also opens the door to masked interrupts hitting and requiring replay, which is more expensive than just taking them directly. This effect can be noticeable in high IRQ workloads. Avoid enabling MSR[EE] unless perf is currently in use. This saves about 60 cycles (or 8%) on a simple decrementer interrupt microbenchmark. Replayed interrupts drop from 1.4% of all interrupts taken, to 0.003%. This does prevent the soft-nmi interrupt being taken in these handlers, but that's not too reliable anyway. The SMP watchdog will continue to be the reliable way to catch lockups. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-5-npiggin@gmail.com
2021-12-16  powerpc/64s/interrupt: handle MSR EE and RI in interrupt entry wrapper  (Nicholas Piggin)
The mtmsrd to enable MSR[RI] can be combined with the mtmsrd to enable MSR[EE] in interrupt entry code, for those interrupts which enable EE. This helps performance of important synchronous interrupts (e.g., page faults). This is similar to what commit dd152f70bdc1 ("powerpc/64s: system call avoid setting MSR[RI] until we set MSR[EE]") does for system calls. Do this by enabling EE and RI together at the beginning of the entry wrapper if PACA_IRQ_HARD_DIS is clear, and only enabling RI if it is set. Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones leave it unchanged, so by default they always get EE=1 unless they have interrupted a caller that is hard disabled. When the sync interrupt later calls interrupt_cond_local_irq_enable(), it will not require another mtmsrd because MSR[EE] was already enabled here. This avoids one mtmsrd L=1 for synchronous interrupts on 64s, which saves about 20 cycles on POWER9. And for kernel-mode interrupts, both synchronous and asynchronous, this saves an additional 40 cycles due to the mtmsrd being moved ahead of mfspr SPRN_AMR, which prevents a SPR scoreboard stall. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-3-npiggin@gmail.com
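A hedged C sketch of the logic described above; the real code lives in the interrupt entry wrapper and differs in detail.

  /* Asynchronous interrupts set PACA_IRQ_HARD_DIS; synchronous ones leave
   * it unchanged, so they get EE and RI enabled together in one mtmsrd. */
  if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
          __hard_irq_enable();    /* sets MSR[EE] and MSR[RI] */
  else
          __hard_RI_enable();     /* interrupted context was hard disabled */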
2021-12-09  powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic  (Hari Bathini)
In the panic path, fadump is triggered via a panic notifier function. Before calling panic notifier functions, smp_send_stop() gets called, which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc ("powerpc: stop_this_cpu: remove the cpu from the online map.") and again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()") started marking CPUs as offline while stopping them. So, if a kernel has either of the above commits, a vmcore captured with fadump via the panic path would not have register data processed for any CPU except the panic'ing CPU. Sample output of crash-utility with such a vmcore:

  # crash vmlinux vmcore
  ...
  KERNEL: vmlinux
  DUMPFILE: vmcore  [PARTIAL DUMP]
  CPUS: 1
  DATE: Wed Nov 10 09:56:34 EST 2021
  UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
  TASKS: 183
  NODENAME: XXXXXXXXX
  RELEASE: 5.15.0+
  VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
  MACHINE: ppc64le (2500 Mhz)
  MEMORY: 8 GB
  PANIC: "Kernel panic - not syncing: sysrq triggered crash"
  PID: 3394
  COMMAND: "bash"
  TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
  CPU: 1
  STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }

While this has been the case since fadump was introduced, the issue was not identified for two probable reasons:

- In general, the bulk of the vmcores analyzed were from crashes due to an exception.

- The above did change since commit 8341f2f222d7 ("sysrq: Use panic() to force a crash") started using panic() instead of dereferencing a NULL pointer to force a kernel crash. But then commit de6e5d38417e ("powerpc: smp_send_stop do not offline stopped CPUs") stopped marking CPUs as offline till kernel commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()") reverted that change.

To ensure post-processing of register data for all other CPUs happens as intended, let the panic() function take the crash friendly path (read crash_smp_send_stop()) with the help of the crash_kexec_post_notifiers option. Also, as register data for all CPUs is captured by f/w, skip IPI callbacks here for fadump, to avoid any complications in finding the right backtraces.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
2021-12-09  powerpc: handle kdump appropriately with crash_kexec_post_notifiers option  (Hari Bathini)
Kdump can be triggered after panic_notifiers since commit f06e5153f4ae2 ("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers") introduced the crash_kexec_post_notifiers option. But using this option would mean smp_send_stop(), which marks all other CPUs as offline, gets called before kdump is triggered. As a result, kdump routines fail to save other CPUs' registers. To fix this, the kdump friendly crash_smp_send_stop() function was introduced with kernel commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path"). Override this kdump friendly weak function to handle the crash_kexec_post_notifiers option appropriately on powerpc. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> [Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211207103719.91117-1-hbathini@linux.ibm.com
2021-12-09  powerpc/inst: Define ppc_inst_t as u32 on PPC32  (Christophe Leroy)
Unlike PPC64 ABI, PPC32 uses the stack to pass a parameter defined as a struct, even when the struct has a single simple element. To avoid that, define ppc_inst_t as u32 on PPC32. Keep it as 'struct ppc_inst' when __CHECKER__ is defined so that sparse can perform type checking. Also revert commit 511eea5e2ccd ("powerpc/kprobes: Fix Oops by passing ppc_inst as a pointer to emulate_step() on ppc32") as now the instruction to be emulated is passed as a register to emulate_step(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c6d0c46f598f76ad0b0a88bc0d84773bd921b17c.1638208156.git.christophe.leroy@csgroup.eu
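A rough sketch of the resulting typedef (see arch/powerpc/include/asm/inst.h for the real definition):

  /* On PPC32, ppc_inst_t is a plain u32 so it is passed in a register;
   * the struct form is kept for PPC64 (prefixed instructions) and for
   * sparse (__CHECKER__) so type checking still works. */
  #if defined(CONFIG_PPC64) || defined(__CHECKER__)
  typedef struct ppc_inst {
          u32 val;
  #ifdef CONFIG_PPC64
          u32 suffix;
  #endif
  } __packed ppc_inst_t;
  #else
  typedef u32 ppc_inst_t;
  #endif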
2021-12-09  powerpc/inst: Define ppc_inst_t  (Christophe Leroy)
In order to stop using 'struct ppc_inst' on PPC32, define a ppc_inst_t typedef. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/kuap: Wire-up KUAP on 85xx in 32 bits mode.  (Christophe Leroy)
This adds KUAP support to 85xx in 32 bits mode. This is done by reading the content of SPRN_MAS1 and checking the TID at the time user pgtable is loaded. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f8696f8980ca1532ada3a2f0e0a03e756269c7fe.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/kuap: Wire-up KUAP on 40x  (Christophe Leroy)
This adds KUAP support to 40x. This is done by checking the content of SPRN_PID at the time user pgtable is loaded. 40x doesn't have KUEP, but KUAP implies KUEP because when the PID doesn't match the page's PID, the page cannot be read nor executed. So KUEP is now automatically selected when KUAP is selected and disabled when KUAP is disabled. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aaefa91897ddc42ac11019dc0e1d1a525bd08e90.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/kuap: Wire-up KUAP on 44x  (Christophe Leroy)
This adds KUAP support to 44x. This is done by checking the content of SPRN_PID at the time it is read and written into SPRN_MMUCR. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7d6c3f1978a26feada74b084f651e8cf1e3b3a47.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc: Add KUAP support for BOOKE and 40x  (Christophe Leroy)
On booke/40x we don't have segments like book3s/32. On booke/40x we don't have access protection groups like 8xx. Use the PID register to provide user access protection. Kernel address space can be accessed with any PID. User address space has to be accessed with the PID of the user; the user PID is always non-zero. Every time the kernel is entered, set the PID register to 0 and restore the PID register when returning to user. Every time the kernel needs to access user data, the PID is restored for the access. In TLB miss handlers, check the PID and bail out to the data storage exception when the PID is 0 and the accessed address is in user space. Note that this also forbids execution of user text by the kernel except when user access is unlocked. But this shouldn't be a problem as the kernel is not supposed to ever run user text. This patch prepares the infrastructure, but the real activation of KUAP is done by following patches for each processor type one by one. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5d65576a8e31e9480415785a180c92dd4e72306d.1634627931.git.christophe.leroy@csgroup.eu
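A conceptual sketch of the PID-based scheme (function names are made up for illustration; the real lock/unlock helpers live in the nohash KUP headers):

  /* Kernel runs with PID 0, so user TLB entries never match. */
  static inline void booke_kuap_lock(void)
  {
          mtspr(SPRN_PID, 0);
          isync();
  }

  /* Restore the user PID only around explicit user accesses. */
  static inline void booke_kuap_unlock(struct mm_struct *mm)
  {
          mtspr(SPRN_PID, mm->context.id);
          isync();
  }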
2021-12-09  powerpc/kuap: Prepare for supporting KUAP on BOOK3E/64  (Christophe Leroy)
Also call kuap_lock() and kuap_save_and_lock() from interrupt functions with CONFIG_PPC64. For book3s/64 we keep them empty as it is done in assembly. Also do the locked assert when switching task unless it is book3s/64. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1cbf94e26e6d6e2e028fd687588a7e6622d454a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/config: Add CONFIG_BOOKE_OR_40x  (Christophe Leroy)
We have many functionalities common to 40x and BOOKE, which leads to many places with #if defined(CONFIG_BOOKE) || defined(CONFIG_40x). We are going to add a few more with KUAP for booke/40x, so create a new symbol which is defined when either BOOKE or 40x is defined. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9a3dbd60924cb25c9f944d3d8205ac5a0d15e229.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/kuap: Add kuap_lock()  (Christophe Leroy)
Add kuap_lock() and call it when entering interrupts from user. It is called kuap_lock() as it is similar to kuap_save_and_lock() without the save. However book3s/32 already has a kuap_lock(); rename it to kuap_lock_addr(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4437e2deb9f6f549f7089d45e9c6f96a7e77905a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/32s: Save content of sr0 to avoid 'mfsr'  (Christophe Leroy)
Calling 'mfsr' to get the content of segment registers is heavy, and in addition it requires clearing the 'reserved' bits. In order to avoid this operation, save sr0 in the mm context and in the thread struct. The saved sr0 is the one used by the kernel, which means that on locking entry it can be used as is. For unlocking, the only thing to do is to clear SR_NX. This improves the null_syscall selftest by 12 cycles, i.e. 4%. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b02baf2ed8f09bad910dfaeeb7353b2ae6830525.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/32s: Do kuep_lock() and kuep_unlock() in assembly  (Christophe Leroy)
When interrupt and syscall entries were converted to C, KUEP locking and unlocking was also converted. It improved performance by unrolling the loop, and allowed easily implementing boot time deactivation of KUEP. However, the null_syscall selftest shows that KUEP is still heavy (361 cycles with KUEP, 212 cycles without). A way to improve further is to group the 'mtsr's together, instead of repeating 'addi' + 'mtsr' several times. In order to do that, more registers need to be available. In C, GCC will always be able to provide the requested number of registers, but at the cost of saving some data on the stack, which hurts performance here. So let's do it in assembly, where we have full control of which registers can be used. It also has the advantage of locking earlier and unlocking later, and it helps GCC generate less tricky code. The only drawback is that boot time deactivation becomes less straightforward and requires 'hand' instruction patching. Group 'mtsr's by 4. With this change, the null_syscall selftest reports 336 cycles. Without the change it was 361 cycles, that's a 7% reduction. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/115cb279e9b9948dfd93a065e047081c59e3a2a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/book3e: Activate KUEP at all time  (Christophe Leroy)
On book3e:
- When using 64-bit PTEs: user pages don't have the SX bit defined, so KUEP is always active.
- When using 32-bit PTEs: implement KUEP by clearing the SX bit during TLB miss for user pages.
The impact is minimal and worth neither boot-time nor build-time selection. Activate it at all times. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e376b114283fb94504e2aa2de846780063252cde.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/44x: Activate KUEP at all time  (Christophe Leroy)
On 44x, KUEP is implemented by clearing the SX bit during TLB miss for user pages. The impact is minimal and not worth either boot-time or build-time selection. Activate it at all times. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2414d662558e7fb27d1ed41c8e47c591d576acac.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/40x: Map 32Mbytes of memory at startup  (Christophe Leroy)
As reported by Carlo, 16Mbytes is not enough with modern kernels that tend to be a bit big, so map another 16M page at boot. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/89b5f974a7fa5011206682cd092e2c905530ff46.1632755552.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU  (Nicholas Piggin)
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves 128kB kernel image size (90kB text) on powernv_defconfig minus KVM, 350kB on pseries_defconfig minus KVM, 40kB on a tiny config. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG. Fix radix_enabled() use in setup_initial_memory_limit(). Add some stubs to reduce number of ifdefs.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
2021-12-09  powerpc/64s: Make hash MMU support configurable  (Nicholas Piggin)
This adds Kconfig selection which allows 64s hash MMU support to be disabled. It can be disabled if radix support is enabled, the minimum supported CPU type is POWER9 (or higher), and KVM is not selected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com
2021-12-02  powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radix  (Nicholas Piggin)
Radix never sets mmu_linear_psize so it's always 4K, which causes pcpu atom_size to always be PAGE_SIZE. 64e sets it to 1GB always. Make paths for these platforms to be explicit about what value they set atom_size to. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-12-npiggin@gmail.com
2021-12-02  powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled  (Nicholas Piggin)
The radix test can exclude slb_flush_all_realmode() from being called because flush_and_reload_slb() is only expected to flush ERAT when called by flush_erat(), which is only on pre-ISA v3.0 CPUs that do not support radix. This helps the later change to make hash support configurable to not introduce runtime changes to radix mode behaviour. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-9-npiggin@gmail.com
2021-12-02  powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific  (Nicholas Piggin)
slb.c is hash-specific SLB management, but do_bad_slb_fault deals with segment interrupts that occur with radix MMU as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com
2021-12-02  powerpc/signal32: Use struct_group() to zero spe regs  (Kees Cook)
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add a struct_group() for the spe registers so that memset() can correctly reason about the size:

  In function 'fortify_memset_chk',
      inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3:
  >> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
    195 |    __write_overflow_field();
        |    ^~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118203604.1288379-1-keescook@chromium.org
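A sketch of the struct_group() conversion (field names are taken from the SPE portion of thread_struct; the exact layout may differ):

  struct thread_struct {
          /* ... */
          struct_group(spe,
                  unsigned long evr[32];  /* upper 32-bits of SPE regs */
                  u64 acc;                /* accumulator */
                  unsigned long spefscr;  /* SPE & eFP status */
          );
          /* ... */
  };

  /* The whole group can now be zeroed without FORTIFY_SOURCE complaining
   * about a write beyond the size of a single field. */
  memset(&current->thread.spe, 0, sizeof(current->thread.spe));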
2021-11-30  powerpc/modules: Don't WARN on first module allocation attempt  (Christophe Leroy)
module_alloc() first tries to allocate module text within a 24-bit direct jump from kernel text, and tries a wider allocation if the first one fails. When the first allocation fails the following is observed in kernel logs:

  vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size
  systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null)
  CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G W 5.15.5-gentoo-PowerMacG4 #9
  Call Trace:
  [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
  [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4
  [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c
  [e2a53c10] [c00338c0] module_alloc+0xa0/0xac
  [e2a53c40] [c027a368] load_module+0x2ae0/0x8148
  [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130
  [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28
  ...

Add the __GFP_NOWARN flag to the first allocation so that no warning appears when it fails.

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Fixes: 2ec13df16704 ("powerpc/modules: Load modules closer to kernel text")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu
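Schematically, the fix looks like the sketch below; the helper and its arguments are simplified stand-ins, not the real powerpc module_alloc() internals:

  void *module_alloc(unsigned long size)
  {
          /* First, range-limited attempt: failure is expected and handled,
           * so suppress the vmalloc warning with __GFP_NOWARN. */
          void *ptr = alloc_in_range(size, limit_start, limit_end,
                                     GFP_KERNEL | __GFP_NOWARN);  /* hypothetical helper */

          if (!ptr)  /* fall back to the full vmalloc range, warnings enabled */
                  ptr = alloc_in_range(size, VMALLOC_START, VMALLOC_END, GFP_KERNEL);

          return ptr;
  }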
2021-11-29  powerpc: flexible GPR range save/restore macros  (Nicholas Piggin)
Introduce macros that operate on a (start, end) range of GPRs, which reduces lines of code and need to do mental arithmetic while reading the code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com
2021-11-29  powerpc/watchdog: help remote CPUs to flush NMI printk output  (Nicholas Piggin)
The printk layer at the moment does not seem to have a good way to force flush printk messages that are created in NMI context, except in the panic path. NMI-context printk messages normally get to the console with irq_work, but that won't help if the CPU is stuck with irqs disabled, as can be the case for hard lockup watchdog messages. The watchdog currently flushes the printk buffers after detecting a lockup on remote CPUs, but they may not have processed their NMI IPI yet by that stage, or they may have self-detected a lockup in which case they won't go via this NMI IPI path. Improve the situation by having NMI-context mark a flag if it called printk, and have watchdog timer interrupts check if that flag was set and try to flush if it was. Latency is not a big problem because we were already stuck for a while, just need to try to make sure the messages eventually make it out. Depends-on: 5d5e4522a7f4 ("printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211119113146.752759-6-npiggin@gmail.com
2021-11-29  powerpc: Don't bother about .data..Lubsan sections  (Christophe Leroy)
Since commit 9a427556fb8e ("vmlinux.lds.h: catch compound literals into data and BSS") .data..Lubsan sections are taken into account in DATA_MAIN which is included in DATA_DATA macro. No need to take care of them anymore in powerpc vmlinux.lds.S Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3eb14570612eef17e01bb67f14a4450136001794.1637840601.git.christophe.leroy@csgroup.eu
2021-11-29  powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32  (Christophe Leroy)
Unlike PPC64, PPC32 doesn't require any special compiler option to keep the _mcount() call from clobbering registers. Provide ftrace_regs_caller() and ftrace_regs_call() and activate HAVE_DYNAMIC_FTRACE_WITH_REGS. This is heavily copied from ftrace_64_mprofile.S. For the time being, leave livepatching aside; it will come with a following patch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1862dc7719855cc2a4eec80920d94c955877557e.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29  powerpc/ftrace: Add module_trampoline_target() for PPC32  (Christophe Leroy)
module_trampoline_target() is used by __ftrace_modify_call(). Implement it for PPC32 so that CONFIG_DYNAMIC_FTRACE_WITH_REGS can be activated on PPC32 as well. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/42345f464fb465f0fc76f3090e250be8fc1729f0.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29  powerpc/ftrace: No need to read LR from stack in _mcount()  (Christophe Leroy)
All functions calling _mcount do it exactly the same way, with the following sequence of instructions:

  c07de788:  7c 08 02 a6  mflr    r0
  c07de78c:  90 01 00 04  stw     r0,4(r1)
  c07de790:  4b 84 13 65  bl      c001faf4 <_mcount>

Although LR is pushed on the stack, it is still in r0 while entering _mcount(). Function arguments are in r3-r10, so r11 and r12 are still available at that point. Do like PPC64 and use r12 to move LR into CTR, so that r0 is preserved and doesn't need to be restored from the stack. While at it, bring back the EXPORT_SYMBOL at the end of _mcount.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a3ba7db388537c44a038026f926d885372e6d3.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29  powerpc: Mark probe_machine() __init and static  (Michael Ellerman)
Prior to commit b1923caa6e64 ("powerpc: Merge 32-bit and 64-bit setup_arch()") probe_machine() was called from setup_32/64.c and lived in setup-common.c. But now it's only called from setup-common.c so it can be static and __init, and we don't need the declaration in machdep.h either. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-6-mpe@ellerman.id.au
2021-11-29  powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING  (Michael Ellerman)
setup_profiling_timer() is only needed when CONFIG_PROFILING is enabled. Fixes the following W=1 warning when CONFIG_PROFILING=n: linux/arch/powerpc/kernel/smp.c:1638:5: error: no previous prototype for ‘setup_profiling_timer’ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-5-mpe@ellerman.id.au
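The change amounts to wrapping the stub in the config option it exists for, roughly:

  #ifdef CONFIG_PROFILING
  int setup_profiling_timer(unsigned int multiplier)
  {
          return 0;
  }
  #endif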
2021-11-25  powerpc/watchdog: Fix wd_smp_last_reset_tb reporting  (Nicholas Piggin)
wd_smp_last_reset_tb now gets reset by watchdog_smp_panic() as part of marking CPUs stuck and removing them from the pending mask before it begins any printing. This causes last reset times reported to be off. Fix this by reading it into a local variable before it gets reset. Fixes: 76521c4b0291 ("powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211125103346.1188958-1-npiggin@gmail.com
2021-11-25  powerpc/watchdog: read TB close to where it is used  (Nicholas Piggin)
When taking watchdog actions, printing messages, comparing and re-setting wd_smp_last_reset_tb, etc., read TB close to the point of use and under wd_smp_lock or printing lock (if applicable). This should keep timebase mostly monotonic with kernel log messages, and could prevent (in theory) a laggy CPU updating wd_smp_last_reset_tb to something a long way in the past, and causing other CPUs to appear to be stuck. These additional TB reads are all slowpath (lockup has been detected), so performance does not matter. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211110025056.2084347-5-npiggin@gmail.com
2021-11-25  powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi  (Nicholas Piggin)
There is a deadlock with the console_owner lock and the wd_smp_lock:

  CPU x takes the console_owner lock
  CPU y takes a watchdog timer interrupt and takes __wd_smp_lock
  CPU x takes a soft-NMI interrupt, detects deadlock, spins on __wd_smp_lock
  CPU y detects deadlock, tries to print something and spins on console_owner
  -> deadlock

Change the watchdog locking scheme so wd_smp_lock protects the watchdog internal data, but "reporting" (printing, issuing NMI IPIs, taking any action outside of the watchdog) uses a non-waiting exclusion. If a CPU detects a problem but can not take the reporting lock, it just returns because something else is already reporting. It will try again at some point.

Typically, hard lockup watchdog reports lose their usefulness not because they fail to spew a large enough amount of data in as short a time as possible, but because the messages get garbled.

Laurent debugged this and found the deadlock, and this patch is based on his general approach of avoiding expensive operations while holding the lock, with the addition of the reporting exclusion.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
[np: rework to add reporting exclusion, update changelog]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-4-npiggin@gmail.com
2021-11-25  powerpc/watchdog: tighten non-atomic read-modify-write access  (Nicholas Piggin)
Most updates to wd_smp_cpus_pending are under lock except the watchdog interrupt bit clear. This can race with non-atomic RMW updates to the mask under lock, which can happen in two instances: Firstly, if another CPU detects this one is stuck, removes it from the mask, mask becomes empty and is re-filled with non-atomic stores. This is okay because it would re-fill the mask with this CPU's bit clear anyway (because this CPU is now stuck), so it doesn't matter that the bit clear update got "lost". Add a comment for this. Secondly, if another CPU detects a different CPU is stuck and removes it from the pending mask with a non-atomic store to bytes which also include the bit of this CPU. This case can result in the bit clear being lost and the end result being the bit is set. This should be so rare it hardly matters, but to make things simpler to reason about just avoid the non-atomic access for that case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211110025056.2084347-3-npiggin@gmail.com
2021-11-25  powerpc/watchdog: Fix missed watchdog reset due to memory ordering race  (Nicholas Piggin)
It is possible for all CPUs to miss the pending cpumask becoming clear, and then nobody resetting it, which will cause the lockup detector to stop working. It will eventually expire, but watchdog_smp_panic will avoid doing anything if the pending mask is clear and it will never be reset. Order the cpumask clear vs the subsequent test to close this race. Add an extra check for an empty pending mask when the watchdog fires and finds its bit still clear, to try to catch any other possible races or bugs here and keep the watchdog working. The extra test in arch_touch_nmi_watchdog is required to prevent the new warning from firing off. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Debugged-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211110025056.2084347-2-npiggin@gmail.com
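A simplified sketch of the ordering being fixed (not the exact kernel code; the refill helper name is illustrative):

  /* Clearing our bit must be ordered before testing whether the whole
   * pending mask became empty, otherwise every CPU can miss the
   * transition and nobody refills the mask. */
  cpumask_clear_cpu(cpu, &wd_smp_cpus_pending);
  smp_mb();
  if (cpumask_empty(&wd_smp_cpus_pending))
          wd_smp_refill_pending();        /* hypothetical refill helper */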
2021-11-25  powerpc/prom_init: Fix improper check of prom_getprop()  (Peiwei Hu)
prom_getprop() can return PROM_ERROR, which the binary operator used in the existing check cannot identify. Fixes: 94d2dde738a5 ("[POWERPC] Efika: prune fixups and make them more carefull") Signed-off-by: Peiwei Hu <jlu.hpw@foxmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/tencent_BA28CC6897B7C95A92EB8C580B5D18589105@qq.com
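The general point, as a hedged sketch (the property name and surrounding code are illustrative, not the exact call site being fixed): the PROM_ERROR sentinel has to be compared against explicitly rather than folded into a boolean or relational test of the returned length.

  /* prom_getprop() returns the property length, or PROM_ERROR on failure,
   * so check for the error value explicitly. */
  if (prom_getprop(node, "model", model, sizeof(model)) == PROM_ERROR)
          return;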
2021-11-25  powerpc/rtas: rtas_busy_delay_time() kernel-doc  (Nathan Lynch)
Provide API documentation for rtas_busy_delay_time(), explaining why we return the same value for 9900 and -2. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211117060259.957178-3-nathanl@linux.ibm.com
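For context, a hedged re-implementation of the mapping the kernel-doc explains: RTAS_BUSY (-2) and extended-delay status 9900 both translate to the same minimal wait, because 10^(9900 - 9900) = 1.

  /* Illustrative only; the real function is rtas_busy_delay_time() in
   * arch/powerpc/kernel/rtas.c. */
  static unsigned int busy_delay_ms(int status)
  {
          unsigned int ms = 0;

          if (status == RTAS_BUSY) {                       /* -2 */
                  ms = 1;
          } else if (status >= RTAS_EXTENDED_DELAY_MIN &&  /* 9900 */
                     status <= RTAS_EXTENDED_DELAY_MAX) {  /* 9905 */
                  unsigned int order = status - RTAS_EXTENDED_DELAY_MIN;

                  for (ms = 1; order > 0; order--)
                          ms *= 10;                        /* 9900 -> 1 ms, same as RTAS_BUSY */
          }

          return ms;
  }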