summaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms/powernv/opal.c
AgeCommit message (Collapse)Author
2019-03-12treewide: add checks for the return value of memblock_alloc*()Mike Rapoport
Add check for the return value of memblock_alloc*() functions and call panic() in case of error. The panic message repeats the one used by panicing memblock allocators with adjustment of parameters to include only relevant ones. The replacement was mostly automated with semantic patches like the one below with manual massaging of format strings. @@ expression ptr, size, align; @@ ptr = memblock_alloc(size, align); + if (!ptr) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align); [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type] Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc] Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails] Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx [akpm@linux-foundation.org: fix xtensa printk warning] Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Guo Ren <ren_guo@c-sky.com> [c-sky] Acked-by: Paul Burton <paul.burton@mips.com> [MIPS] Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Reviewed-by: Juergen Gross <jgross@suse.com> [Xen] Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - some of the rest of MM - various misc things - dynamic-debug updates - checkpatch - some epoll speedups - autofs - rapidio - lib/, lib/lzo/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) samples/mic/mpssd/mpssd.h: remove duplicate header kernel/fork.c: remove duplicated include include/linux/relay.h: fix percpu annotation in struct rchan arch/nios2/mm/fault.c: remove duplicate include unicore32: stop printing the virtual memory layout MAINTAINERS: fix GTA02 entry and mark as orphan mm: create the new vm_fault_t type arm, s390, unicore32: remove oneliner wrappers for memblock_alloc() arch: simplify several early memory allocations openrisc: simplify pte_alloc_one_kernel() sh: prefer memblock APIs returning virtual address microblaze: prefer memblock API returning virtual address powerpc: prefer memblock APIs returning virtual address lib/lzo: separate lzo-rle from lzo lib/lzo: implement run-length encoding lib/lzo: fast 8-byte copy on arm64 lib/lzo: 64-bit CTZ on arm64 lib/lzo: tidy-up ifdefs ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size ipc: annotate implicit fall through ...
2019-03-07arch: simplify several early memory allocationsMike Rapoport
There are several early memory allocations in arch/ code that use memblock_phys_alloc() to allocate memory, convert the returned physical address to the virtual address and then set the allocated memory to zero. Exactly the same behaviour can be achieved simply by calling memblock_alloc(): it allocates the memory in the same way as memblock_phys_alloc(), then it performs the phys_to_virt() conversion and clears the allocated memory. Replace the longer sequence with a simpler call to memblock_alloc(). Link: http://lkml.kernel.org/r/1546248566-14910-6-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-22Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge commits we're sharing with kvm-ppc tree.
2019-02-21powerpc/64s: Better printing of machine check info for guest MCEsPaul Mackerras
This adds an "in_guest" parameter to machine_check_print_event_info() so that we can avoid trying to translate guest NIP values into symbolic form using the host kernel's symbol table. Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-30powerpc/powernv: Remove duplicate headerSabyasachi Gupta
Remove linux/printk.h which is included more than once. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22powerpc: Use of_node_name_eq for node name comparisonsRob Herring
Convert string compares of DT node names to use of_node_name_eq helper instead. This removes direct access to the node name pointer. A couple of open coded iterating thru the child node names are converted to use for_each_child_of_node() instead. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21powerpc/powernv: Move opal_power_control_init() call in opal_init().Mahesh Salgaonkar
opal_power_control_init() depends on opal message notifier to be initialized, which is done in opal_init()->opal_message_init(). But both these initialization are called through machine initcalls and it all depends on in which order they being called. So far these are called in correct order (may be we got lucky) and never saw any issue. But it is clearer to control initialization order explicitly by moving opal_power_control_init() into opal_init(). Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-31memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*Mike Rapoport
Make it explicit that the caller gets a physical address rather than a virtual one. This will also allow using meblock_alloc prefix for memblock allocations returning virtual address, which is done in the following patches. The conversion is done using the following semantic patch: @@ expression e1, e2, e3; @@ ( - memblock_alloc(e1, e2) + memblock_phys_alloc(e1, e2) | - memblock_alloc_nid(e1, e2, e3) + memblock_phys_alloc_nid(e1, e2, e3) | - memblock_alloc_try_nid(e1, e2, e3) + memblock_phys_alloc_try_nid(e1, e2, e3) ) Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-03powerpc/64s: consolidate MCE counter increment.Michal Suchanek
The code in machine_check_exception excludes 64s hvmode when incrementing the MCE counter only to call opal_machine_check to increment it specifically for this case. Remove the exclusion and special case. Fixes: a43c1590426c ("powerpc/pseries: Flush SLB contents on SLB MCE errors.") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/powernv: Mark function as __noreturnBreno Leitao
There is a mismatch between function pnv_platform_error_reboot() definition and declaration regarding function modifiers. In the declaration part, it contains the function attribute __noreturn, while function definition itself lacks it. This was reported by sparse tool as an error: arch/powerpc/platforms/powernv/opal.c:538:6: error: symbol 'pnv_platform_error_reboot' redeclared with different type (originally declared at arch/powerpc/platforms/powernv/powernv.h:11) - different modifiers I checked and the function is already being considered as being 'noreturn' by the compiler, thus, I understand this patch does not change any code being generated. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/pseries: Flush SLB contents on SLB MCE errors.Mahesh Salgaonkar
On pseries, as of today system crashes if we get a machine check exceptions due to SLB errors. These are soft errors and can be fixed by flushing the SLBs so the kernel can continue to function instead of system crash. We do this in real mode before turning on MMU. Otherwise we would run into nested machine checks. This patch now fetches the rtas error log in real mode and flushes the SLBs on SLB/ERAT errors. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michal Suchanek <msuchanek@suse.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-20powerpc/powernv: provide a console flush operation for opal hvc driverNicholas Piggin
Provide the flush hv_op for the opal hvc driver. This will flush the firmware console buffers without spinning with interrupts disabled. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08crypto/nx: Initialize 842 high and normal RxFIFO control registersHaren Myneni
NX increments readOffset by FIFO size in receive FIFO control register when CRB is read. But the index in RxFIFO has to match with the corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX may be processing incorrect CRBs and can cause CRB timeout. VAS FIFO offset is 0 when the receive window is opened during initialization. When the module is reloaded or in kexec boot, readOffset in FIFO control register may not match with VAS entry. This patch adds nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO control register for both high and normal FIFOs. Signed-off-by: Haren Myneni <haren@us.ibm.com> [mpe: Fixup uninitialized variable warning] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08powerpc/powernv: Export opal_check_token symbolHaren Myneni
Export opal_check_token symbol for modules to check the availability of OPAL calls before using them. Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24powerpc/powernv: implement opal_put_chars_atomicNicholas Piggin
The RAW console does not need writes to be atomic, so relax opal_put_chars to be able to do partial writes, and implement an _atomic variant which does not take a spinlock. This API is used in xmon, so the less locking that is used, the better chance there is that a crash can be debugged. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24powerpc/powernv: move opal console flushing to udbgNicholas Piggin
OPAL console writes do not have to synchronously flush firmware / hardware buffers unless they are going through the udbg path. Remove the unconditional flushing from opal_put_chars. Flush if there was no space in the buffer as an optimisation (callers loop waiting for success in that case). udbg flushing is moved to udbg_opal_putc. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24powerpc/powernv: Remove OPALv1 support from opal console driverNicholas Piggin
opal_put_chars deals with partial writes because in OPALv1, opal_console_write_buffer_space did not work correctly. That firmware is not supported. This reworks the opal_put_chars code to no longer deal with partial writes by turning them into full writes. Partial write handling is still supported in terms of what gets returned to the caller, but it may not go to the console atomically. A warning message is printed in this case. This allows console flushing to be moved out of the opal_write_lock spinlock. That could cause the lock to be held for long periods if the console is busy (especially if it was being spammed by firmware), which is dangerous because the lock is taken by xmon to debug the system. Flushing outside the lock improves the situation a bit. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24powerpc/powernv: Implement and use opal_flush_consoleNicholas Piggin
A new console flushing firmware API was introduced to replace event polling loops, and implemented in opal-kmsg with affddff69c55e ("powerpc/powernv: Add a kmsg_dumper that flushes console output on panic"), to flush the console in the panic path. The OPAL console driver has other situations where interrupts are off and it needs to flush the console synchronously. These still use a polling loop. So move the opal-kmsg flush code to opal_flush_console, and use the new function in opal-kmsg and opal_put_chars. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24powerpc/powernv: Fix OPAL console driver OPAL_BUSY loopsNicholas Piggin
The OPAL console driver does not delay in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware. It can't yet be made to sleep because it is called under spinlock, but it can be changed to the standard OPAL_BUSY loop form, and a delay added to keep it from hitting the firmware too frequently. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24powerpc/powernv: opal_put_chars partial write fixNicholas Piggin
The intention here is to consume and discard the remaining buffer upon error. This works if there has not been a previous partial write. If there has been, then total_len is no longer total number of bytes to copy. total_len is always "bytes left to copy", so it should be added to written bytes. This code may not be exercised any more if partial writes will not be hit, but this is a small bugfix before a larger change. Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/powernv: process all OPAL event interrupts with kopaldNicholas Piggin
Using irq_work for processing OPAL event interrupts is not necessary. irq_work is typically used to schedule work from NMI context, a softirq may be more appropriate. However OPAL events are not particularly performance or latency critical, so they can all be invoked by kopald. This patch removes the irq_work queueing, and instead wakes up kopald when there is an event to be processed. kopald processes interrupts individually, enabling irqs and calling cond_resched between each one to minimise latencies. Event handlers themselves should still use threaded handlers, workqueues, etc. as necessary to avoid high interrupts-off latencies within any single interrupt. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-10Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
2018-04-07powerpc/powernv: Create platform devs for nvdimm busesOliver O'Halloran
Scan the devicetree for an nvdimm-bus compatible and create a platform device for them. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-13powerpc/powernv/mce: Don't silently restart the machineBalbir Singh
On MCE the current code will restart the machine with ppc_md.restart(). This case was extremely unlikely since prior to that a skiboot call is made and that resulted in a checkstop for analysis. With newer skiboots, on P9 we don't checkstop the box by default, instead we return back to the kernel to extract useful information at the time of the MCE. While we still get this information, this patch converts the restart to a panic(), so that if configured a dump can be taken and we can track and probably debug the potential issue causing the MCE. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22powerpc/pseries, ps3: panic flush kernel messages before halting systemNicholas Piggin
Platforms with a panic handler that halts the system can have problems getting kernel messages out, because the panic notifiers are called before kernel/panic.c does its flushing of printk buffers an console etc. This was attempted to be solved with commit a3b2cb30f252 ("powerpc: Do not call ppc_md.panic in fadump panic notifier"), but that wasn't the right approach and caused other problems, and was reverted by commit ab9dbf771ff9. Instead, the powernv shutdown paths have already had a similar problem, fixed by taking the message flushing sequence from kernel/panic.c. That's a little bit ugly, but while we have the code duplicated, it will work for this case as well. So have ppc panic handlers do the same flushing before they terminate. Without this patch, a qemu pseries_le_defconfig guest stops silently when issued the nmi command when xmon is off and no crash dumpers enabled. Afterwards, an oops is printed by each CPU as expected. Fixes: ab9dbf771ff9 ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11powerpc: Reduce log level of "OPAL detected !" messageBenjamin Herrenschmidt
This message isn't terribly useful. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04powerpc: Use pr_warn instead of pr_warningJoe Perches
At some point, pr_warning will be removed so all logging messages use a consistent <prefix>_warn style. Update arch/powerpc/ Miscellanea: o Coalesce formats o Realign arguments o Use %s, __func__ instead of embedded function names o Remove unnecessary line continuations Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Geoff Levand <geoff@infradead.org> [mpe: Rebase due to some %pOF changes.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/powernv: Add OPAL_BUSY to opal_error_code()Cyril Bur
Also export opal_error_code() so that it can be used in modules Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/powernv: Use kernel crash path for machine checksNicholas Piggin
There are quite a few machine check exceptions that can be caused by kernel bugs. To make debugging easier, use the kernel crash path in cases of synchronous machine checks that occur in kernel mode, if that would not result in the machine going straight to panic or crash dump. There is a downside here that die()ing the process in kernel mode can still leave the system unstable. panic_on_oops will always force the system to fail-stop, so systems where that behaviour is important will still do the right thing. As a test, when triggering an i-side 0111b error (ifetch from foreign address) in kernel mode process context on POWER9, the kernel currently dies quickly like this: Severe Machine check interrupt [Not recovered] NIP [ffff000000000000]: 0xffff000000000000 Initiator: CPU Error type: Real address [Instruction fetch (foreign)] [ 127.426651616,0] OPAL: Reboot requested due to Platform error. Effective[ 127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000 opal: Reboot type 1 not supported Kernel panic - not syncing: PowerNV Unrecovered Machine Check CPU: 56 PID: 4425 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #35 Call Trace: [ 128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to buggy/mising code in OPAL for this BMC Rebooting in 10 seconds.. Trying to free IRQ 496 from IRQ context! After this patch, the process is killed and the kernel continues with this message, which gives enough information to identify the offending branch (i.e., with CFAR): Severe Machine check interrupt [Not recovered] NIP [ffff000000000000]: 0xffff000000000000 Initiator: CPU Error type: Real address [Instruction fetch (foreign)] Effective address: ffff000000000000 Oops: Machine check, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ... CPU: 22 PID: 4436 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #36 task: c000000932300000 task.stack: c000000932380000 NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000 REGS: c00000000fc8fd80 TRAP: 0200 Tainted: G M (4.12.0-rc1-13857-ga4700a261072-dirty) MSR: 90000000001c1003 <SF,HV,ME,RI,LE> CR: 24000484 XER: 20000000 CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1 GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000 GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8 GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030 GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918 NIP [ffff000000000000] 0xffff000000000000 LR [00000000217706a4] 0x217706a4 Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/powernv: Flush console before platform error rebootNicholas Piggin
Unrecovered MCE and HMI errors are sent through a special restart OPAL call to log the platform error. The downside is that they don't go through normal Linux crash paths, so they don't give much information to the Linux console. Change this by providing a special crash function which does some of the console flushing from the panic() path before calling firmware to reboot. The downside of this is a little more code to execute before reaching the firmware reboot. However in practice, it's critical to get the Linux console messages output in order to debug a problem. So this is a desirable tradeoff. Note on the implementation: It is difficult to plumb a custom reboot handler into the panic path, because panic does a little bit too much work. For example, it will try to delay with the timebase, but that may be corrupted in some cases resulting in a hang without reaching the platform reboot. Another problem is that panic can invoke the crash dump code which is not what we want in the case of a hardware platform error. Long-term the best solution will be to rework the panic path so it can be suitable for this kind of panic, but for now we just duplicate a bit of the code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/powernv: powernv platform is not constrained by RMANicholas Piggin
Remove incorrect comment about real mode address restrictions on powernv (bare metal), and unnecessary clamping to ppc64_rma_size. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10powerpc/powernv: Add support to clear sensor groups dataShilpasri G Bhat
Adds support for clearing different sensor groups. OCC inband sensor groups like CSM, Profiler, Job Scheduler can be cleared using this driver. The min/max of all sensors belonging to these sensor groups will be cleared. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10powerpc/powernv: Add support to set power-shifting-ratioShilpasri G Bhat
This patch adds support to set power-shifting-ratio which hints the firmware how to distribute/throttle power between different entities in a system (e.g CPU v/s GPU). This ratio is used by OCC for power capping algorithm. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10powerpc/powernv: Add support for powercap frameworkShilpasri G Bhat
Adds a generic powercap framework to change the system powercap inband through OPAL-OCC command/response interface. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-25powerpc/powernv: Detect and create IMC deviceMadhavan Srinivasan
Code to create platform device for the In-Memory Collection (IMC) counters. Platform devices are created based on the IMC compatibility. New header file created to contain the data structures and macros needed for In-Memory Collection (IMC) counter pmu devices. The device tree for IMC counters starts at the node "imc-counters". This node contains all the IMC PMU nodes and event nodes for these IMC PMUs. Device probe() parses the device to locate three possible IMC device types (Nest/Core/Thread). Function then branch to parse each unit nodes to populate vital information such as device memory sizes, event nodes information, base address for reserve memory access (if any) and so on. Simple bare-minimum shutdown function added which only "stops" the engines. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Fix build with CONFIG_PERF_EVENTS=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-17powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()Michael Ellerman
In commit 1c0eaf0f56d6 ("powerpc/powernv: Tell OPAL about our MMU mode on POWER9"), we added additional flags to the OPAL call to configure CPUs at boot. These flags only work on Power9 firmwares, and worse can cause boot failures on Power8 machines, so we check for CPU_FTR_ARCH_300 (aka POWER9) before adding the extra flags. Unfortunately we forgot that opal_configure_cores() is called before the CPU feature checks are dynamically patched, meaning the check always returns true. We definitely need to do something to make the CPU feature checks less prone to bugs like this, but for now the minimal fix is to use early_cpu_has_feature(). Reported-and-tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Fixes: 1c0eaf0f56d6 ("powerpc/powernv: Tell OPAL about our MMU mode on POWER9") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-10powerpc/powernv: Tell OPAL about our MMU mode on POWER9Benjamin Herrenschmidt
That will allow OPAL to configure the CPU in an optimal way. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-09Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The main thing here is a new implementation of the in-kernel XICS interrupt controller emulation for POWER9 machines, from Ben Herrenschmidt. POWER9 has a new interrupt controller called XIVE (eXternal Interrupt Virtualization Engine) which is able to deliver interrupts directly to guest virtual CPUs in hardware without hypervisor intervention. With this new code, the guest still sees the old XICS interface but performance is better because the XICS emulation in the host uses the XIVE directly rather than going through a XICS emulation in firmware. Conflicts: arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix] arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-04-27KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-26powerpc/powernv: Fix missing attr initialisation in opal_export_attrs()Michael Ellerman
In opal_export_attrs() we dynamically allocate some bin_attributes. They're allocated with kmalloc() and although we initialise most of the fields, we don't initialise write() or mmap(), and in particular we don't initialise the lockdep related fields in the embedded struct attribute. This leads to a lockdep warning at boot: BUG: key c0000000f11906d8 not in .data! WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3136 lockdep_init_map+0x28c/0x2a0 ... Call Trace: lockdep_init_map+0x288/0x2a0 (unreliable) __kernfs_create_file+0x8c/0x170 sysfs_add_file_mode_ns+0xc8/0x240 __machine_initcall_powernv_opal_init+0x60c/0x684 do_one_initcall+0x60/0x1c0 kernel_init_freeable+0x2f4/0x3d4 kernel_init+0x24/0x160 ret_from_kernel_thread+0x5c/0xb0 Fix it by kzalloc'ing the attr, which fixes the uninitialised write() and mmap(), and calling sysfs_bin_attr_init() on it to initialise the lockdep fields. Fixes: 11fe909d2362 ("powerpc/powernv: Add OPAL exports attributes to sysfs") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04powerpc/powernv: Add OPAL exports attributes to sysfsMatt Brown
New versions of OPAL have a device node /ibm,opal/firmware/exports, each property of which describes a range of memory in OPAL that Linux might want to export to userspace for debugging. This patch adds a sysfs file under 'opal/exports' for each property found there, and makes it read-only by root. Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> [mpe: Drop counting of props, rename to attr, free on sysfs error, c'log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-03powerpc/book3s: Print task info if we take a machine check in user modeMichael Ellerman
For an MCE (Machine Check Exception) that hits while in user mode MSR(PR=1), print the task info to the console MCE error log. This may help to identify an application that triggered the MCE. After this patch the MCE console looks like: Severe Machine check interrupt [Recovered] NIP: [0000000010039778] PID: 762 Comm: ebizzy Initiator: CPU Error type: SLB [Multihit] Effective address: 0000000010039778 Severe Machine check interrupt [Not recovered] NIP: [0000000010039778] PID: 763 Comm: ebizzy Initiator: CPU Error type: UE [Page table walk ifetch] Effective address: 0000000010039778 ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004 Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-10powerpc/64s: fix handling of non-synchronous machine checksNicholas Piggin
A synchronous machine check is an exception raised by the attempt to execute the current instruction. If the error can't be corrected, it can make sense to SIGBUS the currently running process. In other cases, the error condition is not related to the current instruction, so killing the current process is not the right thing to do. Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no practical change. It will be used to handle POWER9 asynchronous machine checks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30powerpc/powernv: Initialise nest mmuAlistair Popple
POWER9 contains an off core mmu called the nest mmu (NMMU). This is used by other hardware units on the chip to translate virtual addresses into real addresses. The unit attempting an address translation provides the majority of the context required for the translation request except for the base address of the partition table (ie. the PTCR) which needs to be programmed into the NMMU. This patch adds a call to OPAL to set the PTCR for the nest mmu in opal_init(). Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-24Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge the topic branch we're sharing with the kvm-ppc tree.
2016-11-23powerpc/powernv: Define real-mode versions of OPAL XICS accessorsPaul Mackerras
This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi() and opal_int_set_mfrr(), for use by KVM real-mode code. It also exports opal_int_set_mfrr() so that the modular part of KVM can use it to send IPIs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-14powerpc/powernv: Simplify searching for compatible device nodesJack Miller
This condenses the opal node searching into a single function that finds all compatible nodes, instead of just searching the ibm,opal children, for ipmi, flash, and prd similar to how opal-i2c nodes are found. Signed-off-by: Jack Miller <jack@codezen.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09powerpc/book3s: Fix MCE console messages for unrecoverable MCE.Mahesh Salgaonkar
When machine check occurs with MSR(RI=0), it means MC interrupt is unrecoverable and kernel goes down to panic path. But the console message still shows it as recovered. This patch fixes the MCE console messages. Fixes: 36df96f8acaf ("powerpc/book3s: Decode and save machine check event.") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Put exception configuration in a common placeBenjamin Herrenschmidt
The various calls to establish exception endianness and AIL are now done from a single point using already established CPU and FW feature bits to decide what to do. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>