summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel/fadump.c
AgeCommit message (Collapse)Author
2019-11-13powerpc/fadump: when fadump is supported register the fadump sysfs files.Michal Suchanek
Currently it is not possible to distinguish the case when fadump is supported by firmware and disabled in kernel and completely unsupported using the kernel sysfs interface. User can investigate the devicetree but it is more reasonable to provide sysfs files in case we get some fadumpv2 in the future. With this patch sysfs files are available whenever fadump is supported by firmware. There is duplicate message about lack of support by firmware in fadump_reserve_mem and setup_fadump. Remove the duplicate message in setup_fadump. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191107164757.15140-1-msuchanek@suse.de
2019-09-14powerpc/fadump: support holes in kernel boot memory areaHari Bathini
With support to copy multiple kernel boot memory regions owing to copy size limitation, also handle holes in the memory area to be preserved. Support as many as 128 kernel boot memory regions. This allows having an adequate FADump capture kernel size for different scenarios. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821385448.5656.6124791213910877759.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: remove RMA_START and RMA_END macrosHari Bathini
RMA_START is defined as '0' and there is even a BUILD_BUG_ON() to make sure it is never anything else. Remove this macro and use '0' instead as code change is needed anyway when it has to be something else. Also, remove unused RMA_END macro. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821384096.5656.15026984053970204652.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: consider f/w load areaHari Bathini
OPAL loads kernel & initrd at 512MB offset (256MB size), also exported as ibm,opal/dump/fw-load-area. So, if boot memory size of FADump is less than 768MB, kernel memory to be exported as '/proc/vmcore' would be overwritten by f/w while loading kernel & initrd. To avoid such a scenario, enforce a minimum boot memory size of 768MB on OPAL platform and skip using FADump if a newer F/W version loads kernel & initrd above 768MB. Also, irrespective of RMA size, set the minimum boot memory size expected on pseries platform at 320MB. This is to avoid inflating the minimum memory requirements on systems with 512M/1024M RMA size. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821381414.5656.1592867278535469652.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: add support to preserve crash data on FADUMP disabled kernelHari Bathini
Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP that ensures that crash data, from previously crash'ed kernel, is preserved. This helps in cases where FADump is not enabled but the subsequent memory preserving kernel boot is likely to process this crash data. One typical usecase for this config option is petitboot kernel. As OPAL allows registering address with it in the first kernel and retrieving it after MPIPL, use it to store the top of boot memory. A kernel that intends to preserve crash data retrieves it and avoids using memory beyond this address. Move arch_reserved_kernel_pages() function as it is needed for both FA_DUMP and PRESERVE_FA_DUMP configurations. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821375751.5656.11459483669542541602.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: improve how crashed kernel's memory is reservedHari Bathini
The size parameter to fadump_reserve_crash_area() function is not needed as all the memory above boot memory size must be preserved anyway. Update the function by dropping this redundant parameter. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821374440.5656.2945512543806951766.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: consider reserved ranges while releasing memoryHari Bathini
Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for memory reservations") enabled support to parse 'reserved-ranges' DT node to reserve kernel memory falling in these ranges for firmware purposes. Along with the preserved area memory, ensure memory in reserved ranges is not overlapped with memory released by capture kernel aftering saving vmcore. Also, fix the off-by-one error in fadump_release_reserved_area function while releasing memory. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821371358.5656.6061214942558818661.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: make crash memory ranges array allocation genericHari Bathini
Make allocate_crash_memory_ranges() and free_crash_memory_ranges() functions generic to reuse them for memory management of all types of dynamic memory range arrays. This change helps in memory management of reserved ranges array to be added later. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821369863.5656.4375667005352155892.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: make use of memblock's bottom up allocation modeHari Bathini
Earlier, memblock_find_in_range() was not used to find the memory to be reserved for FADump as bottom up allocation mode was not supported. But since commit 79442ed189acb8b ("mm/memblock.c: introduce bottom-up allocation mode") bottom up allocation mode is supported for memblock. So, use it to find the memory to be reserved for FADump. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821364211.5656.14336025460336135194.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: handle invalidation of crashdump and re-registraionHari Bathini
Make OPAL call to indicate that the dump is processed and the metadata area in OPAL can be cleared/released. Also, setup/initialize FADump for re-registration. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821356046.5656.12270927048195494911.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: reset metadata address during clean upHari Bathini
During kexec boot, metadata address needs to be reset to avoid running into errors interpreting stale metadata address, in case the kexec'ed kernel crashes before metadata address could be setup again. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821346629.5656.10783321582005237813.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: register kernel metadata address with opalHari Bathini
OPAL allows registering address with it in the first kernel and retrieving it after MPIPL. Setup kernel metadata and register its address with OPAL to use it for processing the crash dump. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821345011.5656.13567765019032928471.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: improve fadump_reserve_mem()Hari Bathini
Some code clean-up like using minimal assignments and updating printk messages. Also, add an 'error_out' label for handling error cleanup at one place. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821343485.5656.10202857091553646948.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: add fadump support on powernvHari Bathini
Add basic callback functions for FADump on PowerNV platform. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821342072.5656.4346362203141486452.stgit@hbathini.in.ibm.com
2019-09-14pseries/fadump: move out platform specific support from generic codeHari Bathini
Move code that supports processing the crash'ed kernel's memory preserved by firmware to platform specific callback functions. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821337690.5656.13050665924800177744.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: release all the memory above boot memory sizeHari Bathini
Except for Reserved dump area (see Documentation/powerpc/firmware- assisted-dump.rst) which is permanent reserved, all memory above boot memory size, where boot memory size is the memory required for the kernel to boot successfully when booted with restricted memory (memory for capture kernel), is released when the dump is invalidated. Make this a bit more explicit in the code. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821336092.5656.1079046285366041687.stgit@hbathini.in.ibm.com
2019-09-14pseries/fadump: define RTAS register/un-register callback functionsHari Bathini
Move platform specific register/un-register code, the RTAS calls, to register/un-register callback functions. This would also mean moving code that initializes and prints the platform specific FADump data. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821332856.5656.16380417702046411631.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: introduce callbacks for platform specific operationsHari Bathini
Introduce callback functions for platform specific operations like register, unregister, invalidate & such. Also, define place-holders for the same on pSeries platform. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821330286.5656.15538934400074110770.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: move rtas specific definitions to platform codeHari Bathini
Currently, FADump is only supported on pSeries but that is going to change soon with FADump support being added on PowerNV platform. So, move rtas specific definitions to platform code to allow FADump to have multiple platforms support. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821328494.5656.16219929140866195511.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: use helper functions to reserve/release cpu notes bufferHari Bathini
Use helper functions to simplify memory allocation, pinning down and freeing the memory used for CPU notes buffer. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821323555.5656.2486038022572739622.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: declare helper functions in internal header fileHari Bathini
Declare helper functions, that can be reused by multiple platforms, in the internal header file. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821320487.5656.2660730464212209984.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: add helper functionsHari Bathini
Add helper functions to setup & free CPU notes buffer and to find if a given memory area is contiguous. Also, use boolean as return type for the function that finds if boot memory area is contiguous. While at it, save the virtual address of CPU notes buffer instead of physical address as virtual address is used often. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821318971.5656.9281936950510635858.stgit@hbathini.in.ibm.com
2019-09-14powerpc/fadump: move internal macros/definitions to a new headerHari Bathini
Though asm/fadump.h is meant to be used by other components dealing with FADump, it also has macros/definitions internal to FADump code. Move them to a new header file used within FADump code. This also makes way for refactoring platform specific FADump code. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821313134.5656.6597770626574392140.stgit@hbathini.in.ibm.com
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1334 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-03powerpc/mm: move hugetlb_disabled into asm/hugetlb.hChristophe Leroy
No need to have this in asm/page.h, move it into asm/hugetlb.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.Mahesh Salgaonkar
For fadump to work successfully there should not be any holes in reserved memory ranges where kernel has asked firmware to move the content of old kernel memory in event of crash. Now that fadump uses CMA for reserved area, this memory area is now not protected from hot-remove operations unless it is cma allocated. Hence, fadump service can fail to re-register after the hot-remove operation, if hot-removed memory belongs to fadump reserved region. To avoid this make sure that memory from fadump reserved area is not hot-removable if fadump is registered. However, if user still wants to remove that memory, he can do so by manually stopping fadump service before hot-remove operation. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21powerpc/fadump: Throw proper error message on fadump registration failureMahesh Salgaonkar
fadump fails to register when there are holes in reserved memory area. This can happen if user has hot-removed a memory that falls in the fadump reserved memory area. Throw a meaningful error message to the user in such case. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> [mpe: is_reserved_memory_area_contiguous() returns bool, unsplit string] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21powerpc/fadump: Reservationless firmware assisted dumpMahesh Salgaonkar
One of the primary issues with Firmware Assisted Dump (fadump) on Power is that it needs a large amount of memory to be reserved. On large systems with TeraBytes of memory, this reservation can be quite significant. In some cases, fadump fails if the memory reserved is insufficient, or if the reserved memory was DLPAR hot-removed. In the normal case, post reboot, the preserved memory is filtered to extract only relevant areas of interest using the makedumpfile tool. While the tool provides flexibility to determine what needs to be part of the dump and what memory to filter out, all supported distributions default this to "Capture only kernel data and nothing else". We take advantage of this default and the Linux kernel's Contiguous Memory Allocator (CMA) to fundamentally change the memory reservation model for fadump. Instead of setting aside a significant chunk of memory nobody can use, this patch uses CMA instead, to reserve a significant chunk of memory that the kernel is prevented from using (due to MIGRATE_CMA), but applications are free to use it. With this fadump will still be able to capture all of the kernel memory and most of the user space memory except the user pages that were present in CMA region. Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream: [root@zzxx-yy10 ~]# free -m total used free shared buff/cache available Mem: 7557 193 6822 12 541 6725 Swap: 4095 0 4095 With this patch: [root@zzxx-yy10 ~]# free -m total used free shared buff/cache available Mem: 8133 194 7464 12 475 7338 Swap: 4095 0 4095 Changes made here are completely transparent to how fadump has traditionally worked. Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand CMA and its usage. TODO: - Handle case where CMA reservation spans nodes. Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-25powerpc/fadump: Change to use DEFINE_SHOW_ATTRIBUTE macroYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/fadump: re-register firmware-assisted dump if already registeredHari Bathini
Firmware-Assisted Dump (FADump) needs to be registered again after any memory hot add/remove operation to update the crash memory ranges. But currently, the kernel returns '-EEXIST' if we try to register without uregistering it first. This could expose the system to racing issues while unregistering and registering FADump from userspace during udev events. Spare the userspace of this and let it be taken care of in the kernel space for a simpler interface. Since this change, running 'echo 1 > /sys/kernel/fadump_registered' would result in re-regisering (unregistering and registering) FADump, if it was already registered. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-20powerpc/fadump: cleanup crash memory ranges supportHari Bathini
Commit 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array index overflow") changed crash memory ranges to a dynamic array that is reallocated on-demand with krealloc(). The relevant header for this call was not included. The kernel compiles though. But be cautious and add the header anyway. Also, memory allocation logic in fadump_add_crash_memory() takes care of memory allocation for crash memory ranges in all scenarios. Drop unnecessary memory allocation in fadump_setup_crash_memory_ranges(). Fixes: 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array index overflow") Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segementsHari Bathini
With dynamic memory allocation support for crash memory ranges array, there is no hard limit on the no. of crash memory ranges kernel could export, but program headers count could overflow in the /proc/vmcore ELF file while exporting each memory range as PT_LOAD segment. Reduce the likelihood of a such scenario, by folding adjacent crash memory ranges which minimizes the total number of PT_LOAD segments. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc/fadump: handle crash memory ranges array index overflowHari Bathini
Crash memory ranges is an array of memory ranges of the crashing kernel to be exported as a dump via /proc/vmcore file. The size of the array is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since commit 142b45a72e22 ("memblock: Add array resizing support"). On large memory systems with a few DLPAR operations, the memblock memory regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such systems, registering fadump results in crash or other system failures like below: task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000 NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180 REGS: c00000000b73b570 TRAP: 0300 Tainted: G L X (4.4.140+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22004484 XER: 20000000 CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0 ... NIP [c000000000047df4] smp_send_reschedule+0x24/0x80 LR [c0000000000f9e58] resched_curr+0x138/0x160 Call Trace: resched_curr+0x138/0x160 (unreliable) check_preempt_curr+0xc8/0xf0 ttwu_do_wakeup+0x38/0x150 try_to_wake_up+0x224/0x4d0 __wake_up_common+0x94/0x100 ep_poll_callback+0xac/0x1c0 __wake_up_common+0x94/0x100 __wake_up_sync_key+0x70/0xa0 sock_def_readable+0x58/0xa0 unix_stream_sendmsg+0x2dc/0x4c0 sock_sendmsg+0x68/0xa0 ___sys_sendmsg+0x2cc/0x2e0 __sys_sendmsg+0x5c/0xc0 SyS_socketcall+0x36c/0x3f0 system_call+0x3c/0x100 as array index overflow is not checked for while setting up crash memory ranges causing memory corruption. To resolve this issue, dynamically allocate memory for crash memory ranges and resize it incrementally, in units of pagesize, on hitting array size limit. Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program headers.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> [mpe: Just use PAGE_SIZE directly, fixup variable placement] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc/fadump: Unregister fadump on kexec down path.Mahesh Salgaonkar
Unregister fadump on kexec down path otherwise the fadump registration in new kexec-ed kernel complains that fadump is already registered. This makes new kernel to continue using fadump registered by previous kernel which may lead to invalid vmcore generation. Hence this patch fixes this issue by un-registering fadump in fadump_cleanup() which is called during kexec path so that new kernel can register fadump with new valid values. Fixes: b500afff11f6 ("fadump: Invalidate registration and release reserved memory for general use.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc/fadump: Do not use hugepages when fadump is activeHari Bathini
FADump capture kernel boots in restricted memory environment preserving the context of previous kernel to save vmcore. Supporting hugepages in such environment makes things unnecessarily complicated, as hugepages need memory set aside for them. This means most of the capture kernel's memory is used in supporting hugepages. In most cases, this results in out-of-memory issues while booting FADump capture kernel. But hugepages are not of much use in capture kernel whose only job is to save vmcore. So, disabling hugepages support, when fadump is active, is a reliable solution for the out of memory issues. Introducing a flag variable to disable HugeTLB support when fadump is active. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03powerpc/fadump: exclude memory holes while reserving memory in second kernelMahesh Salgaonkar
The second kernel, during early boot after the crash, reserves rest of the memory above boot memory size to make sure it does not touch any of the dump memory area. It uses memblock_reserve() that reserves the specified memory region irrespective of memory holes present within that region. There are chances where previous kernel would have hot removed some of its memory leaving memory holes behind. In such cases fadump kernel reports incorrect number of reserved pages through arch_reserved_kernel_pages() hook causing kernel to hang or panic. Fix this by excluding memory holes while reserving rest of the memory above boot memory size during second kernel boot after crash. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-05Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"David Gibson
This reverts commit a3b2cb30f252b21a6f962e0dd107c8b897ca65e4. That commit tried to fix problems with panic on powerpc in certain circumstances, where some output from the generic panic code was being dropped. Unfortunately, it breaks things worse in other circumstances. In particular when running a PAPR guest, it will now attempt to reboot instead of informing the hypervisor (KVM or PowerVM) that the guest has crashed. The crash notification is important to some virtualization management layers. Revert it for now until we can come up with a better solution. Fixes: a3b2cb30f252 ("powerpc: Do not call ppc_md.panic in fadump panic notifier") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [mpe: Tweak change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13powerpc/fadump: use kstrtoint to handle sysfs storeMichal Suchanek
Currently sysfs store handlers in fadump use if buf[0] == 'char'. This means input "100foo" is interpreted as '1' and "01" as '0'. Change to kstrtoint so leading zeroes and the like is handled in expected way. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Acked-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michal Suchanek <a class="moz-txt-link-rfc2396E" href="mailto:msuchanek@suse.de">&lt;msuchanek@suse.de&gt;</a></pre> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/powernv: Use kernel crash path for machine checksNicholas Piggin
There are quite a few machine check exceptions that can be caused by kernel bugs. To make debugging easier, use the kernel crash path in cases of synchronous machine checks that occur in kernel mode, if that would not result in the machine going straight to panic or crash dump. There is a downside here that die()ing the process in kernel mode can still leave the system unstable. panic_on_oops will always force the system to fail-stop, so systems where that behaviour is important will still do the right thing. As a test, when triggering an i-side 0111b error (ifetch from foreign address) in kernel mode process context on POWER9, the kernel currently dies quickly like this: Severe Machine check interrupt [Not recovered] NIP [ffff000000000000]: 0xffff000000000000 Initiator: CPU Error type: Real address [Instruction fetch (foreign)] [ 127.426651616,0] OPAL: Reboot requested due to Platform error. Effective[ 127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000 opal: Reboot type 1 not supported Kernel panic - not syncing: PowerNV Unrecovered Machine Check CPU: 56 PID: 4425 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #35 Call Trace: [ 128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to buggy/mising code in OPAL for this BMC Rebooting in 10 seconds.. Trying to free IRQ 496 from IRQ context! After this patch, the process is killed and the kernel continues with this message, which gives enough information to identify the offending branch (i.e., with CFAR): Severe Machine check interrupt [Not recovered] NIP [ffff000000000000]: 0xffff000000000000 Initiator: CPU Error type: Real address [Instruction fetch (foreign)] Effective address: ffff000000000000 Oops: Machine check, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ... CPU: 22 PID: 4436 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #36 task: c000000932300000 task.stack: c000000932380000 NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000 REGS: c00000000fc8fd80 TRAP: 0200 Tainted: G M (4.12.0-rc1-13857-ga4700a261072-dirty) MSR: 90000000001c1003 <SF,HV,ME,RI,LE> CR: 24000484 XER: 20000000 CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1 GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000 GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8 GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030 GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918 NIP [ffff000000000000] 0xffff000000000000 LR [00000000217706a4] 0x217706a4 Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc: Do not call ppc_md.panic in fadump panic notifierNicholas Piggin
If fadump is not registered, and no other crash or debug handlers are registered, the powerpc panic handler stops the guest before the generic panic code can push out debug information to the console. Currently, system reset injection causes the guest to silently stop. Stop calling ppc_md.panic in the panic notifier. crash_fadump already does rtas_os_term() to terminate the guest if fadump is registered. Remove ppc_md.panic. Move fadump panic notifier into fadump code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-12powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdrXunlei Pang
vmcoreinfo_max_size stands for the vmcoreinfo_data, the correct one we should use is vmcoreinfo_note whose total size is VMCOREINFO_NOTE_SIZE. Like explained in commit 77019967f06b ("kdump: fix exported size of vmcoreinfo note"), it should not affect the actual function, but we better fix it, also this change should be safe and backward compatible. After this, we can get rid of variable vmcoreinfo_max_size, let's use the corresponding macros directly, fewer variables means more safety for vmcoreinfo operation. [xlpang@redhat.com: fix build warning] Link: http://lkml.kernel.org/r/1494830606-27736-1-git-send-email-xlpang@redhat.com Link: http://lkml.kernel.org/r/1493281021-20737-2-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Juergen Gross <jgross@suse.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-28powerpc/fadump: add reschedule point while releasing memoryHari Bathini
Around 95% of memory is reserved by fadump/capture kernel. All this memory is freed, one page at a time, on writing '1' to the node /sys/kernel/fadump_release_mem. On systems with large memory, this can take a long time to complete, leading to soft lockup warning messages. To avoid this, add reschedule points at regular intervals. Also, while memblock_reserve() implicitly takes care of holes in the given memory range while reserving memory, those holes need to be taken care of while releasing memory as memory is freed one page at a time. Add support to skip holes while releasing memory. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28powerpc/fadump: provide a helpful error messageHari Bathini
fadump fails to register when there are holes in boot memory area. Provide a helpful error message to the user in such case. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28powerpc/fadump: avoid holes in boot memory area when fadump is registeredHari Bathini
To register fadump, boot memory area - the size of low memory chunk that is required for a kernel to boot successfully when booted with restricted memory, is assumed to have no holes. But this memory area is currently not protected from hot-remove operations. So, fadump could fail to re-register after a memory hot-remove operation, if memory is removed from boot memory area. To avoid this, ensure that memory from boot memory area is not hot-removed when fadump is registered. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28powerpc/fadump: avoid duplicates in crash memory rangesHari Bathini
fadump sets up crash memory ranges to be used for creating PT_LOAD program headers in elfcore header. Memory chunk RMA_START through boot memory area size is added as the first memory range because firmware, at the time of crash, moves this memory chunk to different location specified during fadump registration making it necessary to create a separate program header for it with the correct offset. This memory chunk is skipped while setting up the remaining memory ranges. But currently, there is possibility that some of this memory may have duplicate entries like when it is hot-removed and added again. Ensure that no two memory ranges represent the same memory. When 5 lmbs are hot-removed and then hot-plugged before registering fadump, here is how the program headers in /proc/vmcore exported by fadump look like without this change: Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align NOTE 0x0000000000010000 0x0000000000000000 0x0000000000000000 0x0000000000001894 0x0000000000001894 0 LOAD 0x0000000000021020 0xc000000000000000 0x0000000000000000 0x0000000040000000 0x0000000040000000 RWE 0 LOAD 0x0000000040031020 0xc000000000000000 0x0000000000000000 0x0000000010000000 0x0000000010000000 RWE 0 LOAD 0x0000000050040000 0xc000000010000000 0x0000000010000000 0x0000000050000000 0x0000000050000000 RWE 0 LOAD 0x00000000a0040000 0xc000000060000000 0x0000000060000000 0x000000019ffe0000 0x000000019ffe0000 RWE 0 and with this change: Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align NOTE 0x0000000000010000 0x0000000000000000 0x0000000000000000 0x0000000000001894 0x0000000000001894 0 LOAD 0x0000000000021020 0xc000000000000000 0x0000000000000000 0x0000000040000000 0x0000000040000000 RWE 0 LOAD 0x0000000040030000 0xc000000040000000 0x0000000040000000 0x0000000020000000 0x0000000020000000 RWE 0 LOAD 0x0000000060030000 0xc000000060000000 0x0000000060000000 0x000000019ffe0000 0x000000019ffe0000 RWE 0 Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02powerpc/fadump: Set an upper limit for boot memory sizeHari Bathini
By default, 5% of system RAM is reserved for preserving boot memory. Alternatively, a user can specify the amount of memory to reserve. See Documentation/powerpc/firmware-assisted-dump.txt for details. In addition to the memory reserved for preserving boot memory, some more memory is reserved, to save HPTE region, CPU state data and ELF core headers. Memory Reservation during first kernel looks like below: Low memory Top of memory 0 boot memory size | | | |<--Reserved dump area -->| V V | Permanent Reservation V +-----------+----------/ /----------+---+----+-----------+----+ | | |CPU|HPTE| DUMP |ELF | +-----------+----------/ /----------+---+----+-----------+----+ | ^ | | \ / ------------------------------------------- Boot memory content gets transferred to reserved area by firmware at the time of crash This implicitly means that the sum of the sizes of boot memory, CPU state data, HPTE region, DUMP preserving area and ELF core headers can't be greater than the total memory size. But currently, a user is allowed to specify any value as boot memory size. So, the above rule is violated when a boot memory size around 50% of the total available memory is specified. As the kernel is not handling this currently, it may lead to undefined behavior. Fix it by setting an upper limit for boot memory size to 25% of the total available memory. Also, instead of using memblock_end_of_DRAM(), which doesn't take the holes, if any, in the memory layout into account, use memblock_phys_mem_size() to calculate the percentage of total available memory. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02powerpc/fadump: Update comment about offset where fadump is reservedHari Bathini
With commit f6e6bedb7731 ("powerpc/fadump: Reserve memory at an offset closer to bottom of RAM"), memory for fadump is no longer reserved at the top of RAM. But there are still a few places which say so. Change them appropriately. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02powerpc/fadump: Add a warning when 'fadump_reserve_mem=' is usedHari Bathini
With commit 11550dc0a00b ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation"), 'fadump_reserve_mem=' parameter is deprecated in favor of 'crashkernel=' parameter. Add a warning if 'fadump_reserve_mem=' is still used. Fixes: 11550dc0a00b ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation") Suggested-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Unsplit long printk strings] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02powerpc/fadump: Return error when fadump registration failsMichal Suchanek
- log an error message when registration fails and no error code listed in the switch is returned - translate the hv error code to posix error code and return it from fw_register - return the posix error code from fw_register to the process writing to sysfs - return EEXIST on re-registration - return success on deregistration when fadump is not registered - return ENODEV when no memory is reserved for fadump Signed-off-by: Michal Suchanek <msuchanek@suse.de> Tested-by: Hari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Use pr_err() to shrink the error print] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-08powerpc/fadump: reuse crashkernel parameter for fadump memory reservationHari Bathini
fadump supports specifying memory to reserve for fadump's crash kernel with fadump_reserve_mem kernel parameter. This parameter currently supports passing a fixed memory size, like fadump_reserve_mem=<size> only. This patch aims to add support for other syntaxes like range-based memory size <range1>:<size1>[,<range2>:<size2>,<range3>:<size3>,...] which allows using the same parameter to boot the kernel with different system RAM sizes. As crashkernel parameter already supports the above mentioned syntaxes, this patch deprecates fadump_reserve_mem parameter and reuses crashkernel parameter instead, to specify memory for fadump's crash kernel memory reservation as well. If any offset is provided in crashkernel parameter, it will be ignored in case of fadump, as fadump reserves memory at end of RAM. Advantages using crashkernel parameter instead of fadump_reserve_mem parameter are one less kernel parameter overall, code reuse and support for multiple syntaxes to specify memory. Suggested-by: Dave Young <dyoung@redhat.com> Link: http://lkml.kernel.org/r/149035346749.6881.911095631212975718.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>