Diffstat (limited to 'Documentation/powerpc/firmware-assisted-dump.txt')
-rw-r--r-- Documentation/powerpc/firmware-assisted-dump.txt | 292
1 files changed, 0 insertions, 292 deletions
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
deleted file mode 100644
index 10e7f4d16c14..000000000000
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ /dev/null
@@ -1,292 +0,0 @@
-
- Firmware-Assisted Dump
- ------------------------
- July 2011
-
-The goal of firmware-assisted dump is to enable the dump of
-a crashed system, and to do so from a fully-reset system, and
-to minimize the total elapsed time until the system is back
-in production use.
-
-- Firmware assisted dump (fadump) infrastructure is intended to replace
- the existing phyp assisted dump.
-- Fadump uses the same firmware interfaces and memory reservation model
- as phyp assisted dump.
-- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore
- in the ELF format in the same way as kdump. This helps us reuse the
- kdump infrastructure for dump capture and filtering.
-- Unlike phyp dump, userspace tools do not need to refer to any sysfs
- interface while reading /proc/vmcore.
-- Unlike phyp dump, fadump allows the user to release all the memory reserved
- for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem.
-- Once enabled through kernel boot parameter, fadump can be
- started/stopped through /sys/kernel/fadump_registered interface (see
- sysfs files section below) and can be easily integrated with kdump
- service start/stop init scripts.
-
-Compared with kdump or other strategies, firmware-assisted
-dump offers several strong, practical advantages:
-
--- Unlike kdump, the system has been reset, and loaded
- with a fresh copy of the kernel. In particular,
- PCI and I/O devices have been reinitialized and are
- in a clean, consistent state.
--- Once the dump is copied out, the memory that held the dump
- is immediately available to the running kernel. Therefore,
- unlike kdump, fadump doesn't need a 2nd reboot to get the
- system back to the production configuration.
-
-The above can only be accomplished by coordination with,
-and assistance from the Power firmware. The procedure is
-as follows:
-
--- The first kernel registers the sections of memory with the
- Power firmware for dump preservation during OS initialization.
- These registered sections of memory are reserved by the first
- kernel during early boot.
-
--- When a system crashes, the Power firmware will save
- the low memory (boot memory, whose size is the larger of 5% of
- system RAM or 256MB) to the previously registered region. It
- will also save system registers and hardware PTEs.
-
- NOTE: The term 'boot memory' means the size of the low memory chunk
- that is required for a kernel to boot successfully when
- booted with restricted memory. By default, the boot memory
- size will be the larger of 5% of system RAM or 256MB.
- Alternatively, the user can specify the boot memory size
- through the boot parameter 'crashkernel=', which will override
- the default calculated size. Use this option if the default
- boot memory size is not sufficient for the second kernel to
- boot successfully. For the syntax of the crashkernel= parameter,
- refer to Documentation/admin-guide/kdump/kdump.rst. If any offset is
- provided in the crashkernel= parameter, it will be ignored,
- as fadump uses a predefined offset to reserve memory
- for boot memory dump preservation in case of a crash.
-
--- After the low memory (boot memory) area has been saved, the
- firmware will reset PCI and other hardware state. It will
- *not* clear the RAM. It will then launch the bootloader, as
- normal.
-
--- The freshly booted kernel will notice that there is a new
- node (ibm,dump-kernel) in the device tree, indicating that
- there is crash data available from a previous boot. During
- early boot, the OS will reserve the rest of the memory above
- the boot memory size, effectively booting with a restricted
- memory size. This ensures that the second kernel will not
- touch any of the dump memory area.
-
--- User-space tools will read /proc/vmcore to obtain the contents
- of memory, which holds the previously crashed kernel's dump in ELF
- format. The userspace tools may copy this info to disk, or
- to the network, NAS, SAN, iSCSI, etc., as desired.
-
--- Once the userspace tool is done saving the dump, it will echo
- '1' to /sys/kernel/fadump_release_mem to release the reserved
- memory back to general use, except the memory required for
- the next firmware-assisted dump registration.
-
- e.g.
- # echo 1 > /sys/kernel/fadump_release_mem
-
-Please note that the firmware-assisted dump feature
-is only available on Power6 and above systems with recent
-firmware versions.
-
-Implementation details:
-----------------------
-
-During boot, a check is made to see if firmware supports
-this feature on that particular machine. If it does, then
-we check to see if an active dump is waiting for us. If so,
-then all of RAM except the boot memory is reserved during
-early boot (see Fig. 2). This area is released once the
-user-space scripts (e.g. kdump scripts) have finished
-collecting the dump. If there is dump data, then the
-/sys/kernel/fadump_release_mem file is created, and the reserved
-memory is held.
-
-If there is no waiting dump data, then only the memory required
-to hold the CPU state, HPTE region, boot memory dump and elfcore
-header is reserved, usually at an offset greater than the boot memory
-size (see Fig. 1). This area is *not* released: this region will
-be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state
-and HPTE region, in case a crash does occur. Since this reserved
-memory area is used only after a system crash, there is no point in
-withholding this significant chunk of memory from the production kernel.
-Hence, the implementation uses the Linux kernel's Contiguous Memory
-Allocator (CMA) for memory reservation if CMA is configured in the kernel.
-With CMA reservation, this memory remains available for applications to
-use, while the kernel is prevented from using it. With this, fadump will
-still be able to capture all of the kernel memory and most of the user
-space memory, except the user pages that were present in the CMA region.
-
-  o Memory Reservation during first kernel
-
-  Low memory                                        Top of memory
-  0      boot memory size                                       |
-  |           |                |<--Reserved dump area -->|      |
-  V           V                |   Permanent Reservation |      V
-  +-----------+----------/ /---+---+----+-----------+----+------+
-  |           |                |CPU|HPTE|  DUMP     |ELF |      |
-  +-----------+----------/ /---+---+----+-----------+----+------+
-        |                                           ^
-        |                                           |
-        \                                           /
-         -------------------------------------------
-          Boot memory content gets transferred to
-          reserved area by firmware at the time of
-          crash
-                   Fig. 1
-
-  o Memory Reservation during second kernel after crash
-
-  Low memory                                        Top of memory
-  0      boot memory size                                       |
-  |           |<------------- Reserved dump area ----------- -->|
-  V           V                                                 V
-  +-----------+----------/ /---+---+----+-----------+----+------+
-  |           |                |CPU|HPTE|  DUMP     |ELF |      |
-  +-----------+----------/ /---+---+----+-----------+----+------+
-        |                                              |
-        V                                              V
-   Used by second                                /proc/vmcore
-   kernel to boot
-                   Fig. 2
-
-Currently the dump will be copied from /proc/vmcore to
-a new file upon user intervention. The dump data available through
-/proc/vmcore will be in ELF format. Hence the existing kdump
-infrastructure (kdump scripts) to save the dump works fine with
-minor modifications.
-
-The tools to examine the dump will be the same as the ones
-used for kdump.
-
-How to enable firmware-assisted dump (fadump):
--------------------------------------
-
-1. Set config option CONFIG_FA_DUMP=y and build kernel.
-2. Boot into the Linux kernel with the 'fadump=on' kernel cmdline option.
- By default, fadump reserved memory will be initialized as a CMA area.
- Alternatively, the user can boot the Linux kernel with 'fadump=nocma' to
- prevent fadump from using CMA.
-3. Optionally, the user can also set 'crashkernel=' on the kernel cmdline
- to specify the size of the memory to reserve for boot memory dump
- preservation.
-
-NOTE: 1. 'fadump_reserve_mem=' parameter has been deprecated. Instead
- use 'crashkernel=' to specify size of the memory to reserve
- for boot memory dump preservation.
- 2. If firmware-assisted dump fails to reserve memory, then it
- will fall back to the existing kdump mechanism if the 'crashkernel='
- option is set on the kernel cmdline.
- 3. If the user wants to capture all of user space memory and accepts
- that the reserved memory is not available to the production system, then
- the 'fadump=nocma' kernel parameter can be used to fall back to
- the old behaviour.
-
-Sysfs/debugfs files:
-------------
-
-The firmware-assisted dump feature uses the sysfs file system to hold
-the control files and a debugfs file to display the reserved memory regions.
-
-Here is the list of files under kernel sysfs:
-
- /sys/kernel/fadump_enabled
-
- This is used to display the fadump status.
- 0 = fadump is disabled
- 1 = fadump is enabled
-
- This interface can be used by kdump init scripts to identify if
- fadump is enabled in the kernel and act accordingly.
-
- /sys/kernel/fadump_registered
-
- This is used to display the fadump registration status as well
- as to control (start/stop) the fadump registration.
- 0 = fadump is not registered.
- 1 = fadump is registered and ready to handle system crash.
-
- To register fadump, echo 1 > /sys/kernel/fadump_registered; to
- un-register and stop fadump, echo 0 > /sys/kernel/fadump_registered.
- Once fadump is un-registered, a system crash will not
- be handled and vmcore will not be captured. This interface can be
- easily integrated with kdump service start/stop.
-
- /sys/kernel/fadump_release_mem
-
- This file is available only when fadump is active during the
- second kernel. This is used to release the reserved memory
- regions that are held for saving the crash dump. To release the
- reserved memory, echo 1 to it:
-
- echo 1 > /sys/kernel/fadump_release_mem
-
- After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region
- file will change to reflect the new memory reservations.
-
- The existing userspace tools (kdump infrastructure) can be easily
- enhanced to use this interface to release the memory reserved for
- dump and continue without a 2nd reboot.
-
-Here is the list of files under powerpc debugfs:
-(Assuming debugfs is mounted on /sys/kernel/debug directory.)
-
- /sys/kernel/debug/powerpc/fadump_region
-
- This file shows the reserved memory regions if fadump is
- enabled; otherwise this file is empty. The output format
- is:
- <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
-
- e.g.
- Contents when fadump is registered during first kernel
-
- # cat /sys/kernel/debug/powerpc/fadump_region
- CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
- HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
- DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0
-
- Contents when fadump is active during second kernel
-
- # cat /sys/kernel/debug/powerpc/fadump_region
- CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
- HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000
- DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
- : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000
-
-NOTE: Please refer to Documentation/filesystems/debugfs.txt on
- how to mount the debugfs filesystem.
-
-
-TODO:
------
- o Need to come up with a better approach to find a more
- accurate boot memory size that is required for a kernel to
- boot successfully when booted with restricted memory.
- o The fadump implementation introduces a fadump crash info structure
- in the scratch area before the ELF core header. The idea of introducing
- this structure is to pass some important crash info data to the second
- kernel which will help second kernel to populate ELF core header with
- correct data before it gets exported through /proc/vmcore. The current
- design does not address the possibility of introducing
- additional fields (in future) to this structure without affecting
- compatibility. Need to come up with a better approach to address this.
- The possible approaches are:
- 1. Introduce version field for version tracking, bump up the version
- whenever a new field is added to the structure in future. The version
- field can be used to find out what fields are valid for the current
- version of the structure.
- 2. Reserve the area of predefined size (say PAGE_SIZE) for this
- structure and have unused area as reserved (initialized to zero)
- for future field additions.
- The advantage of approach 1 over 2 is we don't need to reserve extra space.
----
-Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
-This document is based on the original documentation written for phyp
-assisted dump by Linas Vepstas and Manish Ahuja.
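
Note (not part of the deleted file): the capture-and-release procedure that
the removed document describes for the second kernel can be sketched as a
small shell function. This is a minimal illustration only, not the kdump
scripts themselves. The sysfs and /proc paths are the ones named in the
document; the function name and the SYSFS_ROOT, VMCORE and DUMP_DIR
variables are hypothetical, added so the flow can be tried against a scratch
directory, and a plain cp stands in for whatever capture and filtering tool
a distribution actually uses.

```shell
#!/bin/sh
# Sketch only: save the crash dump, then release the fadump reserved memory.
# SYSFS_ROOT, VMCORE and DUMP_DIR are hypothetical overrides for testing;
# their defaults are the paths named in the document (DUMP_DIR is assumed).

fadump_save_and_release() {
    sysfs_root=${SYSFS_ROOT:-/sys/kernel}
    vmcore=${VMCORE:-/proc/vmcore}
    dump_dir=${DUMP_DIR:-/var/crash}

    # fadump_release_mem exists only when fadump is active in the second kernel.
    if [ ! -e "$sysfs_root/fadump_release_mem" ]; then
        echo "no active fadump dump; nothing to do"
        return 0
    fi

    # Copy the ELF-format dump out before giving the memory back.
    mkdir -p "$dump_dir" || return 1
    cp "$vmcore" "$dump_dir/vmcore" || return 1

    # Release the reserved memory so that no second reboot is needed.
    echo 1 > "$sysfs_root/fadump_release_mem" || return 1
    echo "dump saved to $dump_dir/vmcore; reserved memory released"
}
```

Calling fadump_save_and_release with no overrides uses the real paths. The
release step is deliberately last: once the memory is released, the dump
contents are no longer protected from reuse.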