summaryrefslogtreecommitdiff
path: root/Documentation/arch/x86/x86_64
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/arch/x86/x86_64')
-rw-r--r--Documentation/arch/x86/x86_64/5level-paging.rst9
-rw-r--r--Documentation/arch/x86/x86_64/boot-options.rst319
-rw-r--r--Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst2
-rw-r--r--Documentation/arch/x86/x86_64/fsgs.rst6
-rw-r--r--Documentation/arch/x86/x86_64/index.rst1
-rw-r--r--Documentation/arch/x86/x86_64/mm.rst35
-rw-r--r--Documentation/arch/x86/x86_64/uefi.rst37
7 files changed, 60 insertions, 349 deletions
diff --git a/Documentation/arch/x86/x86_64/5level-paging.rst b/Documentation/arch/x86/x86_64/5level-paging.rst
index 71f882f4a173..ad7ddc13f79d 100644
--- a/Documentation/arch/x86/x86_64/5level-paging.rst
+++ b/Documentation/arch/x86/x86_64/5level-paging.rst
@@ -22,15 +22,6 @@ QEMU 2.9 and later support 5-level paging.
Virtual memory layout for 5-level paging is described in
Documentation/arch/x86/x86_64/mm.rst
-
-Enabling 5-level paging
-=======================
-CONFIG_X86_5LEVEL=y enables the feature.
-
-Kernel with CONFIG_X86_5LEVEL=y still able to boot on 4-level hardware.
-In this case additional page table level -- p4d -- will be folded at
-runtime.
-
User-space and large virtual address space
==========================================
On x86, 5-level paging enables 56-bit userspace virtual address space.
diff --git a/Documentation/arch/x86/x86_64/boot-options.rst b/Documentation/arch/x86/x86_64/boot-options.rst
deleted file mode 100644
index 137432d34109..000000000000
--- a/Documentation/arch/x86/x86_64/boot-options.rst
+++ /dev/null
@@ -1,319 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-===========================
-AMD64 Specific Boot Options
-===========================
-
-There are many others (usually documented in driver documentation), but
-only the AMD64 specific ones are listed here.
-
-Machine check
-=============
-Please see Documentation/arch/x86/x86_64/machinecheck.rst for sysfs runtime tunables.
-
- mce=off
- Disable machine check
- mce=no_cmci
- Disable CMCI(Corrected Machine Check Interrupt) that
- Intel processor supports. Usually this disablement is
- not recommended, but it might be handy if your hardware
- is misbehaving.
- Note that you'll get more problems without CMCI than with
- due to the shared banks, i.e. you might get duplicated
- error logs.
- mce=dont_log_ce
- Don't make logs for corrected errors. All events reported
- as corrected are silently cleared by OS.
- This option will be useful if you have no interest in any
- of corrected errors.
- mce=ignore_ce
- Disable features for corrected errors, e.g. polling timer
- and CMCI. All events reported as corrected are not cleared
- by OS and remained in its error banks.
- Usually this disablement is not recommended, however if
- there is an agent checking/clearing corrected errors
- (e.g. BIOS or hardware monitoring applications), conflicting
- with OS's error handling, and you cannot deactivate the agent,
- then this option will be a help.
- mce=no_lmce
- Do not opt-in to Local MCE delivery. Use legacy method
- to broadcast MCEs.
- mce=bootlog
- Enable logging of machine checks left over from booting.
- Disabled by default on AMD Fam10h and older because some BIOS
- leave bogus ones.
- If your BIOS doesn't do that it's a good idea to enable though
- to make sure you log even machine check events that result
- in a reboot. On Intel systems it is enabled by default.
- mce=nobootlog
- Disable boot machine check logging.
- mce=monarchtimeout (number)
- monarchtimeout:
- Sets the time in us to wait for other CPUs on machine checks. 0
- to disable.
- mce=bios_cmci_threshold
- Don't overwrite the bios-set CMCI threshold. This boot option
- prevents Linux from overwriting the CMCI threshold set by the
- bios. Without this option, Linux always sets the CMCI
- threshold to 1. Enabling this may make memory predictive failure
- analysis less effective if the bios sets thresholds for memory
- errors since we will not see details for all errors.
- mce=recovery
- Force-enable recoverable machine check code paths
-
- nomce (for compatibility with i386)
- same as mce=off
-
- Everything else is in sysfs now.
-
-APICs
-=====
-
- apic
- Use IO-APIC. Default
-
- noapic
- Don't use the IO-APIC.
-
- disableapic
- Don't use the local APIC
-
- nolapic
- Don't use the local APIC (alias for i386 compatibility)
-
- pirq=...
- See Documentation/arch/x86/i386/IO-APIC.rst
-
- noapictimer
- Don't set up the APIC timer
-
- no_timer_check
- Don't check the IO-APIC timer. This can work around
- problems with incorrect timer initialization on some boards.
-
- apicpmtimer
- Do APIC timer calibration using the pmtimer. Implies
- apicmaintimer. Useful when your PIT timer is totally broken.
-
-Timing
-======
-
- notsc
- Deprecated, use tsc=unstable instead.
-
- nohpet
- Don't use the HPET timer.
-
-Idle loop
-=========
-
- idle=poll
- Don't do power saving in the idle loop using HLT, but poll for rescheduling
- event. This will make the CPUs eat a lot more power, but may be useful
- to get slightly better performance in multiprocessor benchmarks. It also
- makes some profiling using performance counters more accurate.
- Please note that on systems with MONITOR/MWAIT support (like Intel EM64T
- CPUs) this option has no performance advantage over the normal idle loop.
- It may also interact badly with hyperthreading.
-
-Rebooting
-=========
-
- reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] | p[ci] [, [w]arm | [c]old]
- bios
- Use the CPU reboot vector for warm reset
- warm
- Don't set the cold reboot flag
- cold
- Set the cold reboot flag
- triple
- Force a triple fault (init)
- kbd
- Use the keyboard controller. cold reset (default)
- acpi
- Use the ACPI RESET_REG in the FADT. If ACPI is not configured or
- the ACPI reset does not work, the reboot path attempts the reset
- using the keyboard controller.
- efi
- Use efi reset_system runtime service. If EFI is not configured or
- the EFI reset does not work, the reboot path attempts the reset using
- the keyboard controller.
- pci
- Use a write to the PCI config space register 0xcf9 to trigger reboot.
-
- Using warm reset will be much faster especially on big memory
- systems because the BIOS will not go through the memory check.
- Disadvantage is that not all hardware will be completely reinitialized
- on reboot so there may be boot problems on some systems.
-
- reboot=force
- Don't stop other CPUs on reboot. This can make reboot more reliable
- in some cases.
-
- reboot=default
- There are some built-in platform specific "quirks" - you may see:
- "reboot: <name> series board detected. Selecting <type> for reboots."
- In the case where you think the quirk is in error (e.g. you have
- newer BIOS, or newer board) using this option will ignore the built-in
- quirk table, and use the generic default reboot actions.
-
-NUMA
-====
-
- numa=off
- Only set up a single NUMA node spanning all memory.
-
- numa=noacpi
- Don't parse the SRAT table for NUMA setup
-
- numa=nohmat
- Don't parse the HMAT table for NUMA setup, or soft-reserved memory
- partitioning.
-
- numa=fake=<size>[MG]
- If given as a memory unit, fills all system RAM with nodes of
- size interleaved over physical nodes.
-
- numa=fake=<N>
- If given as an integer, fills all system RAM with N fake nodes
- interleaved over physical nodes.
-
- numa=fake=<N>U
- If given as an integer followed by 'U', it will divide each
- physical node into N emulated nodes.
-
-ACPI
-====
-
- acpi=off
- Don't enable ACPI
- acpi=ht
- Use ACPI boot table parsing, but don't enable ACPI interpreter
- acpi=force
- Force ACPI on (currently not needed)
- acpi=strict
- Disable out of spec ACPI workarounds.
- acpi_sci={edge,level,high,low}
- Set up ACPI SCI interrupt.
- acpi=noirq
- Don't route interrupts
- acpi=nocmcff
- Disable firmware first mode for corrected errors. This
- disables parsing the HEST CMC error source to check if
- firmware has set the FF flag. This may result in
- duplicate corrected error reports.
-
-PCI
-===
-
- pci=off
- Don't use PCI
- pci=conf1
- Use conf1 access.
- pci=conf2
- Use conf2 access.
- pci=rom
- Assign ROMs.
- pci=assign-busses
- Assign busses
- pci=irqmask=MASK
- Set PCI interrupt mask to MASK
- pci=lastbus=NUMBER
- Scan up to NUMBER busses, no matter what the mptable says.
- pci=noacpi
- Don't use ACPI to set up PCI interrupt routing.
-
-IOMMU (input/output memory management unit)
-===========================================
-Multiple x86-64 PCI-DMA mapping implementations exist, for example:
-
- 1. <kernel/dma/direct.c>: use no hardware/software IOMMU at all
- (e.g. because you have < 3 GB memory).
- Kernel boot message: "PCI-DMA: Disabling IOMMU"
-
- 2. <arch/x86/kernel/amd_gart_64.c>: AMD GART based hardware IOMMU.
- Kernel boot message: "PCI-DMA: using GART IOMMU"
-
- 3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
- e.g. if there is no hardware IOMMU in the system and it is need because
- you have >3GB memory or told the kernel to us it (iommu=soft))
- Kernel boot message: "PCI-DMA: Using software bounce buffering
- for IO (SWIOTLB)"
-
-::
-
- iommu=[<size>][,noagp][,off][,force][,noforce]
- [,memaper[=<order>]][,merge][,fullflush][,nomerge]
- [,noaperture]
-
-General iommu options:
-
- off
- Don't initialize and use any kind of IOMMU.
- noforce
- Don't force hardware IOMMU usage when it is not needed. (default).
- force
- Force the use of the hardware IOMMU even when it is
- not actually needed (e.g. because < 3 GB memory).
- soft
- Use software bounce buffering (SWIOTLB) (default for
- Intel machines). This can be used to prevent the usage
- of an available hardware IOMMU.
-
-iommu options only relevant to the AMD GART hardware IOMMU:
-
- <size>
- Set the size of the remapping area in bytes.
- allowed
- Overwrite iommu off workarounds for specific chipsets.
- fullflush
- Flush IOMMU on each allocation (default).
- nofullflush
- Don't use IOMMU fullflush.
- memaper[=<order>]
- Allocate an own aperture over RAM with size 32MB<<order.
- (default: order=1, i.e. 64MB)
- merge
- Do scatter-gather (SG) merging. Implies "force" (experimental).
- nomerge
- Don't do scatter-gather (SG) merging.
- noaperture
- Ask the IOMMU not to touch the aperture for AGP.
- noagp
- Don't initialize the AGP driver and use full aperture.
- panic
- Always panic when IOMMU overflows.
-
-iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
-implementation:
-
- swiotlb=<slots>[,force,noforce]
- <slots>
- Prereserve that many 2K slots for the software IO bounce buffering.
- force
- Force all IO through the software TLB.
- noforce
- Do not initialize the software TLB.
-
-
-Miscellaneous
-=============
-
- nogbpages
- Do not use GB pages for kernel direct mappings.
- gbpages
- Use GB pages for kernel direct mappings.
-
-
-AMD SEV (Secure Encrypted Virtualization)
-=========================================
-Options relating to AMD SEV, specified via the following format:
-
-::
-
- sev=option1[,option2]
-
-The available options are:
-
- debug
- Enable debug messages.
diff --git a/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst
index ba74617d4999..970ee94eb551 100644
--- a/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst
+++ b/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst
@@ -18,7 +18,7 @@ For more information on the features of cpusets, see
Documentation/admin-guide/cgroup-v1/cpusets.rst.
There are a number of different configurations you can use for your needs. For
more information on the numa=fake command line option and its various ways of
-configuring fake nodes, see Documentation/arch/x86/x86_64/boot-options.rst.
+configuring fake nodes, see Documentation/admin-guide/kernel-parameters.txt
For the purposes of this introduction, we'll assume a very primitive NUMA
emulation setup of "numa=fake=4*512,". This will split our system memory into
diff --git a/Documentation/arch/x86/x86_64/fsgs.rst b/Documentation/arch/x86/x86_64/fsgs.rst
index 50960e09e1f6..6bda4d16d3f7 100644
--- a/Documentation/arch/x86/x86_64/fsgs.rst
+++ b/Documentation/arch/x86/x86_64/fsgs.rst
@@ -125,17 +125,17 @@ FSGSBASE instructions enablement
FSGSBASE instructions compiler support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE
+GCC version 4.6.4 and newer provide intrinsics for the FSGSBASE
instructions. Clang 5 supports them as well.
=================== ===========================
_readfsbase_u64() Read the FS base register
- _readfsbase_u64() Read the GS base register
+ _readgsbase_u64() Read the GS base register
_writefsbase_u64() Write the FS base register
_writegsbase_u64() Write the GS base register
=================== ===========================
-To utilize these instrinsics <immintrin.h> must be included in the source
+To utilize these intrinsics <immintrin.h> must be included in the source
code and the compiler option -mfsgsbase has to be added.
Compiler support for FS/GS based addressing
diff --git a/Documentation/arch/x86/x86_64/index.rst b/Documentation/arch/x86/x86_64/index.rst
index ad15e9bd623f..a0261957a08a 100644
--- a/Documentation/arch/x86/x86_64/index.rst
+++ b/Documentation/arch/x86/x86_64/index.rst
@@ -7,7 +7,6 @@ x86_64 Support
.. toctree::
:maxdepth: 2
- boot-options
uefi
mm
5level-paging
diff --git a/Documentation/arch/x86/x86_64/mm.rst b/Documentation/arch/x86/x86_64/mm.rst
index 35e5e18c83d0..f2db178b353f 100644
--- a/Documentation/arch/x86/x86_64/mm.rst
+++ b/Documentation/arch/x86/x86_64/mm.rst
@@ -29,15 +29,27 @@ Complete virtual memory map with 4-level page tables
Start addr | Offset | End addr | Size | VM area description
========================================================================================================================
| | | |
- 0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm
+ 0000000000000000 | 0 | 00007fffffffefff | ~128 TB | user-space virtual memory, different per mm
+ 00007ffffffff000 | ~128 TB | 00007fffffffffff | 4 kB | ... guard hole
__________________|____________|__________________|_________|___________________________________________________________
| | | |
- 0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
- | | | | virtual memory addresses up to the -128 TB
+ 0000800000000000 | +128 TB | 7fffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
+ | | | | virtual memory addresses up to the -8 EB
| | | | starting offset of kernel mappings.
+ | | | |
+ | | | | LAM relaxes canonicallity check allowing to create aliases
+ | | | | for userspace memory here.
__________________|____________|__________________|_________|___________________________________________________________
|
| Kernel-space virtual memory, shared between all processes:
+ __________________|____________|__________________|_________|___________________________________________________________
+ | | | |
+ 8000000000000000 | -8 EB | ffff7fffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
+ | | | | virtual memory addresses up to the -128 TB
+ | | | | starting offset of kernel mappings.
+ | | | |
+ | | | | LAM_SUP relaxes canonicallity check allowing to create
+ | | | | aliases for kernel memory here.
____________________________________________________________|___________________________________________________________
| | | |
ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
@@ -88,16 +100,27 @@ Complete virtual memory map with 5-level page tables
Start addr | Offset | End addr | Size | VM area description
========================================================================================================================
| | | |
- 0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm
+ 0000000000000000 | 0 | 00fffffffffff000 | ~64 PB | user-space virtual memory, different per mm
+ 00fffffffffff000 | ~64 PB | 00ffffffffffffff | 4 kB | ... guard hole
__________________|____________|__________________|_________|___________________________________________________________
| | | |
- 0100000000000000 | +64 PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
- | | | | virtual memory addresses up to the -64 PB
+ 0100000000000000 | +64 PB | 7fffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
+ | | | | virtual memory addresses up to the -8EB TB
| | | | starting offset of kernel mappings.
+ | | | |
+ | | | | LAM relaxes canonicallity check allowing to create aliases
+ | | | | for userspace memory here.
__________________|____________|__________________|_________|___________________________________________________________
|
| Kernel-space virtual memory, shared between all processes:
____________________________________________________________|___________________________________________________________
+ 8000000000000000 | -8 EB | feffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
+ | | | | virtual memory addresses up to the -64 PB
+ | | | | starting offset of kernel mappings.
+ | | | |
+ | | | | LAM_SUP relaxes canonicallity check allowing to create
+ | | | | aliases for kernel memory here.
+ ____________________________________________________________|___________________________________________________________
| | | |
ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor
ff10000000000000 | -60 PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
diff --git a/Documentation/arch/x86/x86_64/uefi.rst b/Documentation/arch/x86/x86_64/uefi.rst
index fbc30c9a071d..e84592dbd6c1 100644
--- a/Documentation/arch/x86/x86_64/uefi.rst
+++ b/Documentation/arch/x86/x86_64/uefi.rst
@@ -12,14 +12,20 @@ with EFI firmware and specifications are listed below.
1. UEFI specification: http://www.uefi.org
-2. Booting Linux kernel on UEFI x86_64 platform requires bootloader
- support. Elilo with x86_64 support can be used.
+2. Booting Linux kernel on UEFI x86_64 platform can either be
+ done using the <Documentation/admin-guide/efi-stub.rst> or using a
+ separate bootloader.
3. x86_64 platform with EFI/UEFI firmware.
Mechanics
---------
+Refer to <Documentation/admin-guide/efi-stub.rst> to learn how to use the EFI stub.
+
+Below are general EFI setup guidelines on the x86_64 platform,
+regardless of whether you use the EFI stub or a separate bootloader.
+
- Build the kernel with the following configuration::
CONFIG_FB_EFI=y
@@ -31,16 +37,27 @@ Mechanics
CONFIG_EFI=y
CONFIG_EFIVAR_FS=y or m # optional
-- Create a VFAT partition on the disk
-- Copy the following to the VFAT partition:
+- Create a VFAT partition on the disk with the EFI System flag
+ You can do this with fdisk with the following commands:
+
+ 1. g - initialize a GPT partition table
+ 2. n - create a new partition
+ 3. t - change the partition type to "EFI System" (number 1)
+ 4. w - write and save the changes
+
+ Afterwards, initialize the VFAT filesystem by running mkfs::
+
+ mkfs.fat /dev/<your-partition>
+
+- Copy the boot files to the VFAT partition:
+ If you use the EFI stub method, the kernel acts also as an EFI executable.
+
+ You can just copy the bzImage to the EFI/boot/bootx64.efi path on the partition
+ so that it will automatically get booted, see the <Documentation/admin-guide/efi-stub.rst> page
+ for additional instructions regarding passage of kernel parameters and initramfs.
- elilo bootloader with x86_64 support, elilo configuration file,
- kernel image built in first step and corresponding
- initrd. Instructions on building elilo and its dependencies
- can be found in the elilo sourceforge project.
+ If you use a custom bootloader, refer to the relevant documentation for help on this part.
-- Boot to EFI shell and invoke elilo choosing the kernel image built
- in first step.
- If some or all EFI runtime services don't work, you can try following
kernel command line parameters to turn off some or all EFI runtime
services.