diff options
Diffstat (limited to 'Documentation/arch/x86/x86_64')
-rw-r--r-- | Documentation/arch/x86/x86_64/5level-paging.rst | 9 | ||||
-rw-r--r-- | Documentation/arch/x86/x86_64/boot-options.rst | 319 | ||||
-rw-r--r-- | Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst | 2 | ||||
-rw-r--r-- | Documentation/arch/x86/x86_64/fsgs.rst | 6 | ||||
-rw-r--r-- | Documentation/arch/x86/x86_64/index.rst | 1 | ||||
-rw-r--r-- | Documentation/arch/x86/x86_64/mm.rst | 35 | ||||
-rw-r--r-- | Documentation/arch/x86/x86_64/uefi.rst | 37 |
7 files changed, 60 insertions, 349 deletions
diff --git a/Documentation/arch/x86/x86_64/5level-paging.rst b/Documentation/arch/x86/x86_64/5level-paging.rst index 71f882f4a173..ad7ddc13f79d 100644 --- a/Documentation/arch/x86/x86_64/5level-paging.rst +++ b/Documentation/arch/x86/x86_64/5level-paging.rst @@ -22,15 +22,6 @@ QEMU 2.9 and later support 5-level paging. Virtual memory layout for 5-level paging is described in Documentation/arch/x86/x86_64/mm.rst - -Enabling 5-level paging -======================= -CONFIG_X86_5LEVEL=y enables the feature. - -Kernel with CONFIG_X86_5LEVEL=y still able to boot on 4-level hardware. -In this case additional page table level -- p4d -- will be folded at -runtime. - User-space and large virtual address space ========================================== On x86, 5-level paging enables 56-bit userspace virtual address space. diff --git a/Documentation/arch/x86/x86_64/boot-options.rst b/Documentation/arch/x86/x86_64/boot-options.rst deleted file mode 100644 index 137432d34109..000000000000 --- a/Documentation/arch/x86/x86_64/boot-options.rst +++ /dev/null @@ -1,319 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -=========================== -AMD64 Specific Boot Options -=========================== - -There are many others (usually documented in driver documentation), but -only the AMD64 specific ones are listed here. - -Machine check -============= -Please see Documentation/arch/x86/x86_64/machinecheck.rst for sysfs runtime tunables. - - mce=off - Disable machine check - mce=no_cmci - Disable CMCI(Corrected Machine Check Interrupt) that - Intel processor supports. Usually this disablement is - not recommended, but it might be handy if your hardware - is misbehaving. - Note that you'll get more problems without CMCI than with - due to the shared banks, i.e. you might get duplicated - error logs. - mce=dont_log_ce - Don't make logs for corrected errors. All events reported - as corrected are silently cleared by OS. - This option will be useful if you have no interest in any - of corrected errors. - mce=ignore_ce - Disable features for corrected errors, e.g. polling timer - and CMCI. All events reported as corrected are not cleared - by OS and remained in its error banks. - Usually this disablement is not recommended, however if - there is an agent checking/clearing corrected errors - (e.g. BIOS or hardware monitoring applications), conflicting - with OS's error handling, and you cannot deactivate the agent, - then this option will be a help. - mce=no_lmce - Do not opt-in to Local MCE delivery. Use legacy method - to broadcast MCEs. - mce=bootlog - Enable logging of machine checks left over from booting. - Disabled by default on AMD Fam10h and older because some BIOS - leave bogus ones. - If your BIOS doesn't do that it's a good idea to enable though - to make sure you log even machine check events that result - in a reboot. On Intel systems it is enabled by default. - mce=nobootlog - Disable boot machine check logging. - mce=monarchtimeout (number) - monarchtimeout: - Sets the time in us to wait for other CPUs on machine checks. 0 - to disable. - mce=bios_cmci_threshold - Don't overwrite the bios-set CMCI threshold. This boot option - prevents Linux from overwriting the CMCI threshold set by the - bios. Without this option, Linux always sets the CMCI - threshold to 1. Enabling this may make memory predictive failure - analysis less effective if the bios sets thresholds for memory - errors since we will not see details for all errors. - mce=recovery - Force-enable recoverable machine check code paths - - nomce (for compatibility with i386) - same as mce=off - - Everything else is in sysfs now. - -APICs -===== - - apic - Use IO-APIC. Default - - noapic - Don't use the IO-APIC. - - disableapic - Don't use the local APIC - - nolapic - Don't use the local APIC (alias for i386 compatibility) - - pirq=... - See Documentation/arch/x86/i386/IO-APIC.rst - - noapictimer - Don't set up the APIC timer - - no_timer_check - Don't check the IO-APIC timer. This can work around - problems with incorrect timer initialization on some boards. - - apicpmtimer - Do APIC timer calibration using the pmtimer. Implies - apicmaintimer. Useful when your PIT timer is totally broken. - -Timing -====== - - notsc - Deprecated, use tsc=unstable instead. - - nohpet - Don't use the HPET timer. - -Idle loop -========= - - idle=poll - Don't do power saving in the idle loop using HLT, but poll for rescheduling - event. This will make the CPUs eat a lot more power, but may be useful - to get slightly better performance in multiprocessor benchmarks. It also - makes some profiling using performance counters more accurate. - Please note that on systems with MONITOR/MWAIT support (like Intel EM64T - CPUs) this option has no performance advantage over the normal idle loop. - It may also interact badly with hyperthreading. - -Rebooting -========= - - reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] | p[ci] [, [w]arm | [c]old] - bios - Use the CPU reboot vector for warm reset - warm - Don't set the cold reboot flag - cold - Set the cold reboot flag - triple - Force a triple fault (init) - kbd - Use the keyboard controller. cold reset (default) - acpi - Use the ACPI RESET_REG in the FADT. If ACPI is not configured or - the ACPI reset does not work, the reboot path attempts the reset - using the keyboard controller. - efi - Use efi reset_system runtime service. If EFI is not configured or - the EFI reset does not work, the reboot path attempts the reset using - the keyboard controller. - pci - Use a write to the PCI config space register 0xcf9 to trigger reboot. - - Using warm reset will be much faster especially on big memory - systems because the BIOS will not go through the memory check. - Disadvantage is that not all hardware will be completely reinitialized - on reboot so there may be boot problems on some systems. - - reboot=force - Don't stop other CPUs on reboot. This can make reboot more reliable - in some cases. - - reboot=default - There are some built-in platform specific "quirks" - you may see: - "reboot: <name> series board detected. Selecting <type> for reboots." - In the case where you think the quirk is in error (e.g. you have - newer BIOS, or newer board) using this option will ignore the built-in - quirk table, and use the generic default reboot actions. - -NUMA -==== - - numa=off - Only set up a single NUMA node spanning all memory. - - numa=noacpi - Don't parse the SRAT table for NUMA setup - - numa=nohmat - Don't parse the HMAT table for NUMA setup, or soft-reserved memory - partitioning. - - numa=fake=<size>[MG] - If given as a memory unit, fills all system RAM with nodes of - size interleaved over physical nodes. - - numa=fake=<N> - If given as an integer, fills all system RAM with N fake nodes - interleaved over physical nodes. - - numa=fake=<N>U - If given as an integer followed by 'U', it will divide each - physical node into N emulated nodes. - -ACPI -==== - - acpi=off - Don't enable ACPI - acpi=ht - Use ACPI boot table parsing, but don't enable ACPI interpreter - acpi=force - Force ACPI on (currently not needed) - acpi=strict - Disable out of spec ACPI workarounds. - acpi_sci={edge,level,high,low} - Set up ACPI SCI interrupt. - acpi=noirq - Don't route interrupts - acpi=nocmcff - Disable firmware first mode for corrected errors. This - disables parsing the HEST CMC error source to check if - firmware has set the FF flag. This may result in - duplicate corrected error reports. - -PCI -=== - - pci=off - Don't use PCI - pci=conf1 - Use conf1 access. - pci=conf2 - Use conf2 access. - pci=rom - Assign ROMs. - pci=assign-busses - Assign busses - pci=irqmask=MASK - Set PCI interrupt mask to MASK - pci=lastbus=NUMBER - Scan up to NUMBER busses, no matter what the mptable says. - pci=noacpi - Don't use ACPI to set up PCI interrupt routing. - -IOMMU (input/output memory management unit) -=========================================== -Multiple x86-64 PCI-DMA mapping implementations exist, for example: - - 1. <kernel/dma/direct.c>: use no hardware/software IOMMU at all - (e.g. because you have < 3 GB memory). - Kernel boot message: "PCI-DMA: Disabling IOMMU" - - 2. <arch/x86/kernel/amd_gart_64.c>: AMD GART based hardware IOMMU. - Kernel boot message: "PCI-DMA: using GART IOMMU" - - 3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used - e.g. if there is no hardware IOMMU in the system and it is need because - you have >3GB memory or told the kernel to us it (iommu=soft)) - Kernel boot message: "PCI-DMA: Using software bounce buffering - for IO (SWIOTLB)" - -:: - - iommu=[<size>][,noagp][,off][,force][,noforce] - [,memaper[=<order>]][,merge][,fullflush][,nomerge] - [,noaperture] - -General iommu options: - - off - Don't initialize and use any kind of IOMMU. - noforce - Don't force hardware IOMMU usage when it is not needed. (default). - force - Force the use of the hardware IOMMU even when it is - not actually needed (e.g. because < 3 GB memory). - soft - Use software bounce buffering (SWIOTLB) (default for - Intel machines). This can be used to prevent the usage - of an available hardware IOMMU. - -iommu options only relevant to the AMD GART hardware IOMMU: - - <size> - Set the size of the remapping area in bytes. - allowed - Overwrite iommu off workarounds for specific chipsets. - fullflush - Flush IOMMU on each allocation (default). - nofullflush - Don't use IOMMU fullflush. - memaper[=<order>] - Allocate an own aperture over RAM with size 32MB<<order. - (default: order=1, i.e. 64MB) - merge - Do scatter-gather (SG) merging. Implies "force" (experimental). - nomerge - Don't do scatter-gather (SG) merging. - noaperture - Ask the IOMMU not to touch the aperture for AGP. - noagp - Don't initialize the AGP driver and use full aperture. - panic - Always panic when IOMMU overflows. - -iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU -implementation: - - swiotlb=<slots>[,force,noforce] - <slots> - Prereserve that many 2K slots for the software IO bounce buffering. - force - Force all IO through the software TLB. - noforce - Do not initialize the software TLB. - - -Miscellaneous -============= - - nogbpages - Do not use GB pages for kernel direct mappings. - gbpages - Use GB pages for kernel direct mappings. - - -AMD SEV (Secure Encrypted Virtualization) -========================================= -Options relating to AMD SEV, specified via the following format: - -:: - - sev=option1[,option2] - -The available options are: - - debug - Enable debug messages. diff --git a/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst index ba74617d4999..970ee94eb551 100644 --- a/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst +++ b/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst @@ -18,7 +18,7 @@ For more information on the features of cpusets, see Documentation/admin-guide/cgroup-v1/cpusets.rst. There are a number of different configurations you can use for your needs. For more information on the numa=fake command line option and its various ways of -configuring fake nodes, see Documentation/arch/x86/x86_64/boot-options.rst. +configuring fake nodes, see Documentation/admin-guide/kernel-parameters.txt For the purposes of this introduction, we'll assume a very primitive NUMA emulation setup of "numa=fake=4*512,". This will split our system memory into diff --git a/Documentation/arch/x86/x86_64/fsgs.rst b/Documentation/arch/x86/x86_64/fsgs.rst index 50960e09e1f6..6bda4d16d3f7 100644 --- a/Documentation/arch/x86/x86_64/fsgs.rst +++ b/Documentation/arch/x86/x86_64/fsgs.rst @@ -125,17 +125,17 @@ FSGSBASE instructions enablement FSGSBASE instructions compiler support ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE +GCC version 4.6.4 and newer provide intrinsics for the FSGSBASE instructions. Clang 5 supports them as well. =================== =========================== _readfsbase_u64() Read the FS base register - _readfsbase_u64() Read the GS base register + _readgsbase_u64() Read the GS base register _writefsbase_u64() Write the FS base register _writegsbase_u64() Write the GS base register =================== =========================== -To utilize these instrinsics <immintrin.h> must be included in the source +To utilize these intrinsics <immintrin.h> must be included in the source code and the compiler option -mfsgsbase has to be added. Compiler support for FS/GS based addressing diff --git a/Documentation/arch/x86/x86_64/index.rst b/Documentation/arch/x86/x86_64/index.rst index ad15e9bd623f..a0261957a08a 100644 --- a/Documentation/arch/x86/x86_64/index.rst +++ b/Documentation/arch/x86/x86_64/index.rst @@ -7,7 +7,6 @@ x86_64 Support .. toctree:: :maxdepth: 2 - boot-options uefi mm 5level-paging diff --git a/Documentation/arch/x86/x86_64/mm.rst b/Documentation/arch/x86/x86_64/mm.rst index 35e5e18c83d0..f2db178b353f 100644 --- a/Documentation/arch/x86/x86_64/mm.rst +++ b/Documentation/arch/x86/x86_64/mm.rst @@ -29,15 +29,27 @@ Complete virtual memory map with 4-level page tables Start addr | Offset | End addr | Size | VM area description ======================================================================================================================== | | | | - 0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm + 0000000000000000 | 0 | 00007fffffffefff | ~128 TB | user-space virtual memory, different per mm + 00007ffffffff000 | ~128 TB | 00007fffffffffff | 4 kB | ... guard hole __________________|____________|__________________|_________|___________________________________________________________ | | | | - 0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical - | | | | virtual memory addresses up to the -128 TB + 0000800000000000 | +128 TB | 7fffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -8 EB | | | | starting offset of kernel mappings. + | | | | + | | | | LAM relaxes canonicallity check allowing to create aliases + | | | | for userspace memory here. __________________|____________|__________________|_________|___________________________________________________________ | | Kernel-space virtual memory, shared between all processes: + __________________|____________|__________________|_________|___________________________________________________________ + | | | | + 8000000000000000 | -8 EB | ffff7fffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -128 TB + | | | | starting offset of kernel mappings. + | | | | + | | | | LAM_SUP relaxes canonicallity check allowing to create + | | | | aliases for kernel memory here. ____________________________________________________________|___________________________________________________________ | | | | ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor @@ -88,16 +100,27 @@ Complete virtual memory map with 5-level page tables Start addr | Offset | End addr | Size | VM area description ======================================================================================================================== | | | | - 0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm + 0000000000000000 | 0 | 00fffffffffff000 | ~64 PB | user-space virtual memory, different per mm + 00fffffffffff000 | ~64 PB | 00ffffffffffffff | 4 kB | ... guard hole __________________|____________|__________________|_________|___________________________________________________________ | | | | - 0100000000000000 | +64 PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical - | | | | virtual memory addresses up to the -64 PB + 0100000000000000 | +64 PB | 7fffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -8EB TB | | | | starting offset of kernel mappings. + | | | | + | | | | LAM relaxes canonicallity check allowing to create aliases + | | | | for userspace memory here. __________________|____________|__________________|_________|___________________________________________________________ | | Kernel-space virtual memory, shared between all processes: ____________________________________________________________|___________________________________________________________ + 8000000000000000 | -8 EB | feffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -64 PB + | | | | starting offset of kernel mappings. + | | | | + | | | | LAM_SUP relaxes canonicallity check allowing to create + | | | | aliases for kernel memory here. + ____________________________________________________________|___________________________________________________________ | | | | ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor ff10000000000000 | -60 PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI diff --git a/Documentation/arch/x86/x86_64/uefi.rst b/Documentation/arch/x86/x86_64/uefi.rst index fbc30c9a071d..e84592dbd6c1 100644 --- a/Documentation/arch/x86/x86_64/uefi.rst +++ b/Documentation/arch/x86/x86_64/uefi.rst @@ -12,14 +12,20 @@ with EFI firmware and specifications are listed below. 1. UEFI specification: http://www.uefi.org -2. Booting Linux kernel on UEFI x86_64 platform requires bootloader - support. Elilo with x86_64 support can be used. +2. Booting Linux kernel on UEFI x86_64 platform can either be + done using the <Documentation/admin-guide/efi-stub.rst> or using a + separate bootloader. 3. x86_64 platform with EFI/UEFI firmware. Mechanics --------- +Refer to <Documentation/admin-guide/efi-stub.rst> to learn how to use the EFI stub. + +Below are general EFI setup guidelines on the x86_64 platform, +regardless of whether you use the EFI stub or a separate bootloader. + - Build the kernel with the following configuration:: CONFIG_FB_EFI=y @@ -31,16 +37,27 @@ Mechanics CONFIG_EFI=y CONFIG_EFIVAR_FS=y or m # optional -- Create a VFAT partition on the disk -- Copy the following to the VFAT partition: +- Create a VFAT partition on the disk with the EFI System flag + You can do this with fdisk with the following commands: + + 1. g - initialize a GPT partition table + 2. n - create a new partition + 3. t - change the partition type to "EFI System" (number 1) + 4. w - write and save the changes + + Afterwards, initialize the VFAT filesystem by running mkfs:: + + mkfs.fat /dev/<your-partition> + +- Copy the boot files to the VFAT partition: + If you use the EFI stub method, the kernel acts also as an EFI executable. + + You can just copy the bzImage to the EFI/boot/bootx64.efi path on the partition + so that it will automatically get booted, see the <Documentation/admin-guide/efi-stub.rst> page + for additional instructions regarding passage of kernel parameters and initramfs. - elilo bootloader with x86_64 support, elilo configuration file, - kernel image built in first step and corresponding - initrd. Instructions on building elilo and its dependencies - can be found in the elilo sourceforge project. + If you use a custom bootloader, refer to the relevant documentation for help on this part. -- Boot to EFI shell and invoke elilo choosing the kernel image built - in first step. - If some or all EFI runtime services don't work, you can try following kernel command line parameters to turn off some or all EFI runtime services. |