Diffstat (limited to 'Documentation/arch')
-rw-r--r-- | Documentation/arch/arm/stm32/stm32f746-overview.rst | 2
-rw-r--r-- | Documentation/arch/arm/stm32/stm32f769-overview.rst | 2
-rw-r--r-- | Documentation/arch/arm/stm32/stm32h743-overview.rst | 2
-rw-r--r-- | Documentation/arch/arm/stm32/stm32h750-overview.rst | 2
-rw-r--r-- | Documentation/arch/arm/stm32/stm32mp13-overview.rst | 2
-rw-r--r-- | Documentation/arch/arm/stm32/stm32mp151-overview.rst | 2
-rw-r--r-- | Documentation/arch/arm64/booting.rst | 11
-rw-r--r-- | Documentation/arch/arm64/elf_hwcaps.rst | 4
-rw-r--r-- | Documentation/arch/arm64/silicon-errata.rst | 2
-rw-r--r-- | Documentation/arch/arm64/sme.rst | 14
-rw-r--r-- | Documentation/arch/loongarch/irq-chip-model.rst | 4
-rw-r--r-- | Documentation/arch/powerpc/eeh-pci-error-recovery.rst | 1
-rw-r--r-- | Documentation/arch/powerpc/index.rst | 1
-rw-r--r-- | Documentation/arch/powerpc/vpa-dtl.rst | 156
-rw-r--r-- | Documentation/arch/riscv/hwprobe.rst | 9
-rw-r--r-- | Documentation/arch/x86/cpuinfo.rst | 2
-rw-r--r-- | Documentation/arch/x86/tdx.rst | 14
-rw-r--r-- | Documentation/arch/x86/topology.rst | 191
18 files changed, 392 insertions, 29 deletions
diff --git a/Documentation/arch/arm/stm32/stm32f746-overview.rst b/Documentation/arch/arm/stm32/stm32f746-overview.rst index 78befddc7740..335f0855a858 100644 --- a/Documentation/arch/arm/stm32/stm32f746-overview.rst +++ b/Documentation/arch/arm/stm32/stm32f746-overview.rst @@ -15,7 +15,7 @@ It features: - SD/MMC/SDIO support - Ethernet controller - USB OTFG FS & HS controllers -- I2C, SPI, CAN busses support +- I2C, SPI, CAN buses support - Several 16 & 32 bits general purpose timers - Serial Audio interface - LCD controller diff --git a/Documentation/arch/arm/stm32/stm32f769-overview.rst b/Documentation/arch/arm/stm32/stm32f769-overview.rst index e482980ddf21..ef31aadee68f 100644 --- a/Documentation/arch/arm/stm32/stm32f769-overview.rst +++ b/Documentation/arch/arm/stm32/stm32f769-overview.rst @@ -15,7 +15,7 @@ It features: - SD/MMC/SDIO support*2 - Ethernet controller - USB OTFG FS & HS controllers -- I2C*4, SPI*6, CAN*3 busses support +- I2C*4, SPI*6, CAN*3 buses support - Several 16 & 32 bits general purpose timers - Serial Audio interface*2 - LCD controller diff --git a/Documentation/arch/arm/stm32/stm32h743-overview.rst b/Documentation/arch/arm/stm32/stm32h743-overview.rst index 4e15f1a42730..7659df24d362 100644 --- a/Documentation/arch/arm/stm32/stm32h743-overview.rst +++ b/Documentation/arch/arm/stm32/stm32h743-overview.rst @@ -15,7 +15,7 @@ It features: - SD/MMC/SDIO support - Ethernet controller - USB OTFG FS & HS controllers -- I2C, SPI, CAN busses support +- I2C, SPI, CAN buses support - Several 16 & 32 bits general purpose timers - Serial Audio interface - LCD controller diff --git a/Documentation/arch/arm/stm32/stm32h750-overview.rst b/Documentation/arch/arm/stm32/stm32h750-overview.rst index 0e51235c9547..be032b77d1f1 100644 --- a/Documentation/arch/arm/stm32/stm32h750-overview.rst +++ b/Documentation/arch/arm/stm32/stm32h750-overview.rst @@ -15,7 +15,7 @@ It features: - SD/MMC/SDIO support - Ethernet controller - USB OTFG FS & HS controllers -- 
I2C, SPI, CAN busses support +- I2C, SPI, CAN buses support - Several 16 & 32 bits general purpose timers - Serial Audio interface - LCD controller diff --git a/Documentation/arch/arm/stm32/stm32mp13-overview.rst b/Documentation/arch/arm/stm32/stm32mp13-overview.rst index 3bb9492dad49..b5e9589fb06f 100644 --- a/Documentation/arch/arm/stm32/stm32mp13-overview.rst +++ b/Documentation/arch/arm/stm32/stm32mp13-overview.rst @@ -24,7 +24,7 @@ More details: - ADC/DAC - USB EHCI/OHCI controllers - USB OTG -- I2C, SPI, CAN busses support +- I2C, SPI, CAN buses support - Several general purpose timers - Serial Audio interface - LCD controller diff --git a/Documentation/arch/arm/stm32/stm32mp151-overview.rst b/Documentation/arch/arm/stm32/stm32mp151-overview.rst index f42a2ac309c0..b58c256ede9a 100644 --- a/Documentation/arch/arm/stm32/stm32mp151-overview.rst +++ b/Documentation/arch/arm/stm32/stm32mp151-overview.rst @@ -23,7 +23,7 @@ More details: - ADC/DAC - USB EHCI/OHCI controllers - USB OTG -- I2C, SPI busses support +- I2C, SPI buses support - Several general purpose timers - Serial Audio interface - LCD-TFT controller diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst index 2f666a7c303c..e4f953839f71 100644 --- a/Documentation/arch/arm64/booting.rst +++ b/Documentation/arch/arm64/booting.rst @@ -466,6 +466,17 @@ Before jumping into the kernel, the following conditions must be met: - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1. - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1. + For CPUs with SPE data source filtering (FEAT_SPE_FDS): + + - If EL3 is present: + + - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1. + - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1. 
+ For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS): - If the kernel is entered at EL1 and EL2 is present: diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst index f58ada4d6cb2..a15df4956849 100644 --- a/Documentation/arch/arm64/elf_hwcaps.rst +++ b/Documentation/arch/arm64/elf_hwcaps.rst @@ -441,6 +441,10 @@ HWCAP3_MTE_FAR HWCAP3_MTE_STORE_ONLY Functionality implied by ID_AA64PFR2_EL1.MTESTOREONLY == 0b0001. +HWCAP3_LSFE + Functionality implied by ID_AA64ISAR3_EL1.LSFE == 0b0001 + + 4. Unused AT_HWCAP bits ----------------------- diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst index b18ef4064bc0..a7ec57060f64 100644 --- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -200,6 +200,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-V3 | #3312417 | ARM64_ERRATUM_3194386 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-V3AE | #3312417 | ARM64_ERRATUM_3194386 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | MMU-500 | #841119,826419 | ARM_SMMU_MMU_500_CPRE_ERRATA| | | | #562869,1047329 | | +----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arch/arm64/sme.rst b/Documentation/arch/arm64/sme.rst index 4cb38330e704..583f2ee9cb97 100644 --- a/Documentation/arch/arm64/sme.rst +++ b/Documentation/arch/arm64/sme.rst @@ -81,17 +81,7 @@ The ZA matrix is square with each side having as many bytes as a streaming mode SVE vector. -3. Sharing of streaming and non-streaming mode SVE state ---------------------------------------------------------- - -It is implementation defined which if any parts of the SVE state are shared -between streaming and non-streaming modes. 
When switching between modes -via software interfaces such as ptrace if no register content is provided as -part of switching no state will be assumed to be shared and everything will -be zeroed. - - -4. System call behaviour +3. System call behaviour ------------------------- * On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the @@ -112,7 +102,7 @@ be zeroed. exceptions for execve() described in section 6. -5. Signal handling +4. Signal handling ------------------- * Signal handlers are invoked with PSTATE.SM=0, PSTATE.ZA=0, and TPIDR2_EL0=0. diff --git a/Documentation/arch/loongarch/irq-chip-model.rst b/Documentation/arch/loongarch/irq-chip-model.rst index a7ecce11e445..8f5c3345109e 100644 --- a/Documentation/arch/loongarch/irq-chip-model.rst +++ b/Documentation/arch/loongarch/irq-chip-model.rst @@ -139,13 +139,13 @@ Feature EXTIOI_HAS_INT_ENCODE is part of standard EIOINTC. If it is 1, it indicates that CPU Interrupt Pin selection can be normal method rather than bitmap method, so interrupt can be routed to IP0 - IP15. -Feature EXTIOI_HAS_CPU_ENCODE is entension of V-EIOINTC. If it is 1, it +Feature EXTIOI_HAS_CPU_ENCODE is extension of V-EIOINTC. If it is 1, it indicates that CPU selection can be normal method rather than bitmap method, so interrupt can be routed to CPU0 - CPU255. EXTIOI_VIRT_CONFIG ------------------ -This register is read-write register, for compatibility intterupt routed uses +This register is read-write register, for compatibility interrupt routed uses the default method which is the same with standard EIOINTC. If the bit is set with 1, it indicated HW to use normal method rather than bitmap method. 
diff --git a/Documentation/arch/powerpc/eeh-pci-error-recovery.rst b/Documentation/arch/powerpc/eeh-pci-error-recovery.rst index d6643a91bdf8..153d0af055b6 100644 --- a/Documentation/arch/powerpc/eeh-pci-error-recovery.rst +++ b/Documentation/arch/powerpc/eeh-pci-error-recovery.rst @@ -315,7 +315,6 @@ network daemons and file systems that didn't need to be disturbed. ideally, the reset should happen at or below the block layer, so that the file systems are not disturbed. - Reiserfs does not tolerate errors returned from the block device. Ext3fs seems to be tolerant, retrying reads/writes until it does succeed. Both have been only lightly tested in this scenario. diff --git a/Documentation/arch/powerpc/index.rst b/Documentation/arch/powerpc/index.rst index 53fc9f89f3e4..1be2ee3f0361 100644 --- a/Documentation/arch/powerpc/index.rst +++ b/Documentation/arch/powerpc/index.rst @@ -37,6 +37,7 @@ powerpc vas-api vcpudispatch_stats vmemmap_dedup + vpa-dtl features diff --git a/Documentation/arch/powerpc/vpa-dtl.rst b/Documentation/arch/powerpc/vpa-dtl.rst new file mode 100644 index 000000000000..58d0022f993a --- /dev/null +++ b/Documentation/arch/powerpc/vpa-dtl.rst @@ -0,0 +1,156 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. _vpa-dtl: + +=================================== +DTL (Dispatch Trace Log) +=================================== + +Athira Rajeev, 19 April 2025 + +.. contents:: + :depth: 3 + + +Basic overview +============== + +The pseries Shared Processor Logical Partition(SPLPAR) machines can +retrieve a log of dispatch and preempt events from the hypervisor +using data from Disptach Trace Log(DTL) buffer. With this information, +user can retrieve when and why each dispatch & preempt has occurred. +The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters +via perf. + +Infrastructure used +=================== + +The VPA DTL PMU counters do not interrupt on overflow or generate any +PMI interrupts. Therefore, hrtimer is used to poll the DTL data. 
The timer +nterval can be provided by user via sample_period field in nano seconds. +vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch +Trace Log) contains information about dispatch/preempt, enqueue time etc. +We directly copy the DTL buffer data as part of auxiliary buffer and it +will be processed later. This will avoid time taken to create samples +in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL) +entries makes use of AUX support in perf infrastructure. On the tools side, +this data is made available as PERF_RECORD_AUXTRACE records. + +To correlate each DTL entry with other events across CPU's, an auxtrace_queue +is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers. +All auxtrace queues is maintained in auxtrace heap. The queues are sorted +based on timestamp. When the different PERF_RECORD_XX records are processed, +compare the timestamp of perf record with timestamp of top element in the +auxtrace heap so that DTL events can be co-related with other events +Process the auxtrace queue if the timestamp of element from heap is +lower than timestamp from entry in perf record. Sometimes it could happen that +one buffer is only partially processed. if the timestamp of occurrence of +another event is more than currently processed element in the queue, it will +move on to next perf record. So keep track of position of buffer to continue +processing next time. Update the timestamp of the auxtrace heap with the timestamp +of last processed entry from the auxtrace buffer. + +This infrastructure ensures dispatch trace log entries can be correlated +and presented along with other events like sched. + +vpa-dtl PMU example usage +========================= + +.. code-block:: sh + + # ls /sys/devices/vpa_dtl/ + events format perf_event_mux_interval_ms power subsystem type uevent + + +To capture the DTL data using perf record: +.. 
code-block:: sh + + # ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1 + +The result can be interpreted using perf record. Snippet of perf report -D + +.. code-block:: sh + + # ./perf report -D + +There are different PERF_RECORD_XX records. In that records corresponding to +auxtrace buffers includes: + +1. PERF_RECORD_AUX + Conveys that new data is available in AUX area + +2. PERF_RECORD_AUXTRACE_INFO + Describes offset and size of auxtrace data in the buffers + +3. PERF_RECORD_AUXTRACE + This is the record that defines the auxtrace data which here in case of + vpa-dtl pmu is dispatch trace log data. + +Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump + +.. code-block:: sh + +0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0 +. +. ... VPA DTL PMU data: size 1680 bytes, entries is 35 +. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000 +. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773 +. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437 +. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709 +. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243 +. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648 +. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446 +. 
00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126 +. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665 + +Above is representation of dtl entry of below format: + +struct dtl_entry { + u8 dispatch_reason; + u8 preempt_reason; + u16 processor_id; + u32 enqueue_to_dispatch_time; + u32 ready_to_enqueue_time; + u32 waiting_to_ready_time; + u64 timebase; + u64 fault_addr; + u64 srr0; + u64 srr1; + +}; + +First two fields represent the dispatch reason and preempt reason. The post +processing of PERF_RECORD_AUXTRACE records will translate to meaningful data +for user to consume. + +Visualize the dispatch trace log entries with perf report +========================================================= + +.. code-block:: sh + + # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1 + [ perf record: Woken up 1 times to write data ] + [ perf record: Captured and wrote 0.300 MB perf.data ] + + # ./perf report + # Samples: 321 of event 'vpa-dtl' + # Event count (approx.): 321 + # + # Children Self Command Shared Object Symbol + # ........ ........ ....... ................. .............................. + # + 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace + +Visualize the dispatch trace log entries with perf script +========================================================= + +.. 
code-block:: sh + + # ./perf script + migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009 + migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10 + migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns] + migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120 + :256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms]) + :256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms]) + perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns] + perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010 diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst index 2aa9be272d5d..2f449c9b15bd 100644 --- a/Documentation/arch/riscv/hwprobe.rst +++ b/Documentation/arch/riscv/hwprobe.rst @@ -327,6 +327,15 @@ The following keys are defined: * :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED`: Misaligned vector accesses are not supported at all and will generate a misaligned address fault. +* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_MIPS_0`: A bitmask containing the + mips vendor extensions that are compatible with the + :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: base system behavior. 
+ + * MIPS + + * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XMIPSEXECTL`: The xmipsexectl vendor + extension is supported in the MIPS ISA extensions spec. + * :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0`: A bitmask containing the thead vendor extensions that are compatible with the :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: base system behavior. diff --git a/Documentation/arch/x86/cpuinfo.rst b/Documentation/arch/x86/cpuinfo.rst index dd8b7806944e..9f2e47c4b1c8 100644 --- a/Documentation/arch/x86/cpuinfo.rst +++ b/Documentation/arch/x86/cpuinfo.rst @@ -11,7 +11,7 @@ The list of feature flags in /proc/cpuinfo is not complete and represents an ill-fated attempt from long time ago to put feature flags in an easy to find place for userspace. -However, the amount of feature flags is growing by the CPU generation, +However, the number of feature flags is growing with each CPU generation, leading to unparseable and unwieldy /proc/cpuinfo. What is more, those feature flags do not even need to be in that file diff --git a/Documentation/arch/x86/tdx.rst b/Documentation/arch/x86/tdx.rst index 719043cd8b46..61670e7df2f7 100644 --- a/Documentation/arch/x86/tdx.rst +++ b/Documentation/arch/x86/tdx.rst @@ -142,13 +142,6 @@ but depends on the BIOS to behave correctly. Note TDX works with CPU logical online/offline, thus the kernel still allows to offline logical CPU and online it again. -Kexec() -~~~~~~~ - -TDX host support currently lacks the ability to handle kexec. For -simplicity only one of them can be enabled in the Kconfig. This will be -fixed in the future. - Erratum ~~~~~~~ @@ -171,6 +164,13 @@ If the platform has such erratum, the kernel prints additional message in machine check handler to tell user the machine check may be caused by kernel bug on TDX private memory. +Kexec +~~~~~~~ + +Currently kexec doesn't work on the TDX platforms with the aforementioned +erratum. It fails when loading the kexec kernel image. Otherwise it +works normally. 
+ Interaction vs S3 and deeper states ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/arch/x86/topology.rst b/Documentation/arch/x86/topology.rst index c12837e61bda..86bec8ac2c4d 100644 --- a/Documentation/arch/x86/topology.rst +++ b/Documentation/arch/x86/topology.rst @@ -141,6 +141,197 @@ Thread-related topology information in the kernel: +System topology enumeration +=========================== + +The topology on x86 systems can be discovered using a combination of vendor +specific CPUID leaves which enumerate the processor topology and the cache +hierarchy. + +The CPUID leaves in their preferred order of parsing for each x86 vendor is as +follows: + +1) AMD + + 1) CPUID leaf 0x80000026 [Extended CPU Topology] (Core::X86::Cpuid::ExCpuTopology) + + The extended CPUID leaf 0x80000026 is the extension of the CPUID leaf 0xB + and provides the topology information of Core, Complex, CCD (Die), and + Socket in each level. + + Support for the leaf is discovered by checking if the maximum extended + CPUID level is >= 0x80000026 and then checking if `LogProcAtThisLevel` + in `EBX[15:0]` at a particular level (starting from 0) is non-zero. + + The `LevelType` in `ECX[15:8]` at the level provides the topology domain + the level describes - Core, Complex, CCD(Die), or the Socket. + + The kernel uses the `CoreMaskWidth` from `EAX[4:0]` to discover the + number of bits that need to be right-shifted from `ExtendedLocalApicId` + in `EDX[31:0]` in order to get a unique Topology ID for the topology + level. CPUs with the same Topology ID share the resources at that level. + + CPUID leaf 0x80000026 also provides more information regarding the power + and efficiency rankings, and about the core type on AMD processors with + heterogeneous characteristics. + + If CPUID leaf 0x80000026 is supported, further parsing is not required. 
+ + 2) CPUID leaf 0x0000000B [Extended Topology Enumeration] (Core::X86::Cpuid::ExtTopEnum) + + The extended CPUID leaf 0x0000000B is the predecessor on the extended + CPUID leaf 0x80000026 and only describes the core, and the socket domains + of the processor topology. + + The support for the leaf is discovered by checking if the maximum supported + CPUID level is >= 0xB and then if `EBX[31:0]` at a particular level + (starting from 0) is non-zero. + + The `LevelType` in `ECX[15:8]` at the level provides the topology domain + that the level describes - Thread, or Processor (Socket). + + The kernel uses the `CoreMaskWidth` from `EAX[4:0]` to discover the + number of bits that need to be right-shifted from the `ExtendedLocalApicId` + in `EDX[31:0]` to get a unique Topology ID for that topology level. CPUs + sharing the Topology ID share the resources at that level. + + If CPUID leaf 0xB is supported, further parsing is not required. + + + 3) CPUID leaf 0x80000008 ECX [Size Identifiers] (Core::X86::Cpuid::SizeId) + + If neither the CPUID leaf 0x80000026 nor 0xB is supported, the number of + CPUs on the package is detected using the Size Identifier leaf + 0x80000008 ECX. + + The support for the leaf is discovered by checking if the supported + extended CPUID level is >= 0x80000008. + + The shifts from the APIC ID for the Socket ID is calculated from the + `ApicIdSize` field in `ECX[15:12]` if it is non-zero. + + If `ApicIdSize` is reported to be zero, the shift is calculated as the + order of the `number of threads` calculated from `NC` field in + `ECX[7:0]` which describes the `number of threads - 1` on the package. + + Unless Extended APIC ID is supported, the APIC ID used to find the + Socket ID is from the `LocalApicId` field of CPUID leaf 0x00000001 + `EBX[31:24]`. + + The topology parsing continues to detect if Extended APIC ID is + supported or not. 
+ + + 4) CPUID leaf 0x8000001E [Extended APIC ID, Core Identifiers, Node Identifiers] + (Core::X86::Cpuid::{ExtApicId,CoreId,NodeId}) + + The support for Extended APIC ID can be detected by checking for the + presence of `TopologyExtensions` in `ECX[22]` of CPUID leaf 0x80000001 + [Feature Identifiers] (Core::X86::Cpuid::FeatureExtIdEcx). + + If Topology Extensions is supported, the APIC ID from `ExtendedApicId` + from CPUID leaf 0x8000001E `EAX[31:0]` should be preferred over that from + `LocalApicId` field of CPUID leaf 0x00000001 `EBX[31:24]` for topology + enumeration. + + On processors of Family 0x17 and above that do not support CPUID leaf + 0x80000026 or CPUID leaf 0xB, the shifts from the APIC ID for the Core + ID is calculated using the order of `number of threads per core` + calculated using the `ThreadsPerCore` field in `EBX[15:8]` which + describes `number of threads per core - 1`. + + On Processors of Family 0x15, the Core ID from `EBX[7:0]` is used as the + `cu_id` (Compute Unit ID) to detect CPUs that share the compute units. + + + All AMD processors that support the `TopologyExtensions` feature store the + `NodeId` from the `ECX[7:0]` of CPUID leaf 0x8000001E + (Core::X86::Cpuid::NodeId) as the per-CPU `node_id`. On older processors, + the `node_id` was discovered using MSR_FAM10H_NODE_ID MSR (MSR + 0x0xc001_100c). The presence of the NODE_ID MSR was detected by checking + `ECX[19]` of CPUID leaf 0x80000001 [Feature Identifiers] + (Core::X86::Cpuid::FeatureExtIdEcx). + + +2) Intel + + On Intel platforms, the CPUID leaves that enumerate the processor + topology are as follows: + + 1) CPUID leaf 0x1F (V2 Extended Topology Enumeration Leaf) + + The CPUID leaf 0x1F is the extension of the CPUID leaf 0xB and provides + the topology information of Core, Module, Tile, Die, DieGrp, and Socket + in each level. 
+ + The support for the leaf is discovered by checking if the supported + CPUID level is >= 0x1F and then `EBX[31:0]` at a particular level + (starting from 0) is non-zero. + + The `Domain Type` in `ECX[15:8]` of the sub-leaf provides the topology + domain that the level describes - Core, Module, Tile, Die, DieGrp, and + Socket. + + The kernel uses the value from `EAX[4:0]` to discover the number of + bits that need to be right shifted from the `x2APIC ID` in `EDX[31:0]` + to get a unique Topology ID for the topology level. CPUs with the same + Topology ID share the resources at that level. + + If CPUID leaf 0x1F is supported, further parsing is not required. + + + 2) CPUID leaf 0x0000000B (Extended Topology Enumeration Leaf) + + The extended CPUID leaf 0x0000000B is the predecessor of the V2 Extended + Topology Enumeration Leaf 0x1F and only describes the core, and the + socket domains of the processor topology. + + The support for the leaf is iscovered by checking if the supported CPUID + level is >= 0xB and then checking if `EBX[31:0]` at a particular level + (starting from 0) is non-zero. + + CPUID leaf 0x0000000B shares the same layout as CPUID leaf 0x1F and + should be enumerated in a similar manner. + + If CPUID leaf 0xB is supported, further parsing is not required. + + + 3) CPUID leaf 0x00000004 (Deterministic Cache Parameters Leaf) + + On Intel processors that support neither CPUID leaf 0x1F, nor CPUID leaf + 0xB, the shifts for the SMT domains is calculated using the number of + CPUs sharing the L1 cache. + + Processors that feature Hyper-Threading is detected using `EDX[28]` of + CPUID leaf 0x1 (Basic CPUID Information). + + The order of `Maximum number of addressable IDs for logical processors + sharing this cache` from `EAX[25:14]` of level-0 of CPUID 0x4 provides + the shifts from the APIC ID required to compute the Core ID. + + The APIC ID and Package information is computed using the data from + CPUID leaf 0x1. 
+ + + 4) CPUID leaf 0x00000001 (Basic CPUID Information) + + The mask and shifts to derive the Physical Package (socket) ID is + computed using the `Maximum number of addressable IDs for logical + processors in this physical package` from `EBX[23:16]` of CPUID leaf + 0x1. + + The APIC ID on the legacy platforms is derived from the `Initial APIC + ID` field from `EBX[31:24]` of CPUID leaf 0x1. + + +3) Centaur and Zhaoxin + + Similar to Intel, Centaur and Zhaoxin use a combination of CPUID leaf + 0x00000004 (Deterministic Cache Parameters Leaf) and CPUID leaf 0x00000001 + (Basic CPUID Information) to derive the topology information. + + + System topology examples ======================== |
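The shift arithmetic that the new topology.rst section describes can be sketched in a few lines. This is only an illustration, not the kernel's implementation: the helper names (`bits`, `order`, `topology_id`, `socket_shift`) and the register values are hypothetical, and no real CPUID instruction is executed. It assumes "order" means the smallest power-of-two exponent that covers a count, and it models only the CPUID leaf 0x80000008 fallback (`ApicIdSize` in `ECX[15:12]`, `NC` in `ECX[7:0]`) plus the generic right shift of the APIC ID.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract the bit field value[hi:lo], inclusive on both ends."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def order(count: int) -> int:
    """Smallest n such that 2**n >= count (assumed meaning of 'order')."""
    return max(count - 1, 0).bit_length()


def topology_id(apic_id: int, shift: int) -> int:
    """Drop the low `shift` bits of the APIC ID to get the ID at that level."""
    return apic_id >> shift


def socket_shift(ecx_80000008: int) -> int:
    """Socket ID shift from a (hypothetical) leaf 0x80000008 ECX value.

    Uses ApicIdSize (ECX[15:12]) when it is non-zero; otherwise falls back
    to the order of the thread count, where NC (ECX[7:0]) holds
    'number of threads - 1'.
    """
    apic_id_size = bits(ecx_80000008, 15, 12)
    if apic_id_size:
        return apic_id_size
    nc = bits(ecx_80000008, 7, 0)
    return order(nc + 1)


if __name__ == "__main__":
    apic_id = 0x1B  # hypothetical APIC ID, 0b11011
    # With a shift of 1 the lowest bit (the SMT thread) is dropped.
    print("core-level id:", topology_id(apic_id, 1))
    # With ApicIdSize = 4 the low four bits are dropped.
    print("socket id:", topology_id(apic_id, socket_shift(0x4007)))
    # With ApicIdSize = 0 and NC = 7 (8 threads) the shift is order(8).
    print("fallback shift:", socket_shift(0x0007))
```

CPUs whose APIC IDs yield the same value after the shift for a given level share the resources at that level, which is why two IDs that differ only in the dropped low bits map to the same topology ID.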