summaryrefslogtreecommitdiff
path: root/Documentation/arch/powerpc
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/arch/powerpc')
-rw-r--r--Documentation/arch/powerpc/cxl.rst470
-rw-r--r--Documentation/arch/powerpc/htm.rst104
-rw-r--r--Documentation/arch/powerpc/index.rst2
-rw-r--r--Documentation/arch/powerpc/kvm-nested.rst40
4 files changed, 135 insertions, 481 deletions
diff --git a/Documentation/arch/powerpc/cxl.rst b/Documentation/arch/powerpc/cxl.rst
deleted file mode 100644
index 778adda740d2..000000000000
--- a/Documentation/arch/powerpc/cxl.rst
+++ /dev/null
@@ -1,470 +0,0 @@
-====================================
-Coherent Accelerator Interface (CXL)
-====================================
-
-Introduction
-============
-
- The coherent accelerator interface is designed to allow the
- coherent connection of accelerators (FPGAs and other devices) to a
- POWER system. These devices need to adhere to the Coherent
- Accelerator Interface Architecture (CAIA).
-
- IBM refers to this as the Coherent Accelerator Processor Interface
- or CAPI. In the kernel it's referred to by the name CXL to avoid
- confusion with the ISDN CAPI subsystem.
-
- Coherent in this context means that the accelerator and CPUs can
- both access system memory directly and with the same effective
- addresses.
-
- **This driver is deprecated and will be removed in a future release.**
-
-Hardware overview
-=================
-
- ::
-
- POWER8/9 FPGA
- +----------+ +---------+
- | | | |
- | CPU | | AFU |
- | | | |
- | | | |
- | | | |
- +----------+ +---------+
- | PHB | | |
- | +------+ | PSL |
- | | CAPP |<------>| |
- +---+------+ PCIE +---------+
-
- The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
- unit which is part of the PCIe Host Bridge (PHB). This is managed
- by Linux by calls into OPAL. Linux doesn't directly program the
- CAPP.
-
- The FPGA (or coherently attached device) consists of two parts.
- The POWER Service Layer (PSL) and the Accelerator Function Unit
- (AFU). The AFU is used to implement specific functionality behind
- the PSL. The PSL, among other things, provides memory address
- translation services to allow each AFU direct access to userspace
- memory.
-
- The AFU is the core part of the accelerator (eg. the compression,
- crypto etc function). The kernel has no knowledge of the function
- of the AFU. Only userspace interacts directly with the AFU.
-
- The PSL provides the translation and interrupt services that the
- AFU needs. This is what the kernel interacts with. For example, if
- the AFU needs to read a particular effective address, it sends
- that address to the PSL, the PSL then translates it, fetches the
- data from memory and returns it to the AFU. If the PSL has a
- translation miss, it interrupts the kernel and the kernel services
- the fault. The context to which this fault is serviced is based on
- who owns that acceleration function.
-
- - POWER8 and PSL Version 8 are compliant to the CAIA Version 1.0.
- - POWER9 and PSL Version 9 are compliant to the CAIA Version 2.0.
-
- This PSL Version 9 provides new features such as:
-
- * Interaction with the nest MMU on the P9 chip.
- * Native DMA support.
- * Supports sending ASB_Notify messages for host thread wakeup.
- * Supports Atomic operations.
- * etc.
-
- Cards with a PSL9 won't work on a POWER8 system and cards with a
- PSL8 won't work on a POWER9 system.
-
-AFU Modes
-=========
-
- There are two programming modes supported by the AFU. Dedicated
- and AFU directed. AFU may support one or both modes.
-
- When using dedicated mode only one MMU context is supported. In
- this mode, only one userspace process can use the accelerator at
- time.
-
- When using AFU directed mode, up to 16K simultaneous contexts can
- be supported. This means up to 16K simultaneous userspace
- applications may use the accelerator (although specific AFUs may
- support fewer). In this mode, the AFU sends a 16 bit context ID
- with each of its requests. This tells the PSL which context is
- associated with each operation. If the PSL can't translate an
- operation, the ID can also be accessed by the kernel so it can
- determine the userspace context associated with an operation.
-
-
-MMIO space
-==========
-
- A portion of the accelerator MMIO space can be directly mapped
- from the AFU to userspace. Either the whole space can be mapped or
- just a per context portion. The hardware is self describing, hence
- the kernel can determine the offset and size of the per context
- portion.
-
-
-Interrupts
-==========
-
- AFUs may generate interrupts that are destined for userspace. These
- are received by the kernel as hardware interrupts and passed onto
- userspace by a read syscall documented below.
-
- Data storage faults and error interrupts are handled by the kernel
- driver.
-
-
-Work Element Descriptor (WED)
-=============================
-
- The WED is a 64-bit parameter passed to the AFU when a context is
- started. Its format is up to the AFU hence the kernel has no
- knowledge of what it represents. Typically it will be the
- effective address of a work queue or status block where the AFU
- and userspace can share control and status information.
-
-
-
-
-User API
-========
-
-1. AFU character devices
-^^^^^^^^^^^^^^^^^^^^^^^^
-
- For AFUs operating in AFU directed mode, two character device
- files will be created. /dev/cxl/afu0.0m will correspond to a
- master context and /dev/cxl/afu0.0s will correspond to a slave
- context. Master contexts have access to the full MMIO space an
- AFU provides. Slave contexts have access to only the per process
- MMIO space an AFU provides.
-
- For AFUs operating in dedicated process mode, the driver will
- only create a single character device per AFU called
- /dev/cxl/afu0.0d. This will have access to the entire MMIO space
- that the AFU provides (like master contexts in AFU directed).
-
- The types described below are defined in include/uapi/misc/cxl.h
-
- The following file operations are supported on both slave and
- master devices.
-
- A userspace library libcxl is available here:
-
- https://github.com/ibm-capi/libcxl
-
- This provides a C interface to this kernel API.
-
-open
-----
-
- Opens the device and allocates a file descriptor to be used with
- the rest of the API.
-
- A dedicated mode AFU only has one context and only allows the
- device to be opened once.
-
- An AFU directed mode AFU can have many contexts, the device can be
- opened once for each context that is available.
-
- When all available contexts are allocated the open call will fail
- and return -ENOSPC.
-
- Note:
- IRQs need to be allocated for each context, which may limit
- the number of contexts that can be created, and therefore
- how many times the device can be opened. The POWER8 CAPP
- supports 2040 IRQs and 3 are used by the kernel, so 2037 are
- left. If 1 IRQ is needed per context, then only 2037
- contexts can be allocated. If 4 IRQs are needed per context,
- then only 2037/4 = 509 contexts can be allocated.
-
-
-ioctl
------
-
- CXL_IOCTL_START_WORK:
- Starts the AFU context and associates it with the current
- process. Once this ioctl is successfully executed, all memory
- mapped into this process is accessible to this AFU context
- using the same effective addresses. No additional calls are
- required to map/unmap memory. The AFU memory context will be
- updated as userspace allocates and frees memory. This ioctl
- returns once the AFU context is started.
-
- Takes a pointer to a struct cxl_ioctl_start_work
-
- ::
-
- struct cxl_ioctl_start_work {
- __u64 flags;
- __u64 work_element_descriptor;
- __u64 amr;
- __s16 num_interrupts;
- __s16 reserved1;
- __s32 reserved2;
- __u64 reserved3;
- __u64 reserved4;
- __u64 reserved5;
- __u64 reserved6;
- };
-
- flags:
- Indicates which optional fields in the structure are
- valid.
-
- work_element_descriptor:
- The Work Element Descriptor (WED) is a 64-bit argument
- defined by the AFU. Typically this is an effective
- address pointing to an AFU specific structure
- describing what work to perform.
-
- amr:
- Authority Mask Register (AMR), same as the powerpc
- AMR. This field is only used by the kernel when the
- corresponding CXL_START_WORK_AMR value is specified in
- flags. If not specified the kernel will use a default
- value of 0.
-
- num_interrupts:
- Number of userspace interrupts to request. This field
- is only used by the kernel when the corresponding
- CXL_START_WORK_NUM_IRQS value is specified in flags.
- If not specified the minimum number required by the
- AFU will be allocated. The min and max number can be
- obtained from sysfs.
-
- reserved fields:
- For ABI padding and future extensions
-
- CXL_IOCTL_GET_PROCESS_ELEMENT:
- Get the current context id, also known as the process element.
- The value is returned from the kernel as a __u32.
-
-
-mmap
-----
-
- An AFU may have an MMIO space to facilitate communication with the
- AFU. If it does, the MMIO space can be accessed via mmap. The size
- and contents of this area are specific to the particular AFU. The
- size can be discovered via sysfs.
-
- In AFU directed mode, master contexts are allowed to map all of
- the MMIO space and slave contexts are allowed to only map the per
- process MMIO space associated with the context. In dedicated
- process mode the entire MMIO space can always be mapped.
-
- This mmap call must be done after the START_WORK ioctl.
-
- Care should be taken when accessing MMIO space. Only 32 and 64-bit
- accesses are supported by POWER8. Also, the AFU will be designed
- with a specific endianness, so all MMIO accesses should consider
- endianness (recommend endian(3) variants like: le64toh(),
- be64toh() etc). These endian issues equally apply to shared memory
- queues the WED may describe.
-
-
-read
-----
-
- Reads events from the AFU. Blocks if no events are pending
- (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
- unrecoverable error or if the card is removed.
-
- read() will always return an integral number of events.
-
- The buffer passed to read() must be at least 4K bytes.
-
- The result of the read will be a buffer of one or more events,
- each event is of type struct cxl_event, of varying size::
-
- struct cxl_event {
- struct cxl_event_header header;
- union {
- struct cxl_event_afu_interrupt irq;
- struct cxl_event_data_storage fault;
- struct cxl_event_afu_error afu_error;
- };
- };
-
- The struct cxl_event_header is defined as
-
- ::
-
- struct cxl_event_header {
- __u16 type;
- __u16 size;
- __u16 process_element;
- __u16 reserved1;
- };
-
- type:
- This defines the type of event. The type determines how
- the rest of the event is structured. These types are
- described below and defined by enum cxl_event_type.
-
- size:
- This is the size of the event in bytes including the
- struct cxl_event_header. The start of the next event can
- be found at this offset from the start of the current
- event.
-
- process_element:
- Context ID of the event.
-
- reserved field:
- For future extensions and padding.
-
- If the event type is CXL_EVENT_AFU_INTERRUPT then the event
- structure is defined as
-
- ::
-
- struct cxl_event_afu_interrupt {
- __u16 flags;
- __u16 irq; /* Raised AFU interrupt number */
- __u32 reserved1;
- };
-
- flags:
- These flags indicate which optional fields are present
- in this struct. Currently all fields are mandatory.
-
- irq:
- The IRQ number sent by the AFU.
-
- reserved field:
- For future extensions and padding.
-
- If the event type is CXL_EVENT_DATA_STORAGE then the event
- structure is defined as
-
- ::
-
- struct cxl_event_data_storage {
- __u16 flags;
- __u16 reserved1;
- __u32 reserved2;
- __u64 addr;
- __u64 dsisr;
- __u64 reserved3;
- };
-
- flags:
- These flags indicate which optional fields are present in
- this struct. Currently all fields are mandatory.
-
- address:
- The address that the AFU unsuccessfully attempted to
- access. Valid accesses will be handled transparently by the
- kernel but invalid accesses will generate this event.
-
- dsisr:
- This field gives information on the type of fault. It is a
- copy of the DSISR from the PSL hardware when the address
- fault occurred. The form of the DSISR is as defined in the
- CAIA.
-
- reserved fields:
- For future extensions
-
- If the event type is CXL_EVENT_AFU_ERROR then the event structure
- is defined as
-
- ::
-
- struct cxl_event_afu_error {
- __u16 flags;
- __u16 reserved1;
- __u32 reserved2;
- __u64 error;
- };
-
- flags:
- These flags indicate which optional fields are present in
- this struct. Currently all fields are Mandatory.
-
- error:
- Error status from the AFU. Defined by the AFU.
-
- reserved fields:
- For future extensions and padding
-
-
-2. Card character device (powerVM guest only)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- In a powerVM guest, an extra character device is created for the
- card. The device is only used to write (flash) a new image on the
- FPGA accelerator. Once the image is written and verified, the
- device tree is updated and the card is reset to reload the updated
- image.
-
-open
-----
-
- Opens the device and allocates a file descriptor to be used with
- the rest of the API. The device can only be opened once.
-
-ioctl
------
-
-CXL_IOCTL_DOWNLOAD_IMAGE / CXL_IOCTL_VALIDATE_IMAGE:
- Starts and controls flashing a new FPGA image. Partial
- reconfiguration is not supported (yet), so the image must contain
- a copy of the PSL and AFU(s). Since an image can be quite large,
- the caller may have to iterate, splitting the image in smaller
- chunks.
-
- Takes a pointer to a struct cxl_adapter_image::
-
- struct cxl_adapter_image {
- __u64 flags;
- __u64 data;
- __u64 len_data;
- __u64 len_image;
- __u64 reserved1;
- __u64 reserved2;
- __u64 reserved3;
- __u64 reserved4;
- };
-
- flags:
- These flags indicate which optional fields are present in
- this struct. Currently all fields are mandatory.
-
- data:
- Pointer to a buffer with part of the image to write to the
- card.
-
- len_data:
- Size of the buffer pointed to by data.
-
- len_image:
- Full size of the image.
-
-
-Sysfs Class
-===========
-
- A cxl sysfs class is added under /sys/class/cxl to facilitate
- enumeration and tuning of the accelerators. Its layout is
- described in Documentation/ABI/obsolete/sysfs-class-cxl
-
-
-Udev rules
-==========
-
- The following udev rules could be used to create a symlink to the
- most logical chardev to use in any programming mode (afuX.Yd for
- dedicated, afuX.Ys for afu directed), since the API is virtually
- identical for each::
-
- SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
- SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
- KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"
diff --git a/Documentation/arch/powerpc/htm.rst b/Documentation/arch/powerpc/htm.rst
new file mode 100644
index 000000000000..fcb4eb6306b1
--- /dev/null
+++ b/Documentation/arch/powerpc/htm.rst
@@ -0,0 +1,104 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _htm:
+
+===================================
+HTM (Hardware Trace Macro)
+===================================
+
+Athira Rajeev, 2 Mar 2025
+
+.. contents::
+ :depth: 3
+
+
+Basic overview
+==============
+
+H_HTM is used as an interface for executing Hardware Trace Macro (HTM)
+functions, including setup, configuration, control and dumping of the HTM data.
+For using HTM, it is required to setup HTM buffers and HTM operations can
+be controlled using the H_HTM hcall. The hcall can be invoked for any core/chip
+of the system from within a partition itself. To use this feature, a debugfs
+folder called "htmdump" is present under /sys/kernel/debug/powerpc.
+
+
+HTM debugfs example usage
+=========================
+
+.. code-block:: sh
+
+ # ls /sys/kernel/debug/powerpc/htmdump/
+ coreindexonchip htmcaps htmconfigure htmflags htminfo htmsetup
+ htmstart htmstatus htmtype nodalchipindex nodeindex trace
+
+Details on each file:
+
+* nodeindex, nodalchipindex, coreindexonchip specifies which partition to configure the HTM for.
+* htmtype: specifies the type of HTM. Supported target is hardwareTarget.
+* trace: is to read the HTM data.
+* htmconfigure: Configure/Deconfigure the HTM. Writing 1 to the file will configure the trace, writing 0 to the file will do deconfigure.
+* htmstart: start/Stop the HTM. Writing 1 to the file will start the tracing, writing 0 to the file will stop the tracing.
+* htmstatus: get the status of HTM. This is needed to understand the HTM state after each operation.
+* htmsetup: set the HTM buffer size. Size of HTM buffer is in power of 2
+* htminfo: provides the system processor configuration details. This is needed to understand the appropriate values for nodeindex, nodalchipindex, coreindexonchip.
+* htmcaps : provides the HTM capabilities like minimum/maximum buffer size, what kind of tracing the HTM supports etc.
+* htmflags : allows to pass flags to hcall. Currently supports controlling the wrapping of HTM buffer.
+
+To see the system processor configuration details:
+
+.. code-block:: sh
+
+ # cat /sys/kernel/debug/powerpc/htmdump/htminfo > htminfo_file
+
+The result can be interpreted using hexdump.
+
+To collect HTM traces for a partition represented by nodeindex as
+zero, nodalchipindex as 1 and coreindexonchip as 12
+
+.. code-block:: sh
+
+ # cd /sys/kernel/debug/powerpc/htmdump/
+ # echo 2 > htmtype
+ # echo 33 > htmsetup ( sets 8GB memory for HTM buffer, number is size in power of 2 )
+
+This requires a CEC reboot to get the HTM buffers allocated.
+
+.. code-block:: sh
+
+ # cd /sys/kernel/debug/powerpc/htmdump/
+ # echo 2 > htmtype
+ # echo 0 > nodeindex
+ # echo 1 > nodalchipindex
+ # echo 12 > coreindexonchip
+ # echo 1 > htmflags # to set noWrap for HTM buffers
+ # echo 1 > htmconfigure # Configure the HTM
+ # echo 1 > htmstart # Start the HTM
+ # echo 0 > htmstart # Stop the HTM
+ # echo 0 > htmconfigure # Deconfigure the HTM
+ # cat htmstatus # Dump the status of HTM entries as data
+
+Above will set the htmtype and core details, followed by executing respective HTM operation.
+
+Read the HTM trace data
+========================
+
+After starting the trace collection, run the workload
+of interest. Stop the trace collection after required period
+of time, and read the trace file.
+
+.. code-block:: sh
+
+ # cat /sys/kernel/debug/powerpc/htmdump/trace > trace_file
+
+This trace file will contain the relevant instruction traces
+collected during the workload execution. And can be used as
+input file for trace decoders to understand data.
+
+Benefits of using HTM debugfs interface
+=======================================
+
+It is now possible to collect traces for a particular core/chip
+from within any partition of the system and decode it. Through
+this enablement, a small partition can be dedicated to collect the
+trace data and analyze to provide important information for Performance
+analysis, Software tuning, or Hardware debug.
diff --git a/Documentation/arch/powerpc/index.rst b/Documentation/arch/powerpc/index.rst
index 995268530f21..53fc9f89f3e4 100644
--- a/Documentation/arch/powerpc/index.rst
+++ b/Documentation/arch/powerpc/index.rst
@@ -12,7 +12,6 @@ powerpc
bootwrapper
cpu_families
cpu_features
- cxl
dawr-power9
dexcr
dscr
@@ -20,6 +19,7 @@ powerpc
elf_hwcaps
elfnote
firmware-assisted-dump
+ htm
hvcs
imc
isa-versions
diff --git a/Documentation/arch/powerpc/kvm-nested.rst b/Documentation/arch/powerpc/kvm-nested.rst
index 5defd13cc6c1..574592505604 100644
--- a/Documentation/arch/powerpc/kvm-nested.rst
+++ b/Documentation/arch/powerpc/kvm-nested.rst
@@ -208,13 +208,9 @@ associated values for each ID in the GSB::
flags:
Bit 0: getGuestWideState: Request state of the Guest instead
of an individual VCPU.
- Bit 1: takeOwnershipOfVcpuState Indicate the L1 is taking
- over ownership of the VCPU state and that the L0 can free
- the storage holding the state. The VCPU state will need to
- be returned to the Hypervisor via H_GUEST_SET_STATE prior
- to H_GUEST_RUN_VCPU being called for this VCPU. The data
- returned in the dataBuffer is in a Hypervisor internal
- format.
+ Bit 1: getHostWideState: Request stats of the Host. This causes
+ the guestId and vcpuId parameters to be ignored and attempting
+ to get the VCPU/Guest state will cause an error.
Bits 2-63: Reserved
guestId: ID obtained from H_GUEST_CREATE
vcpuId: ID of the vCPU pass to H_GUEST_CREATE_VCPU
@@ -406,9 +402,10 @@ the partition like the timebase offset and partition scoped page
table information.
+--------+-------+----+--------+----------------------------------+
-| ID | Size | RW | Thread | Details |
-| | Bytes | | Guest | |
-| | | | Scope | |
+| ID | Size | RW |(H)ost | Details |
+| | Bytes | |(G)uest | |
+| | | |(T)hread| |
+| | | |Scope | |
+========+=======+====+========+==================================+
| 0x0000 | | RW | TG | NOP element |
+--------+-------+----+--------+----------------------------------+
@@ -434,6 +431,29 @@ table information.
| | | | |- 0x8 Table size. |
+--------+-------+----+--------+----------------------------------+
| 0x0007-| | | | Reserved |
+| 0x07FF | | | | |
++--------+-------+----+--------+----------------------------------+
+| 0x0800 | 0x08 | R | H | Current usage in bytes of the |
+| | | | | L0's Guest Management Space |
+| | | | | for an L1-Lpar. |
++--------+-------+----+--------+----------------------------------+
+| 0x0801 | 0x08 | R | H | Max bytes available in the |
+| | | | | L0's Guest Management Space for |
+| | | | | an L1-Lpar |
++--------+-------+----+--------+----------------------------------+
+| 0x0802 | 0x08 | R | H | Current usage in bytes of the |
+| | | | | L0's Guest Page Table Management |
+| | | | | Space for an L1-Lpar |
++--------+-------+----+--------+----------------------------------+
+| 0x0803 | 0x08 | R | H | Max bytes available in the L0's |
+| | | | | Guest Page Table Management |
+| | | | | Space for an L1-Lpar |
++--------+-------+----+--------+----------------------------------+
+| 0x0804 | 0x08 | R | H | Cumulative Reclaimed bytes from |
+| | | | | L0 Guest's Page Table Management |
+| | | | | Space due to overcommit |
++--------+-------+----+--------+----------------------------------+
+| 0x0805-| | | | Reserved |
| 0x0BFF | | | | |
+--------+-------+----+--------+----------------------------------+
| 0x0C00 | 0x10 | RW | T |Run vCPU Input Buffer: |