documentation update

author: Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-15 14:33:44 +0200
committer: Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-15 14:38:00 +0200
commit: 2eaa6504ce9686fc111ecf14c0a6ebd61be23926 (patch)
tree: 49fa696cdc7d1d94da6e2d82e55ee1caef74a21e /doc
parent: 499e7f7bb43f72d76b698e6eeae92ddbcbcd2511 (diff)
3 files changed, 109 insertions, 39 deletions
diff --git a/doc/2d.md b/doc/2d.md
index 73172bb..6189c9f 100644
--- a/doc/2d.md
+++ b/doc/2d.md
@@ -1,21 +1,38 @@
 2D engine documentation
 ========================
 
-Important: make sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will hang.
+This document describes 2D graphics cores such as the GC320. Do not confuse these with the VG core
+(such as the GC355), which is a beefed-up 2D core with a completely different interface.
 
-As the complete state footprint is pretty small, it is recommended to program all relevant
-2D engine state for an operation (before flushing) at once before a command instead 
-of relying on a context to be maintained as with 3D rendering (although this is still a possibility).
+Important: be sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will
+hang on the first rendering command and nothing will seem to happen at all.
+
+As the state footprint is pretty small, it is recommended to program all relevant 2D engine state
+for an operation (before flushing) at once before a command instead of relying on a context to be
+maintained as with 3D rendering (although this is still a possibility).
+
+Using the 2D and 3D engine simultaneously within a program can be tricky. Some of the SoCs such as
+Marvell Armada 510 have 2D and 3D in the same core, whereas others such as Freescale i.MX6 have
+multiple cores. The former case is easy: just flush the caches and switch between the PIPEs,
+though there is some overhead involved. In the latter case, however, the cores run independently and
+synchronization has to go through the CPU. This necessitates either a stall or a complex queuing
+mechanism that waits for signals on both cores.
 
 2D commands
 -----------------
 
+2D commands are executed by setting the opcode in register `DE.DEST_CONFIG.COMMAND` (and other state
+as necessary) and then queuing `DRAW_2D` commands in the command stream.
+
 - Clear
 - Line
 - Bit blit
 - Stretch blit
 - Multi source blit
 
+Filter blits are also available as 2D commands, but I was unable to get them to do anything
+(I don't think the blob does either). Use the video rasterizer described below instead.
+
 Video rasterizer
 -----------------
 
@@ -25,38 +42,62 @@ Video rasterizer
 
 Does hardware scaling using an arbitrary 9-tap separable filter and 5 bit subpixel precision,
 
-Input: Y/U/V planar or interleaved images or RGBA images
-Output: RGBA formats (planar may be possible too)
+Input: Y/U/V planar or Y/U/V interleaved images or RGBA images
+Output: RGBA formats (output to planar is possible too on some chips)
+
+Source and destination formats
+--------------------------
+
+    Format        Source    Destination     Notes
+    -----------------------------------------------
+    A1R5G5B5        +            +
+    A4R4G4B4        +            +
+    X1R5G5B5        +            +
+    X4R4G4B4        +            +
+    R5G6B5          +            +
+    A8R8G8B8        +            +
+    X8R8G8B8        +            +
+    A8              +            +          8-bit alpha only
+    MONOCHROME      +            -          1-bit monochrome
+    INDEX8          +            -          8-bit indexed
+    YUY2            +            ?          YUV 4:2:2 interleaved per four bytes, Y0 Cb Y1 Cr
+    UYVY            +            ?          YUV 4:2:2 interleaved per four bytes, Cb Y0 Cr Y1
+    YV12            +            ?          YUV 4:2:0 8-bit Y plane and 8-bit 2x2 subsampled V and U planes
+    NV12            +            ?          YUV 4:2:0 8-bit Y plane and interleaved U/V plane with 2x2 subsampling
+    NV16            +            ?          YUV 4:2:2 8-bit Y plane and interleaved U/V plane with 2x1 subsampling
+    RG16            +            ?          Bayer interleaved: RGRG..(odd line), GBGB..(even line), 8-bit samples
+
+Additional formats can be supported by using RGBA or UV swizzles.
 
 Monochrome blits
 -----------------
 
-Mono expansion can be used for primitive font rendering or b/w patterns such as checkerboards.
+Mono expansion can be used for primitive font rendering or black and white patterns such as
+checkerboards.
 
-When blitting from `LOCATION_STREAM` make sure that there are
-enough bytes available in the stream.
+When blitting from `LOCATION_STREAM` make sure that there are enough bytes available in the stream.
 Source size is ignored in the case of monochrome blits.
 
-Mono expansion uses registers
-`SRC_COLOR_FG` and `SRC_COLOR_BG`
-to determine the colors to use for 0 and 1 pixels respectively.
+Mono expansion uses registers `SRC_COLOR_FG` and `SRC_COLOR_BG` to determine the colors to use for 0
+and 1 pixels respectively.
 
 Restrictions:
 
-- In case of `LOCATION_STREAM` source can only draw one rectangle at a time.
+- With a `LOCATION_STREAM` source, only one rectangle can be drawn at a time. There is no such
+  restriction for `LOCATION_MEMORY`.
 
 Raster operations
 ------------------
-Raster operation foreground and background codes. Even though ROP
-is not used in `CLEAR`, `HOR_FILTER_BLT`, `VER_FILTER_BLT` and alpha-enabled
-`BIT_BLT`s, ROP code still has to be programmed, because the engine makes the
-decision whether source, destination and pattern are involved in the current
-operation and the correct decision is essential for the engine to complete
+Raster operation foreground and background codes. Even though ROP is not used in `CLEAR`,
+`HOR_FILTER_BLT`, `VER_FILTER_BLT` and alpha-enabled `BIT_BLT`s, the ROP code still has to be
+programmed, because the engine uses it to decide whether source, destination and pattern are
+involved in the current operation, and the correct decision is essential for the engine to complete
 the operation as expected.
 
-ROP builds a lookup table for a logical operation with 2, 3 or 4 inputs
-(depending on ROP type). So for a ROP3, for example, the ROP pattern will be
-2^3=8 bits.
+ROP builds a lookup table for a logical operation with 2, 3 or 4 inputs (depending on ROP type). So
+for a ROP3, for example, the ROP pattern will be 2^3=8 bits.
+
+These are the input bits for the ROPs, per ROP type:
 
 `ROP2_PATTERN` [untested]
     bit 0 destination
@@ -88,32 +129,37 @@ ROP3/4 examples:
 
 Patterns
 ---------
-An repeated 8x8 pattern can be used with 2D engine operations `LINE` and `BIT_BLT`.
-This pattern can be combined with the color using ROP.
+A repeating 8×8 pattern can be used with 2D engine operations `LINE` and `BIT_BLT`. This pattern
+can be combined with the color using ROP.
 
 Alpha blending
 ---------------
-- The blend equation is always akin OpenGL's `GL_FUNC_ADD`, source and destination (multiplied by blend factor) are added.
+- The blend equation is always akin to OpenGL's `GL_FUNC_ADD`: source and destination (each
+  multiplied by its blend factor) are added.
 
-- Alpha values can come from the source/destination per pixel or a global value defined in the state.
+- Alpha values can come from the source/destination per pixel or a global value defined in the
+  state.
 
 Rotation and mirroring
 -----------------------
 
 - There are two ways to do source and destination rotation: through register `ROT_ANGLE` and through
-register `SOURCE_ROTATION_CONFIG` / `DEST_ROTATION_CONFIG`.
-The former is more flexible and can rotate (0, 90, 180, 270) as well as flip in X and Y. However it is not supported
-on every GPU (which ones?).
+  register `SOURCE_ROTATION_CONFIG` / `DEST_ROTATION_CONFIG`.  The former is more flexible and can
+  rotate (0, 90, 180, 270) as well as flip in X and Y. However it is not supported on every GPU
+  (which ones?).
 
-- There are also two ways to do mirroring: though register `ROT_ANGLE` and through
-register `CONFIG` (enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may be
-different. Mirroring seems to be `ROT_ANGLE` is supported with the `NEW_2D` capability ("mirror blit extension").
-This is only available in very new hardware (gc880, gc2000).
+- There are also two ways to do mirroring: through register `ROT_ANGLE` and through register `CONFIG`
+  (enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may be
+  different. Mirroring through `ROT_ANGLE` seems to be supported with the `NEW_2D` capability ("mirror
+  blit extension"). This is only available in new hardware (gc880, gc2000).
 
 PE10/PE20
 ==========
-GPUs with feature bit `PE20` have various different features from GPUs without the bit
-(considered `PE10`). Also the registers that are used for the same features can be different. `PE20` registers
-are usually a superset of the `PE10` equivalent. Make sure to use the right registers according to the PE type or
-it will not work.
+There are two versions of the Pixel Engine (PE) for the 2D pipe, PE10 and PE20. These can be
+distinguished by the feature bit `PE20`.
+
+GPUs with feature bit `PE20` have various different features from GPUs without the bit (considered
+`PE10`). Also the registers that are used for the same features can be different. `PE20` registers
+are usually a superset of the `PE10` equivalent. Make sure to use the right registers according to
+the PE type or it will not work.
 
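Note on the ROP lookup table described in doc/2d.md above: a minimal sketch of how an 8-bit ROP3 code acts as a truth table over its inputs. The bit assignment for ROP3 (destination as index bit 0, source as bit 1, pattern as bit 2) is an assumption here, extrapolated from the `ROP2_PATTERN` listing and the usual ternary raster operation convention; the function names are hypothetical and not part of any driver.

```python
def apply_rop(rop_code, inputs):
    """Look up one output bit in a ROP truth table.

    `inputs` lists the single-bit inputs, starting with the one that
    selects index bit 0. With n inputs the table has 2**n entries.
    """
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position
    return (rop_code >> index) & 1


def rop3(rop_code, pattern, source, destination):
    # ASSUMPTION: destination = index bit 0, source = bit 1, pattern = bit 2.
    return apply_rop(rop_code, [destination, source, pattern])


def rop3_word(rop_code, pattern, source, destination, width=32):
    """Apply a ROP3 code to every bit position of `width`-bit words."""
    result = 0
    for i in range(width):
        bit = rop3(rop_code,
                   (pattern >> i) & 1,
                   (source >> i) & 1,
                   (destination >> i) & 1)
        result |= bit << i
    return result
```

Under that assumption, `0xCC` copies the source, `0xF0` copies the pattern, `0x66` is source XOR destination and `0x88` is source AND destination.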
diff --git a/doc/hardware.md b/doc/hardware.md
index 669f896..44e26d0 100644
--- a/doc/hardware.md
+++ b/doc/hardware.md
@@ -13,6 +13,18 @@ for extra parallelism and/or granularity in power switching). For example the Ma
 with only the 3D engine as well as a GC300 with only the 2D engine. Similarly, the Freescale i.mx6 SoC has a
 GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
 
+- State space is a 256kB (65536 times uint32) register file divided up into
+  separate units for parts of the chip (such as PE, RS, ...)
+
+- Most of the state is latched; that means if it's set to a certain value, it
+  will keep that value until the next change
+
+- Instead of programming the registers directly (which is possible from kernel
+  space), the FE, a DMA engine, is used to queue state changes for later
+
+- To perform an operation such as rendering, all the state for that
+  operation must have been programmed to the desired values
+
 Feature bits
 =================
 
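Note on the state space bullets added to doc/hardware.md above: a toy model of latched state with queued changes, as a sketch only. It checks the size arithmetic (65536 words of 4 bytes is 256 kB) and shows that a queued change has no effect until the queue is executed, after which the value persists. The class and method names are hypothetical, and this does not reproduce the real FE command encoding.

```python
STATE_WORDS = 65536              # number of uint32 registers in the state space
STATE_BYTES = STATE_WORDS * 4    # 262144 bytes = 256 kB


class StateModel:
    """Toy register file with a queue standing in for the FE (hypothetical)."""

    def __init__(self):
        self.registers = [0] * STATE_WORDS
        self.queue = []

    def queue_load_state(self, address, value):
        # Addresses are byte offsets and must be word aligned.
        if address % 4 != 0 or not 0 <= address < STATE_BYTES:
            raise ValueError("bad state address: %#x" % address)
        self.queue.append((address // 4, value & 0xFFFFFFFF))

    def execute(self):
        # Apply the queued changes in order; each value is latched
        # until a later change overwrites it.
        for index, value in self.queue:
            self.registers[index] = value
        self.queue.clear()

    def read(self, address):
        return self.registers[address // 4]
```

The address `0x1234` used in any experiment with this model would be arbitrary; the model does not know which unit (PE, RS, ...) owns which range.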
diff --git a/doc/kernel_bugs.md b/doc/kernel_bugs.md
index b93ed6c..0ccb33a 100644
--- a/doc/kernel_bugs.md
+++ b/doc/kernel_bugs.md
@@ -5,8 +5,9 @@ Race condition
 
 In event submission (from #cubox).
 
-    [20:19:57] <_rmk_> so, gckEVENT_Submit claims the event listMutex... allocates an event id, drops it, and then submits the command queue...
-    [20:20:02] <_rmk_> so two threads can do this...
+    [20:19:57] <_rmk_> so, gckEVENT_Submit claims the event listMutex... allocates an event id,
+    drops it, and then submits the command queue...
+    [20:20:02] <_rmk_> so two threads can do this...
     [20:20:15] <_rmk_> CPU0: claim listMutex
     [20:20:20] <_rmk_> CPU0: get event ID
     [20:20:25] <_rmk_> CPU0: drop listMutex
@@ -15,7 +16,18 @@ In event submission (from #cubox).
     [20:20:41] <_rmk_> CPU1: drop listMutex
     [20:20:49] <_rmk_> CPU1: insert commands into the command queue
     [20:20:56] <_rmk_> CPU0: insert commands into the command queue
-    [20:21:16] <_rmk_> and then we have the second event due to fire first, which upsets the event handling
+    [20:21:16] <_rmk_> and then we have the second event due to fire first, which upsets the event
+    handling
 
 Status: still present
 
+Command buffer submission
+--------------------------
+
+Version: dove
+
+Submitting a lot of distinct command buffers without queuing (or waiting for) a synchronization
+signal causes the kernel to run out of signals. This causes hangs in rendering.
+
+Status: workaround found (see `ETNA_MAX_UNSIGNALED_FLUSHES`)
+
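Note on the command buffer submission workaround above: the idea behind a limit such as `ETNA_MAX_UNSIGNALED_FLUSHES` can be sketched as a counter that forces a synchronization signal after a fixed number of flushes. This is a guess at the general approach, not the actual implementation; the class, the callbacks and the limit value used in any example are all hypothetical.

```python
class FlushThrottle:
    """Force a sync signal every `max_unsignaled` flushes (hypothetical sketch)."""

    def __init__(self, max_unsignaled, submit, wait_for_signal):
        self.max_unsignaled = max_unsignaled
        self.submit = submit                    # submit(buffer, with_signal)
        self.wait_for_signal = wait_for_signal  # blocks until the signal fires
        self.unsignaled = 0

    def flush(self, command_buffer):
        self.unsignaled += 1
        need_signal = self.unsignaled >= self.max_unsignaled
        # Queue a signal together with the buffer once the limit is hit,
        # so the kernel never accumulates an unbounded number of them.
        self.submit(command_buffer, need_signal)
        if need_signal:
            self.wait_for_signal()
            self.unsignaled = 0
```

Waiting immediately is the simplest choice; a real driver might instead only queue the signal and wait lazily.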