author Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-15 14:33:44 +0200
committer Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-15 14:38:00 +0200
commit 2eaa6504ce9686fc111ecf14c0a6ebd61be23926
tree 49fa696cdc7d1d94da6e2d82e55ee1caef74a21e /doc
parent 499e7f7bb43f72d76b698e6eeae92ddbcbcd2511
documentation update
Diffstat (limited to 'doc')
-rw-r--r-- doc/2d.md          118
-rw-r--r-- doc/hardware.md     12
-rw-r--r-- doc/kernel_bugs.md  18
3 files changed, 109 insertions, 39 deletions
diff --git a/doc/2d.md b/doc/2d.md
index 73172bb..6189c9f 100644
--- a/doc/2d.md
+++ b/doc/2d.md
@@ -1,21 +1,38 @@
2D engine documentation
========================
-Important: make sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will hang.
+This document describes 2D graphics cores such as the GC320. Do not confuse these with VG cores
+(such as the GC355), which are beefed-up 2D cores with a completely different interface.
-As the complete state footprint is pretty small, it is recommended to program all relevant
-2D engine state for an operation (before flushing) at once before a command instead
-of relying on a context to be maintained as with 3D rendering (although this is still a possibility).
+Important: be sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will
+hang on the first rendering command and nothing will seem to happen at all.
+
+As the state footprint is pretty small, it is recommended to program all relevant 2D engine state
+for an operation (before flushing) at once before a command instead of relying on a context to be
+maintained as with 3D rendering (although this is still a possibility).
+
+Using the 2D and 3D engine simultaneously within a program can be tricky. Some of the SoCs such as
+Marvell Armada 510 have 2D and 3D in the same core, whereas others such as Freescale i.MX6 have
+multiple cores. In the former case it is easy: just flush the caches and switch between the PIPEs,
+though there is some overhead involved. In the latter case, however, the cores run independently and
+synchronization has to go through the CPU. This necessitates either a stall or a complex queuing
+mechanism that waits for signals on both cores.
2D commands
-----------------
+2D commands are executed by setting the opcode in register `DE.DEST_CONFIG.COMMAND` (and other state
+as necessary) and then queuing `DRAW_2D` commands in the command stream.
+
- Clear
- Line
- Bit blit
- Stretch blit
- Multi source blit
+Filter blits are also available as 2D commands, but I was unable to get them to do anything
+(I don't think the blob does either). Use the video rasterizer as described below.
+
Video rasterizer
-----------------
@@ -25,38 +42,62 @@ Video rasterizer
Does hardware scaling using an arbitrary 9-tap separable filter and 5 bit subpixel precision,
-Input: Y/U/V planar or interleaved images or RGBA images
-Output: RGBA formats (planar may be possible too)
+Input: Y/U/V planar or Y/U/V interleaved images or RGBA images
+Output: RGBA formats (output to planar is possible too on some chips)
+
+Source and destination formats
+--------------------------
+
+    Format      Source  Destination  Notes
+    --------------------------------------------------------------------------------------------
+    A1R5G5B5    +       +
+    A4R4G4B4    +       +
+    X1R5G5B5    +       +
+    X4R4G4B4    +       +
+    R5G6B5      +       +
+    A8R8G8B8    +       +
+    X8R8G8B8    +       +
+    A8          +       +            8-bit alpha only
+    MONOCHROME  +       -            1-bit monochrome
+    INDEX8      +       -            8-bit indexed
+    YUY2        +       ?            YUV 4:2:2 interleaved per four bytes, Y0 Cb Y1 Cr
+    UYVY        +       ?            YUV 4:2:2 interleaved per four bytes, Cb Y0 Cr Y1
+    YV12        +       ?            YUV 4:2:0 8-bit Y plane and 8-bit 2x2 subsampled V and U planes
+    NV12        +       ?            YUV 4:2:0 8-bit Y plane and interleaved U/V plane with 2x2 subsampling
+    NV16        +       ?            YUV 4:2:2 8-bit Y plane and interleaved U/V plane with 2x1 subsampling
+    RG16        +       ?            Bayer interleaved: RGRG..(odd line), GBGB..(even line), 8-bit samples
+
+Additional formats can be supported by using RGBA or UV swizzles.
Monochrome blits
-----------------
-Mono expansion can be used for primitive font rendering or b/w patterns such as checkerboards.
+Mono expansion can be used for primitive font rendering or black and white patterns such as
+checkerboards.
-When blitting from `LOCATION_STREAM` make sure that there are
-enough bytes available in the stream.
+When blitting from `LOCATION_STREAM` make sure that there are enough bytes available in the stream.
Source size is ignored in the case of monochrome blits.
-Mono expansion uses registers
-`SRC_COLOR_FG` and `SRC_COLOR_BG`
-to determine the colors to use for 0 and 1 pixels respectively.
+Mono expansion uses registers `SRC_COLOR_FG` and `SRC_COLOR_BG` to determine the colors to use for 0
+and 1 pixels respectively.
Restrictions:
-- In case of `LOCATION_STREAM` source can only draw one rectangle at a time.
+- With a `LOCATION_STREAM` source, only one rectangle can be drawn at a time. There is no such
+ restriction for `LOCATION_MEMORY`.
Raster operations
------------------
-Raster operation foreground and background codes. Even though ROP
-is not used in `CLEAR`, `HOR_FILTER_BLT`, `VER_FILTER_BLT` and alpha-enabled
-`BIT_BLT`s, ROP code still has to be programmed, because the engine makes the
-decision whether source, destination and pattern are involved in the current
-operation and the correct decision is essential for the engine to complete
+Raster operation foreground and background codes. Even though ROP is not used in `CLEAR`,
+`HOR_FILTER_BLT`, `VER_FILTER_BLT` and alpha-enabled `BIT_BLT`s, the ROP code still has to be
+programmed, because the engine makes the decision whether source, destination and pattern are
+involved in the current operation and the correct decision is essential for the engine to complete
the operation as expected.
-ROP builds a lookup table for a logical operation with 2, 3 or 4 inputs
-(depending on ROP type). So for a ROP3, for example, the ROP pattern will be
-2^3=8 bits.
+ROP builds a lookup table for a logical operation with 2, 3 or 4 inputs (depending on ROP type). So
+for a ROP3, for example, the ROP pattern will be 2^3=8 bits.
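
As a rough illustration of the lookup-table idea (not taken from the hardware documentation), a ROP3 code can be read as an 8-entry truth table indexed by three input bits. The sketch below assumes the conventional GDI ordering, with the pattern bit as bit 2, the source bit as bit 1 and the destination bit as bit 0 of the index; the actual ordering for this hardware should be checked against the per-type bit listing. The helper name `apply_rop3` is hypothetical.

```python
def apply_rop3(rop, pattern, source, dest, width=8):
    """Apply an 8-bit ROP3 code bitwise to three `width`-bit values.

    Assumed index layout (GDI convention, an assumption for this sketch):
    bit 2 = pattern, bit 1 = source, bit 0 = destination.
    """
    result = 0
    for i in range(width):
        p = (pattern >> i) & 1
        s = (source >> i) & 1
        d = (dest >> i) & 1
        index = (p << 2) | (s << 1) | d           # one of 2^3 = 8 table entries
        result |= ((rop >> index) & 1) << i       # look up the output bit
    return result


# Under this layout 0xCC copies the source, 0xF0 copies the pattern,
# and 0x66 is source XOR destination.
copied = apply_rop3(0xCC, pattern=0x0F, source=0x5A, dest=0x33)
```

Under the same assumption a ROP2 would use a 4-entry table and a ROP4 a 16-entry one.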
+
+These are the input bits for the ROPs, per ROP type:
`ROP2_PATTERN` [untested]
bit 0 destination
@@ -88,32 +129,37 @@ ROP3/4 examples:
Patterns
---------
-An repeated 8x8 pattern can be used with 2D engine operations `LINE` and `BIT_BLT`.
-This pattern can be combined with the color using ROP.
+A repeated 8×8 pattern can be used with 2D engine operations `LINE` and `BIT_BLT`. This pattern
+can be combined with the color using ROP.
Alpha blending
---------------
-- The blend equation is always akin OpenGL's `GL_FUNC_ADD`, source and destination (multiplied by blend factor) are added.
+- The blend equation is always akin to OpenGL's `GL_FUNC_ADD`: source and destination (multiplied by
+ blend factor) are added.
-- Alpha values can come from the source/destination per pixel or a global value defined in the state.
+- Alpha values can come from the source/destination per pixel or a global value defined in the
+ state.
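
The two points above can be sketched for a single colour channel as follows. This is only an illustration: values are assumed to be normalized to the range 0.0 to 1.0, the result is assumed to be clamped, and the function and parameter names are made up for the example rather than taken from the hardware state.

```python
def blend_add(src, dst, src_factor, dst_factor):
    """GL_FUNC_ADD-style blend of one colour channel, values in [0.0, 1.0].

    Clamping to 1.0 is an assumption of this sketch.
    """
    return min(1.0, src * src_factor + dst * dst_factor)


def pick_alpha(per_pixel_alpha, global_alpha, use_global):
    """Choose between the pixel's own alpha and a global alpha from state."""
    return global_alpha if use_global else per_pixel_alpha


# "Source over" style blending using a global alpha of 0.25:
alpha = pick_alpha(per_pixel_alpha=1.0, global_alpha=0.25, use_global=True)
out = blend_add(src=0.8, dst=0.4, src_factor=alpha, dst_factor=1.0 - alpha)
```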
Rotation and mirroring
-----------------------
- There are two ways to do source and destination rotation: through register `ROT_ANGLE` and through
-register `SOURCE_ROTATION_CONFIG` / `DEST_ROTATION_CONFIG`.
-The former is more flexible and can rotate (0, 90, 180, 270) as well as flip in X and Y. However it is not supported
-on every GPU (which ones?).
+ register `SOURCE_ROTATION_CONFIG` / `DEST_ROTATION_CONFIG`. The former is more flexible and can
+ rotate (0, 90, 180, 270) as well as flip in X and Y. However it is not supported on every GPU
+ (which ones?).
-- There are also two ways to do mirroring: though register `ROT_ANGLE` and through
-register `CONFIG` (enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may be
-different. Mirroring seems to be `ROT_ANGLE` is supported with the `NEW_2D` capability ("mirror blit extension").
-This is only available in very new hardware (gc880, gc2000).
+- There are also two ways to do mirroring: through register `ROT_ANGLE` and through register `CONFIG`
+ (enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may be
+ different. Mirroring through `ROT_ANGLE` seems to be supported with the `NEW_2D` capability ("mirror
+ blit extension"). This is only available in new hardware (gc880, gc2000).
PE10/PE20
==========
-GPUs with feature bit `PE20` have various different features from GPUs without the bit
-(considered `PE10`). Also the registers that are used for the same features can be different. `PE20` registers
-are usually a superset of the `PE10` equivalent. Make sure to use the right registers according to the PE type or
-it will not work.
+There are two versions of the Pixel Engine (PE) for the 2D pipe, PE10 and PE20. These can be
+distinguished by the feature bit `PE20`.
+
+GPUs with feature bit `PE20` have various different features from GPUs without the bit (considered
+`PE10`). Also the registers that are used for the same features can be different. `PE20` registers
+are usually a superset of the `PE10` equivalent. Make sure to use the right registers according to
+the PE type or it will not work.
diff --git a/doc/hardware.md b/doc/hardware.md
index 669f896..44e26d0 100644
--- a/doc/hardware.md
+++ b/doc/hardware.md
@@ -13,6 +13,18 @@ for extra parallelism and/or granularity in power switching). For example the Ma
with only the 3D engine as well as a GC300 with only the 2D engine. Similarly, the Freescale i.mx6 SoC has a
GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
+- State space is a 256kB (65536 times uint32) register file divided up into
+ separate units for parts of the chip (such as PE, RS, ...)
+
+- Most of the state is latched; that means if it's set to a certain value, it
+ will keep that value until the next change
+
+- Instead of programming the registers directly (which is possible from kernel
+ space), the FE, a DMA engine, is used to queue state changes for later
+
+- To perform an operation such as rendering, all the state for that operation
+ must first be programmed to the desired values
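
A toy model of the points above may help: state changes are queued, applied later, and then stay latched until overwritten. This is a simplified sketch, not the real command stream format; the class, method names and register indices are hypothetical.

```python
STATE_WORDS = 65536            # number of 32-bit registers in the state space
STATE_BYTES = STATE_WORDS * 4  # 262144 bytes, i.e. 256 kB


class StateModel:
    """Toy model of latched state that is updated through a queue."""

    def __init__(self):
        self.regs = {}    # latched register values, keyed by word index
        self.queue = []   # pending (index, value) state changes

    def queue_state(self, index, value):
        """Queue a state change; nothing is applied until flush()."""
        if not 0 <= index < STATE_WORDS:
            raise ValueError("register index out of range")
        self.queue.append((index, value & 0xFFFFFFFF))

    def flush(self):
        """Apply queued changes in order; untouched registers keep their value."""
        for index, value in self.queue:
            self.regs[index] = value
        self.queue.clear()


model = StateModel()
model.queue_state(0x100, 5)   # hypothetical register index
model.flush()
model.queue_state(0x101, 7)
model.flush()                 # register 0x100 still holds 5 afterwards
```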
+
Feature bits
=================
diff --git a/doc/kernel_bugs.md b/doc/kernel_bugs.md
index b93ed6c..0ccb33a 100644
--- a/doc/kernel_bugs.md
+++ b/doc/kernel_bugs.md
@@ -5,8 +5,9 @@ Race condition
In event submission (from #cubox).
- [20:19:57] <_rmk_> so, gckEVENT_Submit claims the event listMutex... allocates an event id, drops it, and then submits the command queue...
- [20:20:02] <_rmk_> so two threads can do this...
+ [20:19:57] <_rmk_> so, gckEVENT_Submit claims the event listMutex... allocates an event id,
+ drops it, and then submits the command queue...
+ [20:20:02] <_rmk_> so two threads can do this...
[20:20:15] <_rmk_> CPU0: claim listMutex
[20:20:20] <_rmk_> CPU0: get event ID
[20:20:25] <_rmk_> CPU0: drop listMutex
@@ -15,7 +16,18 @@ In event submission (from #cubox).
[20:20:41] <_rmk_> CPU1: drop listMutex
[20:20:49] <_rmk_> CPU1: insert commands into the command queue
[20:20:56] <_rmk_> CPU0: insert commands into the command queue
- [20:21:16] <_rmk_> and then we have the second event due to fire first, which upsets the event handling
+ [20:21:16] <_rmk_> and then we have the second event due to fire first, which upsets the event
+ handling
Status: still present
+Command buffer submission
+--------------------------
+
+Version: dove
+
+Submitting a lot of distinct command buffers without queuing (or waiting for) a synchronization
+signal causes the kernel to run out of signals. This causes hangs in rendering.
+
+Status: workaround found (see `ETNA_MAX_UNSIGNALED_FLUSHES`)
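
How the actual `ETNA_MAX_UNSIGNALED_FLUSHES` workaround is implemented is not shown here. As a hedged sketch of the general idea, a submitter could count flushes and force a synchronization once a limit is reached. The class name, the limit value and the `syncs` counter are placeholders for illustration only.

```python
MAX_UNSIGNALED_FLUSHES = 4  # placeholder value, not the real constant


class FlushTracker:
    """Counts command buffer submissions and forces a periodic sync."""

    def __init__(self, limit=MAX_UNSIGNALED_FLUSHES):
        self.limit = limit
        self.unsignaled = 0   # flushes since the last synchronization
        self.syncs = 0        # number of forced synchronizations

    def flush(self):
        """Record one submission; return True if a sync was forced."""
        self.unsignaled += 1
        if self.unsignaled >= self.limit:
            self.syncs += 1       # stand-in for queuing a signal and waiting on it
            self.unsignaled = 0
            return True
        return False


tracker = FlushTracker(limit=3)
forced = [tracker.flush() for _ in range(7)]
```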
+