author | Wladimir J. van der Laan <laanwj@gmail.com> | 2013-08-15 14:33:44 +0200 |
---|---|---|
committer | Wladimir J. van der Laan <laanwj@gmail.com> | 2013-08-15 14:38:00 +0200 |
commit | 2eaa6504ce9686fc111ecf14c0a6ebd61be23926 (patch) | |
tree | 49fa696cdc7d1d94da6e2d82e55ee1caef74a21e /doc | |
parent | 499e7f7bb43f72d76b698e6eeae92ddbcbcd2511 (diff) |
documentation update
Diffstat (limited to 'doc')
-rw-r--r-- | doc/2d.md | 118 |
-rw-r--r-- | doc/hardware.md | 12 |
-rw-r--r-- | doc/kernel_bugs.md | 18 |
3 files changed, 109 insertions, 39 deletions
@@ -1,21 +1,38 @@
 2D engine documentation
 ========================
 
-Important: make sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will hang.
+This document describes 2D graphics cores such as the GC320. Do not confuse with the VG core
+(such as GC355) which is a beefed up 2D core with a completely different interface.
 
-As the complete state footprint is pretty small, it is recommended to program all relevant
-2D engine state for an operation (before flushing) at once before a command instead
-of relying on a context to be maintained as with 3D rendering (although this is still a possibility).
+Important: be sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will
+hang on the first rendering command and nothing will seem to happen at all.
+
+As the state footprint is pretty small, it is recommended to program all relevant 2D engine state
+for an operation (before flushing) at once before a command instead of relying on a context to be
+maintained as with 3D rendering (although this is still a possibility).
+
+Using the 2D and 3D engine simultaneously within a program can be tricky. Some of the SoCs such as
+Marvell Armada 510 have 2D and 3D in the same core, whereas others such as Freescale i.MX6 have
+multiple cores. In the former case it is easy, just flush the caches and switch between the PIPEs,
+though there is some overhead involved. In the latter case, however, the cores run independently and
+synchronization has to go through the CPU. This necessitates either a stall or a complex queuing
+mechanism that waits for signals on both cores.
 
 2D commands
 -----------------
 
+2D commands are executed by setting the opcode in register `DE.DEST_CONFIG.COMMAND` (and other state
+as necessary) and then queuing `DRAW_2D` commands in the command stream.
+
 - Clear
 - Line
 - Bit blit
 - Stretch blit
 - Multi source blit
 
+Filter blits are also available as 2D commands, but I was unable to get this to do anything
+(I don't think the blob does either). Use the video rasterizer as described below.
+
 Video rasterizer
 -----------------
 
@@ -25,38 +42,62 @@ Video rasterizer
 
 Does hardware scaling using an arbitrary 9-tap separable filter and 5 bit subpixel precision,
 
-Input: Y/U/V planar or interleaved images or RGBA images
-Output: RGBA formats (planar may be possible too)
+Input: Y/U/V planar or Y/U/V interleaved images or RGBA images
+Output: RGBA formats (output to planar is possible too on some chips)
+
+Source and destination formats
+--------------------------
+
+    Format       Source  Destination  Notes
+    -----------------------------------------------
+    A1R5G5B5     +       +
+    A4R4G4B4     +       +
+    X1R5G5B5     +       +
+    X4R4G4B4     +       +
+    R5G6B5       +       +
+    A8R8G8B8     +       +
+    X8R8G8B8     +       +
+    A8           +       +            8-bit alpha only
+    MONOCHROME   +       -            1-bit monochrome
+    INDEX8       +       -            8-bit indexed
+    YUY2         +       ?            YUV 4:2:2 interleaved per four bytes, Y0 Cb Y1 Cr
+    UYVY         +       ?            YUV 4:2:2 interleaved per four bytes, Cb Y0 Cr Y1
+    YV12         +       ?            YUV 4:2:0 8-bit Y plane and 8 bit 2x2 subsampled V and U planes
+    NV12         +       ?            YUV 4:2:0 8-bit Y plane and 8 bit 2x2 subsampled V and U planes
+    NV16         +       ?            YUV 4:2:2 8-bit Y plane and interleaved U/V plane with 2x1 subsampling
+    RG16         +       ?            Bayer interleaved: RGRG..(odd line), GBGB..(even line), 8-bit samples
+
+Additional formats can be supported by using RGBA or UV swizzles.
 
 Monochrome blits
 -----------------
 
-Mono expansion can be used for primitive font rendering or b/w patterns such as checkerboards.
+Mono expansion can be used for primitive font rendering or black and white patterns such as
+checkerboards.
 
-When blitting from `LOCATION_STREAM` make sure that there are
-enough bytes available in the stream.
+When blitting from `LOCATION_STREAM` make sure that there are enough bytes available in the stream.
 Source size is ignored in the case of monochrome blits.
 
-Mono expansion uses registers
-`SRC_COLOR_FG` and `SRC_COLOR_BG`
-to determine the colors to use for 0 and 1 pixels respectively.
+Mono expansion uses registers `SRC_COLOR_FG` and `SRC_COLOR_BG` to determine the colors to use for 0
+and 1 pixels respectively.
 
 Restrictions:
 
-- In case of `LOCATION_STREAM` source can only draw one rectangle at a time.
+- In case of source `LOCATION_STREAM` can only draw one rectangle at a time. There is no such
+  restriction for `LOCATION_MEMORY`.
 
 Raster operations
 ------------------
-Raster operation foreground and background codes. Even though ROP
-is not used in `CLEAR`, `HOR_FILTER_BLT`, `VER_FILTER_BLT` and alpha-enabled
-`BIT_BLT`s, ROP code still has to be programmed, because the engine makes the
-decision whether source, destination and pattern are involved in the current
-operation and the correct decision is essential for the engine to complete
+Raster operation foreground and background codes. Even though ROP is not used in `CLEAR`,
+`HOR_FILTER_BLT`, `VER_FILTER_BLT` and alpha-enabled `BIT_BLTs`, ROP code still has to be
+programmed, because the engine makes the decision whether source, destination and pattern are
+involved in the current operation and the correct decision is essential for the engine to complete
 the operation as expected.
 
-ROP builds a lookup table for a logical operation with 2, 3 or 4 inputs
-(depending on ROP type). So for a ROP3, for example, the ROP pattern will be
-2^3=8 bits.
+ROP builds a lookup table for a logical operation with 2, 3 or 4 inputs (depending on ROP type). So
+for a ROP3, for example, the ROP pattern will be 2^3=8 bits.
+
+These are the input bit for the ROPs, per ROP type:
 
 `ROP2_PATTERN` [untested]
     bit 0 destination
@@ -88,32 +129,37 @@ ROP3/4 examples:
 Patterns
 ---------
 
-An repeated 8x8 pattern can be used with 2D engine operations `LINE` and `BIT_BLT`.
-This pattern can be combined with the color using ROP.
+An repeated 8×8 pattern can be used with 2D engine operations `LINE` and `BIT_BLT`. This pattern
+can be combined with the color using ROP.
 
 Alpha blending
 ---------------
 
-- The blend equation is always akin OpenGL's `GL_FUNC_ADD`, source and destination (multiplied by blend factor) are added.
+- The blend equation is always akin OpenGL's `GL_FUNC_ADD`, source and destination (multiplied by
+  blend factor) are added.
 
-- Alpha values can come from the source/destination per pixel or a global value defined in the state.
+- Alpha values can come from the source/destination per pixel or a global value defined in the
+  state.
 
 Rotation and mirroring
 -----------------------
 
 - There are two ways to do source and destination rotation: through register `ROT_ANGLE` and through
-register `SOURCE_ROTATION_CONFIG` / `DEST_ROTATION_CONFIG`.
-The former is more flexible and can rotate (0, 90, 180, 270) as well as flip in X and Y. However it is not supported
-on every GPU (which ones?).
+  register `SOURCE_ROTATION_CONFIG` / `DEST_ROTATION_CONFIG`. The former is more flexible and can
+  rotate (0, 90, 180, 270) as well as flip in X and Y. However it is not supported on every GPU
+  (which ones?).
 
-- There are also two ways to do mirroring: though register `ROT_ANGLE` and through
-register `CONFIG` (enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may be
-different. Mirroring seems to be `ROT_ANGLE` is supported with the `NEW_2D` capability ("mirror blit extension").
-This is only available in very new hardware (gc880, gc2000).
+- There are also two ways to do mirroring: though register `ROT_ANGLE` and through register `CONFIG`
+  (enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may be
+  different. Mirroring seems to be `ROT_ANGLE` is supported with the `NEW_2D` capability ("mirror
+  blit extension"). This is only available in new hardware (gc880, gc2000).
 
 PE10/PE20
 ==========
-GPUs with feature bit `PE20` have various different features from GPUs without the bit
-(considered `PE10`). Also the registers that are used for the same features can be different. `PE20` registers
-are usually a superset of the `PE10` equivalent. Make sure to use the right registers according to the PE type or
-it will not work.
+There are two versions of the Pixel Engine (PE) for the 2D pipe, PE10 and PE20. These can be
+distinguished by the feature bit `PE20`.
+
+GPUs with feature bit `PE20` have various different features from GPUs without the bit (considered
+`PE10`). Also the registers that are used for the same features can be different. `PE20` registers
+are usually a superset of the `PE10` equivalent. Make sure to use the right registers according to
+the PE type or it will not work.
diff --git a/doc/hardware.md b/doc/hardware.md
index 669f896..44e26d0 100644
--- a/doc/hardware.md
+++ b/doc/hardware.md
@@ -13,6 +13,18 @@ for extra parallelism and/or granularity in power switching). For example the Ma
 with only the 3D engine as well as a GC300 with only the 2D engine. Similarly, the Freescale i.mx6 SoC has
 a GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
 
+- State space is a 256kB (65536 times uint32) register file divided up into
+  separate units for parts of the chip (such as PE, RS, ...)
+
+- Most of the state is latched; that means if it's set to a certain value, it
+  will keep that value until the next change
+
+- Instead of programming the registers directly (which is possible from kernel
+  space), the FE, a DMA engine, is used to queue state changes for later
+
+- To perform an operation such as rendering, all the state for doing that
+  operation have been programmed to the desired values
+
 Feature bits
 =================
 
diff --git a/doc/kernel_bugs.md b/doc/kernel_bugs.md
index b93ed6c..0ccb33a 100644
--- a/doc/kernel_bugs.md
+++ b/doc/kernel_bugs.md
@@ -5,8 +5,9 @@ Race condition
 
 In event submission (from #cubox).
 
-    [20:19:57] <_rmk_> so, gckEVENT_Submit claims the event listMutex... allocates an event id, drops it, and then submits the command queue...
-    [20:20:02] <_rmk_> so two threads can do this...
+    [20:19:57] <_rmk_> so, gckEVENT_Submit claims the event listMutex... allocates an event id,
+    drops it, and then submits the command queue... [20:20:02] <_rmk_> so two threads can do
+    this...
     [20:20:15] <_rmk_> CPU0: claim listMutex
     [20:20:20] <_rmk_> CPU0: get event ID
     [20:20:25] <_rmk_> CPU0: drop listMutex
@@ -15,7 +16,18 @@ In event submission (from #cubox).
     [20:20:41] <_rmk_> CPU1: drop listMutex
     [20:20:49] <_rmk_> CPU1: insert commands into the command queue
     [20:20:56] <_rmk_> CPU0: insert commands into the command queue
-    [20:21:16] <_rmk_> and then we have the second event due to fire first, which upsets the event handling
+    [20:21:16] <_rmk_> and then we have the second event due to fire first, which upsets the event
+    handling
 
 Status: still present
 
+Command buffer submission
+--------------------------
+
+Version: dove
+
+Submitting a lot of distinct command buffers without queuing (or waiting for) a synchronization
+signal causes the kernel to run out of signals. This causes hangs in rendering.
+
+Status: workaround found (see `ETNA_MAX_UNSIGNALED_FLUSHES`)
+
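
Note on the `doc/2d.md` change: the statement that a ROP3 code is a 2^3 = 8 bit lookup table can be illustrated with a minimal C sketch. The helper names `rop3_apply` and `rop3_selfcheck` are hypothetical, and the input ordering used here (bit 0 destination, bit 1 source, bit 2 pattern) is an assumption following the conventional GDI-style ROP3 layout; the hunk only shows the first input bit of `ROP2_PATTERN`, so the hardware ordering should be checked against the full document.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the commit): apply an 8-bit ROP3 code to
 * 32-bit values, one bit position at a time.
 * ASSUMED input ordering: bit 0 = destination, bit 1 = source, bit 2 = pattern. */
static uint32_t rop3_apply(uint8_t rop, uint32_t dst, uint32_t src, uint32_t pat)
{
    uint32_t out = 0;
    for (int bit = 0; bit < 32; ++bit) {
        unsigned d = (dst >> bit) & 1u;
        unsigned s = (src >> bit) & 1u;
        unsigned p = (pat >> bit) & 1u;
        /* The three input bits select one of the 2^3 = 8 table entries. */
        unsigned index = d | (s << 1) | (p << 2);
        out |= (uint32_t)((rop >> index) & 1u) << bit;
    }
    return out;
}

/* Self-check using well-known ROP3 codes under the assumed ordering. */
static void rop3_selfcheck(void)
{
    const uint32_t dst = 0x0f0f0f0fu, src = 0x00ff00ffu, pat = 0x55555555u;
    assert(rop3_apply(0xcc, dst, src, pat) == src);         /* copy source */
    assert(rop3_apply(0xaa, dst, src, pat) == dst);         /* keep destination */
    assert(rop3_apply(0xf0, dst, src, pat) == pat);         /* copy pattern */
    assert(rop3_apply(0x66, dst, src, pat) == (src ^ dst)); /* source XOR destination */
}
```

Under the same assumption, a ROP4 would simply pair two such 8-bit tables (foreground and background) and pick between them per pixel.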