2D engine documentation
========================

This document describes 2D graphics cores such as the GC320. Do not confuse these with the VG cores
(such as the GC355), which are beefed-up 2D cores with a completely different interface.

Important: be sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will
hang on the first rendering command and nothing will seem to happen at all.

Because the state footprint is small, it is recommended to program all 2D engine state relevant to an
operation at once, right before the command (and before flushing), instead of relying on a context being
maintained as with 3D rendering (although that is still possible).

Using the 2D and 3D engines simultaneously within a program can be tricky. Some SoCs, such as the
Marvell Armada 510, have 2D and 3D in the same core, whereas others, such as the Freescale i.MX6, have
separate cores. In the first case it is easy: flush the caches and switch between the PIPEs, though
this involves some overhead. In the second case, however, the cores run independently and
synchronization has to go through the CPU. This requires either a stall or a complex queuing
mechanism that waits for signals from both cores.

2D commands
-----------------

The 2D engine supports the following top-level commands:

- Clear
- Line
- Bit blit
- Stretch blit
- Multi source blit

2D commands are executed by setting the opcode in register `DE.DEST_CONFIG.COMMAND` (and other state
as necessary) and then queuing `DRAW_2D` commands in the command stream. These commands can be supplied
with up to 256 rectangles which will be drawn using the same state and origin settings.
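Because one `DRAW_2D` command carries at most 256 rectangles, a driver with more rectangles has to split them across several commands issued with the same state. The following is a minimal sketch of that batching; `struct rect`, `emit_rects()` and the emit callback are hypothetical names, and the actual command encoding is left to the callback.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECTS_PER_DRAW 256   /* per the text: up to 256 rectangles per DRAW_2D */

/* Hypothetical rectangle type; the hardware encoding is not modelled here. */
struct rect { int x1, y1, x2, y2; };

/* Hypothetical callback that would encode one DRAW_2D command. */
typedef void (*emit_draw_fn)(const struct rect *rects, size_t count, void *user);

/* Emit n rectangles as ceil(n / 256) DRAW_2D commands and return how many
 * commands were emitted. The caller programs the shared state beforehand. */
static size_t emit_rects(const struct rect *rects, size_t n,
                         emit_draw_fn emit, void *user)
{
    size_t commands = 0;
    assert(emit != NULL && (rects != NULL || n == 0));
    while (n > 0) {
        size_t count = n < MAX_RECTS_PER_DRAW ? n : MAX_RECTS_PER_DRAW;
        emit(rects, count, user);
        rects += count;
        n -= count;
        commands++;
    }
    return commands;
}

/* Example callback: tallies the rectangles it receives and checks batch size. */
static void count_rects(const struct rect *rects, size_t count, void *user)
{
    (void)rects;
    assert(count >= 1 && count <= MAX_RECTS_PER_DRAW);
    *(size_t *)user += count;
}
```

Keeping the state programming outside the batching loop fits the recommendation above to program all state for an operation at once.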

Filter blits are also available as 2D commands, but I was unable to get them to do anything
(I don't think the blob uses them either). Use the video rasterizer for these, as described below.

Video rasterizer
-----------------

The video rasterizer, part of the 2D engine, does hardware scaling using an arbitrary
9-tap separable filter with 5-bit subpixel precision.

It supports the following top-level commands:

- Horizontal filter blit
- Vertical filter blit
- One-pass filter blit

Video rasterizer commands are submitted in a different way from normal 2D commands, as they are
triggered by a write to a register instead of through the command stream. The video rasterizer
only processes one rectangle per invocation.

- Input: Y/U/V planar or Y/U/V interleaved images, or RGBA images
- Output: RGBA formats (output to planar is also possible on some chips)
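"Separable" means scaling can be done as a 1D filter run horizontally and then vertically, which matches the separate horizontal and vertical filter blit commands; 5 bits of subpixel precision give 32 filter phases. The sketch below models one horizontal pass in software. The 16.16 position tracking, tap centring, edge clamping and the 8-bit coefficient normalisation (`COEF_SHIFT`) are assumptions for illustration, not the hardware's documented behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS       9
#define PHASES     32   /* 5 bits of subpixel precision -> 32 filter phases */
#define COEF_SHIFT 8    /* assumption: each phase's taps sum to 1 << 8 */

/* Round a filtered sum back to 8 bits and clamp to 0..255. */
static uint8_t norm_clamp(int32_t acc)
{
    if (acc < 0)
        return 0;
    acc = (acc + (1 << (COEF_SHIFT - 1))) >> COEF_SHIFT;
    return acc > 255 ? 255 : (uint8_t)acc;
}

/* One horizontal pass: resample src_w samples to dst_w samples.
 * kernel holds PHASES rows of TAPS signed coefficients (row-major).
 * The source position is tracked in 16.16 fixed point; the top 5 bits of
 * its fractional part select the filter phase. */
static void filter_row(const uint8_t *src, int src_w, uint8_t *dst, int dst_w,
                       const int16_t *kernel)
{
    assert(src_w > 0 && src_w < 65536 && dst_w > 0);
    uint32_t step = ((uint32_t)src_w << 16) / (uint32_t)dst_w;
    uint32_t pos = 0;
    for (int x = 0; x < dst_w; x++, pos += step) {
        int center = (int)(pos >> 16);
        const int16_t *taps = kernel + ((pos >> 11) & (PHASES - 1)) * TAPS;
        int32_t acc = 0;
        for (int t = 0; t < TAPS; t++) {
            int sx = center + t - TAPS / 2;                  /* taps centred on source pixel */
            sx = sx < 0 ? 0 : sx >= src_w ? src_w - 1 : sx; /* clamp at row edges */
            acc += taps[t] * src[sx];
        }
        dst[x] = norm_clamp(acc);
    }
}
```

A vertical pass is the same loop run down each column; with an identity kernel (only the middle tap set) the pass reduces to nearest-sample scaling, which makes it easy to sanity-check.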

Source and destination formats
--------------------------

    Format        Source    Destination     Notes
    -----------------------------------------------
    A1R5G5B5        +            +
    A4R4G4B4        +            +
    X1R5G5B5        +            +
    X4R4G4B4        +            +
    R5G6B5          +            +
    A8R8G8B8        +            +
    X8R8G8B8        +            +
    A8              +            +          8-bit alpha only
    MONOCHROME      +            -          1-bit monochrome
    INDEX8          +            -          8-bit indexed
    YUY2            +            ?          YUV 4:2:2 interleaved per four bytes, Y0 Cb Y1 Cr
    UYVY            +            ?          YUV 4:2:2 interleaved per four bytes, Cb Y0 Cr Y1
    YV12            +            ?          YUV 4:2:0, 8-bit Y plane followed by 8-bit 2x2 subsampled V and U planes
    NV12            +            ?          YUV 4:2:0, 8-bit Y plane and interleaved U/V plane with 2x2 subsampling
    NV16            +            ?          YUV 4:2:2, 8-bit Y plane and interleaved U/V plane with 2x1 subsampling
    RG16            +            ?          Bayer interleaved: RGRG.. (odd lines), GBGB.. (even lines), 8-bit samples

Additional formats can be supported by using RGBA or UV swizzles.
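When allocating buffers for these formats it helps to know how many bytes each one occupies. A sketch for tightly packed images follows; the `enum fmt` values are hypothetical identifiers (not the hardware format encoding), RG16 is left out, and any row pitch or alignment the hardware requires is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format identifiers for this sketch (not the hardware encoding). */
enum fmt {
    FMT_A8R8G8B8, FMT_X8R8G8B8,
    FMT_A1R5G5B5, FMT_A4R4G4B4, FMT_X1R5G5B5, FMT_X4R4G4B4, FMT_R5G6B5,
    FMT_A8, FMT_INDEX8, FMT_MONOCHROME,
    FMT_YUY2, FMT_UYVY, FMT_YV12, FMT_NV12, FMT_NV16
};

/* Bytes needed for a tightly packed w x h image (no row padding). */
static size_t image_bytes(enum fmt f, size_t w, size_t h)
{
    switch (f) {
    case FMT_A8R8G8B8: case FMT_X8R8G8B8:
        return w * h * 4;
    case FMT_A1R5G5B5: case FMT_A4R4G4B4: case FMT_X1R5G5B5:
    case FMT_X4R4G4B4: case FMT_R5G6B5:
        return w * h * 2;
    case FMT_YUY2: case FMT_UYVY:        /* 4 bytes per pair of pixels */
        assert(w % 2 == 0);
        return w * h * 2;
    case FMT_A8: case FMT_INDEX8:
        return w * h;
    case FMT_MONOCHROME:                 /* 1 bit per pixel, rows byte-aligned (assumption) */
        return (w + 7) / 8 * h;
    case FMT_YV12: case FMT_NV12:        /* full Y plane + U and V at quarter resolution */
        assert(w % 2 == 0 && h % 2 == 0);
        return w * h + 2 * (w / 2) * (h / 2);
    case FMT_NV16:                       /* full Y plane + U/V at half horizontal resolution */
        assert(w % 2 == 0);
        return w * h + 2 * (w / 2) * h;
    }
    return 0;
}
```

YV12 and NV12 come out the same size: they carry the same samples and differ only in whether U and V are stored as separate planes or interleaved in one.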

PE10/PE20
-----------
There are two versions of the Pixel Engine (PE) for the 2D pipe, PE10 and PE20. These can be
distinguished by the feature bit `PE20`.

GPUs with the `PE20` feature bit differ in several features from GPUs without it (considered
`PE10`), and the registers used for the same feature can differ as well. `PE20` registers are
usually a superset of their `PE10` equivalents. Make sure to use the right registers for the PE
type, or the operation will not work.

Commands
============

Clear
------

Fills an area with a solid color.

Even though ROP is not used for clears, set it to 0xcc when clearing to avoid loading the pattern or destination.

For PE20 the clear color is specified in register `DE_CLEAR_PIXEL_VALUE32`, and is always specified in A8R8G8B8 format.

For PE10 the clear color is specified in registers `DE_CLEAR_PIXEL_VALUE_LOW` and `DE_CLEAR_PIXEL_VALUE_HIGH` as
an 8-byte pattern, so it has to be provided in the format of the target surface. An additional register `CLEAR_BYTE_MASK`
contains a bit mask that specifies which bytes within each 8-byte unit to clear.
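The difference between the two clear paths is easiest to see for a 16-bit target. PE20 takes the A8R8G8B8 value as-is, while PE10 needs the colour converted to the surface format and replicated across the 8-byte pattern. A sketch follows; the function names are hypothetical, and it assumes the LOW register holds bytes 0-3 of the pattern in little-endian order. For a full clear, `CLEAR_BYTE_MASK` would presumably be 0xff (all eight bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Truncate an A8R8G8B8 colour (the PE20 clear format) to R5G6B5. */
static uint16_t argb8888_to_rgb565(uint32_t argb)
{
    uint32_t r = (argb >> 16) & 0xff, g = (argb >> 8) & 0xff, b = argb & 0xff;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* PE10: replicate a pixel that is already in the target format across the
 * 8-byte clear pattern and split it into LOW/HIGH register values.
 * Assumption: LOW holds bytes 0-3 of the pattern, little-endian. */
static void pe10_clear_value(uint32_t pixel, unsigned bytes_per_pixel,
                             uint32_t *low, uint32_t *high)
{
    assert(bytes_per_pixel == 1 || bytes_per_pixel == 2 || bytes_per_pixel == 4);
    uint32_t mask = bytes_per_pixel == 4 ? 0xffffffffu
                                         : (1u << (bytes_per_pixel * 8)) - 1;
    uint64_t pattern = 0;
    for (unsigned off = 0; off < 8; off += bytes_per_pixel)
        pattern |= (uint64_t)(pixel & mask) << (off * 8);
    *low = (uint32_t)pattern;
    *high = (uint32_t)(pattern >> 32);
}
```

For example, clearing an R5G6B5 surface to opaque red gives 0xf800 per pixel, so both halves of the pattern become 0xf800f800.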

Lines
--------

Draws lines using the Bresenham algorithm.

The first pixel (x1,y1) is drawn; the last pixel (x2,y2) is not.
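A software reference for this end-point convention can be handy when checking hardware output. The sketch below is a standard all-octant integer Bresenham loop that tests for the end point before plotting, so (x1,y1) is drawn and (x2,y2) is not. The callback types are hypothetical, and the hardware's tie-breaking between candidate pixels may differ.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*plot_fn)(int x, int y, void *user);

/* Bresenham line from (x1,y1) towards (x2,y2), all octants. The first pixel
 * is plotted and the last one is not. Returns the number of pixels plotted. */
static int draw_line(int x1, int y1, int x2, int y2, plot_fn plot, void *user)
{
    int dx = abs(x2 - x1), sx = x1 < x2 ? 1 : -1;
    int dy = -abs(y2 - y1), sy = y1 < y2 ? 1 : -1;
    int err = dx + dy, n = 0;
    assert(plot != NULL);
    while (x1 != x2 || y1 != y2) {    /* stop before the end point */
        plot(x1, y1, user);
        n++;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x1 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y1 += sy; }   /* step in y */
    }
    return n;
}

/* Example plot callback: records up to 64 pixels. */
struct pixel_list { int x[64], y[64], count; };

static void record_pixel(int x, int y, void *user)
{
    struct pixel_list *pl = user;
    if (pl->count < 64) {
        pl->x[pl->count] = x;
        pl->y[pl->count] = y;
        pl->count++;
    }
}
```

With this convention a line plots max(|x2-x1|, |y2-y1|) pixels, and consecutive segments of a polyline do not draw their shared vertex twice.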

Monochrome blits
-----------------

Mono expansion can be used for primitive font rendering or black and white patterns such as
checkerboards.

When blitting from `LOCATION_STREAM` make sure that there are enough bytes available in the stream.
Source size is ignored in the case of monochrome blits.

Mono expansion uses registers `SRC_COLOR_FG` and `SRC_COLOR_BG` to determine the colors to use for 0
and 1 pixels respectively.

Restrictions:

- With source `LOCATION_STREAM`, only one rectangle can be drawn at a time. There is no such
  restriction for `LOCATION_MEMORY`.
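What mono expansion computes can be modelled in a few lines: every source bit selects one of two colours. In the sketch below the colours are named by the bit value they are used for (which register supplies which is as described above). The MSB-first bit order and the byte-aligned rows used to size a stream are assumptions, not confirmed hardware behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one row of 1-bpp data to 32-bit pixels: a 0 bit selects
 * color_for_0 and a 1 bit selects color_for_1. Assumption: the most
 * significant bit of each byte is the leftmost pixel. */
static void expand_mono_row(const uint8_t *bits, size_t width,
                            uint32_t color_for_0, uint32_t color_for_1,
                            uint32_t *out)
{
    assert(bits != NULL && out != NULL);
    for (size_t x = 0; x < width; x++) {
        unsigned bit = (bits[x / 8] >> (7 - x % 8)) & 1u;
        out[x] = bit ? color_for_1 : color_for_0;
    }
}

/* Stream bytes needed for a width x height mono source, assuming each row
 * starts on a byte boundary (the real stream may need stricter alignment). */
static size_t mono_stream_bytes(size_t width, size_t height)
{
    return (width + 7) / 8 * height;
}
```

Something like `mono_stream_bytes()` is the kind of check needed before a `LOCATION_STREAM` blit, where running out of stream bytes is the failure mode mentioned above.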

Raster operations
------------------
Raster operations (ROPs) are specified as foreground and background codes (`ROP_FG` / `ROP_BG`).
Even though the ROP is not used for `CLEAR`, `HOR_FILTER_BLT`, `VER_FILTER_BLT` and alpha-enabled
`BIT_BLT`s, the ROP code still has to be programmed, because the engine uses it to decide whether
source, destination and pattern are involved in the current operation, and that decision has to be
correct for the engine to complete the operation as expected.
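Whether a ROP3 code depends on a given input can be read straight off the code: an input is unused when toggling its bit in the table index never changes the output. The helpers below (hypothetical names) use the ROP3 input bit layout documented in this section: bit 0 destination, bit 1 source, bit 2 pattern. They model the decision in software only; how the engine derives it internally is not documented here. They also show why 0xcc is a good value for clears: it reads the source only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ROP3 index bits: 0 = destination, 1 = source, 2 = pattern. An input is
 * used only if the table halves selected by its bit differ. */
static bool rop3_uses_dest(uint8_t rop)    { return (rop & 0x55) != ((rop >> 1) & 0x55); }
static bool rop3_uses_source(uint8_t rop)  { return (rop & 0x33) != ((rop >> 2) & 0x33); }
static bool rop3_uses_pattern(uint8_t rop) { return (rop & 0x0f) != ((rop >> 4) & 0x0f); }
```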

A ROP code is a lookup table for a logical operation with 2, 3 or 4 inputs (depending on the ROP
type), with one output bit for every combination of input bits. For a ROP3, for example, the code is
2^3 = 8 bits wide.

These are the input bits for each ROP type:

`ROP2_PATTERN` [untested]

- bit 0: destination
- bit 1: pattern

`ROP2_SOURCE` [untested]

- bit 0: destination
- bit 1: source

`ROP3` (uses `ROP_FG` only)

- bit 0: destination
- bit 1: source
- bit 2: pattern

`ROP4` (uses `ROP_FG` and `ROP_BG`)

- bit 0: destination
- bit 1: source
- bit 2: pattern
- bit "3": foreground/background (selects `ROP_FG` / `ROP_BG`)

ROP3/4 examples:

    10101010  0xaa   destination
    01010101  0x55   !destination
    11001100  0xcc   source
    00110011  0x33   !source
    11110000  0xf0   pattern
    00001111  0x0f   !pattern
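The table semantics can be checked with a bitwise software model. For each of the 8 index values, the sketch ORs in the bit positions whose (pattern, source, destination) bits match that index, if the ROP code has that bit set. The ROP4 helper assumes a mask bit of 1 selects `ROP_FG` and 0 selects `ROP_BG`; which value selects which, and what supplies the mask in hardware, are assumptions. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate a ROP3 code bitwise on 32-bit words. Each bit position's inputs
 * form the index (pattern << 2 | source << 1 | destination). */
static uint32_t rop3_apply(uint8_t rop, uint32_t dst, uint32_t src, uint32_t pat)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (!(rop & (1u << i)))
            continue;
        /* positions where destination, source and pattern match index i */
        result |= ((i & 1) ? dst : ~dst) & ((i & 2) ? src : ~src) &
                  ((i & 4) ? pat : ~pat);
    }
    return result;
}

/* ROP4 sketch: mask bit 1 selects rop_fg, mask bit 0 selects rop_bg (assumed). */
static uint32_t rop4_apply(uint8_t rop_fg, uint8_t rop_bg, uint32_t mask,
                           uint32_t dst, uint32_t src, uint32_t pat)
{
    return (rop3_apply(rop_fg, dst, src, pat) & mask) |
           (rop3_apply(rop_bg, dst, src, pat) & ~mask);
}
```

Because a ROP code is a truth table, codes compose with the same bitwise operators as the data. For example, 0xcc & 0xaa = 0x88 is source AND destination, and 0xcc ^ 0xaa = 0x66 is source XOR destination.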

Patterns
---------
A repeating 8×8 pattern can be used with the 2D engine operations `LINE` and `BIT_BLT`. This pattern
can be combined with the other inputs using a ROP.
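A repeating 8×8 pattern means each destination pixel picks its pattern entry from its coordinates modulo 8. The sketch below shows this for a monochrome pattern. The packing (one byte per row, bit 0 leftmost), the 64-bit representation and the origin parameters are assumptions for illustration, not the hardware's documented layout.

```c
#include <assert.h>
#include <stdint.h>

/* Look up the 8x8 monochrome pattern bit covering destination pixel (x, y).
 * Assumptions: one byte per pattern row (row 0 in the low byte), bit 0 of
 * each byte is the leftmost pixel, and the pattern repeats every 8 pixels
 * from (origin_x, origin_y). */
static int pattern_bit(uint64_t pattern, int x, int y, int origin_x, int origin_y)
{
    unsigned px = (unsigned)(x - origin_x) & 7;   /* modulo 8, also for negatives */
    unsigned py = (unsigned)(y - origin_y) & 7;
    return (int)((pattern >> (py * 8 + px)) & 1);
}
```

With a checkerboard such as 0xaa55aa55aa55aa55, the selected bit (or its expanded colour) is what a ROP such as 0xf0 (pattern) would write.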

Alpha blending
---------------
- The blend equation is always akin to OpenGL's `GL_FUNC_ADD`: source and destination, each
  multiplied by its blend factor, are added.

- Alpha values can come from the source/destination per pixel or a global value defined in the
  state.
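As a concrete instance of the additive equation, the sketch below implements classic "source over" blending, equivalent to OpenGL's `GL_FUNC_ADD` with `GL_SRC_ALPHA` / `GL_ONE_MINUS_SRC_ALPHA`. The alpha value can be a per-pixel source alpha or a global value, as described above. The 8-bit fixed-point factors (255 = 1.0) and the rounding are assumptions; the hardware's exact arithmetic is not documented here.

```c
#include <assert.h>
#include <stdint.h>

/* One channel of out = src * src_factor + dst * dst_factor, with factors in
 * 8-bit fixed point (255 == 1.0) and the sum clamped to 255. */
static uint8_t blend_channel(uint8_t src, uint8_t src_factor,
                             uint8_t dst, uint8_t dst_factor)
{
    unsigned sum = (unsigned)src * src_factor + (unsigned)dst * dst_factor;
    unsigned v = (sum + 127) / 255;          /* back to 0..255, rounded */
    return v > 255 ? 255 : (uint8_t)v;
}

/* "Source over" on A8R8G8B8 pixels: src_factor = alpha, dst_factor = 1 - alpha.
 * alpha may be the per-pixel source alpha or a global alpha from state. */
static uint32_t blend_src_over(uint32_t src, uint32_t dst, uint8_t alpha)
{
    uint32_t out = 0;
    for (unsigned shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift), d = (uint8_t)(dst >> shift);
        out |= (uint32_t)blend_channel(s, alpha, d, (uint8_t)(255 - alpha)) << shift;
    }
    return out;
}
```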

Rotation and mirroring
-----------------------

- There are two ways to do source and destination rotation: through register `ROT_ANGLE` and through
  registers `SOURCE_ROTATION_CONFIG` / `DEST_ROTATION_CONFIG`. The former is more flexible and can
  rotate by 0, 90, 180 or 270 degrees as well as flip in X and Y. However, it is not supported on every
  GPU (which ones?).

- There are also two ways to do mirroring: through register `ROT_ANGLE` and through register `CONFIG`
  (enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may differ.
  Mirroring through `ROT_ANGLE` seems to be supported with the `NEW_2D` capability ("mirror blit
  extension"). This is only available in newer hardware (gc880, gc2000).
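To make the rotation and mirroring options concrete, the sketch below maps a pixel of a w × h image to its position after an optional X mirror followed by a clockwise rotation. The `enum angle` values are hypothetical and do not reflect the `ROT_ANGLE` register encoding. The clockwise direction and the mirror-then-rotate order are conventions chosen for the sketch.

```c
#include <assert.h>

/* Hypothetical angle enum for this sketch (not the ROT_ANGLE encoding). */
enum angle { ANGLE_0, ANGLE_90, ANGLE_180, ANGLE_270 };

/* Position of pixel (x, y) of a w x h image after an optional mirror in X
 * followed by a clockwise rotation. For 90 and 270 degrees the result is
 * h pixels wide and w pixels high. */
static void rotate_point(enum angle a, int mirror_x, int w, int h,
                         int x, int y, int *ox, int *oy)
{
    assert(x >= 0 && x < w && y >= 0 && y < h);
    if (mirror_x)
        x = w - 1 - x;
    switch (a) {
    case ANGLE_0:   *ox = x;         *oy = y;         break;
    case ANGLE_90:  *ox = h - 1 - y; *oy = x;         break;
    case ANGLE_180: *ox = w - 1 - x; *oy = h - 1 - y; break;
    case ANGLE_270: *ox = y;         *oy = w - 1 - x; break;
    }
}
```

Mirroring in X and then rotating by 180 degrees is the same as flipping in Y, which is one reason the mirror and rotation controls overlap.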


Miscellaneous notes
====================
Stretch blit
-------------
Stretch blit destination format must support alpha blending.

YUV color spaces
-----------------
The 2D engine supports two YUV color spaces [1]: BT.601 and BT.709.
Conversion from YUV to RGB is done in the following way for each of the color spaces, where `clip()` clamps its argument to the range 0 to 255:

    16 ≤ Y ≤ 235
    16 ≤ U ≤ 240
    16 ≤ V ≤ 240
    A = Y - 16
    B = U - 128
    C = V - 128

BT.601:

    R = clip((298*A + 410*C + 128) >> 8)
    G = clip((298*A - 101*B - 209*C + 128) >> 8)
    B = clip((298*A + 519*B + 128) >> 8)

BT.709:

    R = clip((298*A + 461*C + 128) >> 8)
    G = clip((298*A - 55*B - 137*C + 128) >> 8)
    B = clip((298*A + 543*B + 128) >> 8)
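The formulas translate directly into integer C. In the sketch below (function names are hypothetical) `clip_shift8()` stands for `clip(... >> 8)`: negative sums are clamped to 0 before shifting. This gives the same result as an arithmetic shift followed by `clip()` while avoiding C's implementation-defined right shift of negative values. Inputs outside the nominal 16..235 / 16..240 ranges are not rejected, only clamped on output.

```c
#include <assert.h>
#include <stdint.h>

/* clip(v >> 8): clamp negative sums to 0, then shift and clamp to 255. */
static uint8_t clip_shift8(int v)
{
    if (v < 0)
        return 0;
    v >>= 8;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Integer YUV -> RGB with the coefficients listed above; bt709 selects
 * BT.709, otherwise BT.601. */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v, int bt709,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    int A = y - 16, B = u - 128, C = v - 128;
    if (bt709) {
        *r = clip_shift8(298 * A + 461 * C + 128);
        *g = clip_shift8(298 * A - 55 * B - 137 * C + 128);
        *b = clip_shift8(298 * A + 543 * B + 128);
    } else {
        *r = clip_shift8(298 * A + 410 * C + 128);
        *g = clip_shift8(298 * A - 101 * B - 209 * C + 128);
        *b = clip_shift8(298 * A + 519 * B + 128);
    }
}
```

As a quick check, Y = 16 with neutral chroma gives black, Y = 235 gives white, and (Y, U, V) = (81, 90, 240) gives pure red under BT.601.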

References
========
[1] Vivante GC320 reference manual (used to be hosted on the Vivante site, but has since been taken offline)