2D engine documentation
========================
This document describes 2D graphics cores such as the GC320. Do not confuse these with the VG core
(such as the GC355), which is a beefed-up 2D core with a completely different interface.
Important: be sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will
hang on the first rendering command and nothing will seem to happen at all.
As the state footprint is pretty small, it is recommended to program all relevant 2D engine state
for an operation at once, right before the command (and before flushing), instead of relying on a
context to be maintained as with 3D rendering (although this is still a possibility).
Using the 2D and 3D engine simultaneously within a program can be tricky. Some of the SoCs such as
Marvell Armada 510 have 2D and 3D in the same core, whereas others such as Freescale i.MX6 have
multiple cores. The first case is easy: flush the caches and switch between the PIPEs,
though there is some overhead involved. In the latter case, however, the cores run independently and
synchronization has to go through the CPU. This necessitates either a stall or a complex queuing
mechanism that waits for signals on both cores.
2D commands
-----------------
The 2D engine supports the following top-level commands:
- Clear
- Line
- Bit blit
- Stretch blit
- Multi source blit
2D commands are executed by setting the opcode in register `DE.DEST_CONFIG.COMMAND` (and other state
as necessary) and then queuing `DRAW_2D` commands in the command stream. These commands can be supplied
with up to 256 rectangles which will be drawn using the same state and origin settings.
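
As an illustration of the 256-rectangle limit, a driver that wants to draw a longer list with the same state has to split it over several `DRAW_2D` commands. The sketch below shows only that splitting step. The `Rect` layout and the `batch_rects` helper are hypothetical names chosen for this example, not part of any driver API.

```python
from typing import Iterator, List, Tuple

# Hypothetical rectangle layout for this sketch: (left, top, right, bottom).
Rect = Tuple[int, int, int, int]

# Limit stated above for a single DRAW_2D command.
MAX_RECTS_PER_DRAW = 256


def batch_rects(rects: List[Rect], limit: int = MAX_RECTS_PER_DRAW) -> Iterator[List[Rect]]:
    """Yield consecutive slices of `rects`, each holding at most `limit` rectangles."""
    for start in range(0, len(rects), limit):
        yield rects[start:start + limit]


rects = [(i, 0, i + 1, 1) for i in range(600)]
batch_sizes = [len(batch) for batch in batch_rects(rects)]
print(batch_sizes)  # [256, 256, 88]
```

Each yielded batch would correspond to one `DRAW_2D` command queued with unchanged state.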
Filter blits are also available as 2D commands, but I was unable to get this to do anything
(I don't think the blob does either). Use the video rasterizer for these as described below.
Video rasterizer
-----------------
The video rasterizer, part of the 2D engine, does hardware scaling using an arbitrary
9-tap separable filter with 5-bit subpixel precision.
It supports the following top-level commands:
- Horizontal filter blit
- Vertical filter blit
- One-pass filter blit
Video rasterizer commands are submitted in a different way from normal 2D commands, as they are
triggered by a write to a register instead of through the command stream. The video rasterizer
only processes one rectangle per invocation.
Input: Y/U/V planar or Y/U/V interleaved images or RGBA images
Output: RGBA formats (output to planar is possible too on some chips)
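
To make the "9-tap filter with 5-bit subpixel precision" description more concrete, the sketch below builds a table with one set of 9 weights for each of the 2^5 = 32 subpixel positions. Because the filter is arbitrary, the Lanczos window used here is just one possible choice. The table layout, the phase ordering and the floating-point weights are assumptions made for illustration; the coefficient format that the hardware registers expect is not covered here.

```python
import math

TAPS = 9                     # kernel width stated above
SUBPIXEL_BITS = 5            # 5 bits of subpixel precision
PHASES = 1 << SUBPIXEL_BITS  # 32 subpixel positions (assumed table size)


def lanczos(x: float, a: float = TAPS / 2) -> float:
    """Lanczos window; an arbitrary choice of filter shape for this sketch."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def kernel_table() -> list:
    """Return PHASES rows of TAPS weights, each row normalized to sum to 1."""
    table = []
    for phase in range(PHASES):
        frac = phase / PHASES  # subpixel offset in [0, 1)
        weights = [lanczos(tap - TAPS // 2 - frac) for tap in range(TAPS)]
        total = sum(weights)
        table.append([w / total for w in weights])
    return table


table = kernel_table()
print(len(table), len(table[0]))  # 32 9
```

A separable filter applies such a table once horizontally and once vertically, which matches the split into horizontal and vertical filter blits above.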
Source and destination formats
--------------------------
Format      Source  Destination  Notes
------------------------------------------------------------------------------------------------
A1R5G5B5    +       +
A4R4G4B4    +       +
X1R5G5B5    +       +
X4R4G4B4    +       +
R5G6B5      +       +
A8R8G8B8    +       +
X8R8G8B8    +       +
A8          +       +            8-bit alpha only
MONOCHROME  +       -            1-bit monochrome
INDEX8      +       -            8-bit indexed
YUY2        +       ?            YUV 4:2:2 interleaved per four bytes, Y0 Cb Y1 Cr
UYVY        +       ?            YUV 4:2:2 interleaved per four bytes, Cb Y0 Cr Y1
YV12        +       ?            YUV 4:2:0 8-bit Y plane and 8-bit 2x2 subsampled V and U planes
NV12        +       ?            YUV 4:2:0 8-bit Y plane and interleaved U/V plane with 2x2 subsampling
NV16        +       ?            YUV 4:2:2 8-bit Y plane and interleaved U/V plane with 2x1 subsampling
RG16        +       ?            Bayer interleaved: RGRG.. (odd line), GBGB.. (even line), 8-bit samples
Additional formats can be supported by using RGBA or UV swizzles.
PE10/PE20
-----------
There are two versions of the Pixel Engine (PE) for the 2D pipe, PE10 and PE20. These can be
distinguished by the feature bit `PE20`.
GPUs with feature bit `PE20` have various different features from GPUs without the bit (considered
`PE10`). Also the registers that are used for the same features can be different. `PE20` registers
are usually a superset of the `PE10` equivalent. Make sure to use the right registers according to
the PE type or it will not work.
Commands
============
Clear
------
Fills an area with a solid color.
Even though ROP is not used for clears, set it to 0xcc when clearing to avoid loading the pattern or destination.
For PE20 the clear color is specified in register `DE_CLEAR_PIXEL_VALUE32`, and is always specified in A8R8G8B8 format.
For PE10 the clear color is specified in registers `DE_CLEAR_PIXEL_VALUE_LOW` and `DE_CLEAR_PIXEL_VALUE_HIGH` as
an 8-byte pattern, so it has to be provided in the format of the target surface. An additional register `CLEAR_BYTE_MASK`
contains a bit mask that specifies which bytes within each 8-byte unit to clear.
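
For PE10, the clear color therefore has to be converted to the target format and repeated to fill 8 bytes. The sketch below does this for an R5G6B5 target. It assumes a little-endian byte layout and that `DE_CLEAR_PIXEL_VALUE_LOW` holds the first four bytes of the pattern; both are assumptions for illustration, and the helper names are hypothetical. Programming `CLEAR_BYTE_MASK` is not covered.

```python
import struct


def pack_r5g6b5(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B components into one 16-bit R5G6B5 pixel."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def pe10_clear_pattern(pixel: int, bytes_per_pixel: int) -> tuple:
    """Repeat one pixel to fill 8 bytes and return it as (low, high) 32-bit words.

    Assumes little-endian layout with the low word first (not confirmed by
    the text above). bytes_per_pixel must be 1, 2 or 4.
    """
    if bytes_per_pixel not in (1, 2, 4):
        raise ValueError("bytes_per_pixel must divide 8")
    raw = pixel.to_bytes(bytes_per_pixel, "little") * (8 // bytes_per_pixel)
    low, high = struct.unpack("<II", raw)
    return low, high


low, high = pe10_clear_pattern(pack_r5g6b5(255, 0, 0), 2)
print(hex(low), hex(high))  # 0xf800f800 0xf800f800
```

With a 32-bit target such as A8R8G8B8, the same helper simply returns the pixel value in both words.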
Lines
--------
Draws lines using the Bresenham algorithm.
The first pixel (x1,y1) is drawn, but the last pixel (x2,y2) is not.
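
The endpoint rule can be illustrated with a software version of the Bresenham algorithm that stops before emitting (x2,y2). This is a minimal sketch of the textbook integer algorithm, not a description of the hardware rasterizer; in particular, the hardware may break ties differently on lines that are not axis-aligned or diagonal.

```python
def line_pixels(x1: int, y1: int, x2: int, y2: int) -> list:
    """Bresenham line from (x1, y1) to (x2, y2), excluding the last pixel."""
    dx = abs(x2 - x1)
    dy = -abs(y2 - y1)
    sx = 1 if x1 < x2 else -1
    sy = 1 if y1 < y2 else -1
    err = dx + dy
    x, y = x1, y1
    pixels = []
    # Stop as soon as the end point is reached, so it is never emitted.
    while (x, y) != (x2, y2):
        pixels.append((x, y))
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x += sx
        if e2 <= dx:
            err += dx
            y += sy
    return pixels


print(line_pixels(0, 0, 4, 0))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```

One consequence of this rule is that connected line segments can be chained without drawing the shared vertex twice.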
Monochrome blits
-----------------
Mono expansion can be used for primitive font rendering or black and white patterns such as
checkerboards.
When blitting from `LOCATION_STREAM` make sure that there are enough bytes available in the stream.
Source size is ignored in the case of monochrome blits.
Mono expansion uses registers `SRC_COLOR_FG` and `SRC_COLOR_BG` to determine the colors to use for 0
and 1 pixels respectively.
Restrictions:
- With source `LOCATION_STREAM`, only one rectangle can be drawn at a time. There is no such
restriction for `LOCATION_MEMORY`.
Raster operations
------------------
Raster operations are specified through foreground and background ROP codes. Even though ROP is not
used in `CLEAR`, `HOR_FILTER_BLT`, `VER_FILTER_BLT` and alpha-enabled `BIT_BLT`s, the ROP code still
has to be programmed, because the engine uses it to decide whether source, destination and pattern
are involved in the current operation, and the correct decision is essential for the engine to
complete the operation as expected.
ROP builds a lookup table for a logical operation with 2, 3 or 4 inputs (depending on ROP type). So
for a ROP3, for example, the ROP pattern will be 2^3=8 bits.
These are the input bits for the ROPs, per ROP type:
`ROP2_PATTERN` [untested]
bit 0 destination
bit 1 pattern
`ROP2_SOURCE` [untested]
bit 0 destination
bit 1 source
`ROP3` (uses `ROP_FG` only)
bit 0 destination
bit 1 source
bit 2 pattern
`ROP4` (uses `ROP_FG` and `ROP_BG`)
bit 0 destination
bit 1 source
bit 2 pattern
bit "3" foreground/background (`ROP_FG` / `ROP_BG`)
ROP3/4 examples:
10101010 0xaa destination
01010101 0x55 !destination
11001100 0xcc source
00110011 0x33 !source
11110000 0xf0 pattern
00001111 0x0f !pattern
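
The example codes above can be derived mechanically: with destination as bit 0, source as bit 1 and pattern as bit 2 of the table index, bit `index` of the ROP3 code is the result of the logical operation for that input combination. The sketch below builds a code from a Python function and also applies a code bitwise in software. Both helper names are hypothetical and only model the lookup-table idea.

```python
def rop3_code(func) -> int:
    """Build the 8-bit ROP3 code for func(destination, source, pattern)."""
    code = 0
    for index in range(8):
        d = index & 1          # bit 0: destination
        s = (index >> 1) & 1   # bit 1: source
        p = (index >> 2) & 1   # bit 2: pattern
        if func(d, s, p) & 1:
            code |= 1 << index
    return code


def apply_rop3(code: int, dst: int, src: int, pat: int, width: int = 8) -> int:
    """Apply a ROP3 code to each bit position of dst, src and pat."""
    result = 0
    for bit in range(width):
        index = (((dst >> bit) & 1)
                 | (((src >> bit) & 1) << 1)
                 | (((pat >> bit) & 1) << 2))
        result |= ((code >> index) & 1) << bit
    return result


print(hex(rop3_code(lambda d, s, p: s)))      # 0xcc
print(hex(rop3_code(lambda d, s, p: d ^ s)))  # 0x66
```

For example, `apply_rop3(0xcc, dst, src, pat)` returns `src` unchanged, which is why 0xcc acts as a plain source copy.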
Patterns
---------
A repeated 8×8 pattern can be used with 2D engine operations `LINE` and `BIT_BLT`. This pattern
can be combined with the color using ROP.
Alpha blending
---------------
- The blend equation is always akin to OpenGL's `GL_FUNC_ADD`: source and destination (each
multiplied by its blend factor) are added.
- Alpha values can come from the source/destination per pixel or a global value defined in the
state.
Rotation and mirroring
-----------------------
- There are two ways to do source and destination rotation: through register `ROT_ANGLE` and through
register `SOURCE_ROTATION_CONFIG` / `DEST_ROTATION_CONFIG`. The former is more flexible and can
rotate (0, 90, 180, 270) as well as flip in X and Y. However it is not supported on every GPU
(which ones?).
- There are also two ways to do mirroring: through register `ROT_ANGLE` and through register `CONFIG`
(enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may be
different. Mirroring through `ROT_ANGLE` seems to be supported with the `NEW_2D` capability ("mirror
blit extension"). This is only available in new hardware (gc880, gc2000).
Miscellaneous notes
====================
Stretch blit
-------------
Stretch blit destination format must support alpha blending.
YUV color spaces
-----------------
The 2D engine supports two YUV color spaces [1]: BT.601 and BT.709.
Conversion from YUV to RGB is done in the following way for each of the color spaces:
16 ≤ Y ≤ 235
16 ≤ U ≤ 240
16 ≤ V ≤ 240
A = Y - 16
B = U - 128
C = V - 128
BT.601:
R = clip((298*A + 410*C + 128) >> 8)
G = clip((298*A - 101*B - 209*C + 128) >> 8)
B = clip((298*A + 519*B + 128) >> 8)
BT.709:
R = clip((298*A + 461*C + 128) >> 8)
G = clip((298*A - 55*B - 137*C + 128) >> 8)
B = clip((298*A + 543*B + 128) >> 8)
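
The conversion can be checked in software with the sketch below, which uses the integer coefficients listed above. It assumes that `clip` clamps to the range 0..255 and that every line uses the same rounding term of 128 before the shift. The output blue channel is named `bl` to avoid clashing with the intermediate `B = U - 128`. The function and table names are hypothetical.

```python
# Integer coefficients taken from the formulas above, per color space.
COEFFS = {
    "BT.601": {"rv": 410, "gu": 101, "gv": 209, "bu": 519},
    "BT.709": {"rv": 461, "gu": 55, "gv": 137, "bu": 543},
}


def clip(value: int) -> int:
    """Clamp to the 8-bit range (assumed meaning of clip in the formulas)."""
    return max(0, min(255, value))


def yuv_to_rgb(y: int, u: int, v: int, space: str = "BT.601") -> tuple:
    """Convert one limited-range YUV sample to 8-bit RGB."""
    k = COEFFS[space]
    a, b, c = y - 16, u - 128, v - 128
    r = clip((298 * a + k["rv"] * c + 128) >> 8)
    g = clip((298 * a - k["gu"] * b - k["gv"] * c + 128) >> 8)
    bl = clip((298 * a + k["bu"] * b + 128) >> 8)
    return r, g, bl


print(yuv_to_rgb(16, 128, 128))   # (0, 0, 0)
print(yuv_to_rgb(235, 128, 128))  # (255, 255, 255)
```

Black and white map to the same RGB values in both color spaces, since `B` and `C` are zero for neutral samples and only the `298*A` term remains.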
References
========
[1] Vivante GC320 reference manual (used to be hosted on the Vivante site, but was taken off line)