author Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-23 18:03:14 +0200
committer Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-23 18:03:14 +0200
commit c51729b9c122e6169103be1a0f0a133ba2bcbef6
tree f30c3c7002e35b8a6121e5ba00020a001a215162 /doc
parent d9dcbafc88dd396d1e7e3b84c9ed37b4afdbc1aa
remove lots of trailing spaces
whitespace only changes
Diffstat (limited to 'doc')
-rw-r--r-- doc/2d.md 4
-rw-r--r-- doc/blob_extensions.md 92
-rw-r--r-- doc/hardware.md 88
-rw-r--r-- doc/isa.md 32
-rw-r--r-- doc/kernel_interface.md 104
-rw-r--r-- doc/patents.md 14
6 files changed, 167 insertions, 167 deletions
diff --git a/doc/2d.md b/doc/2d.md
index e09fe76..8a35c67 100644
--- a/doc/2d.md
+++ b/doc/2d.md
@@ -39,7 +39,7 @@ Filter blits are also available as 2D commands, but I was unable to get this to
Video rasterizer
-----------------
-The video rasterizer, part of the 2D engine does hardware scaling using an arbitrary
+The video rasterizer, part of the 2D engine does hardware scaling using an arbitrary
9-tap separable filter with 5 bit subpixel precision,
It supports the following top-level commands:
@@ -160,7 +160,7 @@ These are the input bit for the ROPs, per ROP type:
bit 1 source
bit 2 pattern
bit "3" foreground/background (`ROP_FG` / `ROP_BG`)
-
+
ROP3/4 examples:
10101010 0xaa destination
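With the input bits listed above (destination in bit 0, as implied by the `0xaa` example, source in bit 1, pattern in bit 2), an 8-bit ROP3 code is simply the truth table of a boolean function: bit *i* of the code is the result for input index *i*. The sketch below derives codes that way. The helper names are hypothetical and not part of the Vivante headers, and only ROP3 (not the FG/BG bit of ROP4) is modelled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Boolean raster operation on one bit each of destination, source, pattern. */
typedef bool (*rop_fn)(bool dst, bool src, bool pat);

/* Build the 8-bit ROP3 code: bit i holds f() for input index i, where
 * index bit 0 = destination, bit 1 = source, bit 2 = pattern. */
static uint8_t rop3_code(rop_fn f)
{
    uint8_t code = 0;
    for (unsigned i = 0; i < 8; i++) {
        bool dst = (i >> 0) & 1;
        bool src = (i >> 1) & 1;
        bool pat = (i >> 2) & 1;
        if (f(dst, src, pat))
            code |= (uint8_t)(1u << i);
    }
    return code;
}

/* A few example operations (hypothetical helper names). */
static bool rop_dst(bool d, bool s, bool p) { (void)s; (void)p; return d; }
static bool rop_src(bool d, bool s, bool p) { (void)d; (void)p; return s; }
static bool rop_pat(bool d, bool s, bool p) { (void)d; (void)s; return p; }
static bool rop_src_xor_dst(bool d, bool s, bool p) { (void)p; return s ^ d; }
```

Evaluating `rop_dst` reproduces the `0xaa` example; with this bit assignment a plain copy comes out as `0xcc` and a pattern fill as `0xf0`, the same values as the familiar GDI ROP3 codes.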
diff --git a/doc/blob_extensions.md b/doc/blob_extensions.md
index 6683798..0b32e7f 100644
--- a/doc/blob_extensions.md
+++ b/doc/blob_extensions.md
@@ -8,56 +8,56 @@ VERSION 4.6.9:1478, PLATFORM Android
EGL Extensions:
- EGL_KHR_reusable_sync
- EGL_KHR_fence_sync
- EGL_KHR_image_base
- EGL_KHR_image_pixmap
- EGL_KHR_image
- EGL_KHR_gl_texture_2D_image
- EGL_KHR_gl_texture_cubmap_image
- EGL_KHR_gl_renderbuffer_image
- EGL_KHR_lock_surface
- EGL_ANDROID_image_native_buffer
- EGL_ANDROID_swap_rectangle
+ EGL_KHR_reusable_sync
+ EGL_KHR_fence_sync
+ EGL_KHR_image_base
+ EGL_KHR_image_pixmap
+ EGL_KHR_image
+ EGL_KHR_gl_texture_2D_image
+ EGL_KHR_gl_texture_cubmap_image
+ EGL_KHR_gl_renderbuffer_image
+ EGL_KHR_lock_surface
+ EGL_ANDROID_image_native_buffer
+ EGL_ANDROID_swap_rectangle
EGL_ANDROID_blob_cache
EGL_ANDROID_recordable
GLES2 Extensions:
- GL_OES_compressed_ETC1_RGB8_texture
- GL_OES_compressed_paletted_texture
- GL_OES_EGL_image
- GL_OES_depth24
- GL_OES_element_index_uint
- GL_OES_fbo_render_mipmap
- GL_OES_fragment_precision_high
- GL_OES_rgb8_rgba8
- GL_OES_stencil1
- GL_OES_stencil4
- GL_OES_texture_npot
- GL_OES_vertex_half_float
- GL_OES_depth_texture
- GL_OES_packed_depth_stencil
- GL_OES_standard_derivatives
- GL_OES_get_program_binary
- GL_EXT_texture_format_BGRA8888
- GL_IMG_read_format
- GL_EXT_blend_minmax
- GL_EXT_read_format_bgra
- GL_EXT_multi_draw_arrays
- GL_APPLE_texture_format_BGRA8888
- GL_APPLE_texture_max_level
- GL_ARM_rgba8
- GL_EXT_frag_depth
- GL_VIV_shader_binary
- GL_VIV_timestamp
- GL_OES_mapbuffer
- GL_OES_EGL_image_external
- GL_EXT_texture_compression_dxt1
- GL_EXT_texture_compression_s3tc
- GL_IMG_texture_compression_pvrtc
- GL_EXT_discard_framebuffer
- GL_OES_vertex_type_10_10_10_2
- GL_EXT_texture_type_2_10_10_10_REV
+ GL_OES_compressed_ETC1_RGB8_texture
+ GL_OES_compressed_paletted_texture
+ GL_OES_EGL_image
+ GL_OES_depth24
+ GL_OES_element_index_uint
+ GL_OES_fbo_render_mipmap
+ GL_OES_fragment_precision_high
+ GL_OES_rgb8_rgba8
+ GL_OES_stencil1
+ GL_OES_stencil4
+ GL_OES_texture_npot
+ GL_OES_vertex_half_float
+ GL_OES_depth_texture
+ GL_OES_packed_depth_stencil
+ GL_OES_standard_derivatives
+ GL_OES_get_program_binary
+ GL_EXT_texture_format_BGRA8888
+ GL_IMG_read_format
+ GL_EXT_blend_minmax
+ GL_EXT_read_format_bgra
+ GL_EXT_multi_draw_arrays
+ GL_APPLE_texture_format_BGRA8888
+ GL_APPLE_texture_max_level
+ GL_ARM_rgba8
+ GL_EXT_frag_depth
+ GL_VIV_shader_binary
+ GL_VIV_timestamp
+ GL_OES_mapbuffer
+ GL_OES_EGL_image_external
+ GL_EXT_texture_compression_dxt1
+ GL_EXT_texture_compression_s3tc
+ GL_IMG_texture_compression_pvrtc
+ GL_EXT_discard_framebuffer
+ GL_OES_vertex_type_10_10_10_2
+ GL_EXT_texture_type_2_10_10_10_REV
GL_EXT_texture_filter_anisotropic
diff --git a/doc/hardware.md b/doc/hardware.md
index b36ce9a..8c55c39 100644
--- a/doc/hardware.md
+++ b/doc/hardware.md
@@ -8,8 +8,8 @@ Major optional blocks: each of these can be present or not depending on the spec
- 3D engine
- VG engine
-Some SoCs have multiple GPU cores, and have distributed the blocks mentioned above over the cores (I suppose
-for extra parallelism and/or granularity in power switching). For example the Marvell Armada 620 has a GC2000
+Some SoCs have multiple GPU cores, and have distributed the blocks mentioned above over the cores (I suppose
+for extra parallelism and/or granularity in power switching). For example the Marvell Armada 620 has a GC2000
with only the 3D engine as well as a GC300 with only the 2D engine. Similarly, the Freescale i.mx6 SoC has a
GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
@@ -28,7 +28,7 @@ GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
Feature bits
=================
-Which features are supported on a certain Vivante core is not only determined by the model number
+Which features are supported on a certain Vivante core is not only determined by the model number
(which AFAIK mainly determines the performance), but specified by a combination of factors:
1) Chip features and minor feature flags
@@ -37,7 +37,7 @@ Which features are supported on a certain Vivante core is not only determined by
4) Chip revision of the form 0x1234
All of these are available in read-only registers on the hardware. On most cases it suffices to check the feature flags as
-Unlike NV, which parametrizes everything on the model and revision, for GC this is left for bugfixes (even these sometimes
+Unlike NV, which parametrizes everything on the model and revision, for GC this is left for bugfixes (even these sometimes
have their own feature bit).
For an overview of the feature bits see the enumerations in `state.xml`.
@@ -104,7 +104,7 @@ Thread walker = Rectangle walker? (seems to have to do with OpenCL)
[1] http://www.vivantecorp.com/Vivante_GC320_Technical_Reference_Manual_V1.0_A.pdf
[2] http://2012ftf.ccidnet.com/pdf/0049.pdf
-Connections
+Connections
-------------
Connections between the different modules follow the OpenGL pipeline design [3].
@@ -123,7 +123,7 @@ See also [1]
- SE determines rasterization starting point for each primitive, and also culls based on trivial rejection
- RA performs per-tile, per-subtile, per-quad and per-pixel clipping
- [1] METHOD FOR DISTRIBUTED CLIPPING OUTSIDE OF VIEW VOLUME
+ [1] METHOD FOR DISTRIBUTED CLIPPING OUTSIDE OF VIEW VOLUME
http://www.freepatentsonline.com/y2010/0271370.html
[2] Efficient tile-based rasterization
http://www.google.com/patents/US8009169
@@ -133,7 +133,7 @@ See also [1]
Command stream
-------------------
-Commands and data are sent to the GPU through the FE (Front End interface). The
+Commands and data are sent to the GPU through the FE (Front End interface). The
command stream of the front-end interface has a specific format described in this section.
Overall format
@@ -154,11 +154,11 @@ Opcodes
00111 Wait ([15-0] count)
01000 Link ([15-0] number of bytes, arg address)
01001 Stall (argument seems same format as state 0380C)
- 01010 Call
+ 01010 Call
01011 Return
01101 Chip select
-Arguments are always padded to 2 32-bit words. Number of argument words depends on the opcode, and
+Arguments are always padded to 2 32-bit words. Number of argument words depends on the opcode, and
sometimes on the first word of the command.
See `cmdstream.xml` for detailed overview of commands and arguments. The most commonly used command is
@@ -184,20 +184,20 @@ The following sequence of states is common:
GL.SEMAPHORE_TOKEN := FROM=RA,TO=PE
GL.STALL_TOKEN := FROM=RA,TO=PE
-The first state load arms the semaphore, the second one stalls the FROM module until the TO module has raised its semaphore. In
-this example it stalls the rasterizer until the pixel engine has completed the commands up until now.
+The first state load arms the semaphore, the second one stalls the FROM module until the TO module has raised its semaphore. In
+this example it stalls the rasterizer until the pixel engine has completed the commands up until now.
The `STALL` command is used to stall the command queue until the semaphore has been received. The stall command has
-one argument that has the same format as the `_TOKEN` states above, except that the FROM module is always the FE.
+one argument that has the same format as the `_TOKEN` states above, except that the FROM module is always the FE.
Within the 3D engine, not many explicit synchronization points appear to be needed. Some exceptions:
-- The blob issues a semaphore and stall from RA to PE when
+- The blob issues a semaphore and stall from RA to PE when
- Changing depth configuration in PE
- Sometimes when changing stencil config in PE
-- The blob issues a just a semaphore from RA to PE, and a stall before drawing a primitive when
+- The blob issues a just a semaphore from RA to PE, and a stall before drawing a primitive when
- Tile status address/configuration changes
- Clearing depth
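As a sketch of how the opcode table, the two-word padding rule and the `_TOKEN` argument format fit together, the code below encodes a `STALL` command (opcode `01001`) whose FROM module is the FE. The opcode position (bits 31..27), the FROM/TO field positions and the module ids are assumptions made for illustration; the authoritative values live in `cmdstream.xml` and `state.xml`.

```c
#include <stdint.h>
#include <stddef.h>

/* ASSUMPTION: the 5-bit opcode occupies bits 31..27 of the first word. */
#define CMD_OPCODE_SHIFT 27
#define CMD_OP_STALL     0x09u   /* 01001 in the opcode table */

/* ASSUMPTION (hypothetical ids and positions): FROM in bits 4..0, TO in bits 12..8. */
enum sync_module { MOD_FE = 1, MOD_RA = 5, MOD_PE = 7 };

static uint32_t sync_token(enum sync_module from, enum sync_module to)
{
    return ((uint32_t)from & 0x1f) | (((uint32_t)to & 0x1f) << 8);
}

/* Append a STALL: header word plus one argument word. Together they already
 * form one 64-bit unit, so no padding word is needed. The FROM module of a
 * STALL is always the FE. Returns the number of words written, 0 if no room. */
static size_t emit_stall(uint32_t *buf, size_t cap, enum sync_module to)
{
    if (cap < 2)
        return 0;
    buf[0] = CMD_OP_STALL << CMD_OPCODE_SHIFT;
    buf[1] = sync_token(MOD_FE, to);
    return 2;
}

/* Round a command's word count up to a multiple of two 32-bit words. */
static size_t pad_to_qword(size_t nwords)
{
    return (nwords + 1) & ~(size_t)1;
}
```

A command with an odd number of words (for example a header plus two state values) would be padded by `pad_to_qword` to four words.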
@@ -210,9 +210,9 @@ XXX (cwabbott) usually, isa's have some sort of texture barrier or sync operatio
Resolve
-----------
-The resolve module is a copy and fill engine. It can copy blocks of pixels from one GPU address to another,
-optionally tiling/detiling, converting between pixel formats, or scaling down by a factor of 2. The source and
-destination address can be the same to fill in tiles that were not touched during the rendering process
+The resolve module is a copy and fill engine. It can copy blocks of pixels from one GPU address to another,
+optionally tiling/detiling, converting between pixel formats, or scaling down by a factor of 2. The source and
+destination address can be the same to fill in tiles that were not touched during the rendering process
(according to the Tile Status, see below) with the background color.
The RS and PE (drawing) share one set of pixel pipes. They will never be active concurrently (AFAIK).
@@ -244,18 +244,18 @@ the GPU to hang mysteriously on rendering.
Shader ISA
================
-Vivante GPUs have a unified shader ISA, this means that vertex and pixel shaders share the same
+Vivante GPUs have a unified shader ISA, this means that vertex and pixel shaders share the same
instruction set. See `isa.xml` and `isa.md` for details of the instructions, this section only provides a high-level overview.
-- Each instruction consists of 4 32-bit words. These have a fixed format, with bitfields
+- Each instruction consists of 4 32-bit words. These have a fixed format, with bitfields
that have a meaning which differs only very little per opcode. Which of these fields is used (which operands) does differ per opcode.
- Four-component SIMD processor (for most of the instructions)
-- Older GPUs have floating point operations only, the newer ones have support for integer operations in the context of OpenCL.
+- Older GPUs have floating point operations only, the newer ones have support for integer operations in the context of OpenCL.
The split is around GC1000, though this being Vivante there is likely some feature bit for it.
-- Instructions can have up to three source operands (`SRC0_*`, `SRC1_*`, `SRC2_*`), and one destination operand (`DST_`).
+- Instructions can have up to three source operands (`SRC0_*`, `SRC1_*`, `SRC2_*`), and one destination operand (`DST_`).
In addition to that, there is a specific operand for texture sampling (`TEX_*`).
- Operands can have these properties:
@@ -269,7 +269,7 @@ that have a meaning which differs only very little per opcode. Which of these fi
- Registers:
- `N` four-component float temporary registers `tX` (actual number depends on the hardware, maximum seems to be 64 for all
- vivante GPUs I've encountered up until now), but like with other GPUs using more registers will likely restrict
+ vivante GPUs I've encountered up until now), but like with other GPUs using more registers will likely restrict
the available paralellism)
- `1` four-component address register `a0`
@@ -285,16 +285,16 @@ of the framebuffer using the `FBIOGET_VSCREENINFO` and `FBIOGET_FSCREENINFO` ioc
This physical address can then directly be used as target address for a resolve operation, just like when copying
to a normal bitmap.
-Even though it would save a resolve operation it is not useful to use the physical address of the frame buffer
+Even though it would save a resolve operation it is not useful to use the physical address of the frame buffer
directly for rendering, as it only possible to render to tiled and supertiled surfaces, and (afaik) no display controller
supports scan out from tiled formats.
In many cases there is more framebuffer memory than that which is used for the current screen, which causes larger virtual resolution
-to be returned than the physical resolution. Double-buffering is achieved by changing the y-offset within that virtual frame buffer.
+to be returned than the physical resolution. Double-buffering is achieved by changing the y-offset within that virtual frame buffer.
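The double-buffering arithmetic can be sketched as follows: when `yres_virtual` is at least twice `yres`, buffer *i* starts `i * yres` lines into the framebuffer, which yields both the y-offset to pan to and the physical address to use as a resolve target. The structs are hypothetical stand-ins mirroring a few fields of `fb_var_screeninfo` / `fb_fix_screeninfo` from `<linux/fb.h>`; on a device they would be filled by the ioctls mentioned above, and the flip would be requested with `FBIOPAN_DISPLAY`.

```c
#include <stdint.h>

/* Hypothetical stand-ins mirroring fields of fb_var_screeninfo / fb_fix_screeninfo. */
struct fb_var_subset { uint32_t yres, yres_virtual; };
struct fb_fix_subset { unsigned long smem_start; uint32_t line_length; };

/* Number of whole screens that fit in the virtual resolution. */
static uint32_t fb_num_buffers(const struct fb_var_subset *var)
{
    return var->yres ? var->yres_virtual / var->yres : 0;
}

/* y-offset to pan to when displaying buffer `index`. */
static uint32_t fb_buffer_yoffset(const struct fb_var_subset *var, uint32_t index)
{
    return index * var->yres;
}

/* Physical start address of buffer `index`, usable as a resolve destination. */
static unsigned long fb_buffer_phys(const struct fb_var_subset *var,
                                    const struct fb_fix_subset *fix, uint32_t index)
{
    return fix->smem_start +
           (unsigned long)fb_buffer_yoffset(var, index) * fix->line_length;
}
```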
Operations
========================
-An attempt to figure out which operations can be triggered in the hardware, and what state is used to specify
+An attempt to figure out which operations can be triggered in the hardware, and what state is used to specify
their operation.
- RS: Kick off resolve by writing a value with bit 0 set to `RS_KICKER`. State used:
@@ -328,8 +328,8 @@ Programming pecularities
support. The blob driver uses it for some states (viewport scaling, offset, scissor, ...)
but not others (uniforms etc).
-- It is quite easy to hang the GPU when making a minor programming mistake.
- When the GPU is stuck it is possible to submit command buffers, however nothing gets drawn and nothing
+- It is quite easy to hang the GPU when making a minor programming mistake.
+ When the GPU is stuck it is possible to submit command buffers, however nothing gets drawn and nothing
ever finishes.
Ways I've already made it crash:
@@ -339,11 +339,11 @@ Programming pecularities
- Sending 3D commands in the 2D pipe instead of 3D pipe (then using a signal waiting for them to complete)
- Wrong length of shader
- Texture sampling without properly setup texture units
- - `SE_SCISSOR`: setting SCISSOR bottom/right to `(x<<16)|5` instead of `(x<<16)-1` causes crashes for higher resolutions
+ - `SE_SCISSOR`: setting SCISSOR bottom/right to `(x<<16)|5` instead of `(x<<16)-1` causes crashes for higher resolutions
such as 1920x1080 on GC600. I don't know why, maybe some buffer or cache overflow. The rockchip vivante driver always uses |5 AFAIK,
this offset appears to be different per specific chip/revision.
- This may be a (kernel) driver problem. It is possible to reset the GPU from user space with an ioctl, but
+ This may be a (kernel) driver problem. It is possible to reset the GPU from user space with an ioctl, but
this usually is not enough to make it un-stuck. It would probably be a better solution to introduce a kernel-based timeout
instead of relying on userspace to be 100% correct (may exist on v4?).
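To make the two `SE_SCISSOR` encodings from the crash list concrete: assuming the bottom/right values are 16.16 fixed point (suggested by the `x<<16` shift, but not confirmed here), `(x<<16)-1` puts the edge just below pixel `x`, whereas `(x<<16)|5` puts it slightly past `x`. The helper names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: SE_SCISSOR right/bottom are 16.16 fixed-point values. */
static uint32_t scissor_edge_minus1(uint32_t x) { return (x << 16) - 1; } /* just below x */
static uint32_t scissor_edge_or5(uint32_t x)    { return (x << 16) | 5; } /* slightly past x */

/* Convert a 16.16 fixed-point value to a double for inspection. */
static double fixed16_to_double(uint32_t v) { return v / 65536.0; }
```

For a 1920-pixel wide target this gives `0x077FFFFF` (about 1919.99998) versus `0x07800005` (about 1920.00008).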
@@ -355,10 +355,10 @@ When the mask bit belonging to a group of state bits is *set* on a state write,
state bits will be unaffected. If the mask bit is *unset*, the state bits will be written.
This allows setting state per group of bits. For example, it allows setting only
-the destination alpha function (`ALPHA_CONFIG.DST_FUNC_ALPHA`) without affecting the
+the destination alpha function (`ALPHA_CONFIG.DST_FUNC_ALPHA`) without affecting the
other bits in that state word.
-If masking functionality is not desired, simply keep all the `_MASK` bits at zero and write all
+If masking functionality is not desired, simply keep all the `_MASK` bits at zero and write all
bits at once. This is what I used in `etna_pipe`, as I keep track of all state myself.
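The masking rule can be modelled in a few lines: a field takes the value from the write only when its `_MASK` bit is clear. The field and mask-bit positions used in the usage example are hypothetical placeholders, not the real `ALPHA_CONFIG` layout from `state.xml`, and the model ignores what happens to the `_MASK` bit positions themselves.

```c
#include <stdint.h>
#include <stddef.h>

/* One maskable field: the bits it occupies and its associated _MASK bit. */
struct masked_field { uint32_t bits; uint32_t mask_bit; };

/* Model of merging a masked state write into the old register value:
 * fields whose _MASK bit is set in `write` keep their old contents,
 * fields whose _MASK bit is clear take the new bits from `write`. */
static uint32_t apply_masked_write(uint32_t old, uint32_t write,
                                   const struct masked_field *fields, size_t n)
{
    uint32_t result = old;
    for (size_t i = 0; i < n; i++) {
        if (!(write & fields[i].mask_bit))   /* mask clear: field is written */
            result = (result & ~fields[i].bits) | (write & fields[i].bits);
    }
    return result;
}
```

For example, with a hypothetical source function in bits 3..0 (mask bit 4) and destination function in bits 11..8 (mask bit 12), writing `0x00000a10` to a register holding `0x00000305` updates only the destination field, giving `0x00000a05`; writing with all mask bits zero replaces both fields.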
Texture tiling
@@ -380,7 +380,7 @@ Supertiling
![supertile ordering](images/supertile.png)
It appears that the blob always pads render buffers pixel sizes to a multiple of 64, ie, a width of 400 becomes 448 and 800 becomes 832.
-This is because the render buffer is also tiled, albeit differently than the 4x4 tiling format of the textures.
+This is because the render buffer is also tiled, albeit differently than the 4x4 tiling format of the textures.
On a fine level, every tile is the same as for normal tiled surfaces:
0 1 2 3
@@ -416,7 +416,7 @@ but is only nested one level, in total this results in 64x64 sized tiles.
The GPU can render to normal tiled surfaces (such as used by textures) as well as supertiled surfaces. However,
rendering to supertiled surfaces is likely faster due to better cache locality.
-Stride, as used for resolve operations, is for a row of tiles not a row of pixels; 0x1c00 for width 448 (originally 400),
+Stride, as used for resolve operations, is for a row of tiles not a row of pixels; 0x1c00 for width 448 (originally 400),
0x3400 for width 832 (originally 800).
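The padding and stride figures above follow from simple arithmetic, sketched here assuming 32-bit (4-byte) pixels: widths are rounded up to a multiple of 64, and since a resolve stride covers one row of 4x4 tiles, it spans four pixel rows of the padded surface. The function names are hypothetical.

```c
#include <stdint.h>

#define SUPERTILE_DIM 64u  /* render buffers are padded to multiples of 64 pixels */
#define TILE_HEIGHT    4u  /* one row of 4x4 tiles covers 4 pixel rows */

static uint32_t pad_to_supertile(uint32_t pixels)
{
    return (pixels + SUPERTILE_DIM - 1) / SUPERTILE_DIM * SUPERTILE_DIM;
}

/* Resolve stride in bytes: one row of tiles of the padded surface. */
static uint32_t tiled_stride(uint32_t width, uint32_t bytes_per_pixel)
{
    return pad_to_supertile(width) * bytes_per_pixel * TILE_HEIGHT;
}
```

With 4 bytes per pixel this reproduces 448 * 4 * 4 = 0x1c00 and 832 * 4 * 4 = 0x3400.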
Multisampling
@@ -445,7 +445,7 @@ GC600 supports 1, 2, or 4 MSAA samples. Vivante's patent [1] on anti-aliasing ma
- 256x256 target with 4 samples creates a 512x512 render target and depth buffer
GL.MULTI_SAMPLE_CONFIG := MSAA_SAMPLES=4X,MSAA_ENABLES=0xf,UNK12=0x0,UNK16=0x0
- RA.MULTISAMPLE_UNK00E04 := 0x0
+ RA.MULTISAMPLE_UNK00E04 := 0x0
RA.MULTISAMPLE_UNK00E10[2] := 0xaaa22a22
RA.CENTROID_TABLE[8] := 0x262a2288
RA.CENTROID_TABLE[9] := 0x886688a2
@@ -466,7 +466,7 @@ GC600 supports 1, 2, or 4 MSAA samples. Vivante's patent [1] on anti-aliasing ma
Other differences when MSAA is enabled:
-- `TS.MEM_CONFIG` is different when MSAA is used (see descriptions for fields `MSAA` and `MSAA_FORMAT`).
+- `TS.MEM_CONFIG` is different when MSAA is used (see descriptions for fields `MSAA` and `MSAA_FORMAT`).
- The TS surface belonging to the enlarged in the same way; it is treated as if there simply is a bigger render target.
- It also looks like the PS gets an extra input/temporary when MSAA is enabled:
@@ -494,25 +494,25 @@ When rendering points (PRIMITIVE_TYPE_POINTS) there are some differences:
- There is an extra varying for `gl_pointCoord` with two components. This varying has
its components in `GL_VARYING_COMPONENT_USE` set to `POINTCOORD_X` and `POINTCOORD_Y`.
Its `PA_SHADER_ATTRIBUTES` is set to `0x000002f1`.
- The VS output associated to this varying in `VS_OUTPUT` is discarded, so can be set
+ The VS output associated to this varying in `VS_OUTPUT` is discarded, so can be set
to any output register.
-- `rasterizer.point_size_per_vertex` affects number of vs outputs (just like MSAA!). If point
+- `rasterizer.point_size_per_vertex` affects number of vs outputs (just like MSAA!). If point
size per vertex is not set, the value in `PA.POINT_SIZE` is used.
-- Distinction between sprite coordinate origin `UPPER_LEFT` / `LOWER_LEFT` is implemented by adding
+- Distinction between sprite coordinate origin `UPPER_LEFT` / `LOWER_LEFT` is implemented by adding
a 1.0-y instruction when glPointCoord is used. XXX figure out what is the default.
Vertex texture fetch
--------------------
Vertex samplers live in the same space as fragment samplers. The blob uses a fixed mapping:
-sampler 0..7 are used as fragment samplers and 8..11 are used as vertex samplers.
+sampler 0..7 are used as fragment samplers and 8..11 are used as vertex samplers.
The shaders themselves refer to the absolute shader number; so tex8 is the first texture unit used in a
vertex shader. The normal TEX instruction can be used to sample textures from a vertex shader.
-Vivante hw has two texture caches that need to be flushed separately, one for fragment shaders
+Vivante hw has two texture caches that need to be flushed separately, one for fragment shaders
one for vertex shaders (bits `GL.FLUSH_CACHE.TEXTURE` and `GL.FLUSH_CACHE.TEXTUREVS` respectively).
This solves a problem with running `cubemap_sphere` after `displacement` demo;
@@ -525,16 +525,16 @@ Even adding a PE to FE semaphore afterwards or dummy state loads does not fix th
All texture filtering options are allowed for vertex texture fetch.
-XXX maybe figure out if the sampler units are shared between fragment and vertex shaders and thus interchangeable. This is
+XXX maybe figure out if the sampler units are shared between fragment and vertex shaders and thus interchangeable. This is
not important for GL/Gallium because it already lives with the assumption that vertex and fragment shaders
are distinct.
Shader size on GC2000
----------------------
-The "query chip identity" ioctl on GC2000 reports an instructionCount of 512. Looking at the low-level command
-stream dumps the device appears to have 0x0E000 - 0x0C000 = 8192 bytes of instruction memory, with 128 bit
-instructions this indeed maps to 512 instructions.
+The "query chip identity" ioctl on GC2000 reports an instructionCount of 512. Looking at the low-level command
+stream dumps the device appears to have 0x0E000 - 0x0C000 = 8192 bytes of instruction memory, with 128 bit
+instructions this indeed maps to 512 instructions.
XXX does the VS/PS split at instruction 256 during rendering affect OpenCL? Hopefully not...
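The GC2000 instruction-count arithmetic, written out: each instruction is four 32-bit words (16 bytes), so the 0x0E000 - 0x0C000 byte window holds 512 instructions, and a VS/PS split at instruction 256 would leave 256 for each stage. The struct and macro names are hypothetical.

```c
#include <stdint.h>

/* One shader instruction: four 32-bit words, 128 bits in total. */
typedef struct { uint32_t word[4]; } shader_instr;

#define INSTR_MEM_START 0x0C000u
#define INSTR_MEM_END   0x0E000u
#define VS_PS_SPLIT     256u   /* split point from the note on GC2000 */

/* Number of whole instructions that fit in [start, end). */
static uint32_t instr_capacity(uint32_t start, uint32_t end)
{
    return (end - start) / (uint32_t)sizeof(shader_instr);
}
```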
diff --git a/doc/isa.md b/doc/isa.md
index faa6006..91929e3 100644
--- a/doc/isa.md
+++ b/doc/isa.md
@@ -12,15 +12,15 @@ Basic vertex shader
uniform mat4 modelviewMatrix;
uniform mat4 modelviewprojectionMatrix;
uniform mat3 normalMatrix;
-
+
attribute vec4 in_position;
attribute vec3 in_normal;
attribute vec4 in_color;
-
+
vec4 lightSource = vec4(2.0, 2.0, 20.0, 0.0);
-
+
varying vec4 vVaryingColor;
-
+
void main()
{
gl_Position = modelviewprojectionMatrix * in_position;
@@ -73,16 +73,16 @@ Vertex shader with texture coordinates
uniform mat4 modelviewMatrix;
uniform mat4 modelviewprojectionMatrix;
uniform mat3 normalMatrix;
-
+
attribute vec4 in_position;
attribute vec3 in_normal;
attribute vec2 in_coord;
-
+
vec4 lightSource = vec4(2.0, 2.0, 20.0, 0.0);
-
+
varying vec4 vVaryingColor;
varying vec2 coord;
-
+
void main()
{
gl_Position = modelviewprojectionMatrix * in_position;
@@ -129,9 +129,9 @@ Empty (passthrough)
--------------------
precision mediump float;
-
+
varying vec4 vVaryingColor;
-
+
void main()
{
gl_FragColor = vVaryingColor;
@@ -145,12 +145,12 @@ Texture sampling
------------------
precision mediump float;
-
+
varying vec4 vVaryingColor;
varying vec2 coord;
-
+
uniform sampler2D in_texture;
-
+
void main()
{
gl_FragColor = 3.0 * vVaryingColor * texture2D(in_texture, coord);
@@ -174,8 +174,8 @@ This adjusts the output position z, based on w. Likely this works around a diffe
the hardware and the OpenGL standard.
For the gc2000 in the i.mx6 these two instructions are no longer appended (the only difference in the vertex shader for
-smoothed cube between gc600 and gc2000, as generated by the blob driver, is this).
-
+smoothed cube between gc600 and gc2000, as generated by the blob driver, is this).
+
The cutoff point for this is at GC1000. All Vivante GPUs before GC1000 require these two instructions, except
the GC880.
@@ -191,7 +191,7 @@ Misc notes
gl_fragCoord: contains the window-relative coordinates of the current fragment
-- In PS, RGROUP 1 register i0.x contains the value of gl_FrontFacing.
+- In PS, RGROUP 1 register i0.x contains the value of gl_FrontFacing.
i0.y also contains a non-zero value. i0.zw are zero.
- i1..i127 are simply aliases of i0, at least on my GC600.
diff --git a/doc/kernel_interface.md b/doc/kernel_interface.md
index 0906099..9d56689 100644
--- a/doc/kernel_interface.md
+++ b/doc/kernel_interface.md
@@ -20,14 +20,14 @@ along with the values on a RK2918 device:
Most important to get right are registerMemSize, registerMemBase and irqLine as these allow the driver to find and
communicate with the GPU hardware. They depend on the board, not on the GPU. For example, on a CuBox these settings are:
-
+
irqLine 42
registerMemBase 0xf1840000
contiguousBase 0x08000000
The `dove` (cubox) driver also has a `gpu_frequency` parameter that sets the AXICLK/GCCLK clock at startup,
if compiled with `ENABLE_GPU_CLOCK_BY_DRIVER`. Some devices may need this, although not the CuBox itself (it is disabled in the makefile).
-In that case your GPU will have an entry `GC` in `/proc/clocks`.
+In that case your GPU will have an entry `GC` in `/proc/clocks`.
On a Freescale i.MX6 (GK802) device the parameters are:
@@ -44,7 +44,7 @@ On a Freescale i.MX6 (GK802) device the parameters are:
contiguousSize 0x0c000000 (192 MB)
coreClock 156000000
signal 48
- baseAddress 0
+ baseAddress 0
Diagnostics
==============
@@ -173,9 +173,9 @@ At startup, the application connects to galcore device using `open` with the dev
- `/dev/galcore`, or
- `/dev/graphics/galcore`
-After connecting to the device the entire chunk of contiguous memory, after requesting its address and size,
+After connecting to the device the entire chunk of contiguous memory, after requesting its address and size,
is mapped into user space using `mmap`. The kernel will return addresses in this range when the user space driver allocates
-contiguous (unified) memory used for communication with the GPU.
+contiguous (unified) memory used for communication with the GPU.
Ioctl
-------
@@ -189,7 +189,7 @@ Communication with the kernel driver happens through ioctl calls on the resultin
`IOCTL_GCHAL_INTERFACE` is the only one of these that is actually used by the userspace blob. This ioctl is passed one argument
which is a pointer to the following structure:
- typedef struct
+ typedef struct
{
void *in_buf;
uint32_t in_buf_size;
@@ -197,28 +197,28 @@ which is a pointer to the following structure:
uint32_t out_buf_size;
} vivante_ioctl_data_t;
-When used by the blob, `in_buf` and `out_buf` point to the same memory address: a `gcsHAL_INTERFACE` structure that is
+When used by the blob, `in_buf` and `out_buf` point to the same memory address: a `gcsHAL_INTERFACE` structure that is
used both for input and output arguments.
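A minimal sketch of reproducing the blob's calling convention, where one buffer serves as both input and output. `hal_interface_stub` is a hypothetical placeholder for the real `gcsHAL_INTERFACE`, the field order of `vivante_ioctl_data_t` is assumed to be `in_buf, in_buf_size, out_buf, out_buf_size`, and since the numeric value of `IOCTL_GCHAL_INTERFACE` is not given here, the caller passes the request number in.

```c
#include <stdint.h>
#include <sys/ioctl.h>

typedef struct {
    void *in_buf;
    uint32_t in_buf_size;
    void *out_buf;
    uint32_t out_buf_size;
} vivante_ioctl_data_t;

/* Hypothetical placeholder for gcsHAL_INTERFACE: a command opcode, a status
 * and a union of per-command parameters (contents omitted). */
typedef struct {
    uint32_t command;
    int32_t status;
    union { uint8_t raw[256]; } u;
} hal_interface_stub;

/* Point both buffers at the same structure, as the blob does. */
static void viv_prepare(vivante_ioctl_data_t *data, hal_interface_stub *iface)
{
    data->in_buf = iface;
    data->in_buf_size = sizeof(*iface);
    data->out_buf = iface;
    data->out_buf_size = sizeof(*iface);
}

/* Issue the call; on success the kernel's results are written back into
 * *iface. Returns the ioctl() result (-1 with errno set on failure). */
static int viv_hal_call(int fd, unsigned long request, hal_interface_stub *iface)
{
    vivante_ioctl_data_t data;
    viv_prepare(&data, iface);
    return ioctl(fd, request, &data);
}
```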
Command structure
------------------
-The `gcsHAL_INTERFACE` (defined in `gc_hal_driver`) is the structure used by the driver to communicate with the
-kernel. It can be seen as a communication packet with a command opcode and an union with parameters.
+The `gcsHAL_INTERFACE` (defined in `gc_hal_driver`) is the structure used by the driver to communicate with the
+kernel. It can be seen as a communication packet with a command opcode and an union with parameters.
Depending on the `command` a different field of this union is used. The same structure is used both for input and output
-arguments.
+arguments.
-For example, the command `gcvHAL_ALLOCATE_LINEAR_VIDEO_MEMORY` (I will leave off the `gcvHAL_` from now on)
-uses the fields in `interface->u.AllocateLinearVideoMemory` to pass in the number of bytes to allocate, but
-also to pass out the number of bytes actually allocated.
+For example, the command `gcvHAL_ALLOCATE_LINEAR_VIDEO_MEMORY` (I will leave off the `gcvHAL_` from now on)
+uses the fields in `interface->u.AllocateLinearVideoMemory` to pass in the number of bytes to allocate, but
+also to pass out the number of bytes actually allocated.
-What is curious about the ioctl protocol is that the communication structures contains fields that are not
-used by the kernel at all. There is no good reason why these values would need
+What is curious about the ioctl protocol is that the communication structures contains fields that are not
+used by the kernel at all. There is no good reason why these values would need
to be present in kernel-facing structures. The line is blurry sometimes.
It also appears that the structure has been designed with platform-independence in mind, and so some of the fields are not used in the Linux
drivers such as `status`, `handle`, `pid`.
A possibly worthwhile long-term goal would be to clean up the kernel driver interface. This would break compatibility with
-the Vivante binary blobs, though, so maybe the effort would be better spent building a fully-fledged DRM/DRI
+the Vivante binary blobs, though, so maybe the effort would be better spent building a fully-fledged DRM/DRI
infrastructure driver instead.
Allocations
@@ -240,29 +240,29 @@ Memory management happens in the kernel. Two types of memory are allocated:
Allocated with command `ALLOCATE_LINEAR_VIDEO_MEMORY`
Device memory, from one of the pools (default, local, unified or contiguous system memory)
- The available pools depend on the hardware; many of the devices have no local memory, and simply
+ The available pools depend on the hardware; many of the devices have no local memory, and simply
use a part of system memory as video memory.
-`LOCK_VIDEO_MEMORY` locks the video memory both
+`LOCK_VIDEO_MEMORY` locks the video memory both
- into the GPU memory space so that it can be used by the GPU
-- into CPU memory so that the application can read/write.
+- into CPU memory so that the application can read/write.
It is interesting that these are done by
the same call.
Command buffers
-------------------
-Like many other GPUs, the primary means of programming the chip is through a command stream
+Like many other GPUs, the primary means of programming the chip is through a command stream
interpreted by a DMA engine. This "Front End" takes care of distributing state changes through
-the individual modules of the GPU, kicking off primitive rendering, synchronization,
+the individual modules of the GPU, kicking off primitive rendering, synchronization,
and also supports some primitive flow control (branch, call, return).
-The command stream is submitted to the kernel by means of command buffers. As most important part these
-structures contain a pointer to contiguous memory (allocated with command `ALLOCATE_CONTIGUOUS_MEMORY`)
+The command stream is submitted to the kernel by means of command buffers. As most important part these
+structures contain a pointer to contiguous memory (allocated with command `ALLOCATE_CONTIGUOUS_MEMORY`)
where the commands start.
-Command buffers are built in user space by the driver in a `gcoCMDBUF` structure, then submitted to the kernel with the
-`COMMIT` command.
+Command buffers are built in user space by the driver in a `gcoCMDBUF` structure, then submitted to the kernel with the
+`COMMIT` command.
The following structure fields of `gcoCMDBUF` are used by the kernel:
@@ -276,16 +276,16 @@ The following structure fields of `gcoCMDBUF` are used by the kernel:
User signal API
----------------
-Command `USER_SIGNAL` is used for synchronization signals between the kernel and userspace driver.
+Command `USER_SIGNAL` is used for synchronization signals between the kernel and userspace driver.
Note: the contents in this section only apply as-is if the kernel was *not* compiled with `USE_NEW_LINUX_SIGNAL`. If this
-flag was set, then a posix real-time signal will be used to notify the process of incoming signals, and the
+flag was set, then a posix real-time signal will be used to notify the process of incoming signals, and the
`USER_SIGNAL_WAIT` is a no-op.
The subcommands are:
- `USER_SIGNAL_CREATE` Create a new signal
- Inputs:
+ Inputs:
- manualReset
If set to gcvTRUE, the `SIGNAL` command must be used with state false to
reset the signal. If set to gcvFALSE, the signal automatically resets
@@ -306,7 +306,7 @@ The subcommands are:
Outputs: N/A
- `USER_SIGNAL_WAIT` Wait on the signal (block current thread)
- Inputs:
+ Inputs:
- id Signal id to wait for
- wait Maximum duration to wait (in milliseconds)
Outputs: N/A
@@ -319,7 +319,7 @@ The subcommands are:
Inputs: id
Outputs: N/A
-This is used to synchronize GPU and CPU.
+This is used to synchronize GPU and CPU.
Signals can be scheduled to be signalled/unsignalled when the GPU finished a certain operation (using an Event).
They are also used for inter-thread synchronization by the EGL driver.
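To illustrate the `manualReset` semantics and the millisecond timeout of `USER_SIGNAL_WAIT`, here is a user-space model built on POSIX threads. It is not the kernel's implementation, and the type and function names are hypothetical: an auto-reset signal is consumed by the wait that observes it, while a manual-reset signal stays set until the `SIGNAL` subcommand is used with state false.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool manual_reset;  /* like manualReset in USER_SIGNAL_CREATE */
    bool state;         /* true = signalled */
} user_signal;

static void user_signal_init(user_signal *s, bool manual_reset)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->manual_reset = manual_reset;
    s->state = false;
}

/* Model of the SIGNAL subcommand with a given state. */
static void user_signal_set(user_signal *s, bool state)
{
    pthread_mutex_lock(&s->lock);
    s->state = state;
    if (state)
        pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Model of USER_SIGNAL_WAIT: true if signalled within wait_ms milliseconds. */
static bool user_signal_wait(user_signal *s, unsigned wait_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* timedwait's default clock */
    deadline.tv_sec += wait_ms / 1000;
    deadline.tv_nsec += (long)(wait_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&s->lock);
    int rc = 0;
    while (!s->state && rc == 0)  /* rc becomes ETIMEDOUT on timeout */
        rc = pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
    bool signalled = s->state;
    if (signalled && !s->manual_reset)
        s->state = false;         /* auto-reset: consume the signal */
    pthread_mutex_unlock(&s->lock);
    return signalled;
}

static void user_signal_destroy(user_signal *s)
{
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->lock);
}
```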
@@ -384,7 +384,7 @@ Context switching
==================
Clients manage their own context, which is passed to COMMIT preemptively in case a context switch is needed.
-It appears that context switching is manual. Every process has to keep its own context structure for
+It appears that context switching is manual. Every process has to keep its own context structure for
context switching, and pass this to COMMIT. In case this is needed the kernel will then load the state
from the context buffer.
@@ -399,7 +399,7 @@ The state `FE.VERTEX_ELEMENT_CONFIG` is handled specially: write only the elemen
Used fields in `struct _gcoCONTEXT` from the kernel:
-- `id`
+- `id`
[in] This id is used to determine wether to switch context
[out] A unique id for the context is generated the first time a COMMIT is done, with context->id==0
- `hint*` only used when `SECURE_USER` is set
@@ -421,7 +421,7 @@ Profiling
To enable profiling, the kernel most have been built with `VIVANTE_PROFILER` enabled in `gc_hal_options.h` or the appropriate
`config` file.
-
+
USE_PROFILER = 1
Vivante also recommends disabling power management features while profiling,
@@ -430,7 +430,7 @@ Vivante also recommends disabling power management features while profiling,
HW profiling registers can be read using the command `READ_ALL_PROFILE_REGISTERS`.
-There are also the commands `GET_PROFILE_SETTING` and `SET_PROFILE_SETTING`, which set a flag for
+There are also the commands `GET_PROFILE_SETTING` and `SET_PROFILE_SETTING`, which set a flag for
logging to a file (`vprofiler.xml` by default), but this flag doesn't do anything in the kernel driver,
likely it's meant to be read out by the user space driver.
@@ -539,12 +539,12 @@ TODO: input/output arguments.
* `QUERY_CHIP_IDENTITY`
Query chip identity.
-
- Calls: gckHARDWARE_QueryChipIdentity
+
+ Calls: gckHARDWARE_QueryChipIdentity
* `ALLOCATE_NON_PAGED_MEMORY`
- Allocate non-paged memory.
+ Allocate non-paged memory.
Calls: gckOS_AllocateNonPagedMemory
@@ -558,7 +558,7 @@ TODO: input/output arguments.
Allocate contiguous non-paged memory (used for command buffers).
- Calls: gckOS_AllocateContiguous
+ Calls: gckOS_AllocateContiguous
* `FREE_CONTIGUOUS_MEMORY`
@@ -579,7 +579,7 @@ TODO: input/output arguments.
Walks all required memory pools to allocate the requested amount of video memory.
gcvPOOL_VIRTUAL: Virtual memory, allocated using gckVIDMEM_ConstructVirtual
- gcvPOOL_CONTIGUOUS: Contiguous memory, allocated using gckVIDMEM_ConstructVirtual
+ gcvPOOL_CONTIGUOUS: Contiguous memory, allocated using gckVIDMEM_ConstructVirtual
gcvPOOL_SYSTEM: Contiguous system memory
gcvPOOL_LOCAL_INTERNAL: Internal memory
gcvPOOL_LOCAL_EXTERNAL: External memory
@@ -599,7 +599,7 @@ TODO: input/output arguments.
* `FREE_VIDEO_MEMORY`
- Calls: gckVIDMEM_Free
+ Calls: gckVIDMEM_Free
* `MAP_MEMORY`
@@ -610,7 +610,7 @@ TODO: input/output arguments.
* `UNMAP_MEMORY`
Unmap memory mapped with `MAP_MEMORY`.
-
+
Calls: gckKERNEL_UnmapMemory (gckOS_UnmapMemory)
* `MAP_USER_MEMORY`
@@ -630,20 +630,20 @@ TODO: input/output arguments.
Surface lock.
- Calls: gckVIDMEM_Lock
+ Calls: gckVIDMEM_Lock
* `UNLOCK_VIDEO_MEMORY`
-
+
Surface unlock.
-
- Calls: gckVIDMEM_Unlock
+
+ Calls: gckVIDMEM_Unlock
* `EVENT_COMMIT`
-
+
Commit an event queue.
Calls: gckEVENT_Commit
-
+
* `USER_SIGNAL`
Dispatch depends on the user signal subcommands (refer to section `User signal API`).
@@ -662,7 +662,7 @@ TODO: input/output arguments.
* `COMMIT`
Commit a command and context buffer.
-
+
Calls: gckCOMMAND_Commit
* `STALL`
@@ -680,7 +680,7 @@ TODO: input/output arguments.
Calls: gckOS_ReadRegister
* `WRITE_REGISTER`
-
+
Write a GPU register. Only enabled if kernel compiled with `gcdREGISTER_ACCESS_FROM_USER` (which
is obviously an security risk, as it allows user-space to read and write arbitrary registers).
@@ -704,7 +704,7 @@ TODO: input/output arguments.
Calls: gckHARDWARE_QueryProfileRegisters
* `PROFILE_REGISTERS_2D`
-
+
Read all 2D profile registers. Only available if kernel compiled with `VIVANTE_PROFILER` enabled.
Calls: gckHARDWARE_ProfileEngine2D
@@ -763,7 +763,7 @@ TODO: input/output arguments.
Flush or invalidate the cache.
NOTE: unimplemented on Linux, and also apparently not called by the blob on Linux.
- In:
+ In:
invalidate: If FALSE, flush the cache (the GPU is going to need the data)
if TRUE, flush and invalidate the cache (if the GPU is going to modify the data)
process: Process handle Logical belongs to or gcvNULL if Logical belongs to the kernel.
@@ -776,7 +776,7 @@ TODO: input/output arguments.
Broadcast GPU stuck.
- Calls: gckOS_Broadcast
+ Calls: gckOS_Broadcast
Crash recovery
================
diff --git a/doc/patents.md b/doc/patents.md
index 47c0280..b75a82f 100644
--- a/doc/patents.md
+++ b/doc/patents.md
@@ -21,7 +21,7 @@ optionally blended with another data value and written to a memory device. Regis
of filtering with the first coefficients. The block of data may be read from a location including a
source coordinate. The final result of filtering may be written to a destination coordinate obtained
by rotating and/or mirroring the source coordinate. The orientation of arrays filtered using the
-first coefficients varies according to a rotation mode.
+first coefficients varies according to a rotation mode.
- [US20130091189](https://www.google.com/patents/US20130091189) Single datapath floating point implementation of RCP, SQRT, EXP and LOG functions
and a low latency RCP based on the same techniques
@@ -33,7 +33,7 @@ for performing a polynomial approximation (e.g. a quadratic polynomial approxima
and one or more data tables corresponding to at least one of the RCP, SQRT, EXP or LOG functions
operable to be coupled to the single pipeline according to one or more opcodes; wherein the single
pipeline is operable for computing at least one of RCP, SQRT, EXP or LOG functions according to the
-one or more opcodes.
+one or more opcodes.
- [US20130002651](https://www.google.com/patents/US20130002651) Apparatus and Method For Texture Level Of Detail Computation
@@ -71,7 +71,7 @@ detect an object edge within the image. An edge style detector is configured to
edge end and a second edge end. The edge style detector also identifies an edge style associated
with the detected edge based on the first edge end and the second edge end. The system also includes
a restoration module configured to identify pixel data associated with the detected edge and a
-blending module configured to blend the pixel data associated with the detected edge.
+blending module configured to blend the pixel data associated with the detected edge.
- [US20110234609](https://www.google.com/patents/US20110234609) Hierarchical tile-based rasterization algorithm
@@ -98,7 +98,7 @@ as X, Y and far Z clipping. In one embodiment, the SE module performs clipping
a initial point of rasterization. In one embodiment, the RA module performs clipping by way of
conducting the rendering step of the rasterization process. This approach distributes the complexity
in the graphics processing pipeline and makes the design simpler and faster, therefore design
-complexity, cost and performance may all be improved in hardware implementation.
+complexity, cost and performance may all be improved in hardware implementation.
- [US20100131786](https://www.google.com/patents/US20100131786) Single Chip 3D and 2D Graphics Processor with Embedded Memory and Multiple Levels of
Power Controls
@@ -124,7 +124,7 @@ and to apply a de-ringing filter to a pixel within a pixel subset of the pixel l
determination that the pixel is not an edge pixel. The determination that the pixel is not an edge
pixel is based on the identified maximum pixel jump.
-- [US20090122076](https://www.google.com/patents/US20090122076) Thin-line detection apparatus and method
+- [US20090122076](https://www.google.com/patents/US20090122076) Thin-line detection apparatus and method
Published: 2009-05-14
@@ -138,7 +138,7 @@ of the pre-determined thin lines, the pixel block may be deemed to describe a th
apparatus and method may preclude application of an anti-aliasing filter to the substantially
central pixel of the pixel block in the event it describes a thin line.
-- [US20090122068](https://www.google.com/patents/US20090122068) Intelligent configurable graphics bandwidth modulator
+- [US20090122068](https://www.google.com/patents/US20090122068) Intelligent configurable graphics bandwidth modulator
Published: 2009-05-14
@@ -149,7 +149,7 @@ greater than a selected threshold, the graphics system is configured to operate
wherein vertex data is rendered immediately upon reception. In the event the rate is less than the
selected threshold, the graphics system is configured to operate in retained mode, wherein vertex
data is stored prior to being rendered. The apparatus and method switches between each of the modes
-on-the-fly in a manner that is transparent to the application.
+on-the-fly in a manner that is transparent to the application.
- [US20090122064](https://www.google.com/patents/US20090122064) Efficient tile-based rasterization