remove lots of trailing spaces

whitespace only changes
author: Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-23 18:03:14 +0200
committer: Wladimir J. van der Laan <laanwj@gmail.com> 2013-08-23 18:03:14 +0200
commit: c51729b9c122e6169103be1a0f0a133ba2bcbef6
tree: f30c3c7002e35b8a6121e5ba00020a001a215162 /doc
parent: d9dcbafc88dd396d1e7e3b84c9ed37b4afdbc1aa
6 files changed, 167 insertions, 167 deletions
diff --git a/doc/2d.md b/doc/2d.md
index e09fe76..8a35c67 100644
--- a/doc/2d.md
+++ b/doc/2d.md
@@ -39,7 +39,7 @@ Filter blits are also available as 2D commands, but I was unable to get this to
 Video rasterizer
 -----------------
 
-The video rasterizer, part of the 2D engine does hardware scaling using an arbitrary 
+The video rasterizer, part of the 2D engine does hardware scaling using an arbitrary
 9-tap separable filter with 5 bit subpixel precision,
 
 It supports the following top-level commands:
@@ -160,7 +160,7 @@ These are the input bit for the ROPs, per ROP type:
     bit 1 source
     bit 2 pattern
     bit "3" foreground/background (`ROP_FG` / `ROP_BG`)
-    
+
 ROP3/4 examples:
 
     10101010  0xaa   destination
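
Note on the ROP3 example row in the hunk above: a ROP code can be read as an 8-entry truth table indexed by the input bits, which is why `0xaa` (binary `10101010`) passes the destination through unchanged. The sketch below is illustrative only. It assumes bit 0 of the index is the destination (the excerpt lists only bits 1 to 3, so this is inferred from the `0xaa` row), and the function name `rop3` and the 32-bit word width are hypothetical choices, not taken from the driver.

```python
def rop3(code, dest, source, pattern, width=32):
    """Apply an 8-bit ROP3 code to whole words, one truth-table row at a time.

    Index bits (bit 0 = destination is an assumption, see the note above):
      bit 0 destination, bit 1 source, bit 2 pattern
    """
    mask = (1 << width) - 1
    result = 0
    for index in range(8):
        if not (code >> index) & 1:
            continue  # this input combination produces a 0 bit
        # Select each input, or its complement, to match this table row.
        d = dest if index & 1 else ~dest
        s = source if index & 2 else ~source
        p = pattern if index & 4 else ~pattern
        result |= d & s & p
    return result & mask
```

With this indexing, `0xcc` selects the source and `0xf0` selects the pattern, matching the usual ROP3 convention.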
diff --git a/doc/blob_extensions.md b/doc/blob_extensions.md
index 6683798..0b32e7f 100644
--- a/doc/blob_extensions.md
+++ b/doc/blob_extensions.md
@@ -8,56 +8,56 @@ VERSION 4.6.9:1478, PLATFORM Android
 
 EGL Extensions:
 
-    EGL_KHR_reusable_sync 
-    EGL_KHR_fence_sync 
-    EGL_KHR_image_base 
-    EGL_KHR_image_pixmap 
-    EGL_KHR_image 
-    EGL_KHR_gl_texture_2D_image 
-    EGL_KHR_gl_texture_cubmap_image 
-    EGL_KHR_gl_renderbuffer_image 
-    EGL_KHR_lock_surface 
-    EGL_ANDROID_image_native_buffer 
-    EGL_ANDROID_swap_rectangle 
+    EGL_KHR_reusable_sync
+    EGL_KHR_fence_sync
+    EGL_KHR_image_base
+    EGL_KHR_image_pixmap
+    EGL_KHR_image
+    EGL_KHR_gl_texture_2D_image
+    EGL_KHR_gl_texture_cubmap_image
+    EGL_KHR_gl_renderbuffer_image
+    EGL_KHR_lock_surface
+    EGL_ANDROID_image_native_buffer
+    EGL_ANDROID_swap_rectangle
     EGL_ANDROID_blob_cache
     EGL_ANDROID_recordable
 
 GLES2 Extensions:
 
-    GL_OES_compressed_ETC1_RGB8_texture 
-    GL_OES_compressed_paletted_texture 
-    GL_OES_EGL_image 
-    GL_OES_depth24 
-    GL_OES_element_index_uint 
-    GL_OES_fbo_render_mipmap 
-    GL_OES_fragment_precision_high 
-    GL_OES_rgb8_rgba8 
-    GL_OES_stencil1 
-    GL_OES_stencil4 
-    GL_OES_texture_npot 
-    GL_OES_vertex_half_float 
-    GL_OES_depth_texture 
-    GL_OES_packed_depth_stencil 
-    GL_OES_standard_derivatives 
-    GL_OES_get_program_binary 
-    GL_EXT_texture_format_BGRA8888 
-    GL_IMG_read_format 
-    GL_EXT_blend_minmax 
-    GL_EXT_read_format_bgra 
-    GL_EXT_multi_draw_arrays 
-    GL_APPLE_texture_format_BGRA8888 
-    GL_APPLE_texture_max_level 
-    GL_ARM_rgba8 
-    GL_EXT_frag_depth 
-    GL_VIV_shader_binary 
-    GL_VIV_timestamp 
-    GL_OES_mapbuffer 
-    GL_OES_EGL_image_external 
-    GL_EXT_texture_compression_dxt1 
-    GL_EXT_texture_compression_s3tc 
-    GL_IMG_texture_compression_pvrtc 
-    GL_EXT_discard_framebuffer 
-    GL_OES_vertex_type_10_10_10_2 
-    GL_EXT_texture_type_2_10_10_10_REV 
+    GL_OES_compressed_ETC1_RGB8_texture
+    GL_OES_compressed_paletted_texture
+    GL_OES_EGL_image
+    GL_OES_depth24
+    GL_OES_element_index_uint
+    GL_OES_fbo_render_mipmap
+    GL_OES_fragment_precision_high
+    GL_OES_rgb8_rgba8
+    GL_OES_stencil1
+    GL_OES_stencil4
+    GL_OES_texture_npot
+    GL_OES_vertex_half_float
+    GL_OES_depth_texture
+    GL_OES_packed_depth_stencil
+    GL_OES_standard_derivatives
+    GL_OES_get_program_binary
+    GL_EXT_texture_format_BGRA8888
+    GL_IMG_read_format
+    GL_EXT_blend_minmax
+    GL_EXT_read_format_bgra
+    GL_EXT_multi_draw_arrays
+    GL_APPLE_texture_format_BGRA8888
+    GL_APPLE_texture_max_level
+    GL_ARM_rgba8
+    GL_EXT_frag_depth
+    GL_VIV_shader_binary
+    GL_VIV_timestamp
+    GL_OES_mapbuffer
+    GL_OES_EGL_image_external
+    GL_EXT_texture_compression_dxt1
+    GL_EXT_texture_compression_s3tc
+    GL_IMG_texture_compression_pvrtc
+    GL_EXT_discard_framebuffer
+    GL_OES_vertex_type_10_10_10_2
+    GL_EXT_texture_type_2_10_10_10_REV
     GL_EXT_texture_filter_anisotropic
 
diff --git a/doc/hardware.md b/doc/hardware.md
index b36ce9a..8c55c39 100644
--- a/doc/hardware.md
+++ b/doc/hardware.md
@@ -8,8 +8,8 @@ Major optional blocks: each of these can be present or not depending on the spec
 - 3D engine
 - VG engine
 
-Some SoCs have multiple GPU cores, and have distributed the blocks mentioned above over the cores (I suppose 
-for extra parallelism and/or granularity in power switching). For example the Marvell Armada 620 has a GC2000 
+Some SoCs have multiple GPU cores, and have distributed the blocks mentioned above over the cores (I suppose
+for extra parallelism and/or granularity in power switching). For example the Marvell Armada 620 has a GC2000
 with only the 3D engine as well as a GC300 with only the 2D engine. Similarly, the Freescale i.mx6 SoC has a
 GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
 
@@ -28,7 +28,7 @@ GC2000 with the 3D engine, a GC320 with 2D engine and a GC355 with VG engine.
 Feature bits
 =================
 
-Which features are supported on a certain Vivante core is not only determined by the model number 
+Which features are supported on a certain Vivante core is not only determined by the model number
 (which AFAIK mainly determines the performance), but specified by a combination of factors:
 
  1) Chip features and minor feature flags
@@ -37,7 +37,7 @@ Which features are supported on a certain Vivante core is not only determined by
  4) Chip revision of the form 0x1234
 
 All of these are available in read-only registers on the hardware. On most cases it suffices to check the feature flags as
-Unlike NV, which parametrizes everything on the model and revision, for GC this is left for bugfixes (even these sometimes 
+Unlike NV, which parametrizes everything on the model and revision, for GC this is left for bugfixes (even these sometimes
 have their own feature bit).
 
 For an overview of the feature bits see the enumerations in `state.xml`.
@@ -104,7 +104,7 @@ Thread walker = Rectangle walker? (seems to have to do with OpenCL)
 [1] http://www.vivantecorp.com/Vivante_GC320_Technical_Reference_Manual_V1.0_A.pdf
 [2] http://2012ftf.ccidnet.com/pdf/0049.pdf
 
-Connections 
+Connections
 -------------
 Connections between the different modules follow the OpenGL pipeline design [3].
 
@@ -123,7 +123,7 @@ See also [1]
 - SE determines rasterization starting point for each primitive, and also culls based on trivial rejection
 - RA performs per-tile, per-subtile, per-quad and per-pixel clipping
 
-  [1] METHOD FOR DISTRIBUTED CLIPPING OUTSIDE OF VIEW VOLUME 
+  [1] METHOD FOR DISTRIBUTED CLIPPING OUTSIDE OF VIEW VOLUME
     http://www.freepatentsonline.com/y2010/0271370.html
   [2] Efficient tile-based rasterization
     http://www.google.com/patents/US8009169
@@ -133,7 +133,7 @@ See also [1]
 Command stream
 -------------------
 
-Commands and data are sent to the GPU through the FE (Front End interface). The 
+Commands and data are sent to the GPU through the FE (Front End interface). The
 command stream of the front-end interface has a specific format described in this section.
 
 Overall format
@@ -154,11 +154,11 @@ Opcodes
     00111 Wait ([15-0] count)
     01000 Link ([15-0] number of bytes, arg address)
     01001 Stall (argument seems same format as state 0380C)
-    01010 Call 
+    01010 Call
     01011 Return
     01101 Chip select
 
-Arguments are always padded to 2 32-bit words. Number of argument words depends on the opcode, and 
+Arguments are always padded to 2 32-bit words. Number of argument words depends on the opcode, and
 sometimes on the first word of the command.
 
 See `cmdstream.xml` for detailed overview of commands and arguments. The most commonly used command is
@@ -184,20 +184,20 @@ The following sequence of states is common:
     GL.SEMAPHORE_TOKEN := FROM=RA,TO=PE
     GL.STALL_TOKEN := FROM=RA,TO=PE
 
-The first state load arms the semaphore, the second one stalls the FROM module until the TO module has raised its semaphore. In 
-this example it stalls the rasterizer until the pixel engine has completed the commands up until now. 
+The first state load arms the semaphore, the second one stalls the FROM module until the TO module has raised its semaphore. In
+this example it stalls the rasterizer until the pixel engine has completed the commands up until now.
 
 The `STALL` command is used to stall the command queue until the semaphore has been received. The stall command has
-one argument that has the same format as the `_TOKEN` states above, except that the FROM module is always the FE. 
+one argument that has the same format as the `_TOKEN` states above, except that the FROM module is always the FE.
 
 Within the 3D engine, not many explicit synchronization points appear to be needed. Some exceptions:
 
-- The blob issues a semaphore and stall from RA to PE when 
+- The blob issues a semaphore and stall from RA to PE when
 
   - Changing depth configuration in PE
   - Sometimes when changing stencil config in PE
 
-- The blob issues a just a semaphore from RA to PE, and a stall before drawing a primitive when 
+- The blob issues a just a semaphore from RA to PE, and a stall before drawing a primitive when
 
   - Tile status address/configuration changes
   - Clearing depth
@@ -210,9 +210,9 @@ XXX (cwabbott) usually, isa's have some sort of texture barrier or sync operatio
 
 Resolve
 -----------
-The resolve module is a copy and fill engine. It can copy blocks of pixels from one GPU address to another, 
-optionally tiling/detiling, converting between pixel formats, or scaling down by a factor of 2. The source and 
-destination address can be the same to fill in tiles that were not touched during the rendering process 
+The resolve module is a copy and fill engine. It can copy blocks of pixels from one GPU address to another,
+optionally tiling/detiling, converting between pixel formats, or scaling down by a factor of 2. The source and
+destination address can be the same to fill in tiles that were not touched during the rendering process
 (according to the Tile Status, see below) with the background color.
 
 The RS and PE (drawing) share one set of pixel pipes. They will never be active concurrently (AFAIK).
@@ -244,18 +244,18 @@ the GPU to hang mysteriously on rendering.
 Shader ISA
 ================
 
-Vivante GPUs have a unified shader ISA, this means that vertex and pixel shaders share the same 
+Vivante GPUs have a unified shader ISA, this means that vertex and pixel shaders share the same
 instruction set. See `isa.xml` and `isa.md` for details of the instructions, this section only provides a high-level overview.
 
-- Each instruction consists of 4 32-bit words. These have a fixed format, with bitfields 
+- Each instruction consists of 4 32-bit words. These have a fixed format, with bitfields
 that have a meaning which differs only very little per opcode. Which of these fields is used (which operands) does differ per opcode.
 
 - Four-component SIMD processor (for most of the instructions)
 
-- Older GPUs have floating point operations only, the newer ones have support for integer operations in the context of OpenCL. 
+- Older GPUs have floating point operations only, the newer ones have support for integer operations in the context of OpenCL.
   The split is around GC1000, though this being Vivante there is likely some feature bit for it.
 
-- Instructions can have up to three source operands (`SRC0_*`, `SRC1_*`, `SRC2_*`), and one destination operand (`DST_`). 
+- Instructions can have up to three source operands (`SRC0_*`, `SRC1_*`, `SRC2_*`), and one destination operand (`DST_`).
    In addition to that, there is a specific operand for texture sampling (`TEX_*`).
 
 - Operands can have these properties:
@@ -269,7 +269,7 @@ that have a meaning which differs only very little per opcode. Which of these fi
 
 - Registers:
   - `N` four-component float temporary registers `tX` (actual number depends on the hardware, maximum seems to be 64 for all
-      vivante GPUs I've encountered up until now), but like with other GPUs using more registers will likely restrict 
+      vivante GPUs I've encountered up until now), but like with other GPUs using more registers will likely restrict
       the available paralellism)
   - `1` four-component address register `a0`
 
@@ -285,16 +285,16 @@ of the framebuffer using the `FBIOGET_VSCREENINFO` and `FBIOGET_FSCREENINFO` ioc
 This physical address can then directly be used as target address for a resolve operation, just like when copying
 to a normal bitmap.
 
-Even though it would save a resolve operation it is not useful to use the physical address of the frame buffer 
+Even though it would save a resolve operation it is not useful to use the physical address of the frame buffer
 directly for rendering, as it only possible to render to tiled and supertiled surfaces, and (afaik) no display controller
 supports scan out from tiled formats.
 
 In many cases there is more framebuffer memory than that which is used for the current screen, which causes larger virtual resolution
-to be returned than the physical resolution. Double-buffering is achieved by changing the y-offset within that virtual frame buffer. 
+to be returned than the physical resolution. Double-buffering is achieved by changing the y-offset within that virtual frame buffer.
 
 Operations
 ========================
-An attempt to figure out which operations can be triggered in the hardware, and what state is used to specify 
+An attempt to figure out which operations can be triggered in the hardware, and what state is used to specify
 their operation.
 
 - RS: Kick off resolve by writing a value with bit 0 set to `RS_KICKER`. State used:
@@ -328,8 +328,8 @@ Programming pecularities
   support. The blob driver uses it for some states (viewport scaling, offset, scissor, ...)
   but not others (uniforms etc).
 
-- It is quite easy to hang the GPU when making a minor programming mistake. 
-  When the GPU is stuck it is possible to submit command buffers, however nothing gets drawn and nothing 
+- It is quite easy to hang the GPU when making a minor programming mistake.
+  When the GPU is stuck it is possible to submit command buffers, however nothing gets drawn and nothing
   ever finishes.
 
   Ways I've already made it crash:
@@ -339,11 +339,11 @@ Programming pecularities
   - Sending 3D commands in the 2D pipe instead of 3D pipe (then using a signal waiting for them to complete)
   - Wrong length of shader
   - Texture sampling without properly setup texture units
-  - `SE_SCISSOR`: setting SCISSOR bottom/right to `(x<<16)|5` instead of `(x<<16)-1` causes crashes for higher resolutions 
+  - `SE_SCISSOR`: setting SCISSOR bottom/right to `(x<<16)|5` instead of `(x<<16)-1` causes crashes for higher resolutions
     such as 1920x1080 on GC600. I don't know why, maybe some buffer or cache overflow. The rockchip vivante driver always uses |5 AFAIK,
     this offset appears to be different per specific chip/revision.
 
-  This may be a (kernel) driver problem. It is possible to reset the GPU from user space with an ioctl, but 
+  This may be a (kernel) driver problem. It is possible to reset the GPU from user space with an ioctl, but
   this usually is not enough to make it un-stuck. It would probably be a better solution to introduce a kernel-based timeout
   instead of relying on userspace to be 100% correct (may exist on v4?).
 
@@ -355,10 +355,10 @@ When the mask bit belonging to a group of state bits is *set* on a state write,
 state bits will be unaffected. If the mask bit is *unset*, the state bits will be written.
 
 This allows setting state per group of bits. For example, it allows setting only
-the destination alpha function (`ALPHA_CONFIG.DST_FUNC_ALPHA`) without affecting the 
+the destination alpha function (`ALPHA_CONFIG.DST_FUNC_ALPHA`) without affecting the
 other bits in that state word.
 
-If masking functionality is not desired, simply keep all the `_MASK` bits at zero and write all 
+If masking functionality is not desired, simply keep all the `_MASK` bits at zero and write all
 bits at once. This is what I used in `etna_pipe`, as I keep track of all state myself.
 
 Texture tiling
@@ -380,7 +380,7 @@ Supertiling
 ![supertile ordering](images/supertile.png)
 
 It appears that the blob always pads render buffers pixel sizes to a multiple of 64, ie, a width of 400 becomes 448 and 800 becomes 832.
-This is because the render buffer is also tiled, albeit differently than the 4x4 tiling format of the textures. 
+This is because the render buffer is also tiled, albeit differently than the 4x4 tiling format of the textures.
 On a fine level, every tile is the same as for normal tiled surfaces:
 
      0  1  2  3
@@ -416,7 +416,7 @@ but is only nested one level, in total this results in 64x64 sized tiles.
 The GPU can render to normal tiled surfaces (such as used by textures) as well as supertiled surfaces. However,
 rendering to supertiled surfaces is likely faster due to better cache locality.
 
-Stride, as used for resolve operations, is for a row of tiles not a row of pixels; 0x1c00 for width 448 (originally 400), 
+Stride, as used for resolve operations, is for a row of tiles not a row of pixels; 0x1c00 for width 448 (originally 400),
 0x3400 for width 832 (originally 800).
 
 Multisampling
@@ -445,7 +445,7 @@ GC600 supports 1, 2, or 4 MSAA samples. Vivante's patent [1] on anti-aliasing ma
 - 256x256 target with 4 samples creates a 512x512 render target and depth buffer
 
         GL.MULTI_SAMPLE_CONFIG := MSAA_SAMPLES=4X,MSAA_ENABLES=0xf,UNK12=0x0,UNK16=0x0
-        RA.MULTISAMPLE_UNK00E04 := 0x0 
+        RA.MULTISAMPLE_UNK00E04 := 0x0
         RA.MULTISAMPLE_UNK00E10[2] := 0xaaa22a22
         RA.CENTROID_TABLE[8] := 0x262a2288
         RA.CENTROID_TABLE[9] := 0x886688a2
@@ -466,7 +466,7 @@ GC600 supports 1, 2, or 4 MSAA samples. Vivante's patent [1] on anti-aliasing ma
 
 Other differences when MSAA is enabled:
 
-- `TS.MEM_CONFIG` is different when MSAA is used (see descriptions for fields `MSAA` and `MSAA_FORMAT`). 
+- `TS.MEM_CONFIG` is different when MSAA is used (see descriptions for fields `MSAA` and `MSAA_FORMAT`).
 - The TS surface belonging to the enlarged in the same way; it is treated as if there simply is a bigger render target.
 - It also looks like the PS gets an extra input/temporary when MSAA is enabled:
 
@@ -494,25 +494,25 @@ When rendering points (PRIMITIVE_TYPE_POINTS) there are some differences:
 - There is an extra varying for `gl_pointCoord` with two components. This varying has
   its components in `GL_VARYING_COMPONENT_USE` set to `POINTCOORD_X` and `POINTCOORD_Y`.
   Its `PA_SHADER_ATTRIBUTES` is set to `0x000002f1`.
-  The VS output associated to this varying in `VS_OUTPUT` is discarded, so can be set 
+  The VS output associated to this varying in `VS_OUTPUT` is discarded, so can be set
   to any output register.
 
-- `rasterizer.point_size_per_vertex` affects number of vs outputs (just like MSAA!). If point 
+- `rasterizer.point_size_per_vertex` affects number of vs outputs (just like MSAA!). If point
   size per vertex is not set, the value in `PA.POINT_SIZE` is used.
 
-- Distinction between sprite coordinate origin `UPPER_LEFT` / `LOWER_LEFT` is implemented by adding 
+- Distinction between sprite coordinate origin `UPPER_LEFT` / `LOWER_LEFT` is implemented by adding
   a 1.0-y instruction when glPointCoord is used. XXX figure out what is the default.
 
 Vertex texture fetch
 --------------------
 
 Vertex samplers live in the same space as fragment samplers. The blob uses a fixed mapping:
-sampler 0..7 are used as fragment samplers and 8..11 are used as vertex samplers. 
+sampler 0..7 are used as fragment samplers and 8..11 are used as vertex samplers.
 
 The shaders themselves refer to the absolute shader number; so tex8 is the first texture unit used in a
 vertex shader. The normal TEX instruction can be used to sample textures from a vertex shader.
 
-Vivante hw has two texture caches that need to be flushed separately, one for fragment shaders 
+Vivante hw has two texture caches that need to be flushed separately, one for fragment shaders
 one for vertex shaders (bits `GL.FLUSH_CACHE.TEXTURE` and `GL.FLUSH_CACHE.TEXTUREVS` respectively).
 
 This solves a problem with running `cubemap_sphere` after `displacement` demo;
@@ -525,16 +525,16 @@ Even adding a PE to FE semaphore afterwards or dummy state loads does not fix th
 
 All texture filtering options are allowed for vertex texture fetch.
 
-XXX maybe figure out if the sampler units are shared between fragment and vertex shaders and thus interchangeable. This is 
+XXX maybe figure out if the sampler units are shared between fragment and vertex shaders and thus interchangeable. This is
   not important for GL/Gallium because it already lives with the assumption that vertex and fragment shaders
   are distinct.
 
 Shader size on GC2000
 ----------------------
 
-The "query chip identity" ioctl on GC2000 reports an instructionCount of 512. Looking at the low-level command 
-stream dumps the device appears to have 0x0E000 - 0x0C000 = 8192 bytes of instruction memory, with 128 bit 
-instructions this indeed maps to 512 instructions. 
+The "query chip identity" ioctl on GC2000 reports an instructionCount of 512. Looking at the low-level command
+stream dumps the device appears to have 0x0E000 - 0x0C000 = 8192 bytes of instruction memory, with 128 bit
+instructions this indeed maps to 512 instructions.
 
 XXX does the VS/PS split at instruction 256 during rendering affect OpenCL? Hopefully not...
 
diff --git a/doc/isa.md b/doc/isa.md
index faa6006..91929e3 100644
--- a/doc/isa.md
+++ b/doc/isa.md
@@ -12,15 +12,15 @@ Basic vertex shader
     uniform mat4 modelviewMatrix;
     uniform mat4 modelviewprojectionMatrix;
     uniform mat3 normalMatrix;
-    
+
     attribute vec4 in_position;
     attribute vec3 in_normal;
     attribute vec4 in_color;
-    
+
     vec4 lightSource = vec4(2.0, 2.0, 20.0, 0.0);
-    
+
     varying vec4 vVaryingColor;
-    
+
     void main()
     {
         gl_Position = modelviewprojectionMatrix * in_position;
@@ -73,16 +73,16 @@ Vertex shader with texture coordinates
     uniform mat4 modelviewMatrix;
     uniform mat4 modelviewprojectionMatrix;
     uniform mat3 normalMatrix;
-   
+
     attribute vec4 in_position;
     attribute vec3 in_normal;
     attribute vec2 in_coord;
-    
+
     vec4 lightSource = vec4(2.0, 2.0, 20.0, 0.0);
-    
+
     varying vec4 vVaryingColor;
     varying vec2 coord;
-    
+
     void main()
     {
         gl_Position = modelviewprojectionMatrix * in_position;
@@ -129,9 +129,9 @@ Empty (passthrough)
 --------------------
 
     precision mediump float;
-    
+
     varying vec4 vVaryingColor;
-    
+
     void main()
     {
         gl_FragColor = vVaryingColor;
@@ -145,12 +145,12 @@ Texture sampling
 ------------------
 
     precision mediump float;
-    
+
     varying vec4 vVaryingColor;
     varying vec2 coord;
-    
+
     uniform sampler2D in_texture;
-    
+
     void main()
     {
         gl_FragColor = 3.0 * vVaryingColor * texture2D(in_texture, coord);
@@ -174,8 +174,8 @@ This adjusts the output position z, based on w. Likely this works around a diffe
 the hardware and the OpenGL standard.
 
 For the gc2000 in the i.mx6 these two instructions are no longer appended (the only difference in the vertex shader for
-smoothed cube between gc600 and gc2000, as generated by the blob driver, is this). 
-  
+smoothed cube between gc600 and gc2000, as generated by the blob driver, is this).
+
 The cutoff point for this is at GC1000. All Vivante GPUs before GC1000 require these two instructions, except
 the GC880.
 
@@ -191,7 +191,7 @@ Misc notes
 
   gl_fragCoord: contains the window-relative coordinates of the current fragment
 
-- In PS, RGROUP 1 register i0.x contains the value of gl_FrontFacing. 
+- In PS, RGROUP 1 register i0.x contains the value of gl_FrontFacing.
   i0.y also contains a non-zero value. i0.zw are zero.
 
     - i1..i127 are simply aliases of i0, at least on my GC600.
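
Note on the z-adjustment rule in the hunk at line 174 of `doc/isa.md`: the text says the two extra instructions are needed on all cores before GC1000 except the GC880. A minimal sketch of that rule follows. It assumes chip models are encoded as hex numbers (GC600 as `0x600`, GC2000 as `0x2000`), and the function name `needs_z_fixup` is hypothetical. As the documentation itself suspects for other splits, a feature bit may be the real criterion.

```python
def needs_z_fixup(chip_model):
    """Return True if the vertex shader needs the two trailing z-adjust instructions.

    Rule as stated in the text: every core before GC1000, except GC880.
    The hex model encoding (0x600 for GC600, ...) is an assumption.
    """
    return chip_model < 0x1000 and chip_model != 0x880
```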
diff --git a/doc/kernel_interface.md b/doc/kernel_interface.md
index 0906099..9d56689 100644
--- a/doc/kernel_interface.md
+++ b/doc/kernel_interface.md
@@ -20,14 +20,14 @@ along with the values on a RK2918 device:
 
 Most important to get right are registerMemSize, registerMemBase and irqLine as these allow the driver to find and
 communicate with the GPU hardware. They depend on the board, not on the GPU. For example, on a CuBox these settings are:
-    
+
     irqLine         42
     registerMemBase 0xf1840000
     contiguousBase  0x08000000
 
 The `dove` (cubox) driver also has a `gpu_frequency` parameter that sets the AXICLK/GCCLK clock at startup,
 if compiled with `ENABLE_GPU_CLOCK_BY_DRIVER`. Some devices may need this, although not the CuBox itself (it is disabled in the makefile).
-In that case your GPU will have an entry `GC` in `/proc/clocks`. 
+In that case your GPU will have an entry `GC` in `/proc/clocks`.
 
 On a Freescale i.MX6 (GK802) device the parameters are:
 
@@ -44,7 +44,7 @@ On a Freescale i.MX6 (GK802) device the parameters are:
     contiguousSize    0x0c000000  (192 MB)
     coreClock         156000000
     signal            48
-    baseAddress       0 
+    baseAddress       0
 
 Diagnostics
 ==============
@@ -173,9 +173,9 @@ At startup, the application connects to galcore device using `open` with the dev
 - `/dev/galcore`, or
 - `/dev/graphics/galcore`
 
-After connecting to the device the entire chunk of contiguous memory, after requesting its address and size, 
+After connecting to the device the entire chunk of contiguous memory, after requesting its address and size,
 is mapped into user space using `mmap`. The kernel will return addresses in this range when the user space driver allocates
-contiguous (unified) memory used for communication with the GPU. 
+contiguous (unified) memory used for communication with the GPU.
 
 Ioctl
 -------
@@ -189,7 +189,7 @@ Communication with the kernel driver happens through ioctl calls on the resultin
 `IOCTL_GCHAL_INTERFACE` is the only one of these that is actually used by the userspace blob. This ioctl is passed one argument
 which is a pointer to the following structure:
 
-    typedef struct 
+    typedef struct
     {
         void *in_buf;
         uint32_t in_buf_size;
@@ -197,28 +197,28 @@ which is a pointer to the following structure:
         uint32_t out_buf_size;
     } vivante_ioctl_data_t;
 
-When used by the blob, `in_buf` and `out_buf` point to the same memory address: a `gcsHAL_INTERFACE` structure that is 
+When used by the blob, `in_buf` and `out_buf` point to the same memory address: a `gcsHAL_INTERFACE` structure that is
 used both for input and output arguments.
 
 Command structure
 ------------------
-The `gcsHAL_INTERFACE` (defined in `gc_hal_driver`) is the structure used by the driver to communicate with the 
-kernel. It can be seen as a communication packet with a command opcode and an union with parameters. 
+The `gcsHAL_INTERFACE` (defined in `gc_hal_driver`) is the structure used by the driver to communicate with the
+kernel. It can be seen as a communication packet with a command opcode and an union with parameters.
 Depending on the `command` a different field of this union is used. The same structure is used both for input and output
-arguments. 
+arguments.
 
-For example, the command `gcvHAL_ALLOCATE_LINEAR_VIDEO_MEMORY` (I will leave off the `gcvHAL_` from now on) 
-uses the fields in `interface->u.AllocateLinearVideoMemory` to pass in the number of bytes to allocate, but 
-also to pass out the number of bytes actually allocated. 
+For example, the command `gcvHAL_ALLOCATE_LINEAR_VIDEO_MEMORY` (I will leave off the `gcvHAL_` from now on)
+uses the fields in `interface->u.AllocateLinearVideoMemory` to pass in the number of bytes to allocate, but
+also to pass out the number of bytes actually allocated.
 
-What is curious about the ioctl protocol is that the communication structures contains fields that are not 
-used by the kernel at all. There is no good reason why these values would need 
+What is curious about the ioctl protocol is that the communication structures contains fields that are not
+used by the kernel at all. There is no good reason why these values would need
 to be present in kernel-facing structures. The line is blurry sometimes.
 It also appears that the structure has been designed with platform-independence in mind, and so some of the fields are not used in the Linux
 drivers such as `status`, `handle`, `pid`.
 
 A possibly worthwhile long-term goal would be to clean up the kernel driver interface. This would break compatibility with
-the Vivante binary blobs, though, so maybe the effort would be better spent building a fully-fledged DRM/DRI 
+the Vivante binary blobs, though, so maybe the effort would be better spent building a fully-fledged DRM/DRI
 infrastructure driver instead.
 
 Allocations
@@ -240,29 +240,29 @@ Memory management happens in the kernel. Two types of memory are allocated:
   Allocated with command `ALLOCATE_LINEAR_VIDEO_MEMORY`
 
   Device memory, from one of the pools (default, local, unified or contiguous system memory)
-  The available pools depend on the hardware; many of the devices have no local memory, and simply 
+  The available pools depend on the hardware; many of the devices have no local memory, and simply
   use a part of system memory as video memory.
 
-`LOCK_VIDEO_MEMORY` locks the video memory both 
+`LOCK_VIDEO_MEMORY` locks the video memory both
 - into the GPU memory space so that it can be used by the GPU
-- into CPU memory so that the application can read/write. 
+- into CPU memory so that the application can read/write.
 It is interesting that these are done by
 the same call.
 
 Command buffers
 -------------------
 
-Like many other GPUs, the primary means of programming the chip is through a command stream 
+Like many other GPUs, the primary means of programming the chip is through a command stream
 interpreted by a DMA engine. This "Front End" takes care of distributing state changes through
-the individual modules of the GPU, kicking off primitive rendering, synchronization, 
+the individual modules of the GPU, kicking off primitive rendering, synchronization,
 and also supports some primitive flow control (branch, call, return).
 
-The command stream is submitted to the kernel by means of command buffers. As most important part these 
-structures contain a pointer to contiguous memory (allocated with command `ALLOCATE_CONTIGUOUS_MEMORY`) 
+The command stream is submitted to the kernel by means of command buffers. As most important part these
+structures contain a pointer to contiguous memory (allocated with command `ALLOCATE_CONTIGUOUS_MEMORY`)
 where the commands start.
 
-Command buffers are built in user space by the driver in a `gcoCMDBUF` structure, then submitted to the kernel with the 
-`COMMIT` command. 
+Command buffers are built in user space by the driver in a `gcoCMDBUF` structure, then submitted to the kernel with the
+`COMMIT` command.
 
 The following structure fields of `gcoCMDBUF` are used by the kernel:
 
@@ -276,16 +276,16 @@ The following structure fields of `gcoCMDBUF` are used by the kernel:
 
 User signal API
 ----------------
-Command `USER_SIGNAL` is used for synchronization signals between the kernel and userspace driver. 
+Command `USER_SIGNAL` is used for synchronization signals between the kernel and userspace driver.
 
 Note: the contents in this section only apply as-is if the kernel was *not* compiled with `USE_NEW_LINUX_SIGNAL`. If this
-flag was set, then a posix real-time signal will be used to notify the process of incoming signals, and the 
+flag was set, then a posix real-time signal will be used to notify the process of incoming signals, and the
 `USER_SIGNAL_WAIT` is a no-op.
 
 The subcommands are:
 
 - `USER_SIGNAL_CREATE` Create a new signal
-  Inputs: 
+  Inputs:
      - manualReset
      If set to gcvTRUE, the `SIGNAL` command must be used with state false to
      reset the signal. If set to gcvFALSE, the signal automatically resets
@@ -306,7 +306,7 @@ The subcommands are:
   Outputs: N/A
 
 - `USER_SIGNAL_WAIT` Wait on the signal (block current thread)
-  Inputs: 
+  Inputs:
     - id     Signal id to wait for
     - wait   Maximum duration to wait (in milliseconds)
   Outputs: N/A
@@ -319,7 +319,7 @@ The subcommands are:
   Inputs: id
   Outputs: N/A
 
-This is used to synchronize GPU and CPU. 
+This is used to synchronize GPU and CPU.
 Signals can be scheduled to be signalled/unsignalled when the GPU finished a certain operation (using an Event).
 They are also used for inter-thread synchronization by the EGL driver.
 
@@ -384,7 +384,7 @@ Context switching
 ==================
 Clients manage their own context, which is passed to COMMIT preemptively in case a context switch is needed.
 
-It appears that context switching is manual. Every process has to keep its own context structure for 
+It appears that context switching is manual. Every process has to keep its own context structure for
 context switching, and pass this to COMMIT. In case this is needed the kernel will then load the state
 from the context buffer.
 
@@ -399,7 +399,7 @@ The state `FE.VERTEX_ELEMENT_CONFIG` is handled specially: write only the elemen
 
 Used fields in `struct _gcoCONTEXT` from the kernel:
 
-- `id` 
+- `id`
     [in] This id is used to determine wether to switch context
     [out] A unique id for the context is generated the first time a COMMIT is done, with context->id==0
 - `hint*` only used when `SECURE_USER` is set
@@ -421,7 +421,7 @@ Profiling
 
 To enable profiling, the kernel most have been built with `VIVANTE_PROFILER` enabled in `gc_hal_options.h` or the appropriate
 `config` file.
-   
+
     USE_PROFILER                        = 1
 
 Vivante also recommends disabling power management features while profiling,
@@ -430,7 +430,7 @@ Vivante also recommends disabling power management features while profiling,
 
 HW profiling registers can be read using the command `READ_ALL_PROFILE_REGISTERS`.
 
-There are also the commands `GET_PROFILE_SETTING` and `SET_PROFILE_SETTING`, which set a flag for 
+There are also the commands `GET_PROFILE_SETTING` and `SET_PROFILE_SETTING`, which set a flag for
 logging to a file (`vprofiler.xml` by default), but this flag doesn't do anything in the kernel driver,
 likely it's meant to be read out by the user space driver.
 
@@ -539,12 +539,12 @@ TODO: input/output arguments.
 * `QUERY_CHIP_IDENTITY`
 
         Query chip identity.
-        
-        Calls: gckHARDWARE_QueryChipIdentity 
+
+        Calls: gckHARDWARE_QueryChipIdentity
 
 * `ALLOCATE_NON_PAGED_MEMORY`
 
-        Allocate non-paged memory. 
+        Allocate non-paged memory.
 
         Calls: gckOS_AllocateNonPagedMemory
 
@@ -558,7 +558,7 @@ TODO: input/output arguments.
 
         Allocate contiguous non-paged memory (used for command buffers).
 
-        Calls: gckOS_AllocateContiguous 
+        Calls: gckOS_AllocateContiguous
 
 * `FREE_CONTIGUOUS_MEMORY`
 
@@ -579,7 +579,7 @@ TODO: input/output arguments.
         Walks all required memory pools to allocate the requested amount of video memory.
 
         gcvPOOL_VIRTUAL: Virtual memory, allocated using gckVIDMEM_ConstructVirtual
-        gcvPOOL_CONTIGUOUS: Contiguous memory, allocated using gckVIDMEM_ConstructVirtual 
+        gcvPOOL_CONTIGUOUS: Contiguous memory, allocated using gckVIDMEM_ConstructVirtual
         gcvPOOL_SYSTEM: Contiguous system memory
         gcvPOOL_LOCAL_INTERNAL: Internal memory
         gcvPOOL_LOCAL_EXTERNAL: External memory
@@ -599,7 +599,7 @@ TODO: input/output arguments.
 
 * `FREE_VIDEO_MEMORY`
 
-        Calls: gckVIDMEM_Free 
+        Calls: gckVIDMEM_Free
 
 * `MAP_MEMORY`
 
@@ -610,7 +610,7 @@ TODO: input/output arguments.
 * `UNMAP_MEMORY`
 
         Unmap memory mapped with `MAP_MEMORY`.
-        
+
         Calls: gckKERNEL_UnmapMemory (gckOS_UnmapMemory)
 
 * `MAP_USER_MEMORY`
@@ -630,20 +630,20 @@ TODO: input/output arguments.
 
         Surface lock.
 
-        Calls: gckVIDMEM_Lock 
+        Calls: gckVIDMEM_Lock
 
 * `UNLOCK_VIDEO_MEMORY`
-        
+
         Surface unlock.
-        
-        Calls: gckVIDMEM_Unlock 
+
+        Calls: gckVIDMEM_Unlock
 
 * `EVENT_COMMIT`
-    
+
         Commit an event queue.
 
         Calls: gckEVENT_Commit
-    
+
 * `USER_SIGNAL`
 
         Dispatch depends on the user signal subcommands (refer to section `User signal API`).
@@ -662,7 +662,7 @@ TODO: input/output arguments.
 * `COMMIT`
 
         Commit a command and context buffer.
-        
+
         Calls: gckCOMMAND_Commit
 
 * `STALL`
@@ -680,7 +680,7 @@ TODO: input/output arguments.
         Calls: gckOS_ReadRegister
 
 * `WRITE_REGISTER`
-        
+
         Write a GPU register. Only enabled if kernel compiled with `gcdREGISTER_ACCESS_FROM_USER` (which
         is obviously an security risk, as it allows user-space to read and write arbitrary registers).
 
@@ -704,7 +704,7 @@ TODO: input/output arguments.
         Calls: gckHARDWARE_QueryProfileRegisters
 
 * `PROFILE_REGISTERS_2D`
-        
+
         Read all 2D profile registers. Only available if kernel compiled with `VIVANTE_PROFILER` enabled.
 
         Calls: gckHARDWARE_ProfileEngine2D
@@ -763,7 +763,7 @@ TODO: input/output arguments.
         Flush or invalidate the cache.
         NOTE: unimplemented on Linux, and also apparently not called by the blob on Linux.
 
-        In: 
+        In:
           invalidate: If FALSE, flush the cache (the GPU is going to need the data)
                       if TRUE, flush and invalidate the cache (if the GPU is going to modify the data)
           process: Process handle Logical belongs to or gcvNULL if Logical belongs to the kernel.
@@ -776,7 +776,7 @@ TODO: input/output arguments.
 
         Broadcast GPU stuck.
 
-        Calls: gckOS_Broadcast 
+        Calls: gckOS_Broadcast
 
 Crash recovery
 ================
diff --git a/doc/patents.md b/doc/patents.md
index 47c0280..b75a82f 100644
--- a/doc/patents.md
+++ b/doc/patents.md
@@ -21,7 +21,7 @@ optionally blended with another data value and written to a memory device. Regis
 of filtering with the first coefficients. The block of data may be read from a location including a
 source coordinate. The final result of filtering may be written to a destination coordinate obtained
 by rotating and/or mirroring the source coordinate. The orientation of arrays filtered using the
-first coefficients varies according to a rotation mode. 
+first coefficients varies according to a rotation mode.
 
 - [US20130091189](https://www.google.com/patents/US20130091189) Single datapath floating point implementation of RCP, SQRT, EXP and LOG functions
 and a low latency RCP based on the same techniques
@@ -33,7 +33,7 @@ for performing a polynomial approximation (e.g. a quadratic polynomial approxima
 and one or more data tables corresponding to at least one of the RCP, SQRT, EXP or LOG functions
 operable to be coupled to the single pipeline according to one or more opcodes; wherein the single
 pipeline is operable for computing at least one of RCP, SQRT, EXP or LOG functions according to the
-one or more opcodes. 
+one or more opcodes.
 
 - [US20130002651](https://www.google.com/patents/US20130002651) Apparatus and Method For Texture Level Of Detail Computation
 
@@ -71,7 +71,7 @@ detect an object edge within the image. An edge style detector is configured to
 edge end and a second edge end. The edge style detector also identifies an edge style associated
 with the detected edge based on the first edge end and the second edge end. The system also includes
 a restoration module configured to identify pixel data associated with the detected edge and a
-blending module configured to blend the pixel data associated with the detected edge. 
+blending module configured to blend the pixel data associated with the detected edge.
 
 - [US20110234609](https://www.google.com/patents/US20110234609) Hierarchical tile-based rasterization algorithm
 
@@ -98,7 +98,7 @@ as X, Y and far Z clipping.  In one embodiment, the SE module performs clipping
 a initial point of rasterization. In one embodiment, the RA module performs clipping by way of
 conducting the rendering step of the rasterization process. This approach distributes the complexity
 in the graphics processing pipeline and makes the design simpler and faster, therefore design
-complexity, cost and performance may all be improved in hardware implementation. 
+complexity, cost and performance may all be improved in hardware implementation.
 
 - [US20100131786](https://www.google.com/patents/US20100131786) Single Chip 3D and 2D Graphics Processor with Embedded Memory and Multiple Levels of
 Power Controls
@@ -124,7 +124,7 @@ and to apply a de-ringing filter to a pixel within a pixel subset of the pixel l
 determination that the pixel is not an edge pixel. The determination that the pixel is not an edge
 pixel is based on the identified maximum pixel jump.
 
-- [US20090122076](https://www.google.com/patents/US20090122076) Thin-line detection apparatus and method 
+- [US20090122076](https://www.google.com/patents/US20090122076) Thin-line detection apparatus and method
 
 Published: 2009-05-14
 
@@ -138,7 +138,7 @@ of the pre-determined thin lines, the pixel block may be deemed to describe a th
 apparatus and method may preclude application of an anti-aliasing filter to the substantially
 central pixel of the pixel block in the event it describes a thin line.
 
-- [US20090122068](https://www.google.com/patents/US20090122068) Intelligent configurable graphics bandwidth modulator 
+- [US20090122068](https://www.google.com/patents/US20090122068) Intelligent configurable graphics bandwidth modulator
 
 Published: 2009-05-14
 
@@ -149,7 +149,7 @@ greater than a selected threshold, the graphics system is configured to operate
 wherein vertex data is rendered immediately upon reception. In the event the rate is less than the
 selected threshold, the graphics system is configured to operate in retained mode, wherein vertex
 data is stored prior to being rendered. The apparatus and method switches between each of the modes
-on-the-fly in a manner that is transparent to the application. 
+on-the-fly in a manner that is transparent to the application.
 
 - [US20090122064](https://www.google.com/patents/US20090122064) Efficient tile-based rasterization