determines how pixel blending combines source and destination

DXT1 compressed textures are stored untiled.

DXT2/3 compressed textures are stored untiled.

DXT4/DXT5 compressed textures are stored untiled.

ETC compressed textures are stored untiled.

Offset into texture memory for cube map face

The vertex shader to be used for 3D rendering is configured here.

index of last instruction + 1

Each bitfield (up to 16 in total) contains a temporary register number that is used as output at the end of the shader for that varying.

Each bitfield (up to 16 in total) contains the number of a temporary register that is assigned the input for that attribute at the beginning of shader execution.

The thread walker drives shaders in a predefined grid for GPGPU computing (OpenCL). These states are not used for normal rendering.

At the beginning of a thread the first temporary registers will contain the local and global ids. This value determines which ids will be present, and in which order.

Write some value to this register to kick off the thread walker.

Primitive assembly assembles primitives (tris, quads, lines, points etc) from vertices for 3D rendering. Viewport scaling, line width and point size are configured here.

0x11 for OpenGL, 0x00 for D3D9

These can be set either per group of bits, or all at once, by using masking flags. Each group of state flags has a masking flag that prevents overwriting the flags in that group.

Adds an extra output to VS (at the end).

Flags word per shader attribute. I suspect that these determine the type of interpolation (color, perspective, linear, ...).

The setup engine takes care of scissor, clipping, and depth scale.

Always disabled for OpenGL

Configuration for the rasterizer. This mainly controls multisampling.

The Pixel (Fragment) shader to use is configured here.

The Pixel Engine takes care of writing pixels to the framebuffer, doing blending, depth testing and alpha testing if needed.
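The masking-flag scheme mentioned above for the primitive assembly state can be sketched as follows. This is only an illustration of the stated rule (a set masking flag prevents a write from overwriting its group), not the hardware's real bit layout: the group names FOO and BAR, their bit positions, and the helper `apply_masked_write` are all hypothetical.

```python
from typing import NamedTuple


class Group(NamedTuple):
    """One group of state bits plus its masking flag (hypothetical layout)."""
    field_mask: int  # bits that hold the state value
    mask_flag: int   # single bit; when set in a write, the group is left untouched


# Hypothetical layout, chosen only for this sketch:
# FOO lives in bits 0-1 with FOO_MASK in bit 3,
# BAR lives in bits 4-5 with BAR_MASK in bit 7.
FOO = Group(field_mask=0x03, mask_flag=0x08)
BAR = Group(field_mask=0x30, mask_flag=0x80)
GROUPS = (FOO, BAR)


def apply_masked_write(current: int, written: int) -> int:
    """Return the state bits after writing `written` over `current`."""
    result = current
    for group in GROUPS:
        if written & group.mask_flag:
            continue  # masking flag set: keep the existing bits of this group
        # masking flag clear: replace this group's bits with the written ones
        result = (result & ~group.field_mask) | (written & group.field_mask)
    return result
```

With this model, writing a word that has only BAR's masking flag set updates FOO and leaves BAR alone, while writing a word with no masking flags set replaces all groups at once.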
Some flags can be set either per group of bits, or all at once, by using masking flags. Each group of state flags has a masking flag that, when set, prevents overwriting the flags in that group. These will be called FOO_MASK if the state to be masked is called FOO.

Warning: confusing terminology. WRITE_MASK is the stencil write mask; its state bits can be masked with WRITE_MASK_MASK.

REF_BACK is in register STENCIL_CONFIG_EXT.

XXX there appears to be no specific MASK_BACK, this state is used for both front and back?

Hardware composer. This functionality is present on some GCxxxx chips and allows for blending surfaces together with Porter-Duff composition methods, to accelerate the likes of Surfaceflinger (Android).

To my current understanding, RESOLVE is a multifunctional copy/fill engine that can copy blocks of pixels from one place in memory to another, actually clearing tiles that are marked as cleared in the process. Other capabilities are:

- Conversion between pixel formats
- Downsampling (2x horizontal or horizontal and vertical)
- Fill with constant value
- Partial fill (only clear part of the channels)
- Tiling / untiling, for normal tiled and supertiled surfaces
- Swap blue and red channels, flip image in Y
- Endian swapping
- Fill tiles that are marked as 'cleared' in the Tile Status

The following render target tilings are possible:

A B C
0 x x   Linear
1 0 0   Tiled (4x4, like textures)
1 1 0   Supertiled (64x64)
1 0 1   Multi-tiled (4x4, like textures)
1 1 1   Multi-supertiled (64x64)

A) tiled: SOURCE_TILED, DEST_TILED in CONFIG word
B) supertiled: TILING bit in SOURCE_STRIDE / DEST_STRIDE
C) multi: MULTI bit in SOURCE_STRIDE / DEST_STRIDE

GC2000 and other GPUs with multiple pixel pipes have additional multi-pipe tiling formats, which are used by the PE when rendering as an extra form of parallelism. When multitiling, the image is divided up vertically into separate units with their own starting address.
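The A/B/C tiling table above can be read as a small lookup. The sketch below encodes exactly the five rows of that table; the function name `tiling_mode` and its boolean parameters are hypothetical and stand in for the tiled bit in the CONFIG word and the TILING and MULTI bits in the stride registers.

```python
# Rows of the tiling table for A (tiled) = 1, keyed by (B supertiled, C multi).
_TILED_MODES = {
    (False, False): "Tiled (4x4, like textures)",
    (True, False): "Supertiled (64x64)",
    (False, True): "Multi-tiled (4x4, like textures)",
    (True, True): "Multi-supertiled (64x64)",
}


def tiling_mode(tiled: bool, supertiled: bool, multi: bool) -> str:
    """Map the three tiling bits (A, B, C in the table) to a layout name."""
    if not tiled:
        # A = 0: the table marks B and C as don't-care, the layout is linear.
        return "Linear"
    return _TILED_MODES[(supertiled, multi)]
```

For example, `tiling_mode(True, True, False)` gives the supertiled layout, and any call with `tiled=False` gives the linear layout regardless of the other two bits.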
Write some value to this register to kick off the resolver.

For clear operations, this specifies the format that CLEAR_CONTROL.BITS is in.

When downsampling, the source and destination size will be different. In this case, the WINDOW_SIZE will be the (unscaled) source size.

Four groups (per tile) of four bits (per channel) that affect which channels of which tiles are cleared. In mode 'enabled' only the lower four bits are used, in 'enabled2' all four groups are used. Components that are disabled are not written at all by the clear logic (they keep their old value); they are also not copied from the source.

Depending on the clear mode, the RS does different things:

- If disabled, it is a copy engine
- If enabled, it fills the target area with FILL_VALUE(0) and disregards the source
- If enabled2, it fills the target area with the four FILL_VALUEs (results in vertical stripes of width 4, at least with supertiled target) and disregards the source.

The tile status block contains information about the tiles to be resolved. It is used by the PE (to read/update tile status) as well as the RS (to read tile status for source).

Tile status config. Setting this value to 0 disables tile status and makes the resolve work like a normal copy engine.

The main bits for switching MSAA in rendering are in register #0x03818; these bits in the TS memory configuration appear to affect the writing of tiles in a minor way.

Hierarchical Z allocates multiple depth buffers for one surface, which have their own TS.

The YUV tiler can combine planar YUV formats to RGB or non-planar YUV formats.

To disable mipmapping, set this to NONE and set the minimum and maximum LOD level to 0..0.

Logarithm of size of anisotropic filter, in 3.5 format.

This fixed-point value is the maximum LOD level. It can be a fractional value, up to the number of defined mipmaps.

This fixed-point value is the minimum LOD level. It can be a fractional value.
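The fixed-point fields above can be illustrated with a small encoder. This is a sketch under assumptions: it reads "3.5 format" as 3 integer bits and 5 fractional bits, unsigned, and it rounds to nearest and clamps out-of-range input, neither of which the text specifies. The text also does not give the bit widths of the minimum and maximum LOD fields, so those must be passed explicitly if the helper is reused for them. The names `to_fixed_point` and `from_fixed_point` are hypothetical.

```python
def to_fixed_point(value: float, int_bits: int = 3, frac_bits: int = 5) -> int:
    """Encode a non-negative value as unsigned fixed point.

    Defaults correspond to a 3.5 layout (assumed: 3 integer, 5 fractional bits).
    Values outside the representable range are clamped (an assumption).
    """
    max_raw = (1 << (int_bits + frac_bits)) - 1
    raw = round(value * (1 << frac_bits))  # scale by 2**frac_bits, round to nearest
    return max(0, min(raw, max_raw))


def from_fixed_point(raw: int, frac_bits: int = 5) -> float:
    """Decode an unsigned fixed-point field back to a float."""
    return raw / (1 << frac_bits)
```

For instance, `to_fixed_point(2.5)` scales 2.5 by 32 and yields the raw field value 80, and `from_fixed_point(80)` recovers 2.5.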
This fixed-point value is added to the computed LOD level when BIAS_ENABLE is on. It appears that it can also be negative by using two's complement arithmetic.

Texture sampling, filtering, LOD, etc.

8 fragment texture samplers, 4 vertex texture samplers

Extra texture states for newer hardware. These exist if chipMinorFeatures2 bit 11 is set.

16 fragment texture samplers, 16 vertex texture samplers

Shader instruction memory on new hardware that supports more than 256, or more than 1024 shader instructions (different areas are used based on these cases).