DMA engine. This is the frontend from the CPU to the GPU. It parses the command stream, loads states, and loads vertex streams.

Wraps around (1=1,2=2,3=3,4=0).

The address must be 64-bit aligned and is always physical. This register cannot be read.

This is useful for debugging a stuck command queue.

The GPU's DMA engine fetches 64-bit words at once. This register reads the lower 32 bits (word 0) of the last-fetched DMA word.

The GPU's DMA engine fetches 64-bit words at once. This register reads the upper 32 bits (word 1) of the last-fetched DMA word.

Global device control states. This configures which pipe to use (2D or 3D), when to send events, when to wait on semaphores, and the API mode (OGL or D3D). Make sure that the PE is idle before switching pipes.

Warning: setting the `TEXTUREVS` bit seems to result in crashes when rendering directly afterwards. Even adding a PE-to-FE semaphore afterwards, or dummy state loads, does not fix this.

Number of components for all varyings together, rounded to a multiple of 2.

Number of components per varying (PS). 2 bits per varying component, 16 components per 32-bit word.

The current context pointer is stored here by the v4 kernel driver if debugging is enabled. This register is likely not used by the hardware, only by debugging software.

Dummy state write, sometimes used for inserting padding or small delays into the command stream.
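
The bit-level arithmetic in the descriptions above can be sketched in a few lines of Python. This is only an illustration: all function names are hypothetical, nothing here touches real hardware, and the field order inside a packed word (component 0 in the least significant bits) is an assumption that the descriptions do not confirm. Rounding "to a multiple of 2" is likewise assumed to mean rounding up.

```python
from typing import List, Tuple


def is_dma_address_aligned(addr: int) -> bool:
    """Return True if addr is 64-bit (8-byte) aligned."""
    return addr % 8 == 0


def split_dma_word(word64: int) -> Tuple[int, int]:
    """Split a 64-bit DMA word into (word 0, word 1).

    Word 0 is the lower 32 bits, word 1 the upper 32 bits.
    """
    return word64 & 0xFFFFFFFF, (word64 >> 32) & 0xFFFFFFFF


def encode_wrapping_count(n: int) -> int:
    """Encode a count of 1..4 in a 2-bit field: 1=1, 2=2, 3=3, 4=0."""
    if not 1 <= n <= 4:
        raise ValueError("count must be between 1 and 4")
    return n & 3


def round_up_to_even(n: int) -> int:
    """Round a component count up to a multiple of 2 (assumed: round up)."""
    return (n + 1) & ~1


def pack_varying_components(codes: List[int]) -> List[int]:
    """Pack 2-bit component codes, 16 per 32-bit word.

    Assumption: component i occupies bits [2*(i % 16), 2*(i % 16) + 1]
    of word i // 16, i.e. the first component sits in the lowest bits.
    """
    words: List[int] = []
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError("each component code must fit in 2 bits")
        if i % 16 == 0:
            words.append(0)  # start a new 32-bit word
        words[-1] |= code << (2 * (i % 16))
    return words
```

For instance, `split_dma_word(0x1122334455667788)` returns `(0x55667788, 0x11223344)`, and packing 17 components produces two words because only 16 fit into the first one.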