While redesigning their graphics core, ATI also decided a new take on the memory controller was in order. Their goal was a much more efficient controller that could handle tomorrow's memory frequencies and future memory types, all while increasing memory bandwidth with the technologies available today.
To improve memory bandwidth, ATI decided to forgo the usual approach of simply widening the data path and raising its clock speed. Instead, their new Ring Bus system allows better cache management and hides latency, and they were also able to improve their compression techniques.
An efficient memory controller is paramount. To understand the Ring Bus and the advantages it provides, you must first understand the nature of a video card's memory system. The memory controller services multiple clients trying to read and write video memory: the GPU, the PCI Express bus, shader processor units, texture units, and so on. What is needed is a very smart way of routing data to and from all of them to maximize efficiency. That is the memory controller's job: it assigns tasks and manages the data moving to and from the memory modules. As the number of clients requesting memory access grows, along with the amount of memory itself, a more efficient means of routing all that data is needed.
The X1000 series Ring Bus is made up of two internal 256-bit rings that operate in opposite directions, so you may see it referred to as a 512-bit internal memory bus; the link to the actual memory modules, however, remains 256-bit. The bus runs along the outer edge of the chip, which simplifies wire routing and provides cleaner signals, in turn allowing higher clock speeds. Along the two rings are four ring stops, one for each pair of memory channels. Data travels from ring stop to ring stop each clock cycle until it reaches its destination.
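A toy model shows why two counter-rotating rings matter: the controller can always send data around the ring in whichever direction is shorter. The stop numbering and one-hop-per-clock timing below are illustrative assumptions, not ATI's published specification.

```python
# Toy model of a ring bus: four stops connected by two rings running in
# opposite directions. Stop numbering is an illustrative assumption.

NUM_STOPS = 4

def hops_to_destination(src: int, dst: int) -> int:
    """Clock cycles (hops) for data to travel between two ring stops.

    With two counter-rotating rings, the controller can always choose
    the shorter direction around the ring.
    """
    clockwise = (dst - src) % NUM_STOPS
    counterclockwise = (src - dst) % NUM_STOPS
    return min(clockwise, counterclockwise)

# With four stops and two directions, no transfer needs more than two hops.
assert max(hops_to_destination(s, d)
           for s in range(NUM_STOPS) for d in range(NUM_STOPS)) == 2
```

With a single one-directional ring, the worst case would be three hops; the second ring halves that.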
The memory controller itself is fully programmable, which means ATI can tweak the drivers to improve memory efficiency for a particular application or game.
ATI has also divided the X1000 series' memory interface into eight 32-bit channels, where the previous-generation X850 series and below used four 64-bit channels. On those older parts, if only 32 bits of data needed to be transmitted to memory, half of a 64-bit channel would sit idle for that transfer. By making the channels smaller, less bandwidth is wasted when data chunks are small.
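The waste from channel granularity is simple arithmetic, sketched below for the two channel widths discussed above:

```python
def wasted_bits(request_bits: int, channel_width: int) -> int:
    """Bits of a channel left idle when a request does not fill it.

    Models a single transfer cycle: any request smaller than (or not a
    multiple of) the channel width leaves the remainder of the channel
    unused for that cycle.
    """
    remainder = request_bits % channel_width
    return 0 if remainder == 0 else channel_width - remainder

# A 32-bit request on the X850's 64-bit channels idles half the channel...
assert wasted_bits(32, 64) == 32
# ...but exactly fills one of the X1000 series' 32-bit channels.
assert wasted_bits(32, 32) == 0
```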
Along with the new Ring Bus, ATI has also improved the latency and cache design of the X1000 series. Latency measures the time that passes between when a request is made and when it is actually served. The X1000 series now uses fully associative caches for its texture, color, and depth/stencil buffers. The advantage is that a cache entry no longer has to be mapped to a specific block of graphics memory; any entry can hold data from any address, which avoids cache contention stalls and increases hit rates.
The arbitration logic in the Radeon X1800 Memory Controller is much more sophisticated than in previous GPU designs. It uses a feedback system that collects a variety of information from each memory client when it makes a read or write request. This data is run through an algorithm that calculates a priority value to assign to each request. The arbitration logic also tracks how successful the priority value calculations are over time by monitoring how long each client is kept waiting for data, and identifying cases where delays cause performance bottlenecks to develop. This information is then fed back into future priority calculations using weighting parameters to improve their accuracy.
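ATI has not published the actual algorithm, but the idea of priority values with feedback weighting can be sketched as follows. The client fields, the priority formula, and the stall threshold are all hypothetical illustrations of the general scheme, not the real implementation.

```python
# Hypothetical sketch of feedback-driven arbitration: each request's
# priority combines the client's static urgency, how long it has been
# waiting, and a per-client weight that the feedback loop raises when
# that client is observed to stall. All names and formulas here are
# illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Client:
    name: str
    urgency: float       # static importance of this client's traffic
    weight: float = 1.0  # feedback-adjusted multiplier
    waiting: int = 0     # cycles this client's request has been pending

def priority(c: Client) -> float:
    return c.urgency + c.weight * c.waiting

def arbitrate(clients: list[Client]) -> Client:
    """Grant the pending request with the highest computed priority."""
    return max(clients, key=priority)

def feedback(c: Client, stall_threshold: int = 8) -> None:
    """If a client waited long enough to become a bottleneck, boost the
    weight used in its future priority calculations."""
    if c.waiting > stall_threshold:
        c.weight *= 1.25
```

The key point is the closed loop: observed wait times flow back into the weights, so the arbiter's priority estimates improve over time rather than staying fixed.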
In previous generations, ATI used a direct-mapped cache system, meaning each cache entry was mapped to a specific, pre-determined block of graphics memory. This design is prone to stalls when a memory client attempts to write to a different memory address that maps to the same cache entry.
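The contrast between the two cache organizations can be seen in a toy model. The cache sizes, the address-to-entry mapping, and the FIFO eviction policy below are illustrative assumptions; only the structural difference (fixed mapping versus any-entry placement) reflects the designs described above.

```python
# Toy comparison: a direct-mapped cache ties each address to one fixed
# entry, while a fully associative cache lets any entry hold any address.

def direct_mapped_conflicts(addresses, num_entries):
    """Count evictions caused purely by addresses sharing a fixed entry."""
    cache, conflicts = {}, 0
    for addr in addresses:
        entry = addr % num_entries            # fixed, pre-determined slot
        if entry in cache and cache[entry] != addr:
            conflicts += 1                    # contention: evict and refill
        cache[entry] = addr
    return conflicts

def fully_associative_conflicts(addresses, num_entries):
    """Evictions occur only when the cache is genuinely full."""
    cache, conflicts = [], 0
    for addr in addresses:
        if addr in cache:
            continue                          # hit: any entry may match
        if len(cache) == num_entries:
            cache.pop(0)                      # FIFO eviction (assumption)
            conflicts += 1
        cache.append(addr)
    return conflicts

# Two addresses four apart thrash a 4-entry direct-mapped cache on every
# alternating access, while a fully associative cache holds both easily.
accesses = [0, 4, 0, 4, 0, 4]
assert direct_mapped_conflicts(accesses, 4) == 5
assert fully_associative_conflicts(accesses, 4) == 0
```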
ATI has also made improvements to Hyper Z, their hierarchical Z-buffer technology. The improvements help reduce the number of pixels that actually need to be rendered by removing more hidden pixels; ATI claims up to 60% more hidden pixels are removed compared to the X850.
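The core trick of a hierarchical Z buffer is rejecting whole screen tiles at once instead of testing every pixel. A minimal sketch of that rejection test, assuming the common convention that smaller depth values are closer and the depth test is "less than" (the tile granularity and conventions are assumptions, not X1000 specifics):

```python
def can_reject_tile(tile_max_depth: float, primitive_min_depth: float) -> bool:
    """Tile-level hidden-surface rejection for a 'less-than' depth test.

    Each tile stores the farthest (maximum) depth already written in it.
    If an incoming primitive's nearest point is still at or behind that
    value, no pixel of the primitive can pass the depth test anywhere in
    the tile, so it is rejected without any per-pixel work.
    """
    return primitive_min_depth >= tile_max_depth

# Everything in this tile is at depth 0.3 or nearer; geometry no closer
# than 0.5 is fully hidden there and never reaches per-pixel rendering.
assert can_reject_tile(0.3, 0.5) is True
# Geometry that might be closer must still be tested per pixel.
assert can_reject_tile(0.3, 0.2) is False
```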
Along with removing unseen pixels, compression is another area ATI has improved. DirectX Texture Compression (DXTC) is a common scheme used for textures, but there are other kinds of data it does not handle well. The X800 series introduced 3Dc, a compression scheme suited to data such as normal maps. The X1000 series improves on that support with 3Dc+, which adds 2:1 compression of single-component 8-bit integer textures. So 3Dc now supports 4:1 compression for normal maps, 2:1 compression for textures consisting of two 8-bit integer components, and 2:1 compression for single-component 8-bit integer textures. Note that a game's content developers must specifically code for this feature to take advantage of it.
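The quoted ratios follow from per-pixel storage arithmetic. The bit counts below are back-of-the-envelope assumptions consistent with those ratios (3Dc stores two components at roughly 4 bits each and reconstructs a normal map's third component in the shader), not a format specification:

```python
def compression_ratio(raw_bits_per_pixel: int, compressed_bits_per_pixel: int) -> float:
    """Ratio of uncompressed to compressed storage per pixel."""
    return raw_bits_per_pixel / compressed_bits_per_pixel

# A normal map stored as a 32-bit texture compresses to ~8 bits per pixel
# with 3Dc (two 4-bit components kept, the third reconstructed): 4:1.
assert compression_ratio(32, 8) == 4.0

# A two-component 8-bit texture (16 bits per pixel) at ~8 bits: 2:1.
assert compression_ratio(16, 8) == 2.0

# 3Dc+ takes a single 8-bit component down to ~4 bits per pixel: 2:1.
assert compression_ratio(8, 4) == 2.0
```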