Radeon X1000 series

ATI's "R520" core is the foundation for a line of DirectX 9.0c 3D accelerator video cards, the Radeon X1000 series. It is ATI's first major architectural overhaul since the "R300" core and is highly optimized for Shader Model 3.0. The Radeon X1000 series using the core was introduced on October 5, 2005.

ATI Radeon X1900 advertisement, using the R580 core

Features

X1800 chip

The R520 core architecture is referred to by ATI as an "Ultra Threaded Dispatch Processor". The name reflects ATI's plan to boost the efficiency of the core rather than rely on a brute-force increase in the number of processing units. A central pixel shader "dispatch unit" breaks shaders down into threads (batches) of 16 pixels (4x4) and can track and distribute up to 128 threads per pixel "quad" (4 pipelines each). When one of the shader quads becomes idle, either because it has completed a task or because it is waiting for other data, the dispatch engine assigns the quad another task to do in the meantime. In theory, the overall result is greater utilization of the shader units. To support such a large number of threads per "quad", ATI created a very large general-purpose register array that is capable of multiple concurrent reads and writes and has a high-bandwidth connection to each shader array. This provides the temporary storage necessary to keep the pipelines fed by having work available as much as possible. On chips such as RV530 and R580, where the number of shader units per pipeline triples, the efficiency of pixel shading drops off slightly because these shaders still have the same level of threading resources as the less endowed RV515 and R520.
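The latency-hiding idea behind the dispatch unit can be illustrated with a toy model. The sketch below is not ATI's scheduler: the function name, the cycle counts and the "run the first ready thread" policy are all assumptions chosen for illustration. It only shows why having several threads available lets a quad stay busy while other threads wait for data.

```python
def quad_utilization(num_threads, alu_cycles, fetch_latency, rounds):
    """Toy model of one shader quad; returns the fraction of cycles spent busy.

    Every thread repeats `rounds` times: `alu_cycles` of shader work on the
    quad, then a wait of `fetch_latency` cycles for data (hypothetical numbers).
    """
    ready_at = [0] * num_threads        # cycle at which each thread can run again
    remaining = [rounds] * num_threads  # work rounds each thread still has to do
    cycle = 0
    busy = 0
    while any(r > 0 for r in remaining):
        unfinished = [t for t in range(num_threads) if remaining[t] > 0]
        runnable = [t for t in unfinished if ready_at[t] <= cycle]
        if not runnable:
            # Every unfinished thread is waiting for data: the quad idles.
            cycle = min(ready_at[t] for t in unfinished)
            continue
        t = runnable[0]                 # dispatch the first ready thread
        cycle += alu_cycles
        busy += alu_cycles
        remaining[t] -= 1
        ready_at[t] = cycle + fetch_latency
    return busy / cycle


print(quad_utilization(1, 4, 12, 10))  # one thread: the quad is mostly idle
print(quad_utilization(4, 4, 12, 10))  # four threads: the waits are overlapped
```

With these made-up numbers a single thread leaves the quad idle for most of the run, while four threads are enough to cover the data wait entirely. Real shader workloads are far less regular, which is presumably one reason for tracking as many as 128 threads per quad.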

The next major change to the core is its memory bus. R420 and R300 had nearly identical memory controller designs, with the former being a bug-fixed revision designed for higher clock speeds. R520, however, differs with its central controller (arbiter) that connects to the "memory clients". Around the chip there are two 256-bit ring buses running at the same speed as the DRAM chips, but in opposite directions to reduce latency. Along these ring buses are 4 "stop" points where data exits the ring and goes into or out of the memory chips. There is also a fifth, significantly less complex stop designed for the PCI Express interface and video input. This design makes memory accesses far quicker through lower latency, by virtue of the shorter distance the signals need to travel through the GPU and by increasing the number of banks per DRAM. In essence, the chip can spread out memory requests faster and more directly to the RAM chips. ATI claims a 40% improvement in efficiency over older designs. Again, the smaller cores such as RV515 and RV530 receive cutbacks due to their smaller, less costly designs. RV530, for example, has two internal 128-bit buses instead. This generation supports all recent memory types, including GDDR4.
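Why two counter-rotating rings reduce latency can be seen from a simple hop count. The following is a minimal sketch under a simplifying assumption: data always takes the shorter way round, and only the 4 memory stops are modelled. The function name `ring_hops` is hypothetical, and the actual routing logic of the R520 controller is not described here.

```python
def ring_hops(src, dst, stops, bidirectional=True):
    """Number of stop-to-stop hops from `src` to `dst` on a ring of `stops` stops."""
    forward = (dst - src) % stops       # hops when travelling one way round
    if not bidirectional:
        return forward
    backward = (src - dst) % stops      # hops on the counter-rotating ring
    return min(forward, backward)


STOPS = 4  # memory stop points named in the text (the simpler fifth stop is ignored)
worst_one_ring = max(ring_hops(0, d, STOPS, bidirectional=False) for d in range(STOPS))
worst_two_rings = max(ring_hops(0, d, STOPS) for d in range(STOPS))
print(worst_one_ring, worst_two_rings)  # 3 2
```

In this model the worst-case trip drops from 3 hops on a single ring to 2 hops when a second ring runs in the opposite direction.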

Ruby returns in The Assassin

The vertex shader engines were already of the required FP32 precision in ATI's older products. The changes necessary for SM3.0 included longer instruction lengths, dynamic flow control instructions (branches, loops and subroutines) and a larger temporary register space. The pixel shader engines are quite similar in computational layout to their R420 counterparts, although they were heavily optimized and tweaked to reach high clock speeds on the 90 nm process. ATI had been working for years on a high-performance shader compiler in the driver for its older hardware, so staying with a similar, compatible basic design offered obvious cost and time savings. At the end of the pipeline, the texture addressing processors saw some improvements, such as support for 4096x4096 textures, and ATI's 3Dc normal map compression gained an improved compression ratio in more specific situations.

As is typical for an ATI video card release, a selection of real-time 3D demonstration programs was released at launch. ATI's development of their "digital superstar", Ruby, continued with a new demo named The Assassin. The demo showcased a highly complex environment, with high dynamic range (HDR) lighting and dynamic soft shadows. Ruby's latest nemesis, Cyn, was composed of 120,000 polygons. [1]

Unlike its Xenos sibling in Xbox 360, the R520 is a more traditional non-unified shader design, in that it has functional units designed only for their specific task (vertex shaders and pixel shaders).

Variants

X1300 series

The X1300 series is the budget line of the X1000 series. It replaces the X300/X600/X550 series from the previous generation, and shares similar capabilities. The chip carries a 4-pipeline design, similar to those older cards. However, it has all the same capabilities as the higher-end boards. In fact, the chip is said to use 1 "quad" (4 pipelines per quad), whereas the faster boards simply use more of these "quads". For example, the X1800 uses 4 "quads". This modular design allows ATI to build a "top to bottom" line-up using identical technology, saving time and money.

The X1300 series uses the RV515 core. There are 3 main variations of the X1300: the HyperMemory variant, the standard X1300 and the X1300 Pro. The HyperMemory variant is quite different from the regular X1300. Its local memory is clocked at 400 MHz, higher than the standard X1300's, but its bus width is very limited at 32-bit, meaning it has far less local memory bandwidth available. This lowers the cost of the board (to ATI) significantly by reducing board complexity and using fewer and cheaper RAM chips. The onboard memory acts as a sort of cache, while most memory operations are performed on system RAM instead. The X1300 Pro gets faster memory, using DDR2 instead of the DDR memory of the normal X1300.

As of late June 2006, reports have surfaced of a PCI card released in Japan using this chip through a PCIe-to-PCI bridge. [2] This would be ATI's very first DirectX 9 chip available on PCI; until now NVIDIA has had this niche all to itself, except for the now-discontinued XGI Technology Volari V3XT. DirectX 9 on PCI is a rather small niche for now, but with Windows Vista demanding DirectX 9 for its signature Windows Aero interface, it is likely to grow dramatically as those who bought integrated-graphics systems without AGP or PCIe slots in the last few years prepare to upgrade to Vista.

X1400 series

The X1400 is a slightly faster Mobility Radeon X1300. It uses the same GPU core as the X1300, but with a higher clock speed. Performance comparisons put it well behind the X1600 but ahead of the X600. [3]

X1600 series

The X1600 uses the RV530 core, which is quite different from both the RV515 of the X1300 and the R520 of the X1800. The X1600 is positioned to replace the Radeon X600 and Radeon X700 as ATI's mid-range GPU.

It shares its design philosophy with the X1900, in that it has a far different ratio of pixel shader processors to texturing units. ATI has stated that the X1600 is designed for a far greater shader computational load, a prediction of future game workloads. Whereas the X1300 and X1800 have an equal ratio of pixel shaders to texturing units, which targets a more equal workload of shading and texturing in games, the RV530 of the X1600 alters this to 12 pixel shaders and 4 texturing units. The chip's single "quad" has 3 pixel shader processors per pipeline. This means the chip has the same texturing ability as the X1300 at the same clock speed, but with its 12 pixel shaders it encroaches on the X1800's territory in shader computational performance. While its performance is nowhere near that of an X1800, it still manages to lead the X1300 by a decent margin across the board. The X1600 also receives a boost in the vertex shader department, with the addition of 3 more units (for a total of 5) over the X1300.
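The unit counts quoted in this article follow from two numbers per chip: the number of "quads" and the number of pixel shader processors per pipeline. The sketch below simply redoes that arithmetic; the dictionary and function names are illustrative, and the quad and per-pipeline figures are the ones given in the text.

```python
PIPELINES_PER_QUAD = 4  # as stated in the text

# (quads, pixel shader processors per pipeline), as described in this article
CHIPS = {
    "RV515": (1, 1),  # X1300
    "RV530": (1, 3),  # X1600
    "R520": (4, 1),   # X1800
    "R580": (4, 3),   # X1900
}


def shader_and_texture_units(quads, shaders_per_pipeline):
    """Return (pixel shader processors, texture units) for a chip."""
    pipelines = quads * PIPELINES_PER_QUAD
    texture_units = pipelines  # one texture unit per pipeline in this generation
    pixel_shaders = pipelines * shaders_per_pipeline
    return pixel_shaders, texture_units


for name, (quads, per_pipe) in CHIPS.items():
    ps, tmu = shader_and_texture_units(quads, per_pipe)
    print(f"{name}: {ps} pixel shaders, {tmu} texture units, ratio {ps // tmu}:1")
```

RV530 and R580 come out with a 3:1 ratio of pixel shaders to texture units, while RV515 and R520 keep the 1:1 ratio.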

The X1600's core clock speeds are similar to the X1300's, while the attached memory is usually clocked higher. Even so, benchmarks show that the X1600 is a decent step up from the X1300. The reason is that the X1600, while sharing some of the same limitations, has a much greater ability to process complex shaders, having triple the number of pixel shader processors.

X1700 series

X1700 is one of the possible names for RV570. The RV570 is designed to be an upper mid-range competitor to the 7900GT, carrying the name Radeon X1900GT. RV570 is an 80 nm GPU with 12 ROP units, 36 pixel shader units and eight vertex shader units. Performance is greatly improved thanks to the inclusion of a 256-bit memory interface. The launch date is set for the second week of September, with availability in late September or the beginning of October.

The INQUIRER has published information about, and a picture of, an RV570 product manufactured by MSI. A picture of ATI's reference board has also been published.

X1800 series

ATI X1800 logo

Originally the flagship of the X1000 series, the X1800 series was released with little fanfare due to its rolling release and the lead gained by its competitor at that time, NVIDIA's GeForce 7800 series. The reason for the delayed release was that ATI engineers had found a bug within the core, caused by a faulty third-party 90 nm chip design library, which greatly hampered clock speed ramping, and so they had to "respin" the chip for another revision. The problem had been almost random in how it affected the prototype chips, making it quite difficult to identify. When the R520 hit the market in late 2005, the X1800 was the first high-end 90 nm GPU. ATI opted to fit the cards with either 256 MB or 512 MB of on-board memory (foreseeing a future of ever-growing demands on local memory size). The X1800XT PE was available exclusively with 512 MB of on-board memory. The X1800 replaced the R420-based Radeon X850 as ATI's premier performance GPU.

With R520's delayed release, its competition was far more impressive than it would have been if the chip had made its originally scheduled Spring/Summer '05 release. Like its predecessor X850, the R520 chip carries 4 "quads" (4 pipelines each), which means it has similar texturing capability at the same clock speed as its ancestor and the NVIDIA 6800 series. In contrast to the X850, however, R520's shader units are vastly improved. Not only are they fully Shader Model 3 capable, but ATI also introduced some innovative advancements in shader threading that can greatly improve the efficiency of the shader units. Unlike the X1900, the X1800 has 16 pixel shader processors, giving an equal ratio of texturing to pixel shading capability. The chip also raises the number of vertex shaders from 6 on the X800 to 8. And, with the use of the 90 nm Low-K fabrication process, these high-transistor-count chips could still be clocked at very high frequencies. This is what gives the X1800 series the ability to be competitive with GPUs that have more pipelines but lower clock speeds, such as the NVIDIA 7800 and 7900 series, which use 24 pipelines.

The X1800 was quickly replaced by the X1900 because of its delayed release. The X1900 was not behind schedule, and was always planned as the "spring refresh" chip. However, due to the large quantity of unused X1800 chips, ATI decided to disable 1 quad of pixel pipelines and sell them off as the X1800GTO.

X1900 series

X1900 board

The X1900 series fixes several flaws in the X1800 design and adds a pixel shading performance boost. The R580 core is pin-compatible with the R520 PCBs, meaning that a redesign of the X1800 PCB was not needed. X1900 boards carry either 256 MB or 512 MB of onboard GDDR3 memory, depending on the variant. The primary change between R580 and R520 is that ATI changed the ratio of pixel shader processors per pipeline. The X1900 cards have 3 pixel shaders on each pipeline, giving a total of 48 pixel shader units. ATI has taken this step with the expectation that future video games will be more pixel shader intensive than previous ones.

The upcoming card based on the R580+ core is expected to port the X1900XTX design to TSMC's 80 nm production process in the fall.

Chipset table

Board Name	Core Type	Die Process	Clocks (MHz) Core/RAM	Core Config¹	MTex/s²	MTri/s³	Memory Interface	Memory Bandwidth	Notes
Desktop Graphics Boards
X1300 HM	RV515	90 nm	400/400	4:4:4:2	1600	200	64-bit	6.4 GB/s	32-128 MB local RAM.
X1300	RV515	90 nm	450/250	4:4:4:2	1800	225	128-bit	8.0 GB/s	Can be 64 or 128-bit bus
X1300 Pro	RV515	90 nm	600/400	4:4:4:2	2400	300	128-bit	12.8 GB/s
X1600 Pro	RV530	90 nm	500/340	4:12:4:5	2000	625	128-bit	10.9 GB/s	128/256 MB
X1600 XT	RV530	90 nm	590/690	4:12:4:5	2360	738	128-bit	22.1 GB/s	128/256 MB
X1800 GTO	R520	90 nm	500/500	12:12:12:8	6000	1000	256-bit	32.0 GB/s
X1800 XL	R520	90 nm	500/500	16:16:16:8	8000	1000	256-bit	32.0 GB/s
X1800 XT	R520	90 nm	625/750	16:16:16:8	10000	1250	256-bit	48.0 GB/s	256/512 MB
X1900 GT	R580, R570	90 nm	575/600	12:36:12:8	6900	1150	256-bit	38.4 GB/s
X1900 XT	R580	90 nm	625/725	16:48:16:8	10000	1250	256-bit	46.4 GB/s	similar ratio of units as RV530
X1900 XTX	R580	90 nm	650/775	16:48:16:8	10400	1300	256-bit	49.6 GB/s
Mobility Radeons and Integrated Graphics Processors
MR X1300 HM	RV515	90 nm	400/325	4:4:4:2	1600	200	64-bit	5.2 GB/s	32-128 MB local RAM.
MR X1300	RV515	90 nm	?	4:4:4:2	?	?	128-bit	?
MR X1400 HM	RV515	90 nm	432/400	4:4:4:2	1728	216	64-bit	6.4 GB/s	32-128 MB local RAM.
MR X1400	RV515	90 nm	432/200	4:4:4:2	1728	216	128-bit	6.4 GB/s
MR X1600 HM	RV530	90 nm	?	4:12:4:5	?	?	64-bit	?	32-128 MB local RAM.
MR X1600	RV530	90 nm	470/?	4:12:4:5	1880	?	128-bit	?
MR X1800	R5x0	90 nm	?	12:?:12:8	?	?	256-bit	?

Bold rows designate initial showings of the major core types.
¹ (Texture Units:Pixel Shaders:ROPs:Vertex Shaders). All chips of this generation have 1 texture mapping unit (TMU) per pixel pipeline.
² MTex/s = Million Texels per second, a measure of texturing fillrate. All chips of this generation have equal texture and pixel fillrates because of having only a single TMU per pipeline.
³ MTri/s = Million triangles per second, a measure of the core's geometric calculation capabilities. Related to core speed and the number of vertex shaders.
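
The derived columns of the table can be approximately reproduced from the clocks and the core configuration. The sketch below is a reconstruction, not a published formula. The texel rate follows from footnote ² (one texture unit per pipeline). The triangle rate of one triangle per vertex shader every four clocks is an assumption inferred from the table rows. The bandwidth calculation assumes double data rate memory, which transfers data twice per memory clock. All function names are illustrative.

```python
def mtex_per_s(core_mhz, texture_units):
    """Texel fillrate in millions of texels per second."""
    return core_mhz * texture_units


def mtri_per_s(core_mhz, vertex_shaders):
    """Triangle rate in millions per second.

    Assumption inferred from the table: each vertex shader contributes
    one triangle every four core clocks.
    """
    return core_mhz * vertex_shaders / 4


def bandwidth_gb_per_s(mem_mhz, bus_bits):
    """Memory bandwidth in GB/s, assuming double data rate memory."""
    transfers_per_s = mem_mhz * 2   # millions of transfers per second
    bytes_per_transfer = bus_bits / 8
    return transfers_per_s * bytes_per_transfer / 1000


# X1800 XT row: 625/750 MHz, 16 texture units, 8 vertex shaders, 256-bit bus
print(mtex_per_s(625, 16))           # 10000
print(mtri_per_s(625, 8))            # 1250.0
print(bandwidth_gb_per_s(750, 256))  # 48.0
```

The same functions give 1800 MTex/s, 225 MTri/s and 8.0 GB/s for the X1300 row (450/250 MHz, 4 texture units, 2 vertex shaders, 128-bit bus).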

References