Cell (processor)
Cell is a microprocessor architecture jointly developed by an alliance of Sony, Toshiba, and IBM known as STI over a four-year period beginning March 2001, on a design budget informally reported by IBM as being in the range of $400 million. Cell is shorthand for Cell Broadband Engine Architecture, commonly abbreviated CBEA in full or Cell BE in part. Cell combines a general purpose POWER-architecture core of modest performance with streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation.
The major commercial application of Cell is in Sony's upcoming PlayStation 3 game console which is slated to launch in November 2006. It will also become available in a blade configuration from Mercury Computer Systems. Toshiba has announced plans to incorporate Cell in high definition television sets. Exotic features such as the XDR memory subsystem and coherent EIB interconnect appear to position Cell for future applications in the supercomputing space to exploit the Cell processor's prowess in floating point kernels.
The Cell architecture breaks ground in combining a light-weight general-purpose processor with multiple GPU-like coprocessors into a coordinated whole, a feat which involves a novel memory coherence architecture for which IBM received many patents. The resulting architecture emphasizes efficiency/watt and prioritizes bandwidth over latency, and peak computational throughput over simplicity of program code. For these reasons, Cell is widely regarded as a challenging environment for software development. IBM provides a comprehensive Linux-based Cell development platform to assist developers in confronting these challenges. Software adoption remains a key issue in whether Cell ultimately delivers on its performance potential.
History
In 2000, Sony Computer Entertainment, Toshiba Corp., and IBM formed an alliance ("STI") to design and build the processor. The STI Design Center in Austin, Texas opened in March 2001. [1] The Cell was designed over a period of four years, using enhanced versions of the design tools for the POWER4 processor. Over 400 engineers from the three companies worked together in Austin, with critical support from eleven of IBM's design centers. [2]
On May 17, 2005, Sony Computer Entertainment confirmed some specifications of the Cell processor that would be shipping in the forthcoming PlayStation 3 console. This Cell will have one POWER processing element (PPE) on the core, with seven SPEs and one SPE reserved for redundancy (to help increase manufacturing yield). All of these are clocked at 3.2 GHz. The chips will be fabricated using a 90 nanometre SOI process, at IBM's facility in East Fishkill, New York.
On June 28, 2005, IBM and Mercury Computer Systems announced a partnership agreement to build Cell-based computer systems for embedded applications such as medical imaging, industrial inspection, aerospace and defense, seismic processing, and telecommunications.
Overview
The Cell Broadband Engine, or Cell as it is more commonly known, is a microprocessor designed to bridge the gap between conventional desktop processors (such as the Pentium and PowerPC) and more specialised high performance processors (e.g. Nvidia and ATI graphics chips). The name reflects its intended use, namely as a component in current and future digital distribution systems; as such it may be utilised in high definition displays and recording equipment, as well as computer entertainment systems for the 'Hi Def' era. Additionally, the processor should be well suited to digital imaging systems (medical, scientific, etc.) as well as physical simulation (e.g. scientific and structural engineering modelling).
In a simple analysis the Cell processor can be split into four components: external input and output structures; the main processor, called the Power Processing Element or PPE (a two-way SMT multithreaded Power 970 architecture compliant core); eight fully functional co-processors called the Synergistic Processing Elements, or SPEs; and a specialised high bandwidth circular data bus connecting the PPE, input/output elements and the SPEs, called the Element Interconnect Bus or EIB.
To achieve the high performance needed for mathematically intensive tasks, such as decoding or encoding MPEG streams, generating or transforming three-dimensional data, or undertaking Fourier analysis of data, the Cell processor marries the SPEs and the PPE via the EIB to give both access to main memory or other external data storage. The PPE, which is capable of running a conventional operating system, has control over the SPEs and can start, stop, interrupt and schedule processes running on the SPEs. To this end the PPE has additional instructions relating to control of the SPEs. Despite having Turing complete architectures, the SPEs are not fully autonomous and require the PPE to initiate them before they can do any useful work. Most of the "horsepower" of the system comes from the synergistic processing elements.
The PPE and bus architecture includes various modes of operation giving different levels of protection, allowing areas of memory to be protected from access by specific processes running on the SPEs or PPE.
Both the PPE and SPE are RISC architectures with a fixed-width 32-bit instruction format. The PPE contains a 64-bit general purpose register set (GPR), a 64-bit floating point register set (FPR), and a 128-bit VMX register set. The SPE contains 128-bit registers only. These can be used for scalar data types ranging from 8 bits to 128 bits in size or for SIMD computations on a variety of integer and floating point formats. System memory addresses for both the PPE and SPE are expressed as 64-bit values, for a theoretical address range of 2^64 bytes. In practice, not all of these bits are implemented in hardware; the address space is extremely large nevertheless. Local store addresses internal to the SPU processor are expressed as a 32-bit word. In documentation relating to Cell, a word is always taken to mean 32 bits, a doubleword means 64 bits, and a quadword means 128 bits.
Influence and contrast
In some ways the Cell system resembles early Seymour Cray designs in reverse. The famed CDC 6600 used a single very fast processor to handle the mathematical calculations, while a series of ten slower systems were given smaller programs to keep the main memory fed with data. In the Cell the problem has been reversed: reading the data is no longer the difficult part. Because of the complex encodings used in industry, the problem today is efficiently decoding that data into an ever-less-compressed version as quickly as possible.
Modern graphics cards have multiple elements very similar to the SPEs, known as vertex shader units, with an attached high speed memory. Programs, known as shaders, are loaded onto the units to process the basic geometry fed from the computer's CPU, apply styles and display it.
The main differences are that the Cell's SPEs are much more general purpose than shader units, and that the ability to chain the SPEs under program control offers considerably more flexibility, allowing the Cell to handle graphics, sound, or anything else.
Architecture
While the Cell chip can have a number of different configurations, the basic configuration is composed of one "Power Processor Element" ("PPE") (sometimes called "Processing Element", or "PE"), and multiple "Synergistic Processing Elements" ("SPE") [3]. The PPE and SPEs are linked together by an internal high speed bus dubbed "Element Interconnect Bus" ("EIB"). Due to the nature of its applications, Cell is optimized towards single precision floating point computation. The SPEs are capable of performing double precision calculations, albeit with an order of magnitude performance penalty. More general purpose computing tasks can be done on the PPE.
Power Processor Element
The PPE is based on the POWER Architecture, which is the basis of IBM's line of POWER and PowerPC offerings. The PPE is not intended to perform all primary processing for the system, but rather to act as a controller for the eight SPEs, which handle most of the computational workload. Because of its similarity to other 64-bit PowerPC processors, the PPE will work with conventional operating systems, while the SPEs are designed for vectorized floating point code execution. The PPE contains a 32 KiB instruction and a 32 KiB data Level 1 cache and a 512 KiB Level 2 cache. Additionally, IBM has included a VMX (AltiVec) unit in the Cell PPE. [4]
Synergistic Processing Elements (SPE)
Each SPE is composed of a "Streaming Processing Unit" ("SPU"), and an SMF unit (DMA, MMU, and bus interface). [5] An SPE is a RISC processor with 128-bit SIMD organization [6] for single and double precision instructions. With the current generation of the Cell, each SPE contains a 256 KiB instruction and data local memory area (called "local store") which is visible to the PPE and can be addressed directly by software. Each SPE can support up to 4 GB of local store memory. The local store does not operate like a conventional CPU cache, since it is neither transparent to software nor does it contain hardware structures that predict what data to load. The SPEs contain a register file of 128 entries of 128 bits each and measure 14.5 mm² on a 90 nm process. An SPE can operate on 16 8-bit integers, 8 16-bit integers, 4 32-bit integers, or 4 single precision floating-point numbers in a single clock cycle. Note that the SPU cannot directly access system memory; the 64-bit memory addresses formed by the SPU must be passed from the SPU to the SPE memory flow controller (MFC) to set up a DMA operation within the system address space.
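The lane counts quoted for the SPE follow from dividing the 128-bit register width by the element width. A minimal Python sketch of that arithmetic is given here; the function name `simd_lanes` is purely illustrative and is not part of any Cell toolchain.

```python
REGISTER_BITS = 128  # width of every SPE register, as stated in the text


def simd_lanes(element_bits: int) -> int:
    """Return how many elements of the given width fit in one 128-bit register."""
    if element_bits <= 0 or REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return REGISTER_BITS // element_bits


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(f"{bits}-bit elements: {simd_lanes(bits)} per register")
```

The same division gives two lanes for 64-bit elements, which is one way to see why double precision work has fewer elements in flight per instruction.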
In one typical usage scenario, the system will load the SPEs with small programs (similar to threads), chaining the SPEs together to handle each step in a complex operation. For instance, a set-top box might load programs for reading a DVD, video and audio decoding, and display, and the data would be passed off from SPE to SPE until finally ending up on the TV. Another possibility is to partition the input data set and have several SPEs performing the same kind of operation in parallel. At 3.2 GHz, each SPE gives a theoretical 25.6 GFLOPS of single precision performance. The PPE's VMX (AltiVec) unit is fully pipelined for double precision floating point and can complete two double precision operations per clock cycle, which translates to 6.4 GFLOPS at 3.2 GHz; or eight single precision operations per clock cycle, which translates to 25.6 GFLOPS at 3.2 GHz[7].
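The peak figures in this paragraph are products of operations per clock cycle and clock frequency. The sketch below reproduces them in Python. The PPE rates (two double precision or eight single precision operations per clock) are taken from the text; the SPE rate of eight operations per clock is inferred from 25.6 GFLOPS at 3.2 GHz, and reading it as four lanes each performing a multiply-add counted as two operations is an assumption, not something the text states.

```python
CLOCK_GHZ = 3.2  # clock speed used throughout the text


def peak_gflops(ops_per_cycle: int, clock_ghz: float = CLOCK_GHZ) -> float:
    """Theoretical peak: operations per cycle times billions of cycles per second."""
    return ops_per_cycle * clock_ghz


# Assumption: 4 single precision lanes x 2 operations (multiply-add) per lane.
SPE_SP_OPS_PER_CYCLE = 4 * 2
PPE_VMX_DP_OPS_PER_CYCLE = 2  # from the text
PPE_VMX_SP_OPS_PER_CYCLE = 8  # from the text

if __name__ == "__main__":
    print("SPE single precision:", round(peak_gflops(SPE_SP_OPS_PER_CYCLE), 1))
    print("PPE VMX double precision:", round(peak_gflops(PPE_VMX_DP_OPS_PER_CYCLE), 1))
    print("PPE VMX single precision:", round(peak_gflops(PPE_VMX_SP_OPS_PER_CYCLE), 1))
```

These are upper bounds only; they say nothing about how much of the peak a given program can sustain.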
Compared to a modern personal computer, the relatively high overall floating point performance of a Cell processor seemingly dwarfs the abilities of the SIMD unit in desktop CPUs like the Pentium 4 and the Athlon 64. However, floating point ability alone is a one-dimensional and application-specific metric. Unlike a Cell processor, such desktop CPUs are more suited to the general purpose software usually run on personal computers. Also, Cell is optimized for single-precision calculations; for double-precision, as used in personal computers, Cell performance drops by an order of magnitude to levels similar to desktops.
Recent tests by IBM [8] show that the SPEs can reach 98% of their theoretical peak performance using optimized parallel Matrix Multiplication.
Element Interconnect Bus (EIB)
The EIB is a communication bus internal to the Cell processor which connects the various on-chip system elements: the PPE processor, the memory controller (MIC), the eight SPE coprocessors, and two off-chip I/O interfaces, for a total of 12 participants. The EIB also includes an arbitration unit which functions as a set of traffic lights. In some documents IBM refers to EIB bus participants as 'units'.
The EIB is presently implemented as a circular ring composed of four 16B-wide unidirectional channels which counter-rotate in pairs. When traffic patterns permit, each channel can convey up to three transactions concurrently. As the EIB runs at half the system clock rate, the effective channel rate is 16 bytes every two system clocks. At maximum concurrency, with three active transactions on each of the four rings, the peak instantaneous EIB bandwidth is 96B per clock (12 concurrent transactions * 16 bytes wide / 2 system clocks per transfer). While this figure is often quoted in IBM literature, it is unrealistic to simply scale this number by processor clock speed. The arbitration unit imposes additional constraints which are discussed in the Bandwidth Assessment section below.
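The 96B-per-clock figure can be reproduced step by step from the numbers in this paragraph. The following Python sketch does only that arithmetic; the constant and function names are illustrative.

```python
RINGS = 4                         # unidirectional channels in the EIB
TRANSACTIONS_PER_RING = 3         # concurrent transactions when traffic permits
CHANNEL_WIDTH_BYTES = 16          # each channel is 16B wide
SYSTEM_CLOCKS_PER_TRANSFER = 2    # the EIB runs at half the system clock rate


def eib_peak_bytes_per_system_clock() -> float:
    """Peak instantaneous EIB bandwidth in bytes per system clock."""
    concurrent = RINGS * TRANSACTIONS_PER_RING  # 12 concurrent transactions
    return concurrent * CHANNEL_WIDTH_BYTES / SYSTEM_CLOCKS_PER_TRANSFER


if __name__ == "__main__":
    print(eib_peak_bytes_per_system_clock(), "bytes per system clock")
```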
IBM Senior Engineer David Krolak, EIB lead designer, explains the concurrency model:
- A ring can start a new op every three cycles. Each transfer always takes eight beats. That was one of the simplifications we made, it's optimized for streaming a lot of data. If you do small ops, it doesn't work quite as well. If you think of eight-car trains running around this track, as long as the trains aren't running into each other, they can coexist on the track.
Each participant on the EIB has one 16B read port and one 16B write port. The limit for a single participant is to read and write at a rate of 16B per EIB clock (for simplicity often regarded as 8B per system clock). Note that each SPU contains a dedicated DMA management queue capable of scheduling long sequences of transactions to various endpoints without interfering with the SPU's ongoing computations; these DMA queues can be managed locally or remotely as well, providing additional flexibility in the control model.
Data flows on an EIB channel stepwise around the ring. Since there are twelve participants, the total number of steps around the channel back to the point of origin is twelve. Six steps is the longest distance between any pair of participants. An EIB channel is not permitted to convey data requiring more than six steps; such data must take the shorter route around the circle in the other direction. The number of steps involved in sending the packet has very little impact on transfer latency: the clock speed driving the steps is very fast relative to other considerations. However, longer communication distances are detrimental to the overall performance of the EIB as they reduce available concurrency.
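The six-step limit amounts to always taking the shorter way around a twelve-position ring. A minimal Python sketch of that rule follows. Numbering the participants 0 to 11 and naming the two directions "forward" and "backward" are assumptions made for illustration; the text does not give the physical ordering of units on the ring.

```python
PARTICIPANTS = 12                 # total units on the EIB, per the text
MAX_STEPS = PARTICIPANTS // 2     # no transfer may travel more than six steps


def shortest_route(src: int, dst: int, n: int = PARTICIPANTS) -> tuple[int, str]:
    """Return (steps, direction) for the shorter way around an n-position ring."""
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("participant index out of range")
    forward = (dst - src) % n     # steps travelling in increasing index order
    backward = (src - dst) % n    # steps travelling in decreasing index order
    if forward <= backward:
        return forward, "forward"
    return backward, "backward"


if __name__ == "__main__":
    print(shortest_route(0, 5))
    print(shortest_route(0, 7))
    worst = max(shortest_route(a, b)[0]
                for a in range(PARTICIPANTS) for b in range(PARTICIPANTS))
    print("longest shortest route:", worst)
```

When both directions are exactly six steps apart the sketch arbitrarily picks "forward"; how the real arbiter breaks that tie is not described in the text.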
Despite IBM's original desire to implement the EIB as a more powerful cross-bar, the circular configuration they adopted to spare resources rarely represents a limiting factor on the performance of the Cell chip as a whole. In the worst case, the programmer must take extra care to schedule communication patterns where the EIB is able to function at high concurrency levels.
David Krolak explains:
- Well, in the beginning, early in the development process, several people were pushing for a crossbar switch, and the way the bus is architected, you could actually pull out the EIB and put in a crossbar switch if you were willing to devote more silicon space on the chip to wiring. We had to find a balance between connectivity and area, and there just wasn't enough room to put a full crossbar switch in. So we came up with this ring structure which we think is very interesting. It fits within the area constraints and still has very impressive bandwidth.
Bandwidth Assessment
The performance numbers quoted in this section assume a Cell processor running at 3.2 GHz, the clock speed most often cited.
At this clock frequency each channel flows at a rate of 25.6 GB/s. Viewing the EIB in isolation from the system elements it connects, achieving twelve concurrent transactions at this flow rate works out to an abstract EIB bandwidth of 307.2 GB/s. Based on this view many IBM publications depict available EIB bandwidth as "greater than 300 GB/s". This number reflects the peak instantaneous EIB bandwidth blithely scaled by processor frequency.
However, other technical restrictions are involved in the arbitration mechanism for packets accepted onto the bus. The IBM Systems Performance group explains:
- Each unit on the EIB can simultaneously send and receive 16B of data every bus cycle. The maximum data bandwidth of the entire EIB is limited by the maximum rate at which addresses are snooped across all units in the system, which is one per bus cycle. Since each snooped address request can potentially transfer up to 128B, the theoretical peak data bandwidth on the EIB at 3.2 GHz is 128Bx1.6 GHz = 204.8 GB/s.
This quote apparently represents the full extent of IBM's public disclosure of this mechanism and its impact. The EIB arbitration unit, the snooping mechanism, and interrupt generation on segment or page translation faults are not well described in the documentation set as yet made public by IBM.
In practice, effective EIB bandwidth can also be limited by the ring participants involved. While each of the nine processing cores can sustain 25.6 GB/s read and write concurrently, the memory interface controller (MIC) is tied to a pair of XDR memory channels permitting a maximum flow of 25.6 GB/s for reads and writes combined. The two I/O controllers are documented as supporting a peak combined input speed of 25.6 GB/s and a peak combined output speed of 35 GB/s.
To add further to the confusion, some older publications cite EIB bandwidth assuming a 4 GHz system clock. This reference frame results in an instantaneous EIB bandwidth figure of 384 GB/s and an arbitration-limited bandwidth figure of 256 GB/s.
All things considered, the theoretical 204.8 GB/s number most often cited is the best one to bear in mind. The IBM Systems Performance group has demonstrated SPU-centric data flows achieving 197 GB/s on a Cell processor running at 3.2 GHz, so this number is a fair reflection of practice as well.
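The competing bandwidth figures in this section all derive from the same few inputs: a bus clock at half the system clock, 16B per channel per bus cycle, twelve concurrent transactions, and one snooped address per bus cycle moving up to 128B. The Python sketch below recomputes them for both the 3.2 GHz and 4 GHz reference frames; the function and key names are illustrative.

```python
CHANNEL_WIDTH_BYTES = 16        # bytes moved per channel per bus cycle
MAX_CONCURRENT_TRANSFERS = 12   # four rings, three transactions each
BYTES_PER_SNOOPED_REQUEST = 128 # upper bound per snooped address request


def eib_bandwidths(system_clock_ghz: float) -> dict[str, float]:
    """Return EIB bandwidth figures in GB/s for a given system clock."""
    bus_clock_ghz = system_clock_ghz / 2  # the EIB runs at half the system clock
    per_channel = CHANNEL_WIDTH_BYTES * bus_clock_ghz
    return {
        "per_channel": per_channel,
        "peak_instantaneous": MAX_CONCURRENT_TRANSFERS * per_channel,
        # one address snooped per bus cycle, up to 128B moved per request
        "arbitration_limited": BYTES_PER_SNOOPED_REQUEST * bus_clock_ghz,
    }


if __name__ == "__main__":
    for clock in (3.2, 4.0):
        figures = {name: round(value, 1) for name, value in eib_bandwidths(clock).items()}
        print(f"{clock} GHz:", figures)
```

Against the arbitration-limited figure at 3.2 GHz, the demonstrated 197 GB/s corresponds to roughly 96% of the theoretical peak.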
Memory controller and I/O
Cell contains a dual channel next-generation Rambus XIO macro which interfaces to Rambus XDR memory. The memory interface controller (MIC) is separate from the XIO macro and is designed by IBM. The XIO-XDR link runs at 3.2 Gbit/s per pin. Two 32-bit channels can provide a theoretical maximum of 25.6 GB/s.
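The 25.6 GB/s memory figure follows from the per-pin rate and the channel widths. The short Python sketch below shows the conversion; it assumes that each bit of channel width corresponds to one data pin running at the quoted 3.2 Gbit/s, which the text implies but does not spell out.

```python
GBIT_PER_PIN = 3.2      # XIO-XDR link rate per pin, from the text
BITS_PER_CHANNEL = 32   # assumed to mean 32 data pins per channel
CHANNELS = 2            # dual channel configuration
BITS_PER_BYTE = 8


def xdr_peak_gbytes_per_s() -> float:
    """Theoretical peak XDR bandwidth in GB/s across both channels."""
    total_gbit = GBIT_PER_PIN * BITS_PER_CHANNEL * CHANNELS
    return total_gbit / BITS_PER_BYTE


if __name__ == "__main__":
    print(round(xdr_peak_gbytes_per_s(), 1), "GB/s")
```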
The system interface used in Cell, also a Rambus design, is known as FlexIO. The FlexIO interface is organized into 12 "lanes," each lane being a unidirectional 8-bit wide point-to-point path. Five lanes are inbound to Cell, while the remaining seven are outbound. This provides a theoretical peak bandwidth of 62.4 GB/s (36.4 GB/s outbound, 26 GB/s inbound) at 2.6 GHz. The FlexIO interface can be clocked independently, typically at 3.2 GHz. Four inbound and four outbound lanes support memory coherency.
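As a consistency check on the FlexIO figures, the quoted totals can be divided by the lane counts. The Python sketch below does this using only numbers from the text. The result, about 5.2 GB/s per lane or two bytes per 2.6 GHz clock on an 8-bit lane, would be consistent with two transfers per clock, but the text does not describe the signalling scheme, so that reading is an inference.

```python
INBOUND_LANES = 5
OUTBOUND_LANES = 7
QUOTED_INBOUND_GBS = 26.0    # GB/s, from the text
QUOTED_OUTBOUND_GBS = 36.4   # GB/s, from the text
QUOTED_CLOCK_GHZ = 2.6


def per_lane_gbs(total_gbs: float, lanes: int) -> float:
    """Bandwidth carried by each lane if the total is shared evenly."""
    return total_gbs / lanes


if __name__ == "__main__":
    inbound = per_lane_gbs(QUOTED_INBOUND_GBS, INBOUND_LANES)
    outbound = per_lane_gbs(QUOTED_OUTBOUND_GBS, OUTBOUND_LANES)
    print("per inbound lane (GB/s):", round(inbound, 3))
    print("per outbound lane (GB/s):", round(outbound, 3))
    print("combined (GB/s):", round(QUOTED_INBOUND_GBS + QUOTED_OUTBOUND_GBS, 1))
    print("bytes per clock per lane:", round(inbound / QUOTED_CLOCK_GHZ, 3))
```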
Broadband Engine
Much less information is available about the 'broadband engine', most coming from patent applications. It is believed that Cell allows for multiple processing cores to be put onto one die, and the patent shows four cores on one die. Sony, Toshiba, and IBM have claimed that they intend to scale the processor for various uses, both low-end and high-end, by varying the number of cores on the chip, the number of units in a single core, and by linking multiple chips to each other via network or memory bus.
Possible applications
Blade server
IBM has already presented a blade server prototype based on two Cell processors, running the 2.6.11 Linux kernel. [9] The processors ran at 2.4–2.8 GHz. IBM expects soon to run them at 3.0 GHz, providing 200 GFLOPS single-precision floating point performance per CPU (or 400 GFLOPS per board). IBM also expects to arrange seven blades in a single rackmount chassis (similar to their BladeCenter product line) for a total performance of 2.8 TFLOPS (or 284 GFLOPS in double precision) per chassis. However, the performance numbers released by IBM are still theoretical, and the real-world performance may fall significantly short of theoretical expectations.
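The chassis totals are simple multiples of the per-CPU estimate, and comparing the two chassis figures gives the implied gap between single and double precision. A small Python sketch using only the numbers quoted above (the names are illustrative):

```python
GFLOPS_PER_CPU_SP = 200       # IBM's single precision expectation per CPU
CPUS_PER_BLADE = 2
BLADES_PER_CHASSIS = 7
CHASSIS_GFLOPS_DP = 284       # quoted double precision figure per chassis


def chassis_gflops(per_cpu: float, cpus_per_blade: int, blades: int) -> float:
    """Aggregate theoretical GFLOPS for a full chassis."""
    return per_cpu * cpus_per_blade * blades


if __name__ == "__main__":
    board = GFLOPS_PER_CPU_SP * CPUS_PER_BLADE
    chassis = chassis_gflops(GFLOPS_PER_CPU_SP, CPUS_PER_BLADE, BLADES_PER_CHASSIS)
    print("per board (GFLOPS):", board)
    print("per chassis (TFLOPS):", chassis / 1000)
    print("single/double precision ratio:", round(chassis / CHASSIS_GFLOPS_DP, 2))
```

The ratio comes out just under ten, in line with the order-of-magnitude double precision penalty described elsewhere in the article.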
IBM's H-series Blade servers will incorporate the Cell processor as of March 2006.
Mercury Computer Systems, Inc. has released preproduction blades with Cell microprocessors that are currently shipping.
Console videogames
Sony's PlayStation 3 video game console will contain the first production application of the Cell processor, clocked at 3.2 GHz and containing seven out of eight operational SPEs, in order to allow Sony to increase the yield on the processor manufacture.
Home cinema
Reportedly, Toshiba is considering producing HDTVs using Cell. They have already presented a system to decode 48 MPEG-2 streams simultaneously on a 1920×1080 screen. [10][11] This can enable a viewer to choose a channel based on dozens of thumbnail videos displayed simultaneously on the screen.
Software engineering
Due to the flexible nature of the Cell, there are several possibilities for the utilization of its resources: [12]
Job queue
The PPE maintains a job queue, schedules jobs in SPEs, and monitors progress. Each SPE runs a "mini kernel" whose role is to fetch a job, execute it, and synchronize with the PPE.
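The job queue model can be illustrated on an ordinary machine with threads standing in for SPEs. The Python sketch below is an analogy only, not Cell code: `ppe_run` and `spe_mini_kernel` are hypothetical names, the worker threads play the role of SPE mini kernels, and the shared queues stand in for whatever mailbox or DMA mechanism a real implementation would use.

```python
import queue
import threading

NUM_SPES = 8  # worker count chosen to mirror the eight SPEs


def spe_mini_kernel(spe_id, jobs, results):
    """Stand-in for an SPE mini kernel: fetch a job, execute it, report back."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel from the "PPE": no more work
            jobs.task_done()
            break
        job_id, func, arg = job
        results.put((job_id, spe_id, func(arg)))
        jobs.task_done()


def ppe_run(work_items, func):
    """Stand-in for the PPE: enqueue jobs, wait for completion, collect results."""
    items = list(work_items)
    jobs, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=spe_mini_kernel, args=(i, jobs, results))
               for i in range(NUM_SPES)]
    for worker in workers:
        worker.start()
    for job_id, arg in enumerate(items):
        jobs.put((job_id, func, arg))
    for _ in workers:
        jobs.put(None)             # one stop sentinel per worker
    jobs.join()                    # monitor progress: block until all jobs are done
    for worker in workers:
        worker.join()
    collected = {}
    while not results.empty():
        job_id, _spe_id, value = results.get()
        collected[job_id] = value
    return [collected[i] for i in range(len(items))]


if __name__ == "__main__":
    print(ppe_run(range(10), lambda x: x * x))
```

Results are keyed by job number because workers finish in no particular order; the controller restores the original ordering when it collects them.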
Self-multitasking of SPEs
The kernel and scheduling are distributed across the SPEs. Tasks are synchronized using mutexes or semaphores as in a conventional operating system. Ready-to-run tasks wait in a queue for an SPE to execute them. The SPEs use shared memory for all tasks in this configuration.
Stream processing
Each SPE runs a distinct program. Data comes from an input stream and is sent to the SPEs. When an SPE has finished its processing, the output data is sent to an output stream.
This provides a flexible yet powerful architecture for stream processing, in which each SPE can be scheduled explicitly and separately. Other processors are also able to perform this kind of processing, but this often comes with limitations on the kernels that can be loaded.
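The stream processing model, in which each stage runs its own program and hands data to the next, can be sketched with chained Python generators. This is a sequential analogy rather than Cell code: on Cell each stage would run concurrently on its own SPE. The names `run_stage` and `build_pipeline`, and the toy stages in the usage example, are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[int], int]  # one "program" applied to each item in the stream


def run_stage(stage: Stage, stream: Iterable[int]) -> Iterator[int]:
    """Apply one stage's program to every item arriving on its input stream."""
    for item in stream:
        yield stage(item)


def build_pipeline(stages: Iterable[Stage], source: Iterable[int]) -> Iterator[int]:
    """Chain the stages so that each one consumes the previous stage's output."""
    stream: Iterator[int] = iter(source)
    for stage in stages:
        stream = run_stage(stage, stream)
    return stream


if __name__ == "__main__":
    # Toy stand-ins for stages such as read, decode and display.
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(list(build_pipeline(stages, [1, 2, 3])))
```

Items flow through one at a time, so a later stage begins work on the first item before the earlier stages have seen the whole input, which is the property the chained-SPE arrangement relies on.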
Open source software development
As of 2005-06-23, patches enabling Cell support in the Linux kernel were submitted for inclusion by IBM developers [13]. Arnd Bergmann (one of the developers of the aforementioned patches) also described the Linux-based Cell architecture at LinuxTag 2005. [14]
Both the PPE and the SPEs are programmable in C/C++ using a common API provided by libraries. According to Sony, a compiler, debugger, IDE, performance analyzer, and Cell emulator should be made available soon. [15] IBM has developed a pseudo-filesystem for Linux named "Spufs" that simplifies access to and use of the SPE resources.
IBM is currently maintaining the Linux kernel and GDB ports, while Sony maintains the GNU toolchain (GCC, binutils). [16]
In November 2005, IBM released a "Cell Broadband Engine (CBE) Software Development Kit Version 1.0", consisting of a simulator and assorted tools, to its web site. Development versions of the latest kernel and tools for Fedora Core 4 are maintained at the Barcelona Supercomputing Center website [17].
With the release of kernel version 2.6.16 on 20 March 2006, the Linux kernel officially supports the Cell processor.
Acronyms
- EIB
- Element Interconnect Bus [18]
- LS
- Local Storage (SPE's local memory) [19]
- MIC
- Memory Interface Controller [20]
- PPE
- Power Processor Element [21]
- SMF
- Synergistic Memory Flow Controller
- SPE
- Synergistic Processing Element [22]
- SPU
- Streaming Processor Unit [23]
- STI
- Sony Computer Entertainment Inc., Toshiba Corp., IBM
References
- "Introduction to the Cell multiprocessor". IBM Journal of Research and Development. 7 September 2005.
- "CELL Processor Gets Ready To Entertain The Masses". Electronic Design. 8 February 2005.
- "Arnd Bergmann on Cell". IBM developerWorks. 25 June 2005.
- "Spufs: The Cell Synergistic Processing Unit as a virtual file system". IBM developerWorks. 25 June 2005.
- "Cell-CPU auf dem LinuxTag (at the LinuxTag)". pro-linux. 25 June 2005.
- "Winner: Multimedia Monster". IEEE Spectrum. 1 January 2006.
- "Open sourcing of Cell coming to fruition". IT Manager's Journal. 10 June 2005.
- "Unleashing the power: A programming example of large FFTs on Cell (broadcast replay)". power.org. 9 June 2005.
- "IBM Discloses Cell Based Blade Server Board Prototype". Tech-On!. 25 May 2005.
- "IBM will unlock door to Cell". EETimes.com. 23 May 2005.
- "Toshiba Demonstrates Cell Microprocessor Simultaneously Decoding 48 MPEG-2 Streams". Tech-On!. 25 April 2005.
- "CELL: A New Platform for Digital Entertainment". Sony Computer Entertainment Inc. 9 March 2005.
- "CELL Microprocessor Revisited". Real World Technologies. 28 February 2005.
- "Power Efficient Processor Design and the Cell Processor" (PDF). IBM. 16 February 2005.
- "Prospects For the CELL Microprocessor Beyond Games". Slashdot. 11 February 2005.
- "ISSCC 2005: The CELL Microprocessor". Real World Technologies. 10 February 2005.
- "A 4.8 GHz Fully Pipelined Embedded SRAM in the Streaming Processor of a CELL Processor" (PDF). Sony Computer Entertainment Inc., Toshiba Corp., IBM. 9 February 2005.
- "The Design and Implementation of a First-Generation CELL Processor" (PDF). Sony Computer Entertainment Inc., Toshiba Corp., IBM. 8 February 2005.
- "IBM, Sony, Toshiba unveil nine-core Cell processor". Macworld. 7 February 2005.
- "Cell Microprocessor Briefing". IBM, Sony Computer Entertainment Inc., Toshiba Corp. 7 February 2005.
- "The Cell Processor Programming Model". LinuxTag 2005. Retrieved 11 June.
- "IBM Research - Cell". IBM. Retrieved 11 June.
- "Understanding the Cell Microprocessor". Anand Lal Shimpi. Retrieved 17 March.
- "Cell DMA Engines". IBM developerWorks. 6 December 2005.
- "The Microarchitecture of the Synergistic Processor for a Cell Processor" (PDF). IEEE Journal of Solid-State Circuits, Vol.41, No.1. 1 January 2006. Retrieved 4 April.
- "IBM Showcases Cell Processors in Action at CeBIT". CDRInf. 13 March 2006. Retrieved 4 April.
- "IBM, Sony, Toshiba extend chip development work". IDG News Service. 12 January 2006. Retrieved 4 April.
- "Cell Broadband Engine Architecture and its first implementation". IBM developerWorks. 29 November 2005. Retrieved 6 April.
- "PowerPC Microprocessor Family Vector/SIMD Multimedia Extension Technology Programming Environments Manual". IBM. 30 September 2005. Retrieved 8 April.
External links
- The Cell BE Processor Security Architecture
- Upgrade your Cell BE SDK Components
- Sony Computer Entertainment International's CELL resource page
- IBM Research Labs
- Power.org Community
- Site offering news and info on the Cell processor
- Patent #6,526,491 (related to the Cell processor)
- Cell Broadband Engine resource center
- 60 page Cell intro presentation from Sony Computer Entertainment US Research and Development
- Hardware and Software Architectures for the CELL BROADBAND ENGINE processor
- Barcelona Supercomputing Center - Linux/Cell SDK + Source
News
- Sony, IBM, and Toshiba announces Cell development (2001-03-12)
- Sony/Toshiba Press Release on Cell Production (2004-11-29)
- Sony PR on one-rack 16 TFLOP workstation (2004-11-29)
- IBM/Sony/Toshiba PR on key details of the Cell Chip (2005-02-07)
- IBM/Sony/Toshiba PR on the release of the Cell SDK (2005-11-09)
Articles
- Winner: Multimedia Monster
- Holy Chip!
- Military to Begin Using Cell Processor Technology
- "PlayStation 3 chip has split personality" — By David Becker, CNET News.com, 7 February 2005
- "It's the Software, Stupid!" — Robert X. Cringely piece about why software is key to the Cell success.
- "Because It's an Once in a Lifetime Challenge" — Ken Kutaragi
- "Introducing the IBM/Sony/Toshiba Cell Processor" — Jon "Hannibal" Stokes
- EE Times article on ISSCC paper presentation
- Link to image of ISSCC presentation abstract for 90nm process
- Cell Architecture Explained
- "The Soul of Cell" Interview with Dr. H. Peter Hofstee, Cell Chief Scientist and Cell Synergistic Processor Chief Architect, with the IBM Systems and Technology Group