The UltraSPARC T2 (code-named "Niagara 2") contains up to eight processor cores, each of which can execute eight threads simultaneously. Thus, within a single processor chip, 64 threads can operate on eight 8-stage integer pipelines and eight 12-stage floating-point pipelines. The objective of this design is to overlap computation with waiting for memory, or to have multiple threads wait for memory simultaneously. Whereas traditional processors reach only some 5 percent of their peak performance when executing memory-hungry programs, the UltraSPARC T2 processor promises to operate at a higher percentage of its theoretical peak performance.
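The latency-hiding idea can be illustrated with a deliberately simple model. The sketch below is not a description of the T2's actual scheduler: it assumes that every thread alternates between a fixed number of compute cycles and a fixed number of memory-stall cycles, and that switching between threads is free. The function name `pipeline_utilization` and the cycle counts (5 compute, 95 stall, chosen so that one thread reaches the 5 percent mentioned above) are hypothetical.

```python
def pipeline_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles in which a pipeline does useful work.

    Idealized model (an assumption, not a T2 measurement): each thread
    repeats `compute_cycles` of work followed by `stall_cycles` of
    waiting for memory, and thread switching costs nothing.
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, stall_cycles >= 0")
    # Share of time a single thread keeps the pipeline busy.
    single_thread = compute_cycles / (compute_cycles + stall_cycles)
    # Additional threads fill the stall time until the pipeline saturates.
    return min(1.0, threads * single_thread)


if __name__ == "__main__":
    # Hypothetical memory-bound workload: 5 cycles of work per 95 cycles of stall.
    for n in (1, 2, 4, 8):
        share = pipeline_utilization(n, compute_cycles=5, stall_cycles=95)
        print(f"{n} thread(s): {share:.0%} of pipeline cycles used")
```

Under these assumptions the utilization grows linearly with the number of threads until the pipeline is saturated, which is the effect the design aims for. Real workloads will deviate from this because stall times vary and threads compete for caches and memory bandwidth.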
Each processor core has separate instruction and data caches and accesses a shared L2 cache and the shared main memory via an internal crossbar. From the programmer's perspective, the UltraSPARC T2 processor is therefore a shared-memory machine on a single chip with a flat memory (UMA = uniform memory access).
The above Niagara 2 diagram is from Robert Golla's presentation "A Highly Threaded Server-on-a-Chip".
A single thread achieves up to 1.4 GFlop/s, because one core can execute only one floating-point operation per cycle. The peak performance of the whole chip is therefore quite moderate by today's standards: 11.2 GFlop/s. The high potential of the Niagara 2 is revealed when many threads are active and the high memory bandwidth of some 60 GB/s (in theory) can be exploited; memory bandwidth is a frequent bottleneck of standard architectures when executing HPC applications. Furthermore, the UltraSPARC T2 processor contains two 10/1 Gbit Ethernet ports (up to 3.125 Gb/s) and one PCI-Express x8 1.0A port (2.5 Gb/s) "on chip".
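The peak figures follow from simple arithmetic, sketched below. The input values (8 cores, 1.4 GHz, one floating-point operation per cycle and core, 60 GB/s theoretical memory bandwidth) are taken from the text; the function names and the bytes-per-flop ratio are added here only for illustration.

```python
CORES = 8                      # processor cores per chip
CLOCK_GHZ = 1.4                # highest clock frequency
FLOPS_PER_CYCLE_PER_CORE = 1   # one floating-point operation per cycle and core
MEMORY_BANDWIDTH_GB_S = 60.0   # theoretical memory bandwidth


def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFlop/s: cores x clock (GHz) x flops per cycle."""
    return cores * clock_ghz * flops_per_cycle


def bytes_per_flop(bandwidth_gb_s: float, gflops: float) -> float:
    """Memory bytes available per floating-point operation at peak rate."""
    return bandwidth_gb_s / gflops


if __name__ == "__main__":
    core_peak = peak_gflops(1, CLOCK_GHZ, FLOPS_PER_CYCLE_PER_CORE)
    chip_peak = peak_gflops(CORES, CLOCK_GHZ, FLOPS_PER_CYCLE_PER_CORE)
    print(f"peak per core: {core_peak:.1f} GFlop/s")
    print(f"peak per chip: {chip_peak:.1f} GFlop/s")
    ratio = bytes_per_flop(MEMORY_BANDWIDTH_GB_S, chip_peak)
    print(f"bytes per flop at peak: {ratio:.2f}")
```

The ratio of theoretical bandwidth to peak floating-point rate comes out at more than 5 bytes per flop, which is why the chip is attractive for memory-bound codes despite its moderate peak performance.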
| Processor | UltraSPARC T2 (Niagara 2) |
|---|---|
| Manufacturer | Texas Instruments |
| Architecture | SPARC V9 |
| Address space | 48-bit virtual, 40-bit physical |
| Cores | 8 cores with 8 threads each |
| Pipelines | 2 Instruction-Pipelines, |
| Clock frequency | 0.9 GHz - 1.4 GHz |
| L1 Cache | 16 KByte instruction cache, 8 KByte data cache |
| L2 Cache | 4 MByte on chip, 16-way associative, 8 banks of 512 KByte each |
| Memory Controller | up to 64 FB-DIMMs, 4 dual-channel FB-DIMM Memory Controllers on chip |
| Crossbar | 8x9 non-blocking, approximately 90 GB/s write and 180 GB/s read per channel |
| Technology | CMOS, 65 nm; die size: 342 mm²; transistors: 503 million; pins: 1831 |
| Power consumption | 95 Watt nominal, 123 Watt max. at 1.4 GHz; voltage: 1.2 V (core), 1.5 V (analog) |
| Misc | on Chip: |
You can find more details about the UltraSPARC T2 architecture on the following web sites: