In the last few weeks, there have been a number of important introductions of new computing platforms designed specifically for working on deep neural networks for machine learning, including Google's new cloud TPU and Nvidia's new Volta design.
To me, this is the most interesting trend in computer architecture, even more so than the recent introductions of 16-core and 18-core CPUs. Of course, there are other alternative approaches, but Nvidia and Google are deservedly getting a lot of attention for their distinctive designs.
At Google I/O, I saw the company introduce what it calls a "cloud TPU" (for Tensor Processing Unit, indicating that it is optimized for Google's TensorFlow machine learning framework). The previous-generation TPU is an ASIC designed primarily for inference, that is, running already-trained machine learning models, but the new version is designed for both inference and the training of such models.
In a recent paper, Google gave more details on the original TPU, which it described as containing a 256-by-256 matrix of multiply-accumulate (MAC) units (65,536 in total) with a peak performance of 92 teraops (trillion operations per second). It gets its instructions from a host CPU over a PCIe Gen 3 bus. Google said this was a 28nm die less than half the size of an Intel Haswell Xeon 22nm processor, and that it outperformed both that processor and Nvidia's 28nm K80 GPU.
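The 92-teraop figure falls directly out of the array size and clock rate. A quick back-of-the-envelope check in Python; note that the ~700 MHz clock is from Google's TPU paper, an assumption not stated in this article:

```python
# Back-of-the-envelope check on the original TPU's peak throughput.
# Assumption: the ~700 MHz clock comes from Google's TPU paper, not this article.
macs = 256 * 256          # 65,536 multiply-accumulate units in the systolic array
ops_per_mac = 2           # each MAC counts as one multiply plus one add per clock
clock_hz = 700e6          # assumed clock rate

peak_ops = macs * ops_per_mac * clock_hz
print(f"{peak_ops / 1e12:.0f} teraops")  # prints "92 teraops"
```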
The new version, dubbed TPU 2.0 or the cloud TPU, actually contains four processors on a single board, and Google said each board is capable of reaching 180 teraflops (180 trillion floating-point operations per second). Just as importantly, the boards are designed to work together over a custom high-speed network, so they act as a single machine learning supercomputer that Google calls a "TPU pod."
This TPU pod contains 64 second-generation TPUs and provides up to 11.5 petaflops to accelerate the training of a single large machine learning model. At the conference, Fei-Fei Li, who heads Google's AI research, said that while one of the company's large-scale translation models takes a full day to train on 32 of the best commercially available GPUs, it can now be trained to the same accuracy in an afternoon on one-eighth of a TPU pod. That's a big jump.
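The pod figures quoted above are internally consistent, which a few lines of arithmetic confirm:

```python
# Sanity-checking Google's quoted pod figures.
tflops_per_board = 180
boards_per_pod = 64

pod_pflops = tflops_per_board * boards_per_pod / 1000   # teraflops -> petaflops
eighth_pod_tflops = tflops_per_board * (boards_per_pod // 8)

print(f"{pod_pflops:.1f} petaflops per pod")            # prints "11.5 petaflops per pod"
print(f"{eighth_pod_tflops} teraflops in one-eighth of a pod")  # 1440 teraflops
```

So the one-eighth pod that trained the translation model in an afternoon still represents roughly 1.4 petaflops of compute.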
Understand that these are not small systems: a pod looks to be about the size of four standard computing racks.
And each of the individual processors seems to have a very large heat sink, meaning the boards can't be stacked too tightly. Google hasn't yet given much detail on what has changed in this version of the processors or the interconnect, but it's likely that this design, too, is built around 8-bit MACs.
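To make concrete what 8-bit MACs buy an inference chip, here is a minimal, purely illustrative sketch of quantized arithmetic in plain Python. The values, scales, and function names are made up for illustration; real quantization schemes are considerably more elaborate:

```python
# Illustrative sketch: 8-bit quantized inference arithmetic.
# All names and values here are hypothetical, not Google's actual scheme.
def quantize(xs, scale):
    """Map floats to the int8 range [-127, 127] using a per-tensor scale."""
    return [max(-127, min(127, round(x / scale))) for x in xs]

weights = [0.12, -0.5, 0.33, 0.9]
inputs  = [1.0, 0.25, -0.6, 0.1]
w_scale, x_scale = 0.01, 0.01

qw = quantize(weights, w_scale)
qx = quantize(inputs, x_scale)

# A MAC array performs cheap integer multiply-accumulates...
acc = sum(w * x for w, x in zip(qw, qx))
# ...and the accumulated result is rescaled back to floating point at the end.
result = acc * w_scale * x_scale

exact = sum(w * x for w, x in zip(weights, inputs))
print(result, exact)  # the quantized result closely tracks the float result
```

The point is that a dot product built from 8-bit integer MACs can approximate the floating-point result closely enough for inference, at a fraction of the silicon and power cost.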
The week before, Nvidia introduced its latest entry in this category, a massive chip known as the Tesla V100, which it described as the first chip based on its new Volta architecture, designed for high-end GPUs.
Nvidia said the new chip is capable of 120 teraflops of tensor operations (or 15 teraflops of 32-bit operations and 7.5 teraflops of 64-bit ones). It uses a new architecture that includes 80 streaming multiprocessors (SMs), each of which contains eight new "Tensor Cores," each a 4x4x4 array capable of performing 64 fused multiply-add (FMA) operations per clock. Nvidia said it will offer the chip in its DGX-1V system with eight V100 boards in the third quarter, following the firm's earlier DGX-1, which used the earlier P100 architecture.
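Those per-core numbers roughly reproduce the headline figure. In the sketch below, the ~1,455 MHz boost clock is an assumption drawn from Nvidia's V100 announcement, not from this article:

```python
# Rough reconstruction of the V100's quoted tensor throughput.
# Assumption: the ~1,455 MHz boost clock is Nvidia's published figure,
# not stated in this article.
sms = 80
tensor_cores_per_sm = 8
fmas_per_core_per_clock = 4 * 4 * 4    # one 4x4x4 matrix FMA = 64 FMAs
ops_per_fma = 2                        # one multiply plus one add
clock_hz = 1.455e9

tensor_tflops = (sms * tensor_cores_per_sm * fmas_per_core_per_clock
                 * ops_per_fma * clock_hz) / 1e12
print(f"{tensor_tflops:.0f} teraflops")  # prints "119 teraflops", in line with the 120 quoted
```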
The company said this $149,000 box should deliver 960 teraflops of training performance while drawing 3,200 watts. Later, the firm said, it will ship a Personal DGX Station with four V100s, and in the fourth quarter, it said, the big server vendors will ship V100 servers.
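The DGX-1V numbers are simply eight V100 boards added up, which also gives a quick read on efficiency:

```python
# The DGX-1V figures are consistent with eight V100 boards.
boards = 8
tflops_per_board = 120
system_tflops = boards * tflops_per_board   # 960 teraflops, as quoted

watts = 3200
tflops_per_watt = system_tflops / watts
print(system_tflops, tflops_per_watt)       # prints "960 0.3"
```

That works out to 0.3 teraflops of tensor throughput per watt for the whole box.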
This chip is the first announced to use TSMC's 12nm process, and it will be huge: 21.1 billion transistors on an 815-square-millimeter die. Nvidia cited both Microsoft and Amazon as early customers for the chip.
Note there are big differences between these approaches. The Google TPUs are really custom chips, designed for TensorFlow applications, while the Nvidia V100 is a somewhat more general chip, capable of different kinds of math for other applications.
Meanwhile, the other big cloud providers are looking at alternatives, with Microsoft deploying FPGAs in its own data centers. Amazon Web Services now makes both GPU and FPGA instances available to developers. And a number of new start-ups are working on alternative approaches.
In some ways, this is the most drastic change we’ve seen in workstation and server processors in years, at least since developers first started using “GPU compute” several years ago. It will be fascinating to see how this develops.