Barcelona Plans The World’s Most Diverse Supercomputer

These days, there are a number of different approaches to high-performance computing, the domain of systems usually referred to as supercomputers. Most of these systems use a massive number of Xeon processors, but we are starting to see the most interesting new machines run accelerators, such as Nvidia’s Tesla or Intel’s Xeon Phi. There’s even some talk that massive ARM-based systems could be effective in the future. But what if you could try all of these architectures in one location?

That’s the challenge and promise of the new MareNostrum 4 computer, which is being readied for installation at the Barcelona Supercomputing Center. The new design includes a main system for general-purpose use based on traditional Xeons, plus three new emerging-technology clusters: one based on IBM Power and Nvidia, one on Xeon Phi, and one on ARM processors. While I was in Barcelona for Mobile World Congress, I had a chance to talk to Sergi Girona, Operations Director for the BSC, who explained the reasoning behind the four different clusters.

Girona said the center’s main mission is to provide supercomputing services for Spanish and other European researchers, in addition to industry. As part of this mission the center wants to have at least three “emerging tech clusters,” so it can test different alternatives.

For the general computing cluster, Girona says the center chose a traditional Xeon design because it was easier to migrate applications that run on the current MareNostrum 3, slated to be disconnected next week. The design also had to fit the existing space, within a chapel. (I visited the center and the current supercomputer a year ago.)

The new design, to be built by Lenovo, will be based on the new Xeon v5 (Skylake), with 3,456 nodes, each with two sockets, and each chip will contain 24 cores, for a total theoretical peak performance of 11.14 petaflops. Most cores will have 2GB of memory, but 6 percent will have 8GB, for a total of 331.7TB of RAM. Each node will have a 240GB SSD, though eventually some will have 3D XPoint memory, when that is available. The nodes are to be connected via Intel’s Omni-Path interconnect and 10Gb Ethernet. The system will also have six racks of storage from IBM, with 15 petabytes of storage, including a mix of flash and hard disk drives. Overall, the design will take up 62 racks: 48 for computing, 6 for storage, 6 for networking, and 2 for management. It will fill 120 square meters (making for a very dense environment) and draw 1.3 megawatts of power, up from the 1 megawatt drawn by the previous design. Operation is expected to begin on July 1.
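As a rough sanity check on those figures, here is a minimal Python sketch that totals the cores and racks and derives the peak performance each core would have to deliver. The inputs are the numbers quoted above; the variable and function names are my own, purely illustrative, and the per-core figure is simply implied by the quoted totals rather than a specification from Intel or BSC.

```python
# Back-of-the-envelope check of the general-purpose cluster figures.
# All inputs are numbers quoted in the article; names are illustrative only.

NODES = 3456
SOCKETS_PER_NODE = 2
CORES_PER_CHIP = 24
PEAK_PETAFLOPS = 11.14
RACKS = {"computing": 48, "storage": 6, "networking": 6, "management": 2}


def total_cores(nodes: int, sockets: int, cores_per_chip: int) -> int:
    """Total core count across the whole cluster."""
    return nodes * sockets * cores_per_chip


def peak_gigaflops_per_core(peak_petaflops: float, cores: int) -> float:
    """Peak performance implied per core (1 petaflops = 1e6 gigaflops)."""
    return peak_petaflops * 1e6 / cores


cores = total_cores(NODES, SOCKETS_PER_NODE, CORES_PER_CHIP)
per_core = peak_gigaflops_per_core(PEAK_PETAFLOPS, cores)
total_racks = sum(RACKS.values())

print(f"Total cores: {cores:,}")
print(f"Implied peak per core: {per_core:.1f} gigaflops")
print(f"Total racks: {total_racks}")
```

Run as written, this gives 165,888 cores, a little over 67 gigaflops of peak per core, and 62 racks, which matches the rack count quoted above.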

MareNostrum 1-2-3-4

One thing I found interesting here is how clearly the move to the new generation demonstrates the progression of technology. The previous generation had a peak performance of about 1 petaflop, and this system should be more than 10 times faster, while using only 30 percent more power. For comparison, the original MareNostrum supercomputer, installed in 2004, had a peak performance of 42 teraflops and used 640 kilowatts of power. (The details of performance improvements over four generations of MareNostrum are in the chart above.) Girona says this means that what would have taken a year to run on MareNostrum 1 can be done in about a single day on the new system. Pretty impressive.
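To see how the year-to-a-day comparison falls out of the numbers, here is a small Python sketch using only the peak-performance and power figures quoted in this article. It assumes, optimistically, that run time scales inversely with peak performance, which real applications rarely achieve, and it treats "about 1 petaflop" as exactly 1. The names are illustrative, not anything BSC uses.

```python
# Generation-over-generation comparison from the figures quoted in the article.
# Assumption: run time scales inversely with theoretical peak performance.

MN1_PEAK_TFLOPS = 42.0       # MareNostrum 1 (2004)
MN1_POWER_KW = 640.0
MN3_PEAK_TFLOPS = 1000.0     # "about 1 petaflop", taken as exactly 1.0
MN3_POWER_KW = 1000.0        # 1 megawatt
MN4_PEAK_TFLOPS = 11140.0    # 11.14 petaflops
MN4_POWER_KW = 1300.0        # 1.3 megawatts

# How many times faster the new system is, at peak, than the 2004 original.
speedup_vs_mn1 = MN4_PEAK_TFLOPS / MN1_PEAK_TFLOPS

# Days the new system would need for one year of MareNostrum 1 work.
days_for_mn1_year = 365 / speedup_vs_mn1

# Peak speedup and power increase relative to the previous generation.
speedup_vs_mn3 = MN4_PEAK_TFLOPS / MN3_PEAK_TFLOPS
power_increase_pct = (MN4_POWER_KW / MN3_POWER_KW - 1) * 100

# Peak performance per kilowatt, first generation versus fourth.
mn1_gflops_per_kw = MN1_PEAK_TFLOPS * 1000 / MN1_POWER_KW
mn4_gflops_per_kw = MN4_PEAK_TFLOPS * 1000 / MN4_POWER_KW

print(f"Speedup over MareNostrum 1: {speedup_vs_mn1:.0f}x")
print(f"One MareNostrum 1 year takes about {days_for_mn1_year:.1f} days")
print(f"Speedup over MareNostrum 3: {speedup_vs_mn3:.1f}x "
      f"for {power_increase_pct:.0f}% more power")
print(f"Peak gigaflops per kilowatt: {mn1_gflops_per_kw:.0f} "
      f"then, {mn4_gflops_per_kw:.0f} now")
```

By this estimate the new machine is roughly 265 times faster at peak than the original, so a year of MareNostrum 1 time works out to a bit under a day and a half, in the same ballpark as Girona’s figure.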

For emerging technology, the site will have three new clusters. One will consist of IBM Power 9 processors and Nvidia GPUs, designed to have a peak processing capability of over 1.5 Petaflop/s. This cluster will be built by IBM, and involves the same type of design being deployed in the Summit and Sierra supercomputers, which the US Department of Energy has commissioned for the Oak Ridge and Lawrence Livermore National Laboratories as part of CORAL, its collaboration among the Oak Ridge, Argonne, and Lawrence Livermore national labs.

The second cluster will be made up of Intel Xeon Phi processors, with Lenovo building a system that uses the forthcoming Knights Hill (KNH) version and Omni-Path, with a peak processing capability of over 0.5 Petaflop/s. This also mimics the American CORAL program, and uses the same processors that will be inside the Aurora supercomputer, commissioned by the US Department of Energy for the Argonne National Laboratory.

Finally, a third cluster will be formed of 64-bit ARMv8 processors that Fujitsu will provide in a prototype machine, which is designed to use the same processors that Fujitsu is developing for a new Japanese system to supplant the current K supercomputer. This too should offer more than 0.5 Petaflop/s of peak performance. The exact timing for the beginning of operations on the emerging clusters has yet to be disclosed, Girona said.

Overall, the system will cost $34 million, in a contract won by IBM and funded by the Spanish government. One major reason for having all four types of computing on site is research, Girona said. The center, which employs 450 people in total, has 160 researchers with a focus on computer science, including architecture and tools. In particular, as a member of PRACE (Partnership for Advanced Computing in Europe), BSC is trying to take a lead in performance optimization and parallel computing.

Girona said that BSC wants to influence the development of new technologies, and plans to use the new machine to analyze what will happen in the future, in particular to make sure that software is ready for whatever architecture the next machine will have. That machine is likely to arrive in about three years. BSC has long worked on tools for emerging architectures, he noted.

Another topic researchers are considering is whether or not it would be worth developing a European processor for IT, likely based on the ARM architecture.

Barcelona won’t have the fastest supercomputer in the world; that record is currently held by the Chinese, with the Americans and Japanese trying to catch up. But MareNostrum 4 will be the most diverse, and potentially the most interesting.
