Frontier Supercomputer Being Built at Oak Ridge Lab to Supercharge AI
Oak Ridge National Laboratory is close to turning on what is expected to be the world’s most powerful supercomputer, designed to accelerate innovation in AI.
By John P. Desmond, Editor, AI in Business
Passage of the National AI Initiative Act of 2020 by the US Congress set a path for how the government could support continued research into AI to help the US maintain its lead. Much of the emphasis was on a shared data architecture to provide AI researchers with access to computing resources and high-quality data.
The government is also investing in the computing infrastructure on which the AI systems run, such as high-performance computing (HPC) platforms. The government’s most recent strategy for HPC is laid out in the November 2020 strategic plan from the National Science and Technology Council, entitled “Pioneering the Future Advanced Computing Ecosystem.”
As part of the effort, the Department of Energy announced plans to build the Frontier supercomputer at its Oak Ridge National Laboratory (ORNL), with the aim of making it the world’s most powerful computer, designed to accelerate innovation in AI.
Delivery of the supercomputer’s parts from Cray, Inc., now a subsidiary of Hewlett Packard Enterprise, began last August and was completed by the end of October, according to a recent account in the Oakridger. Integration of the system is now underway; Frontier is expected to be available to researchers for open science this year and to be in full operation on Jan. 1, 2023, according to an ORNL spokesperson.
The contract awarded to Cray, which covers both the system and technology development for the supercomputer, was valued at more than $600 million.
By performing calculations up to 50 times faster than other top supercomputers, exceeding a quintillion (10 to the 18th power) calculations per second, Frontier will enable researchers to deliver breakthroughs in scientific discovery, energy assurance, economic competitiveness, and national security, according to an ORNL press release.
Frontier is a second-generation AI system, following the Summit system deployed at ORNL in 2018. Frontier is expected to provide new capabilities for deep learning, machine learning and data analytics for applications ranging from manufacturing to human health, the release stated. Other applications include systems biology and energy production.
“ORNL’s vision is to sustain the nation’s preeminence in science and technology by developing and deploying leadership computing for research and innovation at an unprecedented scale,” stated Thomas Zacharia, Director, ORNL. “Frontier follows the well-established computing path charted by ORNL and its partners that will provide the research community with an exascale system ready for science on day one.”
Preparations for Frontier included “massive upgrades” to ORNL’s power and cooling infrastructure. HPC gurus are waiting patiently for Frontier’s supercomputer rankings; it will be the first exascale-class machine up and running in the US. An exascale system is capable of performing 10 to the 18th power floating-point operations per second, enabling more capable scientific applications and better predictions in areas such as weather forecasting, climate modeling and personalized medicine. Exascale computing is also expected to reach the estimated processing power of the human brain at the neural level, a target of the Human Brain Project.
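To put the exascale figure in perspective, the short Python sketch below converts a rate in floating-point operations per second into the usual SI-prefixed units. The helper name `format_flops` is hypothetical and introduced only for illustration; it does not come from ORNL or any library.

```python
# Hypothetical helper for illustration only; not from ORNL or any library.
SI_PREFIXES = [
    (10**18, "exaFLOPS"),
    (10**15, "petaFLOPS"),
    (10**12, "teraFLOPS"),
    (10**9, "gigaFLOPS"),
]


def format_flops(rate: float) -> str:
    """Express a rate in floating-point operations per second with an SI prefix."""
    for scale, unit in SI_PREFIXES:
        if rate >= scale:
            return f"{rate / scale:g} {unit}"
    return f"{rate:g} FLOPS"


EXASCALE = 10**18  # one quintillion operations per second

print(format_flops(EXASCALE))      # the exascale threshold in exaFLOPS
print(format_flops(2 * 10**17))    # a 200-petaFLOPS rate, for comparison
```

By this measure, one exaFLOPS equals 1,000 petaFLOPS.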
Based on information put out by ORNL on Frontier’s performance on some comparative benchmarks, “It looks like Frontier will have a little bit more raw performance than was expected three years ago,” stated HPC expert Timothy Prickett Morgan, writing recently in The Next Platform, a publication for which he is the co-editor.
The number of computer cabinets required to deliver the promised performance was fewer than expected, 74 versus an expected 100. “That is 4.8X more peak performance per node and a little more than 2X the number of nodes to deliver 10X the peak performance of Summit,” stated Morgan.
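Morgan’s figures can be checked with simple arithmetic, on the assumption that total peak performance scales as per-node performance times node count. The sketch below uses only the rounded multipliers from the quote; the function and variable names are illustrative, not Morgan’s.

```python
def implied_node_ratio(total_speedup: float, per_node_speedup: float) -> float:
    """Node-count ratio implied when total speedup = per-node speedup * node ratio."""
    return total_speedup / per_node_speedup


# Rounded multipliers quoted by Morgan, relative to Summit.
PER_NODE_SPEEDUP = 4.8
TOTAL_SPEEDUP = 10.0

ratio = implied_node_ratio(TOTAL_SPEEDUP, PER_NODE_SPEEDUP)
print(f"Implied node ratio: {ratio:.2f}x")
```

The result, roughly 2.08, is consistent with “a little more than 2X the number of nodes.”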
Oak Ridge provided examples of four real-world workloads that Frontier could be applied to when it becomes available:
Cholla for astrophysics hydrodynamics simulations;
LSMS for materials modeling;
CANDLE transformer machine learning model for cancer research; and
NuCCOR simulation for nuclear physics.
Performance should continue to improve as Oak Ridge scientists further tune Frontier, Morgan suggested.
Read the source articles and information from the Oakridger, in an Oak Ridge National Laboratory press release, and in The Next Platform.