Massive AI Clouds and Computers Are Under Construction
Microsoft, Nvidia, HPE, Meta and Google are all working to field massive AI clouds to tap new business opportunities and advance research; some related business is trickling down to smaller firms
By John P. Desmond, Editor, AI in Business

Massive AI clouds and massive AI computers are seen by big tech companies as where opportunity lies, and major efforts are underway to take advantage. At the same time, industry alliances are being formed to handle big AI services contracts.
For example, Microsoft this week announced it is working with AI chipmaker Nvidia to build a “massive” computer to handle AI processing in the cloud.
The AI computer will operate on Microsoft’s Azure cloud, using tens of thousands of GPUs based on Nvidia’s most powerful H100 and A100 chips. Nvidia did not specify the value of the deal, but each A100 chip is priced from $10,000 to $12,000, and the H100 commands a higher price, according to an account from Reuters.
“We're at that inflection point where AI is coming to the enterprise and getting those services out there that customers can use to deploy AI for business use cases is becoming real,” stated Ian Buck, Nvidia’s general manager for Hyperscale and HPC, to Reuters. “We're seeing a broad groundswell of AI adoption ... and the need for applying AI for enterprise use cases.”
Oracle in October announced plans to deploy tens of thousands of Nvidia’s A100 and H100 compute GPUs in its Oracle Cloud Infrastructure, according to an account in Tom’s Hardware. The terms of the deal were not announced; the authors speculated the transaction would be worth hundreds of millions of dollars. In an earlier account, Tom’s Hardware reported on a retailer taking orders for an Nvidia H100 board with 80GB of memory, priced at $32,995 plus a $13 delivery charge. The retailer was unsure when the board would ship, but anticipated availability by year-end.
In another big AI project, Google has announced plans to expand its already extensive language coverage roughly tenfold through what the company is calling the 1,000 Languages Initiative, an effort to build an AI model supporting the world’s most widely spoken languages.
"Language is fundamental to how people communicate and make sense of the world," stated Jeff Dean, Google senior fellow, in a recent account from ZDnet. "But more than 7,000 languages are spoken around the world, and only a few are well represented online today," he noted. Not much was said about the hardware platform the Languages Initiative would reside on; however, Nvidia has worked with Google to offer a GPU-accelerated version of Google Cloud. In addition to Google Cloud, Google offers Google Cloud Platform, a public cloud infrastructure for hosting web-based applications.
HPE Aims Superdome Flex at Big-Memory Demands
Hewlett Packard Enterprise (HPE) is also positioning for the big-cloud, big-AI business, especially for workloads with large and complex data sets that make partitioning between data elements difficult. “It’s better to keep the dataset in one piece,” stated Diana Cortes, marketing manager for mission-critical solutions at HPE, writing recently in EnterpriseAI.
An architecture with a large shared-memory capacity, such as the company offers with its HPE Superdome Flex family, is seen as having performance advantages.
The big AI platform is used, for example, by researchers at the M.G. DeGroote Institute for Infectious Disease Research to track pathogens and how they evolve. According to HPE, the shared-memory architecture sped up the institute’s data analysis by a factor of 10.
The high-performance platform is also in use at the University of Edinburgh, which is combining Superdome Flex with the Cerebras CS-1 AI accelerator from Cerebras Systems. EPCC, the university’s supercomputing center, is using Superdome Flex as a high-performance front-end storage and pre-processing platform for the Cerebras CS-1 AI supercomputer. The platform enables application-specific pre- and post-processing of data for AI model training and inference, and lets large datasets be held in memory.
The University of Edinburgh brings multidisciplinary teams together to research the use of AI for the benefit of society in domains including health, finance, engineering, climate science, agriculture and the humanities.
Zuckerberg Tapping AI to Build Virtual Worlds at Meta
Meanwhile at Meta, formerly Facebook, CEO Mark Zuckerberg is building a massive AI computer to better simulate the metaverse world he envisions.
Meta built the Research SuperCluster, or RSC, with 6,080 graphics processing units packaged into 760 Nvidia DGX A100 systems of eight GPUs each, according to a January account in CNET. At that time, the RSC’s processing power matched that of the world’s fifth-fastest supercomputer. In a second phase planned for this year, Meta planned to boost performance 2.5 times by expanding to 16,000 GPUs.
Meta is using RSC for research projects that include analysis of multiple sources of input such as sound, images and actions, which could help spot content judged to be harmful. Meta’s AI team is also planning to use RSC to build out its virtual reality, such as by simultaneously translating speech for a large group of individuals who each speak a different language.
"The experiences we're building for the metaverse require enormous compute power," stated Zuckerberg in a press release. "RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages and more."
In a blog post, Meta researchers Kevin Lee and Shubho Sengupta stated that RSC is about 20 times faster than Meta’s previous 2017-era Nvidia-based cluster at recognizing what is in a photo, and about three times faster at decoding human speech.
Anyscale’s Ray Open Source Framework Supports AI Scaling
Scaling AI applications is a major business focus for Anyscale, a company supporting Ray, an open-source framework for scaling AI and Python applications. The promise of Ray is to enable developers to scale any workload or application without the cost or expertise required to build a complex infrastructure.
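To make that scaling model concrete, here is a minimal sketch of Ray’s task API in Python, assuming a local machine or an existing Ray cluster; the score_batch function and its toy workload are hypothetical placeholders, not drawn from any company cited in this article.

    import ray

    ray.init()  # connect to a running Ray cluster, or start one locally

    @ray.remote
    def score_batch(batch):
        # Stand-in for real per-batch work such as feature extraction or inference.
        return sum(batch) / len(batch)

    # Each .remote() call returns a future immediately; Ray spreads the tasks
    # across available CPU cores or cluster nodes.
    futures = [score_batch.remote(list(range(i, i + 100))) for i in range(0, 1000, 100)]

    # Gather the results once all tasks complete.
    results = ray.get(futures)
    print(results)

The same decorator-based pattern runs unchanged on a laptop or a multi-node cluster, which is the portability the company emphasizes.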
At the recent Ray Summit 2022 event put on by the company, speakers outlined their experiences using the platform. Speakers represented companies including Uber, Shopify, Spotify, the Qatar Computing Research Institute, Intel and Riot Games, a gaming company.
“Hearing from Meta AI, OpenAI, Lyft, Uber and Instacart on Day 1 of Ray Summit and how they’re addressing some of the most complex challenges in distributed computing at scale was incredibly inspiring and enriching,” stated Robert Nishihara, CEO of Anyscale, in a press release.
Another way to achieve scale is through partnerships. Exemplifying this is General Dynamics IT, which has joined with a group of commercial technology companies to form a coalition targeting federal agencies and focused on emerging 5G and edge computing opportunities.
“We share a common vision of how 5G, edge and advanced wireless technologies can transform government operations,” stated Ben Gianni, GDIT’s senior vice president and chief technology officer, in a recent account in Washington Technology.
GDIT is working with Amazon Web Services, Cisco, Dell Technologies, Splunk and T-Mobile on building and demonstrating edge computing and 5G-enabled applications for federal agencies.
Each coalition member brings its own core focus area. AWS is to provide the cloud infrastructure. Cisco has the 5G core and mobile edge computing. Dell will bring infrastructure and edge operations software, plus AI-enhanced edge devices and sensors.
Splunk will provide cybersecurity automation and edge computing tools. T-Mobile will bring network bandwidth, expertise and solutions for large use cases such as smart infrastructure.
Read the source articles and information from Reuters, Tom’s Hardware, ZDNet, EnterpriseAI, CNET, the press release from Anyscale and Washington Technology.