NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing

GTC—To power the next wave of AI data centers, NVIDIA today announced its next-generation accelerated computing platform with NVIDIA Hopper architecture, delivering an order of magnitude performance leap over its predecessor. Named after Grace Hopper, a pioneering American computer scientist, the new architecture succeeds the NVIDIA Ampere architecture, which was launched two years ago.

The company also announced its first Hopper-based GPU, the NVIDIA H100, packed with 80 billion transistors. The world’s largest and most powerful accelerator, the H100 packs groundbreaking features such as a revolutionary Transformer Engine and a highly scalable NVIDIA NVLink interconnect for advancing massive AI language models, deep recommendation systems, genomics, and complex digital twins.

“Data centers are becoming AI factories — processing and refining mountains of data to produce intelligence,” said Jensen Huang, founder and CEO of NVIDIA. “NVIDIA H100 is the engine of the world’s AI infrastructure that companies are using to accelerate their AI-driven businesses.”

H100 Technology Breakthroughs
The NVIDIA H100 GPU sets a new standard in accelerating large-scale AI and HPC, delivering six groundbreaking innovations:

  • The World’s Most Advanced Chip – Built with 80 billion transistors using an advanced TSMC 4N process designed for NVIDIA’s accelerated computing needs, the H100 offers major enhancements to accelerate AI, HPC, memory bandwidth, interconnect and communication, including nearly 5 terabytes per second of external connectivity. H100 is the first GPU to support PCIe Gen5 and the first to use HBM3, enabling 3TB/s of memory bandwidth. Twenty H100 GPUs can sustain the equivalent of the entire world’s internet traffic, making it possible for customers to deliver advanced recommendation systems and large language models running inference on data in real time.
  • New Transformer Engine — Now the default model choice for natural language processing, the Transformer is one of the most important deep learning models ever invented. The H100 accelerator’s Transformer Engine is built to make these networks up to 6x faster than the previous generation, without sacrificing accuracy.
  • 2nd Gen Secure Multi-Instance GPU — MIG technology allows a single GPU to be divided into seven smaller, fully isolated instances to perform different types of tasks. The Hopper architecture extends MIG capabilities up to 7x over the previous generation by providing secure multi-tenant configurations in cloud environments for each GPU instance.
  • Confidential Computing — H100 is the world’s first accelerator with confidential computing capabilities to protect AI models and customer data while they are being processed. Customers can also apply confidential computing to federated learning for privacy-sensitive industries such as healthcare and financial services, as well as shared cloud infrastructures.
  • 4th Gen NVIDIA NVLink — To accelerate the largest AI models, NVLink combines with a new external NVLink Switch to extend NVLink as a scalable network beyond the server, connecting up to 256 H100 GPUs at 9x higher bandwidth than the previous generation using NVIDIA HDR Quantum InfiniBand.
  • DPX Instructions — New DPX instructions accelerate dynamic programming — used in a wide variety of algorithms, including route optimization and genomics — up to 40x compared to CPUs and up to 7x compared to previous-generation GPUs. This includes the Floyd-Warshall algorithm to find optimal routes for autonomous robot fleets in dynamic warehouse environments, and the Smith-Waterman algorithm used in sequence alignment for DNA and protein classification and folding.
The combined technology innovations of H100 extend NVIDIA’s AI inference and training leadership to enable real-time and immersive applications using massive AI models. The H100 enables chatbots using the world’s most powerful monolithic transformer language model, Megatron 530B, with up to 30x higher throughput than the previous generation, while meeting the sub-second latency required for real-time conversational AI. With H100, researchers and developers can also train massive models, such as Mixture of Experts models with 395 billion parameters, up to 9x faster, reducing training time from weeks to days.
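The Floyd-Warshall routing problem mentioned above is a classic dynamic program. As a plain CPU-side reference point (not the DPX-accelerated implementation itself), here is a minimal all-pairs shortest-path sketch; the tiny graph and variable names are purely illustrative.

```python
import math

def floyd_warshall(dist):
    """All-pairs shortest paths via dynamic programming.

    dist: square matrix where dist[i][j] is the edge weight from node i
    to node j (math.inf if no direct edge, 0 on the diagonal).
    Returns a new matrix of shortest-path distances.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    for k in range(n):            # allow node k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Illustrative 3-node graph: a direct but slow route 0 -> 2 (cost 10)
# and a faster two-hop route 0 -> 1 -> 2 (cost 3 + 4 = 7).
INF = math.inf
graph = [
    [0,   3,   10],
    [INF, 0,   4],
    [INF, INF, 0],
]
print(floyd_warshall(graph)[0][2])  # prints 7, the two-hop route
```

The triple nested loop is what makes dynamic programming a bandwidth- and latency-bound workload; DPX instructions accelerate the inner relaxation step in hardware.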

Wide NVIDIA H100 Adoption
NVIDIA H100 can be deployed in any type of data center, including on-premises, cloud, hybrid cloud and edge. It is expected to be available worldwide later this year from the world’s leading cloud service providers and computer makers, as well as directly from NVIDIA.

NVIDIA’s fourth-generation DGX system, DGX H100, features eight H100 GPUs to deliver 32 petaflops of AI performance with new FP8 precision, providing the scale to meet the massive computational demands of large language models, recommendation systems, healthcare research, and climate science.
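The 32-petaflop system figure is consistent with roughly 4 petaflops of FP8 throughput per H100; that per-GPU number is not stated in this release, so treat it as an assumption in the back-of-envelope check below.

```python
# Back-of-envelope check of the DGX H100 FP8 figure quoted above.
# The per-GPU FP8 throughput (~4 petaflops) is an assumed estimate,
# not a specification taken from this announcement.
gpus_per_dgx = 8
fp8_pflops_per_gpu = 4  # assumed ~4 petaflops FP8 per H100
total = gpus_per_dgx * fp8_pflops_per_gpu
print(total)  # 32 petaflops, matching the DGX H100 figure
```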

Each GPU in DGX H100 systems is connected by fourth-generation NVLink, which provides 900GB/s connectivity, 1.5X more than the previous generation. NVSwitch allows all eight H100 GPUs to connect via NVLink. An external NVLink Switch can network up to 32 DGX H100 nodes in next-generation NVIDIA DGX SuperPOD supercomputers.

Hopper has received broad industry support from leading cloud service providers Alibaba Cloud, Amazon Web Services, Baidu AI Cloud, Google Cloud, Microsoft Azure, Oracle Cloud and Tencent Cloud, who plan to offer H100-based instances.

A wide range of servers with H100 accelerators are expected from the world’s leading system manufacturers, including Atos, BOXX Technologies, Cisco, Dell Technologies, Fujitsu, GIGABYTE, H3C, Hewlett Packard Enterprise, Inspur, Lenovo, Nettrix and Supermicro.

NVIDIA H100 at Any Scale
H100 comes in SXM and PCIe form factors to support a wide variety of server design requirements. A converged accelerator will also be available, pairing an H100 GPU with an NVIDIA ConnectX-7 400Gb/s InfiniBand and Ethernet SmartNIC.

NVIDIA’s H100 SXM will be available in HGX H100 server boards with four- and eight-way configurations for enterprises with applications that can scale to multiple GPUs in a server and across multiple servers. HGX H100-based servers deliver the highest application performance for AI training and inference along with data analytics and HPC applications.

The H100 PCIe, with NVLink to connect two GPUs, offers more than 7x the bandwidth of PCIe 5.0 and delivers excellent performance for applications running on mainstream enterprise servers. Its form factor makes it easy to integrate into existing data center infrastructure.

The H100 CNX, a new converged accelerator, pairs an H100 with a ConnectX-7 SmartNIC to deliver breakthrough performance for I/O-intensive applications such as multinode AI training in enterprise data centers and 5G signal processing at the edge.

NVIDIA Hopper architecture-based GPUs can also interface with NVIDIA Grace CPUs with an ultra-fast NVLink-C2C interconnect for over 7x faster communication between the CPU and GPU compared to PCIe 5.0. This combination – the Grace Hopper Superchip – is an integrated module designed for large-scale HPC and AI applications.

NVIDIA Software Support
The NVIDIA H100 GPU is supported by powerful software tools that help developers and enterprises build and accelerate applications from AI to HPC. This includes major updates to the NVIDIA AI software suite for workloads such as speech, recommendation systems, and hyperscale inference.

NVIDIA has also released more than 60 updates to its CUDA-X collection of libraries, tools, and technologies to accelerate work in quantum computing and 6G research, cybersecurity, genomics, and drug discovery.

NVIDIA H100 will be available starting in the third quarter.
