The NVIDIA H200 Tensor Core GPU is built to power the next generation of generative AI and high-performance computing (HPC). With advanced HBM3e memory, it delivers faster speeds and larger capacity, enabling smooth performance for large language models (LLMs), AI workloads, and scientific computing.
Specification | H200 SXM¹ | H200 NVL¹ |
---|---|---|
FP64 | 34 TFLOPS | 30 TFLOPS |
FP64 Tensor Core | 67 TFLOPS | 60 TFLOPS |
FP32 | 67 TFLOPS | 60 TFLOPS |
TF32 Tensor Core² | 989 TFLOPS | 835 TFLOPS |
BFLOAT16 Tensor Core² | 1,979 TFLOPS | 1,671 TFLOPS |
FP16 Tensor Core² | 1,979 TFLOPS | 1,671 TFLOPS |
FP8 Tensor Core² | 3,958 TFLOPS | 3,341 TFLOPS |
INT8 Tensor Core² | 3,958 TOPS | 3,341 TOPS |
GPU Memory | 141 GB | 141 GB |
GPU Memory Bandwidth | 4.8 TB/s | 4.8 TB/s |
Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
Confidential Computing | Supported | Supported |
Max Thermal Design Power (TDP) | Up to 700 W (configurable) | Up to 600 W (configurable) |
Multi-Instance GPUs (MIG) | Up to 7 MIGs @ 18 GB each | Up to 7 MIGs @ 16.5 GB each |
Form Factor | SXM | PCIe, dual-slot, air-cooled |
Interconnect | NVIDIA NVLink™: 900 GB/s; PCIe Gen5: 128 GB/s | 2- or 4-way NVIDIA NVLink bridge: 900 GB/s per GPU; PCIe Gen5: 128 GB/s |
Server Options | NVIDIA HGX™ H200 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs | NVIDIA MGX™ H200 NVL partner and NVIDIA-Certified Systems with up to 8 GPUs |
NVIDIA AI Enterprise | Add-on | Included |

¹ Preliminary specifications. May be subject to change.
² With sparsity.
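On a provisioned system, the memory figures in the table above can be confirmed directly from the CUDA runtime. The snippet below is a minimal sketch that queries device 0 and prints its name and total memory; the device index and the `nvcc device_query.cu -o device_query` build command are assumptions, and the reported total is typically slightly below the nominal figure because part of the HBM is reserved by the driver and ECC.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query the first visible CUDA device (index 0 is an assumption;
    // adjust for multi-GPU nodes or set CUDA_VISIBLE_DEVICES).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // On an H200 the total should come out close to the 141 GB listed above.
    std::printf("Device name          : %s\n", prop.name);
    std::printf("Total global memory  : %.1f GB\n", prop.totalGlobalMem / 1e9);
    std::printf("Multiprocessor count : %d\n", prop.multiProcessorCount);
    std::printf("ECC enabled          : %s\n", prop.ECCEnabled ? "yes" : "no");
    return 0;
}
```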
The NVIDIA H200, built on the NVIDIA Hopper™ architecture, is the first GPU with 141 GB of HBM3e memory and 4.8 TB/s of memory bandwidth, nearly twice the capacity and 1.4× the bandwidth of the H100 Tensor Core GPU. Its larger, faster memory speeds up generative AI and large language models (LLMs) while improving HPC workloads with greater efficiency and lower total cost of ownership.
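For a sense of scale, the "nearly twice" and "1.4×" figures can be worked out against the commonly published H100 SXM numbers of 80 GB HBM3 and 3.35 TB/s; those H100 values are taken from its own datasheet and are an assumption here, not part of this page's specifications.

```latex
\frac{141\ \text{GB}}{80\ \text{GB}} \approx 1.76 \ \text{(capacity)}
\qquad
\frac{4.8\ \text{TB/s}}{3.35\ \text{TB/s}} \approx 1.43 \ \text{(bandwidth)}
```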
LLMs are key for today’s AI applications, and running them at scale demands speed and efficiency. The NVIDIA H200 delivers up to 2× faster inference than H100 GPUs on models like Llama 2, helping businesses serve massive user bases with higher throughput and lower total cost of ownership.
Memory bandwidth plays a vital role in HPC, allowing faster data transfer and reducing processing bottlenecks. For workloads like simulations, scientific research, and AI, the NVIDIA H200’s higher bandwidth ensures data is accessed and processed efficiently—delivering up to 110× faster results compared to CPUs.
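As a concrete illustration of what memory bandwidth means in practice, the sketch below times a bandwidth-bound copy kernel and reports effective GB/s as bytes moved divided by elapsed time. It is an illustrative sketch rather than a tuned benchmark: the buffer size and launch configuration are arbitrary choices, most error checking is omitted, and a real measurement would average many runs (NVIDIA's bandwidthTest sample or a STREAM-style benchmark is the usual tool).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Copy kernel: each element is read once and written once, so the kernel is
// limited by memory bandwidth rather than arithmetic throughput.
__global__ void copyKernel(const float* __restrict__ in,
                           float* __restrict__ out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    const size_t n = size_t(1) << 28;       // ~268M floats, about 1 GB per buffer
    const size_t bytes = n * sizeof(float);

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 0, bytes);

    dim3 block(256);
    dim3 grid((unsigned)((n + block.x - 1) / block.x));

    // Warm-up launch so the timed run excludes one-time initialization costs.
    copyKernel<<<grid, block>>>(d_in, d_out, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    copyKernel<<<grid, block>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // One read plus one write per element.
    double gbps = (2.0 * bytes) / (ms / 1e3) / 1e9;
    std::printf("Effective bandwidth: %.1f GB/s\n", gbps);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```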
The NVIDIA H200 sets a new standard in energy efficiency and total cost of ownership. Delivering unmatched performance within the same power profile as the H100, it enables AI and supercomputing systems that are faster, more eco-friendly, and more cost-effective—giving businesses and researchers a competitive edge.
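Because the TDP listed in the table is configurable, the enforced power limit and live draw of a given board can be read with NVIDIA's NVML library, the same interface nvidia-smi is built on. The sketch below assumes device index 0 and linking against `-lnvidia-ml`; it only illustrates the relevant NVML calls.

```cuda
// Build (assumed command): nvcc power_check.cu -lnvidia-ml -o power_check
#include <cstdio>
#include <nvml.h>

int main() {
    if (nvmlInit_v2() != NVML_SUCCESS) {
        std::fprintf(stderr, "NVML init failed\n");
        return 1;
    }

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex_v2(0, &dev) == NVML_SUCCESS) {
        char name[NVML_DEVICE_NAME_BUFFER_SIZE];
        unsigned int powerMw = 0, limitMw = 0;

        nvmlDeviceGetName(dev, name, sizeof(name));
        nvmlDeviceGetPowerUsage(dev, &powerMw);          // current draw, milliwatts
        nvmlDeviceGetEnforcedPowerLimit(dev, &limitMw);  // configured cap, milliwatts

        std::printf("%s: drawing %.1f W of a %.1f W limit\n",
                    name, powerMw / 1000.0, limitMw / 1000.0);
    }

    nvmlShutdown();
    return 0;
}
```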
**What is the NVIDIA H200?**
The NVIDIA H200 is a high-performance GPU built on the Hopper™ architecture, designed for generative AI, large language models (LLMs), and high-performance computing (HPC) workloads.

**How much memory does the H200 have?**
The H200 features 141 GB of HBM3e memory with a bandwidth of 4.8 TB/s, nearly double the capacity of the H100 GPU.

**What workloads is the H200 best suited for?**
It is ideal for AI inference, training LLMs such as Llama 2, scientific computing, simulations, and other memory-intensive HPC applications.

**Can the H200 be used in data centers and supercomputers?**
Yes. The H200 is designed for AI factories, HPC clusters, and supercomputing environments, delivering faster computation with better energy efficiency.

**How does the H200 compare to the H100?**
The H200 offers nearly double the memory, 1.4× the memory bandwidth, up to 2× faster AI inference, and better energy efficiency, making it a next-generation upgrade.