NVIDIA GH200, H100, and L4 GPUs, together with Jetson Orin modules, demonstrate outstanding performance running AI in real-world applications, excelling in environments that range from cloud-based systems to the network edge.
NVIDIA's GH200 Grace Hopper Superchip made an impressive debut in the MLPerf industry benchmarks. The chip performed well across these tests, especially the data center inference tests, outperforming the previously leading NVIDIA H100 Tensor Core GPUs.
The overall results demonstrated the exceptional performance and versatility of NVIDIA's AI platform in a variety of situations, from cloud computing to the network edge, where computation happens closer to where data is generated or consumed.
In a separate announcement, NVIDIA unveiled new inference software that offers users significant improvements in performance, energy efficiency and cost savings.
GH200 Superchip shines in MLPerf tests
The GH200's unique combination of a Hopper GPU and a Grace CPU on a single superchip enables it to deliver superior performance through higher memory bandwidth and automatic power shifting between the CPU and GPU.
Separately, the NVIDIA HGX H100 systems, featuring eight H100 GPUs, delivered the highest throughput in every MLPerf Inference test this round.
Grace Hopper Superchips and H100 GPUs performed exceptionally well across all MLPerf data center tests, which cover tasks such as computer vision, speech recognition and medical imaging. They also excelled at more demanding tasks such as recommender systems and the large language models used in generative AI.
Overall, the results continue NVIDIA’s track record of demonstrating performance leadership in AI training and inference at every stage since the launch of the MLPerf benchmarks in 2018.
The latest round of MLPerf included two notable additions: an updated test for recommender systems and a new test for GPT-J, a language model with six billion parameters.
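To give a sense of what this new workload looks like, the sketch below runs a single GPT-J 6B inference query with the Hugging Face transformers library. The model ID, prompt, and decoding settings here are illustrative assumptions; the official MLPerf harness uses its own dataset and configuration.

    # Illustrative sketch of a GPT-J 6B inference query (not the MLPerf harness).
    # Assumes the Hugging Face transformers and accelerate packages are installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "EleutherAI/gpt-j-6b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # A summarization-style prompt; the article text here is a placeholder.
    prompt = "Summarize the following article:\n<article text>\nSummary:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding keeps the example simple and deterministic.
    output_ids = model.generate(**inputs, max_new_tokens=128)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))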
TensorRT-LLM supercharges inference
NVIDIA has developed TensorRT-LLM, generative AI software that simplifies and optimizes the complex job of LLM inference for models of any size. The software was not ready in time for the August MLPerf submission, but it lets customers more than double the inference performance of their H100 GPUs at no added cost.
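As a rough illustration of how the software is used, the sketch below relies on the high-level Python LLM API that ships with recent TensorRT-LLM releases; the model name and sampling settings are illustrative assumptions, not NVIDIA's benchmark configuration.

    # Minimal sketch of TensorRT-LLM's high-level Python API (recent releases).
    # Model name and sampling settings are illustrative assumptions.
    from tensorrt_llm import LLM, SamplingParams

    prompts = ["Explain what an inference engine does in one sentence."]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # LLM loads the model and builds an optimized TensorRT engine for it.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)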
NVIDIA's internal tests show that H100 GPUs running TensorRT-LLM deliver up to an 8x performance increase on GPT-J 6B compared with previous-generation GPUs running without the software.
The software grew out of NVIDIA's work accelerating and optimizing LLM inference with leading companies including Meta, AnyScale, Cohere, Deci, Grammarly, Mistral AI, MosaicML (now part of Databricks), OctoML, Tabnine and Together AI.
According to Naveen Rao, vice president of engineering at Databricks, MosaicML was able to seamlessly integrate the features it needs from TensorRT-LLM into its existing serving stack, a process he describes as very easy and straightforward.