The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of increasing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates.
As models grow larger and are trained on more data, they become more capable, making them more useful. To train these models quickly, more performance, delivered at data center scale, is required. The NVIDIA Blackwell platform, launched at GTC 2024 and now in full production, integrates six types of chips: GPU, CPU, DPU, NVLink Switch chip, InfiniBand Switch, and Ethernet Switch.
In the latest round of MLPerf Inference, a suite of standardized, peer-reviewed inference benchmarks, the NVIDIA platform delivered outstanding performance across the board. Among the many submissions made using the NVIDIA platform were results using the NVIDIA GH200 Grace Hopper Superchip. GH200 tightly couples an NVIDIA Grace CPU with an NVIDIA Hopper GPU using NVIDIA NVLink-C2C…
Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a highly optimized inference engine are required for high-throughput, low-latency inference. MLPerf Inference v4.1 is the latest version of the popular and widely recognized MLPerf Inference benchmarks, developed by the MLCommons…
Generative AI models have a variety of uses, such as helping write computer code, crafting stories, composing music, generating images, producing videos, and more. And, as these models continue to grow in size and are trained on even more data, they are producing even higher-quality outputs. Building and deploying these more intelligent models is incredibly compute-intensive…
Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI models, including large language models (LLMs), are used for crafting marketing copy, writing computer code, rendering detailed images, composing music, generating videos, and more. The amount of compute required by the latest models is immense and…
Predicting 3D protein structures from amino acid sequences is a long-standing problem in bioinformatics. In recent years, deep learning-based computational methods have emerged and shown promising results. Among these lines of work, AlphaFold2 is the first method to achieve results comparable to slower physics-based computational methods.
Generative AI is rapidly transforming computing, unlocking new use cases and turbocharging existing ones. Large language models (LLMs), such as OpenAI's GPT models and Meta's Llama 2, skillfully perform a variety of tasks on text-based content. These tasks include summarization, translation, classification, and generation of new content such as computer code, marketing copy, poetry, and much more.
AI is transforming computing, and inference is how the capabilities of AI are deployed in the world's applications. Intelligent chatbots, image and video synthesis from simple text prompts, personalized content recommendations, and medical imaging are just a few examples of AI-powered applications. Inference workloads are both computationally demanding and diverse, requiring that platforms be…
In MLPerf Inference v3.0, NVIDIA made its first submissions to the newly introduced Network division, which is now part of the MLPerf Inference Datacenter suite. The Network division is designed to simulate a real data center setup and strives to include the effect of networking, both hardware and software, in end-to-end inference performance. In the Network division…
At the heart of the rapidly expanding set of AI-powered applications are powerful AI models. Before these models can be deployed, they must be trained through a process that requires an immense amount of AI computing power. AI training is also an ongoing process, with models constantly retrained with new data to ensure high-quality results. Faster model training means that AI-powered applications…
The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment scenarios. High-performance, accelerated AI platforms are needed to meet the demands of these applications and deliver the best user experiences. New AI models are constantly being invented to enable new capabilities…
As the fusion of AI and simulation accelerates scientific discovery, the need has arisen for a way to measure and rank how quickly the world's supercomputers can build AI models. MLPerf HPC, now in its third iteration, has emerged as an industry-standard measure of system performance using workloads traditionally performed on supercomputers.
MLPerf benchmarks, developed by MLCommons, are critical evaluation tools for organizations to measure the training performance of their machine learning models across workloads. MLPerf Training v2.1, the seventh iteration of this AI training-focused benchmark suite, tested performance across a breadth of popular AI use cases. Many AI applications take advantage of…
Today's AI-powered applications are enabling richer experiences, fueled by both larger and more complex AI models as well as the application of many models in a pipeline. To meet the increasing demands of AI-infused applications, an AI platform must not only deliver high performance but also be versatile enough to deliver that performance across a diverse range of AI models.
MLPerf benchmarks are developed by a consortium of AI leaders across industry, academia, and research labs, with the aim of providing standardized, fair, and useful measures of deep learning performance. MLPerf Training focuses on measuring the time to train a range of commonly used neural networks. Lower training times are important to speed time to deployment…
The NVIDIA A30 GPU is built on the latest NVIDIA Ampere architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC applications for mainstream servers in data centers. The A30 PCIe card combines third-generation Tensor Cores with large HBM2 memory (24 GB) and fast GPU memory bandwidth (933 GB/s) in a low-power envelope (maximum 165 W).
Models like Megatron 530B are expanding the range of problems AI can address. However, as models continue to grow in complexity, they pose a twofold challenge for AI compute platforms. What's needed is a versatile AI platform that can deliver the needed performance on a wide variety of models for both training and inference. To evaluate that performance, MLPerf is the only industry…
AI is transforming every industry, enabling powerful new applications and use cases that simply weren't possible with traditional software. As AI continues to proliferate, and with the size and complexity of AI models on the rise, significant advances in AI compute performance are required to keep up. That's where the NVIDIA platform comes in. With a full-stack approach spanning chips…
Five months have passed since v1.0, so it is time for another round of the MLPerf Training benchmark. In this v1.1 edition, optimizations across the entire hardware and software stack deliver continued improvements across the benchmark suite for submissions based on the NVIDIA platform. This improvement is observed consistently at every scale, from single machines all the way to industrial…
In MLPerf HPC v1.0, NVIDIA-powered systems won four of five new industry metrics focused on AI performance in HPC. MLPerf HPC, developed by the industry-wide AI consortium MLCommons, evaluates a suite of performance benchmarks covering a range of widely used AI workloads. In this round, NVIDIA delivered 5x better results for CosmoFlow, and 7x more performance on DeepCAM, compared to strong scaling results from…
AI continues to drive breakthrough innovation across industries, including consumer Internet, healthcare and life sciences, financial services, retail, manufacturing, and supercomputing. Researchers continue to push the boundaries of what's possible with rapidly evolving models that are growing in size, complexity, and diversity. In addition, many of these complex, large-scale models need to…
MLPerf is an industry-wide AI consortium tasked with developing a suite of performance benchmarks that cover a range of leading AI workloads widely in use. The latest MLPerf v1.0 training round includes vision, language and recommender systems, and reinforcement learning tasks. It is continually evolving to reflect state-of-the-art AI applications. NVIDIA submitted MLPerf v1.0…
Inference is where we interact with AI. Chatbots, digital assistants, recommendation engines, fraud protection services, and other applications that you use every day are all powered by AI. Those deployed applications use inference to get you the information that you need. Given the wide array of usages for AI inference, evaluating performance poses numerous challenges for developers and…
Data scientists and researchers work toward solving the grand challenges of humanity with AI projects such as developing autonomous cars or nuclear fusion energy research. They depend on powerful, high-performance AI platforms as essential tools to conduct their work. Even enterprise-grade AI implementation efforts, adding intelligent video analytics to existing video camera streams or image…
Three trends continue to drive the AI market for both training and inference: growing data sets, increasingly complex and diverse networks, and real-time AI services. MLPerf Inference 0.7, the most recent version of the industry-standard AI benchmark, addresses these three trends, giving developers and organizations useful data to inform platform choices, both in the datacenter and at…
NVIDIA today announced the fastest AI training performance among commercially available products, according to MLPerf benchmarks. The A100 Tensor Core GPU demonstrated the fastest performance per accelerator on all eight MLPerf benchmarks. The DGX SuperPOD system, a massive cluster of DGX A100 systems connected with HDR InfiniBand, also set eight new performance milestones.
The MLPerf consortium mission is to "build fair and useful benchmarks" to provide an unbiased training and inference performance reference for ML hardware, software, and services. MLPerf Training v0.7 is the third instantiation for training and continues to evolve to stay on the cutting edge. This round consists of eight different workloads that cover a broad diversity of use cases…
MLPerf is an industry-wide AI consortium that has developed a suite of performance benchmarks covering a range of leading AI workloads that are widely in use today. The latest MLPerf v0.7 training submission includes vision, language, recommenders, and reinforcement learning. NVIDIA submitted MLPerf v0.7 training results for all eight tests and the NVIDIA platform set records in all…
If there's one constant in AI and deep learning, it's never-ending optimization to wring every possible bit of performance out of a given platform. Many inference applications benefit from reduced precision, whether it's mixed precision for recurrent neural networks (RNNs) or INT8 for convolutional neural networks (CNNs), where applications can get 3x+ speedups. NVIDIA's Turing architecture…
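To make the reduced-precision idea concrete, here is a minimal sketch of post-training dynamic INT8 quantization in PyTorch. This is a generic illustration only, not the TensorRT-based INT8 path the excerpt above refers to (TensorRT uses a separate calibration workflow), and the toy model below is hypothetical.

```python
# Minimal sketch: post-training dynamic INT8 quantization in PyTorch.
# Illustrative only; NVIDIA's published results use TensorRT's INT8
# path, which relies on a calibration step not shown here.
import torch
import torch.nn as nn

# A small stand-in network; real workloads would be CNNs or RNNs.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# Convert the Linear layers' weights to INT8; activations are
# quantized on the fly at inference time (CPU execution in PyTorch).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
with torch.inference_mode():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 1024])
```

The speedup from INT8 comes from smaller weight storage and faster integer math; whether it approaches the 3x+ figure cited above depends on the model and on the hardware's integer throughput.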
Inference is where AI goes to work. Identifying diseases. Answering questions. Recommending products and services. The inference market is also diffuse, and inference will happen everywhere from the data center to the edge to IoT devices, across multiple use cases including image, speech, and recommender systems, to name a few. As a result, creating a benchmark to measure the performance of these diverse platforms…
The NVIDIA DGX SuperPOD simplifies how the world approaches supercomputing, delivering world-record-setting performance that can now be acquired by every enterprise in weeks instead of years. NVIDIA sets the bar once again in supercomputing, building a well-balanced system with 96 NVIDIA DGX-2H servers containing 1,536 NVIDIA Tesla V100 SXM3 GPUs. The DGX SuperPOD has earned the 22nd spot on the…
Today, the MLPerf consortium published its first results for the seven tests that currently comprise this new industry-standard benchmark for machine learning. For the six test categories where NVIDIA submitted results, we're excited to tell you that NVIDIA platforms have finished with leading single-node and at-scale results for all six, a testament to our total platform approach to accelerating…