LLM Benchmarking – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins. Feed: http://www.open-lab.net/blog/feed/ (updated 2025-07-09T19:00:00Z)

LLM Inference Benchmarking: Performance Tuning with TensorRT-LLM, by Francesco Di Natale. http://www.open-lab.net/blog/?p=102816 (updated 2025-07-07T18:11:34Z, published 2025-07-07T17:00:00Z)

This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to benchmark LLM inference with TensorRT-LLM. See LLM Inference Benchmarking: Fundamental Concepts for background knowledge on common metrics for benchmarking and parameters. And read LLM Inference Benchmarking Guide: NVIDIA GenAI-Perf and NIM for tips on using GenAI...

Benchmarking LLM Inference Costs for Smarter Scaling and Deployment, by Vinh Nguyen. http://www.open-lab.net/blog/?p=102298 (updated 2025-06-26T18:55:58Z, published 2025-06-18T15:00:00Z)

This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to determine the cost of LLM inference by estimating the total cost of ownership (TCO). See LLM Inference Benchmarking: Fundamental Concepts for background knowledge on common metrics for benchmarking and parameters. See LLM Inference Benchmarking Guide: NVIDIA...

Reproducing NVIDIA MLPerf v5.0 Training Scores for LLM Benchmarks, by Michał Marcinkiewicz. http://www.open-lab.net/blog/?p=101228 (updated 2025-06-12T18:48:47Z, published 2025-06-04T17:27:18Z)

The previous post, NVIDIA Blackwell Delivers up to 2.6x Higher Performance in MLPerf Training v5.0, explains how the NVIDIA platform delivered the fastest time to train across all seven benchmarks in this latest MLPerf round. This post provides a guide to reproduce the performance of NVIDIA MLPerf v5.0 submissions of Llama 2 70B LoRA fine-tuning and Llama 405B pretraining.

NVIDIA Blackwell Delivers up to 2.6x Higher Performance in MLPerf Training v5.0, by Sukru Burc Eryilmaz. http://www.open-lab.net/blog/?p=101269 (updated 2025-06-18T16:24:52Z, published 2025-06-04T17:26:38Z)

The journey to create a state-of-the-art large language model (LLM) begins with a process called pretraining. Pretraining a state-of-the-art model is computationally demanding, with popular open-weights models featuring tens to hundreds of billions of parameters and trained using trillions of tokens. As model intelligence grows with increasing model parameter count and training dataset size...

Announcing NVIDIA Exemplar Clouds for Benchmarking AI Cloud Infrastructure, by Abhishek Sinha. http://www.open-lab.net/blog/?p=100157 (updated 2025-05-29T17:30:58Z, published 2025-05-19T06:00:00Z)

Developers and enterprises training large language models (LLMs) and deploying AI workloads in the cloud have long faced a fundamental challenge: it's nearly impossible to know in advance if a cloud platform will deliver the performance, reliability, and cost efficiency their applications require. In this context, the difference between theoretical peak performance and actual...

LLM Inference Benchmarking Guide: NVIDIA GenAI-Perf and NIM, by Vinh Nguyen. http://www.open-lab.net/blog/?p=99180 (updated 2025-05-29T19:05:20Z, published 2025-05-06T17:35:39Z)

This is the second post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. When building LLM-based applications, it is critical to understand the performance characteristics of these models on given hardware. This serves multiple purposes: As a client-side LLM-focused benchmarking tool...

Benchmarking Agentic LLM and VLM Reasoning for Gaming with NVIDIA NIM, by Davide Paglieri. http://www.open-lab.net/blog/?p=99202 (updated 2025-05-15T19:08:40Z, published 2025-04-24T17:00:00Z)

This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. Researchers from the University College London (UCL) Deciding, Acting, and Reasoning with Knowledge (DARK) Lab leverage NVIDIA NIM microservices in their new game-based benchmark suite, Benchmarking Agentic LLM and VLM Reasoning On Games...

Measure and Improve AI Workload Performance with NVIDIA DGX Cloud Benchmarking, by Emily Potyraj. http://www.open-lab.net/blog/?p=97548 (updated 2025-05-06T17:00:29Z, published 2025-03-18T21:21:17Z)

As AI capabilities advance, understanding the impact of hardware and software infrastructure choices on workload performance is crucial for both technical validation and business planning. Organizations need a better way to assess real-world, end-to-end AI workload performance and the total cost of ownership rather than just comparing raw FLOPs or hourly cost per GPU.

NVIDIA DGX Cloud Introduces Ready-To-Use Templates to Benchmark AI Platform Performance, by Emily Potyraj. http://www.open-lab.net/blog/?p=95558 (updated 2025-05-06T17:01:29Z, published 2025-02-11T17:00:00Z)

In the rapidly evolving landscape of AI systems and workloads, achieving optimal model training performance extends far beyond chip speed. It requires a comprehensive evaluation of the entire stack, from compute to networking to model framework. Navigating the complexities of AI system performance can be difficult. There are many application changes that you can make...
