For years, advancements in AI have followed a clear trajectory through pretraining scaling: larger models, more data, and greater computational resources lead to breakthrough capabilities. Over the last five years, pretraining scaling has driven compute requirements up by an incredible 50 million times. However, building more intelligent systems is no longer just about pretraining bigger models.
Best-in-class AI performance requires an efficient parallel computing architecture, a productive tool stack, and deeply optimized algorithms. NVIDIA released the open-source NVIDIA TensorRT-LLM, which includes the latest kernel optimizations for the NVIDIA Hopper architecture at the heart of the NVIDIA H100 Tensor Core GPU. These optimizations enable models like Llama 2 70B to execute using…
Large language models (LLMs) have seen dramatic growth over the last year, and delivering great user experiences depends on both high compute throughput and large amounts of high-bandwidth memory. NVIDIA TensorRT-LLM provides optimizations for both peak throughput and memory use, delivering massive improvements in LLM inference performance.
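As a rough illustration of how these optimizations are consumed, the sketch below assumes TensorRT-LLM's high-level Python LLM API; the model name, prompts, and sampling settings are placeholders, and API details vary by release:

```python
# Illustrative sketch only: assumes the TensorRT-LLM high-level LLM API.
from tensorrt_llm import LLM, SamplingParams

# Build (or load) an optimized engine for a Hugging Face checkpoint (placeholder name).
llm = LLM(model="meta-llama/Llama-2-70b-hf")

# Batched generation benefits from in-flight batching and the paged KV cache,
# which target exactly the throughput and memory pressure described above.
prompts = ["What is TensorRT-LLM?", "Summarize the Hopper architecture."]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))

for out in outputs:
    print(out.outputs[0].text)
```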
The rapid growth in the size, complexity, and diversity of large language models (LLMs) continues to drive an insatiable need for AI training performance. Delivering top performance requires the ability to train models at the scale of an entire data center efficiently. This is achieved through exceptional craftsmanship at every layer of the technology stack, spanning chips, systems, and software.
In MLPerf Inference v3.0, NVIDIA made its first submissions to the newly introduced Network division, which is now part of the MLPerf Inference Datacenter suite. The Network division is designed to simulate a real data center setup and strives to include the effect of networking, both hardware and software, in end-to-end inference performance. In the Network division…
Models like Megatron 530B are expanding the range of problems AI can address. However, as models continue to grow in complexity, they pose a twofold challenge for AI compute platforms. What's needed is a versatile AI platform that can deliver the needed performance on a wide variety of models for both training and inference. To evaluate that performance, MLPerf is the only industry…
Today at AWS re:Invent 2021, AWS announced the general availability of Amazon EC2 G5g instances—bringing the first NVIDIA GPU-accelerated Arm-based instance to the AWS cloud. The new EC2 G5g instance features AWS Graviton2 processors, based on the 64-bit Arm Neoverse cores, and NVIDIA T4G Tensor Core GPUs, enhanced for graphics-intensive applications. This powerful combination creates an…
AI continues to drive breakthrough innovation across industries, including consumer Internet, healthcare and life sciences, financial services, retail, manufacturing, and supercomputing. Researchers continue to push the boundaries of what's possible with rapidly evolving models that are growing in size, complexity, and diversity. In addition, many of these complex, large-scale models need to…
As the explosive growth of AI models continues unabated, natural language processing and understanding are at the forefront of this growth. As the industry heads toward trillion-parameter models and beyond, acceleration for AI inference is now a must-have. Many organizations deploy these services in the cloud and seek to get optimal performance and utility out of every instance they rent.
Building, deploying, and managing end-to-end ML pipelines in production, particularly for applications like recommender systems, is challenging. Operationalizing ML models within enterprise applications to deliver business value involves a lot more than developing the machine learning algorithms and models themselves; it's a continuous process of data collection and preparation, model building…
Inference is where we interact with AI. Chatbots, digital assistants, recommendation engines, fraud protection services, and other applications that you use every day are all powered by AI. Those deployed applications use inference to get you the information that you need. Given the wide array of uses for AI inference, evaluating performance poses numerous challenges for developers and…
Deployment and integration of trained machine learning (ML) models in production remains a hard problem, both for application developers and the infrastructure teams supporting them. How do you ensure you have right-sized compute resources to support multiple end users, serve multiple disparate workloads at the highest level of performance, automatically balance the load, and scale up or down…
By Dave Salvator, Senior Manager, Product Marketing at NVIDIA. NVIDIA and Google Cloud are making it possible for applications to push the boundaries of accelerated AI across a wide array of workloads. With its new A2 VM, announced today, Google Cloud provides customers the largest configuration of 16 NVIDIA A100 GPUs in a single VM. Also available are smaller GPU configurations including 1…
The NVIDIA A100 brought the biggest single-generation performance gains ever in our company's history. These speedups are a product of architectural innovations that include Multi-Instance GPU (MIG), support for accelerated structural sparsity, and a new precision called TF32, which is the focus of this post. TF32 is a great precision to use for deep learning training, as it combines the range of…
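TF32 keeps FP32's 8-bit exponent while rounding the mantissa to 10 bits, so it preserves dynamic range while running matrix math on Tensor Cores. A minimal sketch of enabling it in PyTorch follows (the framework choice and the matrices are illustrative, and defaults vary by PyTorch version):

```python
import torch

# Allow TF32 on Ampere (and later) Tensor Cores for matrix multiplies
# and cuDNN convolutions; this trades a few mantissa bits for large
# speedups while keeping FP32 dynamic range.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # executed with TF32 Tensor Core math when available
```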
Three trends continue to drive demand for AI compute, for both training and inference: growing data sets, increasingly complex and diverse networks, and real-time AI services. MLPerf Inference 0.7, the most recent version of the industry-standard AI benchmark, addresses these three trends, giving developers and organizations useful data to inform platform choices, both in the datacenter and at…
If there's one constant in AI and deep learning, it's never-ending optimization to wring every possible bit of performance out of a given platform. Many inference applications benefit from reduced precision, whether it's mixed precision for recurrent neural networks (RNNs) or INT8 for convolutional neural networks (CNNs), where applications can get 3x+ speedups. NVIDIA's Turing architecture…
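As a concrete illustration of the reduced-precision idea, here is a minimal PyTorch automatic mixed precision sketch; it is a stand-in for the general technique rather than the TensorRT INT8 calibration path this post refers to, and the model and input shapes are placeholders:

```python
import torch
import torchvision.models as models

# Placeholder CNN and batch; any convolutional model works the same way.
model = models.resnet50(weights=None).eval().cuda()
x = torch.randn(8, 3, 224, 224, device="cuda")

# Run inference with FP16 autocast; the heavy convolutions and matmuls
# execute in half precision on Tensor Cores.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    logits = model(x)
```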
Inference is where AI goes to work. Identifying diseases. Answering questions. Recommending products and services. Inference is also diffuse and will happen everywhere from the data center to the edge to IoT devices, across multiple use cases including image, speech, and recommender systems, to name a few. As a result, creating a benchmark to measure the performance of these diverse platforms…
AI algorithms trained on NVIDIA GPUs have proven their mettle at drawing insights from huge swaths of data. They have enabled researchers and companies to gain new, deeper insights and to deliver them in less time. This evolution has taken training times from days to minutes, and researchers have invented sophisticated techniques that use multiple networks in combination to solve knotty…
Among the many knotty problems that AI can help solve, speech and natural language processing (NLP) represent areas poised for significant growth in the coming years. Recently, a new language representation model called BERT (Bidirectional Encoder Representations from Transformers) was described by Google Research. According to the paper's authors, "BERT is designed to pre-train deep…
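As a quick orientation for readers new to BERT, the following is a minimal sketch of loading a pretrained checkpoint with the Hugging Face transformers library; it is illustrative only, not the GPU-optimized implementation the post describes, and the checkpoint name and example sentence are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

# Encode a sentence; BERT reads the whole sequence bidirectionally,
# so each token's embedding reflects both left and right context.
inputs = tokenizer("Natural language processing is poised for growth.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```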
Inference is where AI-based applications really go to work. Object recognition, image classification, natural language processing, and recommendation engines are but a few of the growing number of applications made smarter by AI. Recently, TensorRT 5, the latest version of NVIDIA's inference optimizer and runtime, became available. This version brings new features including support for our…
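For readers new to the workflow, a minimal sketch of building a TensorRT engine from an ONNX model with the Python API follows; it uses the modern (TensorRT 8.x-style) API surface, which differs in details from the TensorRT 5 release discussed above, and the ONNX file path is a placeholder:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse a trained model exported to ONNX (placeholder path).
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # enable reduced precision where supported

# Build a serialized engine that the TensorRT runtime can later deserialize and execute.
serialized_engine = builder.build_serialized_network(network, config)
```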