Vinh Nguyen – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins | 2025-05-29T19:05:20Z | http://www.open-lab.net/blog/feed/

Vinh Nguyen | Build Custom Reasoning Models with Advanced, Open Post-Training Datasets | http://www.open-lab.net/blog/?p=98680 | 2025-05-29T19:05:03Z | 2025-05-14T16:33:26Z

Synthetic data has become a standard part of large language model (LLM) post-training procedures. Using a large number of synthetically generated examples from either a single open-source, commercially permissible LLM or a cohort of them, a base LLM is finetuned with either supervised finetuning or RLHF to gain instruction-following and reasoning skills. This process can be seen as a knowledge…

Vinh Nguyen | LLM Inference Benchmarking Guide: NVIDIA GenAI-Perf and NIM | http://www.open-lab.net/blog/?p=99180 | 2025-05-29T19:05:20Z | 2025-05-06T17:35:39Z

This is the second post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. When building LLM-based applications, it is critical to understand the performance characteristics of these models on given hardware. This serves multiple purposes: As a client-side LLM-focused benchmarking tool…

Vinh Nguyen | LLM Inference Benchmarking: Fundamental Concepts | http://www.open-lab.net/blog/?p=98215 | 2025-05-09T18:23:04Z | 2025-04-02T17:00:00Z

This is the first post in the large language model latency-throughput benchmarking series, which aims to instruct developers on common metrics used for LLM benchmarking, fundamental concepts, and how to benchmark your LLM applications. The past few years have witnessed the rise in popularity of generative AI and large language models (LLMs), as part of a broad AI revolution.

Vinh Nguyen | LLM Model Pruning and Knowledge Distillation with NVIDIA NeMo Framework | http://www.open-lab.net/blog/?p=93451 | 2025-04-23T02:53:00Z | 2025-02-12T17:54:52Z

Model pruning and knowledge distillation are powerful cost-effective strategies for obtaining smaller language models from an initial larger sibling. The How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model post discussed the best practices of using large language models (LLMs) that combine depth, width, attention, and MLP pruning with knowledge distillation…

Vinh Nguyen | Mistral-NeMo-Minitron 8B Model Delivers Unparalleled Accuracy | http://www.open-lab.net/blog/?p=87739 | 2024-10-17T18:51:42Z | 2024-10-08T19:20:54Z

This post was originally published August 21, 2024 but has been revised with current data. Recently, NVIDIA and Mistral AI unveiled Mistral NeMo 12B, a leading state-of-the-art large language model (LLM). Mistral NeMo 12B consistently outperforms similarly sized models on a wide range of benchmarks. We announced Mistral-NeMo-Minitron 8B, one of the most advanced open-access models in…

Vinh Nguyen | How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model | http://www.open-lab.net/blog/?p=87164 | 2024-08-22T18:24:58Z | 2024-08-14T15:50:05Z

Large language models (LLMs) are now a dominant force in natural language processing and understanding, thanks to their effectiveness and versatility. LLMs such as Llama 3.1 405B and NVIDIA Nemotron-4 340B excel in many challenging tasks, including coding, reasoning, and math. They are, however, resource-intensive to deploy. As such, there is another trend in the industry to develop small language…

Vinh Nguyen | Seamlessly Deploying a Swarm of LoRA Adapters with NVIDIA NIM | http://www.open-lab.net/blog/?p=83606 | 2024-06-13T19:06:00Z | 2024-06-07T16:00:00Z

The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They often achieve striking results on a wide variety of use cases without any need for customization. Despite this, studies have shown that the best accuracy on downstream tasks can be achieved by adapting LLMs with high-quality…

Vinh Nguyen | Applying Mixture of Experts in LLM Architectures | http://www.open-lab.net/blog/?p=79605 | 2024-06-06T14:53:24Z | 2024-03-14T20:01:00Z

Mixture of experts (MoE) large language model (LLM) architectures have recently emerged, both in proprietary LLMs such as GPT-4 and in community models with the open-source release of Mistral Mixtral 8x7B. The strong relative performance of the Mixtral model has raised much interest and numerous questions about MoE and its use in LLM architectures. So, what is MoE and why is it important?

Vinh Nguyen | Build Enterprise Retrieval-Augmented Generation Apps with NVIDIA Retrieval QA Embedding Model | http://www.open-lab.net/blog/?p=74346 | 2024-10-28T22:00:06Z | 2023-11-28T18:10:50Z

Large language models (LLMs) are transforming the AI landscape with their profound grasp of human and programming languages. Essential for next-generation enterprise productivity applications, they enhance user efficiency across tasks like programming, copy editing, brainstorming, and answering questions on a wide range of topics. However, these models often struggle with real-time events and…

Vinh Nguyen | How to Create a Custom Language Model | http://www.open-lab.net/blog/?p=61684 | 2023-06-13T17:55:25Z | 2023-03-15T17:00:00Z

Generative AI has captured the attention and imagination of the public over the past couple of years. From a given natural language prompt, these generative models are able to generate human-quality results, from well-articulated children’s stories to product prototype visualizations. Large language models (LLMs) are at the center of this revolution. LLMs are universal language comprehenders…

Vinh Nguyen | Introducing NVIDIA Riva: A GPU-Accelerated SDK for Developing Speech AI Applications | http://www.open-lab.net/blog/?p=17451 | 2023-05-22T22:12:28Z | 2022-12-08T23:37:19Z

This post was updated in March 2023. Sign up for the latest Speech AI news from NVIDIA. Speech AI is used in a variety of applications, including contact centers’ agent assists for empowering human agents, voice interfaces for intelligent virtual assistants (IVAs), and live captioning in video conferencing. To support these features, speech AI technology includes automatic speech recognition…

Vinh Nguyen | Making an NVIDIA Riva ASR Service for a New Language | http://www.open-lab.net/blog/?p=50426 | 2024-08-28T14:49:34Z | 2022-10-28T17:00:00Z

Speech AI is the ability of intelligent systems to communicate with users using a voice-based interface, which has become ubiquitous in everyday life. People regularly interact with smart home devices, in-car assistants, and phones through speech. Speech interfaces have improved by leaps and bounds in recent years, making them a much more pleasant, practical, and natural experience than just a…

Vinh Nguyen | Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server | http://www.open-lab.net/blog/?p=51300 | 2023-05-24T00:22:56Z | 2022-08-03T17:00:00Z

This is the first part of a two-part series discussing the NVIDIA Triton Inference Server’s FasterTransformer (FT) library, one of the fastest libraries for distributed inference of transformers of any size (up to trillions of parameters). It provides an overview of FasterTransformer, including the benefits of using the library. Join the NVIDIA Triton and NVIDIA TensorRT community to stay…

Vinh Nguyen | Deploying GPT-J and T5 with NVIDIA Triton Inference Server | http://www.open-lab.net/blog/?p=51318 | 2023-03-14T23:22:55Z | 2022-08-03T17:00:00Z

This is the second part of a two-part series about NVIDIA tools that allow you to run large transformer models for accelerated inference. For an introduction to the FasterTransformer library (Part 1), see Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates…

Vinh Nguyen | NVIDIA AI Platform Delivers Big Gains for Large Language Models | http://www.open-lab.net/blog/?p=51198 | 2023-03-14T23:23:58Z | 2022-07-28T18:35:00Z

As the size and complexity of large language models (LLMs) continue to grow, NVIDIA is today announcing updates to the NeMo framework that provide training speed-ups of up to 30%. These updates–which include two trailblazing techniques and a hyperparameter tool to optimize and scale training of LLMs on any number of GPUs–offer new capabilities to train and deploy models using the NVIDIA AI…

Vinh Nguyen | A Guide to Understanding Essential Speech AI Terms | http://www.open-lab.net/blog/?p=50343 | 2023-06-12T09:18:28Z | 2022-07-26T17:43:37Z

Speech AI is the technology that makes it possible to communicate with computer systems using your voice. Commanding an in-car assistant or handling a smart home device? An AI-enabled voice interface helps you interact with devices without having to type or tap on a screen. Sign up for the latest Data Science news. Get the latest announcements, notebooks, hands-on tutorials, events…

Vinh Nguyen | Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA TensorRT | http://www.open-lab.net/blog/?p=41964 | 2023-06-12T21:06:31Z | 2021-12-02T17:00:00Z

Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. The transformer architecture has wholly transformed (pun intended) the domain of natural language processing (NLP). Over the recent years, many novel network architectures have been built on the transformer building blocks: BERT, GPT, and T5…

Vinh Nguyen | Boosting NVIDIA MLPerf Training v1.1 Performance with Full Stack Optimization | http://www.open-lab.net/blog/?p=41919 | 2023-07-05T19:29:06Z | 2021-12-01T21:33:20Z

Five months have passed since v1.0, so it is time for another round of the MLPerf training benchmark. In this v1.1 edition, optimization across the entire hardware and software stack delivers continued improvement across the benchmark suite for submissions based on the NVIDIA platform. This improvement is consistent at all scales, from single machines all the way to industrial…

Vinh Nguyen | Accelerating Embedding with the HugeCTR TensorFlow Embedding Plugin | http://www.open-lab.net/blog/?p=37559 | 2022-08-21T23:52:42Z | 2021-09-24T19:00:00Z

Recommender systems are the economic engine of the Internet. It is hard to imagine any other type of application with a more direct impact on our daily digital lives: trillions of items to be recommended to billions of people. Recommender systems filter products and services among an overwhelming number of options, easing the paradox of choice that most users face. As the amount of data…

Vinh Nguyen | Continuously Improving Recommender Systems for Competitive Advantage Using NVIDIA Merlin and MLOps | http://www.open-lab.net/blog/?p=33639 | 2024-10-28T19:22:30Z | 2021-07-01T00:23:02Z

Recommender systems are a critical resource for enterprises that are relentlessly striving to improve customer engagement. They work by suggesting potentially relevant products and services amongst an overwhelmingly large and ever-increasing number of offerings. NVIDIA Merlin is an application framework that accelerates all phases of recommender system development on NVIDIA GPUs…

Vinh Nguyen | MLPerf v1.0 Training Benchmarks: Insights into a Record-Setting NVIDIA Performance | http://www.open-lab.net/blog/?p=33929 | 2023-07-05T19:31:00Z | 2021-06-30T17:00:00Z

MLPerf is an industry-wide AI consortium tasked with developing a suite of performance benchmarks that cover a range of leading AI workloads widely in use. The latest MLPerf v1.0 training round includes vision, language and recommender systems, and reinforcement learning tasks. It is continually evolving to reflect the state-of-the-art AI applications. NVIDIA submitted MLPerf v1.0…

Vinh Nguyen | Accelerating Recommender Systems Training with NVIDIA Merlin Open Beta | http://www.open-lab.net/blog/?p=21196 | 2024-10-28T18:23:10Z | 2020-10-05T13:00:00Z

NVIDIA Merlin is an open beta application framework and ecosystem that enables the end-to-end development of recommender systems, from data preprocessing to model training and inference, all accelerated on NVIDIA GPUs. We announced Merlin in a previous post and have been continuously making updates to the open beta. In this post, we detail the new features added to the open beta NVIDIA Merlin…

Vinh Nguyen | Announcing the NVIDIA NVTabular Open Beta with Multi-GPU Support and New Data Loaders | http://www.open-lab.net/blog/?p=21200 | 2024-10-28T18:24:20Z | 2020-10-05T13:00:00Z

Recently, NVIDIA CEO Jensen Huang announced updates to the open beta of NVIDIA Merlin, an end-to-end framework that democratizes the development of large-scale deep learning recommenders. With NVIDIA Merlin, data scientists, machine learning engineers, and researchers can accelerate their entire workflow pipeline from ingesting and training to deploying GPU-accelerated recommenders (Figure 1).

Vinh Nguyen | Accelerating AI Training with MLPerf Containers and Models from NVIDIA NGC | http://www.open-lab.net/blog/?p=19139 | 2023-07-05T19:37:55Z | 2020-07-29T17:00:00Z

The MLPerf consortium mission is to “build fair and useful benchmarks” to provide an unbiased training and inference performance reference for ML hardware, software, and services. MLPerf Training v0.7 is the third instantiation for training and continues to evolve to stay on the cutting edge. This round consists of eight different workloads that cover a broad diversity of use cases…

Vinh Nguyen | Accelerating TensorFlow on NVIDIA A100 GPUs | http://www.open-lab.net/blog/?p=18957 | 2023-06-12T21:15:05Z | 2020-07-24T22:22:06Z

The NVIDIA A100, based on the NVIDIA Ampere GPU architecture, offers a suite of exciting new features: third-generation Tensor Cores, Multi-Instance GPU (MIG), and third-generation NVLink. Ampere Tensor Cores introduce a novel math mode dedicated to AI training: the TensorFloat-32 (TF32). TF32 is designed to accelerate the processing of FP32 data types, commonly used in DL workloads.

Vinh Nguyen | Accelerating ETL for Recommender Systems on NVIDIA GPUs with NVTabular | http://www.open-lab.net/blog/?p=18907 | 2024-10-28T18:16:58Z | 2020-07-16T01:48:04Z

Recommender systems are ubiquitous in online platforms, helping users navigate through an exponentially growing number of goods and services. These models are key in driving user engagement. With the rapid growth in scale of industry datasets, deep learning (DL) recommender models have started to gain advantages over traditional methods by capitalizing on large amounts of training data.

Vinh Nguyen | Optimizing the Deep Learning Recommendation Model on NVIDIA GPUs | http://www.open-lab.net/blog/?p=18109 | 2024-10-28T18:15:57Z | 2020-06-18T23:36:41Z

Recommender systems help people find what they’re looking for among an exponentially growing number of options. They are a critical component for driving user engagement on many online platforms. With the rapid growth in scale of industry datasets, deep learning (DL) recommender models, which capitalize on large amounts of training data, have started to show advantages over traditional…

Vinh Nguyen | Improving Computer Vision with NVIDIA A100 GPUs | http://www.open-lab.net/blog/?p=18363 | 2023-04-04T17:01:27Z | 2020-06-16T17:23:00Z

During the 2020 NVIDIA GPU Technology Conference keynote address, NVIDIA founder and CEO Jensen Huang introduced the new NVIDIA A100 GPU based on the NVIDIA Ampere GPU architecture. In this post, we detail the exciting new features of the A100 that make NVIDIA GPUs an ever-better powerhouse for computer vision workloads. We also showcase two recent CV research projects from NVIDIA Research…

Vinh Nguyen | Announcing NVIDIA Merlin: An Application Framework for Deep Recommender Systems | http://www.open-lab.net/blog/?p=17680 | 2024-10-28T18:13:37Z | 2020-05-14T20:10:45Z

Recommender systems drive every action that you take online, from the selection of this web page that you’re reading now to more obvious examples like online shopping. They play a critical role in driving user engagement on online platforms, selecting a few relevant goods or services from the exponentially growing number of available options. On some of the largest commercial platforms…
