Erin Ho – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (updated 2025-04-23T00:23:25Z)

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance
Erin Ho | Published 2025-03-18T17:41:42Z | Updated 2025-04-23T00:23:25Z | http://www.open-lab.net/blog/?p=97352

NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over 250 tokens per second per user or a maximum throughput of over 30,000 tokens per second on the massive, state-of-the-art 671-billion-parameter DeepSeek-R1 model. These rapid advancements in performance at both ends of the performance…
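
The two headline figures measure different things, and converting them helps. Below is a minimal Python sketch, assuming a simple steady-state decoding model: 250 tokens per second per user corresponds to an average gap of 4 ms between generated tokens, while aggregate throughput is the sum of all active users' rates. The helper names are hypothetical. The 30,000 tokens/s maximum is reported at the other end of the latency-throughput trade-off, so dividing one headline number by the other does not yield a concurrency figure the post reports.

```python
def per_token_latency_ms(tokens_per_second_per_user: float) -> float:
    """Average gap between consecutive generated tokens for one user, in ms."""
    return 1000.0 / tokens_per_second_per_user


def aggregate_throughput(concurrent_users: int, tokens_per_second_per_user: float) -> float:
    """System-wide tokens/s when every active user decodes at the same rate."""
    return concurrent_users * tokens_per_second_per_user


if __name__ == "__main__":
    # 250 tokens/s per user, the figure quoted in the post.
    print(per_token_latency_ms(250.0), "ms per token")  # 4.0 ms per token
    # Hypothetical operating point: 8 users each decoding at 250 tokens/s.
    print(aggregate_throughput(8, 250.0), "tokens/s in aggregate")
```

In practice, raising the number of concurrent users increases aggregate throughput but usually lowers the per-user rate, which is why the two records are quoted separately.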

Boosting Llama 3.1 405B Performance up to 1.44x with NVIDIA TensorRT Model Optimizer on NVIDIA H200 GPUs
Erin Ho | Published 2024-08-28T19:30:00Z | Updated 2024-11-14T15:58:41Z | http://www.open-lab.net/blog/?p=88017

The Llama 3.1 405B large language model (LLM), developed by Meta, is an open-source community model that delivers state-of-the-art performance and supports a variety of use cases. With 405 billion parameters and support for context lengths of up to 128K tokens, Llama 3.1 405B is also one of the most demanding LLMs to run. To deliver both low latency to optimize the user experience and high…
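
To see why a 405-billion-parameter model is so demanding, the sketch below estimates the memory needed for the weights alone at two precisions and compares it with NVIDIA H200 capacity (141 GB of HBM3e per GPU). It is a back-of-the-envelope estimate under stated assumptions: decimal gigabytes, and no accounting for the KV cache, activations, or runtime overhead, all of which grow further with 128K-token contexts. The function names are illustrative, not part of any NVIDIA API.

```python
import math

# Bytes per parameter for the weight formats compared here.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}

# NVIDIA H200 HBM3e capacity per GPU, in decimal gigabytes.
H200_MEMORY_GB = 141.0


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory needed for the model weights alone, in decimal GB."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus_for_weights(num_params: float, precision: str,
                         gpu_gb: float = H200_MEMORY_GB) -> int:
    """Lower bound on GPU count; ignores KV cache, activations, and overhead."""
    return math.ceil(weight_memory_gb(num_params, precision) / gpu_gb)


if __name__ == "__main__":
    params = 405e9  # Llama 3.1 405B
    for precision in ("bf16", "fp8"):
        print(precision, weight_memory_gb(params, precision), "GB of weights; at least",
              min_gpus_for_weights(params, precision), "H200 GPUs for weights alone")
```

Halving the bytes per parameter halves weight memory and frees capacity for the KV cache and larger batches, which is one reason 8-bit formats matter for a model of this size.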

NVIDIA TensorRT Model Optimizer v0.15 Boosts Inference Performance and Expands Model Support
Erin Ho | Published 2024-08-15T17:11:37Z | Updated 2024-08-22T18:24:54Z | http://www.open-lab.net/blog/?p=87227

NVIDIA has announced the v0.15 release of NVIDIA TensorRT Model Optimizer, a toolkit of state-of-the-art model optimization techniques, including quantization, sparsity, and pruning. These techniques reduce model complexity and enable downstream inference frameworks like NVIDIA TensorRT-LLM and NVIDIA TensorRT to more efficiently optimize the inference speed of generative AI…
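
Quantization, the first technique listed, maps high-precision values onto a small integer grid. The sketch below shows symmetric per-tensor INT8 quantization with max calibration in plain Python. It illustrates the generic technique only; it is not the TensorRT Model Optimizer API, whose calibration recipes and numeric formats may differ.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization with max calibration.

    The scale maps the largest absolute value (amax) onto 127, so every
    input falls inside the representable range [-127, 127].
    """
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    # Round to the nearest code, then clamp to guard against float overshoot.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate real values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.02, -0.75, 0.31, 1.27, -1.1]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("codes:", q)
    print("scale:", scale, "max error:", max_err)
```

Because rounding moves each value by at most half a step, the reconstruction error per element is bounded by `scale / 2`; a single large outlier inflates `scale` and therefore the error for every other value, which is why calibration choices matter.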

Train Generative AI Models More Efficiently with New NVIDIA Megatron-Core Functionalities
Erin Ho | Published 2024-07-12T22:25:42Z | Updated 2024-07-25T18:14:45Z | http://www.open-lab.net/blog/?p=84953

First introduced in 2019, NVIDIA Megatron-LM sparked a wave of innovation in the AI community, enabling researchers and developers to use the underpinnings of this open-source library to further large language model (LLM) advancements. Today, many of the most popular LLM developer frameworks have been inspired by and built using the Megatron-LM library, spurring a wave of foundation models and AI…

Accelerate Generative AI Inference Performance with NVIDIA TensorRT Model Optimizer, Now Publicly Available
Erin Ho | Published 2024-05-08T19:00:00Z | Updated 2024-06-13T22:22:46Z | http://www.open-lab.net/blog/?p=81860

In the fast-evolving landscape of generative AI, the demand for accelerated inference speed remains a pressing concern. With the exponential growth in model size and complexity, the need to swiftly produce results to serve numerous users simultaneously continues to grow. The NVIDIA platform stands at the forefront of this endeavor, delivering perpetual performance leaps through innovations across…

NVIDIA TensorRT Accelerates Stable Diffusion Nearly 2x Faster with 8-bit Post-Training Quantization
Erin Ho | Published 2024-03-08T01:17:34Z | Updated 2024-04-09T23:45:30Z | http://www.open-lab.net/blog/?p=78835

In the dynamic realm of generative AI, diffusion models stand out as the most powerful architecture for generating high-quality images with text prompts. Models like Stable Diffusion have revolutionized creative applications. However, the inference process of diffusion models can be computationally intensive due to the iterative denoising steps required. This presents significant challenges…
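
Because each denoising step is a full network call, any per-step speedup from 8-bit inference is repeated across every step, and post-training quantization has to calibrate on activations drawn from the whole schedule rather than from a single step. The toy sketch below assumes a hypothetical `toy_denoiser` whose output range drifts with the timestep, plus a simplified update rule; it records the activation abs-max at every step and derives a symmetric INT8 scale from the maximum over all steps. Max calibration is simply the easiest choice to show here; the recipe used in the post may differ.

```python
from typing import Callable, List, Tuple

# A denoiser takes the current latent and a timestep and returns a noise estimate.
Denoiser = Callable[[List[float], int], List[float]]


def toy_denoiser(x: List[float], t: int) -> List[float]:
    """Hypothetical stand-in for a UNet call: the output magnitude depends on
    the timestep t, mimicking activation ranges that drift across the schedule."""
    gain = 0.5 + t / 10.0
    return [gain * v for v in x]


def denoise_and_record_amax(
    model: Denoiser, x: List[float], num_steps: int, step_size: float = 0.1
) -> Tuple[List[float], List[float]]:
    """Run a toy denoising loop and record the abs-max of the model output per step."""
    per_step_amax: List[float] = []
    for t in reversed(range(num_steps)):
        eps = model(x, t)  # one full network evaluation per denoising step
        per_step_amax.append(max(abs(e) for e in eps))
        # Simplified update rule (an assumption, not a real scheduler).
        x = [xi - step_size * ei for xi, ei in zip(x, eps)]
    return x, per_step_amax


def int8_scale_from_amax(amax_values: List[float]) -> float:
    """Symmetric INT8 scale covering the largest range seen during calibration."""
    return max(amax_values) / 127.0


if __name__ == "__main__":
    latent = [1.0, -2.0, 0.5]
    _, amax = denoise_and_record_amax(toy_denoiser, latent, num_steps=10)
    print("per-step amax:", [round(a, 3) for a in amax])
    print("scale (all steps):", int8_scale_from_amax(amax))
    print("scale (last step only):", int8_scale_from_amax(amax[-1:]))
```

In this toy setup, calibrating on the last step alone produces a smaller scale, so activations from the early, high-noise steps would be clipped at inference time.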
