Chenjie Luo – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
2024-09-19T19:33:05Z
http://www.open-lab.net/blog/feed/

Post-Training Quantization of LLMs with NVIDIA NeMo and NVIDIA TensorRT Model Optimizer
Chenjie Luo
http://www.open-lab.net/blog/?p=88489
2024-09-19T19:33:05Z 2024-09-10T16:00:00Z

As large language models (LLMs) grow ever larger, the cost of serving them rises, so it is increasingly important to provide easy-to-use and efficient deployment paths. One way to reduce this cost is post-training quantization (PTQ), a set of techniques that reduce the computational and memory requirements of serving trained models. In this post…

Accelerate Generative AI Inference Performance with NVIDIA TensorRT Model Optimizer, Now Publicly Available
Chenjie Luo
http://www.open-lab.net/blog/?p=81860
2024-06-13T22:22:46Z 2024-05-08T19:00:00Z

In the fast-evolving landscape of generative AI, faster inference remains a pressing need. As models grow rapidly in size and complexity, so does the need to produce results quickly while serving many users at once. The NVIDIA platform stands at the forefront of this endeavor, delivering continual performance gains through innovations across…
