Chenhan Yu – NVIDIA Technical Blog
Post-Training Quantization of LLMs with NVIDIA NeMo and NVIDIA TensorRT Model Optimizer
Published 2024-09-10

As large language models (LLMs) continue to grow, the cost of serving them grows as well, making easy-to-use and efficient deployment paths increasingly important. One way to reduce this cost is post-training quantization (PTQ), a set of techniques that reduces the computational and memory requirements of serving trained models. In this post…
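To make the idea of PTQ concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, the simplest form of the technique the post describes. This is an illustration in plain NumPy, not the NVIDIA NeMo or TensorRT Model Optimizer API; all function names here are invented for the example.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 PTQ: choose a scale from the weight
    range, round to int8, and return the quantized tensor plus scale."""
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor to inspect quantization error."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
max_err = np.abs(w - dequantize(q, scale)).max()
print(q.dtype, max_err)
```

Storing `q` (1 byte per value) instead of `w` (4 bytes) cuts weight memory by 4x, at the cost of a bounded rounding error of at most half the scale per element; production PTQ methods add calibration data and per-channel scales to shrink that error further.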
