Hao Wu – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-05-29T19:05:13Z http://www.open-lab.net/blog/feed/ Hao Wu <![CDATA[Turbocharge LLM Training Across Long-Haul Data Center Networks with NVIDIA NeMo Framework]]> http://www.open-lab.net/blog/?p=99764 2025-05-29T19:05:13Z 2025-05-08T18:28:58Z Multi-data center training is becoming essential for AI factories as pretraining scaling fuels the creation of even larger models, leading the demand for...]]>

Multi-data center training is becoming essential for AI factories as pretraining scaling fuels the creation of even larger models, leading the demand for computing performance to outpace the capabilities of a single facility. By distributing workloads across multiple data centers, organizations can overcome limitations in power, cooling, and space, enabling the training of even larger…

Source

]]>
Hao Wu <![CDATA[Fast, Terabyte-Scale Recommender Training Made Easy with NVIDIA Merlin Distributed-Embeddings]]> http://www.open-lab.net/blog/?p=54372 2022-09-01T23:00:57Z 2022-08-31T16:00:00Z Embeddings play a key role in deep learning recommender models. They are used to map encoded categorical inputs in data to numerical values that can be...]]>

Embeddings play a key role in deep learning recommender models. They are used to map encoded categorical inputs in data to numerical values that can be processed by the math layers or multilayer perceptrons (MLPs). Embeddings often constitute most of the parameters in deep learning recommender models and can be quite large, even reaching into the terabyte scale. It can be difficult to fit…
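
As a rough illustration of the idea in this excerpt, the sketch below shows how an embedding table maps encoded categorical IDs to dense vectors that an MLP can consume. It uses plain PyTorch rather than the Merlin Distributed-Embeddings API, and the vocabulary and layer sizes are made up for illustration; real recommender tables with hundreds of millions of IDs are what push parameter counts toward the terabyte scale.

```python
import torch
import torch.nn as nn

# Illustrative sizes only: production recommender vocabularies can reach
# hundreds of millions of IDs, which is where the terabyte scale comes from.
NUM_CATEGORIES = 1_000_000   # distinct IDs for one categorical feature
EMBEDDING_DIM = 64           # dense vector size per ID

class TinyRecommender(nn.Module):
    def __init__(self):
        super().__init__()
        # The embedding table holds NUM_CATEGORIES x EMBEDDING_DIM parameters,
        # typically the bulk of a recommender model's weights.
        self.embedding = nn.Embedding(NUM_CATEGORIES, EMBEDDING_DIM)
        # A small MLP consumes the looked-up dense vectors.
        self.mlp = nn.Sequential(
            nn.Linear(EMBEDDING_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, category_ids: torch.Tensor) -> torch.Tensor:
        dense = self.embedding(category_ids)   # integer IDs -> float vectors
        return self.mlp(dense)

# Example: score a batch of four encoded categorical inputs.
model = TinyRecommender()
ids = torch.randint(0, NUM_CATEGORIES, (4,))
print(model(ids).shape)  # torch.Size([4, 1])
```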

Source

]]>
Hao Wu <![CDATA[Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT]]> http://www.open-lab.net/blog/?p=34216 2023-06-12T21:09:34Z 2021-07-20T13:00:00Z Deep learning is revolutionizing the way that industries are delivering products and services. These services include object detection, classification, and...]]>

Deep learning is revolutionizing the way that industries are delivering products and services. These services include object detection, classification, and segmentation for computer vision, and text extraction, classification…
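
To make the quantization-aware training (QAT) idea behind this post concrete, here is a minimal conceptual sketch in PyTorch: the forward pass simulates INT8 rounding error ("fake quantization") so the network learns weights that stay accurate after quantization. This is not the TensorRT or pytorch-quantization workflow itself; the max-abs scale calibration and layer names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Symmetric per-tensor quantization: round onto an INT grid, then dequantize.
    qmax = 2 ** (num_bits - 1) - 1                        # 127 for INT8
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward sees the quantized value,
    # backward treats the op as identity so gradients keep flowing.
    return x + (q - x).detach()

class QATLinear(nn.Module):
    """Linear layer that sees simulated INT8 rounding error during training."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.inner = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(fake_quantize(x), fake_quantize(self.inner.weight), self.inner.bias)

# Example: one training step with quantization noise in the loop.
layer = QATLinear(16, 4)
loss = layer(torch.randn(8, 16)).pow(2).mean()
loss.backward()   # gradients flow through the fake-quant ops
```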

Source

]]>
Hao Wu <![CDATA[Int4 Precision for AI Inference]]> http://www.open-lab.net/blog/?p=15821 2023-02-13T17:33:48Z 2019-11-06T18:00:57Z INT4 Precision Can Bring an Additional 59% Speedup Compared to INT8 If there's one constant in AI and deep learning, it's never-ending optimization to wring...]]>

If there’s one constant in AI and deep learning, it’s never-ending optimization to wring every possible bit of performance out of a given platform. Many inference applications benefit from reduced precision, whether it’s mixed precision for recurrent neural networks (RNNs) or INT8 for convolutional neural networks (CNNs), where applications can get 3x+ speedups. NVIDIA’s Turing architecture…
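
As a small illustration of the reduced-precision trade-off this excerpt describes, the sketch below applies symmetric linear quantization at INT8 and INT4 and compares the reconstruction error. The max-abs calibration and random "activations" are simplifying assumptions, not the calibration used by TensorRT.

```python
import numpy as np

def quantize_dequantize(x: np.ndarray, num_bits: int) -> np.ndarray:
    # Symmetric linear quantization: map floats onto a signed integer grid and back.
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8, 7 for INT4
    scale = np.abs(x).max() / qmax            # simple max-abs calibration
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# Compare reconstruction error at INT8 vs INT4 on random "activations".
x = np.random.randn(10_000).astype(np.float32)
for bits in (8, 4):
    err = np.abs(x - quantize_dequantize(x, bits)).mean()
    print(f"INT{bits}: mean absolute error = {err:.4f}")
```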

Source

]]>